aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-08-25pds_core: check for work queue before useShannon Nelson1-1/+1
Add a check that the wq exists before queuing up work for a failed devcmd, as the PF is responsible for health and the VF doesn't have a wq. Fixes: c2dbb0904310 ("pds_core: health timer and workqueue") Signed-off-by: Shannon Nelson <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25pds_core: no reset command for VFShannon Nelson1-1/+2
The VF doesn't need to send a reset command, and in a PCI reset scenario it might not have a valid IO space to write to anyway. Fixes: 523847df1b37 ("pds_core: add devcmd device interfaces") Signed-off-by: Shannon Nelson <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25pds_core: no health reporter in VFShannon Nelson1-3/+5
Make sure the health reporter is set up before we use it in our devlink health updates, especially since the VF doesn't set up the health reporter. Fixes: 25b450c05a49 ("pds_core: add devlink health facilities") Signed-off-by: Shannon Nelson <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25pds_core: protect devlink callbacks from fw_down stateShannon Nelson1-0/+3
Don't access structs that have been cleared when in the fw_down state and the various structs have been cleaned and are waiting to recover. This caused a panic on rmmod when already in fw_down and devlink_param_unregister() tried to check the parameters. Fixes: 40ced8944536 ("pds_core: devlink params for enabling VIF support") Signed-off-by: Shannon Nelson <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25doc/netlink: Add delete operation to ovs_vport specDonald Hunter1-1/+12
Add del operation to the spec to help with testing. Signed-off-by: Donald Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net/sched: sch_hfsc: Ensure inner classes have fsc curveBudimir Markovic1-0/+4
HFSC assumes that inner classes have an fsc curve, but it is currently possible for classes without an fsc curve to become parents. This leads to bugs including a use-after-free. Don't allow non-root classes without HFSC_FSC to become parents. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Budimir Markovic <[email protected]> Signed-off-by: Budimir Markovic <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25sfc: Check firmware supports Ethernet PTP filterAlex Austin1-1/+3
Not all firmware variants support RSS filters. Do not fail all PTP functionality when raw ethernet PTP filters fail to insert. Fixes: e4616f64726b ("sfc: support PTP over Ethernet") Signed-off-by: Alex Austin <[email protected]> Acked-by: Edward Cree <[email protected]> Reviewed-by: Pieter Jansen van Vuuren <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25tools: ynl-gen: fix uAPI generation after tempfile changesJakub Kicinski1-10/+20
We use a tempfile for code generation, to avoid wiping the target file out if the code generator crashes. File contents are copied from tempfile to actual destination at the end of main(). uAPI generation is relatively simple so when generating the uAPI header we return from main() early, and never reach the "copy code over" stage. Since commit under Fixes uAPI headers are not updated by ynl-gen. Move the copy/commit of the code into CodeWriter, to make it easier to call at any point in time. Hook it into the destructor to make sure we don't miss calling it. Fixes: f65f305ae008 ("tools: ynl-gen: use temporary file for rendering") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25Merge branch 'stmmac-cleanups'Jakub Kicinski9-39/+70
Russell King says: ==================== stmmac cleanups One of the comments I had on Feiyang Chen's series was concerning the initialisation of phylink... and so I've decided to do something about it, cleaning it up a bit. This series: 1) adds a new phylink function to limit the MAC capabilities according to a maximum speed. This allows us to greatly simplify stmmac's initialisation of phylink's mac capabilities. 2) everywhere that uses priv->plat->phylink_node first converts this to a fwnode before doing anything with it. This is silly. Let's instead store it as a fwnode to eliminate these conversions in multiple places. 3) clean up passing the fwnode to phylink - it might as well happen at the phylink_create() callsite, rather than being scattered throughout the entire function. 4) same for mdio_bus_data 5) use phylink_limit_mac_speed() to handle the priv->plat->max_speed restriction. 6) add a method to get the MAC-specific capabilities from the code dealing with the MACs, and arrange to call it at an appropriate time. 7) convert the gmac4 users to use the MAC specific method. 8) same for xgmac. 9) group all the simple phylink_config initialisations together. 10) convert half-duplex logic to being positive logic. While looking into all of this, this raised eyebrows: if (priv->plat->tx_queues_to_use > 1) priv->phylink_config.mac_capabilities &= ~(MAC_10HD | MAC_100HD | MAC_1000HD); priv->plat->tx_queues_to_use is initialised by platforms to either 1, 4 or 8, and can be controlled from userspace via the --set-channels ethtool op. The implementation of this op in this driver limits the number of channels to priv->dma_cap.number_tx_queues, which is derived from the DMA hwcap. So, the obvious questions are: 1) what guarantees that the static initialisation of tx_queues_to_use will always be less than or equal to number_tx_queues from the DMA hw cap? 2) tx_queues_to_use starts off as 1, but number_tx_queues is larger, we will leave the half-duplex capabilities in place, but userspace can increase tx_queues_to_use above 1. Does that mean half-duplex is then not supported? 3) Should we be basing the decision whether half-duplex is supported off the DMA capabilities? 4) What about priv->dma_cap.half_duplex? Doesn't that get a say in whether half-duplex is supported or not? Why isn't this used? Why is it only reported via debugfs? If it's not being used by the driver, what's the point of reporting it via debugfs? ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: stmmac: convert half-duplex support to positive logicRussell King (Oracle)1-6/+7
Rather than detecting when half-duplex is not supported, and clearing the MAC capabilities, reverse the if() condition and use it to set the capabilities instead. Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: stmmac: move priv->phylink_config.mac_managed_pmRussell King (Oracle)1-1/+1
Move priv->phylink_config.mac_managed_pm to be along side the other phylink initialisations. Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: stmmac: move xgmac specific phylink caps to dwxgmac2 coreRussell King (Oracle)2-10/+10
Move the xgmac specific phylink capabilities to the dwxgmac2 support core. Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: stmmac: move gmac4 specific phylink capabilities to gmac4Russell King (Oracle)2-3/+9
Move the setup of gmac4 speicifc phylink capabilities into gmac4 code. Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: stmmac: provide stmmac_mac_phylink_get_caps()Russell King (Oracle)2-0/+7
Allow MACs to provide their own capabilities via the MAC operations struct. Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: stmmac: use phylink_limit_mac_speed()Russell King (Oracle)1-21/+14
Use phylink_limit_mac_speed() to limit the MAC capabilities rather than coding this for each speed. Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: stmmac: use "mdio_bus_data" local variableRussell King (Oracle)1-2/+4
We have a local variable for priv->plat->mdio_bus_data, which we use later in the conditional if() block, but we evaluate the above within the conditional expression. Use mdio_bus_data instead. Since these will be the only two users of this local variable, move its assignment just before the if(). Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: stmmac: clean up passing fwnode to phylinkRussell King (Oracle)1-4/+5
Move the initialisation of the fwnode variable closer to its use site, rather than scattered throughout stmmac_phy_setup(). Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: stmmac: convert plat->phylink_node to fwnodeRussell King (Oracle)4-5/+6
All users of plat->phylink_node first convert it to a fwnode. Rather than repeatedly convert to a fwnode, store it as a fwnode. To reflect this change, call it plat->port_node instead - it is used for more than just phylink. Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25net: phylink: add phylink_limit_mac_speed()Russell King (Oracle)2-0/+20
Add a function which can be used to limit the phylink MAC capabilities to an upper speed limit. Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25veth: Avoid NAPI scheduling on failed SKB forwardingLiang Chen1-3/+2
When an skb fails to be forwarded to the peer(e.g., skb data buffer length exceeds MTU), it will not be added to the peer's receive queue. Therefore, we should schedule the peer's NAPI poll function only when skb forwarding is successful to avoid unnecessary overhead. Signed-off-by: Liang Chen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25Merge branch 'fix-pfc-related-issues'Jakub Kicinski3-21/+16
Suman Ghosh says: ==================== Fix PFC related issues This patchset fixes multiple PFC related issues related to Octeon. Patch #1: octeontx2-pf: Fix PFC TX scheduler free Patch #2: octeontx2-af: CN10KB: fix PFC configuration Patch #3: octeonxt2-pf: Fix backpressure config for multiple PFC priorities to work simultaneously ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25cteonxt2-pf: Fix backpressure config for multiple PFC priorities to work ↵Suman Ghosh1-3/+3
simultaneously MAC (CGX or RPM) asserts backpressure at TL3 or TL2 node of the egress hierarchical scheduler tree depending on link level config done. If there are multiple PFC priorities enabled at a time and for all such flows to backoff, each priority will have to assert backpressure at different TL3/TL2 scheduler nodes and these flows will need to submit egress pkts to these nodes. Current PFC configuration has an issue where in only one backpressure scheduler node is being allocated which is resulting in only one PFC priority to work. This patch fixes this issue. Fixes: 99c969a83d82 ("octeontx2-pf: Add egress PFC support") Signed-off-by: Suman Ghosh <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25octeontx2-af: CN10KB: fix PFC configurationHariprasad Kelam1-8/+9
Suppose user has enabled pfc with prio 0,1 on a PF netdev(eth0) dcb pfc set dev eth0 prio-pfc o:on 1:on later user enabled pfc priorities 2 and 3 on the VF interface(eth1) dcb pfc set dev eth1 prio-pfc 2:on 3:on Instead of enabling pfc on all priorities (0..3), the driver only enables on priorities 2,3. This patch corrects the issue by using the proper CSR address. Fixes: b9d0fedc6234 ("octeontx2-af: cn10kb: Add RPM_USX MAC support") Signed-off-by: Hariprasad Kelam <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25octeontx2-pf: Fix PFC TX scheduler freeSuman Ghosh2-11/+5
During PFC TX schedulers free, flag TXSCHQ_FREE_ALL was being set which caused free up all schedulers other than the PFC schedulers. This patch fixes that to free only the PFC Tx schedulers. Fixes: 99c969a83d82 ("octeontx2-pf: Add egress PFC support") Signed-off-by: Suman Ghosh <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25Merge tag 'for-netdev' of ↵Jakub Kicinski104-4212/+3719
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-08-25 We've added 87 non-merge commits during the last 8 day(s) which contain a total of 104 files changed, 3719 insertions(+), 4212 deletions(-). The main changes are: 1) Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds, from Jiri Olsa. 2) Add support BPF cpu v4 instructions for arm64 JIT compiler, from Xu Kuohai. 3) Add support BPF cpu v4 instructions for riscv64 JIT compiler, from Pu Lehui. 4) Fix LWT BPF xmit hooks wrt their return values where propagating the result from skb_do_redirect() would trigger a use-after-free, from Yan Zhai. 5) Fix a BPF verifier issue related to bpf_kptr_xchg() with local kptr where the map's value kptr type and locally allocated obj type mismatch, from Yonghong Song. 6) Fix BPF verifier's check_func_arg_reg_off() function wrt graph root/node which bypassed reg->off == 0 enforcement, from Kumar Kartikeya Dwivedi. 7) Lift BPF verifier restriction in networking BPF programs to treat comparison of packet pointers not as a pointer leak, from Yafang Shao. 8) Remove unmaintained XDP BPF samples as they are maintained in xdp-tools repository out of tree, from Toke Høiland-Jørgensen. 9) Batch of fixes for the tracing programs from BPF samples in order to make them more libbpf-aware, from Daniel T. Lee. 10) Fix a libbpf signedness determination bug in the CO-RE relocation handling logic, from Andrii Nakryiko. 11) Extend libbpf to support CO-RE kfunc relocations. Also follow-up fixes for bpf_refcount shared ownership implementation, both from Dave Marchevsky. 12) Add a new bpf_object__unpin() API function to libbpf, from Daniel Xu. 13) Fix a memory leak in libbpf to also free btf_vmlinux when the bpf_object gets closed, from Hao Luo. 14) Small error output improvements to test_bpf module, from Helge Deller. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (87 commits) selftests/bpf: Add tests for rbtree API interaction in sleepable progs bpf: Allow bpf_spin_{lock,unlock} in sleepable progs bpf: Consider non-owning refs to refcounted nodes RCU protected bpf: Reenable bpf_refcount_acquire bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes bpf: Consider non-owning refs trusted bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire selftests/bpf: Enable cpu v4 tests for RV64 riscv, bpf: Support unconditional bswap insn riscv, bpf: Support signed div/mod insns riscv, bpf: Support 32-bit offset jmp insn riscv, bpf: Support sign-extension mov insns riscv, bpf: Support sign-extension load insns riscv, bpf: Fix missing exception handling and redundant zext for LDX_B/H/W samples/bpf: Add note to README about the XDP utilities moved to xdp-tools samples/bpf: Cleanup .gitignore samples/bpf: Remove the xdp_sample_pkts utility samples/bpf: Remove the xdp1 and xdp2 utilities samples/bpf: Remove the xdp_rxq_info utility samples/bpf: Remove the xdp_redirect* utilities ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25Merge tag 'wireless-next-2023-08-25' of ↵Jakub Kicinski150-1155/+2232
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.6 The second pull request for v6.6, this time with both stack and driver changes. Unusually we have only one major new feature but lots of small cleanup all over, I guess this is due to people have been on vacation the last month. Major changes: rtw89 - Introduce Time Averaged SAR (TAS) support * tag 'wireless-next-2023-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (114 commits) wifi: rtlwifi: rtl8723: Remove unused function rtl8723_cmd_send_packet() wifi: rtw88: usb: kill and free rx urbs on probe failure wifi: rtw89: Fix clang -Wimplicit-fallthrough in rtw89_query_sar() wifi: rtw89: phy: modify register setting of ENV_MNTR, PHYSTS and DIG wifi: rtw89: phy: add phy_gen_def::cr_base to support WiFi 7 chips wifi: rtw89: mac: define register address of rx_filter to generalize code wifi: rtw89: mac: define internal memory address for WiFi 7 chip wifi: rtw89: mac: generalize code to indirectly access WiFi internal memory wifi: rtw89: mac: add mac_gen_def::band1_offset to map MAC band1 register address wifi: wlcore: sdio: Use module_sdio_driver macro to simplify the code wifi: rtw89: initialize multi-channel handling wifi: rtw89: provide functions to configure NoA for beacon update wifi: rtw89: call rtw89_chan_get() by vif chanctx if aware of vif wifi: rtw89: sar: let caller decide the center frequency to query wifi: rtw89: refine rtw89_correct_cck_chan() by rtw89_hw_to_nl80211_band() wifi: rtw89: add function prototype for coex request duration Fix nomenclature for USB and PCI wireless devices wifi: ath: Use is_multicast_ether_addr() to check multicast Ether address wifi: ath12k: Remove unused declarations wifi: ath12k: add check max message length while scanning with extraie ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25Merge tag 'for-net-next-2023-08-24' of ↵Jakub Kicinski16-307/+818
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Introduce HCI_QUIRK_BROKEN_LE_CODED - Add support for PA/BIG sync - Add support for NXP IW624 chipset - Add support for Qualcomm WCN7850 * tag 'for-net-next-2023-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: Bluetooth: btusb: Do not call kfree_skb() under spin_lock_irqsave() Bluetooth: btusb: Fix quirks table naming Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED Bluetooth: btintel: Send new command for PPAG Bluetooth: ISO: Add support for periodic adv reports processing Bluetooth: hci_conn: fail SCO/ISO via hci_conn_failed if ACL gone early Bluetooth: hci_core: Fix missing instances using HCI_MAX_AD_LENGTH Bluetooth: ISO: Use defer setup to separate PA sync and BIG sync Bluetooth: qca: add support for WCN7850 Bluetooth: qca: use switch case for soc type behavior dt-bindings: net: bluetooth: qualcomm: document WCN7850 chipset Bluetooth: hci_conn: Fix sending BT_HCI_CMD_LE_CREATE_CONN_CANCEL Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync Bluetooth: btnxpuart: Improve inband Independent Reset handling Bluetooth: btnxpuart: Add support for IW624 chipset Bluetooth: btnxpuart: Remove check for CTS low after FW download ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-25Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds3-48/+51
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "One clk driver fix and two clk framework fixes: - Fix an OOB access when devm_get_clk_from_child() is used and devm_clk_release() casts the void pointer to the wrong type - Move clk_rate_exclusive_{get,put}() within the correct ifdefs in clk.h so that the stubs are used when CONFIG_COMMON_CLK=n - Register the proper clk provider function depending on the value of #clock-cells in the TI keystone driver" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: Fix slab-out-of-bounds error in devm_clk_release() clk: Fix undefined reference to `clk_rate_exclusive_{get,put}' clk: keystone: syscon-clk: Fix audio refclk
2023-08-25LoadPin: Annotate struct dm_verity_loadpin_trusted_root_digest with __counted_byKees Cook2-3/+2
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct dm_verity_loadpin_trusted_root_digest. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Alasdair Kergon <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: [email protected] Cc: Paul Moore <[email protected]> Cc: James Morris <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2023-08-25kallsyms: Change func signature for cleanup_symbol_name()Yonghong Song1-6/+4
All users of cleanup_symbol_name() do not use the return value. So let us change the return value of cleanup_symbol_name() to 'void' to reflect its usage pattern. Suggested-by: Nick Desaulniers <[email protected]> Signed-off-by: Yonghong Song <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Reviewed-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2023-08-25lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernelsHelge Deller1-26/+6
The gcc compiler translates on some architectures the 64-bit __builtin_clzll() function to a call to the libgcc function __clzdi2(), which should take a 64-bit parameter on 32- and 64-bit platforms. But in the current kernel code, the built-in __clzdi2() function is defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG == 32, thus the return values on 32-bit kernels are in the range from [0..31] instead of the expected [0..63] range. This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to take a 64-bit parameter on 32-bit kernels as well, thus it makes the functions identical for 32- and 64-bit kernels. This bug went unnoticed since kernel 3.11 for over 10 years, and here are some possible reasons for that: a) Some architectures have assembly instructions to count the bits and which are used instead of calling __clzdi2(), e.g. on x86 the bsr instruction and on ppc cntlz is used. On such architectures the wrong __clzdi2() implementation isn't used and as such the bug has no effect and won't be noticed. b) Some architectures link to libgcc.a, and the in-kernel weak functions get replaced by the correct 64-bit variants from libgcc.a. c) __builtin_clzll() and __clzdi2() doesn't seem to be used in many places in the kernel, and most likely only in uncritical functions, e.g. when printing hex values via seq_put_hex_ll(). The wrong return value will still print the correct number, but just in a wrong formatting (e.g. with too many leading zeroes). d) 32-bit kernels aren't used that much any longer, so they are less tested. A trivial testcase to verify if the currently running 32-bit kernel is affected by the bug is to look at the output of /proc/self/maps: Here the kernel uses a correct implementation of __clzdi2(): root@debian:~# cat /proc/self/maps 00010000-00019000 r-xp 00000000 08:05 787324 /usr/bin/cat 00019000-0001a000 rwxp 00009000 08:05 787324 /usr/bin/cat 0001a000-0003b000 rwxp 00000000 00:00 0 [heap] f7551000-f770d000 r-xp 00000000 08:05 794765 /usr/lib/hppa-linux-gnu/libc.so.6 ... and this kernel uses the broken implementation of __clzdi2(): root@debian:~# cat /proc/self/maps 0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324 /usr/bin/cat 0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324 /usr/bin/cat 000000001a000-000000003b000 rwxp 00000000 00:00 0 [heap] 00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765 /usr/lib/hppa-linux-gnu/libc.so.6 ... Signed-off-by: Helge Deller <[email protected]> Fixes: 4df87bb7b6a22 ("lib: add weak clz/ctz functions") Cc: Chanho Min <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: [email protected] # v3.11+ Signed-off-by: Linus Torvalds <[email protected]>
2023-08-25Merge branches 'pm-devfreq' and 'pm-tools'Rafael J. Wysocki12-21/+155
Merge devfreq changes and power management tools changes for 6.6-rc1: - Fix memory leak in devfreq_dev_release() (Boris Brezillon). - Rewrite devfreq_monitor_start() kerneldoc comment (Manivannan Sadhasivam). - Explicitly include correct DT includes in devfreq (Rob Herring). - Add turbo-boost support to cpupower (Wyes Karny). - Add support for amd_pstate mode change to cpupower (Wyes Karny). - Fix 'cpupower idle_set' command to accept only numeric values of arguments (Likhitha Korrapati). * pm-devfreq: PM / devfreq: Fix leak in devfreq_dev_release() PM / devfreq: Reword the kernel-doc comment for devfreq_monitor_start() API PM / devfreq: Explicitly include correct DT includes * pm-tools: cpupower: Fix cpuidle_set to accept only numeric values for idle-set operation. cpupower: Add turbo-boost support in cpupower cpupower: Add support for amd_pstate mode change cpupower: Add EPP value change support cpupower: Add is_valid_path API cpupower: Recognise amd-pstate active mode driver cpupower: Bump soname version
2023-08-25Merge branches 'pm-sleep', 'pm-qos' and 'powercap'Rafael J. Wysocki5-108/+259
Merge system-wide power management changes and power capping updates for 6.6-rc1: - Add device PM helpers to allow a device to remain powered-on during system-wide transitions (Ulf Hansson). - Rework hibernation memory snapshotting to avoid storing pages filled with zeros in hibernation image files (Brian Geffon). - Add check to make sure that CPU latency QoS constraints do not use negative values (Clive Lin). - Optimize rp->domains memory allocation in the Intel RAPL power capping driver (xiongxin). - Remove recursion while parsing zones in the arm_scmi power capping driver (Cristian Marussi). * pm-sleep: PM: sleep: Add helpers to allow a device to remain powered-on PM: hibernate: don't store zero pages in the image file * pm-qos: PM: QoS: Add check to make sure CPU latency is non-negative * powercap: powercap: intel_rapl: Optimize rp->domains memory allocation powercap: arm_scmi: Remove recursion while parsing zones
2023-08-25Merge branches 'pm-cpuidle' and 'pm-cpufreq'Rafael J. Wysocki7-151/+231
Merge CPU power management updates for 6.6-rc1: - Rework the menu and teo cpuidle governors to avoid calling tick_nohz_get_sleep_length(), which is likely to become quite expensive going forward, too often and improve making decisions regarding whether or not to stop the scheduler tick in the teo governor (Rafael Wysocki). - Improve the performance of cpufreq_stats_create_table() in some cases (Liao Chang). - Fix two issues in the amd-pstate-ut cpufreq driver (Swapnil Sapkal). - Use clamp() helper macro to improve the code readability in cpufreq_verify_within_limits() (Liao Chang). - Set stale CPU frequency to minimum in intel_pstate (Doug Smythies). * pm-cpuidle: cpuidle: teo: Avoid unnecessary variable assignments cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases cpuidle: teo: Gather statistics regarding whether or not to stop the tick cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront cpuidle: teo: Drop utilized from struct teo_cpu cpuidle: teo: Avoid stopping the tick unnecessarily when bailing out cpuidle: teo: Update idle duration estimate when choosing shallower state * pm-cpufreq: cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver cpufreq: amd-pstate-ut: Remove module parameter access cpufreq: Use clamp() helper macro to improve the code readability cpufreq: intel_pstate: set stale CPU frequency to minimum cpufreq: stats: Improve the performance of cpufreq_stats_create_table()
2023-08-25Merge branch 'pnp'Rafael J. Wysocki1-0/+3
Merge a PNP change for 6.6-rc1: - Fix string truncation warning in pnpacpi_add_device() (Sunil V L). * pnp: PNP: ACPI: Fix string truncation warning
2023-08-25Merge branch 'acpi-pm'Rafael J. Wysocki2-36/+73
Merge ACPI power management updates for 6.6-rc1: - Fix and clean up suspend-to-idle interface for AMD systems (Mario Limonciello, Andy Shevchenko). * acpi-pm: ACPI: x86: s2idle: Add a function to get LPS0 constraint for a device ACPI: x86: s2idle: Add for_each_lpi_constraint() helper ACPI: x86: s2idle: Add more debugging for AMD constraints parsing ACPI: x86: s2idle: Fix a logic error parsing AMD constraints table ACPI: x86: s2idle: Catch multiple ACPI_TYPE_PACKAGE objects ACPI: x86: s2idle: Post-increment variables when getting constraints ACPI: Adjust #ifdef for *_lps0_dev use
2023-08-25Merge branches 'acpi-scan', 'acpi-tad', 'acpi-extlog' and 'acpi-misc'Rafael J. Wysocki13-28/+58
Merge ACPI device enumeration changes, ACPI TAD and extlog drivers updates, and miscellaneous ACPI-related changes for 6.6-rc1: - Defer enumeration of devices with _DEP pointing to IVSC (Wentong Wu). - Install SystemCMOS address space handler for ACPI000E (TAD) to meet platform firmware expectations on some platforms (Zhang Rui). - Fix finding the generic error data in the ACPi extlog driver for compatibility with old and new firmware interface versions (Xiaochun Lee). - Remove assorted unused declarations of functions (Yue Haibing). - Move AMBA bus scan handling into arm64 specific directory (Sudeep Holla). * acpi-scan: ACPI: scan: Defer enumeration of devices with a _DEP pointing to IVSC device * acpi-tad: ACPI: TAD: Install SystemCMOS address space handler for ACPI000E * acpi-extlog: ACPI: extlog: Fix finding the generic error data for v3 structure * acpi-misc: ACPI: Remove assorted unused declarations of functions ACPI: Remove unused extern declaration acpi_paddr_to_node() ACPI: Move AMBA bus scan handling into arm64 specific directory
2023-08-25Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of ↵Linus Torvalds32-73/+279
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "18 hotfixes. 13 are cc:stable and the remainder pertain to post-6.4 issues or aren't considered suitable for a -stable backport" * tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: shmem: fix smaps BUG sleeping while atomic selftests: cachestat: catch failing fsync test on tmpfs selftests: cachestat: test for cachestat availability maple_tree: disable mas_wr_append() when other readers are possible madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check mm: multi-gen LRU: don't spin during memcg release mm: memory-failure: fix unexpected return value in soft_offline_page() radix tree: remove unused variable mm: add a call to flush_cache_vmap() in vmap_pfn() selftests/mm: FOLL_LONGTERM need to be updated to 0x100 nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers() mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast selftests: cgroup: fix test_kmem_basic less than error mm: enable page walking API to lock vmas during the walk smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd() mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
2023-08-25Merge branch 'acpi-thermal'Rafael J. Wysocki6-265/+239
Merge ACPI thermal driver changes for 6.6-rc1: - Drop non-functional nocrt parameter from ACPI thermal (Mario Limonciello). - Clean up the ACPI thermal driver, rework the handling of firmware notifications in it and make it provide a table of generic trip point structures to the core during initialization (Rafael Wysocki). * acpi-thermal: ACPI: thermal: Eliminate code duplication from acpi_thermal_notify() ACPI: thermal: Drop unnecessary thermal zone callbacks ACPI: thermal: Rework thermal_get_trend() ACPI: thermal: Use trip point table to register thermal zones thermal: core: Rework and rename __for_each_thermal_trip() ACPI: thermal: Introduce struct acpi_thermal_trip ACPI: thermal: Carry out trip point updates under zone lock ACPI: thermal: Clean up acpi_thermal_register_thermal_zone() thermal: core: Add priv pointer to struct thermal_trip thermal: core: Introduce thermal_zone_device_exec() thermal: core: Do not handle trip points with invalid temperature ACPI: thermal: Drop redundant local variable from acpi_thermal_resume() ACPI: thermal: Do not attach private data to ACPI handles ACPI: thermal: Drop enabled flag from struct acpi_thermal_active ACPI: thermal: Drop nocrt parameter
2023-08-25Merge branch 'acpi-processor'Rafael J. Wysocki11-188/+226
Merge ACPI processor driver changes for 6.6-rc1: - Support obtaining physical CPU ID from MADT on LoongArch (Bibo Mao). - Convert ACPI CPU initialization to using _OSC instead of _PDC that has been depreceted since 2018 and dropped from the specification in ACPI 6.5 (Michal Wilczynski, Rafael Wysocki). * acpi-processor: ACPI: processor: LoongArch: Get physical ID from MADT ACPI: processor: Refine messages in acpi_early_processor_control_setup() ACPI: processor: Remove acpi_hwp_native_thermal_lvt_osc() ACPI: processor: Use _OSC to convey OSPM processor support information ACPI: processor: Introduce acpi_processor_osc() ACPI: processor: Set CAP_SMP_T_SWCOORD in arch_acpi_set_proc_cap_bits() ACPI: processor: Clear C_C2C3_FFH and C_C1_FFH in arch_acpi_set_proc_cap_bits() ACPI: processor: Rename ACPI_PDC symbols ACPI: processor: Refactor arch_acpi_set_pdc_bits() ACPI: processor: Move processor_physically_present() to acpi_processor.c ACPI: processor: Move MWAIT quirk out of acpi_processor.c
2023-08-25Merge branches 'acpi-bus' and 'acpi-video'Rafael J. Wysocki9-39/+179
Merge changes related to the ACPI bus type and ACPI backlight driver changes for 6.6-rc1: - Introduce new wrappers for ACPICA notify handler install/remove and convert multiple drivers to using their own Notify() handlers instead of the ACPI bus type .notify() slated for removal (Michal Wilczynski). - Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2 (Hans de Goede). - Put ACPI video and its child devices explicitly into D0 on boot to avoid platform firmware confusion (Kai-Heng Feng). - Add backlight=native DMI quirk for Lenovo Ideapad Z470 (Jiri Slaby). * acpi-bus: ACPI: thermal: Install Notify() handler directly ACPI: NFIT: Remove unnecessary .remove callback ACPI: NFIT: Install Notify() handler directly ACPI: HED: Install Notify() handler directly ACPI: battery: Install Notify() handler directly ACPI: video: Install Notify() handler directly ACPI: AC: Install Notify() handler directly ACPI: bus: Set driver_data to NULL every time .add() fails ACPI: bus: Introduce wrappers for ACPICA notify handler install/remove * acpi-video: ACPI: video: Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2 ACPI: video: Put ACPI video and its child devices into D0 on boot ACPI: video: Add backlight=native DMI quirk for Lenovo Ideapad Z470
2023-08-25Merge branch 'acpica'Rafael J. Wysocki17-24/+189
Merge ACPICA material for 6.6-rc1. This includes some fixes, cleanups and new material, mostly related to parsing tables. Specifics: - Suppress a GCC 12 dangling-pointer warning (Philip Prindeville). - Reformat the ACPI_STATE_COMMON macro and its users (George Guo). - Replace the ternary operator with ACPI_MIN() (Jiangshan Yi). - Add support for _DSC as per ACPI 6.5 (Saket Dumbre). - Remove a duplicate macro from zephyr header (Najumon B.A). - Add data structures for GED and _EVT tracking (Jose Marinho). - Fix misspelled CDAT DSMAS define (Dave Jiang). - Simplify an error message in acpi_ds_result_push() (Christophe Jaillet). - Add a struct size macro related to SRAT (Dave Jiang). - Add AML_NO_OPERAND_RESOLVE flag to Timer (Abhishek Mainkar). - Add support for RISC-V external interrupt controllers in MADT (Sunil V L). - Add RHCT flags, CMO and MMU nodes (Sunil V L). - Change ACPICA version to 20230628 (Bob Moore). * acpica: ACPICA: Update version to 20230628 ACPICA: RHCT: Add flags, CMO and MMU nodes ACPICA: MADT: Add RISC-V external interrupt controllers ACPICA: Add AML_NO_OPERAND_RESOLVE flag to Timer ACPICA: Add a define for size of struct acpi_srat_generic_affinity device_handle ACPICA: Slightly simplify an error message in acpi_ds_result_push() ACPICA: Fix misspelled CDAT DSMAS define ACPICA: Add interrupt command to acpiexec ACPICA: Detect GED device and keep track of _EVT ACPICA: fix for conflict macro definition on zephyr interface ACPICA: Add support for _DSC as per ACPI 6.5 ACPICA: exserial.c: replace ternary operator with ACPI_MIN() ACPICA: Modify ACPI_STATE_COMMON ACPICA: Fix GCC 12 dangling-pointer warning
2023-08-25kallsyms: Fix kallsyms_selftest failureYonghong Song2-32/+8
Kernel test robot reported a kallsyms_test failure when clang lto is enabled (thin or full) and CONFIG_KALLSYMS_SELFTEST is also enabled. I can reproduce in my local environment with the following error message with thin lto: [ 1.877897] kallsyms_selftest: Test for 1750th symbol failed: (tsc_cs_mark_unstable) addr=ffffffff81038090 [ 1.877901] kallsyms_selftest: abort It appears that commit 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions") caused the failure. Commit 8cc32a9bbf29 changed cleanup_symbol_name() based on ".llvm." instead of '.' where ".llvm." is appended to a before-lto-optimization local symbol name. We need to propagate such knowledge in kallsyms_selftest.c as well. Further more, compare_symbol_name() in kallsyms.c needs change as well. In scripts/kallsyms.c, kallsyms_names and kallsyms_seqs_of_names are used to record symbol names themselves and index to symbol names respectively. For example: kallsyms_names: ... __amd_smn_rw._entry <== seq 1000 __amd_smn_rw._entry.5 <== seq 1001 __amd_smn_rw.llvm.<hash> <== seq 1002 ... kallsyms_seqs_of_names are sorted based on cleanup_symbol_name() through, so the order in kallsyms_seqs_of_names actually has index 1000: seq 1002 <== __amd_smn_rw.llvm.<hash> (actual symbol comparison using '__amd_smn_rw') index 1001: seq 1000 <== __amd_smn_rw._entry index 1002: seq 1001 <== __amd_smn_rw._entry.5 Let us say at a particular point, at index 1000, symbol '__amd_smn_rw.llvm.<hash>' is comparing to '__amd_smn_rw._entry' where '__amd_smn_rw._entry' is the one to search e.g., with function kallsyms_on_each_match_symbol(). The current implementation will find out '__amd_smn_rw._entry' is less than '__amd_smn_rw.llvm.<hash>' and then continue to search e.g., index 999 and never found a match although the actual index 1001 is a match. To fix this issue, let us do cleanup_symbol_name() first and then do comparison. In the above case, comparing '__amd_smn_rw' vs '__amd_smn_rw._entry' and '__amd_smn_rw._entry' being greater than '__amd_smn_rw', the next comparison will be > index 1000 and eventually index 1001 will be hit an a match is found. For any symbols not having '.llvm.' substr, there is no functionality change for compare_symbol_name(). Fixes: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Yonghong Song <[email protected]> Reviewed-by: Song Liu <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] Signed-off-by: Kees Cook <[email protected]>
2023-08-25Merge tag 'riscv-for-linus-6.5-rc8' of ↵Linus Torvalds5-75/+7
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "This is obviously not ideal, particularly for something this late in the cycle. Unfortunately we found some uABI issues in the vector support while reviewing the GDB port, which has triggered a revert -- probably a good sign we should have reviewed GDB before merging this, I guess I just dropped the ball because I was so worried about the context extension and libc suff I forgot. Hence the late revert. There's some risk here as we're still exposing the vector context for signal handlers, but changing that would have meant reverting all of the vector support. The issues we've found so far have been fixed already and they weren't absolute showstoppers, so we're essentially just playing it safe by holding ptrace support for another release (or until we get through a proper userspace code review). Summary: - The vector ucontext extension has been extended with vlenb - The vector registers ELF core dump note type has been changed to avoid aliasing with the CSR type used in embedded systems - Support for accessing vector registers via ptrace() has been reverted - Another build fix for the ISA spec changes around Zifencei/Zicsr that manifests on some systems built with binutils-2.37 and gcc-11.2" * tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix build errors using binutils2.37 toolchains RISC-V: vector: export VLENB csr in __sc_riscv_v_state RISC-V: Remove ptrace support for vectors
2023-08-25Merge branch 'bpf-refcount-followups-3-bpf_mem_free_rcu-refcounted-nodes'Alexei Starovoitov7-14/+165
Dave Marchevsky says: ==================== BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes This series is the third of three (or more) followups to address issues in the bpf_refcount shared ownership implementation discovered by Kumar. This series addresses the use-after-free scenario described in [0]. The first followup series ([1]) also attempted to address the same use-after-free, but only got rid of the splat without addressing the underlying issue. After this series the underyling issue is fixed and bpf_refcount_acquire can be re-enabled. The main fix here is migration of bpf_obj_drop to use bpf_mem_free_rcu. To understand why this fixes the issue, let us consider the example interleaving provided by Kumar in [0]: CPU 0 CPU 1 n = bpf_obj_new lock(lock1) bpf_rbtree_add(rbtree1, n) m = bpf_rbtree_acquire(n) unlock(lock1) kptr_xchg(map, m) // move to map // at this point, refcount = 2 m = kptr_xchg(map, NULL) lock(lock2) lock(lock1) bpf_rbtree_add(rbtree2, m) p = bpf_rbtree_first(rbtree1) if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A bpf_rbtree_remove(rbtree1, p) unlock(lock1) bpf_obj_drop(p) // B bpf_refcount_acquire(m) // use-after-free ... Before this series, bpf_obj_drop returns memory to the allocator using bpf_mem_free. At this point (B in the example) there might be some non-owning references to that memory which the verifier believes are valid, but where the underlying memory was reused for some other allocation. Commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for non-owning refs") attempted to fix this by doing refcount_inc_non_zero on refcount_acquire in instead of refcount_inc under the assumption that preventing erroneous incr-on-0 would be sufficient. This isn't true, though: refcount_inc_non_zero must *check* if the refcount is zero, and the memory it's checking could have been reused, so the check may look at and incr random reused bytes. If we wait to reuse this memory until all non-owning refs that could point to it are gone, there is no possibility of this scenario happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted nodes accomplishes this. For such nodes, the validity of their underlying memory is now tied to RCU critical section. This matches MEM_RCU trustedness expectations, so the series takes the opportunity to more explicitly mark this trustedness state. The functional effects of trustedness changes here are rather small. This is largely due to local kptrs having separate verifier handling - with implicit trustedness assumptions - than arbitrary kptrs. Regardless, let's take the opportunity to move towards a world where trustedness is more explicitly handled. Changelog: v1 -> v2: https://lore.kernel.org/bpf/[email protected]/ Patch 1 ("bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire") * Spent some time experimenting with a better approach as per convo w/ Yonghong on v1's patch. It started getting too complex, so left unchanged for now. Yonghong was fine with this approach being shipped. Patch 2 ("bpf: Consider non-owning refs trusted") * Add Yonghong ack Patch 3 ("bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes") * Add Yonghong ack Patch 4 ("bpf: Reenable bpf_refcount_acquire") * Add Yonghong ack Patch 5 ("bpf: Consider non-owning refs to refcounted nodes RCU protected") * Undo a nonfunctional whitespace change that shouldn't have been included (Yonghong) * Better logging message when complaining about rcu_read_{lock,unlock} in rbtree cb (Alexei) * Don't invalidate_non_owning_refs when processing bpf_rcu_read_unlock (Yonghong, Alexei) Patch 6 ("[RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS") * preempt_{disable,enable} in __bpf_spin_{lock,unlock} (Alexei) * Due to this we can consider spin_lock CS an RCU-sched read-side CS (per RCU/Design/Requirements/Requirements.rst). Modify in_rcu_cs accordingly. * no need to check for !in_rcu_cs before allowing bpf_spin_{lock,unlock} (Alexei) * RFC tag removed and renamed to "bpf: Allow bpf_spin_{lock,unlock} in sleepable progs" Patch 7 ("selftests/bpf: Add tests for rbtree API interaction in sleepable progs") * Remove "no explicit bpf_rcu_read_lock" failure test, add similar success test (Alexei) Summary of patch contents, with sub-bullets being leading questions and comments I think are worth reviewer attention: * Patches 1 and 2 are moreso documententation - and enforcement, in patch 1's case - of existing semantics / expectations * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that their underlying memory is not reused until RCU grace period elapses * Perhaps it makes sense to move to mem_free_rcu for _all_ non-owning refs in the future, not just refcounted. This might allow custom non-owning ref lifetime + invalidation logic to be entirely subsumed by MEM_RCU handling. IMO this needs a bit more thought and should be tackled outside of a fix series, so it's not attempted here. * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix the remaining use-after-free * One might expect this patch to be last in the series, or last before selftest changes. Patches 5 and 6 don't change verification or runtime behavior for existing BPF progs, though. * Patch 5 brings the verifier's understanding of refcounted node trustedness in line with Patch 4's changes * Patch 6 allows some bpf_spin_{lock, unlock} calls in sleepable progs. Marked RFC for a few reasons: * bpf_spin_{lock,unlock} haven't been usable in sleepable progs since before the introduction of bpf linked list and rbtree. As such this feels more like a new feature that may not belong in this fixes series. * Patch 7 adds tests [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/ [1]: https://lore.kernel.org/bpf/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-08-25selftests/bpf: Add tests for rbtree API interaction in sleepable progsDave Marchevsky2-0/+99
Confirm that the following sleepable prog states fail verification: * bpf_rcu_read_unlock before bpf_spin_unlock * RCU CS will last at least as long as spin_lock CS Also confirm that correct usage passes verification, specifically: * Explicit use of bpf_rcu_read_{lock, unlock} in sleepable test prog * Implied RCU CS due to spin_lock CS None of the selftest progs actually attach to bpf_testmod's bpf_testmod_test_read. Signed-off-by: Dave Marchevsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-08-25bpf: Allow bpf_spin_{lock,unlock} in sleepable progsDave Marchevsky2-6/+5
Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks") disabled bpf_spin_lock usage in sleepable progs, stating: Sleepable LSM programs can be preempted which means that allowng spin locks will need more work (disabling preemption and the verifier ensuring that no sleepable helpers are called when a spin lock is held). This patch disables preemption before grabbing bpf_spin_lock. The second requirement above "no sleepable helpers are called when a spin lock is held" is implicitly enforced by current verifier logic due to helper calls in spin_lock CS being disabled except for a few exceptions, none of which sleep. Due to above preemption changes, bpf_spin_lock CS can also be considered a RCU CS, so verifier's in_rcu_cs check is modified to account for this. Signed-off-by: Dave Marchevsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-08-25bpf: Consider non-owning refs to refcounted nodes RCU protectedDave Marchevsky2-2/+14
An earlier patch in the series ensures that the underlying memory of nodes with bpf_refcount - which can have multiple owners - is not reused until RCU grace period has elapsed. This prevents use-after-free with non-owning references that may point to recently-freed memory. While RCU read lock is held, it's safe to dereference such a non-owning ref, as by definition RCU GP couldn't have elapsed and therefore underlying memory couldn't have been reused. From the perspective of verifier "trustedness" non-owning refs to refcounted nodes are now trusted only in RCU CS and therefore should no longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them MEM_RCU in order to reflect this new state. Signed-off-by: Dave Marchevsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-08-25bpf: Reenable bpf_refcount_acquireDave Marchevsky2-4/+27
Now that all reported issues are fixed, bpf_refcount_acquire can be turned back on. Also reenable all bpf_refcount-related tests which were disabled. This a revert of: * commit f3514a5d6740 ("selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled") * commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed") Signed-off-by: Dave Marchevsky <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-08-25bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodesDave Marchevsky1-1/+5
This is the final fix for the use-after-free scenario described in commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for non-owning refs"). That commit, by virtue of changing bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed the "refcount incr on 0" splat. The not_zero check in refcount_inc_not_zero, though, still occurs on memory that could have been free'd and reused, so the commit didn't properly fix the root cause. This patch actually fixes the issue by free'ing using the recently-added bpf_mem_free_rcu, which ensures that the memory is not reused until RCU grace period has elapsed. If that has happened then there are no non-owning references alive that point to the recently-free'd memory, so it can be safely reused. Signed-off-by: Dave Marchevsky <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>