aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-07-11Merge branch 'netconsole-fix-potential-race-condition-and-improve-code-clarity'Jakub Kicinski1-3/+2
Breno Leitao says: ==================== netconsole: improve code clarity These changes aim to enhance the reliability of netconsole by eliminating the potential race condition and improve maintainability by making the code more straightforward to understand and modify. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11net: netconsole: Eliminate redundant setting of enabled fieldBreno Leitao1-2/+1
When disabling a netconsole target, enabled_store() is called with enabled=false. Currently, this results in updating the nt->enabled field twice: 1. Inside the if/else block, with the target_list_lock spinlock held 2. Later, without the target_list_lock This patch eliminates the redundancy by setting the field only once, improving efficiency and reducing potential race conditions. Signed-off-by: Breno Leitao <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11net: netconsole: Remove unnecessary cast from boolBreno Leitao1-1/+1
The 'enabled' variable is already a bool, so casting it to its value is redundant. Remove the superfluous cast, improving code clarity without changing functionality. Signed-off-by: Breno Leitao <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11selftests: openvswitch: retry instead of sleepAdrian Moreno2-8/+38
There are a couple of places where the test script "sleep"s to wait for some external condition to be met. This is error prone, specially in slow systems (identified in CI by "KSFT_MACHINE_SLOW=yes"). To fix this, add a "ovs_wait" function that tries to execute a command a few times until it succeeds. The timeout used is set to 5s for "normal" systems and doubled if a slow CI machine is detected. This should make the following work: $ vng --build \ --config tools/testing/selftests/net/config \ --config kernel/configs/debug.config $ vng --run . --user root -- "make -C tools/testing/selftests/ \ KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests" Signed-off-by: Adrian Moreno <[email protected]> Reviewed-by: Ilya Maximets <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11netdevice: define and allocate &net_device _properly_Alexander Lobakin3-31/+16
In fact, this structure contains a flexible array at the end, but historically its size, alignment etc., is calculated manually. There are several instances of the structure embedded into other structures, but also there's ongoing effort to remove them and we could in the meantime declare &net_device properly. Declare the array explicitly, use struct_size() and store the array size inside the structure, so that __counted_by() can be applied. Don't use PTR_ALIGN(), as SLUB itself tries its best to ensure the allocated buffer is aligned to what the user expects. Also, change its alignment from %NETDEV_ALIGN to the cacheline size as per several suggestions on the netdev ML. bloat-o-meter for vmlinux: free_netdev 445 440 -5 netdev_freemem 24 - -24 alloc_netdev_mqs 1481 1450 -31 On x86_64 with several NICs of different vendors, I was never able to get a &net_device pointer not aligned to the cacheline size after the change. Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11net: psample: fix flag being set in wrong skbAdrian Moreno2-6/+9
A typo makes PSAMPLE_ATTR_SAMPLE_RATE netlink flag be added to the wrong sk_buff. Fix the error and make the input sk_buff pointer "const" so that it doesn't happen again. Acked-by: Eelco Chaudron <[email protected]> Fixes: 7b1b2b60c63f ("net: psample: allow using rate as probability") Signed-off-by: Adrian Moreno <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Antoine Tenart <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-12bna: adjust 'name' buf size of bna_tcb and bna_ccb structuresAlexey Kodanev2-6/+7
To have enough space to write all possible sprintf() args. Currently 'name' size is 16, but the first '%s' specifier may already need at least 16 characters, since 'bnad->netdev->name' is used there. For '%d' specifiers, assume that they require: * 1 char for 'tx_id + tx_info->tcb[i]->id' sum, BNAD_MAX_TXQ_PER_TX is 8 * 2 chars for 'rx_id + rx_info->rx_ctrl[i].ccb->id', BNAD_MAX_RXP_PER_RX is 16 And replace sprintf with snprintf. Detected using the static analysis tool - Svace. Fixes: 8b230ed8ec96 ("bna: Brocade 10Gb Ethernet device driver") Signed-off-by: Alexey Kodanev <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-11i40e: fix: remove needless retries of NVM updateAleksandr Loktionov1-4/+0
Remove wrong EIO to EGAIN conversion and pass all errors as is. After commit 230f3d53a547 ("i40e: remove i40e_status"), which should only replace F/W specific error codes with Linux kernel generic, all EIO errors suddenly started to be converted into EAGAIN which leads nvmupdate to retry until it timeouts and sometimes fails after more than 20 minutes in the middle of NVM update, so NVM becomes corrupted. The bug affects users only at the time when they try to update NVM, and only F/W versions that generate errors while nvmupdate. For example, X710DA2 with 0x8000ECB7 F/W is affected, but there are probably more... Command for reproduction is just NVM update: ./nvmupdate64 In the log instead of: i40e_nvmupd_exec_aq err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_ENOMEM) appears: i40e_nvmupd_exec_aq err -EIO aq_err I40E_AQ_RC_ENOMEM i40e: eeprom check failed (-5), Tx/Rx traffic disabled The problematic code did silently convert EIO into EAGAIN which forced nvmupdate to ignore EAGAIN error and retry the same operation until timeout. That's why NVM update takes 20+ minutes to finish with the fail in the end. Fixes: 230f3d53a547 ("i40e: remove i40e_status") Co-developed-by: Kelvin Kang <[email protected]> Signed-off-by: Kelvin Kang <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Signed-off-by: Aleksandr Loktionov <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11Merge tag 'wireless-next-2024-07-11' of ↵Jakub Kicinski162-3638/+16629
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.11 Most likely the last "new features" pull request for v6.11 with changes both in stack and in drivers. The big thing is the multiple radios for wiphy feature which makes it possible to better advertise radio capabilities to user space. mt76 enabled MLO and iwlwifi re-enabled MLO, ath12k and rtw89 Wi-Fi 6 devices got WoWLAN support. Major changes: cfg80211/mac80211 * remove DEAUTH_NEED_MGD_TX_PREP flag * multiple radios per wiphy support mac80211_hwsim * multi-radio wiphy support ath12k * DebugFS support for datapath statistics * WCN7850: support for WoW (Wake on WLAN) * WCN7850: device-tree bindings ath11k * QCA6390: device-tree bindings iwlwifi * mvm: re-enable Multi-Link Operation (MLO) * aggregation (A-MSDU) optimisations rtw89 * preparation for RTL8852BE-VT support * WoWLAN support for WiFi 6 chips * 36-bit PCI DMA support mt76 * mt7925 Multi-Link Operation (MLO) support * tag 'wireless-next-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (204 commits) wifi: mac80211: fix AP chandef capturing in CSA wifi: iwlwifi: correctly reference TSO page information wifi: mt76: mt792x: fix scheduler interference in drv own process wifi: mt76: mt7925: enabling MLO when the firmware supports it wifi: mt76: mt7925: remove the unused mt7925_mcu_set_chan_info wifi: mt76: mt7925: update mt7925_mac_link_bss_add for MLO wifi: mt76: mt7925: update mt7925_mcu_bss_basic_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_set_timing for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_phy_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_rate_ctrl_tlv for MLO wifi: mt76: mt7925: add mt7925_mcu_sta_eht_mld_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_update for MLO wifi: mt76: mt7925: update mt7925_mcu_add_bss_info for MLO wifi: mt76: mt7925: update mt7925_mcu_bss_mld_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_mld_tlv for MLO wifi: mt76: mt7925: add mt7925_[assign,unassign]_vif_chanctx wifi: mt76: add def_wcid to struct mt76_wcid wifi: mt76: mt7925: report link information in rx status wifi: mt76: mt7925: update rate index according to link id wifi: mt76: mt7925: add link handling in the mt7925_ipv6_addr_change ... ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11net: ethtool: Fix RSS settingSaeed Mahameed1-1/+2
When user submits a rxfh set command without touching XFRM_SYM_XOR, rxfh.input_xfrm is set to RXH_XFRM_NO_CHANGE, which is equal to 0xff. Testing if (rxfh.input_xfrm & RXH_XFRM_SYM_XOR && !ops->cap_rss_sym_xor_supported) return -EOPNOTSUPP; Will always be true on devices that don't set cap_rss_sym_xor_supported, since rxfh.input_xfrm & RXH_XFRM_SYM_XOR is always true, if input_xfrm was not set, i.e RXH_XFRM_NO_CHANGE=0xff, which will result in failure of any command that doesn't require any change of XFRM, e.g RSS context or hash function changes. To avoid this breakage, test if rxfh.input_xfrm != RXH_XFRM_NO_CHANGE before testing other conditions. Note that the problem will only trigger with XFRM-aware userspace, old ethtool CLI would continue to work. Fixes: 0dd415d15505 ("net: ethtool: add a NO_CHANGE uAPI for new RXFH's input_xfrm") Signed-off-by: Saeed Mahameed <[email protected]> Reviewed-by: Ahmed Zaki <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11net: reduce rtnetlink_rcv_msg() stack usageEric Dumazet1-6/+12
IFLA_MAX is increasing slowly but surely. Some compilers use more than 512 bytes of stack in rtnetlink_rcv_msg() because it calls rtnl_calcit() for RTM_GETLINK message. Use noinline_for_stack attribute to not inline rtnl_calcit(), and directly use nla_for_each_attr_type() (Jakub suggestion) because we only care about IFLA_EXT_MASK at this stage. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11net/sched: act_skbmod: convert comma to semicolonChen Ni1-1/+1
Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11bcachefs: bch2_gc_btree() should not use btree_root_lockKent Overstreet1-8/+22
btree_root_lock is for the root keys in btree_root, not the pointers to the nodes themselves; this fixes a lock ordering issue between btree_root_lock and btree node locks. Signed-off-by: Kent Overstreet <[email protected]>
2024-07-11bcachefs: Set PF_MEMALLOC_NOFS when trans->lockedKent Overstreet4-8/+30
proper lock ordering is: fs_reclaim -> btree node locks Signed-off-by: Kent Overstreet <[email protected]>
2024-07-11bcachefs; Use trans_unlock_long() when waiting on allocatorKent Overstreet1-1/+1
not using unlock_long() blocks key cache reclaim, and the allocator may take awhile Signed-off-by: Kent Overstreet <[email protected]>
2024-07-11Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT"Kent Overstreet1-2/+1
This reverts commit 86d81ec5f5f05846c7c6e48ffb964b24cba2e669. This wasn't tested with memcg enabled, it immediately hits a null ptr deref in list_lru_add(). Signed-off-by: Kent Overstreet <[email protected]>
2024-07-11Merge tag 'for-6.10/dm-fixes-2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fix from Mikulas Patocka: - Fix broken discard for device mapper VDO target * tag 'for-6.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm vdo: replace max_discard_sectors with max_hw_discard_sectors
2024-07-11Merge branch ↵Jakub Kicinski4-19/+44
'ethtool-use-the-rss-context-xarray-in-ring-deactivation-safety-check' Jakub Kicinski says: ==================== ethtool: use the rss context XArray in ring deactivation safety-check Now that we have an XArray storing information about all extra RSS contexts - use it to extend checks already performed using ethtool_get_max_rxfh_channel(). ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11ethtool: use the rss context XArray in ring deactivation safety-checkJakub Kicinski1-4/+29
ethtool_get_max_rxfh_channel() gets called when user requests deactivating Rx channels. Check the additional RSS contexts, too. While we do track whether RSS context has an indirection table explicitly set by the user, no driver looks at that bit. Assume drivers won't auto-regenerate the additional tables, to be safe. Reviewed-by: Jacob Keller <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11ethtool: fail closed if we can't get max channel used in indirection tablesJakub Kicinski4-19/+19
Commit 0d1b7d6c9274 ("bnxt: fix crashes when reducing ring count with active RSS contexts") proves that allowing indirection table to contain channels with out of bounds IDs may lead to crashes. Currently the max channel check in the core gets skipped if driver can't fetch the indirection table or when we can't allocate memory. Both of those conditions should be extremely rare but if they do happen we should try to be safe and fail the channel change. Reviewed-by: Jacob Keller <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11bpf: annotate BTF show functions with __printfAlan Maguire1-4/+4
-Werror=suggest-attribute=format warns about two functions in kernel/bpf/btf.c [1]; add __printf() annotations to silence these warnings since for CONFIG_WERROR=y they will trigger build failures. [1] https://lore.kernel.org/bpf/[email protected]/ Fixes: 31d0bc81637d ("bpf: Move to generic BTF show support, apply it to seq files/strings") Reported-by: Mirsad Todorovac <[email protected]> Signed-off-by: Alan Maguire <[email protected]> Tested-by: Mirsad Todorovac <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski195-1334/+2210
Cross-merge networking fixes after downstream PR. Conflicts: net/sched/act_ct.c 26488172b029 ("net/sched: Fix UAF when resolving a clash") 3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine") No adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11dm vdo: replace max_discard_sectors with max_hw_discard_sectorsBruce Johnston1-1/+1
Commit 4f563a64732d ("block: add a max_user_discard_sectors queue limit") changed block core to set max_discard_sectors to: min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors) Commit 825d8bbd2f32 ("dm: always manage discard support in terms of max_hw_discard_sectors") fixed most dm targetss to deal with this, by replacing max_discard_sectors with max_hw_discard_sectors. Unfortunately, dm-vdo did not get fixed at that time. Fixes: 825d8bbd2f32 ("dm: always manage discard support in terms of max_hw_discard_sectors") Signed-off-by: Bruce Johnston <[email protected]> Signed-off-by: Matthew Sakai <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]>
2024-07-11Merge tag 'spi-fix-v6.10-rc7' of ↵Linus Torvalds7-26/+49
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "This fixes two regressions that have been bubbling along for a large part of this release. One is a revert of the multi mode support for the OMAP SPI controller, this introduced regressions on a number of systems and while there has been progress on fixing those we've not got something that works for everyone yet so let's just drop the change for now. The other is a series of fixes from David Lechner for his recent message optimisation work, this interacted badly with spi-mux which is altogether too clever with recursive use of the bus and creates situations that hadn't been considered. There are also a couple of small driver specific fixes, including one more patch from David for sleep duration calculations in the AXI driver" * tag 'spi-fix-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: mux: set ctlr->bits_per_word_mask spi: add defer_optimize_message controller flag spi: don't unoptimize message in spi_async() spi: omap2-mcspi: Revert multi mode support spi: davinci: Unset POWERDOWN bit when releasing resources spi: axi-spi-engine: fix sleep calculation spi: imx: Don't expect DMA for i.MX{25,35,50,51,53} cspi devices
2024-07-11igc: Remove the internal 'eee_advert' fieldSasha Neftin3-10/+0
Since the kernel's 'ethtool_keee' structure is in use, the internal 'eee_advert' field becomes pointless and can be removed. This patch comes to clean up this redundant code. Signed-off-by: Sasha Neftin <[email protected]> Tested-by: Mor Bar-Gabay <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11ice: remove eswitch rebuildMichal Swiatkowski3-24/+0
Since the port representors are added one by one there is no need to do eswitch rebuild. Each port representor is detached and attached in VF reset path. Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11ice: Add support for devlink local_forwarding paramPawel Kaminski5-1/+166
Add support for driver-specific devlink local_forwarding param. Supported values are "enabled", "disabled" and "prioritized". Default configuration is set to "enabled". Add documentation in networking/devlink/ice.rst. In previous generations of Intel NICs the transmit scheduler was only limited by PCIe bandwidth when scheduling/assigning hairpin-bandwidth between VFs. Changes to E810 HW design introduced scheduler limitation, so that available hairpin-bandwidth is bound to external port speed. In order to address this limitation and enable NFV services such as "service chaining" a knob to adjust the scheduler config was created. Driver can send a configuration message to the FW over admin queue and internal FW logic will reconfigure HW to prioritize and add more BW to VF to VF traffic. An end result, for example, 10G port will no longer limit hairpin-bandwidth to 10G and much higher speeds can be achieved. Devlink local_forwarding param set to "prioritized" enables higher hairpin-bandwitdh on related PFs. Configuration is applicable only to 8x10G and 4x25G cards. Changing local_forwarding configuration will trigger CORER reset in order to take effect. Example command to change current value: devlink dev param set pci/0000:b2:00.3 name local_forwarding \ value prioritized \ cmode runtime Co-developed-by: Michal Wilczynski <[email protected]> Signed-off-by: Michal Wilczynski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Pawel Kaminski <[email protected]> Signed-off-by: Wojciech Drewek <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11i40e: correct i40e_addr_to_hkey() name in kdocSimon Horman1-1/+1
Correct name of i40e_addr_to_hkey() in it's kdoc. kernel-doc -none reports: drivers/net/ethernet/intel/i40e/i40e.h:739: warning: expecting prototype for i40e_mac_to_hkey(). Prototype was for i40e_addr_to_hkey() instead Signed-off-by: Simon Horman <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11net: intel: Remove MODULE_AUTHORsTony Nguyen14-14/+0
We are moving away from the Sourceforge email address. Rather than removing or updating the email for the affected entries, remove the MODULE_AUTHOR altogether as its usage is incorrect [1]. Link: https://lore.kernel.org/netdev/20200626115236.7f36d379@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ [1] Reviewed-by: Jesse Brandeburg <[email protected]> Acked-by: Alexander Lobakin <[email protected]> # libeth, libie Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11net: pse-pd: pd692x0: Fix spelling mistake "availables" -> "available"Colin Ian King1-1/+1
There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Acked-by: Kory Maincent <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-11ice: Add tracepoint for adding and removing switch rulesMarcin Szycik4-2/+43
Track the number of rules and recipes added to switch. Add a tracepoint to ice_aq_sw_rules(), which shows both rule and recipe count. This information can be helpful when designing a set of rules to program to the hardware, as it shows where the practical limit is. Actual limits are known (64 recipes, 32k rules), but it's hard to translate these values to how many rules the *user* can actually create, because of extra metadata being implicitly added, and recipe/rule chaining. Chaining combines several recipes/rules to create a larger recipe/rule, so one large rule added by the user might actually consume multiple rules from hardware perspective. Rule counter is simply incremented/decremented in ice_aq_sw_rules(), since all rules are added or removed via it. Counting recipes is harder, as recipes can't be removed (only overwritten). Recipes added via ice_aq_add_recipe() could end up being unused, when there is an error in later stages of rule creation. Instead, track the allocation and freeing of recipes, which should reflect the actual usage of recipes (if something fails after recipe(s) were created, caller should free them). Also, a number of recipes are loaded from NVM by default - initialize the recipe counter with the number of these recipes on switch initialization. Example configuration: cd /sys/kernel/tracing echo function > current_tracer echo ice_aq_sw_rules > set_ftrace_filter echo ice_aq_sw_rules > set_event echo 1 > tracing_on cat trace Example output: tc-4097 [069] ...1. 787.595536: ice_aq_sw_rules <-ice_rem_adv_rule tc-4097 [069] ..... 787.595705: ice_aq_sw_rules: rules=9 recipes=15 tc-4098 [057] ...1. 787.652033: ice_aq_sw_rules <-ice_add_adv_rule tc-4098 [057] ..... 787.652201: ice_aq_sw_rules: rules=10 recipes=16 Reviewed-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11ice: Remove unused members from switch APIMarcin Szycik4-81/+10
Remove several members of struct ice_sw_recipe and struct ice_prot_lkup_ext. Remove struct ice_recp_grp_entry and struct ice_pref_recipe_group, since they are now unused as well. All of the deleted members were only written to and never read, so it's pointless to keep them. Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11ice: Optimize switch recipe creationMarcin Szycik3-337/+212
Currently when creating switch recipes, switch ID is always added as the first word in every recipe. There are only 5 words in a recipe, so one word is always wasted. This is also true for the last recipe, which stores result indexes (in case of chain recipes). Therefore the maximum usable length of a chain recipe is 4 * 4 = 16 words. 4 words in a recipe, 4 recipes that can be chained (using a 5th one for result indexes). Current max size chained recipe: 0: smmmm 1: smmmm 2: smmmm 3: smmmm 4: srrrr Where: s - switch ID m - regular match (e.g. ipv4 src addr, udp dst port, etc.) r - result index Switch ID does not actually need to be present in every recipe, only in one of them (in case of chained recipe). This frees up to 8 extra words: 3 from recipes in the middle (because first recipe still needs to have switch ID), and 5 from one extra recipe (because now the last recipe also does not have switch ID, so it can chain 1 more recipe). Max size chained recipe after changes: 0: smmmm 1: Mmmmm 2: Mmmmm 3: Mmmmm 4: MMMMM 5: Rrrrr Extra usable words available after this change are highlighted with capital letters. Changing how switch ID is added is not straightforward, because it's not a regular lookup. Its FV index and mask can't be determined based on protocol + offset pair read from package and instead need to be added manually. Additionally, change how result indexes are added. Currently they are always inserted in a new recipe at the end. Example for 13 words, (with above optimization, switch ID being one of the words): 0: smmmm 1: mmmmm 2: mmmxx 3: rrrxx Where: x - unused word In this and some other cases, the result indexes can be moved just after last matches because there are unused words, saving one recipe. Example for 13 words after both optimizations: 0: smmmm 1: mmmmm 2: mmmrr Note how one less result index is needed in this case, because the last recipe does not need to "link" to itself. There are cases when adding an additional recipe for result indexes cannot be avoided. In that cases result indexes are all put in the last recipe. Example for 14 words after both optimizations: 0: smmmm 1: mmmmm 2: mmmmx 3: rrrxx With these two changes, recipes/rules are more space efficient, allowing more to be created in total. Co-developed-by: Michal Swiatkowski <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11Merge tag 'net-6.10-rc8' of ↵Linus Torvalds37-255/+561
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - core: fix rc7's __skb_datagram_iter() regression Current release - new code bugs: - eth: bnxt: fix crashes when reducing ring count with active RSS contexts Previous releases - regressions: - sched: fix UAF when resolving a clash - skmsg: skip zero length skb in sk_msg_recvmsg2 - sunrpc: fix kernel free on connection failure in xs_tcp_setup_socket - tcp: avoid too many retransmit packets - tcp: fix incorrect undo caused by DSACK of TLP retransmit - udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). - eth: ks8851: fix deadlock with the SPI chip variant - eth: i40e: fix XDP program unloading while removing the driver Previous releases - always broken: - bpf: - fix too early release of tcx_entry - fail bpf_timer_cancel when callback is being cancelled - bpf: fix order of args in call to bpf_map_kvcalloc - netfilter: nf_tables: prefer nft_chain_validate - ppp: reject claimed-as-LCP but actually malformed packets - wireguard: avoid unaligned 64-bit memory accesses" * tag 'net-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits) net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket net/sched: Fix UAF when resolving a clash net: ks8851: Fix potential TX stall after interface reopen udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). netfilter: nf_tables: prefer nft_chain_validate netfilter: nfnetlink_queue: drop bogus WARN_ON ethtool: netlink: do not return SQI value if link is down ppp: reject claimed-as-LCP but actually malformed packets selftests/bpf: Add timer lockup selftest net: ethernet: mtk-star-emac: set mac_managed_pm when probing e1000e: fix force smbus during suspend flow tcp: avoid too many retransmit packets bpf: Defer work in bpf_timer_cancel_and_free bpf: Fail bpf_timer_cancel when callback is being cancelled bpf: fix order of args in call to bpf_map_kvcalloc net: ethernet: lantiq_etop: fix double free in detach i40e: Fix XDP program unloading while removing the driver net: fix rc7's __skb_datagram_iter() net: ks8851: Fix deadlock with the SPI chip variant octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability() ...
2024-07-11Merge tag 'vfs-6.10-rc8.fixes' of ↵Linus Torvalds27-133/+213
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "cachefiles: - Export an existing and add a new cachefile helper to be used in filesystems to fix reference count bugs - Use the newly added fscache_ty_get_volume() helper to get a reference count on an fscache_volume to handle volumes that are about to be removed cleanly - After withdrawing a fscache_cache via FSCACHE_CACHE_IS_WITHDRAWN wait for all ongoing cookie lookups to complete and for the object count to reach zero - Propagate errors from vfs_getxattr() to avoid an infinite loop in cachefiles_check_volume_xattr() because it keeps seeing ESTALE - Don't send new requests when an object is dropped by raising CACHEFILES_ONDEMAND_OJBSTATE_DROPPING - Cancel all requests for an object that is about to be dropped - Wait for the ondemand_boject_worker to finish before dropping a cachefiles object to prevent use-after-free - Use cyclic allocation for message ids to better handle id recycling - Add missing lock protection when iterating through the xarray when polling netfs: - Use standard logging helpers for debug logging VFS: - Fix potential use-after-free in file locks during trace_posix_lock_inode(). The tracepoint could fire while another task raced it and freed the lock that was requested to be traced - Only increment the nr_dentry_negative counter for dentries that are present on the superblock LRU. Currently, DCACHE_LRU_LIST list is used to detect this case. However, the flag is also raised in combination with DCACHE_SHRINK_LIST to indicate that dentry->d_lru is used. So checking only DCACHE_LRU_LIST will lead to wrong nr_dentry_negative count. Fix the check to not count dentries that are on a shrink related list Misc: - hfsplus: fix an uninitialized value issue in copy_name - minix: fix minixfs_rename with HIGHMEM. It still uses kunmap() even though we switched it to kmap_local_page() a while ago" * tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: minixfs: Fix minixfs_rename with HIGHMEM hfsplus: fix uninit-value in copy_name vfs: don't mod negative dentry count when on shrinker list filelock: fix potential use-after-free in posix_lock_inode cachefiles: add missing lock protection when polling cachefiles: cyclic allocation of msg_id to avoid reuse cachefiles: wait for ondemand_object_worker to finish when dropping object cachefiles: cancel all requests for the object that is being dropped cachefiles: stop sending new request when dropping object cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie() cachefiles: fix slab-use-after-free in fscache_withdraw_volume() netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume() netfs: Switch debug logging to pr_debug()
2024-07-11ice: remove unused recipe bookkeeping dataMichal Swiatkowski3-43/+15
Remove root_buf from recipe struct. Its only usage was in ice_find_recp(), where if recipe had an inverse action, it was skipped, but actually the driver never adds inverse actions, so effectively it was pointless. Without root_buf, the recipe data element in ice_add_sw_recipe() does not need to be persistent and can also be automatically deallocated with __free, which nicely simplifies unroll. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIGPuranjay Mohan1-2/+2
When BPF_TRAMP_F_CALL_ORIG is set, the trampoline calls __bpf_tramp_enter() and __bpf_tramp_exit() functions, passing them the struct bpf_tramp_image *im pointer as an argument in R0. The trampoline generation code uses emit_addr_mov_i64() to emit instructions for moving the bpf_tramp_image address into R0, but emit_addr_mov_i64() assumes the address to be in the vmalloc() space and uses only 48 bits. Because bpf_tramp_image is allocated using kzalloc(), its address can use more than 48-bits, in this case the trampoline will pass an invalid address to __bpf_tramp_enter/exit() causing a kernel crash. Fix this by using emit_a64_mov_i64() in place of emit_addr_mov_i64() as it can work with addresses that are greater than 48-bits. Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64") Signed-off-by: Puranjay Mohan <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Closes: https://lore.kernel.org/all/SJ0PR15MB461564D3F7E7A763498CA6A8CBDB2@SJ0PR15MB4615.namprd15.prod.outlook.com/ Link: https://lore.kernel.org/bpf/[email protected]
2024-07-11ice: Simplify bitmap setting in adding recipeMichal Swiatkowski1-15/+9
Remove unnecessary size checks when copying bitmaps in ice_add_sw_recipe() and replace them with compile time assert. Check if the bitmaps are equal size, as they are copied both ways. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Co-developed-by: Marcin Szycik <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11ice: Remove reading all recipes before adding a new oneMichal Swiatkowski1-27/+2
The content of the first read recipe is used as a template when adding a recipe. It isn't needed - only prune index is directly set from there. Set it in the code instead. Also, now there's no need to set rid and lookup indexes to 0, as the whole recipe buffer is initialized to 0. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11mmc: davinci_mmc: Prevent transmitted data size from exceeding sgm's lengthBastien Curutchet1-0/+3
No check is done on the size of the data to be transmiited. This causes a kernel panic when this size exceeds the sg_miter's length. Limit the number of transmitted bytes to sgm->length. Cc: [email protected] Fixes: ed01d210fd91 ("mmc: davinci_mmc: Use sg_miter for PIO") Signed-off-by: Bastien Curutchet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]>
2024-07-11mmc: sdhci: Fix max_seg_size for 64KiB PAGE_SIZEAdrian Hunter1-0/+15
blk_queue_max_segment_size() ensured: if (max_size < PAGE_SIZE) max_size = PAGE_SIZE; whereas: blk_validate_limits() makes it an error: if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE)) return -EINVAL; The change from one to the other, exposed sdhci which was setting maximum segment size too low in some circumstances. Fix the maximum segment size when it is too low. Fixes: 616f87661792 ("mmc: pass queue_limits to blk_mq_alloc_disk") Cc: [email protected] Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Jon Hunter <[email protected]> Tested-by: Jon Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]>
2024-07-11ice: Remove unused struct ice_prot_lkup_ext membersMarcin Szycik2-29/+19
Remove field_off as it's never used. Remove done bitmap, as its value is only checked and never assigned. Reusing sub-recipes while creating new root recipes is currently not supported in the driver. Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-11Merge tag 'asoc-fix-v6.10-rc7' of ↵Takashi Iwai4-71/+176
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.10 A few fairly small fixes for ASoC, there's a relatively large set of hardening changes for the cs_dsp firmware file parsing and a couple of other small device specific fixes.
2024-07-11btrfs: avoid races when tracking progress for extent map shrinkingFilipe Manana4-29/+76
We store the progress (root and inode numbers) of the extent map shrinker in fs_info without any synchronization but we can have multiple tasks calling into the shrinker during memory allocations when there's enough memory pressure for example. This can result in a task A reading fs_info->extent_map_shrinker_last_ino after another task B updates it, and task A reading fs_info->extent_map_shrinker_last_root before task B updates it, making task A see an odd state that isn't necessarily harmful but may make it skip certain inode ranges or do more work than necessary by going over the same inodes again. These unprotected accesses would also trigger warnings from tools like KCSAN. So add a lock to protect access to these progress fields. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-07-11btrfs: stop extent map shrinker if reschedule is neededFilipe Manana1-8/+31
The extent map shrinker can be called in a variety of contexts where we are under memory pressure, and of them is when a task is trying to allocate memory. For this reason the shrinker is typically called with a value of struct shrink_control::nr_to_scan that is much smaller than what we return in the nr_cached_objects callback of struct super_operations (fs/btrfs/super.c:btrfs_nr_cached_objects()), so that the shrinker does not take a long time and cause high latencies. However we can still take a lot of time in the shrinker even for a limited amount of nr_to_scan: 1) When traversing the red black tree that tracks open inodes in a root, as for example with millions of open inodes we get a deep tree which takes time searching for an inode; 2) Iterating over the extent map tree, which is a red black tree, of an inode when doing the rb_next() calls and when removing an extent map from the tree, since often that requires rebalancing the red black tree; 3) When trying to write lock an inode's extent map tree we may wait for a significant amount of time, because there's either another task about to do IO and searching for an extent map in the tree or inserting an extent map in the tree, and we can have thousands or even millions of extent maps for an inode. Furthermore, there can be concurrent calls to the shrinker so the lock might be busy simply because there is already another task shrinking extent maps for the same inode; 4) We often reschedule if we need to, which further increases latency. So improve on this by stopping the extent map shrinking code whenever we need to reschedule and make it skip an inode if we can't immediately lock its extent map tree. Reported-by: Mikhail Gavrilov <[email protected]> Reported-by: Andrea Gelmini <[email protected]> Link: https://lore.kernel.org/linux-btrfs/CABXGCsMmmb36ym8hVNGTiU8yfUS_cGvoUmGCcBrGWq9OxTrs+A@mail.gmail.com/ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-07-11btrfs: use delayed iput during extent map shrinkingFilipe Manana1-1/+1
When putting an inode during extent map shrinking we're doing a standard iput() but that may take a long time in case the inode is dirty and we are doing the final iput that triggers eviction - the VFS will have to wait for writeback before calling the btrfs evict callback (see fs/inode.c:evict()). This slows down the task running the shrinker which may have been triggered while updating some tree for example, meaning locks are held as well as an open transaction handle. Also if the iput() ends up triggering eviction and the inode has no links anymore, then we trigger item truncation which requires flushing delayed items, space reservation to start a transaction and that may trigger the space reclaim task and wait for it, resulting in deadlocks in case the reclaim task needs for example to commit a transaction and the shrinker is being triggered from a path holding a transaction handle. Syzbot reported such a case with the following stack traces: ====================================================== WARNING: possible circular locking dependency detected 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Not tainted ------------------------------------------------------ kswapd0/111 is trying to acquire lock: ffff88801eae4610 (sb_internal#3){.+.+}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275 but task is already holding lock: ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (fs_reclaim){+.+.}-{0:0}: __fs_reclaim_acquire mm/page_alloc.c:3783 [inline] fs_reclaim_acquire+0x102/0x160 mm/page_alloc.c:3797 might_alloc include/linux/sched/mm.h:334 [inline] slab_pre_alloc_hook mm/slub.c:3890 [inline] slab_alloc_node mm/slub.c:3980 [inline] kmem_cache_alloc_lru_noprof+0x58/0x2f0 mm/slub.c:4019 btrfs_alloc_inode+0x118/0xb20 fs/btrfs/inode.c:8411 alloc_inode+0x5d/0x230 fs/inode.c:261 iget5_locked fs/inode.c:1235 [inline] iget5_locked+0x1c9/0x2c0 fs/inode.c:1228 btrfs_iget_locked fs/btrfs/inode.c:5590 [inline] btrfs_iget_path fs/btrfs/inode.c:5607 [inline] btrfs_iget+0xfb/0x230 fs/btrfs/inode.c:5636 create_reloc_inode+0x403/0x820 fs/btrfs/relocation.c:3911 btrfs_relocate_block_group+0x471/0xe60 fs/btrfs/relocation.c:4114 btrfs_relocate_chunk+0x143/0x450 fs/btrfs/volumes.c:3373 __btrfs_balance fs/btrfs/volumes.c:4157 [inline] btrfs_balance+0x211a/0x3f00 fs/btrfs/volumes.c:4534 btrfs_ioctl_balance fs/btrfs/ioctl.c:3675 [inline] btrfs_ioctl+0x12ed/0x8290 fs/btrfs/ioctl.c:4742 __do_compat_sys_ioctl+0x2c3/0x330 fs/ioctl.c:1007 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386 do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e -> #2 (btrfs_trans_num_extwriters){++++}-{0:0}: join_transaction+0x164/0xf40 fs/btrfs/transaction.c:315 start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700 btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323 btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999 open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554 btrfs_fill_super fs/btrfs/super.c:946 [inline] btrfs_get_tree_super fs/btrfs/super.c:1863 [inline] btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089 vfs_get_tree+0x8f/0x380 fs/super.c:1780 fc_mount+0x16/0xc0 fs/namespace.c:1125 btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline] btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090 vfs_get_tree+0x8f/0x380 fs/super.c:1780 do_new_mount fs/namespace.c:3352 [inline] path_mount+0x6e1/0x1f10 fs/namespace.c:3679 do_mount fs/namespace.c:3692 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount fs/namespace.c:3875 [inline] __ia32_sys_mount+0x295/0x320 fs/namespace.c:3875 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386 do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e -> #1 (btrfs_trans_num_writers){++++}-{0:0}: join_transaction+0x148/0xf40 fs/btrfs/transaction.c:314 start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700 btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323 btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999 open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554 btrfs_fill_super fs/btrfs/super.c:946 [inline] btrfs_get_tree_super fs/btrfs/super.c:1863 [inline] btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089 vfs_get_tree+0x8f/0x380 fs/super.c:1780 fc_mount+0x16/0xc0 fs/namespace.c:1125 btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline] btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090 vfs_get_tree+0x8f/0x380 fs/super.c:1780 do_new_mount fs/namespace.c:3352 [inline] path_mount+0x6e1/0x1f10 fs/namespace.c:3679 do_mount fs/namespace.c:3692 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount fs/namespace.c:3875 [inline] __ia32_sys_mount+0x295/0x320 fs/namespace.c:3875 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386 do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e -> #0 (sb_internal#3){.+.+}-{0:0}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain kernel/locking/lockdep.c:3869 [inline] __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137 lock_acquire kernel/locking/lockdep.c:5754 [inline] lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write include/linux/fs.h:1655 [inline] sb_start_intwrite include/linux/fs.h:1838 [inline] start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694 btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275 btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291 evict+0x2ed/0x6c0 fs/inode.c:667 iput_final fs/inode.c:1741 [inline] iput.part.0+0x5a8/0x7f0 fs/inode.c:1767 iput+0x5c/0x80 fs/inode.c:1757 btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline] btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189 super_cache_scan+0x409/0x550 fs/super.c:227 do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435 shrink_slab+0x18a/0x1310 mm/shrinker.c:662 shrink_one+0x493/0x7c0 mm/vmscan.c:4790 shrink_many mm/vmscan.c:4851 [inline] lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951 shrink_node mm/vmscan.c:5910 [inline] kswapd_shrink_node mm/vmscan.c:6720 [inline] balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911 kswapd+0x5ea/0xbf0 mm/vmscan.c:7180 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Chain exists of: sb_internal#3 --> btrfs_trans_num_extwriters --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(btrfs_trans_num_extwriters); lock(fs_reclaim); rlock(sb_internal#3); *** DEADLOCK *** 2 locks held by kswapd0/111: #0: ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924 #1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_trylock_shared fs/super.c:562 [inline] #1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_cache_scan+0x96/0x550 fs/super.c:196 stack backtrace: CPU: 0 PID: 111 Comm: kswapd0 Not tainted 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114 check_noncircular+0x31a/0x400 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain kernel/locking/lockdep.c:3869 [inline] __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137 lock_acquire kernel/locking/lockdep.c:5754 [inline] lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write include/linux/fs.h:1655 [inline] sb_start_intwrite include/linux/fs.h:1838 [inline] start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694 btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275 btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291 evict+0x2ed/0x6c0 fs/inode.c:667 iput_final fs/inode.c:1741 [inline] iput.part.0+0x5a8/0x7f0 fs/inode.c:1767 iput+0x5c/0x80 fs/inode.c:1757 btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline] btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189 super_cache_scan+0x409/0x550 fs/super.c:227 do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435 shrink_slab+0x18a/0x1310 mm/shrinker.c:662 shrink_one+0x493/0x7c0 mm/vmscan.c:4790 shrink_many mm/vmscan.c:4851 [inline] lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951 shrink_node mm/vmscan.c:5910 [inline] kswapd_shrink_node mm/vmscan.c:6720 [inline] balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911 kswapd+0x5ea/0xbf0 mm/vmscan.c:7180 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> So fix this by using btrfs_add_delayed_iput() so that the final iput is delegated to the cleaner kthread. Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reported-by: [email protected] Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps") Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-07-11libceph: fix crush_choose_firstn() kernel-doc warningsJeff Johnson1-0/+4
Currently, when built with "make W=1", the following warnings are generated: net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'work' not described in 'crush_choose_firstn' net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'weight' not described in 'crush_choose_firstn' net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'weight_max' not described in 'crush_choose_firstn' net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'choose_args' not described in 'crush_choose_firstn' Update the crush_choose_firstn() kernel-doc to document these parameters. Signed-off-by: Jeff Johnson <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2024-07-11libceph: suppress crush_choose_indep() kernel-doc warningsJeff Johnson1-2/+1
Currently, when built with "make W=1", the following warnings are generated: net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'map' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'work' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'bucket' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'weight' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'weight_max' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'x' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'left' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'numrep' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'type' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'out' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'outpos' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'tries' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'recurse_tries' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'recurse_to_leaf' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'out2' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'parent_r' not described in 'crush_choose_indep' net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'choose_args' not described in 'crush_choose_indep' These warnings are generated because the prologue comment for crush_choose_indep() uses the kernel-doc prefix, but the actual comment is a very brief description that is not in kernel-doc format. Since this is a static function there is no need to fully document the function, so replace the kernel-doc comment prefix with a standard comment prefix to remove these warnings. Signed-off-by: Jeff Johnson <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2024-07-11Merge tag 'nf-24-07-11' of ↵Paolo Abeni2-146/+14
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: Patch #1 fixes a bogus WARN_ON splat in nfnetlink_queue. Patch #2 fixes a crash due to stack overflow in chain loop detection by using the existing chain validation routines Both patches from Florian Westphal. netfilter pull request 24-07-11 * tag 'nf-24-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: prefer nft_chain_validate netfilter: nfnetlink_queue: drop bogus WARN_ON ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-07-11Merge tag 'for-netdev' of ↵Paolo Abeni4-19/+262
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-11 The following pull-request contains BPF updates for your *net* tree. We've added 4 non-merge commits during the last 2 day(s) which contain a total of 4 files changed, 262 insertions(+), 19 deletions(-). The main changes are: 1) Fixes for a BPF timer lockup and a use-after-free scenario when timers are used concurrently, from Kumar Kartikeya Dwivedi. 2) Fix the argument order in the call to bpf_map_kvcalloc() which could otherwise lead to a compilation error, from Mohammad Shehar Yaar Tausif. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add timer lockup selftest bpf: Defer work in bpf_timer_cancel_and_free bpf: Fail bpf_timer_cancel when callback is being cancelled bpf: fix order of args in call to bpf_map_kvcalloc ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>