aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-25Merge tag 'nf-24-04-25' of ↵Jakub Kicinski2-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains two Netfilter/IPVS fixes for net: Patch #1 fixes SCTP checksumming for IPVS with gso packets, from Ismael Luceno. Patch #2 honor dormant flag from netdev event path to fix a possible double hook unregistration. * tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: honor table dormant flag from netdev release event path ipvs: Fix checksumming on GSO of SCTP packets ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().Kuniyuki Iwashima2-1/+4
syzbot reported a lockdep splat regarding unix_gc_lock and unix_state_lock(). One is called from recvmsg() for a connected socket, and another is called from GC for TCP_LISTEN socket. So, the splat is false-positive. Let's add a dedicated lock class for the latter to suppress the splat. Note that this change is not necessary for net-next.git as the issue is only applied to the old GC impl. [0]: WARNING: possible circular locking dependency detected 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted ----------------------------------------------------- kworker/u8:1/11 is trying to acquire lock: ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 but task is already holding lock: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (unix_gc_lock){+.+.}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] unix_notinflight+0x13d/0x390 net/unix/garbage.c:140 unix_detach_fds net/unix/af_unix.c:1819 [inline] unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188 skb_release_all net/core/skbuff.c:1200 [inline] __kfree_skb net/core/skbuff.c:1216 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252 kfree_skb include/linux/skbuff.h:1262 [inline] manage_oob net/unix/af_unix.c:2672 [inline] unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749 unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981 do_splice_read fs/splice.c:985 [inline] splice_file_to_pipe+0x299/0x500 fs/splice.c:1295 do_splice+0xf2d/0x1880 fs/splice.c:1379 __do_splice fs/splice.c:1436 [inline] __do_sys_splice fs/splice.c:1652 [inline] __se_sys_splice+0x331/0x4a0 fs/splice.c:1634 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&u->lock){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(unix_gc_lock); lock(&u->lock); lock(unix_gc_lock); lock(&u->lock); *** DEADLOCK *** 3 locks held by kworker/u8:1/11: #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline] #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline] #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 stack backtrace: CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") Reported-and-tested-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307 Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25rust: remove `params` from `module` macro exampleAswin Unnikrishnan1-12/+0
Remove argument `params` from the `module` macro example, because the macro does not currently support module parameters since it was not sent with the initial merge. Signed-off-by: Aswin Unnikrishnan <[email protected]> Reviewed-by: Alice Ryhl <[email protected]> Cc: [email protected] Fixes: 1fbde52bde73 ("rust: add `macros` crate") Link: https://lore.kernel.org/r/[email protected] [ Reworded slightly. ] Signed-off-by: Miguel Ojeda <[email protected]>
2024-04-25kbuild: rust: force `alloc` extern to allow "empty" Rust filesMiguel Ojeda1-1/+1
If one attempts to build an essentially empty file somewhere in the kernel tree, it leads to a build error because the compiler does not recognize the `new_uninit` unstable feature: error[E0635]: unknown feature `new_uninit` --> <crate attribute>:1:9 | 1 | feature(new_uninit) | ^^^^^^^^^^ The reason is that we pass `-Zcrate-attr='feature(new_uninit)'` (together with `-Zallow-features=new_uninit`) to let non-`rust/` code use that unstable feature. However, the compiler only recognizes the feature if the `alloc` crate is resolved (the feature is an `alloc` one). `--extern alloc`, which we pass, is not enough to resolve the crate. Introducing a reference like `use alloc;` or `extern crate alloc;` solves the issue, thus this is not seen in normal files. For instance, `use`ing the `kernel` prelude introduces such a reference, since `alloc` is used inside. While normal use of the build system is not impacted by this, it can still be fairly confusing for kernel developers [1], thus use the unstable `force` option of `--extern` [2] (added in Rust 1.71 [3]) to force the compiler to resolve `alloc`. This new unstable feature is only needed meanwhile we use the other unstable feature, since then we will not need `-Zcrate-attr`. Cc: [email protected] # v6.6+ Reported-by: Daniel Almeida <[email protected]> Reported-by: Julian Stecklina <[email protected]> Closes: https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/x/near/424096982 [1] Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support") Link: https://github.com/rust-lang/rust/issues/111302 [2] Link: https://github.com/rust-lang/rust/pull/109421 [3] Reviewed-by: Alice Ryhl <[email protected]> Reviewed-by: Gary Guo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]>
2024-04-25net: b44: set pause params only when interface is upPeter Münster1-6/+8
b44_free_rings() accesses b44::rx_buffers (and ::tx_buffers) unconditionally, but b44::rx_buffers is only valid when the device is up (they get allocated in b44_open(), and deallocated again in b44_close()), any other time these are just a NULL pointers. So if you try to change the pause params while the network interface is disabled/administratively down, everything explodes (which likely netifd tries to do). Link: https://github.com/openwrt/openwrt/issues/13789 Fixes: 1da177e4c3f4 (Linux-2.6.12-rc2) Cc: [email protected] Reported-by: Peter Münster <[email protected]> Suggested-by: Jonas Gorski <[email protected]> Signed-off-by: Vaclav Svoboda <[email protected]> Tested-by: Peter Münster <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Peter Münster <[email protected]> Reviewed-by: Michael Chan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25tls: fix lockless read of strp->msg_ready in ->pollSabrina Dubroca3-5/+6
tls_sk_poll is called without locking the socket, and needs to read strp->msg_ready (via tls_strp_msg_ready). Convert msg_ready to a bool and use READ_ONCE/WRITE_ONCE where needed. The remaining reads are only performed when the socket is locked. Fixes: 121dca784fc0 ("tls: suppress wakeups unless we have a full record") Signed-off-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/0b7ee062319037cf86af6b317b3d72f7bfcd2e97.1713797701.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25dpll: fix dpll_pin_on_pin_register() for multiple parent pinsArkadiusz Kubalewski1-25/+33
In scenario where pin is registered with multiple parent pins via dpll_pin_on_pin_register(..), all belonging to the same dpll device. A second call to dpll_pin_on_pin_unregister(..) would cause a call trace, as it tries to use already released registration resources (due to fix introduced in b446631f355e). In this scenario pin was registered twice, so resources are not yet expected to be release until each registered pin/pin pair is unregistered. Currently, the following crash/call trace is produced when ice driver is removed on the system with installed E810T NIC which includes dpll device: WARNING: CPU: 51 PID: 9155 at drivers/dpll/dpll_core.c:809 dpll_pin_ops+0x20/0x30 RIP: 0010:dpll_pin_ops+0x20/0x30 Call Trace: ? __warn+0x7f/0x130 ? dpll_pin_ops+0x20/0x30 dpll_msg_add_pin_freq+0x37/0x1d0 dpll_cmd_pin_get_one+0x1c0/0x400 ? __nlmsg_put+0x63/0x80 dpll_pin_event_send+0x93/0x140 dpll_pin_on_pin_unregister+0x3f/0x100 ice_dpll_deinit_pins+0xa1/0x230 [ice] ice_remove+0xf1/0x210 [ice] Fix by adding a parent pointer as a cookie when creating a registration, also when searching for it. For the regular pins pass NULL, this allows to create separated registration for each parent the pin is registered with. Fixes: b446631f355e ("dpll: fix dpll_xa_ref_*_del() for multiple registrations") Signed-off-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25net: ravb: Fix registered interrupt namesGeert Uytterhoeven1-6/+5
As interrupts are now requested from ravb_probe(), before calling register_netdev(), ndev->name still contains the template "eth%d", leading to funny names in /proc/interrupts. E.g. on R-Car E3: 89: 0 0 GICv2 93 Level eth%d:ch22:multi 90: 0 3 GICv2 95 Level eth%d:ch24:emac 91: 0 23484 GICv2 71 Level eth%d:ch0:rx_be 92: 0 0 GICv2 72 Level eth%d:ch1:rx_nc 93: 0 13735 GICv2 89 Level eth%d:ch18:tx_be 94: 0 0 GICv2 90 Level eth%d:ch19:tx_nc Worse, on platforms with multiple RAVB instances (e.g. R-Car V4H), all interrupts have similar names. Fix this by using the device name instead, like is done in several other drivers: 89: 0 0 GICv2 93 Level e6800000.ethernet:ch22:multi 90: 0 1 GICv2 95 Level e6800000.ethernet:ch24:emac 91: 0 28578 GICv2 71 Level e6800000.ethernet:ch0:rx_be 92: 0 0 GICv2 72 Level e6800000.ethernet:ch1:rx_nc 93: 0 14044 GICv2 89 Level e6800000.ethernet:ch18:tx_be 94: 0 0 GICv2 90 Level e6800000.ethernet:ch19:tx_nc Rename the local variable dev_name, as it shadows the dev_name() function, and pre-initialize it, to simplify the code. Fixes: 32f012b8c01ca9fd ("net: ravb: Move getting/requesting IRQs in the probe() method") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Niklas Söderlund <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]> Reviewed-by: Claudiu Beznea <[email protected]> Tested-by: Claudiu Beznea <[email protected]> # on RZ/G3S Link: https://lore.kernel.org/r/cde67b68adf115b3cf0b44c32334ae00b2fbb321.1713944647.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25octeontx2-af: fix the double free in rvu_npc_freemem()Su Hui1-1/+0
Clang static checker(scan-build) warning: drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c:line 2184, column 2 Attempt to free released memory. npc_mcam_rsrcs_deinit() has released 'mcam->counters.bmap'. Deleted this redundant kfree() to fix this double free problem. Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs") Signed-off-by: Su Hui <[email protected]> Reviewed-by: Geetha sowjanya <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Reviewed-by: Hariprasad Kelam <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packetsJason Reeder1-0/+5
The CPTS, by design, captures the messageType (Sync, Delay_Req, etc.) field from the second nibble of the PTP header which is defined in the PTPv2 (1588-2008) specification. In the PTPv1 (1588-2002) specification the first two bytes of the PTP header are defined as the versionType which is always 0x0001. This means that any PTPv1 packets that are tagged for TX timestamping by the CPTS will have their messageType set to 0x0 which corresponds to a Sync message type. This causes issues when a PTPv1 stack is expecting a Delay_Req (messageType: 0x1) timestamp that never appears. Fix this by checking if the ptp_class of the timestamped TX packet is PTP_CLASS_V1 and then matching the PTP sequence ID to the stored sequence ID in the skb->cb data structure. If the sequence IDs match and the packet is of type PTPv1 then there is a chance that the messageType has been incorrectly stored by the CPTS so overwrite the messageType stored by the CPTS with the messageType from the skb->cb data structure. This allows the PTPv1 stack to receive TX timestamps for Delay_Req packets which are necessary to lock onto a PTP Leader. Signed-off-by: Jason Reeder <[email protected]> Signed-off-by: Ravi Gunasekaran <[email protected]> Tested-by: Ed Trexel <[email protected]> Fixes: f6bd59526ca5 ("net: ethernet: ti: introduce am654 common platform time sync driver") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25Merge branch 'intel-wired-lan-driver-updates-2024-04-23-i40e-iavf-ice'Jakub Kicinski3-12/+40
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-04-23 (i40e, iavf, ice) This series contains updates to i40e, iavf, and ice drivers. Sindhu removes WQ_MEM_RECLAIM flag from workqueue for i40e. Erwan Velu adjusts message to avoid confusion on base being reported on i40e. Sudheer corrects insufficient check for TC equality on iavf. Jake corrects ordering of locks to avoid possible deadlock on ice. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25Merge branch ↵Jakub Kicinski5-21/+65
'fix-isolation-of-broadcast-traffic-and-unmatched-unicast-traffic-with-macsec-offload' Rahul Rameshbabu says: ==================== Fix isolation of broadcast traffic and unmatched unicast traffic with MACsec offload Some device drivers support devices that enable them to annotate whether a Rx skb refers to a packet that was processed by the MACsec offloading functionality of the device. Logic in the Rx handling for MACsec offload does not utilize this information to preemptively avoid forwarding to the macsec netdev currently. Because of this, things like multicast messages or unicast messages with an unmatched destination address such as ARP requests are forwarded to the macsec netdev whether the message received was MACsec encrypted or not. The goal of this patch series is to improve the Rx handling for MACsec offload for devices capable of annotating skbs received that were decrypted by the NIC offload for MACsec. Here is a summary of the issue that occurs with the existing logic today. * The current design of the MACsec offload handling path tries to use "best guess" mechanisms for determining whether a packet associated with the currently handled skb in the datapath was processed via HW offload * The best guess mechanism uses the following heuristic logic (in order of precedence) - Check if header destination MAC address matches MACsec netdev MAC address -> forward to MACsec port - Check if packet is multicast traffic -> forward to MACsec port - MACsec security channel was able to be looked up from skb offload context (mlx5 only) -> forward to MACsec port * Problem: plaintext traffic can potentially solicit a MACsec encrypted response from the offload device - Core aspect of MACsec is that it identifies unauthorized LAN connections and excludes them from communication + This behavior can be seen when not enabling offload for MACsec - The offload behavior violates this principle in MACsec I believe this behavior is a security bug since applications utilizing MACsec could be exploited using this behavior, and the correct way to resolve this is by having the hardware correctly indicate whether MACsec offload occurred for the packet or not. In the patches in this series, I leave a warning for when the problematic path occurs because I cannot figure out a secure way to fix the security issue that applies to the core MACsec offload handling in the Rx path without breaking MACsec offload for other vendors. Shown at the bottom is an example use case where plaintext traffic sent to a physical port of a NIC configured for MACsec offload is unable to be handled correctly by the software stack when the NIC provides awareness to the kernel about whether the received packet is MACsec traffic or not. In this specific example, plaintext ARP requests are being responded with MACsec encrypted ARP replies (which leads to routing information being unable to be built for the requester). Side 1 ip link del macsec0 ip address flush mlx5_1 ip address add 1.1.1.1/24 dev mlx5_1 ip link set dev mlx5_1 up ip link add link mlx5_1 macsec0 type macsec sci 1 encrypt on ip link set dev macsec0 address 00:11:22:33:44:66 ip macsec offload macsec0 mac ip macsec add macsec0 tx sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16 ip macsec add macsec0 rx sci 2 on ip macsec add macsec0 rx sci 2 sa 0 pn 1 on key 00 ead3664f508eb06c40ac7104cdae4ce5 ip address flush macsec0 ip address add 2.2.2.1/24 dev macsec0 ip link set dev macsec0 up # macsec0 enters promiscuous mode. # This enables all traffic received on macsec_vlan to be processed by # the macsec offload rx datapath. This however means that traffic # meant to be received by mlx5_1 will be incorrectly steered to # macsec0 as well. ip link add link macsec0 name macsec_vlan type vlan id 1 ip link set dev macsec_vlan address 00:11:22:33:44:88 ip address flush macsec_vlan ip address add 3.3.3.1/24 dev macsec_vlan ip link set dev macsec_vlan up Side 2 ip link del macsec0 ip address flush mlx5_1 ip address add 1.1.1.2/24 dev mlx5_1 ip link set dev mlx5_1 up ip link add link mlx5_1 macsec0 type macsec sci 2 encrypt on ip link set dev macsec0 address 00:11:22:33:44:77 ip macsec offload macsec0 mac ip macsec add macsec0 tx sa 0 pn 1 on key 00 ead3664f508eb06c40ac7104cdae4ce5 ip macsec add macsec0 rx sci 1 on ip macsec add macsec0 rx sci 1 sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16 ip address flush macsec0 ip address add 2.2.2.2/24 dev macsec0 ip link set dev macsec0 up # macsec0 enters promiscuous mode. # This enables all traffic received on macsec_vlan to be processed by # the macsec offload rx datapath. This however means that traffic # meant to be received by mlx5_1 will be incorrectly steered to # macsec0 as well. ip link add link macsec0 name macsec_vlan type vlan id 1 ip link set dev macsec_vlan address 00:11:22:33:44:99 ip address flush macsec_vlan ip address add 3.3.3.2/24 dev macsec_vlan ip link set dev macsec_vlan up Side 1 ping -I mlx5_1 1.1.1.2 PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 mlx5_1: 56(84) bytes of data. From 1.1.1.1 icmp_seq=1 Destination Host Unreachable ping: sendmsg: No route to host From 1.1.1.1 icmp_seq=2 Destination Host Unreachable From 1.1.1.1 icmp_seq=3 Destination Host Unreachable Changes: v2->v3: * Made dev paramater const for eth_skb_pkt_type helper as suggested by Sabrina Dubroca <[email protected]> v1->v2: * Fixed series subject to detail the issue being fixed * Removed strange characters from cover letter * Added comment in example that illustrates the impact involving promiscuous mode * Added patch for generalizing packet type detection * Added Fixes: tags and targeting net * Removed pointless warning in the heuristic Rx path for macsec offload * Applied small refactor in Rx path offload to minimize scope of rx_sc local variable Link: https://github.com/Binary-Eater/macsec-rx-offload/blob/trunk/MACsec_violation_in_core_stack_offload_rx_handling.pdf Link: https://lore.kernel.org/netdev/[email protected]/ Link: https://lore.kernel.org/netdev/[email protected]/ Link: https://lore.kernel.org/netdev/[email protected]/ Link: https://lore.kernel.org/netdev/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25ice: fix LAG and VF lock dependency in ice_reset_vf()Jacob Keller1-8/+8
9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf(). The commit placed this lock acquisition just prior to the acquisition of the VF configuration lock. If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK flag, this could deadlock with ice_vc_cfg_qs_msg() because it always acquires the locks in the order of the VF configuration lock and then the LAG mutex. Lockdep reports this violation almost immediately on creating and then removing 2 VF: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-rc6 #54 Tainted: G W O ------------------------------------------------------ kworker/60:3/6771 is trying to acquire lock: ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice] but task is already holding lock: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&pf->lag_mutex){+.+.}-{3:3}: __lock_acquire+0x4f8/0xb40 lock_acquire+0xd4/0x2d0 __mutex_lock+0x9b/0xbf0 ice_vc_cfg_qs_msg+0x45/0x690 [ice] ice_vc_process_vf_msg+0x4f5/0x870 [ice] __ice_clean_ctrlq+0x2b5/0x600 [ice] ice_service_task+0x2c9/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 kthread+0x104/0x140 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 -> #0 (&vf->cfg_lock){+.+.}-{3:3}: check_prev_add+0xe2/0xc50 validate_chain+0x558/0x800 __lock_acquire+0x4f8/0xb40 lock_acquire+0xd4/0x2d0 __mutex_lock+0x9b/0xbf0 ice_reset_vf+0x22f/0x4d0 [ice] ice_process_vflr_event+0x98/0xd0 [ice] ice_service_task+0x1cc/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 kthread+0x104/0x140 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&pf->lag_mutex); lock(&vf->cfg_lock); lock(&pf->lag_mutex); lock(&vf->cfg_lock); *** DEADLOCK *** 4 locks held by kworker/60:3/6771: #0: ff40d43e05428b38 ((wq_completion)ice){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0 #1: ff50d06e05197e58 ((work_completion)(&pf->serv_task)){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0 #2: ff40d43ea1960e50 (&pf->vfs.table_lock){+.+.}-{3:3}, at: ice_process_vflr_event+0x48/0xd0 [ice] #3: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice] stack backtrace: CPU: 60 PID: 6771 Comm: kworker/60:3 Tainted: G W O 6.8.0-rc6 #54 Hardware name: Workqueue: ice ice_service_task [ice] Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x12d/0x150 check_prev_add+0xe2/0xc50 ? save_trace+0x59/0x230 ? add_chain_cache+0x109/0x450 validate_chain+0x558/0x800 __lock_acquire+0x4f8/0xb40 ? lockdep_hardirqs_on+0x7d/0x100 lock_acquire+0xd4/0x2d0 ? ice_reset_vf+0x22f/0x4d0 [ice] ? lock_is_held_type+0xc7/0x120 __mutex_lock+0x9b/0xbf0 ? ice_reset_vf+0x22f/0x4d0 [ice] ? ice_reset_vf+0x22f/0x4d0 [ice] ? rcu_is_watching+0x11/0x50 ? ice_reset_vf+0x22f/0x4d0 [ice] ice_reset_vf+0x22f/0x4d0 [ice] ? process_one_work+0x176/0x4d0 ice_process_vflr_event+0x98/0xd0 [ice] ice_service_task+0x1cc/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x104/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> To avoid deadlock, we must acquire the LAG mutex only after acquiring the VF configuration lock. Fix the ice_reset_vf() to acquire the LAG mutex only after we either acquire or check that the VF configuration lock is held. Fixes: 9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over aggregate") Signed-off-by: Jacob Keller <[email protected]> Reviewed-by: Dave Ertman <[email protected]> Reviewed-by: Mateusz Polchlopek <[email protected]> Tested-by: Przemek Kitszel <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25iavf: Fix TC config comparison with existing adapter TC configSudheer Mogilappagari1-1/+29
Same number of TCs doesn't imply that underlying TC configs are same. The config could be different due to difference in number of queues in each TC. Add utility function to determine if TC configs are same. Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf") Signed-off-by: Sudheer Mogilappagari <[email protected]> Tested-by: Mineri Bhange <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25i40e: Report MFS in decimal base instead of hexErwan Velu1-2/+2
If the MFS is set below the default (0x2600), a warning message is reported like the following : MFS for port 1 has been set below the default: 600 This message is a bit confusing as the number shown here (600) is in fact an hexa number: 0x600 = 1536 Without any explicit "0x" prefix, this message is read like the MFS is set to 600 bytes. MFS, as per MTUs, are usually expressed in decimal base. This commit reports both current and default MFS values in decimal so it's less confusing for end-users. A typical warning message looks like the following : MFS for port 1 (1536) has been set below the default (9728) Signed-off-by: Erwan Velu <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25i40e: Do not use WQ_MEM_RECLAIM flag for workqueueSindhu Devale1-1/+1
Issue reported by customer during SRIOV testing, call trace: When both i40e and the i40iw driver are loaded, a warning in check_flush_dependency is being triggered. This seems to be because of the i40e driver workqueue is allocated with the WQ_MEM_RECLAIM flag, and the i40iw one is not. Similar error was encountered on ice too and it was fixed by removing the flag. Do the same for i40e too. [Feb 9 09:08] ------------[ cut here ]------------ [ +0.000004] workqueue: WQ_MEM_RECLAIM i40e:i40e_service_task [i40e] is flushing !WQ_MEM_RECLAIM infiniband:0x0 [ +0.000060] WARNING: CPU: 0 PID: 937 at kernel/workqueue.c:2966 check_flush_dependency+0x10b/0x120 [ +0.000007] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 dns_resolver netfs qrtr rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common irdma intel_uncore_frequency intel_uncore_frequency_common ice ipmi_ssif isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp gnss coretemp ib_uverbs rapl intel_cstate ib_core iTCO_wdt iTCO_vendor_support acpi_ipmi mei_me ipmi_si intel_uncore ioatdma i2c_i801 joydev pcspkr mei ipmi_devintf lpc_ich intel_pch_thermal i2c_smbus ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c ast sd_mod drm_shmem_helper t10_pi drm_kms_helper sg ixgbe drm i40e ahci crct10dif_pclmul libahci crc32_pclmul igb crc32c_intel libata ghash_clmulni_intel i2c_algo_bit mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod fuse [ +0.000050] CPU: 0 PID: 937 Comm: kworker/0:3 Kdump: loaded Not tainted 6.8.0-rc2-Feb-net_dev-Qiueue-00279-gbd43c5687e05 #1 [ +0.000003] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 [ +0.000001] Workqueue: i40e i40e_service_task [i40e] [ +0.000024] RIP: 0010:check_flush_dependency+0x10b/0x120 [ +0.000003] Code: ff 49 8b 54 24 18 48 8d 8b b0 00 00 00 49 89 e8 48 81 c6 b0 00 00 00 48 c7 c7 b0 97 fa 9f c6 05 8a cc 1f 02 01 e8 35 b3 fd ff <0f> 0b e9 10 ff ff ff 80 3d 78 cc 1f 02 00 75 94 e9 46 ff ff ff 90 [ +0.000002] RSP: 0018:ffffbd294976bcf8 EFLAGS: 00010282 [ +0.000002] RAX: 0000000000000000 RBX: ffff94d4c483c000 RCX: 0000000000000027 [ +0.000001] RDX: ffff94d47f620bc8 RSI: 0000000000000001 RDI: ffff94d47f620bc0 [ +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffff7fff [ +0.000001] R10: ffffbd294976bb98 R11: ffffffffa0be65e8 R12: ffff94c5451ea180 [ +0.000001] R13: ffff94c5ab5e8000 R14: ffff94c5c20b6e05 R15: ffff94c5f1330ab0 [ +0.000001] FS: 0000000000000000(0000) GS:ffff94d47f600000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000001] CR2: 00007f9e6f1fca70 CR3: 0000000038e20004 CR4: 00000000007706f0 [ +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.000001] PKRU: 55555554 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000002] ? __warn+0x80/0x130 [ +0.000003] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? report_bug+0x195/0x1a0 [ +0.000005] ? handle_bug+0x3c/0x70 [ +0.000003] ? exc_invalid_op+0x14/0x70 [ +0.000002] ? asm_exc_invalid_op+0x16/0x20 [ +0.000006] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? check_flush_dependency+0x10b/0x120 [ +0.000002] __flush_workqueue+0x126/0x3f0 [ +0.000015] ib_cache_cleanup_one+0x1c/0xe0 [ib_core] [ +0.000056] __ib_unregister_device+0x6a/0xb0 [ib_core] [ +0.000023] ib_unregister_device_and_put+0x34/0x50 [ib_core] [ +0.000020] i40iw_close+0x4b/0x90 [irdma] [ +0.000022] i40e_notify_client_of_netdev_close+0x54/0xc0 [i40e] [ +0.000035] i40e_service_task+0x126/0x190 [i40e] [ +0.000024] process_one_work+0x174/0x340 [ +0.000003] worker_thread+0x27e/0x390 [ +0.000001] ? __pfx_worker_thread+0x10/0x10 [ +0.000002] kthread+0xdf/0x110 [ +0.000002] ? __pfx_kthread+0x10/0x10 [ +0.000002] ret_from_fork+0x2d/0x50 [ +0.000003] ? __pfx_kthread+0x10/0x10 [ +0.000001] ret_from_fork_asm+0x1b/0x30 [ +0.000004] </TASK> [ +0.000001] ---[ end trace 0000000000000000 ]--- Fixes: 4d5957cbdecd ("i40e: remove WQ_UNBOUND and the task limit of our workqueue") Signed-off-by: Sindhu Devale <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Mateusz Polchlopek <[email protected]> Signed-off-by: Aleksandr Loktionov <[email protected]> Tested-by: Robert Ganzynkowicz <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()Dan Carpenter1-3/+5
The rx_chn->irq[] array is unsigned int but it should be signed for the error handling to work. Also if k3_udma_glue_rx_get_irq() returns zero then we should return -ENXIO instead of success. Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Roger Quadros <[email protected]> Reviewed-by: MD Danish Anwar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsecRahul Rameshbabu1-0/+1
mlx5 Rx flow steering and CQE handling enable the driver to be able to update an skb's md_dst attribute as MACsec when MACsec traffic arrives when a device is configured for offloading. Advertise this to the core stack to take advantage of this capability. Cc: [email protected] Fixes: b7c9400cbc48 ("net/mlx5e: Implement MACsec Rx data path using MACsec skb_metadata_dst") Signed-off-by: Rahul Rameshbabu <[email protected]> Reviewed-by: Benjamin Poirier <[email protected]> Reviewed-by: Cosmin Ratiu <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25macsec: Detect if Rx skb is macsec-related for offloading devices that ↵Rahul Rameshbabu1-10/+36
update md_dst Can now correctly identify where the packets should be delivered by using md_dst or its absence on devices that provide it. This detection is not possible without device drivers that update md_dst. A fallback pattern should be used for supporting such device drivers. This fallback mode causes multicast messages to be cloned to both the non-macsec and macsec ports, independent of whether the multicast message received was encrypted over MACsec or not. Other non-macsec traffic may also fail to be handled correctly for devices in promiscuous mode. Link: https://lore.kernel.org/netdev/ZULRxX9eIbFiVi7v@hog/ Cc: Sabrina Dubroca <[email protected]> Cc: [email protected] Fixes: 860ead89b851 ("net/macsec: Add MACsec skb_metadata_dst Rx Data path support") Signed-off-by: Rahul Rameshbabu <[email protected]> Reviewed-by: Benjamin Poirier <[email protected]> Reviewed-by: Cosmin Ratiu <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25ethernet: Add helper for assigning packet type when dest address does not ↵Rahul Rameshbabu2-11/+26
match device address Enable reuse of logic in eth_type_trans for determining packet type. Suggested-by: Sabrina Dubroca <[email protected]> Cc: [email protected] Signed-off-by: Rahul Rameshbabu <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25macsec: Enable devices to advertise whether they update sk_buff md_dst ↵Rahul Rameshbabu1-0/+2
during offloads Cannot know whether a Rx skb missing md_dst is intended for MACsec or not without knowing whether the device is able to update this field during an offload. Assume that an offload to a MACsec device cannot support updating md_dst by default. Capable devices can advertise that they do indicate that an skb is related to a MACsec offloaded packet using the md_dst. Cc: Sabrina Dubroca <[email protected]> Cc: [email protected] Fixes: 860ead89b851 ("net/macsec: Add MACsec skb_metadata_dst Rx Data path support") Signed-off-by: Rahul Rameshbabu <[email protected]> Reviewed-by: Benjamin Poirier <[email protected]> Reviewed-by: Cosmin Ratiu <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-25Revert "drm/etnaviv: Expose a few more chipspecs to userspace"Christian Gmeiner4-71/+0
This reverts commit 1dccdba084897443d116508a8ed71e0ac8a031a4. In userspace a different approach was choosen - hwdb. As a result, there is no need for these values. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]> Signed-off-by: Lucas Stach <[email protected]>
2024-04-25drm/etnaviv: fix tx clock gating on some GC7000 variantsDerek Foreman1-2/+2
commit 4bce244272513 ("drm/etnaviv: disable tx clock gating for GC7000 rev6203") accidentally applied the fix for i.MX8MN errata ERR050226 to GC2000 instead of GC7000, failing to disable tx clock gating for GC7000 rev 0x6023 as intended. Additional clean-up further propagated this issue, partially breaking the clock gating fixes added for GC7000 rev 6202 in commit 432f51e7deeda ("drm/etnaviv: add clock gating workaround for GC7000 r6202"). Signed-off-by: Derek Foreman <[email protected]> Signed-off-by: Lucas Stach <[email protected]>
2024-04-25LoongArch: Lately init pmu after smp is onlineBibo Mao1-1/+1
There is an smp function call named reset_counters() to init PMU registers of every CPU in PMU initialization state. It requires that all CPUs are online. However there is an early_initcall() wrapper for the PMU init funciton init_hw_perf_events(), so that pmu init funciton is called in do_pre_smp_initcalls() which before function smp_init(). Function reset_counters() cannot work on other CPUs since they haven't boot up still. Here replace the wrapper early_initcall() with pure_initcall(), so that the PMU init function is called after every cpu is online. Signed-off-by: Bibo Mao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-04-25cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=nSean Christopherson4-6/+30
Explicitly disallow enabling mitigations at runtime for kernels that were built with CONFIG_CPU_MITIGATIONS=n, as some architectures may omit code entirely if mitigations are disabled at compile time. E.g. on x86, a large pile of Kconfigs are buried behind CPU_MITIGATIONS, and trying to provide sane behavior for retroactively enabling mitigations is extremely difficult, bordering on impossible. E.g. page table isolation and call depth tracking require build-time support, BHI mitigations will still be off without additional kernel parameters, etc. [ bp: Touchups. ] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-25cpu: Re-enable CPU mitigations by default for !X86 architecturesSean Christopherson3-7/+16
Rename x86's to CPU_MITIGATIONS, define it in generic code, and force it on for all architectures exception x86. A recent commit to turn mitigations off by default if SPECULATION_MITIGATIONS=n kinda sorta missed that "cpu_mitigations" is completely generic, whereas SPECULATION_MITIGATIONS is x86-specific. Rename x86's SPECULATIVE_MITIGATIONS instead of keeping both and have it select CPU_MITIGATIONS, as having two configs for the same thing is unnecessary and confusing. This will also allow x86 to use the knob to manage mitigations that aren't strictly related to speculative execution. Use another Kconfig to communicate to common code that CPU_MITIGATIONS is already defined instead of having x86's menu depend on the common CPU_MITIGATIONS. This allows keeping a single point of contact for all of x86's mitigations, and it's not clear that other architectures *want* to allow disabling mitigations at compile-time. Fixes: f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n") Closes: https://lkml.kernel.org/r/20240413115324.53303a68%40canb.auug.org.au Reported-by: Stephen Rothwell <[email protected]> Reported-by: Michael Ellerman <[email protected]> Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Acked-by: Borislav Petkov (AMD) <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2024-04-25Merge tag 'intel-gpio-v6.9-2' of ↵Bartosz Golaszewski1-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-current intel-gpio for v6.9-2 * Make data pointer dereference robust in Intel Tangier driver The following is an automated git shortlog grouped by driver: tangier: - Use correct type for the IRQ chip data
2024-04-25irqchip/gic-v3-its: Prevent double free on errorGuanrui Huang1-7/+2
The error handling path in its_vpe_irq_domain_alloc() causes a double free when its_vpe_init() fails after successfully allocating at least one interrupt. This happens because its_vpe_irq_domain_free() frees the interrupts along with the area bitmap and the vprop_page and its_vpe_irq_domain_alloc() subsequently frees the area bitmap and the vprop_page again. Fix this by unconditionally invoking its_vpe_irq_domain_free() which handles all cases correctly and by removing the bitmap/vprop_page freeing from its_vpe_irq_domain_alloc(). [ tglx: Massaged change log ] Fixes: 7d75bbb4bc1a ("irqchip/gic-v3-its: Add VPE irq domain allocation/teardown") Signed-off-by: Guanrui Huang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Reviewed-by: Zenghui Yu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2024-04-25Merge tag 'wireless-2024-04-23' of ↵David S. Miller19-44/+152
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes berg says: ==================== Fixes for the current cycle: * ath11k: convert to correct RCU iteration of IPv6 addresses * iwlwifi: link ID, FW API version, scanning and PASN fixes * cfg80211: NULL-deref and tracing fixes * mac80211: connection mode, mesh fast-TX, multi-link and various other small fixes ==================== Signed-off-by: David S. Miller <[email protected]>
2024-04-25net: phy: dp83869: Fix MII mode failureMD Danish Anwar1-1/+2
The DP83869 driver sets the MII bit (needed for PHY to work in MII mode) only if the op-mode is either DP83869_100M_MEDIA_CONVERT or DP83869_RGMII_100_BASE. Some drivers i.e. ICSSG support MII mode with op-mode as DP83869_RGMII_COPPER_ETHERNET for which the MII bit is not set in dp83869 driver. As a result MII mode on ICSSG doesn't work and below log is seen. TI DP83869 300b2400.mdio:0f: selected op-mode is not valid with MII mode icssg-prueth icssg1-eth: couldn't connect to phy ethernet-phy@0 icssg-prueth icssg1-eth: can't phy connect port MII0 Fix this by setting MII bit for DP83869_RGMII_COPPER_ETHERNET op-mode as well. Fixes: 94e86ef1b801 ("net: phy: dp83869: support mii mode when rgmii strap cfg is used") Signed-off-by: MD Danish Anwar <[email protected]> Reviewed-by: Ravi Gunasekaran <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-04-25netfilter: nf_tables: honor table dormant flag from netdev release event pathPablo Neira Ayuso1-1/+3
Check for table dormant flag otherwise netdev release event path tries to unregister an already unregistered hook. [524854.857999] ------------[ cut here ]------------ [524854.858010] WARNING: CPU: 0 PID: 3386599 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260 [...] [524854.858848] CPU: 0 PID: 3386599 Comm: kworker/u32:2 Not tainted 6.9.0-rc3+ #365 [524854.858869] Workqueue: netns cleanup_net [524854.858886] RIP: 0010:__nf_unregister_net_hook+0x21a/0x260 [524854.858903] Code: 24 e8 aa 73 83 ff 48 63 43 1c 83 f8 01 0f 85 3d ff ff ff e8 98 d1 f0 ff 48 8b 3c 24 e8 8f 73 83 ff 48 63 43 1c e9 26 ff ff ff <0f> 0b 48 83 c4 18 48 c7 c7 00 68 e9 82 5b 5d 41 5c 41 5d 41 5e 41 [524854.858914] RSP: 0018:ffff8881e36d79e0 EFLAGS: 00010246 [524854.858926] RAX: 0000000000000000 RBX: ffff8881339ae790 RCX: ffffffff81ba524a [524854.858936] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881c8a16438 [524854.858945] RBP: ffff8881c8a16438 R08: 0000000000000001 R09: ffffed103c6daf34 [524854.858954] R10: ffff8881e36d79a7 R11: 0000000000000000 R12: 0000000000000005 [524854.858962] R13: ffff8881c8a16000 R14: 0000000000000000 R15: ffff8881351b5a00 [524854.858971] FS: 0000000000000000(0000) GS:ffff888390800000(0000) knlGS:0000000000000000 [524854.858982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [524854.858991] CR2: 00007fc9be0f16f4 CR3: 00000001437cc004 CR4: 00000000001706f0 [524854.859000] Call Trace: [524854.859006] <TASK> [524854.859013] ? __warn+0x9f/0x1a0 [524854.859027] ? __nf_unregister_net_hook+0x21a/0x260 [524854.859044] ? report_bug+0x1b1/0x1e0 [524854.859060] ? handle_bug+0x3c/0x70 [524854.859071] ? exc_invalid_op+0x17/0x40 [524854.859083] ? asm_exc_invalid_op+0x1a/0x20 [524854.859100] ? __nf_unregister_net_hook+0x6a/0x260 [524854.859116] ? __nf_unregister_net_hook+0x21a/0x260 [524854.859135] nf_tables_netdev_event+0x337/0x390 [nf_tables] [524854.859304] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables] [524854.859461] ? packet_notifier+0xb3/0x360 [524854.859476] ? _raw_spin_unlock_irqrestore+0x11/0x40 [524854.859489] ? dcbnl_netdevice_event+0x35/0x140 [524854.859507] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables] [524854.859661] notifier_call_chain+0x7d/0x140 [524854.859677] unregister_netdevice_many_notify+0x5e1/0xae0 Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-04-25ARM: dts: imx6ull-tarragon: fix USB over-current polarityMichael Heimpold1-0/+1
Our Tarragon platform uses a active-low signal to inform the i.MX6ULL about the over-current detection. Fixes: 5e4f393ccbf0 ("ARM: dts: imx6ull: Add chargebyte Tarragon support") Signed-off-by: Michael Heimpold <[email protected]> Signed-off-by: Stefan Wahren <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
2024-04-24Merge tag 'for-net-2024-04-24' of ↵Jakub Kicinski12-49/+124
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional() - hci_sync: Using hci_cmd_sync_submit when removing Adv Monitor - qca: fix invalid device address check - hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync - Fix type of len in {l2cap,sco}_sock_getsockopt_old() - btusb: mediatek: Fix double free of skb in coredump - btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853 - btusb: Fix triggering coredump implementation for QCA * tag 'for-net-2024-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional() Bluetooth: hci_sync: Using hci_cmd_sync_submit when removing Adv Monitor Bluetooth: qca: fix NULL-deref on non-serdev setup Bluetooth: qca: fix NULL-deref on non-serdev suspend Bluetooth: btusb: mediatek: Fix double free of skb in coredump Bluetooth: MGMT: Fix failing to MGMT_OP_ADD_UUID/MGMT_OP_REMOVE_UUID Bluetooth: qca: fix invalid device address check Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZE Bluetooth: btusb: Fix triggering coredump implementation for QCA Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853 Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old() ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24eth: bnxt: fix counting packets discarded due to OOM and netpollJakub Kicinski1-26/+18
I added OOM and netpoll discard counters, naively assuming that the cpr pointer is pointing to a common completion ring. Turns out that is usually *a* completion ring but not *the* completion ring which bnapi->cp_ring points to. bnapi->cp_ring is where the stats are read from, so we end up reporting 0 thru ethtool -S and qstat even though the drop events have happened. Make 100% sure we're recording statistics in the correct structure. Fixes: 907fd4a294db ("bnxt: count discards due to memory allocation errors") Reviewed-by: Michael Chan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24igc: Fix LED-related deadlock on driver unbindLukas Wunner3-8/+35
Roman reports a deadlock on unplug of a Thunderbolt docking station containing an Intel I225 Ethernet adapter. The root cause is that led_classdev's for LEDs on the adapter are registered such that they're device-managed by the netdev. That results in recursive acquisition of the rtnl_lock() mutex on unplug: When the driver calls unregister_netdev(), it acquires rtnl_lock(), then frees the device-managed resources. Upon unregistering the LEDs, netdev_trig_deactivate() invokes unregister_netdevice_notifier(), which tries to acquire rtnl_lock() again. Avoid by using non-device-managed LED registration. Stack trace for posterity: schedule+0x6e/0xf0 schedule_preempt_disabled+0x15/0x20 __mutex_lock+0x2a0/0x750 unregister_netdevice_notifier+0x40/0x150 netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev] led_trigger_set+0x102/0x330 led_classdev_unregister+0x4b/0x110 release_nodes+0x3d/0xb0 devres_release_all+0x8b/0xc0 device_del+0x34f/0x3c0 unregister_netdevice_many_notify+0x80b/0xaf0 unregister_netdev+0x7c/0xd0 igc_remove+0xd8/0x1e0 [igc] pci_device_remove+0x3f/0xb0 Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226") Reported-by: Roman Lozko <[email protected]> Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/ Reported-by: "Marek Marczykowski-Górecki" <[email protected]> Closes: https://lore.kernel.org/r/ZhRD3cOtz5i-61PB@mail-itl/ Signed-off-by: Kurt Kanzenbach <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Cc: Heiner Kallweit <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Tested-by: Kurt Kanzenbach <[email protected]> # Intel i225 Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24Revert "net: txgbe: fix clk_name exceed MAX_DEV_ID limits"Duanqiang Wen1-1/+1
This reverts commit e30cef001da259e8df354b813015d0e5acc08740. commit 99f4570cfba1 ("clkdev: Update clkdev id usage to allow for longer names") can fix clk_name exceed MAX_DEV_ID limits, so this commit is meaningless. Signed-off-by: Duanqiang Wen <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24Revert "net: txgbe: fix i2c dev name cannot match clkdev"Duanqiang Wen1-5/+3
This reverts commit c644920ce9220d83e070f575a4df711741c07f07. when register i2c dev, txgbe shorten "i2c_designware" to "i2c_dw", will cause this i2c dev can't match platfom driver i2c_designware_platform. Signed-off-by: Duanqiang Wen <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24stackdepot: respect __GFP_NOLOCKDEP allocation flagAndrey Ryabinin1-2/+2
If stack_depot_save_flags() allocates memory it always drops __GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP allocation we may end up with lockdep splat like bellow: ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ #49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs] but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79 Always preserve __GFP_NOLOCKDEP to fix this. Link: https://lkml.kernel.org/r/[email protected] Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <[email protected]> Reported-by: Xiubo Li <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Reported-by: Damien Le Moal <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Suggested-by: Dave Chinner <[email protected]> Tested-by: Xiubo Li <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-04-24hugetlb: check for anon_vma prior to folio allocationVishal Moola (Oracle)1-4/+7
Commit 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()") may bailout after allocating a folio if we do not hold the mmap lock. When this occurs, vmf_anon_prepare() will release the vma lock. Hugetlb then attempts to call restore_reserve_on_error(), which depends on the vma lock being held. We can move vmf_anon_prepare() prior to the folio allocation in order to avoid calling restore_reserve_on_error() without the vma lock. Link: https://lkml.kernel.org/r/ZiFqSrSRLhIV91og@fedora Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()") Reported-by: [email protected] Signed-off-by: Vishal Moola (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-04-24mm: zswap: fix shrinker NULL crash with cgroup_disable=memoryJohannes Weiner1-9/+16
Christian reports a NULL deref in zswap that he bisected down to the zswap shrinker. The issue also cropped up in the bug trackers of libguestfs [1] and the Red Hat bugzilla [2]. The problem is that when memcg is disabled with the boot time flag, the zswap shrinker might get called with sc->memcg == NULL. This is okay in many places, like the lruvec operations. But it crashes in memcg_page_state() - which is only used due to the non-node accounting of cgroup's the zswap memory to begin with. Nhat spotted that the memcg can be NULL in the memcg-disabled case, and I was then able to reproduce the crash locally as well. [1] https://github.com/libguestfs/libguestfs/issues/139 [2] https://bugzilla.redhat.com/show_bug.cgi?id=2275252 Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Christian Heusel <[email protected]> Debugged-by: Nhat Pham <[email protected]> Suggested-by: Nhat Pham <[email protected]> Tested-by: Christian Heusel <[email protected]> Acked-by: Yosry Ahmed <[email protected]> Cc: Chengming Zhou <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Richard W.M. Jones <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: <[email protected]> [v6.8] Signed-off-by: Andrew Morton <[email protected]>
2024-04-24mm: turn folio_test_hugetlb into a PageTypeMatthew Wilcox (Oracle)4-59/+39
The current folio_test_hugetlb() can be fooled by a concurrent folio split into returning true for a folio which has never belonged to hugetlbfs. This can't happen if the caller holds a refcount on it, but we have a few places (memory-failure, compaction, procfs) which do not and should not take a speculative reference. Since hugetlb pages do not use individual page mapcounts (they are always fully mapped and use the entire_mapcount field to record the number of mappings), the PageType field is available now that page_mapcount() ignores the value in this field. In compaction and with CONFIG_DEBUG_VM enabled, the current implementation can result in an oops, as reported by Luis. This happens since 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR") effectively added some VM_BUG_ON() checks in the PageHuge() testing path. [[email protected]: update vmcoreinfo] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reported-by: Luis Chamberlain <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218227 Cc: Miaohe Lin <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-04-24mm: support page_mapcount() on page_has_type() pagesMatthew Wilcox (Oracle)3-10/+9
Return 0 for pages which can't be mapped. This matches how page_mapped() works. It is more convenient for users to not have to filter out these pages. Link: https://lkml.kernel.org/r/[email protected] Fixes: 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-04-24mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macrosMatthew Wilcox (Oracle)1-23/+47
Following the separation of FOLIO_FLAGS from PAGEFLAGS, separate FOLIO_FLAG_FALSE from PAGEFLAG_FALSE and FOLIO_TYPE_OPS from PAGE_TYPE_OPS. Link: https://lkml.kernel.org/r/[email protected] Fixes: 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-04-24mm/hugetlb: fix missing hugetlb_lock for resv unchargePeter Xu1-1/+4
There is a recent report on UFFDIO_COPY over hugetlb: https://lore.kernel.org/all/[email protected]/ 350: lockdep_assert_held(&hugetlb_lock); Should be an issue in hugetlb but triggered in an userfault context, where it goes into the unlikely path where two threads modifying the resv map together. Mike has a fix in that path for resv uncharge but it looks like the locking criteria was overlooked: hugetlb_cgroup_uncharge_folio_rsvd() will update the cgroup pointer, so it requires to be called with the lock held. Link: https://lkml.kernel.org/r/[email protected] Fixes: 79aa925bf239 ("hugetlb_cgroup: fix reservation accounting") Signed-off-by: Peter Xu <[email protected]> Reported-by: [email protected] Reviewed-by: Mina Almasry <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-04-24selftests: mm: fix unused and uninitialized variable warningMuhammad Usama Anjum1-1/+1
Fix the warnings by initializing and marking the variable as unused. I've caught the warnings by using clang. split_huge_page_test.c:303:6: warning: variable 'dummy' set but not used [-Wunused-but-set-variable] 303 | int dummy; | ^ split_huge_page_test.c:343:3: warning: variable 'dummy' is uninitialized when used here [-Wuninitialized] 343 | dummy += *(*addr + i); | ^~~~~ split_huge_page_test.c:303:11: note: initialize the variable 'dummy' to silence this warning 303 | int dummy; | ^ | = 0 2 warnings generated. Link: https://lkml.kernel.org/r/[email protected] Fixes: fc4d182316bd ("mm: huge_memory: enable debugfs to split huge pages to any order") Signed-off-by: Muhammad Usama Anjum <[email protected]> Reviewed-by: Zi Yan <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-04-24selftests/harness: remove use of LINE_MAXEdward Liaw2-4/+9
Android was seeing a compliation error because its C library does not define LINE_MAX. This replaces the use of LINE_MAX / snprintf with asprintf, which will change the behavior to not truncate the test name if it is over 2048 chars long. See also: https://github.com/llvm/llvm-project/issues/88119 [[email protected]: remove limits.h include, per Edward] [[email protected]: check asprintf() return] [[email protected]: fix undeclared function error] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 38c957f07038 ("selftests: kselftest_harness: generate test name once") Signed-off-by: Edward Liaw <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Bill Wendling <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Edward Liaw <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Kees Cook <[email protected]> Cc: "Mike Rapoport (IBM)" <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Xu <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Will Drewry <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-04-24Merge branch 'mlxsw-various-acl-fixes'Jakub Kicinski2-45/+75
Petr Machata says: ==================== mlxsw: Various ACL fixes Ido Schimmel writes: Fix various problems in the ACL (i.e., flower offload) code. See the commit messages for more details. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash workIdo Schimmel1-1/+5
The rehash delayed work is rescheduled with a delay if the number of credits at end of the work is not negative as supposedly it means that the migration ended. Otherwise, it is rescheduled immediately. After "mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash" the above is no longer accurate as a non-negative number of credits is no longer indicative of the migration being done. It can also happen if the work encountered an error in which case the migration will resume the next time the work is scheduled. The significance of the above is that it is possible for the work to be pending and associated with hints that were allocated when the migration started. This leads to the hints being leaked [1] when the work is canceled while pending as part of ACL region dismantle. Fix by freeing the hints if hints are associated with a work that was canceled while pending. Blame the original commit since the reliance on not having a pending work associated with hints is fragile. [1] unreferenced object 0xffff88810e7c3000 (size 256): comm "kworker/0:16", pid 176, jiffies 4295460353 hex dump (first 32 bytes): 00 30 95 11 81 88 ff ff 61 00 00 00 00 00 00 80 .0......a....... 00 00 61 00 40 00 00 00 00 00 00 00 04 00 00 00 ..a.@........... backtrace (crc 2544ddb9): [<00000000cf8cfab3>] kmalloc_trace+0x23f/0x2a0 [<000000004d9a1ad9>] objagg_hints_get+0x42/0x390 [<000000000b143cf3>] mlxsw_sp_acl_erp_rehash_hints_get+0xca/0x400 [<0000000059bdb60a>] mlxsw_sp_acl_tcam_vregion_rehash_work+0x868/0x1160 [<00000000e81fd734>] process_one_work+0x59c/0xf20 [<00000000ceee9e81>] worker_thread+0x799/0x12c0 [<00000000bda6fe39>] kthread+0x246/0x300 [<0000000070056d23>] ret_from_fork+0x34/0x70 [<00000000dea2b93e>] ret_from_fork_asm+0x1a/0x30 Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work") Signed-off-by: Ido Schimmel <[email protected]> Tested-by: Alexander Zubkov <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/0cc12ebb07c4d4c41a1265ee2c28b392ff997a86.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24mlxsw: spectrum_acl_tcam: Fix incorrect list API usageIdo Schimmel1-0/+7
Both the function that migrates all the chunks within a region and the function that migrates all the entries within a chunk call list_first_entry() on the respective lists without checking that the lists are not empty. This is incorrect usage of the API, which leads to the following warning [1]. Fix by returning if the lists are empty as there is nothing to migrate in this case. [1] WARNING: CPU: 0 PID: 6437 at drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:1266 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0> Modules linked in: CPU: 0 PID: 6437 Comm: kworker/0:37 Not tainted 6.9.0-rc3-custom-00883-g94a65f079ef6 #39 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0x2c0 [...] Call Trace: <TASK> mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x4a0 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration") Signed-off-by: Ido Schimmel <[email protected]> Tested-by: Alexander Zubkov <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/4628e9a22d1d84818e28310abbbc498e7bc31bc9.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-24mlxsw: spectrum_acl_tcam: Fix warning during rehashIdo Schimmel1-3/+17
As previously explained, the rehash delayed work migrates filters from one region to another. This is done by iterating over all chunks (all the filters with the same priority) in the region and in each chunk iterating over all the filters. When the work runs out of credits it stores the current chunk and entry as markers in the per-work context so that it would know where to resume the migration from the next time the work is scheduled. Upon error, the chunk marker is reset to NULL, but without resetting the entry markers despite being relative to it. This can result in migration being resumed from an entry that does not belong to the chunk being migrated. In turn, this will eventually lead to a chunk being iterated over as if it is an entry. Because of how the two structures happen to be defined, this does not lead to KASAN splats, but to warnings such as [1]. Fix by creating a helper that resets all the markers and call it from all the places the currently only reset the chunk marker. For good measures also call it when starting a completely new rehash. Add a warning to avoid future cases. [1] WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0 Modules linked in: CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G W 6.9.0-rc3-custom-00880-g29e61d91b77b #29 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:mlxsw_afk_encode+0x242/0x2f0 [...] Call Trace: <TASK> mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290 mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 </TASK> Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration") Signed-off-by: Ido Schimmel <[email protected]> Tested-by: Alexander Zubkov <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>