aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-05-03netfilter: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados7-21/+5
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/) * Remove sentinel elements from ctl_table structs * Remove instances where an array element is zeroed out to make it look like a sentinel. This is not longer needed and is safe after commit c899710fe7f9 ("networking: Update to register_net_sysctl_sz") added the array size to the ctl_table registration * Remove the need for having __NF_SYSCTL_CT_LAST_SYSCTL as the sysctl array size is now in NF_SYSCTL_CT_LAST_SYSCTL * Remove extra element in ctl_table arrays declarations Acked-by: Kees Cook <[email protected]> # loadpin & yama Signed-off-by: Joel Granados <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-03net: Remove ctl_table sentinel elements from several networking subsystemsJoel Granados10-26/+9
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/) To avoid lots of small commits, this commit brings together network changes from (as they appear in MAINTAINERS) LLC, MPTCP, NETROM NETWORK LAYER, PHONET PROTOCOL, ROSE NETWORK LAYER, RXRPC SOCKETS, SCTP PROTOCOL, SHARED MEMORY COMMUNICATIONS (SMC), TIPC NETWORK LAYER and NETWORKING [IPSEC] * Remove sentinel element from ctl_table structs. * Replace empty array registration with the register_net_sysctl_sz call in llc_sysctl_init * Replace the for loop stop condition that tests for procname == NULL with one that depends on array size in sctp_sysctl_net_register * Remove instances where an array element is zeroed out to make it look like a sentinel in xfrm_sysctl_init. This is not longer needed and is safe after commit c899710fe7f9 ("networking: Update to register_net_sysctl_sz") added the array size to the ctl_table registration * Use a table_size variable to keep the value of ARRAY_SIZE Signed-off-by: Joel Granados <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-03net: sunrpc: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados4-4/+0
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/) * Remove sentinel element from ctl_table structs. Signed-off-by: Joel Granados <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-03net: rds: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados3-3/+0
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/) * Remove sentinel element from ctl_table structs. Signed-off-by: Joel Granados <[email protected]> Acked-by: Allison Henderson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-03net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados11-35/+13
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/) * Remove sentinel element from ctl_table structs. * Remove the zeroing out of an array element (to make it look like a sentinel) in sysctl_route_net_init And ipv6_route_sysctl_init. This is not longer needed and is safe after commit c899710fe7f9 ("networking: Update to register_net_sysctl_sz") added the array size to the ctl_table registration. * Remove extra sentinel element in the declaration of devinet_vars. * Removed the "-1" in __devinet_sysctl_register, sysctl_route_net_init, ipv6_sysctl_net_init and ipv4_sysctl_init_net that adjusted for having an extra empty element when looping over ctl_table arrays * Replace the for loop stop condition in __addrconf_sysctl_register that tests for procname == NULL with one that depends on array size * Removing the unprivileged user check in ipv6_route_sysctl_init is safe as it is replaced by calling ipv6_route_sysctl_table_size; introduced in commit c899710fe7f9 ("networking: Update to register_net_sysctl_sz") * Use a table_size variable to keep the value of ARRAY_SIZE Signed-off-by: Joel Granados <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-03net: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados6-26/+14
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/) * Remove sentinel element from ctl_table structs. * Remove the zeroing out of an array element (to make it look like a sentinel) in neigh_sysctl_register and lowpan_frags_ns_sysctl_register This is not longer needed and is safe after commit c899710fe7f9 ("networking: Update to register_net_sysctl_sz") added the array size to the ctl_table registration. * Replace the for loop stop condition in sysctl_core_net_init that tests for procname == NULL with one that depends on array size * Removed the "-1" in mpls_net_init that adjusted for having an extra empty element when looping over ctl_table arrays * Use a table_size variable to keep the value of ARRAY_SIZE Signed-off-by: Joel Granados <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-03Merge tag 'ath-next-20240502' of ↵Kalle Valo64-670/+2650
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath ath.git patches for v6.10 ath12k * debugfs support * dfs_simulate_radar debugfs file * disable Wireless Extensions * suspend and hibernation support * ACPI support * refactoring in preparation of multi-link support ath11k * support hibernation (required changes in qrtr and MHI subsystems) * ieee80211-freq-limit Device Tree property support ath10k * firmware-name Device Tree property support
2024-05-03Merge tag 'mt76-for-kvalo-2024-05-02' of https://github.com/nbd168/wirelessKalle Valo39-145/+656
mt76 patches for 6.10 - fixes - mt7603 stability improvements - mt7921 LED control - mt7925 EHT radiotap support
2024-05-03wifi: mac80211_hwsim: add support for BSS colorAditya Kumar Singh1-0/+6
Advertise support for BSS color and then once the countdown reaches 0, call color change finish. Signed-off-by: Aditya Kumar Singh <[email protected]> Link: https://msgid.link/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2024-05-03wifi: mac80211: handle color change per linkAditya Kumar Singh7-55/+111
In order to support color change with MLO, handle the link ID now passed from cfg80211, adjust the code to do everything per link and call the notifications to cfg80211 correctly. Signed-off-by: Aditya Kumar Singh <[email protected]> Link: https://msgid.link/[email protected] Link: https://msgid.link/[email protected] Link: https://msgid.link/[email protected] Link: https://msgid.link/[email protected] [squash, move API call updates to this patch] Signed-off-by: Johannes Berg <[email protected]>
2024-05-03wifi: cfg80211: handle color change per linkAditya Kumar Singh4-18/+37
Currently, during color change, no link id information is passed down. In order to support color change during Multi Link Operation, it is required to pass link id as well. Additionally, update notification APIs to allow drivers/mac80211 to pass the link ID. Signed-off-by: Aditya Kumar Singh <[email protected]> Link: https://msgid.link/[email protected] Link: https://msgid.link/[email protected] [squash, actually only pass 0 from mac80211] Signed-off-by: Johannes Berg <[email protected]>
2024-05-03wifi: cfg80211: Clear mlo_links info when STA disconnectsXin Deng1-0/+1
wdev->valid_links is not cleared when upper layer disconnect from a wdev->AP MLD. It has been observed that this would prevent offchannel operations like remain-on-channel which would be needed for user space operations with Public Action frame. Clear the wdev->valid_links when STA disconnects. Signed-off-by: Xin Deng <[email protected]> Link: https://msgid.link/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2024-05-03wifi: iwlwifi: pcie: allocate dummy net_device dynamicallyBreno Leitao3-13/+27
struct net_device shouldn't be embedded into any structure, instead, the owner should use the priv space to embed their state into net_device. Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_device from struct iwl_trans_pcie by converting it into a pointer. Then use the leverage alloc_netdev() to allocate the net_device object at iwl_trans_pcie_alloc. The private data of net_device becomes a pointer for the struct iwl_trans_pcie, so, it is easy to get back to the iwl_trans_pcie parent given the net_device object. [1] https://lore.kernel.org/all/[email protected]/ Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Link: https://msgid.link/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2024-05-03wifi: nl80211: Avoid address calculations via out of bounds array indexingKees Cook1-7/+7
Before request->channels[] can be used, request->n_channels must be set. Additionally, address calculations for memory after the "channels" array need to be calculated from the allocation base ("request") rather than via the first "out of bounds" index of "channels", otherwise run-time bounds checking will throw a warning. Reported-by: Nathan Chancellor <[email protected]> Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by") Signed-off-by: Kees Cook <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Link: https://msgid.link/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2024-05-03wifi: iwlwifi: Use request_module_nowaitBen Greear1-9/+1
This appears to work around a deadlock regression that came in with the LED merge in 6.9. The deadlock happens on my system with 24 iwlwifi radios, so maybe it something like all worker threads are busy and some work that needs to complete cannot complete. Link: https://lore.kernel.org/linux-kernel/[email protected]/ Fixes: f5c31bcf604d ("Merge tag 'leds-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds") Signed-off-by: Ben Greear <[email protected]> Link: https://msgid.link/[email protected] [also remove unnecessary "load_module" var and now-wrong comment] Signed-off-by: Johannes Berg <[email protected]>
2024-05-03slimbus: qcom-ngd-ctrl: Add timeout for wait operationViken Dadhaniya1-1/+5
In current driver qcom_slim_ngd_up_worker() indefinitely waiting for ctrl->qmi_up completion object. This is resulting in workqueue lockup on Kthread. Added wait_for_completion_interruptible_timeout to allow the thread to wait for specific timeout period and bail out instead waiting infinitely. Fixes: a899d324863a ("slimbus: qcom-ngd-ctrl: add Sub System Restart support") Cc: [email protected] Reviewed-by: Konrad Dybcio <[email protected]> Signed-off-by: Viken Dadhaniya <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-05-02tcp: Use refcount_inc_not_zero() in tcp_twsk_unique().Kuniyuki Iwashima1-1/+7
Anderson Nascimento reported a use-after-free splat in tcp_twsk_unique() with nice analysis. Since commit ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance"), inet_twsk_hashdance() sets TIME-WAIT socket's sk_refcnt after putting it into ehash and releasing the bucket lock. Thus, there is a small race window where other threads could try to reuse the port during connect() and call sock_hold() in tcp_twsk_unique() for the TIME-WAIT socket with zero refcnt. If that happens, the refcnt taken by tcp_twsk_unique() is overwritten and sock_put() will cause underflow, triggering a real use-after-free somewhere else. To avoid the use-after-free, we need to use refcount_inc_not_zero() in tcp_twsk_unique() and give up on reusing the port if it returns false. [0]: refcount_t: addition on 0; use-after-free. WARNING: CPU: 0 PID: 1039313 at lib/refcount.c:25 refcount_warn_saturate+0xe5/0x110 CPU: 0 PID: 1039313 Comm: trigger Not tainted 6.8.6-200.fc39.x86_64 #1 Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 RIP: 0010:refcount_warn_saturate+0xe5/0x110 Code: 42 8e ff 0f 0b c3 cc cc cc cc 80 3d aa 13 ea 01 00 0f 85 5e ff ff ff 48 c7 c7 f8 8e b7 82 c6 05 96 13 ea 01 01 e8 7b 42 8e ff <0f> 0b c3 cc cc cc cc 48 c7 c7 50 8f b7 82 c6 05 7a 13 ea 01 01 e8 RSP: 0018:ffffc90006b43b60 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff888009bb3ef0 RCX: 0000000000000027 RDX: ffff88807be218c8 RSI: 0000000000000001 RDI: ffff88807be218c0 RBP: 0000000000069d70 R08: 0000000000000000 R09: ffffc90006b439f0 R10: ffffc90006b439e8 R11: 0000000000000003 R12: ffff8880029ede84 R13: 0000000000004e20 R14: ffffffff84356dc0 R15: ffff888009bb3ef0 FS: 00007f62c10926c0(0000) GS:ffff88807be00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020ccb000 CR3: 000000004628c005 CR4: 0000000000f70ef0 PKRU: 55555554 Call Trace: <TASK> ? refcount_warn_saturate+0xe5/0x110 ? __warn+0x81/0x130 ? refcount_warn_saturate+0xe5/0x110 ? report_bug+0x171/0x1a0 ? refcount_warn_saturate+0xe5/0x110 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? refcount_warn_saturate+0xe5/0x110 tcp_twsk_unique+0x186/0x190 __inet_check_established+0x176/0x2d0 __inet_hash_connect+0x74/0x7d0 ? __pfx___inet_check_established+0x10/0x10 tcp_v4_connect+0x278/0x530 __inet_stream_connect+0x10f/0x3d0 inet_stream_connect+0x3a/0x60 __sys_connect+0xa8/0xd0 __x64_sys_connect+0x18/0x20 do_syscall_64+0x83/0x170 entry_SYSCALL_64_after_hwframe+0x78/0x80 RIP: 0033:0x7f62c11a885d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007f62c1091e58 EFLAGS: 00000296 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX: 00007f62c11a885d RDX: 0000000000000010 RSI: 0000000020ccb000 RDI: 0000000000000003 RBP: 00007f62c1091e90 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000296 R12: 00007f62c10926c0 R13: ffffffffffffff88 R14: 0000000000000000 R15: 00007ffe237885b0 </TASK> Fixes: ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance") Reported-by: Anderson Nascimento <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV socketsEric Dumazet3-3/+7
TCP_SYN_RECV state is really special, it is only used by cross-syn connections, mostly used by fuzzers. In the following crash [1], syzbot managed to trigger a divide by zero in tcp_rcv_space_adjust() A socket makes the following state transitions, without ever calling tcp_init_transfer(), meaning tcp_init_buffer_space() is also not called. TCP_CLOSE connect() TCP_SYN_SENT TCP_SYN_RECV shutdown() -> tcp_shutdown(sk, SEND_SHUTDOWN) TCP_FIN_WAIT1 To fix this issue, change tcp_shutdown() to not perform a TCP_SYN_RECV -> TCP_FIN_WAIT1 transition, which makes no sense anyway. When tcp_rcv_state_process() later changes socket state from TCP_SYN_RECV to TCP_ESTABLISH, then look at sk->sk_shutdown to finally enter TCP_FIN_WAIT1 state, and send a FIN packet from a sane socket state. This means tcp_send_fin() can now be called from BH context, and must use GFP_ATOMIC allocations. [1] divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 PID: 5084 Comm: syz-executor358 Not tainted 6.9.0-rc6-syzkaller-00022-g98369dccd2f8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:tcp_rcv_space_adjust+0x2df/0x890 net/ipv4/tcp_input.c:767 Code: e3 04 4c 01 eb 48 8b 44 24 38 0f b6 04 10 84 c0 49 89 d5 0f 85 a5 03 00 00 41 8b 8e c8 09 00 00 89 e8 29 c8 48 0f af c3 31 d2 <48> f7 f1 48 8d 1c 43 49 8d 96 76 08 00 00 48 89 d0 48 c1 e8 03 48 RSP: 0018:ffffc900031ef3f0 EFLAGS: 00010246 RAX: 0c677a10441f8f42 RBX: 000000004fb95e7e RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000027d4b11f R08: ffffffff89e535a4 R09: 1ffffffff25e6ab7 R10: dffffc0000000000 R11: ffffffff8135e920 R12: ffff88802a9f8d30 R13: dffffc0000000000 R14: ffff88802a9f8d00 R15: 1ffff1100553f2da FS: 00005555775c0380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1155bf2304 CR3: 000000002b9f2000 CR4: 0000000000350ef0 Call Trace: <TASK> tcp_recvmsg_locked+0x106d/0x25a0 net/ipv4/tcp.c:2513 tcp_recvmsg+0x25d/0x920 net/ipv4/tcp.c:2578 inet6_recvmsg+0x16a/0x730 net/ipv6/af_inet6.c:680 sock_recvmsg_nosec net/socket.c:1046 [inline] sock_recvmsg+0x109/0x280 net/socket.c:1068 ____sys_recvmsg+0x1db/0x470 net/socket.c:2803 ___sys_recvmsg net/socket.c:2845 [inline] do_recvmmsg+0x474/0xae0 net/socket.c:2939 __sys_recvmmsg net/socket.c:3018 [inline] __do_sys_recvmmsg net/socket.c:3041 [inline] __se_sys_recvmmsg net/socket.c:3034 [inline] __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3034 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7faeb6363db9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffcc1997168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007faeb6363db9 RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000001c R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Neal Cardwell <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02net_sched: sch_sfq: annotate data-races around q->perturb_periodEric Dumazet1-4/+9
sfq_perturbation() reads q->perturb_period locklessly. Add annotations to fix potential issues. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02net: dsa: mv88e6xxx: Correct check for empty listSimon Horman1-2/+2
Since commit a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses") mv88e6xxx_default_mdio_bus() has checked that the return value of list_first_entry() is non-NULL. This appears to be intended to guard against the list chip->mdios being empty. However, it is not the correct check as the implementation of list_first_entry is not designed to return NULL for empty lists. Instead, use list_first_entry_or_null() which does return NULL if the list is empty. Flagged by Smatch. Compile tested only. Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02selftests/net: skip partial checksum packets in csum testWillem de Bruijn1-3/+15
Detect packets with ip_summed CHECKSUM_PARTIAL and skip these. These should not exist, as the test sends individual packets between two hosts. But if (HW) GRO is on, with randomized content sometimes subsequent packets can be coalesced. In this case the GSO packet checksum is converted to a pseudo checksum in anticipation of sending out as TSO/USO. So the field will not match the expected value. Do not count these as test errors. Signed-off-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02selftests: net: py: check process exit code in bkg() and background cmd()Jakub Kicinski1-2/+6
We're a bit too loose with error checking for background processes. cmd() completely ignores the fail argument passed to the constructor if background is True. Default to checking for errors if process is not terminated explicitly. Caller can override with True / False. For bkg() the processing step is called magically by __exit__ so record the value passed in the constructor. Reported-by: Willem de Bruijn <[email protected]> Tested-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02IB/hfi1: allocate dummy net_device dynamicallyBreno Leitao2-3/+8
Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_device from struct hfi1_netdev_rx by converting it into a pointer. Then use the leverage alloc_netdev() to allocate the net_device object at hfi1_alloc_rx(). [1] https://lore.kernel.org/all/[email protected]/ Acked-by: Dennis Dalessandro <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Acked-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02net/mlx5e: flower: check for unsupported control flagsAsbjørn Sloth Tønnesen1-6/+4
Use flow_rule_is_supp_control_flags() to reject filters with unsupported control flags. In case any unsupported control flags are masked, flow_rule_is_supp_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Remove FLOW_DIS_FIRST_FRAG specific error message, and treat it as any other unsupported control flag. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]> Reviewed-by: Jianbo Liu <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Tariq Toukan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-03Merge tag 'drm-misc-fixes-2024-05-02' of ↵Dave Airlie9-45/+80
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: imagination: - fix page-count macro nouveau: - avoid page-table allocation failures - fix firmware memory allocation panel: - ili9341: avoid OF for device properties; respect deferred probe; fix usage of errno codes ttm: - fix status output vmwgfx: - fix legacy display unit - fix read length in fence signalling Signed-off-by: Dave Airlie <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-05-03Merge tag 'drm-xe-fixes-2024-05-02' of ↵Dave Airlie2-1/+5
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes - Fix UAF on rebind worker - Fix ADL-N display integration Signed-off-by: Dave Airlie <[email protected]> From: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/6bontwst3mbxozs6u3ad5n3g5zmaucrngbfwv4hkfhpscnwlym@wlwjgjx6pwue
2024-05-03Merge tag 'amd-drm-fixes-6.9-2024-05-01' of ↵Dave Airlie15-55/+138
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.9-2024-05-01: amdgpu: - Fix VRAM memory accounting - DCN 3.1 fixes - DCN 2.0 fix - DCN 3.1.5 fix - DCN 3.5 fix - DCN 3.2.1 fix - DP fixes - Seamless boot fix - Fix call order in amdgpu_ttm_move() - Fix doorbell regression - Disable panel replay temporarily amdkfd: - Flush wq before creating kfd process Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-05-02btrfs: make sure that WRITTEN is set on all metadata blocksJosef Bacik2-15/+16
We previously would call btrfs_check_leaf() if we had the check integrity code enabled, which meant that we could only run the extended leaf checks if we had WRITTEN set on the header flags. This leaves a gap in our checking, because we could end up with corruption on disk where WRITTEN isn't set on the leaf, and then the extended leaf checks don't get run which we rely on to validate all of the item pointers to make sure we don't access memory outside of the extent buffer. However, since 732fab95abe2 ("btrfs: check-integrity: remove CONFIG_BTRFS_FS_CHECK_INTEGRITY option") we no longer call btrfs_check_leaf() from btrfs_mark_buffer_dirty(), which means we only ever call it on blocks that are being written out, and thus have WRITTEN set, or that are being read in, which should have WRITTEN set. Add checks to make sure we have WRITTEN set appropriately, and then make sure __btrfs_check_leaf() always does the item checking. This will protect us from file systems that have been corrupted and no longer have WRITTEN set on some of the blocks. This was hit on a crafted image tweaking the WRITTEN bit and reported by KASAN as out-of-bound access in the eb accessors. The example is a dir item at the end of an eb. [2.042] BTRFS warning (device loop1): bad eb member start: ptr 0x3fff start 30572544 member offset 16410 size 2 [2.040] general protection fault, probably for non-canonical address 0xe0009d1000000003: 0000 [#1] PREEMPT SMP KASAN NOPTI [2.537] KASAN: maybe wild-memory-access in range [0x0005088000000018-0x000508800000001f] [2.729] CPU: 0 PID: 2587 Comm: mount Not tainted 6.8.2 #1 [2.729] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [2.621] RIP: 0010:btrfs_get_16+0x34b/0x6d0 [2.621] RSP: 0018:ffff88810871fab8 EFLAGS: 00000206 [2.621] RAX: 0000a11000000003 RBX: ffff888104ff8720 RCX: ffff88811b2288c0 [2.621] RDX: dffffc0000000000 RSI: ffffffff81dd8aca RDI: ffff88810871f748 [2.621] RBP: 000000000000401a R08: 0000000000000001 R09: ffffed10210e3ee9 [2.621] R10: ffff88810871f74f R11: 205d323430333737 R12: 000000000000001a [2.621] R13: 000508800000001a R14: 1ffff110210e3f5d R15: ffffffff850011e8 [2.621] FS: 00007f56ea275840(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000 [2.621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [2.621] CR2: 00007febd13b75c0 CR3: 000000010bb50000 CR4: 00000000000006f0 [2.621] Call Trace: [2.621] <TASK> [2.621] ? show_regs+0x74/0x80 [2.621] ? die_addr+0x46/0xc0 [2.621] ? exc_general_protection+0x161/0x2a0 [2.621] ? asm_exc_general_protection+0x26/0x30 [2.621] ? btrfs_get_16+0x33a/0x6d0 [2.621] ? btrfs_get_16+0x34b/0x6d0 [2.621] ? btrfs_get_16+0x33a/0x6d0 [2.621] ? __pfx_btrfs_get_16+0x10/0x10 [2.621] ? __pfx_mutex_unlock+0x10/0x10 [2.621] btrfs_match_dir_item_name+0x101/0x1a0 [2.621] btrfs_lookup_dir_item+0x1f3/0x280 [2.621] ? __pfx_btrfs_lookup_dir_item+0x10/0x10 [2.621] btrfs_get_tree+0xd25/0x1910 Reported-by: lei lu <[email protected]> CC: [email protected] # 6.7+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> [ copy more details from report ] Signed-off-by: David Sterba <[email protected]>
2024-05-02btrfs: qgroup: do not check qgroup inherit if qgroup is disabledQu Wenruo1-0/+2
[BUG] After kernel commit 86211eea8ae1 ("btrfs: qgroup: validate btrfs_qgroup_inherit parameter"), user space tool snapper will fail to create snapshot using its timeline feature. [CAUSE] It turns out that, if using timeline snapper would unconditionally pass btrfs_qgroup_inherit parameter (assigning the new snapshot to qgroup 1/0) for snapshot creation. In that case, since qgroup is disabled there would be no qgroup 1/0, and btrfs_qgroup_check_inherit() would return -ENOENT and fail the whole snapshot creation. [FIX] Just skip the check if qgroup is not enabled. This is to keep the older behavior for user space tools, as if the kernel behavior changed for user space, it is a regression of kernel. Thankfully snapper is also fixing the behavior by detecting if qgroup is running in the first place, so the effect should not be that huge. Link: https://github.com/openSUSE/snapper/issues/894 Fixes: 86211eea8ae1 ("btrfs: qgroup: validate btrfs_qgroup_inherit parameter") CC: [email protected] # 6.8+ Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski306-1532/+2454
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/[email protected]/ No adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02Merge tag 'for-6.9-rc6-tag' of ↵Linus Torvalds4-16/+40
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - set correct ram_bytes when splitting ordered extent. This can be inconsistent on-disk but harmless as it's not used for calculations and it's only advisory for compression - fix lockdep splat when taking cleaner mutex in qgroups disable ioctl - fix missing mutex unlock on error path when looking up sys chunk for relocation * tag 'for-6.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: set correct ram_bytes when splitting ordered extent btrfs: take the cleaner_mutex earlier in qgroup disable btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
2024-05-02Merge tag 's390-6.9-6' of ↵Linus Torvalds9-11/+73
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - The function __storage_key_init_range() expects the end address to be the first byte outside the range to be initialized. Fix the callers that provide the last byte within the range instead. - 3270 Channel Command Word (CCW) may contain zero data address in case there is no data in the request. Add data availability check to avoid erroneous non-zero value as result of virt_to_dma32(NULL) application in cases there is no data - Add missing CFI directives for an unwinder to restore the return address in the vDSO assembler code - NUL-terminate kernel buffer when duplicating user space memory region on Channel IO (CIO) debugfs write inject - Fix wrong format string in zcrypt debug output - Return -EBUSY code when a CCA card is temporarily unavailabile - Restore a loop that retries derivation of a protected key from a secure key in cases the low level reports temporarily unavailability with -EBUSY code * tag 's390-6.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/paes: Reestablish retry loop in paes s390/zcrypt: Use EBUSY to indicate temp unavailability s390/zcrypt: Handle ep11 cprb return code s390/zcrypt: Fix wrong format string in debug feature printout s390/cio: Ensure the copied buf is NUL terminated s390/vdso: Add CFI for RA register to asm macro vdso_func s390/3270: Fix buffer assignment s390/mm: Fix clearing storage keys for huge pages s390/mm: Fix storage key clearing for guest huge pages
2024-05-02Merge tag 'xtensa-20240502' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds6-28/+20
Pull xtensa fixes from Max Filippov: - fix unused variable warning caused by empty flush_dcache_page() definition - fix stack unwinding on windowed noMMU XIP configurations - fix Coccinelle warning 'opportunity for min()' in xtensa ISS platform code * tag 'xtensa-20240502' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: remove redundant flush_dcache_page and ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE macros tty: xtensa/iss: Use min() to fix Coccinelle warning xtensa: fix MAKE_PC_FROM_RA second argument
2024-05-02x86/xen: return a sane initial apic id when running as PV guestJuergen Gross1-1/+10
With recent sanity checks for topology information added, there are now warnings issued for APs when running as a Xen PV guest: [Firmware Bug]: CPU 1: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0001 This is due to the initial APIC ID obtained via CPUID for PV guests is always 0. Avoid the warnings by synthesizing the CPUID data to contain the same initial APIC ID as xen_pv_smp_config() is using for registering the APIC IDs of all CPUs. Fixes: 52128a7a21f7 ("86/cpu/topology: Make the APIC mismatch warnings complete") Signed-off-by: Juergen Gross <[email protected]>
2024-05-02drm/xe/display: Fix ADL-N detectionLucas De Marchi1-1/+2
Contrary to i915, in xe ADL-N is kept as a different platform, not a subplatform of ADL-P. Since the display side doesn't need to differentiate between P and N, i.e. IS_ALDERLAKE_P_N() is never called, just fixup the compat header to check for both P and N. Moving ADL-N to be a subplatform would be more complex as the firmware loading in xe only handles platforms, not subplatforms, as going forward the direction is to check on IP version rather than platforms/subplatforms. Fix warning when initializing display: xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH ------------[ cut here ]------------ xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !((dev_priv)->info.platform == XE_ALDERLAKE_P)) And wrong paths being taken on the display side. Reviewed-by: Matt Roper <[email protected]> Acked-by: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 6a2a90cba12b42eb96c2af3426b77ceb4be31df2) Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: Lucas De Marchi <[email protected]>
2024-05-02Merge tag 'firewire-fixes-6.9-rc6' of ↵Linus Torvalds2-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fixes from Takashi Sakamoto: "Two driver fixes: - The firewire-ohci driver for 1394 OHCI hardware does not fill time stamp for response packet when handling asynchronous transaction to local destination. This brings an inconvenience that the response packet is not equivalent between the transaction to local and remote. It is fixed by fulfilling the time stamp with hardware time. The fix should be applied to Linux kernel v6.5 or later as well. - The nosy driver for Texas Instruments TSB12LV21A (PCILynx) has long-standing issue about the behaviour when user space application passes less size of buffer than expected. It is fixed by returning zero according to the convention of UNIX-like systems" * tag 'firewire-fixes-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: ohci: fulfill timestamp for some local asynchronous transaction firewire: nosy: ensure user_length is taken into account when fetching packet contents
2024-05-02x86/xen/smp_pv: Register the boot CPU APIC properlyThomas Gleixner1-2/+2
The topology core expects the boot APIC to be registered from earhy APIC detection first and then again when the firmware tables are evaluated. This is used for detecting the real BSP CPU on a kexec kernel. The recent conversion of XEN/PV to register fake APIC IDs failed to register the boot CPU APIC correctly as it only registers it once. This causes the BSP detection mechanism to trigger wrongly: CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1 Additionally this results in one CPU being ignored. Register the boot CPU APIC twice so that the XEN/PV fake enumeration behaves like real firmware. Reported-by: Juergen Gross <[email protected]> Fixes: e75307023466 ("x86/xen/smp_pv: Register fake APICs") Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/87a5l8s2fg.ffs@tglx Signed-off-by: Juergen Gross <[email protected]>
2024-05-02Merge tag 'thermal-6.9-rc7' of ↵Linus Torvalds1-14/+45
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "Fix a memory leak and a few locking issues (that may cause the kernel to crash in principle if all goes wrong) in the thermal debug code introduced during the 6.8 development cycle" * tag 'thermal-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal/debugfs: Prevent use-after-free from occurring after cdev removal thermal/debugfs: Fix two locking issues with thermal zone debug thermal/debugfs: Free all thermal zone debug memory on zone removal
2024-05-02Merge tag 'net-6.9-rc7' of ↵Linus Torvalds49-193/+378
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf. Relatively calm week, likely due to public holiday in most places. No known outstanding regressions. Current release - regressions: - rxrpc: fix wrong alignmask in __page_frag_alloc_align() - eth: e1000e: change usleep_range to udelay in PHY mdic access Previous releases - regressions: - gro: fix udp bad offset in socket lookup - bpf: fix incorrect runtime stat for arm64 - tipc: fix UAF in error path - netfs: fix a potential infinite loop in extract_user_to_sg() - eth: ice: ensure the copied buf is NUL terminated - eth: qeth: fix kernel panic after setting hsuid Previous releases - always broken: - bpf: - verifier: prevent userspace memory access - xdp: use flags field to disambiguate broadcast redirect - bridge: fix multicast-to-unicast with fraglist GSO - mptcp: ensure snd_nxt is properly initialized on connect - nsh: fix outer header access in nsh_gso_segment(). - eth: bcmgenet: fix racing registers access - eth: vxlan: fix stats counters. Misc: - a bunch of MAINTAINERS file updates" * tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) MAINTAINERS: mark MYRICOM MYRI-10G as Orphan MAINTAINERS: remove Ariel Elior net: gro: add flush check in udp_gro_receive_segment net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb ipv4: Fix uninit-value access in __ip_make_skb() s390/qeth: Fix kernel panic after setting hsuid vxlan: Pull inner IP header in vxlan_rcv(). tipc: fix a possible memleak in tipc_buf_append tipc: fix UAF in error path rxrpc: Clients must accept conn from any address net: core: reject skb_copy(_expand) for fraglist GSO skbs net: bridge: fix multicast-to-unicast with fraglist GSO mptcp: ensure snd_nxt is properly initialized on connect e1000e: change usleep_range to udelay in PHY mdic access net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341 cxgb4: Properly lock TX queue for the selftest. rxrpc: Fix using alignmask being zero for __page_frag_alloc_align() vxlan: Add missing VNI filter counter update in arp_reduce(). vxlan: Fix racy device stats updates. net: qede: use return from qede_parse_actions() ...
2024-05-02Merge branch 'bnxt_en-updates-for-net-next'Jakub Kicinski6-61/+107
Michael Chan says: ==================== bnxt_en: Updates for net-next The first patch converts the sw_stats field in the completion ring structure to a pointer. This allows the group of completion rings using the same MSIX to share the same sw_stats structure. Prior to this, the correct completion ring must be used to count packets. The next four patches remove the RTNL lock when calling the RoCE driver for asynchronous stop and start during error recovery and firmware reset. The RTNL ilock is replaced with a private mutex used to synchronize RoCE register, unregister, stop, and start. The last patch adds VF PCI IDs for the 5760X chips. v2: Dropped patch #1 from v1. Will work with David to get that patch in separately. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02bnxt_en: Add VF PCI ID for 5760X (P7) chipsAjit Khaparde2-1/+4
No driver logic changes are required to support the VFs, so just add the VF PCI ID. Reviewed-by: Selvin Thyparampil Xavier <[email protected]> Signed-off-by: Ajit Khaparde <[email protected]> Signed-off-by: Michael Chan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02bnxt_en: Optimize recovery path ULP locking in the driverKalesh AP3-26/+44
In the error recovery path (AER, firmware recovery, etc), the driver notifies the RoCE driver via ULP_STOP before the reset and via ULP_START after the reset, all under RTNL_LOCK. The RoCE driver can take a long time if there are a lot of QPs to destroy, so it is not ideal to hold the global RTNL lock. Rely on the new en_dev_lock mutex instead for ULP_STOP and ULP_START. For the most part, we move the ULP_STOP call before we take the RTNL lock and move the ULP_START after RTNL unlock. Note that SRIOV re-enablement must be done after ULP_START or RoCE on the VFs will not resume properly after reset. The one scenario in bnxt_hwrm_if_change() where the RTNL lock is already taken in the .ndo_open() context requires the ULP restart to be deferred to the bnxt_sp_task() workqueue. Reviewed-by: Selvin Thyparampil Xavier <[email protected]> Reviewed-by: Vikas Gupta <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Michael Chan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02bnxt_en: Add a mutex to synchronize ULP operationsKalesh AP2-1/+22
The current scheme relies heavily on the RTNL lock for all ULP operations between the L2 and the RoCE driver. Add a new en_dev_lock mutex so that the asynchronous ULP_STOP and ULP_START operations can be serialized with bnxt_register_dev() and bnxt_unregister_dev() calls without relying on the RTNL lock. The next patch will remove the RTNL lock from the ULP_STOP and ULP_START calls. Reviewed-by: Selvin Thyparampil Xavier <[email protected]> Reviewed-by: Vikas Gupta <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Michael Chan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02bnxt_en: Don't call ULP_STOP/ULP_START during L2 resetMichael Chan1-11/+2
There is no need to call ULP_STOP and ULP_START before and after the L2 reset in bnxt_reset_task(). This L2 reset is done after detecting TX timeout, RX ring errors, or VF config changes. The L2 reset does not affect RoCE since the firmware is not reset and the backing store is left alone. Reviewed-by: Andy Gospodarek <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02bnxt_en: Don't support offline self test when RoCE driver is loadedKalesh AP1-3/+8
Offline self test is a very disruptive operation for RoCE and requires all active QPs to be destroyed. With a large number of QPs, it can take a long time to destroy all the QPs and can timeout. Do not allow ethtool offline self test if the RoCE driver is registered on the device. Reviewed-by: Selvin Thyparampil Xavier <[email protected]> Reviewed-by: Vikas Gupta <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Michael Chan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02bnxt_en: share NQ ring sw_stats memory with subringsEdwin Peer3-19/+27
On P5_PLUS chips and later, the NQ rings have subrings for RX and TX completions respectively. These subrings are passed to the poll function instead of the base NQ, but each ring carries its own copy of the software ring statistics. For stats to be conveniently accessible in __bnxt_poll_work(), the statistics memory should either be shared between the NQ and its subrings or the subrings need to be included in the ethtool stats aggregation logic. This patch opts for the former, because it's more efficient and less confusing having the software statistics for a ring exist in a single place. Before this patch, the counter will not be displayed if the "wrong" cpr->sw_stats was used to increment a counter. Link: https://lore.kernel.org/netdev/CACKFLikEhVAJA+osD7UjQNotdGte+fth7zOy7yDdLkTyFk9Pyw@mail.gmail.com/ Signed-off-by: Edwin Peer <[email protected]> Signed-off-by: Michael Chan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02Merge branch '40GbE' of ↵Jakub Kicinski10-152/+211
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== i40e: cleanups & refactors Ivan Vecera says: This series do following: Patch 1 - Removes write-only flags field from i40e_veb structure and from i40e_veb_setup() parameters Patch 2 - Refactors parameter of i40e_notify_client_of_l2_param_changes() and i40e_notify_client_of_netdev_close() Patch 3 - Refactors parameter of i40e_detect_recover_hung() Patch 4 - Adds helper i40e_pf_get_main_vsi() to get main VSI and uses it in existing code Patch 5 - Consolidates checks whether given VSI is the main one Patch 6 - Adds helper i40e_pf_get_main_veb() to get main VEB and uses it in existing code Patch 7 - Adds helper i40e_vsi_reconfig_tc() to reconfigure TC for particular and uses it to replace existing open-coded pieces * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: Add and use helper to reconfigure TC for given VSI i40e: Add helper to access main VEB i40e: Consolidate checks whether given VSI is main i40e: Add helper to access main VSI i40e: Refactor argument of i40e_detect_recover_hung() i40e: Refactor argument of several client notification functions i40e: Remove flags field from i40e_veb ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02net/sched: unregister lockdep keys in qdisc_create/qdisc_alloc error pathDavide Caratti2-0/+2
Naresh and Eric report several errors (corrupted elements in the dynamic key hash list), when running tdc.py or syzbot. The error path of qdisc_alloc() and qdisc_create() frees the qdisc memory, but it forgets to unregister the lockdep key, thus causing use-after-free like the following one: ================================================================== BUG: KASAN: slab-use-after-free in lockdep_register_key+0x5f2/0x700 Read of size 8 at addr ffff88811236f2a8 by task ip/7925 CPU: 26 PID: 7925 Comm: ip Kdump: loaded Not tainted 6.9.0-rc2+ #648 Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013 Call Trace: <TASK> dump_stack_lvl+0x7c/0xc0 print_report+0xc9/0x610 kasan_report+0x89/0xc0 lockdep_register_key+0x5f2/0x700 qdisc_alloc+0x21d/0xb60 qdisc_create_dflt+0x63/0x3c0 attach_one_default_qdisc.constprop.37+0x8e/0x170 dev_activate+0x4bd/0xc30 __dev_open+0x275/0x380 __dev_change_flags+0x3f1/0x570 dev_change_flags+0x7c/0x160 do_setlink+0x1ea1/0x34b0 __rtnl_newlink+0x8c9/0x1510 rtnl_newlink+0x61/0x90 rtnetlink_rcv_msg+0x2f0/0xbc0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x420/0x630 netlink_sendmsg+0x732/0xbc0 __sock_sendmsg+0x1ea/0x280 ____sys_sendmsg+0x5a9/0x990 ___sys_sendmsg+0xf1/0x180 __sys_sendmsg+0xd3/0x180 do_syscall_64+0x96/0x180 entry_SYSCALL_64_after_hwframe+0x71/0x79 RIP: 0033:0x7f9503f4fa07 Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007fff6c729068 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000006630c681 RCX: 00007f9503f4fa07 RDX: 0000000000000000 RSI: 00007fff6c7290d0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000078 R10: 000000000000009b R11: 0000000000000246 R12: 0000000000000001 R13: 00007fff6c729180 R14: 0000000000000000 R15: 000055bf67dd9040 </TASK> Allocated by task 7745: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 __kasan_kmalloc+0x7b/0x90 __kmalloc_node+0x1ff/0x460 qdisc_alloc+0xae/0xb60 qdisc_create+0xdd/0xfb0 tc_modify_qdisc+0x37e/0x1960 rtnetlink_rcv_msg+0x2f0/0xbc0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x420/0x630 netlink_sendmsg+0x732/0xbc0 __sock_sendmsg+0x1ea/0x280 ____sys_sendmsg+0x5a9/0x990 ___sys_sendmsg+0xf1/0x180 __sys_sendmsg+0xd3/0x180 do_syscall_64+0x96/0x180 entry_SYSCALL_64_after_hwframe+0x71/0x79 Freed by task 7745: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x36/0x60 __kasan_slab_free+0xfe/0x180 kfree+0x113/0x380 qdisc_create+0xafb/0xfb0 tc_modify_qdisc+0x37e/0x1960 rtnetlink_rcv_msg+0x2f0/0xbc0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x420/0x630 netlink_sendmsg+0x732/0xbc0 __sock_sendmsg+0x1ea/0x280 ____sys_sendmsg+0x5a9/0x990 ___sys_sendmsg+0xf1/0x180 __sys_sendmsg+0xd3/0x180 do_syscall_64+0x96/0x180 entry_SYSCALL_64_after_hwframe+0x71/0x79 Fix this ensuring that lockdep_unregister_key() is called before the qdisc struct is freed, also in the error path of qdisc_create() and qdisc_alloc(). Fixes: af0cb3fa3f9e ("net/sched: fix false lockdep warning on qdisc root lock") Reported-by: Linux Kernel Functional Testing <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Davide Caratti <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Tested-by: Ido Schimmel <[email protected]> Link: https://lore.kernel.org/r/2aa1ca0c0a3aa0acc15925c666c777a4b5de553c.1714496886.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02Merge commit '50abcc179e0c9ca667feb223b26ea406d5c4c556' of ↵Jens Axboe10-38/+65
git://git.infradead.org/nvme into block-6.9 Pull NVMe fixes from Keith. * git://git.infradead.org/nvme: nvme-tcp: strict pdu pacing to avoid send stalls on TLS nvmet: fix nvme status code when namespace is disabled nvmet-tcp: fix possible memory leak when tearing down a controller nvme: cancel pending I/O if nvme controller is in terminal state nvmet-auth: replace pr_debug() with pr_err() to report an error. nvmet-auth: return the error code to the nvmet_auth_host_hash() callers nvme: find numa distance only if controller has valid numa id nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
2024-05-02swiotlb: initialise restricted pool list_head when SWIOTLB_DYNAMIC=yWill Deacon1-0/+1
Using restricted DMA pools (CONFIG_DMA_RESTRICTED_POOL=y) in conjunction with dynamic SWIOTLB (CONFIG_SWIOTLB_DYNAMIC=y) leads to the following crash when initialising the restricted pools at boot-time: | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 | Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP | pc : rmem_swiotlb_device_init+0xfc/0x1ec | lr : rmem_swiotlb_device_init+0xf0/0x1ec | Call trace: | rmem_swiotlb_device_init+0xfc/0x1ec | of_reserved_mem_device_init_by_idx+0x18c/0x238 | of_dma_configure_id+0x31c/0x33c | platform_dma_configure+0x34/0x80 faddr2line reveals that the crash is in the list validation code: include/linux/list.h:83 include/linux/rculist.h:79 include/linux/rculist.h:106 kernel/dma/swiotlb.c:306 kernel/dma/swiotlb.c:1695 because add_mem_pool() is trying to list_add_rcu() to a NULL 'mem->pools'. Fix the crash by initialising the 'mem->pools' list_head in rmem_swiotlb_device_init() before calling add_mem_pool(). Reported-by: Nikita Ioffe <[email protected]> Tested-by: Nikita Ioffe <[email protected]> Fixes: 1aaa736815eb ("swiotlb: allocate a new memory pool when existing pools are full") Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>