aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-07-31net: wan: fsl_qmc_hdlc: Convert carrier_lock spinlock to a mutexHerve Codina1-3/+4
The carrier_lock spinlock protects the carrier detection. While it is held, framer_get_status() is called which in turn takes a mutex. This is not correct and can lead to a deadlock. A run with PROVE_LOCKING enabled detected the issue: [ BUG: Invalid wait context ] ... c204ddbc (&framer->mutex){+.+.}-{3:3}, at: framer_get_status+0x40/0x78 other info that might help us debug this: context-{4:4} 2 locks held by ifconfig/146: #0: c0926a38 (rtnl_mutex){+.+.}-{3:3}, at: devinet_ioctl+0x12c/0x664 #1: c2006a40 (&qmc_hdlc->carrier_lock){....}-{2:2}, at: qmc_hdlc_framer_set_carrier+0x30/0x98 Avoid the spinlock usage and convert carrier_lock to a mutex. Fixes: 54762918ca85 ("net: wan: fsl_qmc_hdlc: Add framer support") Cc: [email protected] Signed-off-by: Herve Codina <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31Merge branch 'mlx5-misc-fixes-2024-07-30'Jakub Kicinski9-11/+26
Tariq Toukan says: ==================== mlx5 misc fixes 2024-07-30 This patchset provides misc bug fixes from the team to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31net/mlx5e: Add a check for the return value from mlx5_port_set_eth_ptysShahar Shitrit1-1/+6
Since the documentation for mlx5_toggle_port_link states that it should only be used after setting the port register, we add a check for the return value from mlx5_port_set_eth_ptys to ensure the register was successfully set before calling it. Fixes: 667daedaecd1 ("net/mlx5e: Toggle link only after modifying port parameters") Signed-off-by: Shahar Shitrit <[email protected]> Reviewed-by: Carolina Jubran <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31net/mlx5e: Fix CT entry update leaks of modify header contextChris Mi1-0/+1
The cited commit allocates a new modify header to replace the old one when updating CT entry. But if failed to allocate a new one, eg. exceed the max number firmware can support, modify header will be an error pointer that will trigger a panic when deallocating it. And the old modify header point is copied to old attr. When the old attr is freed, the old modify header is lost. Fix it by restoring the old attr to attr when failed to allocate a new modify header context. So when the CT entry is freed, the right modify header context will be freed. And the panic of accessing error pointer is also fixed. Fixes: 94ceffb48eac ("net/mlx5e: Implement CT entry update") Signed-off-by: Chris Mi <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capabilityRahul Rameshbabu1-3/+4
Require mlx5 classifier action support when creating IPSec chains in offload path. MLX5_IPSEC_CAP_PRIO should only be set if CONFIG_MLX5_CLS_ACT is enabled. If CONFIG_MLX5_CLS_ACT=n and MLX5_IPSEC_CAP_PRIO is set, configuring IPsec offload will fail due to the mlxx5 ipsec chain rules failing to be created due to lack of classifier action support. Fixes: fa5aa2f89073 ("net/mlx5e: Use chains for IPsec policy priority offload") Signed-off-by: Rahul Rameshbabu <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31net/mlx5: Fix missing lock on sync reset reloadMoshe Shemesh1-1/+4
On sync reset reload work, when remote host updates devlink on reload actions performed on that host, it misses taking devlink lock before calling devlink_remote_reload_actions_performed() which results in triggering lock assert like the following: WARNING: CPU: 4 PID: 1164 at net/devlink/core.c:261 devl_assert_locked+0x3e/0x50 … CPU: 4 PID: 1164 Comm: kworker/u96:6 Tainted: G S W 6.10.0-rc2+ #116 Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0 12/18/2015 Workqueue: mlx5_fw_reset_events mlx5_sync_reset_reload_work [mlx5_core] RIP: 0010:devl_assert_locked+0x3e/0x50 … Call Trace: <TASK> ? __warn+0xa4/0x210 ? devl_assert_locked+0x3e/0x50 ? report_bug+0x160/0x280 ? handle_bug+0x3f/0x80 ? exc_invalid_op+0x17/0x40 ? asm_exc_invalid_op+0x1a/0x20 ? devl_assert_locked+0x3e/0x50 devlink_notify+0x88/0x2b0 ? mlx5_attach_device+0x20c/0x230 [mlx5_core] ? __pfx_devlink_notify+0x10/0x10 ? process_one_work+0x4b6/0xbb0 process_one_work+0x4b6/0xbb0 […] Fixes: 84a433a40d0e ("net/mlx5: Lock mlx5 devlink reload callbacks") Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31net/mlx5: Lag, don't use the hardcoded value of the first portMark Bloch1-1/+1
The cited commit didn't change the body of the loop as it should. It shouldn't be using MLX5_LAG_P1. Fixes: 7e978e7714d6 ("net/mlx5: Lag, use actual number of lag ports") Signed-off-by: Mark Bloch <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31net/mlx5: DR, Fix 'stack guard page was hit' error in dr_ruleYevgeny Kliteynik1-1/+1
This patch reduces the size of hw_ste_arr_optimized array that is allocated on stack from 640 bytes (5 match STEs + 5 action STES) to 448 bytes (2 match STEs + 5 action STES). This fixes the 'stack guard page was hit' issue, while still fitting majority of the usecases (up to 2 match STEs). Signed-off-by: Yevgeny Kliteynik <[email protected]> Reviewed-by: Alex Vesker <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31net/mlx5: Fix error handling in irq_pool_request_irqShay Drory1-3/+7
In case mlx5_irq_alloc fails, the previously allocated index remains in the XArray, which could lead to inconsistencies. Fix it by adding error handling that erases the allocated index from the XArray if mlx5_irq_alloc returns an error. Fixes: c36326d38d93 ("net/mlx5: Round-Robin EQs over IRQs") Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Maher Sanalla <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31net/mlx5: Always drain health in shutdown callbackShay Drory2-1/+2
There is no point in recovery during device shutdown. if health work started need to wait for it to avoid races and NULL pointer access. Hence, drain health WQ on shutdown callback. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Fixes: d2aa060d40fa ("net/mlx5: Cancel health poll before sending panic teardown command") Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31net: Add skbuff.h to MAINTAINERSBreno Leitao1-0/+1
The network maintainers need to be copied if the skbuff.h is touched. This also helps git-send-email to figure out the proper maintainers when touching the file. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31r8169: don't increment tx_dropped in case of NETDEV_TX_BUSYHeiner Kallweit1-6/+2
The skb isn't consumed in case of NETDEV_TX_BUSY, therefore don't increment the tx_dropped counter. Fixes: 188f4af04618 ("r8169: use NETDEV_TX_{BUSY/OK}") Cc: [email protected] Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-31Merge tag 'for-netdev' of ↵Jakub Kicinski2-2/+2
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-31 We've added 2 non-merge commits during the last 2 day(s) which contain a total of 2 files changed, 2 insertions(+), 2 deletions(-). The main changes are: 1) Fix BPF selftest build after tree sync with regards to a _GNU_SOURCE macro redefined compilation error, from Stanislav Fomichev. 2) Fix a wrong test in the ASSERT_OK() check in uprobe_syscall BPF selftest, from Jiri Olsa. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf/selftests: Fix ASSERT_OK condition check in uprobe_syscall test selftests/bpf: Filter out _GNU_SOURCE when compiling test_cpp ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-30Merge branch '100GbE' of ↵Jakub Kicinski6-90/+135
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== ice: fix AF_XDP ZC timeout and concurrency issues Maciej Fijalkowski says: Changes included in this patchset address an issue that customer has been facing when AF_XDP ZC Tx sockets were used in combination with flow control and regular Tx traffic. After executing: ethtool --set-priv-flags $dev link-down-on-close on ethtool -A $dev rx on tx on launching multiple ZC Tx sockets on $dev + pinging remote interface (so that regular Tx traffic is present) and then going through down/up of $dev, Tx timeout occurred and then most of the time ice driver was unable to recover from that state. These patches combined together solve the described above issue on customer side. Main focus here is to forbid producing Tx descriptors when either carrier is not yet initialized or process of bringing interface down has already started. v1: https://lore.kernel.org/netdev/[email protected]/ * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: xsk: fix txq interrupt mapping ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog ice: improve updating ice_{t,r}x_ring::xsk_pool ice: toggle netif_carrier when setting up XSK pool ice: modify error handling when setting XSK pool in ndo_bpf ice: replace synchronize_rcu with synchronize_net ice: don't busy wait for Rx queue disable in ice_qp_dis() ice: respect netif readiness in AF_XDP ZC related ndo's ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-30net: drop bad gso csum_start and offset in virtio_net_hdrWillem de Bruijn3-11/+12
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb for GSO packets. The function already checks that a checksum requested with VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets this might not hold for segs after segmentation. Syzkaller demonstrated to reach this warning in skb_checksum_help offset = skb_checksum_start_offset(skb); ret = -EINVAL; if (WARN_ON_ONCE(offset >= skb_headlen(skb))) By injecting a TSO packet: WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0 ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774 ip_finish_output_gso net/ipv4/ip_output.c:279 [inline] __ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301 iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4850 [inline] netdev_start_xmit include/linux/netdevice.h:4864 [inline] xmit_one net/core/dev.c:3595 [inline] dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611 __dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261 packet_snd net/packet/af_packet.c:3073 [inline] The geometry of the bad input packet at tcp_gso_segment: [ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0 [ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244 [ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0)) [ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536 ip_summed=3 complete_sw=0 valid=0 level=0) Mitigate with stricter input validation. csum_offset: for GSO packets, deduce the correct value from gso_type. This is already done for USO. Extend it to TSO. Let UFO be: udp[46]_ufo_fragment ignores these fields and always computes the checksum in software. csum_start: finding the real offset requires parsing to the transport header. Do not add a parser, use existing segmentation parsing. Thanks to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded. Again test both TSO and USO. Do not test UFO for the above reason, and do not test UDP tunnel offload. GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit from devices with no checksum offload"), but then still these fields are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no need to test for ip_summed == CHECKSUM_PARTIAL first. This revises an existing fix mentioned in the Fixes tag, which broke small packets with GSO offload, as detected by kselftests. Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871 Link: https://lore.kernel.org/netdev/[email protected] Fixes: e269d79c7d35 ("net: missing check virtio") Cc: [email protected] Signed-off-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-30net: phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and aqr115cBartosz Golaszewski1-8/+21
Commit 708405f3e56e ("net: phy: aquantia: wait for the GLOBAL_CFG to start returning real values") introduced a workaround for an issue observed on aqr115c. However there were never any reports of it happening on other models and the workaround has been reported to cause and issue on aqr113c (and it may cause the same on any other model not supporting 10M mode). Let's limit the impact of the workaround to aqr113, aqr113c and aqr115c and poll the 100M GLOBAL_CFG register instead as both models are known to support it correctly. Reported-by: Jon Hunter <[email protected]> Closes: https://lore.kernel.org/lkml/[email protected]/ Fixes: 708405f3e56e ("net: phy: aquantia: wait for the GLOBAL_CFG to start returning real values") Tested-by: Jon Hunter <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]> Reviewed-by: Antoine Tenart <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-30net: phy: micrel: Fix the KSZ9131 MDI-X status issueRaju Lakkaraju1-15/+19
The MDIX status is not accurately reflecting the current state after the link partner has manually altered its MDIX configuration while operating in forced mode. Access information about Auto mdix completion and pair selection from the KSZ9131's Auto/MDI/MDI-X status register Fixes: b64e6a8794d9 ("net: phy: micrel: Add PHY Auto/MDI/MDI-X set driver for KSZ9131") Signed-off-by: Raju Lakkaraju <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-30bpf/selftests: Fix ASSERT_OK condition check in uprobe_syscall testJiri Olsa1-1/+1
Fixing ASSERT_OK condition check in uprobe_syscall test, otherwise we return from test on pipe success. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-07-30net: mvpp2: Don't re-use loop iteratorDan Carpenter1-3/+3
This function has a nested loop. The problem is that both the inside and outside loop use the same variable as an iterator. I found this via static analysis so I'm not sure the impact. It could be that it loops forever or, more likely, the loop exits early. Fixes: 3a616b92a9d1 ("net: mvpp2: Add TX flow control support for jumbo frames") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-30net/iucv: fix use after free in iucv_sock_close()Alexandra Winter1-2/+2
iucv_sever_path() is called from process context and from bh context. iucv->path is used as indicator whether somebody else is taking care of severing the path (or it is already removed / never existed). This needs to be done with atomic compare and swap, otherwise there is a small window where iucv_sock_close() will try to work with a path that has already been severed and freed by iucv_callback_connrej() called by iucv_tasklet_fn(). Example: [452744.123844] Call Trace: [452744.123845] ([<0000001e87f03880>] 0x1e87f03880) [452744.123966] [<00000000d593001e>] iucv_path_sever+0x96/0x138 [452744.124330] [<000003ff801ddbca>] iucv_sever_path+0xc2/0xd0 [af_iucv] [452744.124336] [<000003ff801e01b6>] iucv_sock_close+0xa6/0x310 [af_iucv] [452744.124341] [<000003ff801e08cc>] iucv_sock_release+0x3c/0xd0 [af_iucv] [452744.124345] [<00000000d574794e>] __sock_release+0x5e/0xe8 [452744.124815] [<00000000d5747a0c>] sock_close+0x34/0x48 [452744.124820] [<00000000d5421642>] __fput+0xba/0x268 [452744.124826] [<00000000d51b382c>] task_work_run+0xbc/0xf0 [452744.124832] [<00000000d5145710>] do_notify_resume+0x88/0x90 [452744.124841] [<00000000d5978096>] system_call+0xe2/0x2c8 [452744.125319] Last Breaking-Event-Address: [452744.125321] [<00000000d5930018>] iucv_path_sever+0x90/0x138 [452744.125324] [452744.125325] Kernel panic - not syncing: Fatal exception in interrupt Note that bh_lock_sock() is not serializing the tasklet context against process context, because the check for sock_owned_by_user() and corresponding handling is missing. Ideas for a future clean-up patch: A) Correct usage of bh_lock_sock() in tasklet context, as described in Link: https://lore.kernel.org/netdev/1280155406.2899.407.camel@edumazet-laptop/ Re-enqueue, if needed. This may require adding return values to the tasklet functions and thus changes to all users of iucv. B) Change iucv tasklet into worker and use only lock_sock() in af_iucv. Fixes: 7d316b945352 ("af_iucv: remove IUCV-pathes completely") Reviewed-by: Halil Pasic <[email protected]> Signed-off-by: Alexandra Winter <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-07-30net/smc: prevent UAF in inet_create()D. Wythe1-3/+4
Following syzbot repro crashes the kernel: socketpair(0x2, 0x1, 0x100, &(0x7f0000000140)) (fail_nth: 13) Fix this by not calling sk_common_release() from smc_create_clcsk(). Stack trace: socket: no more sockets ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 5092 at lib/refcount.c:28 refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28 Modules linked in: CPU: 1 PID: 5092 Comm: syz-executor424 Not tainted 6.10.0-syzkaller-04483-g0be9ae5486cd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 RIP: 0010:refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28 Code: 80 f3 1f 8c e8 e7 69 a8 fc 90 0f 0b 90 90 eb 99 e8 cb 4f e6 fc c6 05 8a 8d e8 0a 01 90 48 c7 c7 e0 f3 1f 8c e8 c7 69 a8 fc 90 <0f> 0b 90 90 e9 76 ff ff ff e8 a8 4f e6 fc c6 05 64 8d e8 0a 01 90 RSP: 0018:ffffc900034cfcf0 EFLAGS: 00010246 RAX: 3b9fcde1c862f700 RBX: ffff888022918b80 RCX: ffff88807b39bc00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000003 R08: ffffffff815878a2 R09: fffffbfff1c39d94 R10: dffffc0000000000 R11: fffffbfff1c39d94 R12: 00000000ffffffe9 R13: 1ffff11004523165 R14: ffff888022918b28 R15: ffff888022918b00 FS: 00005555870e7380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000140 CR3: 000000007582e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> inet_create+0xbaf/0xe70 __sock_create+0x490/0x920 net/socket.c:1571 sock_create net/socket.c:1622 [inline] __sys_socketpair+0x2ca/0x720 net/socket.c:1769 __do_sys_socketpair net/socket.c:1822 [inline] __se_sys_socketpair net/socket.c:1819 [inline] __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fbcb9259669 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fffe931c6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000035 RAX: ffffffffffffffda RBX: 00007fffe931c6f0 RCX: 00007fbcb9259669 RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000002 RBP: 0000000000000002 R08: 00007fffe931c476 R09: 00000000000000a0 R10: 0000000020000140 R11: 0000000000000246 R12: 00007fffe931c6ec R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001 </TASK> Link: https://lore.kernel.org/r/[email protected]/ Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") Reported-by: syzbot <[email protected]> Signed-off-by: D. Wythe <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-07-30Merge branch 'mptcp-fix-inconsistent-backup-usage'Paolo Abeni11-21/+132
Matthieu Baerts says: ==================== mptcp: fix inconsistent backup usage In all the MPTCP backup related tests, the backup flag was set on one side, and the expected behaviour is to have both sides respecting this decision. That's also the "natural" way, and what the users seem to expect. On the scheduler side, only the 'backup' field was checked, which is supposed to be set only if the other peer flagged a subflow as backup. But in various places, this flag was also set when the local host flagged the subflow as backup, certainly to have the expected behaviour mentioned above. Patch 1 modifies the packet scheduler to check if the backup flag has been set on both directions, not to change its behaviour after having applied the following patches. That's what the default packet scheduler should have done since the beginning in v5.7. Patch 2 fixes the backup flag being mirrored on the MPJ+SYN+ACK by accident since its introduction in v5.7. Instead, the received and sent backup flags are properly distinguished in requests. Patch 3 stops setting the received backup flag as well when sending an MP_PRIO, something that was done since the MP_PRIO support in v5.12. Patch 4 adds related and missing MIB counters to be able to easily check if MP_JOIN are sent with a backup flag. Certainly because these counters were not there, the behaviour that is fixed by patches here was not properly verified. Patch 5 validates the previous patch by extending the MPTCP Join selftest. Patch 6 fixes the backup support in signal endpoints: if a signal endpoint had the backup flag, it was not set in the MPJ+SYN+ACK as expected. It was only set for ongoing connections, but not future ones as expected, since the introduction of the backup flag in endpoints in v5.10. Patch 7 validates the previous patch by extending the MPTCP Join selftest as well. Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> --- Matthieu Baerts (NGI0) (7): mptcp: sched: check both directions for backup mptcp: distinguish rcv vs sent backup flag in requests mptcp: pm: only set request_bkup flag when sending MP_PRIO mptcp: mib: count MPJ with backup flag selftests: mptcp: join: validate backup in MPJ mptcp: pm: fix backup support in signal endpoints selftests: mptcp: join: check backup support in signal endp include/trace/events/mptcp.h | 2 +- net/mptcp/mib.c | 2 + net/mptcp/mib.h | 2 + net/mptcp/options.c | 2 +- net/mptcp/pm.c | 12 +++++ net/mptcp/pm_netlink.c | 19 ++++++- net/mptcp/pm_userspace.c | 18 +++++++ net/mptcp/protocol.c | 10 ++-- net/mptcp/protocol.h | 4 ++ net/mptcp/subflow.c | 10 ++++ tools/testing/selftests/net/mptcp/mptcp_join.sh | 72 ++++++++++++++++++++----- 11 files changed, 132 insertions(+), 21 deletions(-) ==================== Link: https://patch.msgid.link/20240727-upstream-net-20240727-mptcp-backup-signal-v1-0-f50b31604cf1@kernel.org Signed-off-by: Paolo Abeni <[email protected]>
2024-07-30selftests: mptcp: join: check backup support in signal endpMatthieu Baerts (NGI0)1-6/+28
Before the previous commit, 'signal' endpoints with the 'backup' flag were ignored when sending the MP_JOIN. The MPTCP Join selftest has then been modified to validate this case: the "single address, backup" test, is now validating the MP_JOIN with a backup flag as it is what we expect it to do with such name. The previous version has been kept, but renamed to "single address, switch to backup" to avoid confusions. The "single address with port, backup" test is also now validating the MPJ with a backup flag, which makes more sense than checking the switch to backup with an MP_PRIO. The "mpc backup both sides" test is now validating that the backup flag is also set in MP_JOIN from and to the addresses used in the initial subflow, using the special ID 0. The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: [email protected] Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-30mptcp: pm: fix backup support in signal endpointsMatthieu Baerts (NGI0)5-0/+54
There was a support for signal endpoints, but only when the endpoint's flag was changed during a connection. If an endpoint with the signal and backup was already present, the MP_JOIN reply was not containing the backup flag as expected. That's confusing to have this inconsistent behaviour. On the other hand, the infrastructure to set the backup flag in the SYN + ACK + MP_JOIN was already there, it was just never set before. Now when requesting the local ID from the path-manager, the backup status is also requested. Note that when the userspace PM is used, the backup flag can be set if the local address was already used before with a backup flag, e.g. if the address was announced with the 'backup' flag, or a subflow was created with the 'backup' flag. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: [email protected] Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/507 Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-30selftests: mptcp: join: validate backup in MPJMatthieu Baerts (NGI0)1-10/+32
A peer can notify the other one that a subflow has to be treated as "backup" by two different ways: either by sending a dedicated MP_PRIO notification, or by setting the backup flag in the MP_JOIN handshake. The selftests were previously monitoring the former, but not the latter. This is what is now done here by looking at these new MIB counters when validating the 'backup' cases: MPTcpExtMPJoinSynBackupRx MPTcpExtMPJoinSynAckBackupRx The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it will help to validate a new fix for an issue introduced by this commit ID. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: [email protected] Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-30mptcp: mib: count MPJ with backup flagMatthieu Baerts (NGI0)3-0/+10
Without such counters, it is difficult to easily debug issues with MPJ not having the backup flags on production servers. This is not strictly a fix, but it eases to validate the following patches without requiring to take packet traces, to query ongoing connections with Netlink with admin permissions, or to guess by looking at the behaviour of the packet scheduler. Also, the modification is self contained, isolated, well controlled, and the increments are done just after others, there from the beginning. It looks then safe, and helpful to backport this. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: [email protected] Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-30mptcp: pm: only set request_bkup flag when sending MP_PRIOMatthieu Baerts (NGI0)1-1/+0
The 'backup' flag from mptcp_subflow_context structure is supposed to be set only when the other peer flagged a subflow as backup, not the opposite. Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support") Cc: [email protected] Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-30mptcp: distinguish rcv vs sent backup flag in requestsMatthieu Baerts (NGI0)3-1/+3
When sending an MP_JOIN + SYN + ACK, it is possible to mark the subflow as 'backup' by setting the flag with the same name. Before this patch, the backup was set if the other peer set it in its MP_JOIN + SYN request. It is not correct: the backup flag should be set in the MPJ+SYN+ACK only if the host asks for it, and not mirroring what was done by the other peer. It is then required to have a dedicated bit for each direction, similar to what is done in the subflow context. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Cc: [email protected] Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-30mptcp: sched: check both directions for backupMatthieu Baerts (NGI0)2-5/+7
The 'mptcp_subflow_context' structure has two items related to the backup flags: - 'backup': the subflow has been marked as backup by the other peer - 'request_bkup': the backup flag has been set by the host Before this patch, the scheduler was only looking at the 'backup' flag. That can make sense in some cases, but it looks like that's not what we wanted for the general use, because either the path-manager was setting both of them when sending an MP_PRIO, or the receiver was duplicating the 'backup' flag in the subflow request. Note that the use of these two flags in the path-manager are going to be fixed in the next commits, but this change here is needed not to modify the behaviour. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Cc: [email protected] Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-29selftests/bpf: Filter out _GNU_SOURCE when compiling test_cppStanislav Fomichev1-1/+1
Jakub reports build failures when merging linux/master with net tree: CXX test_cpp In file included from <built-in>:454: <command line>:2:9: error: '_GNU_SOURCE' macro redefined [-Werror,-Wmacro-redefined] 2 | #define _GNU_SOURCE | ^ <built-in>:445:9: note: previous definition is here 445 | #define _GNU_SOURCE 1 The culprit is commit cc937dad85ae ("selftests: centralize -D_GNU_SOURCE= to CFLAGS in lib.mk") which unconditionally added -D_GNU_SOUCE to CLFAGS. Apparently clang++ also unconditionally adds it for the C++ targets [0] which causes a conflict. Add small change in the selftests makefile to filter it out for test_cpp. Not sure which tree it should go via, targeting bpf for now, but net might be better? 0: https://stackoverflow.com/questions/11670581/why-is-gnu-source-defined-by-default-and-how-to-turn-it-off Signed-off-by: Stanislav Fomichev <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-07-29ice: xsk: fix txq interrupt mappingMaciej Fijalkowski1-10/+14
ice_cfg_txq_interrupt() internally handles XDP Tx ring. Do not use ice_for_each_tx_ring() in ice_qvec_cfg_msix() as this causing us to treat XDP ring that belongs to queue vector as Tx ring and therefore misconfiguring the interrupts. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-29ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_progMaciej Fijalkowski1-1/+1
It is read by data path and modified from process context on remote cpu so it is needed to use WRITE_ONCE to clear the pointer. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Shannon Nelson <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-29ice: improve updating ice_{t,r}x_ring::xsk_poolMaciej Fijalkowski6-55/+87
xsk_buff_pool pointers that ice ring structs hold are updated via ndo_bpf that is executed in process context while it can be read by remote CPU at the same time within NAPI poll. Use synchronize_net() after pointer update and {READ,WRITE}_ONCE() when working with mentioned pointer. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-29ice: toggle netif_carrier when setting up XSK poolMaciej Fijalkowski1-1/+7
This so we prevent Tx timeout issues. One of conditions checked on running in the background dev_watchdog() is netif_carrier_ok(), so let us turn it off when we disable the queues that belong to a q_vector where XSK pool is being configured. Turn carrier on in ice_qp_ena() only when ice_get_link_status() tells us that physical link is up. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-29ice: modify error handling when setting XSK pool in ndo_bpfMaciej Fijalkowski1-14/+16
Don't bail out right when spotting an error within ice_qp_{dis,ena}() but rather track error and go through whole flow of disabling and enabling queue pair. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-29ice: replace synchronize_rcu with synchronize_netMaciej Fijalkowski1-5/+4
Given that ice_qp_dis() is called under rtnl_lock, synchronize_net() can be called instead of synchronize_rcu() so that XDP rings can finish its job in a faster way. Also let us do this as earlier in XSK queue disable flow. Additionally, turn off regular Tx queue before disabling irqs and NAPI. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-29ice: don't busy wait for Rx queue disable in ice_qp_dis()Maciej Fijalkowski1-3/+1
When ice driver is spammed with multiple xdpsock instances and flow control is enabled, there are cases when Rx queue gets stuck and unable to reflect the disable state in QRX_CTRL register. Similar issue has previously been addressed in commit 13a6233b033f ("ice: Add support to enable/disable all Rx queues before waiting"). To workaround this, let us simply not wait for a disabled state as later patch will make sure that regardless of the encountered error in the process of disabling a queue pair, the Rx queue will be enabled. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-29ice: respect netif readiness in AF_XDP ZC related ndo'sMichal Kubiak1-1/+5
Address a scenario in which XSK ZC Tx produces descriptors to XDP Tx ring when link is either not yet fully initialized or process of stopping the netdev has already started. To avoid this, add checks against carrier readiness in ice_xsk_wakeup() and in ice_xmit_zc(). One could argue that bailing out early in ice_xsk_wakeup() would be sufficient but given the fact that we produce Tx descriptors on behalf of NAPI that is triggered for Rx traffic, the latter is also needed. Bringing link up is an asynchronous event executed within ice_service_task so even though interface has been brought up there is still a time frame where link is not yet ok. Without this patch, when AF_XDP ZC Tx is used simultaneously with stack Tx, Tx timeouts occur after going through link flap (admin brings interface down then up again). HW seem to be unable to transmit descriptor to the wire after HW tail register bump which in turn causes bit __QUEUE_STATE_STACK_XOFF to be set forever as netdev_tx_completed_queue() sees no cleaned bytes on the input. Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Michal Kubiak <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-07-29Merge branch 'mptcp-endpoint-readd-fixes' into mainDavid S. Miller3-13/+53
Matthieu Baerts says: ==================== mptcp: fix signal endpoint readd Issue #501 [1] showed that the Netlink PM currently doesn't correctly support removal and re-add of signal endpoints. Patches 1 and 2 address the issue: the first one in the userspace path- manager, introduced in v5.19 ; and the second one in the in-kernel path- manager, introduced in v5.7. Patch 3 introduces a related selftest. There is no 'Fixes' tag, because it might be hard to backport it automatically, as missing helpers in Bash will not be caught when compiling the kernel or the selftests. The last two patches address two small issues in the MPTCP selftests, one introduced in v6.6., and the other one in v5.17. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/501 [1] ==================== Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29selftests: mptcp: always close input's FD if openedLiu Jing1-4/+4
In main_loop_s function, when the open(cfg_input, O_RDONLY) function is run, the last fd is not closed if the "--cfg_repeat > 0" branch is not taken. Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests") Cc: [email protected] Signed-off-by: Liu Jing <[email protected]> Reviewed-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29selftests: mptcp: fix error pathPaolo Abeni1-1/+1
pm_nl_check_endpoint() currently calls an not existing helper to mark the test as failed. Fix the wrong call. Fixes: 03668c65d153 ("selftests: mptcp: join: rework detailed report") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29selftests: mptcp: add explicit test case for remove/readdPaolo Abeni1-0/+29
Delete and re-create a signal endpoint and ensure that the PM actually deletes and re-create the subflow. Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29mptcp: fix NL PM announced address accountingPaolo Abeni1-4/+6
Currently the per connection announced address counter is never decreased. As a consequence, after connection establishment, if the NL PM deletes an endpoint and adds a new/different one, no additional subflow is created for the new endpoint even if the current limits allow that. Address the issue properly updating the signaled address counter every time the NL PM removes such addresses. Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29mptcp: fix user-space PM announced address accountingPaolo Abeni1-4/+13
Currently the per-connection announced address counter is never decreased. When the user-space PM is in use, this just affect the information exposed via diag/sockopt, but it could still foul the PM to wrong decision. Add the missing accounting for the user-space PM's sake. Fixes: 8b1c94da1e48 ("mptcp: only send RM_ADDR in nl_cmd_remove") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29rtnetlink: Don't ignore IFLA_TARGET_NETNSID when ifname is specified in ↵Kuniyuki Iwashima1-1/+1
rtnl_dellink(). The cited commit accidentally replaced tgt_net with net in rtnl_dellink(). As a result, IFLA_TARGET_NETNSID is ignored if the interface is specified with IFLA_IFNAME or IFLA_ALT_IFNAME. Let's pass tgt_net to rtnl_dev_get(). Fixes: cc6090e985d7 ("net: rtnetlink: introduce helper to get net_device instance by ifname") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29net: axienet: start napi before enabling Rx/TxAndy Chiu1-1/+1
softirq may get lost if an Rx interrupt comes before we call napi_enable. Move napi_enable in front of axienet_setoptions(), which turns on the device, to address the issue. Link: https://lists.gnu.org/archive/html/qemu-devel/2024-07/msg06160.html Fixes: cc37610caaf8 ("net: axienet: implement NAPI and GRO receive") Signed-off-by: Andy Chiu <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29tcp: Adjust clamping window for applications specifying SO_RCVBUFSubash Abhinov Kasiviswanathan1-7/+16
tp->scaling_ratio is not updated based on skb->len/skb->truesize once SO_RCVBUF is set leading to the maximum window scaling to be 25% of rcvbuf after commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") and 50% of rcvbuf after commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"). 50% tries to emulate the behavior of older kernels using sysctl_tcp_adv_win_scale with default value. Systems which were using a different values of sysctl_tcp_adv_win_scale in older kernels ended up seeing reduced download speeds in certain cases as covered in https://lists.openwall.net/netdev/2024/05/15/13 While the sysctl scheme is no longer acceptable, the value of 50% is a bit conservative when the skb->len/skb->truesize ratio is later determined to be ~0.66. Applications not specifying SO_RCVBUF update the window scaling and the receiver buffer every time data is copied to userspace. This computation is now used for applications setting SO_RCVBUF to update the maximum window scaling while ensuring that the receive buffer is within the application specified limit. Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") Signed-off-by: Sean Tranchetti <[email protected]> Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29Merge branch 'ethtool-rss-fixes' into mainDavid S. Miller3-15/+79
Jakub Kicinski says; ==================== ethtool: more RSS fixes More fixes for RSS setting. First two patches fix my own bugs in bnxt conversion to the new API. The third patch fixes what seems to be a 10 year old issue (present since the Linux RSS API was created). Fourth patch fixes an issue with the XArray state being out of sync. And then a small test. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-07-29selftests: drv-net: rss_ctx: check for all-zero keysJakub Kicinski1-3/+34
We had a handful of bugs relating to key being either all 0 or just reported incorrectly as all 0. Check for this in the tests. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-07-29ethtool: fix the state of additional contexts with old APIJakub Kicinski1-8/+30
We expect drivers implementing the new create/modify/destroy API to populate the defaults in struct ethtool_rxfh_context. In legacy API ctx isn't even passed, and rxfh.indir / rxfh.key are NULL so drivers can't give us defaults even if they want to. Call get_rxfh() to fetch the values. We can reuse rxfh_dev for the get_rxfh(), rxfh stores the input from the user. This fixes IOCTL reporting 0s instead of the default key / indir table for drivers using legacy API. Add a check to try to catch drivers using the new API but not populating the key. Fixes: 7964e7884643 ("net: ethtool: use the tracking array for get_rxfh on custom RSS contexts") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>