aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)AuthorFilesLines
2023-03-22Merge branch '100GbE' of ↵Jakub Kicinski4-6/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-03-21 (ice) This series contains updates to ice driver only. Piotr sets first_desc field for proper handling of Flow Director packets. Michal moves error checking for VF earlier in function to properly return error before other checks/reporting; he also corrects VSI filter removal to be done during VSI removal and not rebuild. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: remove filters only if VSI is deleted ice: check if VF exists before mode check ice: fix rx buffers handling for flow director packets ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-21i40e: fix flow director packet filter programmingRadoslaw Tyl1-4/+4
Initialize to zero structures to build a valid Tx Packet used for the filter programming. Fixes: a9219b332f52 ("i40e: VLAN field for flow director") Signed-off-by: Radoslaw Tyl <[email protected]> Reviewed-by: Michal Swiatkowski <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-03-21iavf: fix hang on reboot with iceStefan Assmann1-0/+5
When a system with E810 with existing VFs gets rebooted the following hang may be observed. Pid 1 is hung in iavf_remove(), part of a network driver: PID: 1 TASK: ffff965400e5a340 CPU: 24 COMMAND: "systemd-shutdow" #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930 #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf] #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513 #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice] #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice] #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice] #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b RIP: 00007f1baa5c13d7 RSP: 00007fffbcc55a98 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1baa5c13d7 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 00007fffbcc55ca0 R8: 0000000000000000 R9: 00007fffbcc54e90 R10: 00007fffbcc55050 R11: 0000000000000202 R12: 0000000000000005 R13: 0000000000000000 R14: 00007fffbcc55af0 R15: 0000000000000000 ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b During reboot all drivers PM shutdown callbacks are invoked. In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE. In ice_shutdown() the call chain above is executed, which at some point calls iavf_remove(). However iavf_remove() expects the VF to be in one of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If that's not the case it sleeps forever. So if iavf_shutdown() gets invoked before iavf_remove() the system will hang indefinitely because the adapter is already in state __IAVF_REMOVE. Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE, as we already went through iavf_shutdown(). Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove") Reported-by: Marius Cornea <[email protected]> Signed-off-by: Stefan Assmann <[email protected]> Reviewed-by: Michal Kubiak <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-21ice: remove filters only if VSI is deletedMichal Swiatkowski2-2/+8
Filters shouldn't be removed in VSI rebuild path. Removing them on PF VSI results in no rule for PF MAC after changing for example queues amount. Remove all filters only in the VSI remove flow. As unload should also cause the filter to be removed introduce, a new function ice_stop_eth(). It will unroll ice_start_eth(), so remove filters and close VSI. Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-03-21ice: check if VF exists before mode checkMichal Swiatkowski1-4/+4
Setting trust on VF should return EINVAL when there is no VF. Move checking for switchdev mode after checking if VF exists. Fixes: c54d209c78b8 ("ice: Wait for VF to be reset/ready before configuration") Signed-off-by: Michal Swiatkowski <[email protected]> Signed-off-by: Kalyan Kodamagula <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-21ice: fix rx buffers handling for flow director packetsPiotr Raczynski1-0/+1
Adding flow director filters stopped working correctly after commit 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side"). As a result, only first flow director filter can be added, adding next filter leads to NULL pointer dereference attached below. Rx buffer handling and reallocation logic has been optimized, however flow director specific traffic was not accounted for. As a result driver handled those packets incorrectly since new logic was based on ice_rx_ring::first_desc which was not set in this case. Fix this by setting struct ice_rx_ring::first_desc to next_to_clean for flow director received packets. [ 438.544867] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 438.551840] #PF: supervisor read access in kernel mode [ 438.556978] #PF: error_code(0x0000) - not-present page [ 438.562115] PGD 7c953b2067 P4D 0 [ 438.565436] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 438.569794] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.2.0-net-bug #1 [ 438.577531] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022 [ 438.588470] RIP: 0010:ice_clean_rx_irq+0x2b9/0xf20 [ice] [ 438.593860] Code: 45 89 f7 e9 ac 00 00 00 8b 4d 78 41 31 4e 10 41 09 d5 4d 85 f6 0f 84 82 00 00 00 49 8b 4e 08 41 8b 76 1c 65 8b 3d 47 36 4a 3f <48> 8b 11 48 c1 ea 36 39 d7 0f 85 a6 00 00 00 f6 41 08 02 0f 85 9c [ 438.612605] RSP: 0018:ff8c732640003ec8 EFLAGS: 00010082 [ 438.617831] RAX: 0000000000000800 RBX: 00000000000007ff RCX: 0000000000000000 [ 438.624957] RDX: 0000000000000800 RSI: 0000000000000000 RDI: 0000000000000000 [ 438.632089] RBP: ff4ed275a2158200 R08: 00000000ffffffff R09: 0000000000000020 [ 438.639222] R10: 0000000000000000 R11: 0000000000000020 R12: 0000000000001000 [ 438.646356] R13: 0000000000000000 R14: ff4ed275d0daffe0 R15: 0000000000000000 [ 438.653485] FS: 0000000000000000(0000) GS:ff4ed2738fa00000(0000) knlGS:0000000000000000 [ 438.661563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 438.667310] CR2: 0000000000000000 CR3: 0000007c9f0d6006 CR4: 0000000000771ef0 [ 438.674444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 438.681573] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 438.688697] PKRU: 55555554 [ 438.691404] Call Trace: [ 438.693857] <IRQ> [ 438.695877] ? profile_tick+0x17/0x80 [ 438.699542] ice_msix_clean_ctrl_vsi+0x24/0x50 [ice] [ 438.702571] ice 0000:b1:00.0: VF 1: ctrl_vsi irq timeout [ 438.704542] __handle_irq_event_percpu+0x43/0x1a0 [ 438.704549] handle_irq_event+0x34/0x70 [ 438.704554] handle_edge_irq+0x9f/0x240 [ 438.709901] iavf 0000:b1:01.1: Failed to add Flow Director filter with status: 6 [ 438.714571] __common_interrupt+0x63/0x100 [ 438.714580] common_interrupt+0xb4/0xd0 [ 438.718424] iavf 0000:b1:01.1: Rule ID: 127 dst_ip: 0.0.0.0 src_ip 0.0.0.0 UDP: dst_port 4 src_port 0 [ 438.722255] </IRQ> [ 438.722257] <TASK> [ 438.722257] asm_common_interrupt+0x22/0x40 [ 438.722262] RIP: 0010:cpuidle_enter_state+0xc8/0x430 [ 438.722267] Code: 6e e9 25 ff e8 f9 ef ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 d7 f1 24 ff 45 84 ff 0f 85 57 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d [ 438.722269] RSP: 0018:ffffffff86003e50 EFLAGS: 00000246 [ 438.784108] RAX: ff4ed2738fa00000 RBX: ffbe72a64fc01020 RCX: 0000000000000000 [ 438.791234] RDX: 0000000000000000 RSI: ffffffff858d84de RDI: ffffffff85893641 [ 438.798365] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000003158af9d [ 438.805490] R10: 0000000000000008 R11: 0000000000000354 R12: ffffffff862365a0 [ 438.812622] R13: 000000661b472a87 R14: 0000000000000002 R15: 0000000000000000 [ 438.819757] cpuidle_enter+0x29/0x40 [ 438.823333] do_idle+0x1b6/0x230 [ 438.826566] cpu_startup_entry+0x19/0x20 [ 438.830492] rest_init+0xcb/0xd0 [ 438.833717] arch_call_rest_init+0xa/0x30 [ 438.837731] start_kernel+0x776/0xb70 [ 438.841396] secondary_startup_64_no_verify+0xe5/0xeb [ 438.846449] </TASK> Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Signed-off-by: Piotr Raczynski <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-03-19Merge branch '1GbE' of ↵David S. Miller4-94/+84
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-03-16 (igb, igbvf, igc) This series contains updates to igb, igbvf, and igc drivers. Lin Ma removes rtnl_lock() when disabling SRIOV on remove which was causing deadlock on igb. Akihiko Odaki delays enabling of SRIOV on igb to prevent early messages that could get ignored and clears MAC address when PF returns nack on reset; indicating no MAC address was assigned for igbvf. Gaosheng Cui frees IRQs in error path for igbvf. Akashi Takahiro fixes logic on checking TAPRIO gate support for igc. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-03-17Merge branch '40GbE' of ↵Jakub Kicinski4-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-03-16 (iavf) This series contains updates to iavf driver only. Alex fixes incorrect check against Rx hash feature and corrects payload value for IPv6 UDP packet. Ahmed removes bookkeeping of VLAN 0 filter as it always exists and can cause a false max filter error message. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: do not track VLAN 0 filters iavf: fix non-tunneled IPv6 UDP packet type and hashing iavf: fix inverted Rx hash condition leading to disabled hash ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-16ice: xsk: disable txq irq before flushing hwMaciej Fijalkowski1-3/+2
ice_qp_dis() intends to stop a given queue pair that is a target of xsk pool attach/detach. One of the steps is to disable interrupts on these queues. It currently is broken in a way that txq irq is turned off *after* HW flush which in turn takes no effect. ice_qp_dis(): -> ice_qvec_dis_irq() --> disable rxq irq --> flush hw -> ice_vsi_stop_tx_ring() -->disable txq irq Below splat can be triggered by following steps: - start xdpsock WITHOUT loading xdp prog - run xdp_rxq_info with XDP_TX action on this interface - start traffic - terminate xdpsock [ 256.312485] BUG: kernel NULL pointer dereference, address: 0000000000000018 [ 256.319560] #PF: supervisor read access in kernel mode [ 256.324775] #PF: error_code(0x0000) - not-present page [ 256.329994] PGD 0 P4D 0 [ 256.332574] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 256.337006] CPU: 3 PID: 32 Comm: ksoftirqd/3 Tainted: G OE 6.2.0-rc5+ #51 [ 256.345218] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 256.355807] RIP: 0010:ice_clean_rx_irq_zc+0x9c/0x7d0 [ice] [ 256.361423] Code: b7 8f 8a 00 00 00 66 39 ca 0f 84 f1 04 00 00 49 8b 47 40 4c 8b 24 d0 41 0f b7 45 04 66 25 ff 3f 66 89 04 24 0f 84 85 02 00 00 <49> 8b 44 24 18 0f b7 14 24 48 05 00 01 00 00 49 89 04 24 49 89 44 [ 256.380463] RSP: 0018:ffffc900088bfd20 EFLAGS: 00010206 [ 256.385765] RAX: 000000000000003c RBX: 0000000000000035 RCX: 000000000000067f [ 256.393012] RDX: 0000000000000775 RSI: 0000000000000000 RDI: ffff8881deb3ac80 [ 256.400256] RBP: 000000000000003c R08: ffff889847982710 R09: 0000000000010000 [ 256.407500] R10: ffffffff82c060c0 R11: 0000000000000004 R12: 0000000000000000 [ 256.414746] R13: ffff88811165eea0 R14: ffffc9000d255000 R15: ffff888119b37600 [ 256.421990] FS: 0000000000000000(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000 [ 256.430207] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 256.436036] CR2: 0000000000000018 CR3: 0000000005c0a006 CR4: 00000000007706e0 [ 256.443283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 256.450527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 256.457770] PKRU: 55555554 [ 256.460529] Call Trace: [ 256.463015] <TASK> [ 256.465157] ? ice_xmit_zc+0x6e/0x150 [ice] [ 256.469437] ice_napi_poll+0x46d/0x680 [ice] [ 256.473815] ? _raw_spin_unlock_irqrestore+0x1b/0x40 [ 256.478863] __napi_poll+0x29/0x160 [ 256.482409] net_rx_action+0x136/0x260 [ 256.486222] __do_softirq+0xe8/0x2e5 [ 256.489853] ? smpboot_thread_fn+0x2c/0x270 [ 256.494108] run_ksoftirqd+0x2a/0x50 [ 256.497747] smpboot_thread_fn+0x1c1/0x270 [ 256.501907] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 256.506594] kthread+0xea/0x120 [ 256.509785] ? __pfx_kthread+0x10/0x10 [ 256.513597] ret_from_fork+0x29/0x50 [ 256.517238] </TASK> In fact, irqs were not disabled and napi managed to be scheduled and run while xsk_pool pointer was still valid, but SW ring of xdp_buff pointers was already freed. To fix this, call ice_qvec_dis_irq() after ice_vsi_stop_tx_ring(). Also while at it, remove redundant ice_clean_rx_ring() call - this is handled in ice_qp_clean_rings(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <[email protected]> Reviewed-by: Larysa Zaremba <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Acked-by: John Fastabend <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-16igc: fix the validation logic for taprio's gate listAKASHI Takahiro1-10/+10
The check introduced in the commit a5fd39464a40 ("igc: Lift TAPRIO schedule restriction") can detect a false positive error in some corner case. For instance, tc qdisc replace ... taprio num_tc 4 ... sched-entry S 0x01 100000 # slot#1 sched-entry S 0x03 100000 # slot#2 sched-entry S 0x04 100000 # slot#3 sched-entry S 0x08 200000 # slot#4 flags 0x02 # hardware offload Here the queue#0 (the first queue) is on at the slot#1 and #2, and off at the slot#3 and #4. Under the current logic, when the slot#4 is examined, validate_schedule() returns *false* since the enablement count for the queue#0 is two and it is already off at the previous slot (i.e. #3). But this definition is truely correct. Let's fix the logic to enforce a strict validation for consecutively-opened slots. Fixes: a5fd39464a40 ("igc: Lift TAPRIO schedule restriction") Signed-off-by: AKASHI Takahiro <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Acked-by: Vinicius Costa Gomes <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-16igbvf: Regard vf reset nack as successAkihiko Odaki1-3/+10
vf reset nack actually represents the reset operation itself is performed but no address is assigned. Therefore, e1000_reset_hw_vf should fill the "perm_addr" with the zero address and return success on such an occasion. This prevents its callers in netdev.c from saying PF still resetting, and instead allows them to correctly report that no address is assigned. Fixes: 6ddbc4cf1f4d ("igb: Indicate failure on vf reset for empty mac address") Signed-off-by: Akihiko Odaki <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Marek Szlosek <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-16intel/igbvf: free irq on the error path in igbvf_request_msix()Gaosheng Cui1-2/+6
In igbvf_request_msix(), irqs have not been freed on the err path, we need to free it. Fix it. Fixes: d4e0fe01a38a ("igbvf: add new driver to support 82576 virtual functions") Signed-off-by: Gaosheng Cui <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Tested-by: Marek Szlosek <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-16igb: Enable SR-IOV after reinitAkihiko Odaki1-77/+58
Enabling SR-IOV causes the virtual functions to make requests to the PF via the mailbox. Notably, E1000_VF_RESET request will happen during the initialization of the VF. However, unless the reinit is done, the VMMB interrupt, which delivers mailbox interrupt from VF to PF will be kept masked and such requests will be silently ignored. Enable SR-IOV at the very end of the procedure to configure the device for SR-IOV so that the PF is configured properly for SR-IOV when a VF is activated. Fixes: fa44f2f185f7 ("igb: Enable SR-IOV configuration via PCI sysfs interface") Signed-off-by: Akihiko Odaki <[email protected]> Tested-by: Marek Szlosek <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-16igb: revert rtnl_lock() that causes deadlockLin Ma1-2/+0
The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds rtnl_lock to eliminate a false data race shown below (FREE from device detaching) | (USE from netdev core) igb_remove | igb_ndo_get_vf_config igb_disable_sriov | vf >= adapter->vfs_allocated_count? kfree(adapter->vf_data) | adapter->vfs_allocated_count = 0 | | memcpy(... adapter->vf_data[vf] The above race will never happen and the extra rtnl_lock causes deadlock below [ 141.420169] <TASK> [ 141.420672] __schedule+0x2dd/0x840 [ 141.421427] schedule+0x50/0xc0 [ 141.422041] schedule_preempt_disabled+0x11/0x20 [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 [ 141.423324] unregister_netdev+0xe/0x20 [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] [ 141.423791] pci_device_remove+0x36/0xb0 [ 141.423990] device_release_driver_internal+0xc1/0x160 [ 141.424270] pci_stop_bus_device+0x6d/0x90 [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 [ 141.425452] sriov_disable+0x2f/0xf0 [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] [ 141.426353] igb_remove+0xa0/0x130 [igb] [ 141.426599] pci_device_remove+0x36/0xb0 [ 141.426796] device_release_driver_internal+0xc1/0x160 [ 141.427060] driver_detach+0x44/0x90 [ 141.427253] bus_remove_driver+0x55/0xe0 [ 141.427477] pci_unregister_driver+0x2a/0xa0 [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 [ 141.429126] ? mntput_no_expire+0x4a/0x240 [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 [ 141.429653] do_syscall_64+0x5b/0x80 [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.430849] ? do_syscall_64+0x67/0x80 [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.432482] ? do_syscall_64+0x67/0x80 [ 141.432714] ? exc_page_fault+0x64/0x140 [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc Since the igb_disable_sriov() will call pci_disable_sriov() before releasing any resources, the netdev core will synchronize the cleanup to avoid any races. This patch removes the useless rtnl_(un)lock to guarantee correctness. CC: [email protected] Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race") Reported-by: Corinna Vinschen <[email protected]> Link: https://lore.kernel.org/intel-wired-lan/[email protected]/ Signed-off-by: Lin Ma <[email protected]> Tested-by: Corinna Vinschen <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-16iavf: do not track VLAN 0 filtersAhmed Zaki2-2/+8
When an interface with the maximum number of VLAN filters is brought up, a spurious error is logged: [257.483082] 8021q: adding VLAN 0 to HW filter on device enp0s3 [257.483094] iavf 0000:00:03.0 enp0s3: Max allowed VLAN filters 8. Remove existing VLANs or disable filtering via Ethtool if supported. The VF driver complains that it cannot add the VLAN 0 filter. On the other hand, the PF driver always adds VLAN 0 filter on VF initialization. The VF does not need to ask the PF for that filter at all. Fix the error by not tracking VLAN 0 filters altogether. With that, the check added by commit 0e710a3ffd0c ("iavf: Fix VF driver counting VLAN 0 filters") in iavf_virtchnl.c is useless and might be confusing if left as it suggests that we track VLAN 0. Fixes: 0e710a3ffd0c ("iavf: Fix VF driver counting VLAN 0 filters") Signed-off-by: Ahmed Zaki <[email protected]> Reviewed-by: Michal Kubiak <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-16iavf: fix non-tunneled IPv6 UDP packet type and hashingAlexander Lobakin1-1/+1
Currently, IAVF's decode_rx_desc_ptype() correctly reports payload type of L4 for IPv4 UDP packets and IPv{4,6} TCP, but only L3 for IPv6 UDP. Originally, i40e, ice and iavf were affected. Commit 73df8c9e3e3d ("i40e: Correct UDP packet header for non_tunnel-ipv6") fixed that in i40e, then commit 638a0c8c8861 ("ice: fix incorrect payload indicator on PTYPE") fixed that for ice. IPv6 UDP is L4 obviously. Fix it and make iavf report correct L4 hash type for such packets, so that the stack won't calculate it on CPU when needs it. Fixes: 206812b5fccb ("i40e/i40evf: i40e implementation for skb_set_hash") Reviewed-by: Larysa Zaremba <[email protected]> Reviewed-by: Michal Kubiak <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-16iavf: fix inverted Rx hash condition leading to disabled hashAlexander Lobakin1-1/+1
Condition, which checks whether the netdev has hashing enabled is inverted. Basically, the tagged commit effectively disabled passing flow hash from descriptor to skb, unless user *disables* it via Ethtool. Commit a876c3ba59a6 ("i40e/i40evf: properly report Rx packet hash") fixed this problem, but only for i40e. Invert the condition now in iavf and unblock passing hash to skbs again. Fixes: 857942fd1aa1 ("i40e: Fix Rx hash reported to the stack by our driver") Reviewed-by: Larysa Zaremba <[email protected]> Reviewed-by: Michal Kubiak <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-15ice: avoid bonding causing auxiliary plug/unplug under RTNL lockDave Ertman2-20/+13
RDMA is not supported in ice on a PF that has been added to a bonded interface. To enforce this, when an interface enters a bond, we unplug the auxiliary device that supports RDMA functionality. This unplug currently happens in the context of handling the netdev bonding event. This event is sent to the ice driver under RTNL context. This is causing a deadlock where the RDMA driver is waiting for the RTNL lock to complete the removal. Defer the unplugging/re-plugging of the auxiliary device to the service task so that it is not performed under the RTNL lock context. Cc: [email protected] # 6.1.x Reported-by: Jaroslav Pulchart <[email protected]> Link: https://lore.kernel.org/netdev/CAK8fFZ6A_Gphw_3-QMGKEFQk=sfCw1Qmq0TVZK3rtAi7vb621A@mail.gmail.com/ Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave") Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed") Signed-off-by: Dave Ertman <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-10i40e: Fix kernel crash during reboot when adapter is in recovery modeIvan Vecera1-0/+1
If the driver detects during probe that firmware is in recovery mode then i40e_init_recovery_mode() is called and the rest of probe function is skipped including pci_set_drvdata(). Subsequent i40e_shutdown() called during shutdown/reboot dereferences NULL pointer as pci_get_drvdata() returns NULL. To fix call pci_set_drvdata() also during entering to recovery mode. Reproducer: 1) Lets have i40e NIC with firmware in recovery mode 2) Run reboot Result: [ 139.084698] i40e: Intel(R) Ethernet Connection XL710 Network Driver [ 139.090959] i40e: Copyright (c) 2013 - 2019 Intel Corporation. [ 139.108438] i40e 0000:02:00.0: Firmware recovery mode detected. Limiting functionality. [ 139.116439] i40e 0000:02:00.0: Refer to the Intel(R) Ethernet Adapters and Devices User Guide for details on firmware recovery mode. [ 139.129499] i40e 0000:02:00.0: fw 8.3.64775 api 1.13 nvm 8.30 0x8000b78d 1.3106.0 [8086:1583] [15d9:084a] [ 139.215932] i40e 0000:02:00.0 enp2s0f0: renamed from eth0 [ 139.223292] i40e 0000:02:00.1: Firmware recovery mode detected. Limiting functionality. [ 139.231292] i40e 0000:02:00.1: Refer to the Intel(R) Ethernet Adapters and Devices User Guide for details on firmware recovery mode. [ 139.244406] i40e 0000:02:00.1: fw 8.3.64775 api 1.13 nvm 8.30 0x8000b78d 1.3106.0 [8086:1583] [15d9:084a] [ 139.329209] i40e 0000:02:00.1 enp2s0f1: renamed from eth0 ... [ 156.311376] BUG: kernel NULL pointer dereference, address: 00000000000006c2 [ 156.318330] #PF: supervisor write access in kernel mode [ 156.323546] #PF: error_code(0x0002) - not-present page [ 156.328679] PGD 0 P4D 0 [ 156.331210] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 156.335567] CPU: 26 PID: 15119 Comm: reboot Tainted: G E 6.2.0+ #1 [ 156.343126] Hardware name: Abacus electric, s.r.o. - [email protected] Super Server/H12SSW-iN, BIOS 2.4 04/13/2022 [ 156.353369] RIP: 0010:i40e_shutdown+0x15/0x130 [i40e] [ 156.358430] Code: c1 fc ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 fd 53 48 8b 9f 48 01 00 00 <f0> 80 8b c2 06 00 00 04 f0 80 8b c0 06 00 00 08 48 8d bb 08 08 00 [ 156.377168] RSP: 0018:ffffb223c8447d90 EFLAGS: 00010282 [ 156.382384] RAX: ffffffffc073ee70 RBX: 0000000000000000 RCX: 0000000000000001 [ 156.389510] RDX: 0000000080000001 RSI: 0000000000000246 RDI: ffff95db49988000 [ 156.396634] RBP: ffff95db49988000 R08: ffffffffffffffff R09: ffffffff8bd17d40 [ 156.403759] R10: 0000000000000001 R11: ffffffff8a5e3d28 R12: ffff95db49988000 [ 156.410882] R13: ffffffff89a6fe17 R14: ffff95db49988150 R15: 0000000000000000 [ 156.418007] FS: 00007fe7c0cc3980(0000) GS:ffff95ea8ee80000(0000) knlGS:0000000000000000 [ 156.426083] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 156.431819] CR2: 00000000000006c2 CR3: 00000003092fc005 CR4: 0000000000770ee0 [ 156.438944] PKRU: 55555554 [ 156.441647] Call Trace: [ 156.444096] <TASK> [ 156.446199] pci_device_shutdown+0x38/0x60 [ 156.450297] device_shutdown+0x163/0x210 [ 156.454215] kernel_restart+0x12/0x70 [ 156.457872] __do_sys_reboot+0x1ab/0x230 [ 156.461789] ? vfs_writev+0xa6/0x1a0 [ 156.465362] ? __pfx_file_free_rcu+0x10/0x10 [ 156.469635] ? __call_rcu_common.constprop.85+0x109/0x5a0 [ 156.475034] do_syscall_64+0x3e/0x90 [ 156.478611] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 156.483658] RIP: 0033:0x7fe7bff37ab7 Fixes: 4ff0ee1af016 ("i40e: Introduce recovery mode support") Signed-off-by: Ivan Vecera <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-07ethernet: ice: avoid gcc-9 integer overflow warningArnd Bergmann1-4/+4
With older compilers like gcc-9, the calculation of the vlan priority field causes a false-positive warning from the byteswap: In file included from drivers/net/ethernet/intel/ice/ice_tc_lib.c:4: drivers/net/ethernet/intel/ice/ice_tc_lib.c: In function 'ice_parse_cls_flower': include/uapi/linux/swab.h:15:15: error: integer overflow in expression '(int)(short unsigned int)((int)match.key-><U67c8>.<U6698>.vlan_priority << 13) & 57344 & 255' of type 'int' results in '0' [-Werror=overflow] 15 | (((__u16)(x) & (__u16)0x00ffU) << 8) | \ | ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~ include/uapi/linux/swab.h:106:2: note: in expansion of macro '___constant_swab16' 106 | ___constant_swab16(x) : \ | ^~~~~~~~~~~~~~~~~~ include/uapi/linux/byteorder/little_endian.h:42:43: note: in expansion of macro '__swab16' 42 | #define __cpu_to_be16(x) ((__force __be16)__swab16((x))) | ^~~~~~~~ include/linux/byteorder/generic.h:96:21: note: in expansion of macro '__cpu_to_be16' 96 | #define cpu_to_be16 __cpu_to_be16 | ^~~~~~~~~~~~~ drivers/net/ethernet/intel/ice/ice_tc_lib.c:1458:5: note: in expansion of macro 'cpu_to_be16' 1458 | cpu_to_be16((match.key->vlan_priority << | ^~~~~~~~~~~ After a change to be16_encode_bits(), the code becomes more readable to both people and compilers, which avoids the warning. Fixes: 34800178b302 ("ice: Add support for VLAN priority filters in switchdev") Suggested-by: Alexander Lobakin <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-03-07ice: don't ignore return codes in VSI related codeMichal Swiatkowski1-7/+10
There were few smatch warnings reported by Dan: - ice_vsi_cfg_xdp_txqs can return 0 instead of ret, which is cleaner - return values in ice_vsi_cfg_def were ignored - in ice_vsi_rebuild return value was ignored in case rebuild failed, it was a never reached code, however, rewrite it for clarity. - ice_vsi_cfg_tc can return 0 instead of ret Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-03-07ice: Fix DSCP PFC TLV creationDave Ertman1-1/+1
When creating the TLV to send to the FW for configuring DSCP mode PFC,the PFCENABLE field was being masked with a 4 bit mask (0xF), but this is an 8 bit bitmask for enabled classes for PFC. This means that traffic classes 4-7 could not be enabled for PFC. Remove the mask completely, as it is not necessary, as we are assigning 8 bits to an 8 bit field. Fixes: 2a87bd73e50d ("ice: Add DSCP support") Signed-off-by: Dave Ertman <[email protected]> Signed-off-by: Karen Ostrowska <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-03-03ice: copy last block omitted in ice_get_module_eeprom()Petr Oros1-2/+4
ice_get_module_eeprom() is broken since commit e9c9692c8a81 ("ice: Reimplement module reads used by ethtool") In this refactor, ice_get_module_eeprom() reads the eeprom in blocks of size 8. But the condition that should protect the buffer overflow ignores the last block. The last block always contains zeros. Bug uncovered by ethtool upstream commit 9538f384b535 ("netlink: eeprom: Defer page requests to individual parsers") After this commit, ethtool reads a block with length = 1; to read the SFF-8024 identifier value. unpatched driver: $ ethtool -m enp65s0f0np0 offset 0x90 length 8 Offset Values ------ ------ 0x0090: 00 00 00 00 00 00 00 00 $ ethtool -m enp65s0f0np0 offset 0x90 length 12 Offset Values ------ ------ 0x0090: 00 00 01 a0 4d 65 6c 6c 00 00 00 00 $ $ ethtool -m enp65s0f0np0 Offset Values ------ ------ 0x0000: 11 06 06 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 00 0x0070: 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 patched driver: $ ethtool -m enp65s0f0np0 offset 0x90 length 8 Offset Values ------ ------ 0x0090: 00 00 01 a0 4d 65 6c 6c $ ethtool -m enp65s0f0np0 offset 0x90 length 12 Offset Values ------ ------ 0x0090: 00 00 01 a0 4d 65 6c 6c 61 6e 6f 78 $ ethtool -m enp65s0f0np0 Identifier : 0x11 (QSFP28) Extended identifier : 0x00 Extended identifier description : 1.5W max. Power consumption Extended identifier description : No CDR in TX, No CDR in RX Extended identifier description : High Power Class (> 3.5 W) not enabled Connector : 0x23 (No separable connector) Transceiver codes : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00 Transceiver type : 40G Ethernet: 40G Base-CR4 Transceiver type : 25G Ethernet: 25G Base-CR CA-N Encoding : 0x05 (64B/66B) BR, Nominal : 25500Mbps Rate identifier : 0x00 Length (SMF,km) : 0km Length (OM3 50um) : 0m Length (OM2 50um) : 0m Length (OM1 62.5um) : 0m Length (Copper or Active cable) : 1m Transmitter technology : 0xa0 (Copper cable unequalized) Attenuation at 2.5GHz : 4db Attenuation at 5.0GHz : 5db Attenuation at 7.0GHz : 7db Attenuation at 12.9GHz : 10db ........ .... Fixes: e9c9692c8a81 ("ice: Reimplement module reads used by ethtool") Signed-off-by: Petr Oros <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Tested-by: Jesse Brandeburg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-02-26ice: remove unnecessary CONFIG_ICE_GNSSJacob Keller3-6/+4
CONFIG_ICE_GNSS was added by commit c7ef8221ca7d ("ice: use GNSS subsystem instead of TTY") as a way to allow the ice driver to optionally support GNSS features without forcing a dependency on CONFIG_GNSS. The original implementation of that commit at [1] used IS_REACHABLE. This was rejected by Olek at [2] with the suggested implementation of CONFIG_ICE_GNSS. Eventually after merging, Linus reported a .config which had CONFIG_ICE_GNSS = y when both GNSS = n and ICE = n. This confused him and he felt that the config option was not useful, and commented about it at [3]. CONFIG_ICE_GNSS is defined to y whenever GNSS = ICE. This results in it being set in cases where both options are not enabled. The goal of CONFIG_ICE_GNSS is to ensure that the GNSS support in the ice driver is enabled when GNSS is enabled. The complaint from Olek about the original IS_REACHABLE was due to the required IS_REACHABLE checks throughout the ice driver code and the fact that ice_gnss.c was compiled regardless of GNSS support. This can be fixed in the Makefile by using ice-$(CONFIG_GNSS) += ice_gnss.o In this case, if GNSS = m and ICE = y, we can result in some confusing behavior where GNSS support is not enabled because its not built in. See [4]. To disallow this, have CONFIG_ICE depend on GNSS || GNSS = n. This ensures that we cannot enable CONFIG_ICE as builtin while GNSS is a module. Drop CONFIG_ICE_GNSS, and replace the IS_ENABLED checks for it with checks for GNSS. Update the Makefile to add the ice_gnss.o object based on CONFIG_GNSS. This works to ensure that GNSS support can optionally be enabled, doesn't have an unnnecessary extra config option, and has Kbuild enforce the dependency such that you can't accidentally enable GNSS as a module and ICE as a builtin. [1] https://lore.kernel.org/intel-wired-lan/[email protected]/ [2] https://lore.kernel.org/intel-wired-lan/[email protected]/ [3] https://lore.kernel.org/all/CAHk-=wi_410KZqHwF-WL5U7QYxnpHHHNP-3xL=g_y89XnKc-uw@mail.gmail.com/ [4] https://lore.kernel.org/netdev/[email protected]/ Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Fixes: c7ef8221ca7d ("ice: use GNSS subsystem instead of TTY") Cc: Arkadiusz Kubalewski <[email protected]> Cc: Alexander Lobakin <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Anthony Nguyen <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-02-20Merge tag 'for-netdev' of ↵Jakub Kicinski5-91/+140
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-02-17 We've added 64 non-merge commits during the last 7 day(s) which contain a total of 158 files changed, 4190 insertions(+), 988 deletions(-). The main changes are: 1) Add a rbtree data structure following the "next-gen data structure" precedent set by recently-added linked-list, that is, by using kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky. 2) Add a new benchmark for hashmap lookups to BPF selftests, from Anton Protopopov. 3) Fix bpf_fib_lookup to only return valid neighbors and add an option to skip the neigh table lookup, from Martin KaFai Lau. 4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accouting for container environments, from Yafang Shao. 5) Batch of ice multi-buffer and driver performance fixes, from Alexander Lobakin. 6) Fix a bug in determining whether global subprog's argument is PTR_TO_CTX, which is based on type names which breaks kprobe progs, from Andrii Nakryiko. 7) Prep work for future -mcpu=v4 LLVM option which includes usage of BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier, from Eduard Zingerman. 8) More prep work for later building selftests with Memory Sanitizer in order to detect usages of undefined memory, from Ilya Leoshkevich. 9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer dereference via sendmsg(), from Maciej Fijalkowski. 10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui. 11) Fix BPF memory allocator in combination with BPF hashtab where it could corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao. 12) Fix LoongArch BPF JIT to always use 4 instructions for function address so that instruction sequences don't change between passes, from Hengqi Chen. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits) selftests/bpf: Add bpf_fib_lookup test bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup riscv, bpf: Add bpf trampoline support for RV64 riscv, bpf: Add bpf_arch_text_poke support for RV64 riscv, bpf: Factor out emit_call for kernel and bpf context riscv: Extend patch_text for multiple instructions Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" selftests/bpf: Add global subprog context passing tests selftests/bpf: Convert test_global_funcs test to test_loader framework bpf: Fix global subprog context argument resolution logic LoongArch, bpf: Use 4 instructions for function address in JIT bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state bpf: Disable bh in bpf_test_run for xdp and tc prog xsk: check IFF_UP earlier in Tx path Fix typos in selftest/bpf files selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd() ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-20ice: properly alloc ICE_VSI_LBMichal Swiatkowski1-0/+1
Devlink reload patchset introduced regression. ICE_VSI_LB wasn't taken into account when doing default allocation. Fix it by adding a case for ICE_VSI_LB in ice_vsi_alloc_def(). Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reported-by: Maciej Fijalkowski <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-02-17Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller7-43/+119
Some of the devlink bits were tricky, but I think I got it right. Signed-off-by: David S. Miller <[email protected]>
2023-02-15Merge branch '100GbE' of ↵Jakub Kicinski8-11/+109
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-02-14 (ice) This series contains updates to ice driver only. Karol extends support for GPIO pins to E823 devices. Daniel Vacek stops processing of PTP packets when link is down. Pawel adds support for BIG TCP for IPv6. Tony changes return type of ice_vsi_realloc_stat_arrays() as it always returns success. Zhu Yanjun updates kdoc stating supported TLVs. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Mention CEE DCBX in code comment ice: Change ice_vsi_realloc_stat_arrays() to void ice: add support BIG TCP on IPv6 ice/ptp: fix the PTP worker retrying indefinitely if the link went down ice: Add GPIO pin support for E823 products ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15igb: conditionalize I2C bit banging on external thermal sensor supportCorinna Vinschen1-10/+32
Commit a97f8783a937 ("igb: unbreak I2C bit-banging on i350") introduced code to change I2C settings to bit banging unconditionally. However, this patch introduced a regression: On an Intel S2600CWR Server Board with three NICs: - 1x dual-port copper Intel I350 Gigabit Network Connection [8086:1521] (rev 01) fw 1.63, 0x80000dda - 2x quad-port SFP+ with copper SFP Avago ABCU-5700RZ Intel I350 Gigabit Fiber Network Connection [8086:1522] (rev 01) fw 1.52.0 the SFP NICs no longer get link at all. Reverting commit a97f8783a937 or switching to the Intel out-of-tree driver both fix the problem. Per the igb out-of-tree driver, I2C bit banging on i350 depends on support for an external thermal sensor (ETS). However, commit a97f8783a937 added bit banging unconditionally. Additionally, the out-of-tree driver always calls init_thermal_sensor_thresh on probe, while our driver only calls init_thermal_sensor_thresh only in igb_reset(), and only if an ETS is present, ignoring the internal thermal sensor. The affected SFPs don't provide an ETS. Per Intel, the behaviour is a result of i350 firmware requirements. This patch fixes the problem by aligning the behaviour to the out-of-tree driver: - split igb_init_i2c() into two functions: - igb_init_i2c() only performs the basic I2C initialization. - igb_set_i2c_bb() makes sure that E1000_CTRL_I2C_ENA is set and enables bit-banging. - igb_probe() only calls igb_set_i2c_bb() if an ETS is present. - igb_probe() calls init_thermal_sensor_thresh() unconditionally. - igb_reset() aligns its behaviour to igb_probe(), i. e., call igb_set_i2c_bb() if an ETS is present and call init_thermal_sensor_thresh() unconditionally. Fixes: a97f8783a937 ("igb: unbreak I2C bit-banging on i350") Tested-by: Mateusz Palczewski <[email protected]> Co-developed-by: Jamie Bainbridge <[email protected]> Signed-off-by: Jamie Bainbridge <[email protected]> Signed-off-by: Corinna Vinschen <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15Merge branch '10GbE' of ↵Jakub Kicinski3-12/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-02-14 (ixgbe, i40e) This series contains updates to ixgbe and i40e drivers. Jason Xing corrects comparison of frame sizes for setting MTU with XDP on ixgbe and adjusts frame size to account for a second VLAN header on ixgbe and i40e. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ixgbe: add double of VLAN header when computing the max MTU i40e: add double of VLAN header when computing the max MTU ixgbe: allow to increase MTU to 3K with XDP enabled ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15ice: update xdp_features with xdp multi-buffLorenzo Bianconi1-6/+12
Now ice driver supports xdp multi-buffer so add it to xdp_features. Check vsi type before setting xdp_features flag. Signed-off-by: Lorenzo Bianconi <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/8a4781511ab6e3cd280e944eef69158954f1a15f.1676385351.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15i40e: check vsi type before setting xdp_features flagLorenzo Bianconi1-2/+4
Set xdp_features flag just for I40E_VSI_MAIN vsi type since XDP is supported just in this configuration. Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/f2b537f86b34fc176fbc6b3d249b46a20a87a2f3.1676405131.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-14igb: Fix PPS input and output using 3rd and 4th SDPMiroslav Lichvar1-6/+6
Fix handling of the tsync interrupt to compare the pin number with IGB_N_SDP instead of IGB_N_EXTTS/IGB_N_PEROUT and fix the indexing to the perout array. Fixes: cf99c1dd7b77 ("igb: move PEROUT and EXTTS isr logic to separate functions") Reported-by: Matt Corallo <[email protected]> Signed-off-by: Miroslav Lichvar <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-14ice: Mention CEE DCBX in code commentZhu Yanjun1-2/+2
From the function ice_parse_org_tlv, CEE DCBX TLV is also supported. So update the comment. Or else, it is confusing. Signed-off-by: Zhu Yanjun <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-02-14ice: Change ice_vsi_realloc_stat_arrays() to voidTony Nguyen1-7/+4
smatch reports: smatch warnings: drivers/net/ethernet/intel/ice/ice_lib.c:3612 ice_vsi_rebuild() warn: missing error code 'ret' If an error is encountered for ice_vsi_realloc_stat_arrays(), ret is not assigned an error value so the goto error path would return success. The function, however, only returns 0 so an error will never be reported; due to this, change the function to return void. Reported-by: kernel test robot <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
2023-02-14ice: add support BIG TCP on IPv6Pawel Chmielewski3-0/+7
Enable sending BIG TCP packets on IPv6 in the ice driver using generic ipv6_hopopt_jumbo_remove helper for stripping HBH header. Tested: netperf -t TCP_RR -H 2001:db8:0:f101::1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,TRANSACTION_RATE Tested on two different setups. In both cases, the following settings were applied after loading the changed driver: ip link set dev enp175s0f1np1 gso_max_size 130000 ip link set dev enp175s0f1np1 gro_max_size 130000 ip link set dev enp175s0f1np1 mtu 9000 First setup: Before: Minimum 90th 99th Transaction Latency Percentile Percentile Rate Microseconds Latency Latency Tran/s Microseconds Microseconds 134 279 410 3961.584 After: Minimum 90th 99th Transaction Latency Percentile Percentile Rate Microseconds Latency Latency Tran/s Microseconds Microseconds 135 178 216 6093.404 The other setup: Before: Minimum 90th 99th Transaction Latency Percentile Percentile Rate Microseconds Latency Latency Tran/s Microseconds Microseconds 218 414 478 2944.765 After: Minimum 90th 99th Transaction Latency Percentile Percentile Rate Microseconds Latency Latency Tran/s Microseconds Microseconds 146 238 266 4700.596 Signed-off-by: Pawel Chmielewski <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-02-14ice/ptp: fix the PTP worker retrying indefinitely if the link went downDaniel Vacek1-2/+6
When the link goes down the ice_ptp_tx_tstamp() may loop re-trying to process the packets till the 2 seconds timeout finally drops them. In such a case it makes sense to just drop them right away. Signed-off-by: Daniel Vacek <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-02-14ice: Add GPIO pin support for E823 productsKarol Kolacinski3-0/+90
Add GPIO pin setup for E823, which is only 1PPS input and output. Signed-off-by: Karol Kolacinski <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-02-14ixgbe: add double of VLAN header when computing the max MTUJason Xing2-2/+3
Include the second VLAN HLEN into account when computing the maximum MTU size as other drivers do. Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP") Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-02-14i40e: add double of VLAN header when computing the max MTUJason Xing1-1/+1
Include the second VLAN HLEN into account when computing the maximum MTU size as other drivers do. Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions") Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-02-14ixgbe: allow to increase MTU to 3K with XDP enabledJason Xing1-9/+16
Recently I encountered one case where I cannot increase the MTU size directly from 1500 to a much bigger value with XDP enabled if the server is equipped with IXGBE card, which happened on thousands of servers in production environment. After applying the current patch, we can set the maximum MTU size to 3K. This patch follows the behavior of changing MTU as i40e/ice does. References: [1] commit 23b44513c3e6 ("ice: allow 3k MTU for XDP") [2] commit 0c8493d90b6b ("i40e: add XDP support for pass and drop actions") Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP") Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-02-13ice: fix lost multicast packets in promisc modeJesse Brandeburg1-0/+26
There was a problem reported to us where the addition of a VF with an IPv6 address ending with a particular sequence would cause the parent device on the PF to no longer be able to respond to neighbor discovery packets. In this case, we had an ovs-bridge device living on top of a VLAN, which was on top of a PF, and it would not be able to talk anymore (the neighbor entry would expire and couldn't be restored). The root cause of the issue is that if the PF is asked to be in IFF_PROMISC mode (promiscuous mode) and it had an ipv6 address that needed the 33:33:ff:00:00:04 multicast address to work, then when the VF was added with the need for the same multicast address, the VF would steal all the traffic destined for that address. The ice driver didn't auto-subscribe a request of IFF_PROMISC to the "multicast replication from other port's traffic" meaning that it won't get for instance, packets with an exact destination in the VF, as above. The VF's IPv6 address, which adds a "perfect filter" for 33:33:ff:00:00:04, results in no packets for that multicast address making it to the PF (which is in promisc but NOT "multicast replication"). The fix is to enable "multicast promiscuous" whenever the driver is asked to enable IFF_PROMISC, and make sure to disable it when appropriate. Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version") Signed-off-by: Jesse Brandeburg <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-02-13ice: Micro-optimize .ndo_xdp_xmit() pathAlexander Lobakin3-15/+20
After the recent mbuf changes, ice_xmit_xdp_ring() became a 3-liner. It makes no sense to keep it global in a different file than its caller. Move it just next to the sole call site and mark static. Also, it doesn't need a full xdp_convert_frame_to_buff(). Save several cycles and fill only the fields used by __ice_xmit_xdp_ring() later on. Finally, since it doesn't modify @xdpf anyhow, mark the argument const to save some more (whole -11 bytes of .text! :D). Thanks to 1 jump less and less calcs as well, this yields as many as 6.7 Mpps per queue. `xdp.data_hard_start = xdpf` is fully intentional again (see xdp_convert_buff_to_frame()) and just works when there are no source device's driver issues. Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-13ice: Fix freeing XDP frames backed by Page PoolAlexander Lobakin4-11/+43
As already mentioned, freeing any &xdp_frame via page_frag_free() is wrong, as it assumes the frame is backed by either an order-0 page or a page with no "patrons" behind them, while in fact frames backed by Page Pool can be redirected to a device, which's driver doesn't use it. Keep storing a pointer to the raw buffer and then freeing it unconditionally via page_frag_free() for %XDP_TX frames, but introduce a separate type in the enum for frames coming through .ndo_xdp_xmit(), and free them via xdp_return_frame_bulk(). Note that saving xdpf as xdp_buff->data_hard_start is intentional and is always true when everything is configured properly. After this change, %XDP_REDIRECT from a Page Pool based driver to ice becomes zero-alloc as it should be and horrendous 3.3 Mpps / queue turn into 6.6, hehe. Let it go with no "Fixes:" tag as it spans across good 5+ commits and can't be trivially backported. Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-13ice: Robustify cleaning/completing XDP Tx buffersAlexander Lobakin4-36/+63
When queueing frames from a Page Pool for redirecting to a device backed by the ice driver, `perf top` shows heavy load on page_alloc() and page_frag_free(), despite that on a properly working system it must be fully or at least almost zero-alloc. The problem is in fact a bit deeper and raises from how ice cleans up completed Tx buffers. The story so far: when cleaning/freeing the resources related to a particular completed Tx frame (skbs, DMA mappings etc.), ice uses some heuristics only without setting any type explicitly (except for dummy Flow Director packets, which are marked via ice_tx_buf::tx_flags). This kinda works, but only up to some point. For example, currently ice assumes that each frame coming to __ice_xmit_xdp_ring(), is backed by either plain order-0 page or plain page frag, while it may also be backed by Page Pool or any other possible memory models introduced in future. This means any &xdp_frame must be freed properly via xdp_return_frame() family with no assumptions. In order to do that, the whole heuristics must be replaced with setting the Tx buffer/frame type explicitly, just how it's always been done via an enum. Let us reuse 16 bits from ::tx_flags -- 1 bit-and instr won't hurt much -- especially given that sometimes there was a check for %ICE_TX_FLAGS_DUMMY_PKT, which is now turned from a flag to an enum member. The rest of the changes is straightforward and most of it is just a conversion to rely now on the type set in &ice_tx_buf rather than to some secondary properties. For now, no functional changes intended, the change only prepares the ground for starting freeing XDP frames properly next step. And it must be done atomically/synchronously to not break stuff. Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-13ice: Remove two impossible branches on XDP Tx cleaningAlexander Lobakin2-8/+2
The tagged commit started sending %XDP_TX frames from XSk Rx ring directly without converting it to an &xdp_frame. However, when XSk is enabled on a queue pair, it has its separate Tx cleaning functions, so neither ice_clean_xdp_irq() nor ice_unmap_and_free_tx_buf() ever happens there. Remove impossible branches in order to reduce the diffstat of the upcoming change. Fixes: a24b4c6e9aab ("ice: xsk: Do not convert to buff to frame for XDP_TX") Suggested-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-13ice: Fix XDP Tx ring overrunAlexander Lobakin1-7/+12
Sometimes, under heavy XDP Tx traffic, e.g. when using XDP traffic generator (%BPF_F_TEST_XDP_LIVE_FRAMES), the machine can catch OOM due to the driver not freeing all of the pages passed to it by .ndo_xdp_xmit(). Turned out that during the development of the tagged commit, the check, which ensures that we have a free descriptor to queue a frame, moved into the branch happening only when a buffer has frags. Otherwise, we only run a cleaning cycle, but don't check anything. ATST, there can be situations when the driver gets new frames to send, but there are no buffers that can be cleaned/completed and the ring has no free slots. It's very rare, but still possible (> 6.5 Mpps per ring). The driver then fills the next buffer/descriptor, effectively overwriting the data, which still needs to be freed. Restore the check after the cleaning routine to make sure there is a slot to queue a new frame. When there are frags, there still will be a separate check that we can place all of them, but if the ring is full, there's no point in wasting any more time. (minor: make `!ready_frames` unlikely since it happens ~1-2 times per billion of frames) Fixes: 3246a10752a7 ("ice: Add support for XDP multi-buffer on Tx side") Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-13ice: fix ice_tx_ring:: Xdp_tx_active underflowAlexander Lobakin1-2/+3
xdp_tx_active is used to indicate whether an XDP ring has any %XDP_TX frames queued to shortcut processing Tx cleaning for XSk-enabled queues. When !XSk, it simply indicates whether the ring has any queued frames in general. It gets increased on each frame placed onto the ring and counts the whole frame, not each frag. However, currently it gets decremented in ice_clean_xdp_tx_buf(), which is called per each buffer, i.e. per each frag. Thus, on completing multi-frag frames, an underflow happens. Move the decrement to the outer function and do it once per frame, not buf. Also, do that on the stack and update the ring counter after the loop is done to save several cycles. XSk rings are fine since there are no frags at the moment. Fixes: 3246a10752a7 ("ice: Add support for XDP multi-buffer on Tx side") Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-13ice: Fix check for weight and priority of a scheduling nodeMichal Wilczynski1-2/+2
Currently checks for weight and priority ranges don't check incoming value from the devlink. Instead it checks node current weight or priority. This makes those checks useless. Change range checks in ice_set_object_tx_priority() and ice_set_object_tx_weight() to check against incoming priority an weight. Fixes: 42c2eb6b1f43 ("ice: Implement devlink-rate API") Signed-off-by: Michal Wilczynski <[email protected]> Acked-by: Jesse Brandeburg <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-02-10i40e: Add checking for null for nlmsg_find_attr()Natalia Petrova1-0/+2
The result of nlmsg_find_attr() 'br_spec' is dereferenced in nla_for_each_nested(), but it can take NULL value in nla_find() function, which will result in an error. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops") Signed-off-by: Natalia Petrova <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>