Age | Commit message (Collapse) | Author | Files | Lines |
|
ice_qp_dis() intends to stop a given queue pair that is a target of xsk
pool attach/detach. One of the steps is to disable interrupts on these
queues. It currently is broken in a way that txq irq is turned off
*after* HW flush which in turn takes no effect.
ice_qp_dis():
-> ice_qvec_dis_irq()
--> disable rxq irq
--> flush hw
-> ice_vsi_stop_tx_ring()
-->disable txq irq
Below splat can be triggered by following steps:
- start xdpsock WITHOUT loading xdp prog
- run xdp_rxq_info with XDP_TX action on this interface
- start traffic
- terminate xdpsock
[ 256.312485] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 256.319560] #PF: supervisor read access in kernel mode
[ 256.324775] #PF: error_code(0x0000) - not-present page
[ 256.329994] PGD 0 P4D 0
[ 256.332574] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 256.337006] CPU: 3 PID: 32 Comm: ksoftirqd/3 Tainted: G OE 6.2.0-rc5+ #51
[ 256.345218] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 256.355807] RIP: 0010:ice_clean_rx_irq_zc+0x9c/0x7d0 [ice]
[ 256.361423] Code: b7 8f 8a 00 00 00 66 39 ca 0f 84 f1 04 00 00 49 8b 47 40 4c 8b 24 d0 41 0f b7 45 04 66 25 ff 3f 66 89 04 24 0f 84 85 02 00 00 <49> 8b 44 24 18 0f b7 14 24 48 05 00 01 00 00 49 89 04 24 49 89 44
[ 256.380463] RSP: 0018:ffffc900088bfd20 EFLAGS: 00010206
[ 256.385765] RAX: 000000000000003c RBX: 0000000000000035 RCX: 000000000000067f
[ 256.393012] RDX: 0000000000000775 RSI: 0000000000000000 RDI: ffff8881deb3ac80
[ 256.400256] RBP: 000000000000003c R08: ffff889847982710 R09: 0000000000010000
[ 256.407500] R10: ffffffff82c060c0 R11: 0000000000000004 R12: 0000000000000000
[ 256.414746] R13: ffff88811165eea0 R14: ffffc9000d255000 R15: ffff888119b37600
[ 256.421990] FS: 0000000000000000(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000
[ 256.430207] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 256.436036] CR2: 0000000000000018 CR3: 0000000005c0a006 CR4: 00000000007706e0
[ 256.443283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 256.450527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 256.457770] PKRU: 55555554
[ 256.460529] Call Trace:
[ 256.463015] <TASK>
[ 256.465157] ? ice_xmit_zc+0x6e/0x150 [ice]
[ 256.469437] ice_napi_poll+0x46d/0x680 [ice]
[ 256.473815] ? _raw_spin_unlock_irqrestore+0x1b/0x40
[ 256.478863] __napi_poll+0x29/0x160
[ 256.482409] net_rx_action+0x136/0x260
[ 256.486222] __do_softirq+0xe8/0x2e5
[ 256.489853] ? smpboot_thread_fn+0x2c/0x270
[ 256.494108] run_ksoftirqd+0x2a/0x50
[ 256.497747] smpboot_thread_fn+0x1c1/0x270
[ 256.501907] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 256.506594] kthread+0xea/0x120
[ 256.509785] ? __pfx_kthread+0x10/0x10
[ 256.513597] ret_from_fork+0x29/0x50
[ 256.517238] </TASK>
In fact, irqs were not disabled and napi managed to be scheduled and run
while xsk_pool pointer was still valid, but SW ring of xdp_buff pointers
was already freed.
To fix this, call ice_qvec_dis_irq() after ice_vsi_stop_tx_ring(). Also
while at it, remove redundant ice_clean_rx_ring() call - this is handled
in ice_qp_clean_rings().
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Larysa Zaremba <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Acked-by: John Fastabend <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The check introduced in the commit a5fd39464a40 ("igc: Lift TAPRIO schedule
restriction") can detect a false positive error in some corner case.
For instance,
tc qdisc replace ... taprio num_tc 4
...
sched-entry S 0x01 100000 # slot#1
sched-entry S 0x03 100000 # slot#2
sched-entry S 0x04 100000 # slot#3
sched-entry S 0x08 200000 # slot#4
flags 0x02 # hardware offload
Here the queue#0 (the first queue) is on at the slot#1 and #2,
and off at the slot#3 and #4. Under the current logic, when the slot#4
is examined, validate_schedule() returns *false* since the enablement
count for the queue#0 is two and it is already off at the previous slot
(i.e. #3). But this definition is truely correct.
Let's fix the logic to enforce a strict validation for consecutively-opened
slots.
Fixes: a5fd39464a40 ("igc: Lift TAPRIO schedule restriction")
Signed-off-by: AKASHI Takahiro <[email protected]>
Reviewed-by: Kurt Kanzenbach <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
vf reset nack actually represents the reset operation itself is
performed but no address is assigned. Therefore, e1000_reset_hw_vf
should fill the "perm_addr" with the zero address and return success on
such an occasion. This prevents its callers in netdev.c from saying PF
still resetting, and instead allows them to correctly report that no
address is assigned.
Fixes: 6ddbc4cf1f4d ("igb: Indicate failure on vf reset for empty mac address")
Signed-off-by: Akihiko Odaki <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In igbvf_request_msix(), irqs have not been freed on the err path,
we need to free it. Fix it.
Fixes: d4e0fe01a38a ("igbvf: add new driver to support 82576 virtual functions")
Signed-off-by: Gaosheng Cui <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Enabling SR-IOV causes the virtual functions to make requests to the
PF via the mailbox. Notably, E1000_VF_RESET request will happen during
the initialization of the VF. However, unless the reinit is done, the
VMMB interrupt, which delivers mailbox interrupt from VF to PF will be
kept masked and such requests will be silently ignored.
Enable SR-IOV at the very end of the procedure to configure the device
for SR-IOV so that the PF is configured properly for SR-IOV when a VF is
activated.
Fixes: fa44f2f185f7 ("igb: Enable SR-IOV configuration via PCI sysfs interface")
Signed-off-by: Akihiko Odaki <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
rtnl_lock to eliminate a false data race shown below
(FREE from device detaching) | (USE from netdev core)
igb_remove | igb_ndo_get_vf_config
igb_disable_sriov | vf >= adapter->vfs_allocated_count?
kfree(adapter->vf_data) |
adapter->vfs_allocated_count = 0 |
| memcpy(... adapter->vf_data[vf]
The above race will never happen and the extra rtnl_lock causes deadlock
below
[ 141.420169] <TASK>
[ 141.420672] __schedule+0x2dd/0x840
[ 141.421427] schedule+0x50/0xc0
[ 141.422041] schedule_preempt_disabled+0x11/0x20
[ 141.422678] __mutex_lock.isra.13+0x431/0x6b0
[ 141.423324] unregister_netdev+0xe/0x20
[ 141.423578] igbvf_remove+0x45/0xe0 [igbvf]
[ 141.423791] pci_device_remove+0x36/0xb0
[ 141.423990] device_release_driver_internal+0xc1/0x160
[ 141.424270] pci_stop_bus_device+0x6d/0x90
[ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20
[ 141.424789] pci_iov_remove_virtfn+0xba/0x120
[ 141.425452] sriov_disable+0x2f/0xf0
[ 141.425679] igb_disable_sriov+0x4e/0x100 [igb]
[ 141.426353] igb_remove+0xa0/0x130 [igb]
[ 141.426599] pci_device_remove+0x36/0xb0
[ 141.426796] device_release_driver_internal+0xc1/0x160
[ 141.427060] driver_detach+0x44/0x90
[ 141.427253] bus_remove_driver+0x55/0xe0
[ 141.427477] pci_unregister_driver+0x2a/0xa0
[ 141.428296] __x64_sys_delete_module+0x141/0x2b0
[ 141.429126] ? mntput_no_expire+0x4a/0x240
[ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0
[ 141.429653] do_syscall_64+0x5b/0x80
[ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0
[ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30
[ 141.430849] ? do_syscall_64+0x67/0x80
[ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30
[ 141.432482] ? do_syscall_64+0x67/0x80
[ 141.432714] ? exc_page_fault+0x64/0x140
[ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Since the igb_disable_sriov() will call pci_disable_sriov() before
releasing any resources, the netdev core will synchronize the cleanup to
avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
correctness.
CC: [email protected]
Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
Reported-by: Corinna Vinschen <[email protected]>
Link: https://lore.kernel.org/intel-wired-lan/[email protected]/
Signed-off-by: Lin Ma <[email protected]>
Tested-by: Corinna Vinschen <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
When an interface with the maximum number of VLAN filters is brought up,
a spurious error is logged:
[257.483082] 8021q: adding VLAN 0 to HW filter on device enp0s3
[257.483094] iavf 0000:00:03.0 enp0s3: Max allowed VLAN filters 8. Remove existing VLANs or disable filtering via Ethtool if supported.
The VF driver complains that it cannot add the VLAN 0 filter.
On the other hand, the PF driver always adds VLAN 0 filter on VF
initialization. The VF does not need to ask the PF for that filter at
all.
Fix the error by not tracking VLAN 0 filters altogether. With that, the
check added by commit 0e710a3ffd0c ("iavf: Fix VF driver counting VLAN 0
filters") in iavf_virtchnl.c is useless and might be confusing if left as
it suggests that we track VLAN 0.
Fixes: 0e710a3ffd0c ("iavf: Fix VF driver counting VLAN 0 filters")
Signed-off-by: Ahmed Zaki <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Currently, IAVF's decode_rx_desc_ptype() correctly reports payload type
of L4 for IPv4 UDP packets and IPv{4,6} TCP, but only L3 for IPv6 UDP.
Originally, i40e, ice and iavf were affected.
Commit 73df8c9e3e3d ("i40e: Correct UDP packet header for non_tunnel-ipv6")
fixed that in i40e, then
commit 638a0c8c8861 ("ice: fix incorrect payload indicator on PTYPE")
fixed that for ice.
IPv6 UDP is L4 obviously. Fix it and make iavf report correct L4 hash
type for such packets, so that the stack won't calculate it on CPU when
needs it.
Fixes: 206812b5fccb ("i40e/i40evf: i40e implementation for skb_set_hash")
Reviewed-by: Larysa Zaremba <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Condition, which checks whether the netdev has hashing enabled is
inverted. Basically, the tagged commit effectively disabled passing flow
hash from descriptor to skb, unless user *disables* it via Ethtool.
Commit a876c3ba59a6 ("i40e/i40evf: properly report Rx packet hash")
fixed this problem, but only for i40e.
Invert the condition now in iavf and unblock passing hash to skbs again.
Fixes: 857942fd1aa1 ("i40e: Fix Rx hash reported to the stack by our driver")
Reviewed-by: Larysa Zaremba <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: refactor mailbox overflow detection
Jake Keller says:
The primary motivation of this series is to cleanup and refactor the mailbox
overflow detection logic such that it will work with Scalable IOV. In
addition a few other minor cleanups are done while I was working on the
code in the area.
First, the mailbox overflow functions in ice_vf_mbx.c are refactored to
store the data per-VF as an embedded structure in struct ice_vf, rather than
stored separately as a fixed-size array which only works with Single Root
IOV. This reduces the overall memory footprint when only a handful of VFs
are used.
The overflow detection functions are also cleaned up to reduce the need for
multiple separate calls to determine when to report a VF as potentially
malicious.
Finally, the ice_is_malicious_vf function is cleaned up and moved into
ice_virtchnl.c since it is not Single Root IOV specific, and thus does not
belong in ice_sriov.c
I could probably have done this in fewer patches, but I split pieces out to
hopefully aid in reviewing the overall sequence of changes. This does cause
some additional thrash as it results in intermediate versions of the
refactor, but I think its worth it for making each step easier to
understand.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: call ice_is_malicious_vf() from ice_vc_process_vf_msg()
ice: move ice_is_malicious_vf() to ice_virtchnl.c
ice: print message if ice_mbx_vf_state_handler returns an error
ice: pass mbxdata to ice_is_malicious_vf()
ice: remove unnecessary &array[0] and just use array
ice: always report VF overflowing mailbox even without PF VSI
ice: declare ice_vc_process_vf_msg in ice_virtchnl.h
ice: initialize mailbox snapshot earlier in PF init
ice: merge ice_mbx_report_malvf with ice_mbx_vf_state_handler
ice: remove ice_mbx_deinit_snapshot
ice: move VF overflow message count into struct ice_mbx_vf_info
ice: track malicious VFs in new ice_mbx_vf_info structure
ice: convert ice_mbx_clear_malvf to void and use WARN
ice: re-order ice_mbx_reset_snapshot function
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
RDMA is not supported in ice on a PF that has been added to a bonded
interface. To enforce this, when an interface enters a bond, we unplug
the auxiliary device that supports RDMA functionality. This unplug
currently happens in the context of handling the netdev bonding event.
This event is sent to the ice driver under RTNL context. This is causing
a deadlock where the RDMA driver is waiting for the RTNL lock to complete
the removal.
Defer the unplugging/re-plugging of the auxiliary device to the service
task so that it is not performed under the RTNL lock context.
Cc: [email protected] # 6.1.x
Reported-by: Jaroslav Pulchart <[email protected]>
Link: https://lore.kernel.org/netdev/CAK8fFZ6A_Gphw_3-QMGKEFQk=sfCw1Qmq0TVZK3rtAi7vb621A@mail.gmail.com/
Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed")
Signed-off-by: Dave Ertman <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
i40e: support XDP multi-buffer
Tirthendu Sarkar says:
This patchset adds multi-buffer support for XDP. Tx side already has
support for multi-buffer. This patchset focuses on Rx side. The last
patch contains actual multi-buffer changes while the previous ones are
preparatory patches.
On receiving the first buffer of a packet, xdp_buff is built and its
subsequent buffers are added to it as frags. While 'next_to_clean' keeps
pointing to the first descriptor, the newly introduced 'next_to_process'
keeps track of every descriptor for the packet.
On receiving EOP buffer the XDP program is called and appropriate action
is taken (building skb for XDP_PASS, reusing page for XDP_DROP, adjusting
page offsets for XDP_{REDIRECT,TX}).
The patchset also streamlines page offset adjustments for buffer reuse
to make it easier to post process the rx_buffers after running XDP prog.
With this patchset there does not seem to be any performance degradation
for XDP_PASS and some improvement (~1% for XDP_TX, ~5% for XDP_DROP) when
measured using xdp_rxq_info program from samples/bpf/ for 64B packets.
v1: https://lore.kernel.org/netdev/[email protected]/
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
i40e: add support for XDP multi-buffer Rx
i40e: add xdp_buff to i40e_ring struct
i40e: introduce next_to_process to i40e_ring
i40e: use frame_sz instead of recalculating truesize for building skb
i40e: Change size to truesize when using i40e_rx_buffer_flip()
i40e: add pre-xdp page_count in rx_buffer
i40e: change Rx buffer size for legacy-rx to support XDP multi-buffer
i40e: consolidate maximum frame size calculation for vsi
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The main loop in __ice_clean_ctrlq first checks if a VF might be malicious
before calling ice_vc_process_vf_msg(). This results in duplicate code in
both functions to obtain a reference to the VF, and exports the
ice_is_malicious_vf() from ice_virtchnl.c unnecessarily.
Refactor ice_is_malicious_vf() to be a static function that takes a pointer
to the VF. Call this in ice_vc_process_vf_msg() just after we obtain a
reference to the VF by calling ice_get_vf_by_id.
Pass the mailbox data from the __ice_clean_ctrlq function into
ice_vc_process_vf_msg() instead of calling ice_is_malicious_vf().
This reduces the number of exported functions and avoids the need to obtain
the VF reference twice for every mailbox message.
Note that the state check for ICE_VF_STATE_DIS is kept in
ice_is_malicious_vf() and we call this before checking that state in
ice_vc_process_vf_msg. This is intentional, as we stop responding to VF
messages from a VF once we detect that it may be overflowing the mailbox.
This ensures that we continue to silently ignore the message as before
without responding via ice_vc_send_msg_to_vf().
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_is_malicious_vf() function is currently implemented in ice_sriov.c
This function is not Single Root specific, and a future change is going to
refactor the ice_vc_process_vf_msg() function to call this instead of
calling it before ice_vc_process_vf_msg() in the main loop of
__ice_clean_ctrlq.
To make that change easier to review, first move this function into
ice_virtchnl.c but leave the call in __ice_clean_ctrlq() alone.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
If ice_mbx_vf_state_handler() returns an error, the ice_is_malicious_vf()
function just exits without printing anything.
Instead, use dev_warn_ratelimited to print a warning that we were unable to
check the status for this VF. The _ratelimited variant is used to avoid
potentially spamming the log if this function is failing consistently for
every single mailbox message.
Also we can drop the "goto" as it simply skips over a report_malvf check.
That variable should always be false if ice_mbx_vf_state_handler returns
non-zero.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_is_malicious_vf() function takes information about the current
state of the mailbox during a single interrupt. This information includes
the number of messages processed so far, as well as the number of pending
messages not yet processed.
A future refactor is going to make ice_vc_process_vf_msg() call
ice_is_malicious_vf() instead of having it called separately in ice_main.c
This change will require passing all the necessary arguments into
ice_vc_process_vf_msg().
To make this simpler, have the main loop fill in the struct ice_mbx_data
and pass that rather than passing in the num_msg_proc and num_msg_pending.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In ice_is_malicious_vf we print the VF MAC address using %pM by passing the
address of the first element of vf->dev_lan_addr. This is equivalent to
just passing vf->dev_lan_addr, so do that.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In ice_is_malicious_vf we report a message warning the system administrator
when a VF is potentially spamming the PF with asynchronous messages that
could overflow the PF mailbox.
The specific message was requested by our customer support team to include
the VF and PF MAC address. In some cases we may not be able to locate the
PF VSI to obtain the MAC address for the PF. The current implementation
discards the message entirely in this case. Fix this to instead print a
zero address in that case so that we always print something here. Note that
dev_warn will also include the PCI device information allowing another
mechanism for determining on which PF the potentially malicious VF belongs.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_vc_process_vf_msg function is the main entry point for handling
virtchnl messages. This function is defined in ice_virtchnl.c but its
declaration is still in ice_sriov.c
The ice_sriov.c file used to contain all of the virtualization logic until
commit bf93bf791cec ("ice: introduce ice_virtchnl.c and ice_virtchnl.h")
moved the virtchnl logic to its own file.
The ice_vc_process_vf_msg function should have had its declaration moved to
ice_virtchnl.h then. Fix this.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Now that we no longer depend on the number of VFs being allocated, we can
move the ice_mbx_init_snapshot function earlier. This will be required by
Scalable IOV as we will not be calling ice_sriov_configure for Scalable
VFs.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_mbx_report_malvf function is used to update the
ice_mbx_vf_info.malicious member after we detect a malicious VF. This is
done by calling ice_mbx_report_malvf after ice_mbx_vf_state_handler sets
its "is_malvf" return parameter true.
Instead of requiring two steps, directly update the malicious bit in the
state handler, and remove the need for separately calling
ice_mbx_report_malvf.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_mbx_deinit_snapshot function's only remaining job is to clear the
previous snapshot data. This snapshot data is initialized when SR-IOV adds
VFs, so it is not necessary to clear this data when removing VFs. Since no
allocation occurs we no longer need to free anything and we can safely
remove this function.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice driver has some logic in ice_vf_mbx.c used to detect potentially
malicious VF behavior with regards to overflowing the PF mailbox. This
logic currently stores message counts in struct ice_mbx_vf_counter.vf_cntr
as an array. This array is allocated during initialization with
ice_mbx_init_snapshot.
This logic makes sense for SR-IOV where all VFs are allocated at once up
front. However, in the future with Scalable IOV this logic will not work.
VFs can be added and removed dynamically. We could try to keep the vf_cntr
array for the maximum possible number of VFs, but this is a waste of
memory.
Use the recently introduced struct ice_mbx_vf_info structure to store the
message count. Pass a pointer to the mbx_info for a VF instead of using its
VF ID. Replace the array of VF message counts with a linked list that
tracks all currently active mailbox tracking info structures.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Currently the PF tracks malicious VFs in a malvfs bitmap which is used by
the ice_mbx_clear_malvf and ice_mbx_report_malvf functions. This bitmap is
used to ensure that we only report a VF as malicious once rather than
continuously spamming the event log.
This mechanism of storage for the malicious indication works well enough
for SR-IOV. However, it will not work with Scalable IOV. This is because
Scalable IOV VFs can be allocated dynamically and might change VF ID when
their underlying VSI changes.
To support this, the mailbox overflow logic will need to be refactored.
First, introduce a new ice_mbx_vf_info structure which will be used to
store data about a VF. Embed this structure in the struct ice_vf, and
ensure it gets initialized when a new VF is created.
For now this only stores the malicious indicator bit. Pass a pointer to the
VF's mbx_info structure instead of using a bitmap to keep track of these
bits.
A future change will extend this structure and the rest of the logic
associated with the overflow detection.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The ice_mbx_clear_malvf function checks for a few error conditions before
clearing the appropriate data. These error conditions are really warnings
that should never occur in a properly initialized driver. Every caller of
ice_mbx_clear_malvf just prints a dev_dbg message on failure which will
generally be ignored.
Convert this function to void and switch the error return values to
WARN_ON. This will make any potentially misconfiguration more visible and
makes future refactors that involve changing how we store the malicious VF
data easier.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
A future change is going to refactor the VF mailbox overflow detection
logic, including modifying ice_mbx_reset_snapshot and its callers. To make
this change easier to review, first move the ice_mbx_reset_snapshot
function higher in the ice_vf_mbx.c file.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
If the driver detects during probe that firmware is in recovery
mode then i40e_init_recovery_mode() is called and the rest of
probe function is skipped including pci_set_drvdata(). Subsequent
i40e_shutdown() called during shutdown/reboot dereferences NULL
pointer as pci_get_drvdata() returns NULL.
To fix call pci_set_drvdata() also during entering to recovery mode.
Reproducer:
1) Lets have i40e NIC with firmware in recovery mode
2) Run reboot
Result:
[ 139.084698] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[ 139.090959] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[ 139.108438] i40e 0000:02:00.0: Firmware recovery mode detected. Limiting functionality.
[ 139.116439] i40e 0000:02:00.0: Refer to the Intel(R) Ethernet Adapters and Devices User Guide for details on firmware recovery mode.
[ 139.129499] i40e 0000:02:00.0: fw 8.3.64775 api 1.13 nvm 8.30 0x8000b78d 1.3106.0 [8086:1583] [15d9:084a]
[ 139.215932] i40e 0000:02:00.0 enp2s0f0: renamed from eth0
[ 139.223292] i40e 0000:02:00.1: Firmware recovery mode detected. Limiting functionality.
[ 139.231292] i40e 0000:02:00.1: Refer to the Intel(R) Ethernet Adapters and Devices User Guide for details on firmware recovery mode.
[ 139.244406] i40e 0000:02:00.1: fw 8.3.64775 api 1.13 nvm 8.30 0x8000b78d 1.3106.0 [8086:1583] [15d9:084a]
[ 139.329209] i40e 0000:02:00.1 enp2s0f1: renamed from eth0
...
[ 156.311376] BUG: kernel NULL pointer dereference, address: 00000000000006c2
[ 156.318330] #PF: supervisor write access in kernel mode
[ 156.323546] #PF: error_code(0x0002) - not-present page
[ 156.328679] PGD 0 P4D 0
[ 156.331210] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 156.335567] CPU: 26 PID: 15119 Comm: reboot Tainted: G E 6.2.0+ #1
[ 156.343126] Hardware name: Abacus electric, s.r.o. - [email protected] Super Server/H12SSW-iN, BIOS 2.4 04/13/2022
[ 156.353369] RIP: 0010:i40e_shutdown+0x15/0x130 [i40e]
[ 156.358430] Code: c1 fc ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 fd 53 48 8b 9f 48 01 00 00 <f0> 80 8b c2 06 00 00 04 f0 80 8b c0 06 00 00 08 48 8d bb 08 08 00
[ 156.377168] RSP: 0018:ffffb223c8447d90 EFLAGS: 00010282
[ 156.382384] RAX: ffffffffc073ee70 RBX: 0000000000000000 RCX: 0000000000000001
[ 156.389510] RDX: 0000000080000001 RSI: 0000000000000246 RDI: ffff95db49988000
[ 156.396634] RBP: ffff95db49988000 R08: ffffffffffffffff R09: ffffffff8bd17d40
[ 156.403759] R10: 0000000000000001 R11: ffffffff8a5e3d28 R12: ffff95db49988000
[ 156.410882] R13: ffffffff89a6fe17 R14: ffff95db49988150 R15: 0000000000000000
[ 156.418007] FS: 00007fe7c0cc3980(0000) GS:ffff95ea8ee80000(0000) knlGS:0000000000000000
[ 156.426083] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 156.431819] CR2: 00000000000006c2 CR3: 00000003092fc005 CR4: 0000000000770ee0
[ 156.438944] PKRU: 55555554
[ 156.441647] Call Trace:
[ 156.444096] <TASK>
[ 156.446199] pci_device_shutdown+0x38/0x60
[ 156.450297] device_shutdown+0x163/0x210
[ 156.454215] kernel_restart+0x12/0x70
[ 156.457872] __do_sys_reboot+0x1ab/0x230
[ 156.461789] ? vfs_writev+0xa6/0x1a0
[ 156.465362] ? __pfx_file_free_rcu+0x10/0x10
[ 156.469635] ? __call_rcu_common.constprop.85+0x109/0x5a0
[ 156.475034] do_syscall_64+0x3e/0x90
[ 156.478611] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 156.483658] RIP: 0033:0x7fe7bff37ab7
Fixes: 4ff0ee1af016 ("i40e: Introduce recovery mode support")
Signed-off-by: Ivan Vecera <[email protected]>
Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-07 (igc)
This series contains updates to igc driver only.
Muhammad adds tracking and reporting of QBV config errors.
Tan Tee adds support for configuring max SDU for each Tx queue.
Sasha removes check for alternate media as only one media type is
supported.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igc: Clean up and optimize watchdog task
igc: offload queue max SDU from tc-taprio
igc: Add qbv_config_change_errors counter
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Documentation/bpf/bpf_devel_QA.rst
b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This patch adds multi-buffer support for the i40e_driver.
i40e_clean_rx_irq() is modified to collate all the buffers of a packet
before calling the XDP program. xdp_buff is built for the first frag of
the packet and subsequent frags are added to it. 'next_to_process' is
incremented for all non-EOP frags while 'next_to_clean' stays at the
first descriptor of the packet. XDP program is called only on receiving
EOP frag.
New functions are added for adding frags to xdp_buff and for post
processing of the buffers once the xdp prog has run. For XDP_PASS this
results in a skb with multiple fragments.
i40e_build_skb() builds the skb around xdp buffer that already contains
frags data. So i40e_add_rx_frag() helper function is now removed. Since
fields before 'dataref' in skb_shared_info are cleared during
napi_skb_build(), xdp_update_skb_shared_info() is called to set those.
For i40e_construct_skb(), all the frags data needs to be copied from
xdp_buffer's shared_skb_info to newly constructed skb's shared_skb_info.
This also means 'skb' does not need to be preserved across i40e_napi_poll()
calls and hence is removed from i40e_ring structure.
Previously i40e_alloc_rx_buffers() was called for every 32 cleaned
buffers. For multi-buffers this may not be optimal as there may be more
cleaned buffers in each i40e_clean_rx_irq() call. So this is now called
when at least half of the ring size has been cleaned.
Signed-off-by: Tirthendu Sarkar <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Store xdp_buff on Rx ring struct in preparation for XDP multi-buffer
support. This will allow us to combine fragmented frames across
separate NAPI cycles in the same way as currently skb fragments are
handled. This means that skb pointer on Rx ring will become redundant
and will be removed in a later patch. As a consequence i40e_trace() now
uses xdp instead of skb pointer.
Truesize only needs to be calculated for page sizes bigger than 4k as it
is always half-page for 4k pages. With xdp_buff on ring, frame size can
now be set during xdp_init_buff() and need not be repopulated in each
NAPI call for 4k pages. As a consequence i40e_rx_frame_truesize() is now
used only for bigger pages.
Signed-off-by: Tirthendu Sarkar <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add a new field called next_to_process in the i40e_ring that is
advanced for every buffer and change the semantics of next_to_clean to
point to the first buffer of a packet. Driver will use next_to_process
in the same way next_to_clean was used previously.
For the non multi-buffer case, next_to_process and next_to_clean will
always be the same since each packet consists of a single buffer.
Signed-off-by: Tirthendu Sarkar <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In skb path truesize is calculated while building skb. This is now
avoided and xdp->frame_is used instead for both i40e_build_skb() and
i40e_construct_skb().
Signed-off-by: Tirthendu Sarkar <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Truesize is now passed directly to i40e_rx_buffer_flip() instead of size
so that it does not need to recalculate truesize from size using
i40e_rx_frame_truesize() before adjusting page offset.
With these change the function can now be used during skb building and
adding frags. In later patches it will also be easier for adjusting
page offsets for multi-buffers.
Signed-off-by: Tirthendu Sarkar <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Page count of rx_buffer needs to be stored prior to XDP call to prevent
page recycling in case that buffer would be freed within xdp redirect
path. Instead of storing it on the stack, now it is stored in the
rx_buffer struct. This will help in processing multi-buffers as the page
counts of all rx_buffers (of the same packet) don't need to be stored on
stack.
Signed-off-by: Tirthendu Sarkar <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Adding support for XDP multi-buffer entails adding information of all
the fragments of the packet in the xdp_buff. This approach implies that
underlying buffer has to provide tailroom for skb_shared_info.
In the legacy-rx mode, driver can only configure up to 2k sized Rx buffers
and with the current configuration of 2k sized Rx buffers there is no way
to do tailroom reservation for skb_shared_info. Hence size of Rx buffers
is now lowered to 2048 - sizeof(skb_shared_info). Also, driver can only
chain up to 5 Rx buffers and this means max MTU supported for legacy-rx
is now 8614 (5 * rx_buffer_len - ETH header with VLAN).
Signed-off-by: Tirthendu Sarkar <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Introduce new helper function to calculate max frame size for validating
and setting of vsi frame size. This is used while configuring vsi,
changing the MTU and attaching an XDP program to the vsi.
This is in preparation of the legacy rx and multi-buffer changes to be
introduced in later patches.
Signed-off-by: Tirthendu Sarkar <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Acked-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Acked-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Acked-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Acked-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Acked-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Acked-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Acked-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
<linux/aer.h> is unused, so remove it.
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: [email protected]
Acked-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
i225/i226 parts used only one media type copper. The copper media type is
not replaceable. Clean up the code accordingly, and remove the obsolete
media replacement and reset options.
Signed-off-by: Sasha Neftin <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add support for configuring the max SDU for each Tx queue.
If not specified, keep the default.
Signed-off-by: Tan Tee Min <[email protected]>
Signed-off-by: Muhammad Husaini Zulkifli <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add ConfigChangeError(qbv_config_change_errors) when user try to set the
AdminBaseTime to past value while the current GCL is still running.
The ConfigChangeError counter should not be increased when a gate control
list is scheduled into the future.
User can use "ethtool -S <interface> | grep qbv_config_change_errors"
command to check the counter values.
Signed-off-by: Muhammad Husaini Zulkifli <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
With older compilers like gcc-9, the calculation of the vlan
priority field causes a false-positive warning from the byteswap:
In file included from drivers/net/ethernet/intel/ice/ice_tc_lib.c:4:
drivers/net/ethernet/intel/ice/ice_tc_lib.c: In function 'ice_parse_cls_flower':
include/uapi/linux/swab.h:15:15: error: integer overflow in expression '(int)(short unsigned int)((int)match.key-><U67c8>.<U6698>.vlan_priority << 13) & 57344 & 255' of type 'int' results in '0' [-Werror=overflow]
15 | (((__u16)(x) & (__u16)0x00ffU) << 8) | \
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:106:2: note: in expansion of macro '___constant_swab16'
106 | ___constant_swab16(x) : \
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/little_endian.h:42:43: note: in expansion of macro '__swab16'
42 | #define __cpu_to_be16(x) ((__force __be16)__swab16((x)))
| ^~~~~~~~
include/linux/byteorder/generic.h:96:21: note: in expansion of macro '__cpu_to_be16'
96 | #define cpu_to_be16 __cpu_to_be16
| ^~~~~~~~~~~~~
drivers/net/ethernet/intel/ice/ice_tc_lib.c:1458:5: note: in expansion of macro 'cpu_to_be16'
1458 | cpu_to_be16((match.key->vlan_priority <<
| ^~~~~~~~~~~
After a change to be16_encode_bits(), the code becomes more
readable to both people and compilers, which avoids the warning.
Fixes: 34800178b302 ("ice: Add support for VLAN priority filters in switchdev")
Suggested-by: Alexander Lobakin <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
There were few smatch warnings reported by Dan:
- ice_vsi_cfg_xdp_txqs can return 0 instead of ret, which is cleaner
- return values in ice_vsi_cfg_def were ignored
- in ice_vsi_rebuild return value was ignored in case rebuild failed,
it was a never reached code, however, rewrite it for clarity.
- ice_vsi_cfg_tc can return 0 instead of ret
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|