Age | Commit message (Collapse) | Author | Files | Lines |
|
When creating the TLV to send to the FW for configuring DSCP mode PFC,the
PFCENABLE field was being masked with a 4 bit mask (0xF), but this is an 8
bit bitmask for enabled classes for PFC. This means that traffic classes
4-7 could not be enabled for PFC.
Remove the mask completely, as it is not necessary, as we are assigning 8
bits to an 8 bit field.
Fixes: 2a87bd73e50d ("ice: Add DSCP support")
Signed-off-by: Dave Ertman <[email protected]>
Signed-off-by: Karen Ostrowska <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
ice_get_module_eeprom() is broken since commit e9c9692c8a81 ("ice:
Reimplement module reads used by ethtool") In this refactor,
ice_get_module_eeprom() reads the eeprom in blocks of size 8.
But the condition that should protect the buffer overflow
ignores the last block. The last block always contains zeros.
Bug uncovered by ethtool upstream commit 9538f384b535
("netlink: eeprom: Defer page requests to individual parsers")
After this commit, ethtool reads a block with length = 1;
to read the SFF-8024 identifier value.
unpatched driver:
$ ethtool -m enp65s0f0np0 offset 0x90 length 8
Offset Values
------ ------
0x0090: 00 00 00 00 00 00 00 00
$ ethtool -m enp65s0f0np0 offset 0x90 length 12
Offset Values
------ ------
0x0090: 00 00 01 a0 4d 65 6c 6c 00 00 00 00
$
$ ethtool -m enp65s0f0np0
Offset Values
------ ------
0x0000: 11 06 06 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 00
0x0070: 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
patched driver:
$ ethtool -m enp65s0f0np0 offset 0x90 length 8
Offset Values
------ ------
0x0090: 00 00 01 a0 4d 65 6c 6c
$ ethtool -m enp65s0f0np0 offset 0x90 length 12
Offset Values
------ ------
0x0090: 00 00 01 a0 4d 65 6c 6c 61 6e 6f 78
$ ethtool -m enp65s0f0np0
Identifier : 0x11 (QSFP28)
Extended identifier : 0x00
Extended identifier description : 1.5W max. Power consumption
Extended identifier description : No CDR in TX, No CDR in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x23 (No separable connector)
Transceiver codes : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 40G Ethernet: 40G Base-CR4
Transceiver type : 25G Ethernet: 25G Base-CR CA-N
Encoding : 0x05 (64B/66B)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 0km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 1m
Transmitter technology : 0xa0 (Copper cable unequalized)
Attenuation at 2.5GHz : 4db
Attenuation at 5.0GHz : 5db
Attenuation at 7.0GHz : 7db
Attenuation at 12.9GHz : 10db
........
....
Fixes: e9c9692c8a81 ("ice: Reimplement module reads used by ethtool")
Signed-off-by: Petr Oros <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Tested-by: Jesse Brandeburg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
CONFIG_ICE_GNSS was added by commit c7ef8221ca7d ("ice: use GNSS subsystem
instead of TTY") as a way to allow the ice driver to optionally support
GNSS features without forcing a dependency on CONFIG_GNSS.
The original implementation of that commit at [1] used IS_REACHABLE. This
was rejected by Olek at [2] with the suggested implementation of
CONFIG_ICE_GNSS.
Eventually after merging, Linus reported a .config which had
CONFIG_ICE_GNSS = y when both GNSS = n and ICE = n. This confused him and
he felt that the config option was not useful, and commented about it at
[3].
CONFIG_ICE_GNSS is defined to y whenever GNSS = ICE. This results in it
being set in cases where both options are not enabled.
The goal of CONFIG_ICE_GNSS is to ensure that the GNSS support in the ice
driver is enabled when GNSS is enabled.
The complaint from Olek about the original IS_REACHABLE was due to the
required IS_REACHABLE checks throughout the ice driver code and the fact
that ice_gnss.c was compiled regardless of GNSS support.
This can be fixed in the Makefile by using ice-$(CONFIG_GNSS) += ice_gnss.o
In this case, if GNSS = m and ICE = y, we can result in some confusing
behavior where GNSS support is not enabled because its not built in. See
[4].
To disallow this, have CONFIG_ICE depend on GNSS || GNSS = n. This ensures
that we cannot enable CONFIG_ICE as builtin while GNSS is a module.
Drop CONFIG_ICE_GNSS, and replace the IS_ENABLED checks for it with
checks for GNSS. Update the Makefile to add the ice_gnss.o object based on
CONFIG_GNSS.
This works to ensure that GNSS support can optionally be enabled, doesn't
have an unnnecessary extra config option, and has Kbuild enforce the
dependency such that you can't accidentally enable GNSS as a module and ICE
as a builtin.
[1] https://lore.kernel.org/intel-wired-lan/[email protected]/
[2] https://lore.kernel.org/intel-wired-lan/[email protected]/
[3] https://lore.kernel.org/all/CAHk-=wi_410KZqHwF-WL5U7QYxnpHHHNP-3xL=g_y89XnKc-uw@mail.gmail.com/
[4] https://lore.kernel.org/netdev/[email protected]/
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Fixes: c7ef8221ca7d ("ice: use GNSS subsystem instead of TTY")
Cc: Arkadiusz Kubalewski <[email protected]>
Cc: Alexander Lobakin <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Anthony Nguyen <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-02-17
We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).
The main changes are:
1) Add a rbtree data structure following the "next-gen data structure"
precedent set by recently-added linked-list, that is, by using
kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.
2) Add a new benchmark for hashmap lookups to BPF selftests,
from Anton Protopopov.
3) Fix bpf_fib_lookup to only return valid neighbors and add an option
to skip the neigh table lookup, from Martin KaFai Lau.
4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
accouting for container environments, from Yafang Shao.
5) Batch of ice multi-buffer and driver performance fixes,
from Alexander Lobakin.
6) Fix a bug in determining whether global subprog's argument is
PTR_TO_CTX, which is based on type names which breaks kprobe progs,
from Andrii Nakryiko.
7) Prep work for future -mcpu=v4 LLVM option which includes usage of
BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
from Eduard Zingerman.
8) More prep work for later building selftests with Memory Sanitizer
in order to detect usages of undefined memory, from Ilya Leoshkevich.
9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
dereference via sendmsg(), from Maciej Fijalkowski.
10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.
11) Fix BPF memory allocator in combination with BPF hashtab where it could
corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.
12) Fix LoongArch BPF JIT to always use 4 instructions for function
address so that instruction sequences don't change between passes,
from Hengqi Chen.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
selftests/bpf: Add bpf_fib_lookup test
bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
riscv, bpf: Add bpf trampoline support for RV64
riscv, bpf: Add bpf_arch_text_poke support for RV64
riscv, bpf: Factor out emit_call for kernel and bpf context
riscv: Extend patch_text for multiple instructions
Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
selftests/bpf: Add global subprog context passing tests
selftests/bpf: Convert test_global_funcs test to test_loader framework
bpf: Fix global subprog context argument resolution logic
LoongArch, bpf: Use 4 instructions for function address in JIT
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
bpf: Disable bh in bpf_test_run for xdp and tc prog
xsk: check IFF_UP earlier in Tx path
Fix typos in selftest/bpf files
selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Devlink reload patchset introduced regression. ICE_VSI_LB wasn't
taken into account when doing default allocation. Fix it by adding a
case for ICE_VSI_LB in ice_vsi_alloc_def().
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Reported-by: Maciej Fijalkowski <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some of the devlink bits were tricky, but I think I got it right.
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-02-14 (ice)
This series contains updates to ice driver only.
Karol extends support for GPIO pins to E823 devices.
Daniel Vacek stops processing of PTP packets when link is down.
Pawel adds support for BIG TCP for IPv6.
Tony changes return type of ice_vsi_realloc_stat_arrays() as it always
returns success.
Zhu Yanjun updates kdoc stating supported TLVs.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Mention CEE DCBX in code comment
ice: Change ice_vsi_realloc_stat_arrays() to void
ice: add support BIG TCP on IPv6
ice/ptp: fix the PTP worker retrying indefinitely if the link went down
ice: Add GPIO pin support for E823 products
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Commit a97f8783a937 ("igb: unbreak I2C bit-banging on i350") introduced
code to change I2C settings to bit banging unconditionally.
However, this patch introduced a regression: On an Intel S2600CWR
Server Board with three NICs:
- 1x dual-port copper
Intel I350 Gigabit Network Connection [8086:1521] (rev 01)
fw 1.63, 0x80000dda
- 2x quad-port SFP+ with copper SFP Avago ABCU-5700RZ
Intel I350 Gigabit Fiber Network Connection [8086:1522] (rev 01)
fw 1.52.0
the SFP NICs no longer get link at all. Reverting commit a97f8783a937
or switching to the Intel out-of-tree driver both fix the problem.
Per the igb out-of-tree driver, I2C bit banging on i350 depends on
support for an external thermal sensor (ETS). However, commit
a97f8783a937 added bit banging unconditionally. Additionally, the
out-of-tree driver always calls init_thermal_sensor_thresh on probe,
while our driver only calls init_thermal_sensor_thresh only in
igb_reset(), and only if an ETS is present, ignoring the internal
thermal sensor. The affected SFPs don't provide an ETS. Per Intel,
the behaviour is a result of i350 firmware requirements.
This patch fixes the problem by aligning the behaviour to the
out-of-tree driver:
- split igb_init_i2c() into two functions:
- igb_init_i2c() only performs the basic I2C initialization.
- igb_set_i2c_bb() makes sure that E1000_CTRL_I2C_ENA is set
and enables bit-banging.
- igb_probe() only calls igb_set_i2c_bb() if an ETS is present.
- igb_probe() calls init_thermal_sensor_thresh() unconditionally.
- igb_reset() aligns its behaviour to igb_probe(), i. e., call
igb_set_i2c_bb() if an ETS is present and call
init_thermal_sensor_thresh() unconditionally.
Fixes: a97f8783a937 ("igb: unbreak I2C bit-banging on i350")
Tested-by: Mateusz Palczewski <[email protected]>
Co-developed-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Jamie Bainbridge <[email protected]>
Signed-off-by: Corinna Vinschen <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-02-14 (ixgbe, i40e)
This series contains updates to ixgbe and i40e drivers.
Jason Xing corrects comparison of frame sizes for setting MTU with XDP on
ixgbe and adjusts frame size to account for a second VLAN header on ixgbe
and i40e.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ixgbe: add double of VLAN header when computing the max MTU
i40e: add double of VLAN header when computing the max MTU
ixgbe: allow to increase MTU to 3K with XDP enabled
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Now ice driver supports xdp multi-buffer so add it to xdp_features.
Check vsi type before setting xdp_features flag.
Signed-off-by: Lorenzo Bianconi <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/8a4781511ab6e3cd280e944eef69158954f1a15f.1676385351.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Set xdp_features flag just for I40E_VSI_MAIN vsi type since XDP is
supported just in this configuration.
Signed-off-by: Lorenzo Bianconi <[email protected]>
Link: https://lore.kernel.org/r/f2b537f86b34fc176fbc6b3d249b46a20a87a2f3.1676405131.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Fix handling of the tsync interrupt to compare the pin number with
IGB_N_SDP instead of IGB_N_EXTTS/IGB_N_PEROUT and fix the indexing to
the perout array.
Fixes: cf99c1dd7b77 ("igb: move PEROUT and EXTTS isr logic to separate functions")
Reported-by: Matt Corallo <[email protected]>
Signed-off-by: Miroslav Lichvar <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
From the function ice_parse_org_tlv, CEE DCBX TLV is also supported.
So update the comment. Or else, it is confusing.
Signed-off-by: Zhu Yanjun <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
smatch reports:
smatch warnings:
drivers/net/ethernet/intel/ice/ice_lib.c:3612 ice_vsi_rebuild() warn: missing error code 'ret'
If an error is encountered for ice_vsi_realloc_stat_arrays(), ret is not
assigned an error value so the goto error path would return success. The
function, however, only returns 0 so an error will never be reported; due
to this, change the function to return void.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
|
|
Enable sending BIG TCP packets on IPv6 in the ice driver using generic
ipv6_hopopt_jumbo_remove helper for stripping HBH header.
Tested:
netperf -t TCP_RR -H 2001:db8:0:f101::1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,TRANSACTION_RATE
Tested on two different setups. In both cases, the following settings were
applied after loading the changed driver:
ip link set dev enp175s0f1np1 gso_max_size 130000
ip link set dev enp175s0f1np1 gro_max_size 130000
ip link set dev enp175s0f1np1 mtu 9000
First setup:
Before:
Minimum 90th 99th Transaction
Latency Percentile Percentile Rate
Microseconds Latency Latency Tran/s
Microseconds Microseconds
134 279 410 3961.584
After:
Minimum 90th 99th Transaction
Latency Percentile Percentile Rate
Microseconds Latency Latency Tran/s
Microseconds Microseconds
135 178 216 6093.404
The other setup:
Before:
Minimum 90th 99th Transaction
Latency Percentile Percentile Rate
Microseconds Latency Latency Tran/s
Microseconds Microseconds
218 414 478 2944.765
After:
Minimum 90th 99th Transaction
Latency Percentile Percentile Rate
Microseconds Latency Latency Tran/s
Microseconds Microseconds
146 238 266 4700.596
Signed-off-by: Pawel Chmielewski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
When the link goes down the ice_ptp_tx_tstamp() may loop re-trying to
process the packets till the 2 seconds timeout finally drops them.
In such a case it makes sense to just drop them right away.
Signed-off-by: Daniel Vacek <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add GPIO pin setup for E823, which is only 1PPS input and output.
Signed-off-by: Karol Kolacinski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.
Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.
Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
Signed-off-by: Jason Xing <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Recently I encountered one case where I cannot increase the MTU size
directly from 1500 to a much bigger value with XDP enabled if the
server is equipped with IXGBE card, which happened on thousands of
servers in production environment. After applying the current patch,
we can set the maximum MTU size to 3K.
This patch follows the behavior of changing MTU as i40e/ice does.
References:
[1] commit 23b44513c3e6 ("ice: allow 3k MTU for XDP")
[2] commit 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
There was a problem reported to us where the addition of a VF with an IPv6
address ending with a particular sequence would cause the parent device on
the PF to no longer be able to respond to neighbor discovery packets.
In this case, we had an ovs-bridge device living on top of a VLAN, which
was on top of a PF, and it would not be able to talk anymore (the neighbor
entry would expire and couldn't be restored).
The root cause of the issue is that if the PF is asked to be in IFF_PROMISC
mode (promiscuous mode) and it had an ipv6 address that needed the
33:33:ff:00:00:04 multicast address to work, then when the VF was added
with the need for the same multicast address, the VF would steal all the
traffic destined for that address.
The ice driver didn't auto-subscribe a request of IFF_PROMISC to the
"multicast replication from other port's traffic" meaning that it won't get
for instance, packets with an exact destination in the VF, as above.
The VF's IPv6 address, which adds a "perfect filter" for 33:33:ff:00:00:04,
results in no packets for that multicast address making it to the PF (which
is in promisc but NOT "multicast replication").
The fix is to enable "multicast promiscuous" whenever the driver is asked
to enable IFF_PROMISC, and make sure to disable it when appropriate.
Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Jesse Brandeburg <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
After the recent mbuf changes, ice_xmit_xdp_ring() became a 3-liner.
It makes no sense to keep it global in a different file than its caller.
Move it just next to the sole call site and mark static. Also, it
doesn't need a full xdp_convert_frame_to_buff(). Save several cycles
and fill only the fields used by __ice_xmit_xdp_ring() later on.
Finally, since it doesn't modify @xdpf anyhow, mark the argument const
to save some more (whole -11 bytes of .text! :D).
Thanks to 1 jump less and less calcs as well, this yields as many as
6.7 Mpps per queue. `xdp.data_hard_start = xdpf` is fully intentional
again (see xdp_convert_buff_to_frame()) and just works when there are
no source device's driver issues.
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
As already mentioned, freeing any &xdp_frame via page_frag_free() is
wrong, as it assumes the frame is backed by either an order-0 page or
a page with no "patrons" behind them, while in fact frames backed by
Page Pool can be redirected to a device, which's driver doesn't use it.
Keep storing a pointer to the raw buffer and then freeing it
unconditionally via page_frag_free() for %XDP_TX frames, but introduce
a separate type in the enum for frames coming through .ndo_xdp_xmit(),
and free them via xdp_return_frame_bulk(). Note that saving xdpf as
xdp_buff->data_hard_start is intentional and is always true when
everything is configured properly.
After this change, %XDP_REDIRECT from a Page Pool based driver to ice
becomes zero-alloc as it should be and horrendous 3.3 Mpps / queue
turn into 6.6, hehe.
Let it go with no "Fixes:" tag as it spans across good 5+ commits and
can't be trivially backported.
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
When queueing frames from a Page Pool for redirecting to a device backed
by the ice driver, `perf top` shows heavy load on page_alloc() and
page_frag_free(), despite that on a properly working system it must be
fully or at least almost zero-alloc. The problem is in fact a bit deeper
and raises from how ice cleans up completed Tx buffers.
The story so far: when cleaning/freeing the resources related to
a particular completed Tx frame (skbs, DMA mappings etc.), ice uses some
heuristics only without setting any type explicitly (except for dummy
Flow Director packets, which are marked via ice_tx_buf::tx_flags).
This kinda works, but only up to some point. For example, currently ice
assumes that each frame coming to __ice_xmit_xdp_ring(), is backed by
either plain order-0 page or plain page frag, while it may also be
backed by Page Pool or any other possible memory models introduced in
future. This means any &xdp_frame must be freed properly via
xdp_return_frame() family with no assumptions.
In order to do that, the whole heuristics must be replaced with setting
the Tx buffer/frame type explicitly, just how it's always been done via
an enum. Let us reuse 16 bits from ::tx_flags -- 1 bit-and instr won't
hurt much -- especially given that sometimes there was a check for
%ICE_TX_FLAGS_DUMMY_PKT, which is now turned from a flag to an enum
member. The rest of the changes is straightforward and most of it is
just a conversion to rely now on the type set in &ice_tx_buf rather than
to some secondary properties.
For now, no functional changes intended, the change only prepares the
ground for starting freeing XDP frames properly next step. And it must
be done atomically/synchronously to not break stuff.
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The tagged commit started sending %XDP_TX frames from XSk Rx ring
directly without converting it to an &xdp_frame. However, when XSk is
enabled on a queue pair, it has its separate Tx cleaning functions, so
neither ice_clean_xdp_irq() nor ice_unmap_and_free_tx_buf() ever happens
there.
Remove impossible branches in order to reduce the diffstat of the
upcoming change.
Fixes: a24b4c6e9aab ("ice: xsk: Do not convert to buff to frame for XDP_TX")
Suggested-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Sometimes, under heavy XDP Tx traffic, e.g. when using XDP traffic
generator (%BPF_F_TEST_XDP_LIVE_FRAMES), the machine can catch OOM due
to the driver not freeing all of the pages passed to it by
.ndo_xdp_xmit().
Turned out that during the development of the tagged commit, the check,
which ensures that we have a free descriptor to queue a frame, moved
into the branch happening only when a buffer has frags. Otherwise, we
only run a cleaning cycle, but don't check anything.
ATST, there can be situations when the driver gets new frames to send,
but there are no buffers that can be cleaned/completed and the ring has
no free slots. It's very rare, but still possible (> 6.5 Mpps per ring).
The driver then fills the next buffer/descriptor, effectively
overwriting the data, which still needs to be freed.
Restore the check after the cleaning routine to make sure there is a
slot to queue a new frame. When there are frags, there still will be a
separate check that we can place all of them, but if the ring is full,
there's no point in wasting any more time.
(minor: make `!ready_frames` unlikely since it happens ~1-2 times per
billion of frames)
Fixes: 3246a10752a7 ("ice: Add support for XDP multi-buffer on Tx side")
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
xdp_tx_active is used to indicate whether an XDP ring has any %XDP_TX
frames queued to shortcut processing Tx cleaning for XSk-enabled queues.
When !XSk, it simply indicates whether the ring has any queued frames in
general.
It gets increased on each frame placed onto the ring and counts the
whole frame, not each frag. However, currently it gets decremented in
ice_clean_xdp_tx_buf(), which is called per each buffer, i.e. per each
frag. Thus, on completing multi-frag frames, an underflow happens.
Move the decrement to the outer function and do it once per frame, not
buf. Also, do that on the stack and update the ring counter after the
loop is done to save several cycles.
XSk rings are fine since there are no frags at the moment.
Fixes: 3246a10752a7 ("ice: Add support for XDP multi-buffer on Tx side")
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Currently checks for weight and priority ranges don't check incoming value
from the devlink. Instead it checks node current weight or priority. This
makes those checks useless.
Change range checks in ice_set_object_tx_priority() and
ice_set_object_tx_weight() to check against incoming priority an weight.
Fixes: 42c2eb6b1f43 ("ice: Implement devlink-rate API")
Signed-off-by: Michal Wilczynski <[email protected]>
Acked-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The result of nlmsg_find_attr() 'br_spec' is dereferenced in
nla_for_each_nested(), but it can take NULL value in nla_find() function,
which will result in an error.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops")
Signed-off-by: Natalia Petrova <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-02-09 (i40e)
Jan removes i40e_status from the driver; replacing them with standard
kernel error codes.
Kees Cook replaces 0-length array with flexible array.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
net/i40e: Replace 0-length array with flexible array
i40e: use ERR_PTR error print in i40e messages
i40e: use int for i40e_status
i40e: Remove string printing for i40e_status
i40e: Remove unused i40e status codes
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Incrementation of xsk_frames inside the for-loop produces
infinite loop, if we have both normal AF_XDP-TX and XDP_TXed
buffers to complete.
Split xsk_frames into 2 variables (xsk_frames and completed_frames)
to eliminate this bug.
Fixes: 29322791bc8b ("ice: xsk: change batched Tx descriptor cleaning")
Acked-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Larysa Zaremba <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Acked-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
====================
pull-request: bpf-next 2023-02-11
We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).
There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e533d01 ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:
[...]
ice_set_netdev_features(netdev);
netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
NETDEV_XDP_ACT_XSK_ZEROCOPY;
ice_set_ops(netdev);
[...]
Stephen's merge conflict mail:
https://lore.kernel.org/bpf/[email protected]/
The main changes are:
1) Add support for BPF trampoline on s390x which finally allows to remove many
test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.
2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.
3) Add capability to export the XDP features supported by the NIC.
Along with that, add a XDP compliance test tool,
from Lorenzo Bianconi & Marek Majtyka.
4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
from David Vernet.
5) Add a deep dive documentation about the verifier's register
liveness tracking algorithm, from Eduard Zingerman.
6) Fix and follow-up cleanups for resolve_btfids to be compiled
as a host program to avoid cross compile issues,
from Jiri Olsa & Ian Rogers.
7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
when testing on different NICs, from Jesper Dangaard Brouer.
8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.
9) Extend libbpf to add an option for when the perf buffer should
wake up, from Jon Doron.
10) Follow-up fix on xdp_metadata selftest to just consume on TX
completion, from Stanislav Fomichev.
11) Extend the kfuncs.rst document with description on kfunc
lifecycle & stability expectations, from David Vernet.
12) Fix bpftool prog profile to skip attaching to offline CPUs,
from Tonghao Zhang.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
net/devlink/leftover.c / net/core/devlink.c:
565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
f05bd8ebeb69 ("devlink: move code to a dedicated directory")
687125b5799c ("devlink: split out core code")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Zero-length arrays are deprecated[1]. Replace struct i40e_lump_tracking's
"list" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:
In function 'i40e_put_lump',
inlined from 'i40e_clear_interrupt_scheme' at drivers/net/ethernet/intel/i40e/i40e_main.c:5145:2:
drivers/net/ethernet/intel/i40e/i40e_main.c:278:27: warning: array subscript <unknown> is outside array bounds of 'u16[0]' {aka 'short unsigned int[]'} [-Warray-bounds=]
278 | pile->list[i] = 0;
| ~~~~~~~~~~^~~
drivers/net/ethernet/intel/i40e/i40e.h: In function 'i40e_clear_interrupt_scheme':
drivers/net/ethernet/intel/i40e/i40e.h:179:13: note: while referencing 'list'
179 | u16 list[0];
| ^~~~
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In i40e_status removal patches, i40e_status conversion
to strings was removed in order to easily refactor
the code to use standard errornums. This however made it
more difficult for read error logs.
Use %pe formatter to print error messages in human-readable
format.
Signed-off-by: Jan Sokolowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
To prepare for removal of i40e_status, change the variables
from i40e_status to int. This eases the transition when values
are changed to return standard int error codes over enum i40e_status.
As such changes often also change variable orders, a cleanup
is also applied here to make variables conform to RCT and
some lines are also reformatted where applicable.
Signed-off-by: Jan Sokolowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Remove the i40e_stat_str() function which prints the string
representation of the i40e_status error code. With upcoming changes
moving away from i40e_status, there will be no need for this function
Signed-off-by: Jan Sokolowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In an effort to remove i40e status codes and replace them
with standard kernel errornums, unused values of i40e_status_code
were removed.
Signed-off-by: Jan Sokolowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Current taprio software implementation is haunted by the shadow of the
igb/igc hardware model. It iterates over child qdiscs in increasing
order of TXQ index, therefore giving higher xmit priority to TXQ 0 and
lower to TXQ N. According to discussions with Vinicius, that is the
default (perhaps even unchangeable) prioritization scheme used for the
NICs that taprio was first written for (igb, igc), and we have a case of
two bugs canceling out, resulting in a functional setup on igb/igc, but
a less sane one on other NICs.
To the best of my understanding, taprio should prioritize based on the
traffic class, so it should really dequeue starting with the highest
traffic class and going down from there. We get to the TXQ using the
tc_to_txq[] netdev property.
TXQs within the same TC have the same (strict) priority, so we should
pick from them as fairly as we can. We can achieve that by implementing
something very similar to q->curband from multiq_dequeue().
Since igb/igc really do have TXQ 0 of higher hardware priority than
TXQ 1 etc, we need to preserve the behavior for them as well. We really
have no choice, because in txtime-assist mode, taprio is essentially a
software scheduler towards offloaded child tc-etf qdiscs, so the TXQ
selection really does matter (not all igb TXQs support ETF/SO_TXTIME,
says Kurt Kanzenbach).
To preserve the behavior, we need a capability bit so that taprio can
determine if it's running on igb/igc, or on something else. Because igb
doesn't offload taprio at all, we can't piggyback on the
qdisc_offload_query_caps() call from taprio_enable_offload(), but
instead we need a separate call which is also made for software
scheduling.
Introduce two static keys to minimize the performance penalty on systems
which only have igb/igc NICs, and on systems which only have other NICs.
For mixed systems, taprio will have to dynamically check whether to
dequeue using one prioritization algorithm or using the other.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-02-06 (ice)
This series contains updates to ice driver only.
Ani removes WQ_MEM_RECLAIM flag from workqueue to resolve
check_flush_dependency warning.
Michal fixes KASAN out-of-bounds warning.
Brett corrects behaviour for port VLAN Rx filters to prevent receiving
of unintended traffic.
Dan Carpenter fixes possible off by one issue.
Zhang Changzhong adjusts error path for switch recipe to prevent memory
leak.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: switch: fix potential memleak in ice_add_adv_recipe()
ice: Fix off by one in ice_tc_forward_to_queue()
ice: Fix disabling Rx VLAN filtering with port VLAN enabled
ice: fix out-of-bounds KASAN warning in virtchnl
ice: Do not use WQ_MEM_RECLAIM flag for workqueue
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
On some platforms, 100/1000/2500 speeds seem to have sometimes problems
reporting false positive tx unit hang during stressful UDP traffic. Likely
other Intel drivers introduce responses to a tx hang. Update the 'tx hang'
comparator with the comparison of the head and tail of ring pointers and
restore the tx_timeout_factor to the previous value (one).
This can be test by using netperf or iperf3 applications.
Example:
iperf3 -s -p 5001
iperf3 -c 192.168.0.2 --udp -p 5001 --time 600 -b 0
netserver -p 16604
netperf -H 192.168.0.2 -l 600 -p 16604 -t UDP_STREAM -- -m 64000
Fixes: b27b8dc77b5e ("igc: Increase timeout value for Speed 100/1000/2500")
Signed-off-by: Sasha Neftin <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When ice_add_special_words() fails, the 'rm' is not released, which will
lead to a memory leak. Fix this up by going to 'err_unroll' label.
Compile tested only.
Fixes: 8b032a55c1bd ("ice: low level support for tunnels")
Signed-off-by: Zhang Changzhong <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
The > comparison should be >= to prevent reading one element beyond
the end of the array.
The "vsi->num_rxq" is not strictly speaking the number of elements in
the vsi->rxq_map[] array. The array has "vsi->alloc_rxq" elements and
"vsi->num_rxq" is less than or equal to the number of elements in the
array. The array is allocated in ice_vsi_alloc_arrays(). It's still
an off by one but it might not access outside the end of the array.
Fixes: 143b86f346c7 ("ice: Enable RX queue selection using skbedit action")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Amritha Nambiar <[email protected]>
Tested-by: Bharathi Sreenivas <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
If the user turns on the vf-true-promiscuous-support flag, then Rx VLAN
filtering will be disabled if the VF requests to enable promiscuous
mode. When the VF is in a port VLAN, this is the incorrect behavior
because it will allow the VF to receive traffic outside of its port VLAN
domain. Fortunately this only resulted in the VF(s) receiving broadcast
traffic outside of the VLAN domain because all of the VLAN promiscuous
rules are based on the port VLAN ID. Fix this by setting the
.disable_rx_filtering VLAN op to a no-op when a port VLAN is enabled on
the VF.
Also, make sure to make this fix for both Single VLAN Mode and Double
VLAN Mode enabled devices.
Fixes: c31af68a1b94 ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations")
Signed-off-by: Brett Creeley <[email protected]>
Signed-off-by: Karen Ostrowska <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
KASAN reported:
[ 9793.708867] BUG: KASAN: global-out-of-bounds in ice_get_link_speed+0x16/0x30 [ice]
[ 9793.709205] Read of size 4 at addr ffffffffc1271b1c by task kworker/6:1/402
[ 9793.709222] CPU: 6 PID: 402 Comm: kworker/6:1 Kdump: loaded Tainted: G B OE 6.1.0+ #3
[ 9793.709235] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018
[ 9793.709245] Workqueue: ice ice_service_task [ice]
[ 9793.709575] Call Trace:
[ 9793.709582] <TASK>
[ 9793.709588] dump_stack_lvl+0x44/0x5c
[ 9793.709613] print_report+0x17f/0x47b
[ 9793.709632] ? __cpuidle_text_end+0x5/0x5
[ 9793.709653] ? ice_get_link_speed+0x16/0x30 [ice]
[ 9793.709986] ? ice_get_link_speed+0x16/0x30 [ice]
[ 9793.710317] kasan_report+0xb7/0x140
[ 9793.710335] ? ice_get_link_speed+0x16/0x30 [ice]
[ 9793.710673] ice_get_link_speed+0x16/0x30 [ice]
[ 9793.711006] ice_vc_notify_vf_link_state+0x14c/0x160 [ice]
[ 9793.711351] ? ice_vc_repr_cfg_promiscuous_mode+0x120/0x120 [ice]
[ 9793.711698] ice_vc_process_vf_msg+0x7a7/0xc00 [ice]
[ 9793.712074] __ice_clean_ctrlq+0x98f/0xd20 [ice]
[ 9793.712534] ? ice_bridge_setlink+0x410/0x410 [ice]
[ 9793.712979] ? __request_module+0x320/0x520
[ 9793.713014] ? ice_process_vflr_event+0x27/0x130 [ice]
[ 9793.713489] ice_service_task+0x11cf/0x1950 [ice]
[ 9793.713948] ? io_schedule_timeout+0xb0/0xb0
[ 9793.713972] process_one_work+0x3d0/0x6a0
[ 9793.714003] worker_thread+0x8a/0x610
[ 9793.714031] ? process_one_work+0x6a0/0x6a0
[ 9793.714049] kthread+0x164/0x1a0
[ 9793.714071] ? kthread_complete_and_exit+0x20/0x20
[ 9793.714100] ret_from_fork+0x1f/0x30
[ 9793.714137] </TASK>
[ 9793.714151] The buggy address belongs to the variable:
[ 9793.714158] ice_aq_to_link_speed+0x3c/0xffffffffffff3520 [ice]
[ 9793.714632] Memory state around the buggy address:
[ 9793.714642] ffffffffc1271a00: f9 f9 f9 f9 00 00 05 f9 f9 f9 f9 f9 00 00 02 f9
[ 9793.714656] ffffffffc1271a80: f9 f9 f9 f9 00 00 04 f9 f9 f9 f9 f9 00 00 00 00
[ 9793.714670] >ffffffffc1271b00: 00 00 00 04 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
[ 9793.714680] ^
[ 9793.714690] ffffffffc1271b80: 00 00 00 00 00 04 f9 f9 f9 f9 f9 f9 00 00 00 00
[ 9793.714704] ffffffffc1271c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
The ICE_AQ_LINK_SPEED_UNKNOWN define is BIT(15). The value is bigger
than both legacy and normal link speed tables. Add one element (0 -
unknown) to both tables. There is no need to explicitly set table size,
leave it empty.
Fixes: 1d0e28a9be1f ("ice: Remove and replace ice speed defines with ethtool.h versions")
Signed-off-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
When both ice and the irdma driver are loaded, a warning in
check_flush_dependency is being triggered. This is due to ice driver
workqueue being allocated with the WQ_MEM_RECLAIM flag and the irdma one
is not.
According to kernel documentation, this flag should be set if the
workqueue will be involved in the kernel's memory reclamation flow.
Since it is not, there is no need for the ice driver's WQ to have this
flag set so remove it.
Example trace:
[ +0.000004] workqueue: WQ_MEM_RECLAIM ice:ice_service_task [ice] is flushing !WQ_MEM_RECLAIM infiniband:0x0
[ +0.000139] WARNING: CPU: 0 PID: 728 at kernel/workqueue.c:2632 check_flush_dependency+0x178/0x1a0
[ +0.000011] Modules linked in: bonding tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_cha
in_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat intel_rapl_msr intel
_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1
0dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_
core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support ipmi_ssif irdma mei_me ib_uverbs
ib_core intel_uncore joydev pcspkr i2c_i801 acpi_ipmi mei lpc_ich i2c_smbus intel_pch_thermal ioatdma ipmi_si acpi_power_meter
acpi_pad xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ahci ixgbe libahci ice i40e igb crc32c_intel mdio i2c_algo_bit liba
ta dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
[ +0.000161] [last unloaded: bonding]
[ +0.000006] CPU: 0 PID: 728 Comm: kworker/0:2 Tainted: G S 6.2.0-rc2_next-queue-13jan-00458-gc20aabd57164 #1
[ +0.000006] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[ +0.000003] Workqueue: ice ice_service_task [ice]
[ +0.000127] RIP: 0010:check_flush_dependency+0x178/0x1a0
[ +0.000005] Code: 89 8e 02 01 e8 49 3d 40 00 49 8b 55 18 48 8d 8d d0 00 00 00 48 8d b3 d0 00 00 00 4d 89 e0 48 c7 c7 e0 3b 08
9f e8 bb d3 07 01 <0f> 0b e9 be fe ff ff 80 3d 24 89 8e 02 00 0f 85 6b ff ff ff e9 06
[ +0.000004] RSP: 0018:ffff88810a39f990 EFLAGS: 00010282
[ +0.000005] RAX: 0000000000000000 RBX: ffff888141bc2400 RCX: 0000000000000000
[ +0.000004] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffa1213a80
[ +0.000003] RBP: ffff888194bf3400 R08: ffffed117b306112 R09: ffffed117b306112
[ +0.000003] R10: ffff888bd983088b R11: ffffed117b306111 R12: 0000000000000000
[ +0.000003] R13: ffff888111f84d00 R14: ffff88810a3943ac R15: ffff888194bf3400
[ +0.000004] FS: 0000000000000000(0000) GS:ffff888bd9800000(0000) knlGS:0000000000000000
[ +0.000003] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000003] CR2: 000056035b208b60 CR3: 000000017795e005 CR4: 00000000007706f0
[ +0.000003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.000003] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.000002] PKRU: 55555554
[ +0.000003] Call Trace:
[ +0.000002] <TASK>
[ +0.000003] __flush_workqueue+0x203/0x840
[ +0.000006] ? mutex_unlock+0x84/0xd0
[ +0.000008] ? __pfx_mutex_unlock+0x10/0x10
[ +0.000004] ? __pfx___flush_workqueue+0x10/0x10
[ +0.000006] ? mutex_lock+0xa3/0xf0
[ +0.000005] ib_cache_cleanup_one+0x39/0x190 [ib_core]
[ +0.000174] __ib_unregister_device+0x84/0xf0 [ib_core]
[ +0.000094] ib_unregister_device+0x25/0x30 [ib_core]
[ +0.000093] irdma_ib_unregister_device+0x97/0xc0 [irdma]
[ +0.000064] ? __pfx_irdma_ib_unregister_device+0x10/0x10 [irdma]
[ +0.000059] ? up_write+0x5c/0x90
[ +0.000005] irdma_remove+0x36/0x90 [irdma]
[ +0.000062] auxiliary_bus_remove+0x32/0x50
[ +0.000007] device_release_driver_internal+0xfa/0x1c0
[ +0.000005] bus_remove_device+0x18a/0x260
[ +0.000007] device_del+0x2e5/0x650
[ +0.000005] ? __pfx_device_del+0x10/0x10
[ +0.000003] ? mutex_unlock+0x84/0xd0
[ +0.000004] ? __pfx_mutex_unlock+0x10/0x10
[ +0.000004] ? _raw_spin_unlock+0x18/0x40
[ +0.000005] ice_unplug_aux_dev+0x52/0x70 [ice]
[ +0.000160] ice_service_task+0x1309/0x14f0 [ice]
[ +0.000134] ? __pfx___schedule+0x10/0x10
[ +0.000006] process_one_work+0x3b1/0x6c0
[ +0.000008] worker_thread+0x69/0x670
[ +0.000005] ? __kthread_parkme+0xec/0x110
[ +0.000007] ? __pfx_worker_thread+0x10/0x10
[ +0.000005] kthread+0x17f/0x1b0
[ +0.000005] ? __pfx_kthread+0x10/0x10
[ +0.000004] ret_from_fork+0x29/0x50
[ +0.000009] </TASK>
Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt")
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Signed-off-by: Marcin Szycik <[email protected]>
Tested-by: Jakub Andrysiak <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
|
|
The dev_lan_addr and hw_lan_addr members of ice_vf are used only to store
the MAC address for the VF. They are defined using virtchnl_ether_addr, but
only the .addr sub-member is actually used. Drop the use of
virtchnl_ether_addr and just use a u8 array of length [ETH_ALEN].
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The Scalable IOV implementation will require notifying the VDCM driver when
an IRQ must be closed. This allows the VDCM to handle releasing stale IRQ
context values and properly reconfigure.
To handle this, introduce a new optional .irq_close callback to the VF
operations structure. This will be implemented by Scalable IOV to handle
the shutdown of the IRQ context.
Since the SR-IOV implementation does not need this, we must check that its
non-NULL before calling it.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
When hardware is reset, the VF relies on the VFGEN_RSTAT register to detect
when the VF is finished resetting. This is a tri-state register where 0
indicates a reset is in progress, 1 indicates the hardware is done
resetting, and 2 indicates that the software is done resetting.
Currently the PF driver relies on the device hardware resetting VFGEN_RSTAT
when a global reset occurs. This works ok, but it does mean that the VF
might not immediately notice a reset when the driver first detects that the
global reset is occurring.
This is also problematic for Scalable IOV, because there is no read/write
equivalent VFGEN_RSTAT register for the Scalable VSI type. Instead, the
Scalable IOV VFs will need to emulate this register.
To support this, introduce a new VF operation, clear_reset_state, which is
called when the PF driver first detects a global reset. The Single Root IOV
implementation can just write to VFGEN_RSTAT to ensure it's cleared
immediately, without waiting for the actual hardware reset to begin. The
Scalable IOV implementation will use this as part of its tracking of the
reset status to allow properly reporting the emulated VFGEN_RSTAT to the VF
driver.
Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The .vsi_rebuild function exists for ice_reset_vf. It is used to release
and re-create the VSI during a single-VF reset.
This function is only called when we need to re-create the VSI, and not
when rebuilding an existing VSI. This makes the single-VF reset process
different from the process used to restore functionality after a
hardware reset such as the PF reset or EMP reset.
When we add support for Scalable IOV VFs, the implementation will be very
similar. The primary difference will be in the fact that each VF type uses
a different underlying VSI type in hardware.
Move the common functionality into a new ice_vf_recreate VSI function. This
will allow the two IOV paths to share this functionality. Rework the
.vsi_rebuild vf_op into .create_vsi, only performing the task of creating a
new VSI.
This creates a nice dichotomy between the ice_vf_rebuild_vsi and
ice_vf_recreate_vsi, and should make it more clear why the two flows atre
distinct.
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|