Age | Commit message (Collapse) | Author | Files | Lines |
|
Add ability to change speed through ethtool -s <interface> speed <speed in Mb>
Driver advertises all link modes that support requested speed.
Autoneg must be set off e.g.:
ethtool -s <interface> autoneg off speed <speed in Mb>.
Add helper function that translate speed in Mb to
enum i40e_aq_link_speed and compare it to supported speeds from
given ethtool_link_ksettings. Add in i40e_set_link_ksettings hold
for requested speed and set copy_ks.base.speed to safe_ks.base.speed
to be sure that user is not changing unsupported setting.
In i40e_speed_to_link_speed compare requested speed with supported
speeds. Set speed to requested speed.
Signed-off-by: Norbert Zulinski <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add the capability to map non-linear xdp frames in XDP_TX and ndo_xdp_xmit
callback.
Tested-by: Sarkar Tirthendu <[email protected]>
Signed-off-by: Lorenzo Bianconi <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After PF reset and ethtool -t there was call trace in dmesg
sometimes leading to panic. When there was some time, around 5
seconds, between reset and test there were no errors.
Problem was that pf reset calls i40e_vsi_close in prep_for_reset
and ethtool -t calls i40e_vsi_close in diag_test. If there was not
enough time between those commands the second i40e_vsi_close starts
before previous i40e_vsi_close was done which leads to crash.
Add check to diag_test if pf is in reset and don't start offline
tests if it is true.
Add netif_info("testing failed") into unhappy path of i40e_diag_test()
Fixes: e17bc411aea8 ("i40e: Disable offline diagnostics if VFs are enabled")
Fixes: 510efb2682b3 ("i40e: Fix ethtool offline diagnostic with netqueues")
Signed-off-by: Michal Jaron <[email protected]>
Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
If ADQ is enabled for a VF, then actual number of queue pair
is a number of currently available traffic classes for this VF.
Without this change the configuration of the Rx/Tx queues
fails with error.
Fixes: d29e0d233e0d ("i40e: missing input validation on VF message handling by the PF")
Signed-off-by: Grzegorz Szczurek <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Bharathi Sreenivas <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Procedure of configure tc flower filters erroneously allows to create
filters on TC0 where unfiltered packets are also directed by default.
Issue was caused by insufficient checks of hw_tc parameter specifying
the hardware traffic class to pass matching packets to.
Fix checking hw_tc parameter which blocks creation of filters on TC0.
Fixes: 2f4b411a3d67 ("i40e: Enable cloud filters via tc-flower")
Signed-off-by: Grzegorz Szczurek <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Bharathi Sreenivas <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
VFs by default are able to see all tagged traffic regardless of trust
and VLAN filters configured.
Add new private flag vf-vlan-pruning that allows changing of default
VF behavior for tagged traffic. When the flag is turned on
untrusted VF will only be able to receive untagged traffic
or traffic with VLAN tags it has created interfaces for
The flag is off by default and can only be changed if
there are no VFs spawned on the PF. This flag will only be effective
when no PVID is set on VF and VF is not trusted.
Add new function that computes the correct VLAN ID for VF VLAN filters
based on trust, PVID, vf-vlan-prune-disable flag and current VLAN ID.
Testing Hints:
Test 1: vf-vlan-pruning == off
==============================
1. Set the private flag
> ethtool --set-priv-flag eth0 vf-vlan-pruning off (default setting)
2. Use scapy to send any VLAN tagged traffic and make sure the VF
receives all VLAN tagged traffic that matches its destination MAC
filters (unicast, multicast, and broadcast).
Test 2: vf-vlan-pruning == on
==============================
1. Set the private flag
> ethtool --set-priv-flag eth0 vf-vlan-pruning on
2. Use scapy to send any VLAN tagged traffic and make sure the VF does
not receive any VLAN tagged traffic that matches its destination MAC
filters (unicast, multicast, and broadcast).
3. Add a VLAN filter on the VF netdev
> ip link add link eth0v0 name vlan10 type vlan id 10
4. Bring the VLAN netdev up
> ip link set vlan10 up
4. Use scapy to send traffic with VLAN 10, VLAN 11 (anything not VLAN
10), and untagged traffic. Make sure the VF only receives VLAN 10
and untagged traffic when the link partner is sending.
Test 3: vf-vlan-pruning == off && VF is in a port VLAN
==============================
1. Set the private flag
> ethtool --set-priv-flag eth0 vf-vlan-pruning off (default setting)
2. Create a VF
> echo 1 > sriov_numvfs
3. Put the VF in a port VLAN
> ip link set eth0 vf 0 vlan 10
4. Use scapy to send traffic with VLAN 10 and VLAN 11 (anything not VLAN
10) and make sure the VF only receives untagged traffic when the link
partner is sending VLAN 10 tagged traffic as the VLAN tag is expected
to be stripped by HW for port VLANs and not visible to the VF.
Test 4: Change vf-vlan-pruning while VFs are created
==============================
echo 0 > sriov_numvfs
ethtool --set-priv-flag eth0 vf-vlan-pruning off
echo 1 > sriov_numvfs
ethtool --set-priv-flag eth0 vf-vlan-pruning on (expect failure)
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Przemyslaw Patynowski <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
No conflicts.
Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The bug is here:
ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err);
The list iterator 'ch' will point to a bogus position containing
HEAD if the list is empty or no element is found. This case must
be checked before any use of the iterator, otherwise it will
lead to a invalid memory access.
To fix this bug, use a new variable 'iter' as the list iterator,
while use the origin variable 'ch' as a dedicated pointer to
point to the found element.
Cc: [email protected]
Fixes: 1d8d80b4e4ff6 ("i40e: Add macvlan support on i40e")
Signed-off-by: Xiaomeng Tong <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-04-27
We've added 85 non-merge commits during the last 18 day(s) which contain
a total of 163 files changed, 4499 insertions(+), 1521 deletions(-).
The main changes are:
1) Teach libbpf to enhance BPF verifier log with human-readable and relevant
information about failed CO-RE relocations, from Andrii Nakryiko.
2) Add typed pointer support in BPF maps and enable it for unreferenced pointers
(via probe read) and referenced ones that can be passed to in-kernel helpers,
from Kumar Kartikeya Dwivedi.
3) Improve xsk to break NAPI loop when rx queue gets full to allow for forward
progress to consume descriptors, from Maciej Fijalkowski & Björn Töpel.
4) Fix a small RCU read-side race in BPF_PROG_RUN routines which dereferenced
the effective prog array before the rcu_read_lock, from Stanislav Fomichev.
5) Implement BPF atomic operations for RV64 JIT, and add libbpf parsing logic
for USDT arguments under riscv{32,64}, from Pu Lehui.
6) Implement libbpf parsing of USDT arguments under aarch64, from Alan Maguire.
7) Enable bpftool build for musl and remove nftw with FTW_ACTIONRETVAL usage
so it can be shipped under Alpine which is musl-based, from Dominique Martinet.
8) Clean up {sk,task,inode} local storage trace RCU handling as they do not
need to use call_rcu_tasks_trace() barrier, from KP Singh.
9) Improve libbpf API documentation and fix error return handling of various
API functions, from Grant Seltzer.
10) Enlarge offset check for bpf_skb_{load,store}_bytes() helpers given data
length of frags + frag_list may surpass old offset limit, from Liu Jian.
11) Various improvements to prog_tests in area of logging, test execution
and by-name subtest selection, from Mykola Lysenko.
12) Simplify map_btf_id generation for all map types by moving this process
to build time with help of resolve_btfids infra, from Menglong Dong.
13) Fix a libbpf bug in probing when falling back to legacy bpf_probe_read*()
helpers; the probing caused always to use old helpers, from Runqing Yang.
14) Add support for ARCompact and ARCv2 platforms for libbpf's PT_REGS
tracing macros, from Vladimir Isaev.
15) Cleanup BPF selftests to remove old & unneeded rlimit code given kernel
switched to memcg-based memory accouting a while ago, from Yafang Shao.
16) Refactor of BPF sysctl handlers to move them to BPF core, from Yan Zhu.
17) Fix BPF selftests in two occasions to work around regressions caused by latest
LLVM to unblock CI until their fixes are worked out, from Yonghong Song.
18) Misc cleanups all over the place, from various others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add libbpf's log fixup logic selftests
libbpf: Fix up verifier log for unguarded failed CO-RE relos
libbpf: Simplify bpf_core_parse_spec() signature
libbpf: Refactor CO-RE relo human description formatting routine
libbpf: Record subprog-resolved CO-RE relocations unconditionally
selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests
libbpf: Avoid joining .BTF.ext data with BPF programs by section name
libbpf: Fix logic for finding matching program for CO-RE relocation
libbpf: Drop unhelpful "program too large" guess
libbpf: Fix anonymous type check in CO-RE logic
bpf: Compute map_btf_id during build time
selftests/bpf: Add test for strict BTF type check
selftests/bpf: Add verifier tests for kptr
selftests/bpf: Add C tests for kptr
libbpf: Add kptr type tag macros to bpf_helpers.h
bpf: Make BTF type match stricter for release arguments
bpf: Teach verifier about kptr_get kfunc helpers
bpf: Wire up freeing of referenced kptr
bpf: Populate pairs of btf_id and destructor kfunc in btf
bpf: Adapt copy_map_value for multiple offset case
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Intel drivers translate actions returned from XDP programs to their own
return codes that have the following mapping:
XDP_REDIRECT -> I40E_XDP_{REDIR,CONSUMED}
XDP_TX -> I40E_XDP_{TX,CONSUMED}
XDP_DROP -> I40E_XDP_CONSUMED
XDP_ABORTED -> I40E_XDP_CONSUMED
XDP_PASS -> I40E_XDP_PASS
Commit b8aef650e549 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx
queue gets full") introduced new translation
XDP_REDIRECT -> I40E_XDP_EXIT
which is set when XSK RQ gets full and to indicate that driver should
stop further Rx processing. This happens for unsuccessful
xdp_do_redirect() so it is valuable to call trace_xdp_exception() for
this case. In order to avoid I40E_XDP_EXIT -> IXGBE_XDP_CONSUMED
overwrite, XDP_DROP case was moved above which in turn made the
'fallthrough' that is in XDP_ABORTED useless as it became the last label
in the switch statement.
Simply drop this leftover.
Fixes: b8aef650e549 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO
return code as the case that XSK is not in the bound state. Returning
same code from ndo_xsk_wakeup can be misleading and simply makes it
harder to follow what is going on.
Change ENXIOs in i40e's ndo_xsk_wakeup() implementation to EINVALs, so
that when probing it is clear that something is wrong on the driver
side, not the xsk_{recv,send}msg.
There is a -ENETDOWN that can happen from both kernel/driver sides
though, but I don't have a correct replacement for this on one of the
sides, so let's keep it that way.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was
returned from xdp_do_redirect() with a XSK Rx queue being full. In such
case, terminate the Rx processing that is being done on the current HW
Rx ring and let the user space consume descriptors from XSK Rx queue so
that there is room that driver can use later on.
Introduce new internal return code I40E_XDP_EXIT that will indicate case
described above.
Note that it does not affect Tx processing that is bound to the same
NAPI context, nor the other Rx rings.
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add support for Ethernet Connection X722 for 10GbE SFP+ cards.
Make possible for the driver to bind to the card.
Signed-off-by: Przemyslaw Patynowski <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add vsi.tx_restart to the i40e driver ethtool statistics
Signed-off-by: Nabil S. Alramli <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Track TX queue stop events and export the new stat with ethtool.
Signed-off-by: Joe Damato <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
This change adds support for TSO of MPLS packets.
In my tests with tcpdump it seems to work. Note this test setup has
a 9000 byte MTU:
MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234:
Flags [P.], seq 593345:644401, ack 0, win 420,
options [nop,nop,TS val 45022534 ecr 1722291395], length 51056
IP dstip.1234 > srcip.50086: Flags [.], ack 593345, win 122,
options [nop,nop,TS val 1722291395 ecr 45022534], length 0
IP dstip.1234 > srcip.50086: Flags [.], ack 602289, win 105,
options [nop,nop,TS val 1722291395 ecr 45022534], length 0
IP dstip.1234 > srcip.50086: Flags [.], ack 620177, win 71,
options [nop,nop,TS val 1722291395 ecr 45022534], length 0
MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234:
Flags [P.], seq 644401:655953, ack 0, win 420,
options [nop,nop,TS val 45022534 ecr 1722291395], length 11552
IP dstip.1234 > srcip.50086: Flags [.], ack 638065, win 37,
options [nop,nop,TS val 1722291395 ecr 45022534], length 0
IP dstip.1234 > srcip.50086: Flags [.], ack 644401, win 25,
options [nop,nop,TS val 1722291395 ecr 45022534], length 0
IP dstip.1234 > srcip.50086: Flags [.], ack 653345, win 8,
options [nop,nop,TS val 1722291395 ecr 45022534], length 0
IP dstip.1234 > srcip.50086: Flags [.], ack 655953, win 3,
options [nop,nop,TS val 1722291395 ecr 45022534], length 0
Signed-off-by: Joe Damato <[email protected]>
Co-developed-by: Mike Gallo <[email protected]>
Signed-off-by: Mike Gallo <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
The calculation of the checksum can fail.
So move converting the checksum to little endian
to inside the return status check.
Signed-off-by: Tom Rix <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
|
|
net/dsa/dsa2.c
commit afb3cc1a397d ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails")
commit e83d56537859 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master")
https://lore.kernel.org/all/[email protected]/
drivers/net/ethernet/intel/ice/ice.h
commit 97b0129146b1 ("ice: Fix error with handling of bonding MTU")
commit 43113ff73453 ("ice: add TTY for GNSS module for E810T device")
https://lore.kernel.org/all/[email protected]/
drivers/staging/gdm724x/gdm_lte.c
commit fc7f750dc9d1 ("staging: gdm724x: fix use after free in gdm_lte_rx()")
commit 4bcc4249b4cf ("staging: Use netif_rx().")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The i40e_vc_send_msg_to_vf_ex (and its wrapper i40e_vc_send_msg_to_vf)
function has logic to detect "failure" responses sent to the VF. If a VF
is sent more than I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED, then the VF is
marked as disabled. In either case, a dev_info message is printed
stating that a VF opcode failed.
This logic originates from the early implementation of VF support in
commit 5c3c48ac6bf5 ("i40e: implement virtual device interface").
That commit did not go far enough. The "logic" for this behavior seems
to be that error responses somehow indicate a malicious VF. This is not
really true. The PF might be sending an error for any number of reasons
such as lacking resources, an unsupported operation, etc. This does not
indicate a malicious VF. We already have a separate robust malicious VF
detection which relies on hardware logic to detect and prevent a variety
of behaviors.
There is no justification for this behavior in the original
implementation. In fact, a later commit 18b7af57d9c1 ("i40e: Lower some
message levels") reduced the opcode failure message from a dev_err to a
dev_info. In addition, recent commit 01cbf50877e6 ("i40e: Fix to not
show opcode msg on unsuccessful VF MAC change") changed the logic to
allow quieting it for expected failures.
That commit prevented this logic from kicking in for specific
circumstances. This change did not go far enough. The behavior is not
documented nor is it part of any requirement for our products. Other
operating systems such as the FreeBSD implementation of our driver do
not include this logic.
It is clear this check does not make sense, and causes problems which
led to ugly workarounds.
Fix this by just removing the entire logic and the need for the
i40e_vc_send_msg_to_vf_ex function.
Fixes: 01cbf50877e6 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change")
Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface")
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
tools/testing/selftests/net/mptcp/mptcp_join.sh
34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers")
857898eb4b28 ("selftests: mptcp: add missing join check")
6ef84b1517e0 ("selftests: mptcp: more robust signal race test")
https://lore.kernel.org/all/[email protected]/
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c
fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions")
c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information")
09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr")
84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr")
efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow")
3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Revert of a patch that instead of fixing a AQ error when trying
to reset BW limit introduced several regressions related to
creation and managing TC. Currently there are errors when creating
a TC on both PF and VF.
Error log:
[17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391
[17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783259] i40e 0000:3b:00.1: Unable to configure TC map 0 for VSI 391
This reverts commit 3d2504663c41104b4359a15f35670cfa82de1bbf.
Fixes: 3d2504663c41 (i40e: Fix reset bw limit when DCB enabled with 1 TC)
Signed-off-by: Mateusz Palczewski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The 'if (ntu == rx_ring->count)' block in i40e_alloc_rx_buffers_zc()
was previously residing in the loop, but after introducing the
batched interface it is used only to wrap-around the NTU descriptor,
thus no more need to assign 'xdp'.
'cleaned_count' in i40e_clean_rx_irq_zc() was previously being
incremented in the loop, but after commit f12738b6ec06
("i40e: remove unnecessary cleaned_count updates") it gets
assigned only once after it, so the initialization can be dropped.
Fixes: 6aab0bb0c5cd ("i40e: Use the xsk batched rx allocation interface")
Fixes: f12738b6ec06 ("i40e: remove unnecessary cleaned_count updates")
Signed-off-by: Alexander Lobakin <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-02-09
We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).
The main changes are:
1) Add custom BPF allocator for JITs that pack multiple programs into a huge
page to reduce iTLB pressure, from Song Liu.
2) Add __user tagging support in vmlinux BTF and utilize it from BPF
verifier when generating loads, from Yonghong Song.
3) Add per-socket fast path check guarding from cgroup/BPF overhead when
used by only some sockets, from Pavel Begunkov.
4) Continued libbpf deprecation work of APIs/features and removal of their
usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
and various others.
5) Improve BPF instruction set documentation by adding byte swap
instructions and cleaning up load/store section, from Christoph Hellwig.
6) Switch BPF preload infra to light skeleton and remove libbpf dependency
from it, from Alexei Starovoitov.
7) Fix architecture-agnostic macros in libbpf for accessing syscall
arguments from BPF progs for non-x86 architectures,
from Ilya Leoshkevich.
8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
of 16-bit field with anonymous zero padding, from Jakub Sitnicki.
9) Add new bpf_copy_from_user_task() helper to read memory from a different
task than current. Add ability to create sleepable BPF iterator progs,
from Kenny Yu.
10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.
11) Generate temporary netns names for BPF selftests to avoid naming
collisions, from Hangbin Liu.
12) Implement bpf_core_types_are_compat() with limited recursion for
in-kernel usage, from Matteo Croce.
13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.
14) Misc minor fixes to libbpf and selftests from various folks.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
libbpf: Fix compilation warning due to mismatched printf format
selftests/bpf: Test BPF_KPROBE_SYSCALL macro
libbpf: Add BPF_KPROBE_SYSCALL macro
libbpf: Fix accessing the first syscall argument on s390
libbpf: Fix accessing the first syscall argument on arm64
libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
libbpf: Fix accessing syscall arguments on riscv
libbpf: Fix riscv register names
libbpf: Fix accessing syscall arguments on powerpc
selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
libbpf: Add PT_REGS_SYSCALL_REGS macro
selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
bpf: Fix leftover header->pages in sparc and powerpc code.
libbpf: Fix signedness bug in btf_dump_array_data()
selftests/bpf: Do not export subtest as standalone test
bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In some cases, pages cannot be reused by i40e because the page is busy. Add
a counter for this event.
Busy page count is accessible via ethtool.
Signed-off-by: Joe Damato <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
In some cases, pages can not be reused because they are not associated with
the correct NUMA zone. Knowing how often pages are waived helps users to
understand the interaction between the driver's memory usage and their
system.
Pass rx_stats through to i40e_can_reuse_rx_page to allow tracking when
pages are waived.
The page waive count is accessible via ethtool.
Signed-off-by: Joe Damato <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Add a counter for new page allocations in the i40e RX path. This stat is
accessible with ethtool.
Signed-off-by: Joe Damato <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
rx page reuse was already being tracked by the i40e driver per RX ring.
Aggregate the counts and make them accessible via ethtool.
Signed-off-by: Joe Damato <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Page reuse was being tracked from two locations:
- i40e_reuse_rx_page (via 40e_clean_rx_irq), and
- i40e_alloc_mapped_page
Remove the double count and only count reuse from i40e_alloc_mapped_page
when the page is about to be reused.
Signed-off-by: Joe Damato <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2022-02-03
This series contains updates to the i40e client header file and driver.
Mateusz disables HW TC offload by default.
Joe Damato removes a no longer used statistic.
Jakub Kicinski removes an unused enum from the client header file.
Jedrzej changes some admin queue commands to occur under atomic context
and adds new functions for admin queue MAC VLAN filters to avoid a
potential race that could occur due storing results in a structure that
could be overwritten by the next admin queue call.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There was a race condition in access to hw->aq.asq_last_status
while adding and deleting MAC/VLAN filters causing
incorrect error status to be printed as ERROR OK instead of
the correct error.
Change calls to i40e_aq_add_macvlan in i40e_aqc_add_filters
and i40e_aq_remove_macvlan in i40e_aqc_del_filters
to _v2 versions that return Admin Queue status on the stack
to avoid race conditions in access to hw->aq.asq_last_status.
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
ASQ send command functions are returning only i40e status codes
yet some calling functions also need Admin Queue status
that is stored in hw->aq.asq_last_status. Since hw object
is stored on a heap it introduces a possibility for
a race condition in access to hw if calling function is not
fast enough to read hw->aq.asq_last_status before next
send ASQ command is executed.
Add new _v2 version of i40e_aq_add_macvlan that is using
new _v2 versions of ASQ send command functions and returns
the Admin Queue status on the stack.
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
ASQ send command functions are returning only i40e status codes
yet some calling functions also need Admin Queue status
that is stored in hw->aq.asq_last_status. Since hw object
is stored on a heap it introduces a possibility for
a race condition in access to hw if calling function is not
fast enough to read hw->aq.asq_last_status before next
send ASQ command is executed.
Add new versions of send ASQ command functions that return
Admin Queue status on the stack to avoid race conditions
in access to hw->aq.asq_last_status.
Add new _v2 version of i40e_aq_remove_macvlan that is using
new _v2 versions of ASQ send command functions and returns
the Admin Queue status on the stack.
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Change functions:
- i40e_aq_add_macvlan
- i40e_aq_remove_macvlan
- i40e_aq_delete_element
- i40e_aq_add_vsi
- i40e_aq_update_vsi_params
to explicitly use i40e_asq_send_command_atomic(..., true)
instead of i40e_asq_send_command, as they use mutexes and do some
work in an atomic context.
Without this change setting vlan via netdev will fail with
call trace cased by bug "BUG: scheduling while atomic".
Signed-off-by: Witold Fijalkowski <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
After commit 1a557afc4dd5 ("i40e: Refactor receive routine"),
rx_stats.realloc_count is no longer being incremented, so remove it.
The debugfs string was left, but hardcoded to 0. This is intended to
prevent breaking any existing code / scripts that are parsing debugfs
for i40e.
Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
After loading driver hw-tc-offload is enabled by default.
Change the behaviour of driver to disable hw-tc-offload by default as
this is the expected state. Additionally since this impacts ntuple
feature state change the way of checking NETIF_F_HW_TC flag.
Signed-off-by: Norbert Zulinski <[email protected]>
Signed-off-by: Przemyslaw Patynowski <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Fix the crash in kernel while dereferencing the NULL pointer,
when the driver is unloaded and simultaneously the VSI rings
are being stopped.
The hardware requires 50msec in order to finish RX queues
disable. For this purpose the driver spins in mdelay function
for the operation to be completed.
For example changing number of queues which requires reset would
fail in the following call stack:
1) i40e_prep_for_reset
2) i40e_pf_quiesce_all_vsi
3) i40e_quiesce_vsi
4) i40e_vsi_close
5) i40e_down
6) i40e_vsi_stop_rings
7) i40e_vsi_control_rx -> disable requires the delay of 50msecs
8) continue back in i40e_down function where
i40e_clean_tx_ring(vsi->tx_rings[i]) is going to crash
When the driver was spinning vsi_release called
i40e_vsi_free_arrays where the vsi->tx_rings resources
were freed and the pointer was set to NULL.
Fixes: 5b6d4a7f20b0 ("i40e: Fix crash during removing i40e driver")
Signed-off-by: Slawomir Laba <[email protected]>
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Karen Sornek <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
There was an AQ error I40E_AQ_RC_EINVAL when trying
to reset bw limit as part of bw allocation setup.
This was caused by trying to reset bw limit with
DCB enabled. Bw limit should not be reset when
DCB is enabled. The code was relying on the pf->flags
to check if DCB is enabled but if only 1 TC is available
this flag will not be set even though DCB is enabled.
Add a check for number of TC and if it is 1
don't try to reset bw limit even if pf->flags shows
DCB as disabled.
Fixes: fa38e30ac73f ("i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled")
Suggested-by: Alexander Lobakin <[email protected]> # Flatten the condition
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Tested-by: Imam Hassan Reza Biswas <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
For now, if the XDP prog returns XDP_PASS on XSK, the metadata will
be lost as it doesn't get copied to the skb.
Copy it along with the frame headers. Account its size on skb
allocation, and when copying just treat it as a part of the frame
and do a pull after to "move" it to the "reserved" zone.
net_prefetch() xdp->data_meta and align the copy size to speed-up
memcpy() a little and better match i40e_construct_skb().
Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
Suggested-by: Jesper Dangaard Brouer <[email protected]>
Suggested-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Kiran Bhandare <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
{__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD
+ NET_IP_ALIGN for any skb.
OTOH, i40e_construct_skb_zc() currently allocates and reserves
additional `xdp->data - xdp->data_hard_start`, which is
XDP_PACKET_HEADROOM for XSK frames.
There's no need for that at all as the frame is post-XDP and will
go only to the networking stack core.
Pass the size of the actual data only to __napi_alloc_skb() and
don't reserve anything. This will give enough headroom for stack
processing.
Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Tested-by: Kiran Bhandare <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
As stated in [1], dma_set_mask() with a 64-bit mask never fails if
dev->dma_mask is non-NULL.
So, if it fails, the 32 bits case will also fail for the same reason.
Simplify code and remove some dead code accordingly.
[1]: https://lkml.org/lkml/2021/6/7/398
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Move desc_array from the driver to the pool. The reason behind this is
that we can then reuse this array as a temporary storage for descriptors
in all zero-copy drivers that use the batched interface. This will make
it easier to add batching to more drivers.
i40e is the only driver that has a batched Tx zero-copy
implementation, so no need to touch any other driver.
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Change i40e_update_vsi_stats and struct i40e_vsi to use u64 fields to match
the width of the stats counters in struct i40e_rx_queue_stats.
Update debugfs code to use the correct format specifier for u64.
Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Joe Damato <[email protected]>
Reported-by: kernel test robot <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Fix for failed to init adminq: -53 while VF is resetting via MAC
address changing procedure.
Added sync module to avoid reading deadbeef value in reinit adminq
during software reset.
Without this patch it is possible to trigger VF reset procedure
during reinit adminq. This resulted in an incorrect reading of
value from the AQP registers and generated the -53 error.
Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface")
Signed-off-by: Grzegorz Szczurek <[email protected]>
Signed-off-by: Karen Sornek <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
When XDP was configured on a system with large number of CPUs
and X722 NIC there was a call trace with NULL pointer dereference.
i40e 0000:87:00.0: failed to get tracking for 256 queues for VSI 0 err -12
i40e 0000:87:00.0: setup of MAIN VSI failed
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:i40e_xdp+0xea/0x1b0 [i40e]
Call Trace:
? i40e_reconfig_rss_queues+0x130/0x130 [i40e]
dev_xdp_install+0x61/0xe0
dev_xdp_attach+0x18a/0x4c0
dev_change_xdp_fd+0x1e6/0x220
do_setlink+0x616/0x1030
? ahci_port_stop+0x80/0x80
? ata_qc_issue+0x107/0x1e0
? lock_timer_base+0x61/0x80
? __mod_timer+0x202/0x380
rtnl_setlink+0xe5/0x170
? bpf_lsm_binder_transaction+0x10/0x10
? security_capable+0x36/0x50
rtnetlink_rcv_msg+0x121/0x350
? rtnl_calcit.isra.0+0x100/0x100
netlink_rcv_skb+0x50/0xf0
netlink_unicast+0x1d3/0x2a0
netlink_sendmsg+0x22a/0x440
sock_sendmsg+0x5e/0x60
__sys_sendto+0xf0/0x160
? __sys_getsockname+0x7e/0xc0
? _copy_from_user+0x3c/0x80
? __sys_setsockopt+0xc8/0x1a0
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f83fa7a39e0
This was caused by PF queue pile fragmentation due to
flow director VSI queue being placed right after main VSI.
Because of this main VSI was not able to resize its
queue allocation for XDP resulting in no queues allocated
for main VSI when XDP was turned on.
Fix this by always allocating last queue in PF queue pile
for a flow director VSI.
Fixes: 41c445ff0f48 ("i40e: main driver core")
Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action")
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Tested-by: Kiran Bhandare <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Before this patch VF interface vanished when
maximum queue number was exceeded. Driver tried
to add next queues even if there was not enough
space. PF sent incorrect number of queues to
the VF when there were not enough of them.
Add an additional condition introduced to check
available space in 'qp_pile' before proceeding.
This condition makes it impossible to add queues
if they number is greater than the number resulting
from available space.
Also add the search for free space in PF queue
pair piles.
Without this patch VF interfaces are not seen
when available space for queues has been
exceeded and following logs appears permanently
in dmesg:
"Unable to get VF config (-32)".
"VF 62 failed opcode 3, retval: -5"
"Unable to get VF config due to PF error condition, not retrying"
Fixes: 7daa6bf3294e ("i40e: driver core headers")
Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Jaroslaw Gawin <[email protected]>
Signed-off-by: Slawomir Laba <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Recently simplified i40e_rebuild causes that FW sometimes
is not ready after NVM update, the ping does not return.
Increase the delay in case of EMP reset.
Old delay of 300 ms was introduced for specific cards for 710 series.
Now it works for all the cards and delay was increased.
Fixes: 1fa51a650e1d ("i40e: Add delay after EMP reset for firmware to recover")
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Provide a new interface for affinity hints to provide a separation
between hint and actual affinity change which has become a hidden
property of the current interface
- Fix up the in tree usage of the affinity hint interfaces
Drivers:
- No new irqchip drivers!
- Fix GICv3 redistributor table reservation with RT across kexec
- Fix GICv4.1 redistributor view of the VPE table across kexec
- Add support for extra interrupts on spear-shirq
- Make obtaining some interrupts optional for the Renesas drivers
- Various cleanups and bug fixes"
* tag 'irq-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
irqchip/renesas-intc-irqpin: Use platform_get_irq_optional() to get the interrupt
irqchip/renesas-irqc: Use platform_get_irq_optional() to get the interrupt
irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time
irqchip/ingenic-tcu: Use correctly sized arguments for bit field
irqchip/gic-v2m: Add const to of_device_id
irqchip/imx-gpcv2: Mark imx_gpcv2_instance with __ro_after_init
irqchip/spear-shirq: Add support for IRQ 0..6
irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime
irqchip/gic-v3-its: Postpone LPI pending table freeing and memreserve
irqchip/gic-v3-its: Give the percpu rdist struct its own flags field
net/mlx4: Use irq_update_affinity_hint()
net/mlx5: Use irq_set_affinity_and_hint()
hinic: Use irq_set_affinity_and_hint()
scsi: lpfc: Use irq_set_affinity()
mailbox: Use irq_update_affinity_hint()
ixgbe: Use irq_update_affinity_hint()
be2net: Use irq_update_affinity_hint()
enic: Use irq_update_affinity_hint()
RDMA/irdma: Use irq_update_affinity_hint()
scsi: mpt3sas: Use irq_set_affinity_and_hint()
...
|
|
Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
Signed-off-by: Ingo Molnar <[email protected]>
|