Age | Commit message (Collapse) | Author | Files | Lines |
|
When the action of an xdp program was XDP_TX, lan966x was creating
a xdp_frame and use this one to send the frame back. But it is also
possible to send back the frame without needing a xdp_frame, because
it is possible to send it back using the page.
And then once the frame is transmitted is possible to use directly
page_pool_recycle_direct as lan966x is using page pools.
This would save some CPU usage on this path, which results in higher
number of transmitted frames. Bellow are the statistics:
Frame size: Improvement:
64 ~8%
256 ~11%
512 ~8%
1000 ~0%
1500 ~0%
Signed-off-by: Horatiu Vultur <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Send and complete XSK pool frames within TX NAPI context. NAPI context
is triggered by ndo_xsk_wakeup.
Test results with A53 1.2GHz:
xdpsock txonly copy mode, 64 byte frames:
pps pkts 1.00
tx 284,409 11,398,144
Two CPUs with 100% and 10% utilization.
xdpsock txonly zero-copy mode, 64 byte frames:
pps pkts 1.00
tx 511,929 5,890,368
Two CPUs with 100% and 1% utilization.
xdpsock l2fwd copy mode, 64 byte frames:
pps pkts 1.00
rx 248,985 7,315,885
tx 248,921 7,315,885
Two CPUs with 100% and 10% utilization.
xdpsock l2fwd zero-copy mode, 64 byte frames:
pps pkts 1.00
rx 254,735 3,039,456
tx 254,735 3,039,456
Two CPUs with 100% and 4% utilization.
Packet rate increases and CPU utilization is reduced in both cases.
Signed-off-by: Gerhard Engleder <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add support for XSK zero-copy to RX path. The setup of the XSK pool can
be done at runtime. If the netdev is running, then the queue must be
disabled and enabled during reconfiguration. This can be done easily
with functions introduced in previous commits.
A more important property is that, if the netdev is running, then the
setup of the XSK pool shall not stop the netdev in case of errors. A
broken netdev after a failed XSK pool setup is bad behavior. Therefore,
the allocation and setup of resources during XSK pool setup is done only
before any queue is disabled. Additionally, freeing and later allocation
of resources is eliminated in some cases. Page pool entries are kept for
later use. Two memory models are registered in parallel. As a result,
the XSK pool setup cannot fail during queue reconfiguration.
In contrast to other drivers, XSK pool setup and XDP BPF program setup
are separate actions. XSK pool setup can be done without any XDP BPF
program. The XDP BPF program can be added, removed or changed without
any reconfiguration of the XSK pool.
Test results with A53 1.2GHz:
xdpsock rxdrop copy mode, 64 byte frames:
pps pkts 1.00
rx 856,054 10,625,775
Two CPUs with both 100% utilization.
xdpsock rxdrop zero-copy mode, 64 byte frames:
pps pkts 1.00
rx 889,388 4,615,284
Two CPUs with 100% and 20% utilization.
Packet rate increases and CPU utilization is reduced.
100% CPU load seems to the base load. This load is consumed by ksoftirqd
just for dropping the generated packets without xdpsock running.
Using batch API reduced CPU utilization slightly, but measurements are
not stable enough to provide meaningful numbers.
Signed-off-by: Gerhard Engleder <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The function tsnep_rx_poll() is already pretty long and the skb receive
action can be reused for XSK zero-copy support. Move page based skb
receive to separate function.
Signed-off-by: Gerhard Engleder <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Move queue enable and disable code to separate functions. This way the
activation and deactivation of the queues are defined actions, which can
be used in future execution paths.
This functions will be used for the queue reconfiguration at runtime,
which is necessary for XSK zero-copy support.
Signed-off-by: Gerhard Engleder <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Make initialization of TX and RX queues less dynamic by moving some
initialization from netdev open/close to device probing.
Additionally, move some initialization code to separate functions to
enable future use in other execution paths.
This is done as preparation for queue reconfigure at runtime, which is
necessary for XSK zero-copy support.
Signed-off-by: Gerhard Engleder <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
TX/RX ring size is static and power of 2 to enable compiler to optimize
modulo operation to mask operation. Make this optimization already in
the code and don't rely on the compiler.
CPU utilisation during high packet rate has not changed. So no
performance improvement has been measured. But it is best practice to
prevent modulo operations.
Suggested-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Gerhard Engleder <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
netdev/napi_alloc_frag() may fall back to single page which is smaller
than the requested size.
Add error checking to avoid memory overwritten.
Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Rename mana_refill_rxoob for naming consistency.
And remove some empty lines between function call and error
checking.
Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux
Pull RCU updates from Joel Fernandes:
- Updates and additions to MAINTAINERS files, with Boqun being added to
the RCU entry and Zqiang being added as an RCU reviewer.
I have also transitioned from reviewer to maintainer; however, Paul
will be taking over sending RCU pull-requests for the next merge
window.
- Resolution of hotplug warning in nohz code, achieved by fixing
cpu_is_hotpluggable() through interaction with the nohz subsystem.
Tick dependency modifications by Zqiang, focusing on fixing usage of
the TICK_DEP_BIT_RCU_EXP bitmask.
- Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n
kernels, fixed by Zqiang.
- Improvements to rcu-tasks stall reporting by Neeraj.
- Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for
increased robustness, affecting several components like mac802154,
drbd, vmw_vmci, tracing, and more.
A report by Eric Dumazet showed that the API could be unknowingly
used in an atomic context, so we'd rather make sure they know what
they're asking for by being explicit:
https://lore.kernel.org/all/[email protected]/
- Documentation updates, including corrections to spelling,
clarifications in comments, and improvements to the srcu_size_state
comments.
- Better srcu_struct cache locality for readers, by adjusting the size
of srcu_struct in support of SRCU usage by Christoph Hellwig.
- Teach lockdep to detect deadlocks between srcu_read_lock() vs
synchronize_srcu() contributed by Boqun.
Previously lockdep could not detect such deadlocks, now it can.
- Integration of rcutorture and rcu-related tools, targeted for v6.4
from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis
module parameter, and more
- Miscellaneous changes, various code cleanups and comment improvements
* tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits)
checkpatch: Error out if deprecated RCU API used
mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep()
rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep()
ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access
rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed
rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan()
rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early
rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels
rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check
rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
rcu/trace: use strscpy() to instead of strncpy()
tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
...
|
|
It appears that dpaa_enable_tx_csum() only calls skb_reset_mac_header()
to get to the VLAN header using skb_mac_header().
We can use skb_vlan_eth_hdr() to get to the VLAN header based on
skb->data directly. This avoids spending a few cycles to set
skb->mac_header.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Acked-by: Madalin Bucur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Similar to skb_eth_hdr() introduced in commit 96cc4b69581d ("macvlan: do
not assume mac_header is set in macvlan_broadcast()"), let's introduce a
skb_vlan_eth_hdr() helper which can be used in TX-only code paths to get
to the VLAN header based on skb->data rather than based on the
skb_mac_header(skb).
We also consolidate the drivers that dereference skb->data to go through
this helper.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Smatch complains that:
mtk_wed_hw_add_debugfs() warn: 'dir' is an error pointer or valid
Debugfs checks are generally not supposed to be checked
for errors and it is not necessary here.
fix it by just deleting the dead code.
Signed-off-by: Wang Zhang <[email protected]>
Reviewed-by: Dongliang Mu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
This series lowers the CPU usage of the ice driver when using its
provided /dev/gnss*.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Older chips (like MT7622) use a different bit in ib2 to enable hardware
counter support. Add macros for both and select the appropriate bit.
Fixes: 3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting")
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Daniel Golle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In order to support wireless offloading on MT7981 we need to load the
appropriate firmware. Recognize MT7981 and load mt7981_wo.bin.
Signed-off-by: Daniel Golle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-04-20
1) Dragos Improves RX page pool, and provides some fixes to his previous
series:
1.1) Fix releasing page_pool for striding RQ and legacy RQ nonlinear case
1.2) Hook NAPIs to page pools to gain more performance.
2) From Roi, Some cleanups to TC and eswitch modules.
3) Maher migrates vnic diagnostic counters reporting from debugfs to a
dedicated devlink health reporter
Maher Says:
===========
net/mlx5: Expose vnic diagnostic counters using devlink
Currently, vnic diagnostic counters are exposed through the following
debugfs:
$ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/
cq_overrun
quota_exceeded_command
total_q_under_processor_handle
invalid_command
send_queue_priority_update_flow
nic_receive_steering_discard
The current design does not allow the hypervisor to view the diagnostic
counters of its VFs, in case the VFs get bound to a VM. In other words,
the counters are not exposed for representor interfaces.
Furthermore, the debugfs design is inconvenient future-wise, in case more
counters need to be reported by the driver in the future.
As these counters pertain to vNIC health, it is more appropriate to
utilize the devlink health reporter to expose them.
Thus, this patchest includes the following changes:
* Drop the current vnic diagnostic counters debugfs interface.
* Add a vnic devlink health reporter for PFs/VFs core devices, which
when diagnosed will dump vnic diagnostic counter values that are
queried from FW.
* Add a vnic devlink health reporter for the representor interface, which
serves the same purpose listed in the previous point, in addition to
allowing the hypervisor to view its VFs diagnostic counters, even when
the VFs are bounded to external VMs.
Example of devlink health reporter usage is:
$devlink health diagnose pci/0000:08:00.0 reporter vnic
vNIC env counters:
total_error_queues: 0 send_queue_priority_update_flow: 0
comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
invalid_command: 0 quota_exceeded_command: 0
nic_receive_steering_discard: 0
===========
4) SW steering fixes and improvements
Yevgeny Kliteynik Says:
=======================
These short patch series are just small fixes / improvements for
SW steering:
- Patch 1: Fix dumping of legacy modify_hdr in debug dump to
align to what is expected by parser
- Patch 2: Have separate threshold for ICM sync per ICM type
- Patch 3: Add more info to the steering debug dump - Linux
version and device name
- Patch 4: Keep track of number of buddies that are currently
in use per domain per buddy type
=======================
* tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Update op_mode to op_mod for port selection
net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set()
net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc()
net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn()
net/mlx5e: RX, Hook NAPIs to page pools
net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case
net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ
net/mlx5e: Add vnic devlink health reporter to representors
net/mlx5: Add vnic devlink health reporter to PFs/VFs
Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports"
Revert "net/mlx5: Expose steering dropped packets counter"
net/mlx5: DR, Add memory statistics for domain object
net/mlx5: DR, Add more info in domain dbg dump
net/mlx5: DR, Calculate sync threshold of each pool according to its type
net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2023-04-19
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
Revert "net/mlx5e: Don't use termination table when redundant"
net/mlx5e: Nullify table pointer when failing to create
net/mlx5: Use recovery timeout on sync reset flow
Revert "net/mlx5: Remove "recovery" arg from mlx5_load_one() function"
net/mlx5e: Fix error flow in representor failing to add vport rx rule
net/mlx5: Release tunnel device after tc update skb
net/mlx5: E-switch, Don't destroy indirect table in split rule
net/mlx5: E-switch, Create per vport table based on devlink encap mode
net/mlx5e: Release the label when replacing existing ct entry
net/mlx5e: Don't clone flow post action attributes second time
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
ixgbe: Multiple RSS bugfixes
Joe Damato says:
This series fixes two bugs I stumbled on with ixgbe:
1. The flow hash cannot be set manually with ethool at all. Patch 1/2
addresses this by fixing what appears to be a small bug in set_rxfh in
ixgbe. See the commit message for more details.
2. Once the above patch is applied and the flow hash can be set,
resetting the flow hash to default fails if the number of queues is
greater than the number of queues supported by RSS. Other drivers (like
i40e) will reset the flowhash to use the maximum number of queues
supported by RSS even if the queue count configured is larger. In other
words: some queues will not have packets distributed to them by the RSS
hash if the queue count is too large. Patch 2/2 allows the user to reset
ixgbe to default and the flowhash is set correctly to either the
maximum number of queues supported by RSS or the configured queue count,
whichever is smaller.
I believe this is correct and it mimics the behavior of i40e;
`ethtool -X $iface default` should probably always succeed even if all the
queues cannot be utilized. See the commit message for more details and
examples.
I tested these on an ixgbe system I have access to and they appear to
work as intended, but I would appreciate a review by the experts on this
list :)
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ixgbe: Enable setting RSS table to default values
ixgbe: Allow flow hash to be set via ethtool
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There are two pointers in struct xfrm_dev_offload, *dev, *real_dev.
The *dev points whether bonding interface or real interface, if
bonding IPsec offload is used, it points bonding interface; if not,
it points real interface. And *real_dev always points real interface.
So nfp should always use real_dev instead of dev.
Prior to this change the system becomes unresponsive when offloading
IPsec for a device which is a lower device to a bonding device.
Fixes: 859a497fe80c ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer")
CC: [email protected]
Signed-off-by: Huanhuan Wang <[email protected]>
Acked-by: Simon Horman <[email protected]>
Signed-off-by: Louis Peens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The return value is not initialized on the success path.
Fixes: 901bdff2f529 ("net: fman: Change return type of disable to void")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Madalin Bucur <[email protected]>
Reviewed-by: Sean Anderson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
ARP discovery code has same logic for RX and TX flows, but with
different source and destination fields. Instead of duplicating
same code in mlx5e_ipsec_init_macs, let's refactor.
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There are some flows in which work structure is not allocated at all
and it is needed to be checked prior release of data structure.
general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
CPU: 6 PID: 3486 Comm: kworker/6:0 Not tainted 6.3.0-rc5_for_upstream_debug_2023_04_06_11_01 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: events xfrm_state_gc_task
RIP: 0010:mlx5e_xfrm_free_state+0x177/0x260 [mlx5_core]
Code: c1 ea 03 80 3c 02 00 0f 85 f5 00 00 00 4c 8b a5 08 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 50 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b7 00 00 00 49 8b 7c 24 50 e8 85 7c 09 e0 4c 89
RSP: 0018:ffff888137a8fc50 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff888180398000 RCX: 0000000000000000
RDX: 000000000000000a RSI: ffffffffa1878227 RDI: 0000000000000050
RBP: ffff88812a0c8000 R08: ffff888137a8fb60 R09: 0000000000000000
R10: fffffbfff09aba0c R11: 0000000000000001 R12: 0000000000000000
R13: ffff88812a0c8108 R14: ffffffff84c63480 R15: ffff8881acb63118
FS: 0000000000000000(0000) GS:ffff88881eb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f667e8bc000 CR3: 0000000004693006 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
___xfrm_state_destroy+0x3c8/0x5e0
xfrm_state_gc_task+0xf6/0x140
? ___xfrm_state_destroy+0x5e0/0x5e0
process_one_work+0x7c2/0x1340
? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
? pwq_dec_nr_in_flight+0x230/0x230
? spin_bug+0x1d0/0x1d0
worker_thread+0x59d/0xec0
? __kthread_parkme+0xd9/0x1d0
? process_one_work+0x1340/0x1340
kthread+0x28f/0x330
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core vfio_pci vfio_pci_core vfio_iommu_type1 vfio cuse overlay zram zsmalloc fuse [last unloaded: mlx5_core]
---[ end trace 0000000000000000 ]---
Fixes: 4562116f8a56 ("net/mlx5e: Generalize IPsec work structs")
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fix size argument in memcmp to compare whole IPv6 address.
Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes")
Reviewed-by: Raed Salem <[email protected]>
Reviewed-by: Emeel Hakim <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Addition of new err_xfrm label caused to error messages be overwritten.
Fix it by using proper NL_SET_ERR_MSG_WEAK_MOD macro together with change
in a default message.
Fixes: aa8bd0c9518c ("net/mlx5e: Support IPsec acquire default SA")
Reviewed-by: Raed Salem <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When trying to set IPsec policy block action the following error is
generated:
mlx5_cmd_out_err:803:(pid 3426): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0x8708c3), err(-22)
This error means that drop action is not allowed when modify action is
set, so update the code to skip modify header for XFRM_POLICY_BLOCK action.
Fixes: 6721239672fe ("net/mlx5e: Skip IPsec encryption for TX path without matching policy")
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The system hang because of dsa_tag_8021q_port_setup()->
stmmac_vlan_rx_add_vid().
I found in stmmac_drv_probe() that cailing pm_runtime_put()
disabled the clock.
First, when the kernel is compiled with CONFIG_PM=y,The stmmac's
resume/suspend is active.
Secondly,stmmac as DSA master,the dsa_tag_8021q_port_setup() function
will callback stmmac_vlan_rx_add_vid when DSA dirver starts. However,
The system is hanged for the stmmac_vlan_rx_add_vid() accesses its
registers after stmmac's clock is closed.
I would suggest adding the pm_runtime_resume_and_get() to the
stmmac_vlan_rx_add_vid().This guarantees that resuming clock output
while in use.
Fixes: b3dcb3127786 ("net: stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()")
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Yan Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Remaining documentation and Kconfig hook for building the driver.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When the Core device gets an event from the device, or notices
the device FW to be up or down, it needs to send those events
on to the clients that have an event handler. Add the code to
pass along the events to the clients.
The entry points pdsc_register_notify() and pdsc_unregister_notify()
are EXPORTed for other drivers that want to listen for these events.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add the client API operations for running adminq commands.
The core registers the client with the FW, then the client
has a context for requesting adminq services. We expect
to add additional operations for other clients, including
requesting additional private adminqs and IRQs, but don't have
the need yet.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add the devlink parameter switches so the user can enable
the features supported by the VFs. The only feature supported
at the moment is vDPA.
Example:
devlink dev param set pci/0000:2b:00.0 \
name enable_vnet cmode runtime value true
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
An auxiliary_bus device is created for each vDPA type VF at VF
probe and destroyed at VF remove. The aux device name comes
from the driver name + VIF type + the unique id assigned at PCI
probe. The VFs are always removed on PF remove, so there should
be no issues with VFs trying to access missing PF structures.
The auxiliary_device names will look like "pds_core.vDPA.nn"
where 'nn' is the VF's uid.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This is the initial VF PCI driver framework for the new
pds_vdpa VF device, which will work in conjunction with an
auxiliary_bus client of the pds_core driver. This does the
very basics of registering for the new VF device, setting
up debugfs entries, and registering with devlink.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The Virtual Interfaces (VIFs) supported by the DSC's
configuration (vDPA, Eth, RDMA, etc) are reported in the
dev_ident struct and made visible in debugfs. At this point
only vDPA is supported in this driver so we only setup
devices for that feature.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add in the support for doing firmware updates. Of the two
main banks available, a and b, this updates the one not in
use and then selects it for the next boot.
Example:
devlink dev flash pci/0000:b2:00.0 \
file pensando/dsc_fw_1.63.0-22.tar
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add the service routines for submitting and processing
the adminq messages and for handling notifyq events.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Set up the basic adminq and notifyq queue structures. These are
used mostly by the client drivers for feature configuration.
These are essentially the same adminq and notifyq as in the
ionic driver.
Part of this includes querying for device identity and FW
information, so we can make that available to devlink dev info.
$ devlink dev info pci/0000:b5:00.0
pci/0000:b5:00.0:
driver pds_core
serial_number FLM18420073
versions:
fixed:
asic.id 0x0
asic.rev 0x0
running:
fw 1.51.0-73
stored:
fw.goldfw 1.15.9-C-22
fw.mainfwa 1.60.0-73
fw.mainfwb 1.60.0-57
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add devlink health reporting on top of our fw watchdog.
Example:
# devlink health show pci/0000:2b:00.0 reporter fw
pci/0000:2b:00.0:
reporter fw
state healthy error 0 recover 0
# devlink health diagnose pci/0000:2b:00.0 reporter fw
Status: healthy State: 1 Generation: 0 Recoveries: 0
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add in the periodic health check and the related workqueue,
as well as the handlers for when a FW reset is seen.
The firmware is polled every 5 seconds to be sure that it is
still alive and that the FW generation didn't change.
The alive check looks to see that the PCI bus is still readable
and the fw_status still has the RUNNING bit on. If not alive,
the driver stops activity and tears things down. When the FW
recovers and the alive check again succeeds, the driver sets
back up for activity.
The generation check looks at the fw_generation to see if it
has changed, which can happen if the FW crashed and recovered
or was updated in between health checks. If changed, the
driver counts that as though the alive test failed and forces
the fw_down/fw_up cycle.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The devcmd interface is the basic connection to the device through the
PCI BAR for low level identification and command services. This does
the early device initialization and finds the identity data, and adds
devcmd routines to be used by later driver bits.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This is the initial PCI driver framework for the new pds_core device
driver and its family of devices. This does the very basics of
registering for the new PF PCI device 1dd8:100c, setting up debugfs
entries, and registering with devlink.
Signed-off-by: Shannon Nelson <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Offloading MACsec when its configured over VLAN with current MACsec
TX steering rules will wrongly insert MACsec sec tag after inserting
the VLAN header leading to a ETHERNET | SECTAG | VLAN packet when
ETHERNET | VLAN | SECTAG is configured.
The above issue is due to adding the SECTAG by HW which is a later
stage compared to the VLAN header insertion stage.
Detect such a case and adjust TX steering rules to insert the
SECTAG in the correct place by using reformat_param_0 field in
the packet reformat to indicate the offset of SECTAG from end of
the MAC header to account for VLANs in granularity of 4Bytes.
Signed-off-by: Emeel Hakim <[email protected]>
Reviewed-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
MACsec device may have a VLAN device on top of it.
Detect MACsec state correctly under this condition,
and return the correct net device accordingly.
Signed-off-by: Emeel Hakim <[email protected]>
Reviewed-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Enable MACsec offload feature over VLAN by adding NETIF_F_HW_MACSEC
to the device vlan_features.
Signed-off-by: Emeel Hakim <[email protected]>
Reviewed-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
These have been useful in debugging various problems related to frame
preemption, so make them available through ethtool --register-dump for
later too.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This was left as TODO in commit 01e23b2b3bad ("net: enetc: add support
for preemptible traffic classes") since it's relatively complicated.
Where this makes a difference is with a configuration as follows:
ethtool --set-mm eno0 pmac-enabled on tx-enabled on verify-enabled on
Preemptible packets should only be sent when the MAC Merge TX direction
becomes active (i.o.w. when the verification process succeeds, aka when
the link partner confirms it can process preemptible traffic). But the
tc qdisc with the preemptible traffic classes is offloaded completely
asynchronously w.r.t. the MM becoming active.
The ENETC manual does suggest that this should be handled in the driver:
"On startup, software should wait for the verification process to
complete (MMCSR[VSTS]=011) before initiating traffic".
Adding the necessary logic allows future selftests to uphold the claim
that an inactive or disabled MAC Merge layer should never send data
packets through the pMAC.
This change moves enetc_set_ptcfpr() from enetc.c to enetc_ethtool.c,
where its only caller is now - enetc_mm_commit_preemptible_tcs().
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The MMCSR register contains 2 fields with overlapping meaning:
- LPA (Local preemption active):
This read-only status bit indicates whether preemption is active for
this port. This bit will be set if preemption is both enabled and has
completed the verification process.
- TXSTS (Merge status):
This read-only status field provides the state of the MAC Merge sublayer
transmit status as defined in IEEE Std 802.3-2018 Clause 99.
00 Transmit preemption is inactive
01 Transmit preemption is active
10 Reserved
11 Reserved
However none of these 2 fields offer reliable reporting to software.
When connecting ENETC to a link partner which is not capable of Frame
Preemption, the expectation is that ENETC's verification should fail
(VSTS=4) and its MM TX direction should be inactive (LPA=0, TXSTS=00)
even though the MM TX is enabled (ME=1). But surprise, the LPA bit of
MMCSR stays set even if VSTS=4 and ME=1.
OTOH, the TXSTS field has the opposite problem. I cannot get its value
to change from 0, even when connecting to a link partner capable of
frame preemption, which does respond to its verification frames (ME=1
and VSTS=3, "SUCCEEDED").
The only option with such buggy hardware seems to be to reimplement the
formula for calculating tx-active in software, which is for tx-enabled
to be true, and for the verify-status to be either SUCCEEDED, or
DISABLED.
Without reliable tx-active reporting, we have no good indication when
to commit the preemptible traffic classes to hardware, which makes it
possible (but not desirable) to send preemptible traffic to a link
partner incapable of receiving it. However, currently we do not have the
logic to wait for TX to be active yet, so the impact is limited.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Current enetc_set_mm() is designed to set the priv->active_offloads bit
ENETC_F_QBU for enetc_mm_link_state_update() to act on, but if the link
is already up, it modifies the ENETC_MMCSR_ME ("Merge Enable") bit
directly.
The problem is that it only *sets* ENETC_MMCSR_ME if the link is up, it
doesn't *clear* it if needed. So subsequent enetc_get_mm() calls still
see tx-enabled as true, up until a link down event, which is when
enetc_mm_link_state_update() will get called.
This is not a functional issue as far as I can assess. It has only come
up because I'd like to uphold a simple API rule in core ethtool code:
the pMAC cannot be disabled if TX is going to be enabled. Currently,
the fact that TX remains enabled for longer than expected (after the
enetc_set_mm() call that disables it) is going to violate that rule,
which is how it was caught.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Fix the following warning about risky iterator use:
drivers/net/ethernet/mellanox/mlx5/core/eq.c:1010 mlx5_comp_irq_get_affinity_mask() warn: iterator used outside loop: 'eq'
Acked-by: Saeed Mahameed <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This reverts commit 14624d7247fcd0f3114a6f5f17b3c8d1020fbbb7.
The termination table usage is requires for DMFS steering mode as firmware
doesn't support mixed table destinations list which causes following
syndrome with hairpin rules:
[81922.283225] mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 25977): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xaca205), err(-22)
Fixes: 14624d7247fc ("net/mlx5e: Don't use termination table when redundant")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|