aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2023-04-24lan966x: Don't use xdp_frame when action is XDP_TXHoratiu Vultur3-23/+28
When the action of an xdp program was XDP_TX, lan966x was creating a xdp_frame and use this one to send the frame back. But it is also possible to send back the frame without needing a xdp_frame, because it is possible to send it back using the page. And then once the frame is transmitted is possible to use directly page_pool_recycle_direct as lan966x is using page pools. This would save some CPU usage on this path, which results in higher number of transmitted frames. Bellow are the statistics: Frame size: Improvement: 64 ~8% 256 ~11% 512 ~8% 1000 ~0% 1500 ~0% Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Add XDP socket zero-copy TX supportGerhard Engleder2-11/+121
Send and complete XSK pool frames within TX NAPI context. NAPI context is triggered by ndo_xsk_wakeup. Test results with A53 1.2GHz: xdpsock txonly copy mode, 64 byte frames: pps pkts 1.00 tx 284,409 11,398,144 Two CPUs with 100% and 10% utilization. xdpsock txonly zero-copy mode, 64 byte frames: pps pkts 1.00 tx 511,929 5,890,368 Two CPUs with 100% and 1% utilization. xdpsock l2fwd copy mode, 64 byte frames: pps pkts 1.00 rx 248,985 7,315,885 tx 248,921 7,315,885 Two CPUs with 100% and 10% utilization. xdpsock l2fwd zero-copy mode, 64 byte frames: pps pkts 1.00 rx 254,735 3,039,456 tx 254,735 3,039,456 Two CPUs with 100% and 4% utilization. Packet rate increases and CPU utilization is reduced in both cases. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Add XDP socket zero-copy RX supportGerhard Engleder3-14/+556
Add support for XSK zero-copy to RX path. The setup of the XSK pool can be done at runtime. If the netdev is running, then the queue must be disabled and enabled during reconfiguration. This can be done easily with functions introduced in previous commits. A more important property is that, if the netdev is running, then the setup of the XSK pool shall not stop the netdev in case of errors. A broken netdev after a failed XSK pool setup is bad behavior. Therefore, the allocation and setup of resources during XSK pool setup is done only before any queue is disabled. Additionally, freeing and later allocation of resources is eliminated in some cases. Page pool entries are kept for later use. Two memory models are registered in parallel. As a result, the XSK pool setup cannot fail during queue reconfiguration. In contrast to other drivers, XSK pool setup and XDP BPF program setup are separate actions. XSK pool setup can be done without any XDP BPF program. The XDP BPF program can be added, removed or changed without any reconfiguration of the XSK pool. Test results with A53 1.2GHz: xdpsock rxdrop copy mode, 64 byte frames: pps pkts 1.00 rx 856,054 10,625,775 Two CPUs with both 100% utilization. xdpsock rxdrop zero-copy mode, 64 byte frames: pps pkts 1.00 rx 889,388 4,615,284 Two CPUs with 100% and 20% utilization. Packet rate increases and CPU utilization is reduced. 100% CPU load seems to the base load. This load is consumed by ksoftirqd just for dropping the generated packets without xdpsock running. Using batch API reduced CPU utilization slightly, but measurements are not stable enough to provide meaningful numbers. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Move skb receive action to separate functionGerhard Engleder1-16/+23
The function tsnep_rx_poll() is already pretty long and the skb receive action can be reused for XSK zero-copy support. Move page based skb receive to separate function. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Add functions for queue enable/disableGerhard Engleder1-33/+64
Move queue enable and disable code to separate functions. This way the activation and deactivation of the queues are defined actions, which can be used in future execution paths. This functions will be used for the queue reconfiguration at runtime, which is necessary for XSK zero-copy support. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Rework TX/RX queue initializationGerhard Engleder1-43/+51
Make initialization of TX and RX queues less dynamic by moving some initialization from netdev open/close to device probing. Additionally, move some initialization code to separate functions to enable future use in other execution paths. This is done as preparation for queue reconfigure at runtime, which is necessary for XSK zero-copy support. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Replace modulo operation with maskGerhard Engleder2-14/+15
TX/RX ring size is static and power of 2 to enable compiler to optimize modulo operation to mask operation. Make this optimization already in the code and don't rely on the compiler. CPU utilisation during high packet rate has not changed. So no performance improvement has been measured. But it is best practice to prevent modulo operations. Suggested-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: mana: Check if netdev/napi_alloc_frag returns single pageHaiyang Zhang1-0/+15
netdev/napi_alloc_frag() may fall back to single page which is smaller than the requested size. Add error checking to avoid memory overwritten. Signed-off-by: Haiyang Zhang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: mana: Rename mana_refill_rxoob and remove some empty linesHaiyang Zhang1-6/+3
Rename mana_refill_rxoob for naming consistency. And remove some empty lines between function call and error checking. Signed-off-by: Haiyang Zhang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24Merge tag 'rcu.6.4.april5.2023.3' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux Pull RCU updates from Joel Fernandes: - Updates and additions to MAINTAINERS files, with Boqun being added to the RCU entry and Zqiang being added as an RCU reviewer. I have also transitioned from reviewer to maintainer; however, Paul will be taking over sending RCU pull-requests for the next merge window. - Resolution of hotplug warning in nohz code, achieved by fixing cpu_is_hotpluggable() through interaction with the nohz subsystem. Tick dependency modifications by Zqiang, focusing on fixing usage of the TICK_DEP_BIT_RCU_EXP bitmask. - Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n kernels, fixed by Zqiang. - Improvements to rcu-tasks stall reporting by Neeraj. - Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for increased robustness, affecting several components like mac802154, drbd, vmw_vmci, tracing, and more. A report by Eric Dumazet showed that the API could be unknowingly used in an atomic context, so we'd rather make sure they know what they're asking for by being explicit: https://lore.kernel.org/all/[email protected]/ - Documentation updates, including corrections to spelling, clarifications in comments, and improvements to the srcu_size_state comments. - Better srcu_struct cache locality for readers, by adjusting the size of srcu_struct in support of SRCU usage by Christoph Hellwig. - Teach lockdep to detect deadlocks between srcu_read_lock() vs synchronize_srcu() contributed by Boqun. Previously lockdep could not detect such deadlocks, now it can. - Integration of rcutorture and rcu-related tools, targeted for v6.4 from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis module parameter, and more - Miscellaneous changes, various code cleanups and comment improvements * tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits) checkpatch: Error out if deprecated RCU API used mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep() rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep() ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep() net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep() net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep() lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep() tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep() misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep() drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep() rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan() rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race rcu/trace: use strscpy() to instead of strncpy() tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem ...
2023-04-23net: dpaa: avoid one skb_reset_mac_header() in dpaa_enable_tx_csum()Vladimir Oltean1-7/+2
It appears that dpaa_enable_tx_csum() only calls skb_reset_mac_header() to get to the VLAN header using skb_mac_header(). We can use skb_vlan_eth_hdr() to get to the VLAN header based on skb->data directly. This avoids spending a few cycles to set skb->mac_header. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Acked-by: Madalin Bucur <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-23net: vlan: introduce skb_vlan_eth_hdr()Vladimir Oltean9-15/+11
Similar to skb_eth_hdr() introduced in commit 96cc4b69581d ("macvlan: do not assume mac_header is set in macvlan_broadcast()"), let's introduce a skb_vlan_eth_hdr() helper which can be used in TX-only code paths to get to the VLAN header based on skb->data rather than based on the skb_mac_header(skb). We also consolidate the drivers that dereference skb->data to go through this helper. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-23net: ethernet: mediatek: remove return value check of `mtk_wed_hw_add_debugfs`Wang Zhang1-2/+0
Smatch complains that: mtk_wed_hw_add_debugfs() warn: 'dir' is an error pointer or valid Debugfs checks are generally not supposed to be checked for errors and it is not necessary here. fix it by just deleting the dead code. Signed-off-by: Wang Zhang <[email protected]> Reviewed-by: Dongliang Mu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-23Merge branch '100GbE' of ↵David S. Miller5-53/+36
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== This series lowers the CPU usage of the ice driver when using its provided /dev/gnss*. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-04-23net: mtk_eth_soc: mediatek: fix ppe flow accounting for v1 hardwareFelix Fietkau2-3/+10
Older chips (like MT7622) use a different bit in ib2 to enable hardware counter support. Add macros for both and select the appropriate bit. Fixes: 3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting") Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Daniel Golle <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-22net: ethernet: mtk_eth_soc: use WO firmware for MT7981Daniel Golle2-1/+7
In order to support wireless offloading on MT7981 we need to load the appropriate firmware. Recognize MT7981 and load mt7981_wo.bin. Signed-off-by: Daniel Golle <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21Merge tag 'mlx5-updates-2023-04-20' of ↵Jakub Kicinski17-272/+262
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-04-20 1) Dragos Improves RX page pool, and provides some fixes to his previous series: 1.1) Fix releasing page_pool for striding RQ and legacy RQ nonlinear case 1.2) Hook NAPIs to page pools to gain more performance. 2) From Roi, Some cleanups to TC and eswitch modules. 3) Maher migrates vnic diagnostic counters reporting from debugfs to a dedicated devlink health reporter Maher Says: =========== net/mlx5: Expose vnic diagnostic counters using devlink Currently, vnic diagnostic counters are exposed through the following debugfs: $ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/ cq_overrun quota_exceeded_command total_q_under_processor_handle invalid_command send_queue_priority_update_flow nic_receive_steering_discard The current design does not allow the hypervisor to view the diagnostic counters of its VFs, in case the VFs get bound to a VM. In other words, the counters are not exposed for representor interfaces. Furthermore, the debugfs design is inconvenient future-wise, in case more counters need to be reported by the driver in the future. As these counters pertain to vNIC health, it is more appropriate to utilize the devlink health reporter to expose them. Thus, this patchest includes the following changes: * Drop the current vnic diagnostic counters debugfs interface. * Add a vnic devlink health reporter for PFs/VFs core devices, which when diagnosed will dump vnic diagnostic counter values that are queried from FW. * Add a vnic devlink health reporter for the representor interface, which serves the same purpose listed in the previous point, in addition to allowing the hypervisor to view its VFs diagnostic counters, even when the VFs are bounded to external VMs. Example of devlink health reporter usage is: $devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 =========== 4) SW steering fixes and improvements Yevgeny Kliteynik Says: ======================= These short patch series are just small fixes / improvements for SW steering: - Patch 1: Fix dumping of legacy modify_hdr in debug dump to align to what is expected by parser - Patch 2: Have separate threshold for ICM sync per ICM type - Patch 3: Add more info to the steering debug dump - Linux version and device name - Patch 4: Keep track of number of buddies that are currently in use per domain per buddy type ======================= * tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Update op_mode to op_mod for port selection net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set() net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc() net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn() net/mlx5e: RX, Hook NAPIs to page pools net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ net/mlx5e: Add vnic devlink health reporter to representors net/mlx5: Add vnic devlink health reporter to PFs/VFs Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports" Revert "net/mlx5: Expose steering dropped packets counter" net/mlx5: DR, Add memory statistics for domain object net/mlx5: DR, Add more info in domain dbg dump net/mlx5: DR, Calculate sync threshold of each pool according to its type net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-21Merge tag 'mlx5-fixes-2023-04-20' of ↵Jakub Kicinski16-54/+40
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-04-19 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: Revert "net/mlx5e: Don't use termination table when redundant" net/mlx5e: Nullify table pointer when failing to create net/mlx5: Use recovery timeout on sync reset flow Revert "net/mlx5: Remove "recovery" arg from mlx5_load_one() function" net/mlx5e: Fix error flow in representor failing to add vport rx rule net/mlx5: Release tunnel device after tc update skb net/mlx5: E-switch, Don't destroy indirect table in split rule net/mlx5: E-switch, Create per vport table based on devlink encap mode net/mlx5e: Release the label when replacing existing ct entry net/mlx5e: Don't clone flow post action attributes second time ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-21Merge branch '10GbE' of ↵Jakub Kicinski1-11/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== ixgbe: Multiple RSS bugfixes Joe Damato says: This series fixes two bugs I stumbled on with ixgbe: 1. The flow hash cannot be set manually with ethool at all. Patch 1/2 addresses this by fixing what appears to be a small bug in set_rxfh in ixgbe. See the commit message for more details. 2. Once the above patch is applied and the flow hash can be set, resetting the flow hash to default fails if the number of queues is greater than the number of queues supported by RSS. Other drivers (like i40e) will reset the flowhash to use the maximum number of queues supported by RSS even if the queue count configured is larger. In other words: some queues will not have packets distributed to them by the RSS hash if the queue count is too large. Patch 2/2 allows the user to reset ixgbe to default and the flowhash is set correctly to either the maximum number of queues supported by RSS or the configured queue count, whichever is smaller. I believe this is correct and it mimics the behavior of i40e; `ethtool -X $iface default` should probably always succeed even if all the queues cannot be utilized. See the commit message for more details and examples. I tested these on an ixgbe system I have access to and they appear to work as intended, but I would appreciate a review by the experts on this list :) * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ixgbe: Enable setting RSS table to default values ixgbe: Allow flow hash to be set via ethtool ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-21nfp: fix incorrect pointer deference when offloading IPsec with bondingHuanhuan Wang1-2/+2
There are two pointers in struct xfrm_dev_offload, *dev, *real_dev. The *dev points whether bonding interface or real interface, if bonding IPsec offload is used, it points bonding interface; if not, it points real interface. And *real_dev always points real interface. So nfp should always use real_dev instead of dev. Prior to this change the system becomes unresponsive when offloading IPsec for a device which is a lower device to a bonding device. Fixes: 859a497fe80c ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer") CC: [email protected] Signed-off-by: Huanhuan Wang <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: Louis Peens <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-21net: dpaa: Fix uninitialized variable in dpaa_stop()Dan Carpenter1-1/+2
The return value is not initialized on the success path. Fixes: 901bdff2f529 ("net: fman: Change return type of disable to void") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Madalin Bucur <[email protected]> Reviewed-by: Sean Anderson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-21net/mlx5e: Refactor duplicated code in mlx5e_ipsec_init_macsLeon Romanovsky1-24/+21
ARP discovery code has same logic for RX and TX flows, but with different source and destination fields. Instead of duplicating same code in mlx5e_ipsec_init_macs, let's refactor. Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21net/mlx5e: Properly release work data structureLeon Romanovsky1-2/+4
There are some flows in which work structure is not allocated at all and it is needed to be checked prior release of data structure. general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057] CPU: 6 PID: 3486 Comm: kworker/6:0 Not tainted 6.3.0-rc5_for_upstream_debug_2023_04_06_11_01 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events xfrm_state_gc_task RIP: 0010:mlx5e_xfrm_free_state+0x177/0x260 [mlx5_core] Code: c1 ea 03 80 3c 02 00 0f 85 f5 00 00 00 4c 8b a5 08 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 50 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b7 00 00 00 49 8b 7c 24 50 e8 85 7c 09 e0 4c 89 RSP: 0018:ffff888137a8fc50 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff888180398000 RCX: 0000000000000000 RDX: 000000000000000a RSI: ffffffffa1878227 RDI: 0000000000000050 RBP: ffff88812a0c8000 R08: ffff888137a8fb60 R09: 0000000000000000 R10: fffffbfff09aba0c R11: 0000000000000001 R12: 0000000000000000 R13: ffff88812a0c8108 R14: ffffffff84c63480 R15: ffff8881acb63118 FS: 0000000000000000(0000) GS:ffff88881eb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f667e8bc000 CR3: 0000000004693006 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ___xfrm_state_destroy+0x3c8/0x5e0 xfrm_state_gc_task+0xf6/0x140 ? ___xfrm_state_destroy+0x5e0/0x5e0 process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0 ? pwq_dec_nr_in_flight+0x230/0x230 ? spin_bug+0x1d0/0x1d0 worker_thread+0x59d/0xec0 ? __kthread_parkme+0xd9/0x1d0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core vfio_pci vfio_pci_core vfio_iommu_type1 vfio cuse overlay zram zsmalloc fuse [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]--- Fixes: 4562116f8a56 ("net/mlx5e: Generalize IPsec work structs") Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21net/mlx5e: Compare all fields in IPv6 addressLeon Romanovsky1-1/+1
Fix size argument in memcmp to compare whole IPv6 address. Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes") Reviewed-by: Raed Salem <[email protected]> Reviewed-by: Emeel Hakim <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21net/mlx5e: Don't overwrite extack message returned from IPsec SA validatorLeon Romanovsky1-1/+1
Addition of new err_xfrm label caused to error messages be overwritten. Fix it by using proper NL_SET_ERR_MSG_WEAK_MOD macro together with change in a default message. Fixes: aa8bd0c9518c ("net/mlx5e: Support IPsec acquire default SA") Reviewed-by: Raed Salem <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21net/mlx5e: Fix FW error while setting IPsec policy block actionLeon Romanovsky1-8/+8
When trying to set IPsec policy block action the following error is generated: mlx5_cmd_out_err:803:(pid 3426): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3), err(-22) This error means that drop action is not allowed when modify action is set, so update the code to skip modify header for XFRM_POLICY_BLOCK action. Fixes: 6721239672fe ("net/mlx5e: Skip IPsec encryption for TX path without matching policy") Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21net: stmmac:fix system hang when setting up tag_8021q VLAN for DSA portsYan Wang1-3/+9
The system hang because of dsa_tag_8021q_port_setup()-> stmmac_vlan_rx_add_vid(). I found in stmmac_drv_probe() that cailing pm_runtime_put() disabled the clock. First, when the kernel is compiled with CONFIG_PM=y,The stmmac's resume/suspend is active. Secondly,stmmac as DSA master,the dsa_tag_8021q_port_setup() function will callback stmmac_vlan_rx_add_vid when DSA dirver starts. However, The system is hanged for the stmmac_vlan_rx_add_vid() accesses its registers after stmmac's clock is closed. I would suggest adding the pm_runtime_resume_and_get() to the stmmac_vlan_rx_add_vid().This guarantees that resuming clock output while in use. Fixes: b3dcb3127786 ("net: stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()") Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Yan Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: Kconfig and pds_core.rstShannon Nelson2-0/+13
Remaining documentation and Kconfig hook for building the driver. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: publish events to the clientsShannon Nelson3-0/+37
When the Core device gets an event from the device, or notices the device FW to be up or down, it needs to send those events on to the clients that have an event handler. Add the code to pass along the events to the clients. The entry points pdsc_register_notify() and pdsc_unregister_notify() are EXPORTed for other drivers that want to listen for these events. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: add the aux client APIShannon Nelson1-1/+150
Add the client API operations for running adminq commands. The core registers the client with the FW, then the client has a context for requesting adminq services. We expect to add additional operations for other clients, including requesting additional private adminqs and IRQs, but don't have the need yet. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: devlink params for enabling VIF supportShannon Nelson3-6/+108
Add the devlink parameter switches so the user can enable the features supported by the VFs. The only feature supported at the moment is vDPA. Example: devlink dev param set pci/0000:2b:00.0 \ name enable_vnet cmode runtime value true Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: add auxiliary_bus devicesShannon Nelson4-2/+156
An auxiliary_bus device is created for each vDPA type VF at VF probe and destroyed at VF remove. The aux device name comes from the driver name + VIF type + the unique id assigned at PCI probe. The VFs are always removed on PF remove, so there should be no issues with VFs trying to access missing PF structures. The auxiliary_device names will look like "pds_core.vDPA.nn" where 'nn' is the VF's uid. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: add initial VF device handlingShannon Nelson2-1/+56
This is the initial VF PCI driver framework for the new pds_vdpa VF device, which will work in conjunction with an auxiliary_bus client of the pds_core driver. This does the very basics of registering for the new VF device, setting up debugfs entries, and registering with devlink. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: set up the VIF definitions and defaultsShannon Nelson3-0/+83
The Virtual Interfaces (VIFs) supported by the DSC's configuration (vDPA, Eth, RDMA, etc) are reported in the dev_ident struct and made visible in debugfs. At this point only vDPA is supported in this driver so we only setup devices for that feature. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: add FW update feature to devlinkShannon Nelson5-1/+211
Add in the support for doing firmware updates. Of the two main banks available, a and b, this updates the one not in use and then selects it for the next boot. Example: devlink dev flash pci/0000:b2:00.0 \ file pensando/dsc_fw_1.63.0-22.tar Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: Add adminq processing and commandsShannon Nelson3-11/+289
Add the service routines for submitting and processing the adminq messages and for handling notifyq events. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: set up device and adminqShannon Nelson5-4/+733
Set up the basic adminq and notifyq queue structures. These are used mostly by the client drivers for feature configuration. These are essentially the same adminq and notifyq as in the ionic driver. Part of this includes querying for device identity and FW information, so we can make that available to devlink dev info. $ devlink dev info pci/0000:b5:00.0 pci/0000:b5:00.0: driver pds_core serial_number FLM18420073 versions: fixed: asic.id 0x0 asic.rev 0x0 running: fw 1.51.0-73 stored: fw.goldfw 1.15.9-C-22 fw.mainfwa 1.60.0-73 fw.mainfwb 1.60.0-57 Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: add devlink health facilitiesShannon Nelson5-1/+76
Add devlink health reporting on top of our fw watchdog. Example: # devlink health show pci/0000:2b:00.0 reporter fw pci/0000:2b:00.0: reporter fw state healthy error 0 recover 0 # devlink health diagnose pci/0000:2b:00.0 reporter fw Status: healthy State: 1 Generation: 0 Recoveries: 0 Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: health timer and workqueueShannon Nelson4-0/+103
Add in the periodic health check and the related workqueue, as well as the handlers for when a FW reset is seen. The firmware is polled every 5 seconds to be sure that it is still alive and that the FW generation didn't change. The alive check looks to see that the PCI bus is still readable and the fw_status still has the RUNNING bit on. If not alive, the driver stops activity and tears things down. When the FW recovers and the alive check again succeeds, the driver sets back up for activity. The generation check looks at the fw_generation to see if it has changed, which can happen if the FW crashed and recovered or was updated in between health checks. If changed, the driver counts that as though the alive test failed and forces the fw_down/fw_up cycle. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: add devcmd device interfacesShannon Nelson6-3/+506
The devcmd interface is the basic connection to the device through the PCI BAR for low level identification and command services. This does the early device initialization and finds the identity data, and adds devcmd routines to be used by later driver bits. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21pds_core: initial framework for pds_core PF driverShannon Nelson4-0/+372
This is the initial PCI driver framework for the new pds_core device driver and its family of devices. This does the very basics of registering for the new PF PCI device 1dd8:100c, setting up debugfs entries, and registering with devlink. Signed-off-by: Shannon Nelson <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21net/mlx5: Consider VLAN interface in MACsec TX steering rulesEmeel Hakim1-0/+7
Offloading MACsec when its configured over VLAN with current MACsec TX steering rules will wrongly insert MACsec sec tag after inserting the VLAN header leading to a ETHERNET | SECTAG | VLAN packet when ETHERNET | VLAN | SECTAG is configured. The above issue is due to adding the SECTAG by HW which is a later stage compared to the VLAN header insertion stage. Detect such a case and adjust TX steering rules to insert the SECTAG in the correct place by using reformat_param_0 field in the packet reformat to indicate the offset of SECTAG from end of the MAC header to account for VLANs in granularity of 4Bytes. Signed-off-by: Emeel Hakim <[email protected]> Reviewed-by: Subbaraya Sundeep <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21net/mlx5: Support MACsec over VLANEmeel Hakim1-16/+26
MACsec device may have a VLAN device on top of it. Detect MACsec state correctly under this condition, and return the correct net device accordingly. Signed-off-by: Emeel Hakim <[email protected]> Reviewed-by: Subbaraya Sundeep <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-21net/mlx5: Enable MACsec offload feature for VLAN interfaceEmeel Hakim1-0/+1
Enable MACsec offload feature over VLAN by adding NETIF_F_HW_MACSEC to the device vlan_features. Signed-off-by: Emeel Hakim <[email protected]> Reviewed-by: Subbaraya Sundeep <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-20net: enetc: include MAC Merge / FP registers in register dumpVladimir Oltean1-0/+17
These have been useful in debugging various problems related to frame preemption, so make them available through ethtool --register-dump for later too. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20net: enetc: only commit preemptible TCs to hardware when MM TX is activeVladimir Oltean4-18/+75
This was left as TODO in commit 01e23b2b3bad ("net: enetc: add support for preemptible traffic classes") since it's relatively complicated. Where this makes a difference is with a configuration as follows: ethtool --set-mm eno0 pmac-enabled on tx-enabled on verify-enabled on Preemptible packets should only be sent when the MAC Merge TX direction becomes active (i.o.w. when the verification process succeeds, aka when the link partner confirms it can process preemptible traffic). But the tc qdisc with the preemptible traffic classes is offloaded completely asynchronously w.r.t. the MM becoming active. The ENETC manual does suggest that this should be handled in the driver: "On startup, software should wait for the verification process to complete (MMCSR[VSTS]=011) before initiating traffic". Adding the necessary logic allows future selftests to uphold the claim that an inactive or disabled MAC Merge layer should never send data packets through the pMAC. This change moves enetc_set_ptcfpr() from enetc.c to enetc_ethtool.c, where its only caller is now - enetc_mm_commit_preemptible_tcs(). Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20net: enetc: report mm tx-active based on tx-enabled and verify-statusVladimir Oltean1-1/+3
The MMCSR register contains 2 fields with overlapping meaning: - LPA (Local preemption active): This read-only status bit indicates whether preemption is active for this port. This bit will be set if preemption is both enabled and has completed the verification process. - TXSTS (Merge status): This read-only status field provides the state of the MAC Merge sublayer transmit status as defined in IEEE Std 802.3-2018 Clause 99. 00 Transmit preemption is inactive 01 Transmit preemption is active 10 Reserved 11 Reserved However none of these 2 fields offer reliable reporting to software. When connecting ENETC to a link partner which is not capable of Frame Preemption, the expectation is that ENETC's verification should fail (VSTS=4) and its MM TX direction should be inactive (LPA=0, TXSTS=00) even though the MM TX is enabled (ME=1). But surprise, the LPA bit of MMCSR stays set even if VSTS=4 and ME=1. OTOH, the TXSTS field has the opposite problem. I cannot get its value to change from 0, even when connecting to a link partner capable of frame preemption, which does respond to its verification frames (ME=1 and VSTS=3, "SUCCEEDED"). The only option with such buggy hardware seems to be to reimplement the formula for calculating tx-active in software, which is for tx-enabled to be true, and for the verify-status to be either SUCCEEDED, or DISABLED. Without reliable tx-active reporting, we have no good indication when to commit the preemptible traffic classes to hardware, which makes it possible (but not desirable) to send preemptible traffic to a link partner incapable of receiving it. However, currently we do not have the logic to wait for TX to be active yet, so the impact is limited. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20net: enetc: fix MAC Merge layer remaining enabled until a link down eventVladimir Oltean1-4/+7
Current enetc_set_mm() is designed to set the priv->active_offloads bit ENETC_F_QBU for enetc_mm_link_state_update() to act on, but if the link is already up, it modifies the ENETC_MMCSR_ME ("Merge Enable") bit directly. The problem is that it only *sets* ENETC_MMCSR_ME if the link is up, it doesn't *clear* it if needed. So subsequent enetc_get_mm() calls still see tx-enabled as true, up until a link down event, which is when enetc_mm_link_state_update() will get called. This is not a functional issue as far as I can assess. It has only come up because I'd like to uphold a simple API rule in core ethtool code: the pMAC cannot be disabled if TX is going to be enabled. Currently, the fact that TX remains enabled for longer than expected (after the enetc_set_mm() call that disables it) is going to violate that rule, which is how it was caught. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20eth: mlx5: avoid iterator use outside of a loopJakub Kicinski1-2/+3
Fix the following warning about risky iterator use: drivers/net/ethernet/mellanox/mlx5/core/eq.c:1010 mlx5_comp_irq_get_affinity_mask() warn: iterator used outside loop: 'eq' Acked-by: Saeed Mahameed <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20Revert "net/mlx5e: Don't use termination table when redundant"Vlad Buslov1-28/+4
This reverts commit 14624d7247fcd0f3114a6f5f17b3c8d1020fbbb7. The termination table usage is requires for DMFS steering mode as firmware doesn't support mixed table destinations list which causes following syndrome with hairpin rules: [81922.283225] mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 25977): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xaca205), err(-22) Fixes: 14624d7247fc ("net/mlx5e: Don't use termination table when redundant") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>