aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2023-04-20net/mlx5e: Nullify table pointer when failing to createAya Levin1-1/+2
On failing to create promisc flow steering table, the pointer is returned with an error. Nullify it so unloading the driver won't try to destroy a non existing table. Failing to create promisc table may happen over BF devices when the ARM side is going through a firmware tear down. The host side start a reload flow. While the driver unloads, it tries to remove the promisc table. Remove WARN in this state as it is a valid error flow. Fixes: 1c46d7409f30 ("net/mlx5e: Optimize promiscuous mode") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: Use recovery timeout on sync reset flowMoshe Shemesh2-3/+3
Use the same timeout for sync reset flow and health recovery flow, since the former involves driver's recovery from firmware reset, which is similar to health recovery. Otherwise, in some cases, such as a firmware upgrade on the DPU, the firmware pre-init bit may not be ready within current timeout and the driver will abort loading back after reset. Signed-off-by: Moshe Shemesh <[email protected]> Fixes: 37ca95e62ee2 ("net/mlx5: Increase FW pre-init timeout for health recovery") Reviewed-by: Maher Sanalla <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20Revert "net/mlx5: Remove "recovery" arg from mlx5_load_one() function"Moshe Shemesh3-6/+7
This reverts commit 5977ac3910f1cbaf44dca48179118b25c206ac29. Revert this patch as we need the "recovery" arg back in mlx5_load_one() function. This arg will be used in the next patch for using recovery timeout during sync reset flow. Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Maher Sanalla <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5e: Fix error flow in representor failing to add vport rx ruleRoi Dayan3-0/+5
On representor init rx error flow the flow steering pointer is being released so mlx5e_attach_netdev() doesn't have a valid fs pointer in its error flow. Make sure the pointer is nullified when released and add a check in mlx5e_fs_cleanup() to verify fs is not null as representor cleanup callback would be called anyway. Fixes: af8bbf730068 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer") Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: Release tunnel device after tc update skbChris Mi1-0/+1
The cited commit causes a regression. Tunnel device is not released after tc update skb if skb needs to be freed. The following error message will be printed: unregister_netdevice: waiting for vxlan1 to become free. Usage count = 11 Fix it by releasing tunnel device if skb needs to be freed. Fixes: 93a1ab2c545b ("net/mlx5: Refactor tc miss handling to a single function") Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: E-switch, Don't destroy indirect table in split ruleChris Mi1-2/+0
Source port rewrite (forward to ovs internal port or statck device) isn't supported in the rule of split action. So there is no indirect table in split rule. The cited commit destroyes indirect table in split rule. The indirect table for other rules will be destroyed wrongly. It will cause traffic loss. Fix it by removing the destroy function in split rule. And also remove the destroy function in error flow. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: E-switch, Create per vport table based on devlink encap modeChris Mi4-5/+15
Currently when creating per vport table, create flags are hardcoded. Devlink encap mode is set based on user input and HW capability. Create per vport table based on devlink encap mode. Fixes: c796bb7cd230 ("net/mlx5: E-switch, Generalize per vport table API") Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5e: Release the label when replacing existing ct entryVlad Buslov1-0/+1
Cited commit doesn't release the label mapping when replacing existing ct entry which leads to following memleak report: unreferenced object 0xffff8881854cf280 (size 96): comm "kworker/u48:74", pid 23093, jiffies 4296664564 (age 175.944s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000002722d368>] __kmalloc+0x4b/0x1c0 [<00000000cc44e18f>] mapping_add+0x6e8/0xc90 [mlx5_core] [<000000003ad942a7>] mlx5_get_label_mapping+0x66/0xe0 [mlx5_core] [<00000000266308ac>] mlx5_tc_ct_entry_create_mod_hdr+0x1c4/0xf50 [mlx5_core] [<000000009a768b4f>] mlx5_tc_ct_entry_add_rule+0x16f/0xaf0 [mlx5_core] [<00000000a178f3e5>] mlx5_tc_ct_block_flow_offload_add+0x10cb/0x1f90 [mlx5_core] [<000000007b46c496>] mlx5_tc_ct_block_flow_offload+0x14a/0x630 [mlx5_core] [<00000000a9a18ac5>] nf_flow_offload_tuple+0x1a3/0x390 [nf_flow_table] [<00000000d0881951>] flow_offload_work_handler+0x257/0xd30 [nf_flow_table] [<000000009e4935a4>] process_one_work+0x7c2/0x13e0 [<00000000f5cd36a7>] worker_thread+0x59d/0xec0 [<00000000baed1daf>] kthread+0x28f/0x330 [<0000000063d282a4>] ret_from_fork+0x1f/0x30 Fix the issue by correctly releasing the label mapping. Fixes: 94ceffb48eac ("net/mlx5e: Implement CT entry update") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Paul Blakey <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5e: Don't clone flow post action attributes second timeVlad Buslov2-10/+3
The code already clones post action attributes in mlx5e_clone_flow_attr_for_post_act(). Creating another copy in mlx5e_tc_post_act_add() is a erroneous leftover from original implementation. Instead, assign handle->attribute to post_attr provided by the caller. Note that cloning the attribute second time is not just wasteful but also causes issues like second copy not being properly updated in neigh update code which leads to following use-after-free: Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0 Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40 Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30 Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90 Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40 Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30 Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40 Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0 Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22) Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22 Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace: Feb 21 09:02:00 c-237-177-40-045 kernel: <TASK> Feb 21 09:02:00 c-237-177-40-045 kernel: dump_stack_lvl+0x57/0x7d Feb 21 09:02:00 c-237-177-40-045 kernel: print_report+0x170/0x471 Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_report+0xbb/0x1a0 Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: ? __module_address.part.0+0x62/0x200 Feb 21 09:02:00 c-237-177-40-045 kernel: ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: ? __raw_spin_lock_init+0x3b/0x110 Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: add_rule_fg+0xe80/0x19c0 [mlx5_core] -- Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476: Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40 Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30 Feb 21 09:02:00 c-237-177-40-045 kernel: __kasan_kmalloc+0x7a/0x90 Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: post_process_attr+0x305/0xa30 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core] -- Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833: Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_stack+0x1e/0x40 Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_set_track+0x21/0x30 Feb 21 09:02:00 c-237-177-40-045 kernel: kasan_save_free_info+0x2a/0x40 Feb 21 09:02:00 c-237-177-40-045 kernel: ____kasan_slab_free+0x11a/0x1b0 Feb 21 09:02:00 c-237-177-40-045 kernel: __kmem_cache_free+0x1de/0x400 Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core] Feb 21 09:02:00 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310 Feb 21 09:02:00 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0 Feb 21 09:02:00 c-237-177-40-045 kernel: kthread+0x28f/0x330 Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: Update op_mode to op_mod for port selectionRoi Dayan1-1/+1
To be consistent with the other enum keys use OP_MOD instead of OP_MODE. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set()Roi Dayan2-23/+0
Remove unused function which also seems a duplicate of esw_port_metadata_set(). Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc()Roi Dayan1-7/+7
The passded esw->dev is redundant as esw being passed and esw->dev being used inside. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn()Eli Cohen1-0/+1
Add include directive to assure pci_msix_can_alloc_dyn() prototype. Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5e: RX, Hook NAPIs to page poolsDragos Tatulea1-0/+1
Linking the NAPI to the rq page_pool to improve page_pool cache usage during skb recycling. Here are the observed improvements for a iperf single stream test case: - For 1500 MTU and legacy rq, seeing a 20% improvement of cache usage. - For 9K MTU, seeing 33-40 % page_pool cache usage improvements for both striding and legacy rq (depending if the application is running on the same core as the rq or not). Signed-off-by: Dragos Tatulea <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear caseDragos Tatulea1-3/+3
When the XDP handler marks the data for transmission (XDP_TX), it is incorrect to release the page fragment. Instead, the fragments should be marked as MLX5E_WQE_FRAG_SKIP_RELEASE because XDP will release the page directly to the page_pool (page_pool_put_defragged_page) after TX completion. The linear case already does this. This patch fixes the nonlinear part as well. Also, the looping over the fragments was incorrect: When handling pages after XDP_TX in the legacy rq nonlinear case, the loop was skipping the first wqe fragment. Fixes: 3f93f82988bc ("net/mlx5e: RX, Defer page release in legacy rq for better recycling") Signed-off-by: Dragos Tatulea <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQDragos Tatulea1-0/+5
mlx5e_free_rx_descs is responsible for calling the dealloc_wqe op which returns pages to the page_pool. This can happen during flush or close. For XSK, the regular RQ is flushed (when replaced by the XSK RQ) and also closed later. This is normally not a problem as the wqe list is empty on a second call to mlx5e_free_rx_descs. However, for striding RQ, the previously released wqes from the list will appear as missing and will be released a second time by mlx5e_free_rx_missing_descs. This patch sets the no release bits on the striding RQ wqes in the dealloc_wqe op to prevent releasing the pages a second time. Please note that the bits are set only in the control path during close and not in the data path. Fixes: 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling") Signed-off-by: Dragos Tatulea <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5e: Add vnic devlink health reporter to representorsMaher Sanalla2-2/+51
Create a new devlink health reporter for representor interface, which reports the values of representor vnic diagnostic counters when diagnosed. This patch will allow admins to monitor VF diagnostic counters through the representor-interface vnic reporter. Example of usage: $ devlink health diagnose pci/0000:08:00.0/65537 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 Signed-off-by: Maher Sanalla <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: Add vnic devlink health reporter to PFs/VFsMaher Sanalla4-1/+146
Create a vnic devlink health reporter for PFs/VFs interfaces. The reporter's diagnose callback displays the values of vNIC/vport transport debug counters of PFs/VFs, as follows: $ devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 Moreover, add documentation on the reporter functionality and the counters description. While at it, expose the vNIC counters diagnose function to be used by the downstream patch, which will reveal the counters for representor interfaces. Signed-off-by: Maher Sanalla <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports"Maher Sanalla5-197/+1
This reverts commit 606e6a72e29dff9e3341c4cc9b554420e4793f401 which exposes the vnic diagnostic counters via debugfs. Instead, The upcoming series will expose the same counters through devlink health reporter. Signed-off-by: Maher Sanalla <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20Revert "net/mlx5: Expose steering dropped packets counter"Maher Sanalla1-19/+3
This reverts commit 4fe1b3a5f8fe2fdcedcaba9561e5b0ae5cb1d15b, which exposes the steering dropped packets counter via debugfs. The upcoming series will expose the counter via devlink health reporter instead of debugfs. Signed-off-by: Maher Sanalla <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: DR, Add memory statistics for domain objectYevgeny Kliteynik3-3/+15
Add counters for number of buddies that are currently in use per domain per buddy type (STE, MODIFY-HEADER, MODIFY-PATTERN). Signed-off-by: Erez Shitrit <[email protected]> Signed-off-by: Yevgeny Kliteynik <[email protected]> Reviewed-by: Alex Vesker <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: DR, Add more info in domain dbg dumpYevgeny Kliteynik1-2/+9
Add additinal items to domain info dump: Linux version and device name. Signed-off-by: Yevgeny Kliteynik <[email protected]> Reviewed-by: Alex Vesker <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: DR, Calculate sync threshold of each pool according to its typeYevgeny Kliteynik1-15/+18
When certain ICM chunk is no longer needed, it needs to be freed. Fully freeing ICM memory involves issuing FW SYNC_STEERING command. This is very time consuming, and it is impractical to do it for every freed chunk. Instead, we manage these 'freed' chunks in hot list (list of chunks that are not required by SW any more, but HW might still access them). When size of the hot list reaches certain threshold, we purge it and issue SYNC_STEERING FW command. There is one threshold for all the different ICM types, which is not optimal, as different ICM types require different approach: STEs pool is very large, and it is very 'dynamic' in its nature, so letting hot list to become too large will result in a significant perf hiccup when purging the hot list. Modify action is much smaller and less dynamic, so we can let the hot list to grow to almost the size of the whole pool. This patch fixes this problem: instead of having same hot memory threshold for all the pools, sync operation will be triggered in accordance with the ICM type. Signed-off-by: Yevgeny Kliteynik <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dumpYevgeny Kliteynik1-4/+6
The steering dump parser expects to see 0 as rewrite num of actions in case pattern/args aren't supported - parsing of legacy modify header is based on this assumption. Fix this to align to parser's expectation. Signed-off-by: Yevgeny Kliteynik <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-04-20ice: sleep, don't busy-wait, in the SQ send retry loopMichal Schmidt1-1/+1
10 ms is a lot of time to spend busy-waiting. Sleeping is clearly allowed here, because we have just returned from ice_sq_send_cmd(), which takes a mutex. On kernels with HZ=100, this msleep may be twice as long, but I don't think it matters. I did not actually observe any retries happening here. Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-04-20ice: remove unused buffer copy code in ice_sq_send_cmd_retry()Michal Schmidt1-11/+2
The 'buf_cpy'-related code in ice_sq_send_cmd_retry() looks broken. 'buf' is nowhere copied into 'buf_cpy'. The reason this does not cause problems is that all commands for which 'is_cmd_for_retry' is true go with a NULL buf. Let's remove 'buf_cpy'. Add a WARN_ON in case the assumption no longer holds in the future. Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-04-20ice: sleep, don't busy-wait, for ICE_CTL_Q_SQ_CMD_TIMEOUTMichal Schmidt3-12/+13
The driver polls for ice_sq_done() with a 100 µs period for up to 1 s and it uses udelay to do that. Let's use usleep_range instead. We know sleeping is allowed here, because we're holding a mutex (cq->sq_lock). To preserve the total max waiting time, measure the timeout in jiffies. ICE_CTL_Q_SQ_CMD_TIMEOUT is used also in ice_release_res(), but there the polling period is 1 ms (i.e. 10 times longer). Since the timeout was expressed in terms of the number of loops, the total timeout in this function is 10 s. I do not know if this is intentional. This patch keeps it. The patch lowers the CPU usage of the ice-gnss-<dev_name> kernel thread on my system from ~8 % to less than 1 %. I received a report of high CPU usage with ptp4l where the busy-waiting in ice_sq_send_cmd dominated the profile. This patch has been tested in that usecase too and it made a huge improvement there. Tested-by: Brent Rowsell <[email protected]> Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-04-20ice: remove ice_ctl_q_info::sq_cmd_timeoutMichal Schmidt3-6/+2
sq_cmd_timeout is initialized to ICE_CTL_Q_SQ_CMD_TIMEOUT and never changed, so just use the constant directly. Suggested-by: Simon Horman <[email protected]> Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-04-20ice: increase the GNSS data polling interval to 20 msMichal Schmidt1-1/+1
Double the GNSS data polling interval from 10 ms to 20 ms. According to Karol Kolacinski from the Intel team, they have been planning to make this change. Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-04-20ice: do not busy-wait to read GNSS dataMichal Schmidt2-25/+20
The ice-gnss-<dev_name> kernel thread, which reads data from the u-blox GNSS module, keep a CPU core almost 100% busy. The main reason is that it busy-waits for data to become available. A simple improvement would be to replace the "mdelay(10);" in ice_gnss_read() with sleeping. A better fix is to not do any waiting directly in the function and just requeue this delayed work as needed. The advantage is that canceling the work from ice_gnss_exit() becomes immediate, rather than taking up to ~2.5 seconds (ICE_MAX_UBX_READ_TRIES * 10 ms). This lowers the CPU usage of the ice-gnss-<dev_name> thread on my system from ~90 % to ~8 %. I am not sure if the larger 0.1 s pause after inserting data into the gnss subsystem is really necessary, but I'm keeping that as it was. Of course, ideally the driver would not have to poll at all, but I don't know if the E810 can watch for GNSS data availability over the i2c bus by itself and notify the driver. Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Arkadiusz Kubalewski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sunitha Mekala <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski12-57/+51
Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20ixgbe: Enable setting RSS table to default valuesJoe Damato1-9/+10
ethtool uses `ETHTOOL_GRXRINGS` to compute how many queues are supported by RSS. The driver should return the smaller of either: - The maximum number of RSS queues the device supports, OR - The number of RX queues configured Prior to this change, running `ethtool -X $iface default` fails if the number of queues configured is larger than the number supported by RSS, even though changing the queue count correctly resets the flowhash to use all supported queues. Other drivers (for example, i40e) will succeed but the flow hash will reset to support the maximum number of queues supported by RSS, even if that amount is smaller than the configured amount. Prior to this change: $ sudo ethtool -L eth1 combined 20 $ sudo ethtool -x eth1 RX flow hash indirection table for eth1 with 20 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 8 9 10 11 12 13 14 15 16: 0 1 2 3 4 5 6 7 24: 8 9 10 11 12 13 14 15 32: 0 1 2 3 4 5 6 7 ... You can see that the flowhash was correctly set to use the maximum number of queues supported by the driver (16). However, asking the NIC to reset to "default" fails: $ sudo ethtool -X eth1 default Cannot set RX flow hash configuration: Invalid argument After this change, the flowhash can be reset to default which will use all of the available RSS queues (16) or the configured queue count, whichever is smaller. Starting with eth1 which has 10 queues and a flowhash distributing to all 10 queues: $ sudo ethtool -x eth1 RX flow hash indirection table for eth1 with 10 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 8 9 0 1 2 3 4 5 16: 6 7 8 9 0 1 2 3 ... Increasing the queue count to 48 resets the flowhash to distribute to 16 queues, as it did before this patch: $ sudo ethtool -L eth1 combined 48 $ sudo ethtool -x eth1 RX flow hash indirection table for eth1 with 16 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 8 9 10 11 12 13 14 15 16: 0 1 2 3 4 5 6 7 ... Due to the other bugfix in this series, the flowhash can be set to use queues 0-5: $ sudo ethtool -X eth1 equal 5 $ sudo ethtool -x eth1 RX flow hash indirection table for eth1 with 16 RX ring(s): 0: 0 1 2 3 4 0 1 2 8: 3 4 0 1 2 3 4 0 16: 1 2 3 4 0 1 2 3 ... Due to this bugfix, the flowhash can be reset to default and use 16 queues: $ sudo ethtool -X eth1 default $ sudo ethtool -x eth1 RX flow hash indirection table for eth1 with 16 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 8 9 10 11 12 13 14 15 16: 0 1 2 3 4 5 6 7 ... Fixes: 91cd94bfe4f0 ("ixgbe: add basic support for setting and getting nfc controls") Signed-off-by: Joe Damato <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-04-20ixgbe: Allow flow hash to be set via ethtoolJoe Damato1-2/+2
ixgbe currently returns `EINVAL` whenever the flowhash it set by ethtool because the ethtool code in the kernel passes a non-zero value for hfunc that ixgbe should allow. When ethtool is called with `ETHTOOL_SRXFHINDIR`, `ethtool_set_rxfh_indir` will call ixgbe's set_rxfh function with `ETH_RSS_HASH_NO_CHANGE`. This value should be accepted. When ethtool is called with `ETHTOOL_SRSSH`, `ethtool_set_rxfh` will call ixgbe's set_rxfh function with `rxfh.hfunc`, which appears to be hardcoded in ixgbe to always be `ETH_RSS_HASH_TOP`. This value should also be accepted. Before this patch: $ sudo ethtool -L eth1 combined 10 $ sudo ethtool -X eth1 default Cannot set RX flow hash configuration: Invalid argument After this patch: $ sudo ethtool -L eth1 combined 10 $ sudo ethtool -X eth1 default $ sudo ethtool -x eth1 RX flow hash indirection table for eth1 with 10 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 8 9 0 1 2 3 4 5 16: 6 7 8 9 0 1 2 3 24: 4 5 6 7 8 9 0 1 ... Fixes: 1c7cf0784e4d ("ixgbe: support for ethtool set_rxfh") Signed-off-by: Joe Damato <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2023-04-20net: libwx: fix memory leak in wx_setup_rx_resourcesZhengchao Shao1-1/+4
When wx_alloc_page_pool() failed in wx_setup_rx_resources(), it doesn't release DMA buffer. Add dma_free_coherent() in the error path to release the DMA buffer. Fixes: 850b971110b2 ("net: libwx: Allocate Rx and Tx resources") Signed-off-by: Zhengchao Shao <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-04-19Revert "net/mlx5: Enable management PF initialization"Jakub Kicinski3-15/+1
This reverts commit fe998a3c77b9f989a30a2a01fb00d3729a6d53a4. Paul reports that it causes a regression with IB on CX4 and FW 12.18.1000. In addition I think that the concept of "management PF" is not fully accepted and requires a discussion. Fixes: fe998a3c77b9 ("net/mlx5: Enable management PF initialization") Reported-by: Paul Moore <[email protected]> Link: https://lore.kernel.org/all/CAHC9VhQ7A4+msL38WpbOMYjAqLp0EtOjeLh4Dc6SQtD6OUvCQg@mail.gmail.com/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-19net: stmmac: dwmac-meson8b: Avoid cast to incompatible function typeSimon Horman1-2/+6
Rather than casting clk_disable_unprepare to an incompatible function type provide a trivial wrapper with the correct signature for the use-case. Reported by clang-16 with W=1: drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:276:6: error: cast from 'void (*)(struct clk *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] (void(*)(void *))clk_disable_unprepare, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ No functional change intended. Compile tested only. Signed-off-by: Simon Horman <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-19Merge branch '40GbE' of ↵Jakub Kicinski1-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-04-17 (i40e) This series contains updates to i40e only. Alex moves setting of active filters to occur under lock and checks/takes error path in rebuild if re-initializing the misc interrupt vector failed. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: fix i40e_setup_misc_vector() error handling i40e: fix accessing vsi->active_filters without holding lock ==================== Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-19e1000e: Disable TSO on i219-LM card to increase speedSebastian Basierski1-25/+26
While using i219-LM card currently it was only possible to achieve about 60% of maximum speed due to regression introduced in Linux 5.8. This was caused by TSO not being disabled by default despite commit f29801030ac6 ("e1000e: Disable TSO for buffer overrun workaround"). Fix that by disabling TSO during driver probe. Fixes: f29801030ac6 ("e1000e: Disable TSO for buffer overrun workaround") Signed-off-by: Sebastian Basierski <[email protected]> Signed-off-by: Mateusz Palczewski <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-19bnxt_en: fix free-runnig PHC modeVadim Fedorenko1-1/+1
The patch in fixes changed the way real-time mode is chosen for PHC on the NIC. Apparently there is one more use case of the check outside of ptp part of the driver which was not converted to the new macro and is making a lot of noise in free-running mode. Fixes: 131db4991622 ("bnxt_en: reset PHC frequency in free-running mode") Signed-off-by: Vadim Fedorenko <[email protected]> Reviewed-by: Michael Chan <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-19stmmac: fix changing mac addressCorinna Vinschen1-0/+2
Without the IFF_LIVE_ADDR_CHANGE flag being set, the network code disallows changing the mac address while the interface is UP. Consequences are, for instance, that the interface can't be used in a failover bond. Add the missing flag to net_device priv_flags. Tested on Intel Elkhart Lake with default settings, as well as with failover and alb mode bonds. Signed-off-by: Corinna Vinschen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19mlxsw: pci: Fix possible crash during initializationIdo Schimmel1-1/+1
During initialization the driver issues a reset command via its command interface in order to remove previous configuration from the device. After issuing the reset, the driver waits for 200ms before polling on the "system_status" register using memory-mapped IO until the device reaches a ready state (0x5E). The wait is necessary because the reset command only triggers the reset, but the reset itself happens asynchronously. If the driver starts polling too soon, the read of the "system_status" register will never return and the system will crash [1]. The issue was discovered when the device was flashed with a development firmware version where the reset routine took longer to complete. The issue was fixed in the firmware, but it exposed the fact that the current wait time is borderline. Fix by increasing the wait time from 200ms to 400ms. With this patch and the buggy firmware version, the issue did not reproduce in 10 reboots whereas without the patch the issue is reproduced quite consistently. [1] mce: CPUs not responding to MCE broadcast (may include false positives): 0,4 mce: CPUs not responding to MCE broadcast (may include false positives): 0,4 Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler Shutting down cpus with NMI Kernel Offset: 0x12000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: ac004e84164e ("mlxsw: pci: Wait longer before accessing the device after reset") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19net/mlx5e: RX, Add XDP multi-buffer support in Striding RQTariq Toukan5-59/+138
Here we add support for multi-buffer XDP handling in Striding RQ, which is our default out-of-the-box RQ type. Before this series, loading such an XDP program would fail, until you switch to the legacy RQ (by unsetting the rx_striding_rq priv-flag). To overcome the lack of headroom and tailroom between the strides, we allocate a side page to be used for the descriptor (xdp_buff / skb) and the linear part. When an XDP program is attached, we structure the xdp_buff so that it contains no data in the linear part, and the whole packet resides in the fragments. In case of XDP_PASS, where an SKB still needs to be created, we copy up to 256 bytes to its linear part, to match the current behavior, and satisfy functions that assume finding the packet headers in the SKB linear part (like eth_type_trans). Performance testing: Packet rate test, 64 bytes, 32 channels, MTU 9000 bytes. CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz. NIC: ConnectX-6 Dx, at 100 Gbps. +----------+-------------+-------------+---------+ | Test | Legacy RQ | Striding RQ | Speedup | +----------+-------------+-------------+---------+ | XDP_DROP | 101,615,544 | 117,191,020 | +15% | +----------+-------------+-------------+---------+ | XDP_TX | 95,608,169 | 117,043,422 | +22% | +----------+-------------+-------------+---------+ Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19net/mlx5e: RX, Prepare non-linear striding RQ for XDP multi-buffer supportTariq Toukan1-4/+47
In preparation for supporting XDP multi-buffer in striding RQ, use xdp_buff struct to describe the packet. Make its skb_shared_info collide the one of the allocated SKB, then add the fragments using the xdp_buff API. Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19net/mlx5e: RX, Generalize mlx5e_fill_mxbuf()Tariq Toukan1-5/+8
Make the function more generic. Let it get an additional frame_sz parameter instead of deriving it from the RQ struct. No functional change here, just a preparation for a downstream patch. Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19net/mlx5e: RX, Take shared info fragment addition into a functionTariq Toukan1-25/+31
Introduce mlx5e_add_skb_shared_info_frag(), a function dedicated for adding a fragment into a struct skb_shared_info object. Use it in the Legacy RQ flow. Similar usage will be added in a downstream patch by the corresponding Striding RQ flow. Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19net/mlx5e: XDP, Allow non-linear single-segment frames in XDP TX MPWQETariq Toukan1-9/+26
Under a few restrictions, TX MPWQE feature can serve multiple TX packets in a single TX descriptor. It requires each of the packets to have a single scatter entry / segment. Today we allow only linear frames to use this feature, although there's no real problem with non-linear ones where the whole packet reside in the first fragment. Expand the XDP TX MPWQE feature support to include such frames. This is in preparation for the downstream patch, in which we will generate such non-linear frames. Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19net/mlx5e: XDP, Remove un-established assumptions on XDP bufferTariq Toukan2-18/+22
Remove the assumption of non-zero linear length in the XDP xmit function, used to serve both internal XDP_TX operations as well as redirected-in requests. Do not apply the MLX5E_XDP_MIN_INLINE check unless necessary. Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19net/mlx5e: XDP, Consider large muti-buffer packets in Striding RQ params ↵Tariq Toukan1-4/+7
calculations Function mlx5e_rx_get_linear_stride_sz() returns PAGE_SIZE immediately in case an XDP program is attached. The more accurate formula is ALIGN(sz, PAGE_SIZE), to prevent two packets from residing on the same page. The assumption behind the current code is that sz <= PAGE_SIZE holds for all cases with XDP program set. This is true because it is being called from: - 3 times from Striding RQ flows, in which XDP is not supported for such large packets. - 1 time from Legacy RQ flow, under the condition mlx5e_rx_is_linear_skb(). No functional change here, just removing the implied assumption in preparation for supporting XDP multi-buffer in Striding RQ. Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19net/mlx5e: XDP, Let XDP checker function get the params as inputTariq Toukan1-13/+8
Change mlx5e_xdp_allowed() so it gets the params structure with the xdp_prog applied, rather than creating a local copy based on the current params in priv. This reduces the amount of memory on the stack, and acts on the exact params instance that's about to be applied. Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19net/mlx5e: XDP, Improve Striding RQ check with XDPTariq Toukan1-14/+9
Non-linear mem scheme of Striding RQ does not yet support XDP at this point. Take the check where it belongs, inside the params validation function mlx5e_params_validate_xdp(). Reviewed-by: Gal Pressman <[email protected]> Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>