path: root/drivers/net/ethernet
2021-02-14  bnxt_en: Implement faster recovery for firmware fatal error.  [Michael Chan, 1 file, -0/+19]
During some fatal firmware error conditions, the PCI config space register 0x2e which normally contains the subsystem ID will become 0xffff. This register will revert back to the normal value after the chip has completed core reset. If we detect this condition, we can poll this config register immediately for the value to revert. Because we use config read cycles to poll this register, there is no possibility of Master Abort if we happen to read it during core reset. This speeds up recovery significantly as we don't have to wait for the conservative min_time before polling MMIO to see if the firmware has come out of reset. As soon as this register changes value we can proceed to re-initialize the device. Reviewed-by: Edwin Peer <[email protected]> Reviewed-by: Vasundhara Volam <[email protected]> Reviewed-by: Andy Gospodarek <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
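As a rough illustration of the recovery path described above (the function name and timeout values here are illustrative, not the actual bnxt_en code), the driver can poll the subsystem ID config register with ordinary config read cycles until it stops reading back as 0xffff:

    #include <linux/pci.h>
    #include <linux/delay.h>
    #include <linux/errno.h>

    /* Illustrative sketch: the register reads 0xffff while core reset is in
     * progress and reverts once the chip has completed reset. Config reads
     * cannot Master Abort, so polling can start immediately.
     */
    static int example_wait_for_core_reset(struct pci_dev *pdev, int timeout_ms)
    {
        u16 val;

        while (timeout_ms > 0) {
            pci_read_config_word(pdev, PCI_SUBSYSTEM_ID, &val);
            if (val != 0xffff)
                return 0;    /* value reverted: firmware is out of reset */
            msleep(10);
            timeout_ms -= 10;
        }
        return -ETIMEDOUT;
    }

Once such a poll succeeds, the driver can proceed to re-initialize the device without waiting for the conservative minimum reset time.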
2021-02-14  bnxt_en: selectively allocate context memories  [Edwin Peer, 1 file, -32/+52]
Newer devices may have local context memory instead of relying on the host for backing store. In these cases, HWRM_FUNC_BACKING_STORE_QCAPS will return a zero entry size to indicate contexts for which the host should not allocate backing store. Selectively allocate context memory based on device capabilities and only enable backing store for the appropriate contexts. Signed-off-by: Edwin Peer <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
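The selection logic can be pictured roughly as follows (a hypothetical sketch; the structure and function names are not the real bnxt_en ones): a zero entry size reported by HWRM_FUNC_BACKING_STORE_QCAPS means the device backs that context type locally, so host memory is simply not allocated for it.

    /* Hypothetical sketch: skip context types the device backs itself */
    for (type = 0; type < EXAMPLE_NUM_CTX_TYPES; type++) {
        struct example_ctx_mem *ctxm = &ctx->types[type];

        if (!ctxm->entry_size)
            continue;    /* zero entry size: no host backing store needed */

        rc = example_alloc_backing_store(bp, ctxm);
        if (rc)
            return rc;
    }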
2021-02-14  bnxt_en: Update firmware interface spec to 1.10.2.16.  [Michael Chan, 2 files, -11/+96]
The main changes are the echo request/response from firmware for error detection and the NO_FCS feature to transmit frames without FCS. Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: axienet: Support dynamic switching between 1000BaseX and SGMII  [Robert Hancock, 2 files, -18/+71]
Newer versions of the Xilinx AXI Ethernet core (specifically version 7.2 or later) allow the core to be configured with a PHY interface mode of "Both", allowing either 1000BaseX or SGMII modes to be selected at runtime. Add support for this in the driver to allow better support for applications which can use both fiber and copper SFP modules. Signed-off-by: Robert Hancock <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: axienet: hook up nway_reset ethtool operation  [Robert Hancock, 1 file, -0/+8]
Hook up the nway_reset ethtool operation to the corresponding phylink function so that "ethtool -r" can be supported. Signed-off-by: Robert Hancock <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: axienet: Handle deferred probe on clock properly  [Robert Hancock, 1 file, -14/+12]
This driver is set up to use a clock mapping in the device tree if it is present, but still work without one for backward compatibility. However, if getting the clock returns -EPROBE_DEFER, then we need to abort and return that error from our driver initialization so that the probe can be retried later after the clock is set up. Move clock initialization to earlier in the process so we do not waste as much effort if the clock is not yet available. Switch to use devm_clk_get_optional and abort initialization on any error reported. Also enable the clock regardless of whether the controller is using an MDIO bus, as the clock is required in any case. Fixes: 09a0354cadec267be7f ("net: axienet: Use clock framework to get device clock rate") Signed-off-by: Robert Hancock <[email protected]> Signed-off-by: David S. Miller <[email protected]>
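A sketch of the probe-time handling described above, using the standard devm_clk_get_optional()/clk_prepare_enable() pattern (the lp, pdev and free_netdev names are placeholders, not necessarily the axienet ones):

        /* early in probe, before any expensive setup */
        lp->clk = devm_clk_get_optional(&pdev->dev, NULL);
        if (IS_ERR(lp->clk)) {
            ret = PTR_ERR(lp->clk);    /* includes -EPROBE_DEFER: probe retried later */
            goto free_netdev;
        }

        ret = clk_prepare_enable(lp->clk);    /* required with or without an MDIO bus */
        if (ret)
            goto free_netdev;

devm_clk_get_optional() returns NULL when no clock is described in the device tree, and clk_prepare_enable(NULL) is a no-op, which preserves the backward compatibility mentioned above.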
2021-02-12  Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue  [David S. Miller, 10 files, -136/+86]
Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2021-02-12 This series contains updates to i40e, ice, and ixgbe drivers. Maciej does cleanups on the following drivers. For i40e, removes redundant check for XDP prog, cleans up no longer relevant information, and removes an unused function argument. For ice, removes local variable use, instead returning values directly. Moves skb pointer from buffer to ring and removes an unneeded check for xdp_prog in zero copy path. Also removes a redundant MTU check when changing it. For i40e, ice, and ixgbe, stores the rx_offset in the Rx ring as the value is constant so there's no need for continual calls. Bjorn folds a decrement into a while statement. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-02-12  of: Remove of_dev_{get,put}()  [Rob Herring, 1 file, -7/+8]
of_dev_get() and of_dev_put are just wrappers for get_device()/put_device() on a platform_device. There's also already platform_device_{get,put}() wrappers for this purpose. Let's update the few users and remove of_dev_{get,put}(). Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Frank Rowand <[email protected]> Cc: Patrice Chotard <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Gilles Muller <[email protected]> Cc: Nicolas Palix <[email protected]> Cc: Michal Marek <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
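For a driver in this directory the conversion is mechanical; a sketch with an arbitrary platform_device pointer:

        /* before */
        of_dev_get(ofdev);
        /* ... */
        of_dev_put(ofdev);

        /* after: take/drop the reference on the embedded struct device */
        get_device(&ofdev->dev);
        /* ... */
        put_device(&ofdev->dev);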
2021-02-12  ibmvnic: change IBMVNIC_MAX_IND_DESCS to 16  [Dany Madden, 1 file, -1/+1]
The number of supported indirect subcrq entries on Power8 is 16; Power9 supports 128. Redefine this value to 16 so that the driver does not have to reset when migrating between Power9 and Power8. In our rx/tx performance testing, we found no performance difference between 16 and 128 at this time. Fixes: f019fb6392e5 ("ibmvnic: Introduce indirect subordinate Command Response Queue buffer") Signed-off-by: Dany Madden <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: mscc: ocelot: offload bridge port flags to device  [Vladimir Oltean, 2 files, -5/+131]
We should not be unconditionally enabling address learning, since doing that is actively detrimental when a port is standalone and not offloading a bridge. Namely, if a port in the switch is standalone while others are offloading the bridge, then we could enter a situation where we learn an address towards the standalone port, but the bridged ports could not forward the packet there, because the CPU is the only path between the standalone and the bridged ports. The solution, of course, is to not enable address learning unless the bridge asks for it. We need to set up the initial port flags for no learning and flooding everything, and also update them when the port joins and leaves the bridge. The flood configuration was already set up correctly for standalone mode in ocelot_init; we just need to disable learning in ocelot_init_port. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Alexandre Belloni <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: mscc: ocelot: use separate flooding PGID for broadcast  [Vladimir Oltean, 1 file, -5/+8]
In preparation of offloading the bridge port flags which have independent settings for unknown multicast and for broadcast, we should also start reserving one destination Port Group ID for the flooding of broadcast packets, to allow configuring it individually. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes  [Vladimir Oltean, 5 files, -55/+76]
This switchdev attribute offers a counterproductive API for a driver writer. Although br_switchdev_set_port_flag gets passed a "flags" and a "mask", those are passed piecemeal to the driver: the PRE_BRIDGE_FLAGS listener knows what changed because it has the "mask", but the BRIDGE_FLAGS listener doesn't, because it only has the final value. Certain drivers can offload only certain combinations of settings; for example, they cannot change unicast flooding independently of multicast flooding - they must be both on or both off. The way the information is passed to switchdev makes drivers not expressive enough and unable to reject such a request ahead of time in the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during the deferred BRIDGE_FLAGS attribute, where the rejection is currently ignored. This patch also changes drivers to make use of the "mask" field for edge detection when possible. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Grygorii Strashko <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
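With both the value and the mask in hand, a driver can reject unsupported combinations in PRE_BRIDGE_FLAGS and do edge detection in BRIDGE_FLAGS. A rough sketch (the example_port_set_*() helpers and the exact attribute layout are assumptions, not the switchdev API verbatim):

        /* PRE_BRIDGE_FLAGS: refuse anything we cannot offload */
        if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
            return -EINVAL;

        /* BRIDGE_FLAGS: only touch the bits that actually changed */
        if (flags.mask & BR_LEARNING)
            example_port_set_learning(priv, port, !!(flags.val & BR_LEARNING));
        if (flags.mask & BR_FLOOD)
            example_port_set_ucast_flood(priv, port, !!(flags.val & BR_FLOOD));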
2021-02-12  net: switchdev: propagate extack to port attributes  [Vladimir Oltean, 5 files, -5/+10]
When a struct switchdev_attr is notified through switchdev, there is no way to report informational messages, unlike for struct switchdev_obj. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  octeontx2: Fix condition.  [David S. Miller, 1 file, -1/+1]
Fixes: 93efb0c656837 ("octeontx2-pf: Fix out-of-bounds read in otx2_get_fecparam()") Signed-off-by: David S. Miller <[email protected]>
2021-02-12  octeontx2-pf: Fix out-of-bounds read in otx2_get_fecparam()  [Gustavo A. R. Silva, 1 file, -1/+1]
Code at line 967 implies that rsp->fwdata.supported_fec may be up to 4:

 967:	if (rsp->fwdata.supported_fec <= FEC_MAX_INDEX)

If rsp->fwdata.supported_fec evaluates to 4, then there is an out-of-bounds read at line 971, because fec is an array with a maximum of 4 elements:

 954	const int fec[] = {
 955		ETHTOOL_FEC_OFF,
 956		ETHTOOL_FEC_BASER,
 957		ETHTOOL_FEC_RS,
 958		ETHTOOL_FEC_BASER | ETHTOOL_FEC_RS};
 959	#define FEC_MAX_INDEX 4

 971:	fecparam->fec = fec[rsp->fwdata.supported_fec];

Fix this by indexing fec[] with rsp->fwdata.supported_fec - 1, so that the valid indexes 0 to 3 are used when rsp->fwdata.supported_fec is in the range 1 to 4. Fixes: d0cf9503e908 ("octeontx2-pf: ethtool fec mode support") Addresses-Coverity-ID: 1501722 ("Out-of-bounds read") Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
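Put together, the corrected lookup looks roughly like this (a sketch of the logic described in the commit message, not necessarily the exact upstream hunk; note that a supported_fec value of 0 must also be excluded before the 1-based indexing, since it would underflow the array):

        const int fec[] = {
            ETHTOOL_FEC_OFF,
            ETHTOOL_FEC_BASER,
            ETHTOOL_FEC_RS,
            ETHTOOL_FEC_BASER | ETHTOOL_FEC_RS };
    #define FEC_MAX_INDEX 4

        if (rsp->fwdata.supported_fec &&
            rsp->fwdata.supported_fec <= FEC_MAX_INDEX)
            fecparam->fec = fec[rsp->fwdata.supported_fec - 1];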
2021-02-12  octeontx2-af: Fix spelling mistake "recievd" -> "received"  [Colin Ian King, 1 file, -1/+1]
There is a spelling mistake in the text in array rpm_rx_stats_fields, fix it. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: refactor out hclge_rm_vport_all_mac_table()  [Hao Chen, 1 file, -24/+43]
hclge_rm_vport_all_mac_table() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Hao Chen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: refactor out hclgevf_set_rss_tuple()  [Huazhong Tan, 1 file, -15/+32]
To make it more readable and maintainable, split hclgevf_set_rss_tuple() into two parts. Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: refactor out hclge_set_rss_tuple()  [Huazhong Tan, 1 file, -13/+29]
To make it more readable and maintainable, split hclge_set_rss_tuple() into two parts. Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: split out hclgevf_cmd_send()  [Yufeng Mo, 1 file, -60/+81]
hclgevf_cmd_send() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: split out hclge_cmd_send()  [Yufeng Mo, 1 file, -41/+59]
hclge_cmd_send() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: split out hclge_dbg_dump_qos_buf_cfg()  [Jian Shen, 1 file, -43/+115]
hclge_dbg_dump_qos_buf_cfg() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: refactor out hclgevf_get_rss_tuple()  [Jian Shen, 1 file, -25/+42]
To improve code readability and maintainability, separate the flow type parsing part and the converting part from bloated hclgevf_get_rss_tuple(). Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: refactor out hclge_get_rss_tuple()  [Jian Shen, 1 file, -21/+38]
To improve code readability and maintainability, separate the flow type parsing part and the converting part from bloated hclge_get_rss_tuple(). Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: refactor out hclge_set_vf_vlan_common()  [Peng Li, 1 file, -25/+48]
To improve code readability and maintainability, separate the command handling part and the status parsing part from bloated hclge_set_vf_vlan_common(). Signed-off-by: Peng Li <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: use ipv6_addr_any() helper  [Jiaran Zhang, 1 file, -8/+5]
Use the common ipv6_addr_any() helper to determine whether an address is the IPv6 unspecified (any) address. Signed-off-by: Jiaran Zhang <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
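The helper simply tests for the all-zero (unspecified) address, so open-coded comparisons of the four 32-bit words collapse into one call (the rule->ip6src field name below is illustrative):

    #include <net/ipv6.h>

        /* before: open-coded all-zero check */
        if (!rule->ip6src.s6_addr32[0] && !rule->ip6src.s6_addr32[1] &&
            !rule->ip6src.s6_addr32[2] && !rule->ip6src.s6_addr32[3])
            wildcard = true;

        /* after: same test via the helper */
        if (ipv6_addr_any(&rule->ip6src))
            wildcard = true;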
2021-02-12  net: hns3: clean up hns3_dbg_cmd_write()  [Peng Li, 1 file, -18/+26]
As more commands are added, hns3_dbg_cmd_write() is going to get more bloated, so move the part about command check into a separate function. Signed-off-by: Peng Li <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: refactor out hclgevf_cmd_convert_err_code()  [Peng Li, 1 file, -28/+27]
To improve code readability and maintainability, refactor hclgevf_cmd_convert_err_code() to use an array mapping imp_errcode to common_errno, instead of a bloated switch/case. Signed-off-by: Peng Li <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-12  net: hns3: refactor out hclge_cmd_convert_err_code()  [Peng Li, 1 file, -28/+27]
To improve code readability and maintainability, refactor hclge_cmd_convert_err_code() to use an array mapping imp_errcode to common_errno, instead of a bloated switch/case. Signed-off-by: Peng Li <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
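The shape of the refactor, for both the VF and PF variants above, is a table lookup in place of the switch: a static array of {firmware error code, errno} pairs scanned linearly. A sketch with made-up error codes (the real imp_errcode values live in the hns3 headers):

    struct example_errcode {
        u32 imp_errcode;    /* error code returned by firmware */
        int common_errno;   /* corresponding kernel errno */
    };

    static const struct example_errcode example_cmd_errcode[] = {
        { 2 /* e.g. "not supported"     */, -EOPNOTSUPP },
        { 3 /* e.g. "queue full"        */, -EXFULL },
        { 5 /* e.g. "invalid parameter" */, -EINVAL },
    };

    static int example_cmd_convert_err_code(u32 desc_ret)
    {
        u32 i;

        for (i = 0; i < ARRAY_SIZE(example_cmd_errcode); i++)
            if (example_cmd_errcode[i].imp_errcode == desc_ret)
                return example_cmd_errcode[i].common_errno;

        return -EIO;    /* unrecognized firmware error */
    }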
2021-02-12  ixgbe: store the result of ixgbe_rx_offset() onto ixgbe_ring  [Maciej Fijalkowski, 2 files, -7/+9]
Output of ixgbe_rx_offset() is based on ethtool's priv flag setting, which, when changed, causes a PF reset (disables napi, frees irqs, loads a different Rx memory model, etc.). This means that within napi its result is constant and there is no reason to call it for each processed frame. Add a new 'rx_offset' field to ixgbe_ring that is meant to hold the ixgbe_rx_offset() result and use it within ixgbe_clean_rx_irq(). Furthermore, use it within ixgbe_alloc_mapped_page(). Last but not least, un-inline the function of interest: since it lives in a .c file, let the compiler decide whether to inline it. Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
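The pattern (shared by the ice and i40e patches below) is: compute the offset once when the ring is configured, since a priv-flag change triggers a full reset anyway, and read the cached field in the hot path. Roughly:

        /* ring setup: recomputed only when the priv flag can actually change */
        rx_ring->rx_offset = ixgbe_rx_offset(rx_ring);

        /* hot path (clean_rx_irq / alloc_mapped_page): no per-frame call */
        unsigned int offset = rx_ring->rx_offset;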
2021-02-12  ice: store the result of ice_rx_offset() onto ice_ring  [Maciej Fijalkowski, 2 files, -21/+23]
Output of ice_rx_offset() is based on ethtool's priv flag setting, which when changed, causes PF reset (disables napi, frees irqs, loads different Rx mem model, etc.). This means that within napi its result is constant and there is no reason to call it per each processed frame. Add new 'rx_offset' field to ice_ring that is meant to hold the ice_rx_offset() result and use it within ice_clean_rx_irq(). Furthermore, use it within ice_alloc_mapped_page(). Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-12  i40e: store the result of i40e_rx_offset() onto i40e_ring  [Maciej Fijalkowski, 2 files, -17/+19]
Output of i40e_rx_offset() is based on ethtool's priv flag setting, which, when changed, causes a PF reset (disables napi, frees irqs, loads a different Rx memory model, etc.). This means that within napi its result is constant and there is no reason to call it for each processed frame. Add a new 'rx_offset' field to i40e_ring that is meant to hold the i40e_rx_offset() result and use it within i40e_clean_rx_irq(). Furthermore, use it within i40e_alloc_mapped_page(). Last but not least, un-inline the function of interest: since it lives in a .c file, let the compiler decide whether to inline it. Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-12  i40e: Simplify the do-while allocation loop  [Björn Töpel, 1 file, -3/+1]
Fold the count decrement into the while-statement. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Björn Töpel <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
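The change is the classic do/while folding; a generic illustration rather than the exact upstream hunk:

        /* before */
        do {
            /* ... allocate and map one buffer ... */
            count--;
        } while (count);

        /* after: decrement folded into the loop condition */
        do {
            /* ... allocate and map one buffer ... */
        } while (--count);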
2021-02-12  ice: skip NULL check against XDP prog in ZC path  [Maciej Fijalkowski, 1 file, -4/+3]
The whole zero-copy variant of the clean Rx IRQ routine is executed only when an xsk_pool is attached to the rx_ring, which can happen only when an XDP program is present on the interface. Therefore it is safe to assume that the program is always non-NULL, and there is no need to check it in ice_run_xdp_zc. Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-12  ice: remove redundant checks in ice_change_mtu  [Maciej Fijalkowski, 1 file, -9/+0]
dev_validate_mtu() checks that the MTU value specified by the user is not less than the minimum MTU and not greater than the maximum allowed MTU. This is done before calling the ndo_change_mtu callback exposed by the driver, so remove these redundant checks from ice_change_mtu. Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-12  ice: move skb pointer from rx_buf to rx_ring  [Maciej Fijalkowski, 2 files, -18/+14]
A similar thing has already been done in i40e, as there is no real need to keep the sk_buff pointer in each rx_buf. Non-EOP frames can be handled just as well with that pointer moved up to the rx_ring. Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-12  ice: simplify ice_run_xdp  [Maciej Fijalkowski, 1 file, -10/+5]
There's no need for the 'result' variable; we can directly return the internal status based on the action returned by the XDP prog. Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-12  i40e: adjust i40e_is_non_eop  [Maciej Fijalkowski, 1 file, -17/+6]
i40e_is_non_eop had a leftover comment and an unused skb argument that was previously used for placing the skb onto the rx_buf when the current buffer was a non-EOP one. This is not relevant anymore, as commit e72e56597ba1 ("i40e/i40evf: Moves skb from i40e_rx_buffer to i40e_ring") pulled the non-complete skb handling out of rx_bufs up to the rx_ring. Therefore, let's adjust the function arguments that i40e_is_non_eop takes. Furthermore, since there is already a function responsible for bumping the ntc, make use of it and drop that logic from i40e_is_non_eop, so that the scope of this function is limited to what its name actually states. Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-12  i40e: drop misleading function comments  [Maciej Fijalkowski, 1 file, -27/+6]
i40e_cleanup_headers has a comment about checking whether the skb is linear, which is not relevant anymore, so let's remove it. The same goes for i40e_can_reuse_rx_page, which references things that are no longer present there. Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-12  i40e: drop redundant check when setting xdp prog  [Maciej Fijalkowski, 1 file, -3/+0]
Net core handles the case where netdev has no xdp prog attached and current prog is NULL. Therefore, remove such check within i40e_xdp_setup. Reviewed-by: Björn Töpel <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-11  net/mlx5: Remove TLS dependencies on XPS  [Tariq Toukan, 1 file, -2/+0]
There is no real dependency on XPS, only on RX queue mapping, which is already being selected by TLS_DEVICE. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Maxim Mikityanskiy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-11  net/mlx5e: Check tunnel offload is required before setting SWP  [Moshe Shemesh, 1 file, -1/+1]
Check that tunnel offload is required before setting the Software Parser offsets used to get Geneve HW offload. For a Geneve packet, we check HW SWP offload support in mlx5e_tunnel_features_check() and set the features accordingly; this is reflected in the skb offload requested by the kernel, and we should add the Software Parser offsets only if requested. Otherwise, in case HW doesn't support SWP for Geneve, the data path will mistakenly try to offload Geneve SKBs with skb->encapsulation set, regardless of whether offload was requested or not on this specific SKB. Fixes: e3cfc7e6b7bd ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-02-11  net/mlx5e: CT: manage the lifetime of the ct entry object  [Oz Shlomo, 1 file, -67/+192]
The ct entry object is accessed by the ct add, del, stats and restore methods. In addition, it is referenced from several hash tables. The lifetime of the ct entry object was not managed, which triggered race conditions as in the following kasan dump:

[ 3374.973945] ==================================================================
[ 3374.988552] BUG: KASAN: use-after-free in memcmp+0x4c/0x98
[ 3374.999590] Read of size 1 at addr ffff00036129ea55 by task ksoftirqd/1/15
[ 3375.016415] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G O 5.4.31+ #1
[ 3375.055301] Call trace:
[ 3375.060214]  dump_backtrace+0x0/0x238
[ 3375.067580]  show_stack+0x24/0x30
[ 3375.074244]  dump_stack+0xe0/0x118
[ 3375.081085]  print_address_description.isra.9+0x74/0x3d0
[ 3375.091771]  __kasan_report+0x198/0x1e8
[ 3375.099486]  kasan_report+0xc/0x18
[ 3375.106324]  __asan_load1+0x60/0x68
[ 3375.113338]  memcmp+0x4c/0x98
[ 3375.119409]  mlx5e_tc_ct_restore_flow+0x3a4/0x6f8 [mlx5_core]
[ 3375.131073]  mlx5e_rep_tc_update_skb+0x1d4/0x2f0 [mlx5_core]
[ 3375.142553]  mlx5e_handle_rx_cqe_rep+0x198/0x308 [mlx5_core]
[ 3375.154034]  mlx5e_poll_rx_cq+0x2a0/0x1060 [mlx5_core]
[ 3375.164459]  mlx5e_napi_poll+0x1d4/0xa78 [mlx5_core]
[ 3375.174453]  net_rx_action+0x28c/0x7a8
[ 3375.182004]  __do_softirq+0x1b4/0x5d0

Manage the lifetime of the ct entry object by using synchronization mechanisms for concurrent access. Fixes: ac991b48d43c ("net/mlx5e: CT: Offload established flows") Signed-off-by: Roi Dayan <[email protected]> Signed-off-by: Oz Shlomo <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-02-11  net/mlx5: Disable devlink reload for lag devices  [Shay Drory, 1 file, -0/+5]
Devlink reload can't be allowed on lag devices, since reloading one lag device will cause traffic on the bond to get stuck. Users who wish to reload a lag device need to remove the device from the bond, and only then reload it. Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload") Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-02-11  net/mlx5: Disallow RoCE on lag device  [Shay Drory, 1 file, -2/+2]
In lag mode, enabling or disabling RoCE on a lag device has no effect, e.g. the RoCE status of the bond device (roce/vf_lag) remains unchanged. Therefore disallow it and add an error message. Fixes: cc9defcbb8fa ("net/mlx5: Handle "enable_roce" devlink param") Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-02-11  net/mlx5: Disallow RoCE on multi port slave device  [Shay Drory, 1 file, -0/+4]
In dual port mode, enabling or disabling RoCE on the slave device has no effect, e.g. the slave device's RoCE status remains unchanged. Therefore disallow it and add an error message. Enabling or disabling RoCE on the master device affects both the master and slave devices. Fixes: cc9defcbb8fa ("net/mlx5: Handle "enable_roce" devlink param") Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-02-11  net/mlx5: Disable devlink reload for multi port slave device  [Shay Drory, 1 file, -1/+2]
Devlink reload can't be allowed on a multi port slave device, because reloading the slave device doesn't take effect. The right flow is to disable devlink reload for a multi port slave device; hence, disable it during mlx5_core probing. Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload") Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-02-11  net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context  [Maxim Mikityanskiy, 1 file, -34/+30]
wait_for_resync is unreliable - if it times out, priv_rx will be freed anyway. However, mlx5e_ktls_handle_get_psv_completion will be called sooner or later, leading to use-after-free. For example, it can happen if a CQ error happened and ICOSQ stopped, but later on the queues are destroyed and ICOSQ is flushed with mlx5e_free_icosq_descs. This patch converts the lifecycle of priv_rx to be fully refcount-based, so that the struct won't be freed before the refcount goes to zero. Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
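A minimal sketch of the refcount-based lifetime scheme (the structure and function names are illustrative, not the mlx5e code): every outstanding user, including the pending GET_PSV completion, holds a reference, and the object is freed only when the last reference is dropped.

    #include <linux/refcount.h>
    #include <linux/slab.h>

    struct example_priv_rx {
        refcount_t refcnt;
        /* ... resync state, keys, stats ... */
    };

    static void example_priv_rx_put(struct example_priv_rx *priv_rx)
    {
        if (refcount_dec_and_test(&priv_rx->refcnt))
            kfree(priv_rx);            /* last user gone: safe to free */
    }

    static void example_post_get_psv(struct example_priv_rx *priv_rx)
    {
        refcount_inc(&priv_rx->refcnt);    /* the completion will drop it */
        /* ... post the GET_PSV WQE on the ICOSQ ... */
    }

    static void example_get_psv_completion(struct example_priv_rx *priv_rx)
    {
        /* ... handle the PSV (or nothing, if the queue was flushed) ... */
        example_priv_rx_put(priv_rx);      /* may be the final put */
    }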
2021-02-11  net/mlx5e: Fix CQ params of ICOSQ and async ICOSQ  [Maxim Mikityanskiy, 1 file, -2/+2]
The commit mentioned below has split the parameters of ICOSQ and async ICOSQ, but it contained a typo: the CQ parameters were swapped for ICOSQ and async ICOSQ. Async ICOSQ is longer than the normal ICOSQ, and the CQ size must be the same as the size of the corresponding SQ, but due to this bug, the CQ of async ICOSQ was much shorter than async ICOSQ itself. It led to overflows of the CQ with such messages in dmesg, in particular, when running multiple kTLS-offloaded streams:

mlx5_core 0000:08:00.0: cq_err_event_notifier:529:(pid 9422): CQ error on CQN 0x406, syndrome 0x1
mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000406 event=0x04

This commit fixes the issue by using the corresponding parameters for ICOSQ and async ICOSQ. Fixes: c293ac927fbb ("net/mlx5e: Refactor build channel params") Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-02-11  net/mlx5e: Replace synchronize_rcu with synchronize_net  [Maxim Mikityanskiy, 4 files, -7/+7]
The commit cited below switched from using napi_synchronize to synchronize_rcu to guarantee that it will finish in finite time. However, on average, synchronize_rcu takes more time than napi_synchronize. Given that it's called multiple times per channel on deactivation, this accumulates to a significant amount of time, which causes timeouts in some applications (for example, when using bonding with NetworkManager). This commit replaces synchronize_rcu with synchronize_net, which is faster when called under rtnl_lock, speeding up the described flow. Fixes: 9c25a22dfb00 ("net/mlx5e: Use synchronize_rcu to sync with NAPI") Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
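The substitution itself is one call for another; synchronize_net() uses an expedited RCU grace period when the caller holds rtnl_lock, which is what makes the repeated calls during channel deactivation tolerable (illustrative context, not the exact mlx5e hunk):

        /* channel deactivation, running under rtnl_lock */
        rcu_assign_pointer(rq->xdp_prog, NULL);
        /* was: synchronize_rcu(); */
        synchronize_net();    /* expedited grace period under rtnl_lock */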