path: root/drivers
Age | Commit message | Author | Files | Lines
2022-05-11 | net: macb: simplify/cleanup NAPI reschedule checking | Robert Hancock | 1 | -34/+31
Previously the macb_poll method was checking the RSR register after completing its RX receive work to see if additional packets had been received since IRQs were disabled, since this controller does not maintain the pending IRQ status across IRQ disable. It also had to double-check the register after re-enabling IRQs to detect if packets were received after the first check but before IRQs were enabled. Using the RSR register for this purpose is problematic since it reflects the global device state rather than the per-queue state, so if packets are being received on multiple queues it may end up retriggering receive on a queue where the packets did not actually arrive and not on the one where they did arrive. This will also cause problems with an upcoming change to use NAPI for the TX path where use of multiple queues is more likely. Add a macb_rx_pending function to check the RX ring to see if more packets have arrived in the queue, and use that to check if NAPI should be rescheduled rather than the RSR register. By doing this, we can just ignore the global RSR register entirely, and thus save some extra device register accesses at the same time. This also makes the previous first check for pending packets rather redundant, since it would be checking the RX ring state which was just checked in the receive work function. Therefore we can get rid of it and just check after enabling interrupts whether packets are already pending. Signed-off-by: Robert Hancock <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
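The per-queue check can be illustrated with a small userspace sketch. It is not the macb code: struct rx_desc, RX_USED_BIT, rx_pending() and poll_should_reschedule() are hypothetical stand-ins for the driver's descriptor ring, macb_rx_pending(), and the NAPI completion path in macb_poll(). The point is the ordering: re-enable the interrupt first, then look at this queue's ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_USED_BIT  0x1u /* hypothetical "frame received" bit set by the NIC */
#define RX_RING_SIZE 8u

struct rx_desc {
	uint32_t addr_flags;
};

struct rx_queue {
	struct rx_desc ring[RX_RING_SIZE];
	unsigned int tail; /* next descriptor the driver will inspect */
};

/* Per-queue check: look only at this queue's next descriptor, not at a
 * device-global receive status register. */
static bool rx_pending(const struct rx_queue *q)
{
	return (q->ring[q->tail % RX_RING_SIZE].addr_flags & RX_USED_BIT) != 0;
}

/* End of a poll pass; returns true if polling must be rescheduled. The
 * ring is checked after the RX interrupt is re-enabled, so a frame that
 * arrived while interrupts were off is not left stranded in the ring. */
static bool poll_should_reschedule(struct rx_queue *q, int work_done,
				   int budget, bool *irq_enabled)
{
	if (work_done >= budget)
		return true;          /* budget used up: keep polling */
	*irq_enabled = true;          /* stands in for completing NAPI and unmasking the IRQ */
	if (rx_pending(q)) {
		*irq_enabled = false; /* mask again and poll once more */
		return true;
	}
	return false;
}
```

In the driver, a true result after unmasking roughly corresponds to masking the queue's RX interrupt again and calling napi_schedule().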
2022-05-11 | net: dsa: ocelot: accept 1000base-X for VSC9959 and VSC9953 | Vladimir Oltean | 4 | -1/+7
Switches using the Lynx PCS driver support 1000base-X optical SFP modules. Accept this interface type on a port. Signed-off-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-11 | i40e: i40e_main: fix a missing check on list iterator | Xiaomeng Tong | 1 | -13/+14
The bug is here: ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err); The list iterator 'ch' will point to a bogus position containing HEAD if the list is empty or no element is found. This case must be checked before any use of the iterator; otherwise it will lead to an invalid memory access. To fix this bug, use a new variable 'iter' as the list iterator, and use the original variable 'ch' as a dedicated pointer to the found element. Cc: [email protected] Fixes: 1d8d80b4e4ff6 ("i40e: Add macvlan support on i40e") Signed-off-by: Xiaomeng Tong <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
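A userspace sketch of the fix pattern, assuming a minimal hand-rolled intrusive list in place of the kernel's list helpers. struct list_node, struct channel, list_node_add_tail() and find_channel() are hypothetical. The loop walks with its own 'iter' and only assigns the result pointer on a match, so a miss yields NULL.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct channel {
	int seid;
	struct list_node list;
};

static void list_node_init(struct list_node *h) { h->next = h->prev = h; }

static void list_node_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static struct channel *find_channel(struct list_node *head, int seid)
{
	struct channel *ch = NULL; /* result: stays NULL on a miss */
	struct list_node *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		/* 'iter' is only ever derived from a real element */
		struct channel *iter = container_of(pos, struct channel, list);

		if (iter->seid == seid) {
			ch = iter;
			break;
		}
	}
	return ch;
}
```

With list_for_each_entry(), the cursor left behind after a miss is computed from the list head itself, which is the bogus pointer the commit describes; keeping the cursor separate from the result avoids ever using it.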
2022-05-11 | Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | Jakub Kicinski | 6 | -74/+4
Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2022-05-10 This series contains updates to the igc driver only. Sasha cleans up the code by removing an unused function and removing an enum for PHY type, as there is only one PHY. The return type of igc_check_downshift() is changed to void, as it always returns success. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igc: Change type of the 'igc_check_downshift' method
igc: Remove unused phy_type enum
igc: Remove igc_set_spd_dplx method
==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-11 | eth: amd: remove NI6510 support (ni65) | Jakub Kicinski | 5 | -1386/+0
It looks like all the changes to this driver since the git era began have been tree-wide refactoring. The driver uses virt_to_bus(); we should convert it to the more modern DMA APIs, but since it is unlikely to be getting any use these days, delete it instead. We can always revert this to bring it back. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-11 | net: appletalk: remove Apple/Farallon LocalTalk PC support | Jakub Kicinski | 4 | -1363/+0
It looks like all the changes to this driver since the git era began have been tree-wide refactoring. The driver uses virt_to_bus(); we should convert it to the more modern DMA APIs, but since it is unlikely to be getting any use these days, delete it instead. We can always revert this to bring it back. Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-11 | s390/lcs: fix variable dereferenced before check | Alexandra Winter | 1 | -3/+4
smatch complains about drivers/s390/net/lcs.c:1741 lcs_get_control() warn: variable dereferenced before check 'card->dev' (see line 1739) Fixes: 27eb5ac8f015 ("[PATCH] s390: lcs driver bug fixes and improvements [1/2]") Signed-off-by: Alexandra Winter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-11 | s390/ctcm: fix potential memory leak | Alexandra Winter | 1 | -5/+1
smatch complains about drivers/s390/net/ctcm_mpc.c:1210 ctcmpc_unpack_skb() warn: possible memory leak of 'mpcginfo' mpc_action_discontact() did not free mpcginfo. Consolidate the freeing in ctcmpc_unpack_skb(). Fixes: 293d984f0e36 ("ctcm: infrastructure for replaced ctc driver") Signed-off-by: Alexandra Winter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
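The ownership rule behind this fix (one function allocates and frees; handlers only borrow) can be sketched in userspace C. struct info, handle_discontact(), handle_other() and unpack_one() are hypothetical stand-ins for mpcginfo, mpc_action_discontact(), the other handlers and ctcmpc_unpack_skb(); the 'frees' counter exists only so the rule can be checked.

```c
#include <assert.h>
#include <stdlib.h>

struct info {
	int kind;
};

static int frees; /* counts frees so "exactly once per path" can be checked */

static void info_free(struct info *p)
{
	free(p);
	frees++;
}

/* Handlers borrow 'info' and never free it. */
static void handle_discontact(struct info *info) { (void)info; }
static void handle_other(struct info *info) { (void)info; }

/* The caller allocates, dispatches, and frees exactly once on every path,
 * so no individual handler can forget to release the buffer. */
static int unpack_one(int kind)
{
	struct info *info = malloc(sizeof(*info));

	if (!info)
		return -1;
	info->kind = kind;
	if (kind == 1)
		handle_discontact(info);
	else
		handle_other(info);
	info_free(info);
	return 0;
}
```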
2022-05-11 | s390/ctcm: fix variable dereferenced before check | Alexandra Winter | 1 | -2/+3
Found by cppcheck and smatch. smatch complains about drivers/s390/net/ctcm_sysfs.c:43 ctcm_buffer_write() warn: variable dereferenced before check 'priv' (see line 42) Fixes: 3c09e2647b5e ("ctcm: rename READ/WRITE defines to avoid redefinitions") Reported-by: Colin Ian King <[email protected]> Signed-off-by: Alexandra Winter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
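A minimal sketch of the check-then-dereference order that smatch asks for. The struct dev and struct priv types and the buffer_write() helper are hypothetical and much simpler than the real ctcm_buffer_write().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct priv {
	int buffer_size;
};

struct dev {
	struct priv *priv;
};

/* Buggy shape flagged by smatch: the field is read before the NULL test,
 * so the test comes too late to protect anything:
 *
 *	int cur = priv->buffer_size;
 *	if (!priv)
 *		return -ENODEV;
 *
 * Fixed shape below: test the pointer first, then dereference it. */
static int buffer_write(struct dev *d, int new_size)
{
	struct priv *priv = d->priv;

	if (!priv)
		return -ENODEV;
	if (new_size <= 0)
		return -EINVAL;
	priv->buffer_size = new_size;
	return 0;
}
```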
2022-05-11 | net: atlantic: verify hw_head_ lies within TX buffer ring | Grant Grundler | 1 | -0/+7
Bounds-check the hw_head index provided by the NIC to verify that it lies within the TX buffer ring. Reported-by: Aashay Shringarpure <[email protected]> Reported-by: Yi Chou <[email protected]> Reported-by: Shervin Oloumi <[email protected]> Signed-off-by: Grant Grundler <[email protected]> Signed-off-by: David S. Miller <[email protected]>
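A sketch of validating a device-reported ring index before using it, assuming a hypothetical struct tx_ring that tracks only its size and the driver's own clean position. Returning -1 for a bad index is an illustrative convention, not the atlantic driver's.

```c
#include <assert.h>

struct tx_ring {
	unsigned int size;    /* number of descriptors */
	unsigned int sw_head; /* next descriptor the driver will clean (< size) */
};

/* Returns how many descriptors the NIC has completed, or -1 if the
 * NIC-reported head lies outside the ring and must not be used as an index. */
static int tx_descs_to_clean(const struct tx_ring *r, unsigned int hw_head)
{
	if (hw_head >= r->size)
		return -1; /* untrusted value from the device: reject it */
	/* Distance from sw_head forward to hw_head, wrapping around the ring. */
	return (int)((hw_head + r->size - r->sw_head) % r->size);
}
```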
2022-05-11 | net: atlantic: add check for MAX_SKB_FRAGS | Grant Grundler | 1 | -1/+5
Check the fragment count against MAX_SKB_FRAGS to ensure that the CPU cannot get stuck in an infinite loop. Reported-by: Aashay Shringarpure <[email protected]> Reported-by: Yi Chou <[email protected]> Reported-by: Shervin Oloumi <[email protected]> Signed-off-by: Grant Grundler <[email protected]> Signed-off-by: David S. Miller <[email protected]>
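A sketch of bounding a descriptor-chain walk, assuming a hypothetical struct rx_buff with an end-of-packet flag and a 'next' index. MAX_FRAGS is a stand-in for MAX_SKB_FRAGS, whose real value depends on the kernel build; 17 is used here only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAGS 17 /* illustrative stand-in for MAX_SKB_FRAGS */

struct rx_buff {
	bool eop;          /* last buffer of the packet */
	unsigned int next; /* index of the following buffer in the ring */
};

/* Count the buffers in one packet. A device that never sets eop, or whose
 * 'next' links form a cycle, would otherwise keep this loop running forever,
 * so the walk gives up once more fragments than an skb can hold are seen. */
static int count_frags(const struct rx_buff *ring, unsigned int start)
{
	unsigned int idx = start;
	int frags = 0;

	for (;;) {
		if (++frags > MAX_FRAGS)
			return -1;
		if (ring[idx].eop)
			return frags;
		idx = ring[idx].next;
	}
}
```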
2022-05-11 | net: atlantic: reduce scope of is_rsc_complete | Grant Grundler | 1 | -7/+6
Don't defer handling the err case outside the loop; that's pointless. And since is_rsc_complete is only used inside this loop, declare it inside the loop to reduce its scope. Signed-off-by: Grant Grundler <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-11 | net: atlantic: fix "frag[0] not initialized" | Grant Grundler | 1 | -2/+1
In aq_ring_rx_clean(), if buff->is_eop is not set AND buff->len < AQ_CFG_RX_HDR_SIZE, then hdr_len remains equal to buff->len and skb_add_rx_frag(xxx, *0*, ...) is not called. The loop that follows calls skb_add_rx_frag() starting with i=1, so frag[0] is never initialized. Since i is initialized to zero at the top of the primary loop, we can just reference and post-increment i instead of hardcoding the 0 when calling skb_add_rx_frag() the first time. Reported-by: Aashay Shringarpure <[email protected]> Reported-by: Yi Chou <[email protected]> Reported-by: Shervin Oloumi <[email protected]> Signed-off-by: Grant Grundler <[email protected]> Signed-off-by: David S. Miller <[email protected]>
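The indexing fix can be sketched in userspace C. struct skb_sketch and add_frag() are hypothetical stand-ins for the skb and skb_add_rx_frag(). When the first buffer fits entirely in the copied header, it contributes no fragment, and the follow-on buffers must then start at slot 0 rather than slot 1.

```c
#include <assert.h>

#define NFRAGS 4

struct skb_sketch {
	int frag_len[NFRAGS];
	int nr_frags;
};

/* Stand-in for skb_add_rx_frag(): record fragment i. */
static void add_frag(struct skb_sketch *skb, int i, int len)
{
	skb->frag_len[i] = len;
	if (i + 1 > skb->nr_frags)
		skb->nr_frags = i + 1;
}

/* Attach the payload left in the first buffer after the header copy (if
 * any), then the follow-on buffers. Using i++ for the first call means the
 * loop continues at whatever slot is actually next, so slot 0 is never
 * skipped. Assumes the total fits in NFRAGS slots. */
static void build_frags(struct skb_sketch *skb, int first_len, int hdr_len,
			const int *rest, int nrest)
{
	int i = 0;

	if (first_len > hdr_len)
		add_frag(skb, i++, first_len - hdr_len);
	for (int j = 0; j < nrest; j++)
		add_frag(skb, i++, rest[j]);
}
```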
2022-05-11 | Merge tag 'mlx5-updates-2022-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller | 21 | -301/+716
Saeed Mahameed says: ==================== mlx5-updates-2022-05-09 1) Gavin Li adds an exit route from waiting for FW init on device boot and increases the FW init timeout in the health recovery flow. 2) Support LAG mode on 4-port HCAs. Mark Bloch says: ================ This series adds support for 4-port HCAs to the mlx5 drivers. Starting with ConnectX-7, HCAs with 4 ports are possible. As most driver parts aren't affected by such a configuration, most driver code is unchanged. Specifically, the only affected areas are:
- Lag
- Devcom
- Merged E-Switch
- Single FDB E-Switch
Lag was chosen to be converted first: hardware LAG is created when all 4 ports are added to the same bond device. Devcom, Merged E-Switch and Single FDB E-Switch are marked as supporting only 2-port HCAs, and future patches will add support for 4-port HCAs. To activate hardware lag, a user can run:
ip link add bond0 type bond
ip link set bond0 type bond miimon 100 mode 2
ip link set eth2 master bond0
ip link set eth3 master bond0
ip link set eth4 master bond0
ip link set eth5 master bond0
where eth2, eth3, eth4 and eth5 are the PFs of the same HCA. ================ ==================== Signed-off-by: David S. Miller <[email protected]>
2022-05-10 | net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe() | Yang Yingliang | 1 | -3/+1
Switch to using pcim_enable_device() to avoid missing pci_disable_device(). Reported-by: Hulk Robot <[email protected]> Signed-off-by: Yang Yingliang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | net: phy: smsc: add comments for the LAN8742 phy ID mask. | Yuiko Oshino | 1 | -1/+5
Add comments for the LAN8742 PHY ID mask introduced in the previous patch. Add one missing tab in the LAN8742 PHY ID line. Signed-off-by: Yuiko Oshino <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | net: phy: microchip: add comments for the modified LAN88xx phy ID mask. | Yuiko Oshino | 1 | -0/+4
Add comments for the LAN88xx PHY ID mask updated in the previous patch. Signed-off-by: Yuiko Oshino <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc: Add a basic Siena module | Martin Habets | 5 | -3/+28
Make the (un)load message more specific to differentiate it from the sfc.ko messages. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc/siena: Inline functions in sriov.h to avoid conflicts with sfc | Martin Habets | 2 | -77/+63
The implementation of each is quite short. This means sriov.c is not needed any more. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc/siena: Rename functions in nic_common.h to avoid conflicts with sfc | Martin Habets | 14 | -138/+95
For siena use efx_siena_ as the function prefix. efx_nic_update_stats_atomic is only used in efx_common.c, so move it there. efx_nic_copy_stats is not used in Siena, so it is removed. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc/siena: Rename functions in mcdi headers to avoid conflicts with sfc | Martin Habets | 17 | -609/+459
For siena use efx_siena_ as the function prefix. Several functions are not used in Siena, so they are removed. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc/siena: Rename peripheral functions to avoid conflicts with sfc | Martin Habets | 15 | -270/+270
For siena use efx_siena_ as the function prefix. This patch covers selftest.h, ptp.h, net_driver.h and ethtool_common.h. efx_ethtool_fill_self_tests() can become static. Some functions in ptp.c can also become static. Rename loopback_mode in net_driver.h. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc/siena: Rename RX/TX functions to avoid conflicts with sfc | Martin Habets | 13 | -233/+216
For siena use efx_siena_ as the function prefix. Several functions are not used in Siena, so they are removed. Use a Siena specific variable name for module parameter efx_separate_tx_channels. Move efx_fini_tx_queue() to avoid a forward declaration of efx_dequeue_buffer(). Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc/siena: Rename functions in efx headers to avoid conflicts with sfc | Martin Habets | 23 | -472/+427
When building with allyesconfig there are many identical symbol names. For siena use efx_siena_ as the function and variable prefix to avoid build errors. efx_mtd_remove_partition can become static as it is no longer called from other files. efx_ticks_to_usecs and efx_xmit_done_single are not used in Siena, so they are removed. Several functions are only used inside efx_channels.c for Siena so they can become static. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc/siena: Remove build references to missing functionality | Martin Habets | 10 | -458/+17
Functionality not supported or needed on Siena includes: - Anything for EF100 - EF10 specifics such as register access, PIO and TSO offload. Also only bind to Siena NICs. Remove EF10 specifics from nic.h. The functions that start with efx_farch_ will be removed from sfc.ko with a subsequent patch. Add the efx_ prefix to siena_prepare_flush() to make it consistent with the other APIs. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc: Copy shared files needed for Siena (part 2) | Martin Habets | 27 | -0/+14153
These are the files starting with m through w. No changes are made; those will be done in subsequent commits. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc: Copy shared files needed for Siena (part 1) | Martin Habets | 14 | -0/+10524
These are the files starting with b through i. No changes are made; those will be done in subsequent commits. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | sfc: Move Siena specific files | Martin Habets | 4 | -0/+0
Files are only moved, no changes are made. Signed-off-by: Martin Habets <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | net: phy: micrel: Fix incorrect variable type in micrel | Wan Jiabing | 1 | -3/+2
In lanphy_read_page_reg, calling __phy_read() might return a negative error code. Use 'int' to check the error code. Fixes: 7c2dcfa295b1 ("net: phy: micrel: Add support for LAN8804 PHY") Signed-off-by: Wan Jiabing <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
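A userspace sketch of why the variable type matters. fake_phy_read() is a hypothetical stand-in for __phy_read(), which returns either a register value or a negative errno. Storing that result in a u16 silently turns the error into a large positive value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for __phy_read(): a register value or -errno. */
static int fake_phy_read(int fail)
{
	return fail ? -EIO : 0x1234;
}

/* Buggy shape: -EIO wraps to a large positive u16 value, so the error is
 * indistinguishable from a register value and a 'ret < 0' test never fires. */
static int read_page_reg_u16(int fail)
{
	uint16_t val = (uint16_t)fake_phy_read(fail);

	return val;
}

/* Fixed shape: 'int' keeps the sign of -errno so it can be propagated. */
static int read_page_reg_fixed(int fail)
{
	int ret = fake_phy_read(fail);

	if (ret < 0)
		return ret;
	return ret & 0xffff;
}
```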
2022-05-10 | nfp: flower: fix 'variable 'flow6' set but not used' | Louis Peens | 1 | -12/+7
Kernel test robot reported an unused variable, introduced by a recent patch, when CONFIG_IPV6 is disabled. Move the variable declaration inside the #ifdef, and do a bit more cleanup: there is no need for a temporary ipv6 bool value, since it is checked only once, so remove the extra variable and do the check directly. Fixes: 9d5447ed44b5 ("nfp: flower: fixup ipv6/ipv4 route lookup for neigh events") Reported-by: kernel test robot <[email protected]> Signed-off-by: Louis Peens <[email protected]> Signed-off-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-10 | igc: Change type of the 'igc_check_downshift' method | Sasha Neftin | 2 | -6/+2
The 'igc_check_downshift' method always returns 0; there is no need for a return value so change the type of this method to void. Signed-off-by: Sasha Neftin <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-10 | igc: Remove unused phy_type enum | Sasha Neftin | 3 | -18/+3
This completes commit 8e153faf5827 ("igc: Remove unused phy type"): i225 parts have only one PHY, so there is no point in using the phy_type enum. Clean up the code accordingly, and get rid of the unused enum lines. Signed-off-by: Sasha Neftin <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-10 | igc: Remove igc_set_spd_dplx method | Sasha Neftin | 2 | -51/+0
The igc_set_spd_dplx method is not used. Remove it to tidy up the driver code. Reported-by: Muhammad Husaini Zulkifli <[email protected]> Signed-off-by: Sasha Neftin <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-05-10 | net: ethernet: Add driver for Sunplus SP7021 | Wells Lu | 17 | -0/+2042
Add driver for Sunplus SP7021 SoC. Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Wells Lu <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-05-10 | net: atlantic: always deep reset on pm op, fixing up my null deref regression | Manuel Ullmann | 1 | -2/+2
The impact of this regression on resume is the same as what I saw on thaw: the kernel hangs and nothing except SysRq rebooting can be done. Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs"), where I disabled deep pm resets in suspend and resume, trying to make sense of the atl_resume_common() deep parameter in the first place. It turns out that atlantic always has to deep reset on pm operations. Even though I expected that and tested resume, I screwed up by kexec-rebooting into an unpatched kernel, thus missing the breakage. This fixup obsoletes the deep parameter of atl_resume_common, but I leave the cleanup for the maintainers to post to mainline. Suspend and hibernation were successfully tested by the reporters. Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs") Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/ Reported-by: Jordan Leppert <[email protected]> Reported-by: Holger Hoffstaette <[email protected]> Tested-by: Jordan Leppert <[email protected]> Tested-by: Holger Hoffstaette <[email protected]> CC: <[email protected]> # 5.10+ Signed-off-by: Manuel Ullmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-05-10 | tsnep: Add free running cycle counter support | Gerhard Engleder | 3 | -7/+63
The TSN endpoint Ethernet MAC supports a free-running counter in addition to its clock. This free-running counter can be read, and hardware timestamps based on it are supported. As the name implies, this counter cannot be set and its frequency cannot be adjusted. Add free-running cycle counter support to the physical clock, based on this free-running counter. This also requires hardware timestamps based on that free-running counter. Signed-off-by: Gerhard Engleder <[email protected]> Acked-by: Jonathan Lemon <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-05-10 | ptp: Speed up vclock lookup | Gerhard Engleder | 2 | -19/+48
ptp_convert_timestamp() is called in the RX path of network messages. The current implementation takes ~5000ns on a 1.2GHz A53. This is too much for the hot path of packet processing. Introduce a hash table for fast vclock lookup in ptp_convert_timestamp(). The execution time of ptp_convert_timestamp() is reduced to ~700ns on a 1.2GHz A53. Signed-off-by: Gerhard Engleder <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
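A sketch of a small chained hash table keyed by clock index, illustrating the kind of lookup this commit adds. struct vclock, vclock_lookup() and convert_timestamp() are hypothetical; a plain offset stands in for the vclock's timecounter conversion, and the in-kernel version needs RCU or locking, which this single-threaded sketch omits.

```c
#include <assert.h>
#include <stddef.h>

#define VCLOCK_HASH_BITS 3
#define VCLOCK_HASH_SIZE (1u << VCLOCK_HASH_BITS)

struct vclock {
	int index;           /* PHC index used as the key */
	long long offset_ns; /* stand-in for the vclock's time conversion */
	struct vclock *next; /* bucket chain */
};

static struct vclock *vclock_hash[VCLOCK_HASH_SIZE];

static unsigned int vclock_bucket(int index)
{
	return (unsigned int)index & (VCLOCK_HASH_SIZE - 1);
}

static void vclock_hash_add(struct vclock *vc)
{
	unsigned int b = vclock_bucket(vc->index);

	vc->next = vclock_hash[b];
	vclock_hash[b] = vc;
}

/* Average O(1) lookup instead of searching through every clock device. */
static struct vclock *vclock_lookup(int index)
{
	for (struct vclock *vc = vclock_hash[vclock_bucket(index)]; vc; vc = vc->next)
		if (vc->index == index)
			return vc;
	return NULL;
}

/* Convert a hardware timestamp into the selected vclock's time base. */
static int convert_timestamp(long long *hwtstamp, int vclock_index)
{
	struct vclock *vc = vclock_lookup(vclock_index);

	if (!vc)
		return -1;
	*hwtstamp += vc->offset_ns;
	return 0;
}
```

Indices 2 and 10 land in the same bucket here, which exercises the chain walk.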
2022-05-10 | ptp: Pass hwtstamp to ptp_convert_timestamp() | Gerhard Engleder | 1 | -3/+2
ptp_convert_timestamp() converts only the timestamp hwtstamp, which is a field of its argument of type struct skb_shared_hwtstamps *, so a pointer to the hwtstamp field of this structure is sufficient. Rework ptp_convert_timestamp() to take an argument of type ktime_t *. This allows additional timestamp manipulation stages to be added before the call to ptp_convert_timestamp(). Signed-off-by: Gerhard Engleder <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-05-10 | ptp: Add cycles support for virtual clocks | Gerhard Engleder | 4 | -16/+49
ptp vclocks require a free-running time for their timecounter. Currently only a physical clock forced to free running is supported. If vclocks are used, the physical clock can no longer be synchronized, so the synchronized time is not available in hardware. As a result, timed transmission with TAPRIO hardware support is no longer possible. If the hardware supports a free-running time in addition to the physical clock, then the physical clock does not need to be forced to free running, and it can still be synchronized while vclocks are in use. The physical clock could then be used to synchronize the time domain of the TSN network and trigger TAPRIO, while in parallel vclocks are used to synchronize other time domains. Introduce support for a free-running cycle counter, called cycles, to physical clocks. Rework ptp vclocks to use this free-running cycle counter. The default implementation is based on the time of the physical clock, so the behavior of ptp vclocks based on physical clocks without a free-running cycle counter is identical to the previous behavior. Signed-off-by: Gerhard Engleder <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
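A sketch of a timecounter driven by a free-running cycle counter, loosely modeled on the read/mask/mult/shift fields of the kernel's cyclecounter. struct cc_sketch, struct tc_sketch, read_fake() and the helpers are hypothetical. Because the time is derived only from cycle deltas, adjusting or synchronizing the physical clock elsewhere does not disturb it.

```c
#include <assert.h>
#include <stdint.h>

struct cc_sketch {
	uint64_t (*read)(void); /* reads the free-running counter */
	uint64_t mask;          /* counter width */
	uint32_t mult;          /* ns = (cycles * mult) >> shift */
	uint32_t shift;
};

struct tc_sketch {
	const struct cc_sketch *cc;
	uint64_t cycle_last; /* counter value at the last read */
	uint64_t nsec;       /* accumulated time in ns */
};

static uint64_t fake_cycles; /* hypothetical counter value for testing */
static uint64_t read_fake(void) { return fake_cycles; }

static void tc_init(struct tc_sketch *tc, const struct cc_sketch *cc,
		    uint64_t start_ns)
{
	tc->cc = cc;
	tc->cycle_last = cc->read();
	tc->nsec = start_ns;
}

/* Advance by the cycles elapsed since the last read. The counter is only
 * ever read, never set or rate-adjusted, matching the free-running model. */
static uint64_t tc_read(struct tc_sketch *tc)
{
	uint64_t now = tc->cc->read();
	uint64_t delta = (now - tc->cycle_last) & tc->cc->mask;

	tc->cycle_last = now;
	tc->nsec += (delta * tc->cc->mult) >> tc->cc->shift;
	return tc->nsec;
}
```

With mult = 8 and shift = 2, each cycle counts as 2 ns, so a 50-cycle step advances the time by 100 ns.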
2022-05-10 | eth: dpaa2-mac: remove a dead-code NULL check on fwnode parent | Jakub Kicinski | 1 | -3/+0
Since commit 4e30e98c4b4c ("dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set") @parent can't be NULL after the if. It's either the address of the ->fwnode of @dpmacs or @fwnode in case of ACPI. Signed-off-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-05-09 | net/mlx5: Lag, add debugfs to query hardware lag state | Mark Bloch | 4 | -4/+191
Lag state has become very complicated, with many modes, flags, types and port selection methods, and future work will add additional features. Add debugfs entries to query the current lag state. A new directory named "lag" will be created under the mlx5 debugfs directory. As the driver has a debugfs directory per PCI function, the location will be: <debugfs>/mlx5/<BDF>/lag For example: /sys/kernel/debug/mlx5/0000:08:00.0/lag The following files are exposed:
- state: Returns "active" or "disabled". If "active", hardware lag is active.
- members: Returns the BDFs of all the members of the lag object.
- type: Returns the type of the lag currently configured. Valid only if hardware lag is active.
  * "roce" - Members are bare metal PFs.
  * "switchdev" - Members are in switchdev mode.
  * "multipath" - ECMP offloads.
- port_sel_mode: Returns the egress port selection method. Valid only if hardware lag is active.
  * "queue_affinity" - Egress port is selected by the QP/SQ affinity.
  * "hash" - Egress port is selected by a hash computed on each packet, controlled by the xmit_hash_policy of the bond device.
- flags: Returns flags that are specific to each lag @type. Valid only if hardware lag is active.
  * "shared_fdb" - "on" or "off"; if "on", a single FDB is used.
- mapping: Returns the mapping used to select the egress port. Valid only if hardware lag is active. If @port_sel_mode is "hash", returns the active egress ports; the hash result will select only active ports. If @port_sel_mode is "queue_affinity", returns the mapping between the configured port affinity of the QP/SQ and the actual egress port. For example:
  * 1:1 - If the configured affinity is port 1, traffic will egress via port 1.
  * 1:2 - If the configured affinity is port 1, traffic will egress via port 2. This can happen if port 1 is down, or in active/backup mode when port 1 is the backup.
Signed-off-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-09 | net/mlx5: Lag, use buckets in hash mode | Mark Bloch | 4 | -76/+182
When in hardware lag on a NIC with more than 2 ports, the traffic must be distributed among the remaining active ports when one port goes down. For a better spread in such cases, instead of using a 1-to-1 mapping with only 4 slots in the hash, use many: each port will have many slots that point to it. When a port goes down, go over all the slots that pointed to that port and spread them among the remaining active ports. Once the port comes back, restore the default mapping. We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots, and each group of MLX5_LAG_MAX_HASH_BUCKETS slots belongs to a different port. The native mapping is:
port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are [1, 1, .., 1], which means that if a packet is hashed into one of these slots it will hit the wire via port 1.
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are [2, 2, .., 2], which means that if a packet is hashed into one of these slots it will hit the wire via port 2.
The mapping is the same for the rest of the ports. On a failover, let's say port 2 goes down (ports 1, 3 and 4 are still up), the new mapping for port 2 will be:
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are [1, 3, 1, 4, .., 4], which means the mapping was changed from the native mapping to a mapping that consists of only the active ports.
With this, if a port goes down, the traffic will be split randomly among the active ports. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
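A sketch of the bucketed mapping described above. NUM_PORTS, BUCKETS (a stand-in for MLX5_LAG_MAX_HASH_BUCKETS, value assumed), native_mapping() and failover_mapping() are hypothetical. The commit spreads a down port's slots randomly; this sketch swaps in round-robin over the active ports so the result is deterministic and easy to check.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS 4
#define BUCKETS   4 /* illustrative stand-in for MLX5_LAG_MAX_HASH_BUCKETS */

/* map[p * BUCKETS + b] holds the 1-based egress port for that hash slot. */
static void native_mapping(unsigned char *map)
{
	for (int p = 0; p < NUM_PORTS; p++)
		for (int b = 0; b < BUCKETS; b++)
			map[p * BUCKETS + b] = (unsigned char)(p + 1);
}

/* Active ports keep their native slots; each down port's slots are spread
 * round-robin across the active ports. */
static void failover_mapping(unsigned char *map, const bool *up)
{
	int active[NUM_PORTS], n = 0;

	for (int p = 0; p < NUM_PORTS; p++)
		if (up[p])
			active[n++] = p + 1;
	if (n == 0)
		return; /* nothing to redistribute to: keep the current mapping */

	int next = 0;
	for (int p = 0; p < NUM_PORTS; p++)
		for (int b = 0; b < BUCKETS; b++)
			map[p * BUCKETS + b] = up[p] ? (unsigned char)(p + 1)
						     : (unsigned char)active[next++ % n];
}
```

When the port comes back, calling native_mapping() again restores the default layout.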
2022-05-09 | net/mlx5: Lag, refactor dmesg print | Mark Bloch | 1 | -10/+12
Combine dmesg lag prints into a single function. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-09 | net/mlx5: Support devices with more than 2 ports | Mark Bloch | 2 | -2/+4
Increase the define MLX5_MAX_PORTS to 4 as the driver is ready to support NICs with 4 ports. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-09 | net/mlx5: Lag, use actual number of lag ports | Mark Bloch | 3 | -149/+216
Refactor the entire lag code to use ldev->ports instead of hard-coded defines (like MLX5_MAX_PORTS) for its operations. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-09 | net/mlx5: Lag, use hash when in roce lag on 4 ports | Mark Bloch | 1 | -9/+36
Downstream patches will add support for lag over 4 ports. In that mode we will only use hash as the uplink selection method. Using hash instead of queue affinity (used before this patch) offers key advantages such as:
- Aligning the port selection method with the method used by the bond device.
- Better packet distribution, where a single queue can transmit from multiple ports (with queue affinity, a queue is bound to a single port regardless of the packet being sent).
- In case of failover, the traffic is split between multiple ports and not sent to a single one as with queue affinity.
Going forward, it was decided that queue affinity will be deprecated, as using hash provides a better user experience; this means hash will always be used on 4-port HCAs. Future work will add hash support for 2-port HCAs as well. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-09 | net/mlx5: Lag, support single FDB only on 2 ports | Mark Bloch | 1 | -0/+4
E-Switch currently doesn't support more than 2 E-Switch managers being aggregated under a single hardware lag. Have specific checks to disallow creating lag when the code doesn't support it. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-09 | net/mlx5: Lag, store number of ports inside lag object | Mark Bloch | 2 | -0/+2
Store the number of lag ports inside the lag object. The lag object is a single shared object managing the lag state of multiple mlx5 devices on the same physical HCA. Downstream patches will allow hardware lag to be created over devices with more than 2 ports. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-09 | net/mlx5: Lag, filter non compatible devices | Mark Bloch | 3 | -14/+47
When searching for a peer lag device, we can filter based on that device's capabilities. A downstream patch will be less strict when filtering compatible devices: it removes the limitation that requires exactly MLX5_MAX_PORTS and changes it to a range. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2022-05-09 | net/mlx5: Lag, use lag lock | Mark Bloch | 4 | -65/+35
Use a lag-specific lock instead of depending on external locks to synchronise lag creation/destruction. With this, taking the E-Switch mode lock is no longer needed for syncing lag logic. Clean up any leftover dead code and don't export functions that aren't used outside the E-Switch core code. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>