aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2021-02-09net: hns3: change hclge_query_bd_num() param typePeng Li1-2/+2
The type of parameter mpf_bd_num and pf_bd_num in hclge_query_bd_num() should be u32* instead of int*, so change them. Signed-off-by: Peng Li <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: hns3: change hclge_parse_speed() param typePeng Li1-1/+1
The type of parameters in hclge_parse_speed() should be unsigned type, so change them. Signed-off-by: Peng Li <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: hns3: modify some unmacthed types print parameterJiaran Zhang6-9/+9
Fix an issue where the formatting symbol of the formatting input and output function does not match the actual type. Signed-off-by: Jiaran Zhang <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: hns3: clean up unnecessary parentheses in macro definitionsYufeng Mo8-18/+18
In macro definitions, parentheses are unnecessary in some cases, such as the calling parameter of a function, the left variable of the equal sign, and so on. So remove these unnecessary parentheses according to these rules. Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: hns3: remove the shaper param magic numberPeng Li1-6/+8
To make the code more readable, this patch adds a definition for the magic number 126 used for the default shaper param ir_b, and rename macro DIVISOR_IR_B_126. No functional change. Signed-off-by: Peng Li <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: hns3: remove redundant client_setup_tc handleJian Shen3-43/+0
Since the real tx queue number and real rx queue number always be updated when netdev opens, it's redundant to call hclge_client_setup_tc to do the same thing. So remove it. Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: hns3: clean up some incorrect variable types in hclge_dbg_dump_tm_map()Yonglong Liu1-6/+5
queue_id, qset_id and other IDs are unsigned type, so modify the corresponding local variables' type in hclge_dbg_dump_tm_map() from signed to unsigned. kstrtouint() and the print format should be updated as well. Signed-off-by: Yonglong Liu <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: hns3: add a check for index in hclge_get_rss_key()Yufeng Mo1-0/+11
The index is received from vf, if use it directly, an out-of-bound issue may be caused, so add a check for this index before using it in hclge_get_rss_key(). Fixes: a638b1d8cc87 ("net: hns3: fix get VF RSS issue") Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()Yufeng Mo1-4/+14
The tqp_index is received from vf, if use it directly, an out-of-bound issue may be caused, so add a check for this tqp_index before using it in hclge_get_ring_chain_from_mbx(). Fixes: 84e095d64ed9 ("net: hns3: Change PF to add ring-vect binding & resetQ to mailbox") Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: hns3: add a check for queue_id in hclge_reset_vf_queue()Yufeng Mo1-0/+7
The queue_id is received from vf, if use it directly, an out-of-bound issue may be caused, so add a check for this queue_id before using it in hclge_reset_vf_queue(). Fixes: 1a426f8b40fc ("net: hns3: fix the VF queue reset flow error") Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: dsa: felix: implement port flushing on .phylink_mac_link_downVladimir Oltean2-0/+62
There are several issues which may be seen when the link goes down while forwarding traffic, all of which can be attributed to the fact that the port flushing procedure from the reference manual was not closely followed. With flow control enabled on both the ingress port and the egress port, it may happen when a link goes down that Ethernet packets are in flight. In flow control mode, frames are held back and not dropped. When there is enough traffic in flight (example: iperf3 TCP), then the ingress port might enter congestion and never exit that state. This is a problem, because it is the egress port's link that went down, and that has caused the inability of the ingress port to send packets to any other port. This is solved by flushing the egress port's queues when it goes down. There is also a problem when performing stream splitting for IEEE 802.1CB traffic (not yet upstream, but a sort of multicast, basically). There, if one port from the destination ports mask goes down, splitting the stream towards the other destinations will no longer be performed. This can be traced down to this line: ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); which should have been instead, as per the reference manual: ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, DEV_MAC_ENA_CFG); Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is the case, but apparently multicasting to several ports will cause issues if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set. I am not sure what the state of the Ocelot VSC7514 driver is, but probably not as bad as Felix/Seville, since VSC7514 uses phylib and has the following in ocelot_adjust_link: if (!phydev->link) return; therefore the port is not really put down when the link is lost, unlike the DSA drivers which use .phylink_mac_link_down for that. Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h which are not exported in include/soc/mscc/ and a bugfix patch should probably not move headers around. Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-09net: broadcom: bcm4908enet: add BCM4908 controller driverRafał Miłecki4-0/+781
BCM4908 SoCs family uses Ethernel controller that includes UniMAC but uses different DMA engine (than other controllers) and requires different programming. Signed-off-by: Rafał Miłecki <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-08i40e: Log error for oversized MTU on deviceEryk Rybak1-4/+7
When attempting to link XDP prog with MTU larger than supported, user is not informed why XDP linking fails. Adding proper error message: "MTU too large to enable XDP". Signed-off-by: Aleksandr Loktionov <[email protected]> Signed-off-by: Eryk Rybak <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08i40e: consolidate handling of XDP program actionsCristian Dumitrescu1-37/+61
Consolidate the actions performed on the packet based on the XDP program result into a separate function that is easier to read and maintain. Simplify the i40e_construct_skb_zc function, so that the input xdp buffer is always freed, regardless of whether the output skb is successfully created or not. Simplify the behavior of the i40e_clean_rx_irq_zc function, so that the current packet descriptor is dropped when function i40_construct_skb_zc returns an error as opposed to re-processing the same description on the next invocation. Signed-off-by: Cristian Dumitrescu <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08i40e: remove the redundant buffer info updatesCristian Dumitrescu1-19/+14
For performance reasons, remove the redundant buffer info updates (*bi = NULL). The buffers ready to be cleaned can easily be tracked based on the ring next-to-clean variable, which is consistently updated. Signed-off-by: Cristian Dumitrescu <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08i40e: remove unnecessary cleaned_count updatesCristian Dumitrescu1-3/+1
For performance reasons, remove the redundant updates of the cleaned_count variable, as its value can be computed based on the ring next-to-clean variable, which is consistently updated. Signed-off-by: Cristian Dumitrescu <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08i40e: remove unnecessary memory writes of the next to clean pointerCristian Dumitrescu1-19/+11
For performance reasons, avoid writing the ring next-to-clean pointer value back to memory on every update, as it is not really necessary. Instead, simply read it at initialization into a local copy, update the local copy as necessary and write the local copy back to memory after the last update. Signed-off-by: Cristian Dumitrescu <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08mlxsw: spectrum_router: Set offload_failed flagAmit Cohen1-0/+52
When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and route insertion fails, FIB abort is triggered. After aborting, set the appropriate hardware flag to make the kernel emit RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag. Signed-off-by: Amit Cohen <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-08IPv6: Add "offload failed" indication to routesAmit Cohen1-2/+2
After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. To avoid such cases, previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, this behavior is controlled by sysctl. With the above mentioned behavior, it is possible to know from user-space if the route was offloaded, but if the offload fails there is no indication to user-space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come. This patch adds an "offload_failed" indication to IPv6 routes, so that users will have better visibility into the offload process. 'struct fib6_info' is extended with new field that indicates if route offload failed. Note that the new field is added using unused bit and therefore there is no need to increase struct size. Signed-off-by: Amit Cohen <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-08IPv4: Add "offload failed" indication to routesAmit Cohen1-0/+2
After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. To avoid such cases, previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, this behavior is controlled by sysctl. With the above mentioned behavior, it is possible to know from user-space if the route was offloaded, but if the offload fails there is no indication to user-space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come. This patch adds an "offload_failed" indication to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias', and 'struct fib_rt_info' are extended with new field that indicates if route offload failed. Note that the new field is added using unused bit and therefore there is no need to increase structs size. Signed-off-by: Amit Cohen <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-08RDMA/mlx5: Cleanup the synchronize_srcu() from the ODP flowYishai Hadas1-0/+1
Cleanup the synchronize_srcu() from the ODP flow as it was found to be a very heavy time consumer as part of dereg_mr. For example de-registration of 10000 ODP MRs each with size of 2M hugepage took 19.6 sec comparing de-registration of same number of non ODP MRs that took 172 ms. The new locking scheme uses the wait_event() mechanism which follows the use count of the MR instead of using synchronize_srcu(). By that change, the time required for the above test took 95 ms which is even better than the non ODP flow. Once fully dropped the srcu usage, had to come with a lock to protect the XA access. As part of using the above mechanism we could also clean the num_deferred_work stuff and follow the use count instead. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-02-08ice: Improve MSI-X fallback logicTony Nguyen1-15/+25
Currently if the driver is unable to get all the MSI-X vectors it wants, it falls back to the minimum configuration which equates to a single Tx/Rx traffic queue pair. Instead of using the minimum configuration, if given more vectors than the minimum, utilize those vectors for additional traffic queues after accounting for other interrupts. Signed-off-by: Tony Nguyen <[email protected]> Tested-by: Tony Brelinski <[email protected]>
2021-02-08ice: Fix trivial error messageMitch Williams1-1/+1
This message indicates an error on close, not open. Signed-off-by: Mitch Williams <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08ice: remove unnecessary castsBruce Allan4-11/+11
Casting a void * rvalue in an assignment is unnecessary in C; remove the casts. Signed-off-by: Bruce Allan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08ice: Refactor DCB related variables out of the ice_port_info structChinh T Cao7-78/+83
Refactor the DCB related variables out of the ice_port_info_struct. The goal is to make the ice_port_info struct cleaner. Signed-off-by: Chinh T Cao <[email protected]> Co-developed-by: Dave Ertman <[email protected]> Signed-off-by: Dave Ertman <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08ice: fix writeback enable logicJesse Brandeburg2-35/+25
The writeback enable logic was incorrectly implemented (due to misunderstanding what the side effects of the implementation would be during polling). Fix this logic issue, while implementing a new feature allowing the user to control the writeback frequency using the knobs for controlling interrupt throttling that we already have. Basically if you leave adaptive interrupts enabled, the writeback frequency will be varied even if busy_polling or if napi-poll is in use. If the interrupt rates are set to a fixed value by ethtool -C and adaptive is off, the driver will allow the user-set interrupt rate to guide how frequently the hardware will complete descriptors to the driver. Effectively the user will get a control over the hardware efficiency, allowing the choice between immediate interrupts or delayed up to a maximum of the interrupt rate, even when interrupts are disabled during polling. Signed-off-by: Jesse Brandeburg <[email protected]> Co-developed-by: Brett Creeley <[email protected]> Signed-off-by: Brett Creeley <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08ice: Use PSM clock frequency to calculate RL profilesBen Shelton5-9/+63
The core clock frequency is currently hardcoded at 446 MHz for the RL profile calculations. This causes issues since not all devices use that clock frequency. Read the GLGEN_CLKSTAT_SRC register to determine which PSM clock frequency is selected. This ensures that the rate limiter profile calculations will be correct. Signed-off-by: Ben Shelton <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08ice: create scheduler aggregator node config and move VSIsKiran Patil9-20/+1207
Create set scheduler aggregator node and move for VSIs into respective scheduler node. Max children per aggregator node is 64. There are two types of aggregator node(s) created. 1. dedicated node for PF and _CTRL VSIs 2. dedicated node(s) for VFs. As part of reset and rebuild, aggregator nodes are recreated and VSIs are moved to respective aggregator node. Having related VSIs in respective tree avoid starvation between PF and VF w.r.t Tx bandwidth. Co-developed-by: Tarun Singh <[email protected]> Signed-off-by: Tarun Singh <[email protected]> Co-developed-by: Victor Raj <[email protected]> Signed-off-by: Victor Raj <[email protected]> Co-developed-by: Anirudh Venkataramanan <[email protected]> Signed-off-by: Anirudh Venkataramanan <[email protected]> Signed-off-by: Kiran Patil <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08ice: Add initial support framework for LAGDave Ertman7-0/+570
Add the framework and initial implementation for receiving and processing netdev bonding events. This is only the software support and the implementation of the HW offload for bonding support will be coming at a later time. There are some architectural gaps that need to be closed before that happens. Because this is a software only solution that supports in kernel bonding, SR-IOV is not supported with this implementation. Signed-off-by: Dave Ertman <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08ice: Remove xsk_buff_pool from VSI structureMichal Swiatkowski3-79/+30
Current implementation of netdev already contains xsk_buff_pools. We no longer have to contain these structures in ice_vsi. Refactor the code to operate on netdev-provided xsk_buff_pools. Move scheduling napi on each queue to a separate function to simplify setup function. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08ice: implement new LLDP filter commandDave Ertman6-8/+82
There is an issue with some NVMs where an already existent LLDP filter is blocking the creation of a filter to allow LLDP packets to be redirected to the default VSI for the interface. This is blocking all LLDP functionality based in the kernel when the FW LLDP agent is disabled (e.g. software based DCBx). Implement the new AQ command to allow adding VSI destinations to existent filters on NVM versions that support the new command. The new lldp_fltr_ctrl AQ command supports Rx filters only, so the code flow for adding filters to disable Tx of control frames will remain intact. Signed-off-by: Dave Ertman <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08ice: log message when trusted VF goes in/out of promisc modeBrett Creeley1-13/+19
Currently there is no message printed on the host when a VF goes in and out of promiscuous mode. This is causing confusion because this is the expected behavior based on i40e. Fix this. Signed-off-by: Brett Creeley <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-02-08Merge tag 'mlx5-updates-2021-02-04' of ↵David S. Miller24-1057/+3783
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2021-02-04 Vlad Buslov says: ================= Implement support for VF tunneling Abstract Currently, mlx5 only supports configuration with tunnel endpoint IP address on uplink representor. Remove implicit and explicit assumptions of tunnel always being terminated on uplink and implement necessary infrastructure for configuring tunnels on VF representors and updating rules on such tunnels according to routing changes. SW TC model From TC perspective VF tunnel configuration requires two rules in both directions: TX rules 1. Rule that redirects packets from UL to VF rep that has the tunnel endpoint IP address: $ tc -s filter show dev enp8s0f0 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 16:c9:a0:2d:69:2c src_mac 0c:42:a1:58:ab:e4 eth_type ipv4 ip_flags nofrag in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen index 3 ref 1 bind 1 installed 377 sec used 0 sec Action statistics: Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 114096 bytes 952 pkt backlog 0b 0p requeues 0 cookie 878fa48d8c423fc08c3b6ca599b50a97 no_percpu used_hw_stats delayed 2. Rule that decapsulates the tunneled flow and redirects to destination VF representor: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed RX rules 1. Rule that encapsulates the tunneled flow and redirects packets from source VF rep to tunnel device: $ tc -s filter show dev enp8s0f0_1 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 0a:40:bd:30:89:99 src_mac ca:2e:a7:3f:f5:0f eth_type ipv4 ip_tos 0/0x3 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key set src_ip 7.7.7.5 dst_ip 7.7.7.1 key_id 98 dst_port 4789 nocsum ttl 64 pipe index 1 ref 1 bind 1 installed 411 sec used 411 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu used_hw_stats delayed action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen index 1 ref 1 bind 1 installed 411 sec used 0 sec Action statistics: Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 5615833 bytes 4028 pkt backlog 0b 0p requeues 0 cookie bb406d45d343bf7ade9690ae80c7cba4 no_percpu used_hw_stats delayed 2. Rule that redirects from tunnel device to UL rep: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed HW offloads model For hardware offload the goal is to mach packet on both rules without exposing it to software on tunnel endpoint VF. In order to achieve this for tx, TC implementation marks encap rules with tunnel endpoint on mlx5 VF of same eswitch with MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE flag and adds header modification rule to overwrite packet source port to the value of tunnel VF. Eswitch code is modified to recirculate such packets after source port value is changed, which allows second tx rules to match. For rx path indirect table infrastructure is used to allow fully processing VF tunnel traffic in hardware. To implement such pipeline driver needs to program the hardware after matching on UL rule to overwrite source vport from UL to tunnel VF and recirculate the packet to the root table to allow matching on the rule installed on tunnel VF. For this, indirect table matches all encapsulated traffic by tunnel parameters and all other IP traffic is sent to tunnel VF by the miss rule. Such configuration will cause packet to appear on VF representor instead of VF itself if packet has been matches by indirect table rule based on tunnel parameters but missed on second rule (after recirculation). Handle such case by marking packets processed by indirect table with special 0xFFF value in reg_c1 and extending slow table with additional flow group that matches on reg_c0 (source port value set by indirect tables) and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install one rule per VF vport to match on recirculated miss packets and redirect them to appropriate VF vport. Routing events In order to support routing changes and migration of tunnel device between different endpoint VFs, implement routing infrastructure and update it with FIB events. Routing entry table is introduced to mlx5 TC. Every rx and tx VF tunnel rule is attached to a routing entry, which is shared for rules of same tunnel. On FIB event the work is scheduled to delete/recreate all rules of affected tunnel. Note: only vxlan tunnel type is supported by this series. =================
2021-02-08cxgb4: remove unused vpd_cap_addrHeiner Kallweit2-3/+0
It is likely that this is a leftover from T3 driver heritage. cxgb4 uses the PCI core VPD access code that handles detection of VPD capabilities. Reviewed-by: Alexander Duyck <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-06Merge branch '100GbE' of ↵Jakub Kicinski12-242/+906
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-02-05 This series contains updates to ice driver only. Jake adds adds reporting of timeout length during devlink flash and implements support to report devlink info regarding the version of firmware that is stored (downloaded) to the device, but is not yet active. ice_devlink_info_get will report "stored" versions when there is no pending flash update. Version info includes the UNDI Option ROM, the Netlist module, and the fw.bundle_id. Gustavo A. R. Silva replaces a one-element array to flexible-array member. Bruce utilizes flex_array_size() helper and removes dead code on a check for a condition that can't occur. v2: * removed security revision implementation, and re-ordered patches to account for this removal * squashed patches implementing ice_read_flash_module to avoid patches refactoring the implementation of a previous patch in the series * modify ice_devlink_info_get to always report "stored" versions instead of only reporting them when a pending flash update is ready. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: remove dead code ice: use flex_array_size where possible ice: Replace one-element array with flexible-array member ice: display stored UNDI firmware version via devlink info ice: display stored netlist versions via devlink info ice: display some stored NVM versions via devlink info ice: introduce function for reading from flash modules ice: cache NVM module bank information ice: introduce context struct for info report ice: create flash_info structure and separate NVM version ice: report timeout length for erasing during devlink flash ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06r8169: don't try to disable interrupts if NAPI is scheduled alreadyHeiner Kallweit1-2/+4
There's no benefit in trying to disable interrupts if NAPI is scheduled already. This allows us to save a PCI write in this case. Signed-off-by: Heiner Kallweit <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: ena: Update XDP verdict upon failureShay Agroskin1-1/+5
The verdict returned from ena_xdp_execute() is used to determine the fate of the RX buffer's page. In case of XDP Redirect/TX error the verdict should be set to XDP_ABORTED, otherwise the page won't be freed. Fixes: a318c70ad152 ("net: ena: introduce XDP redirect implementation") Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: dsa: felix: propagate the LAG offload ops towards the ocelot libVladimir Oltean1-6/+0
The ocelot switch has been supporting LAG offload since its initial commit, however felix could not make use of that, due to lack of a LAG abstraction in DSA. Now that we have that, let's forward DSA's calls towards the ocelot library, who will deal with setting up the bonding. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: rebalance LAGs on link up/down eventsVladimir Oltean3-9/+63
At present there is an issue when ocelot is offloading a bonding interface, but one of the links of the physical ports goes down. Traffic keeps being hashed towards that destination, and of course gets dropped on egress. Monitor the netdev notifier events emitted by the bonding driver for changes in the physical state of lower interfaces, to determine which ports are active and which ones are no longer. Then extend ocelot_get_bond_mask to return either the configured bonding interfaces, or the active ones, depending on a boolean argument. The code that does rebalancing only needs to do so among the active ports, whereas the bridge forwarding mask and the logical port IDs still need to look at the permanently bonded ports. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: rename aggr_count to num_ports_in_lagVladimir Oltean1-4/+3
It makes it a bit easier to read and understand the code that deals with balancing the 16 aggregation codes among the ports in a certain LAG. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: drop the use of the "lags" arrayVladimir Oltean1-56/+39
We can now simplify the implementation by always using ocelot_get_bond_mask to look up the other ports that are offloading the same bonding interface as us. In ocelot_set_aggr_pgids, the code had a way to uniquely iterate through LAGs. We need to achieve the same behavior by marking each LAG as visited, which we do now by using a temporary 32-bit "visited" bitmask. This is ok and we do not need dynamic memory allocation, because we know that this switch architecture will not have more than 32 ports (the PGID port masks are 32-bit anyway). Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: set up logical port IDs centrallyVladimir Oltean1-19/+28
The setup of logical port IDs is done in two places: from the inconclusively named ocelot_setup_lag and from ocelot_port_lag_leave, a function that also calls ocelot_setup_lag (which apparently does an incomplete setup of the LAG). To improve this situation, we can rename ocelot_setup_lag into ocelot_setup_logical_port_ids, and drop the "lag" argument. It will now set up the logical port IDs of all switch ports, which may be just slightly more inefficient but more maintainable. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: avoid unneeded "lp" variable in LAG joinVladimir Oltean1-10/+6
The index of the LAG is equal to the logical port ID that all the physical port members have, which is further equal to the index of the first physical port that is a member of the LAG. The code gets a bit carried away with logic like this: if (a == b) c = a; else c = b; which can be simplified, of course, into: c = b; (with a being port, b being lp, c being lag) This further makes the "lp" variable redundant, since we can use "lag" everywhere where "lp" (logical port) was used. So instead of a "c = b" assignment, we can do a complete deletion of b. Only one comment here: if (bond_mask) { lp = __ffs(bond_mask); ocelot->lags[lp] = 0; } lp was clobbered before, because it was used as a temporary variable to hold the new smallest port ID from the bond. Now that we don't have "lp" any longer, we'll just avoid the temporary variable and zeroize the bonding mask directly. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Alexandre Belloni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: set up the bonding mask in a way that avoids a net_deviceVladimir Oltean1-7/+22
Since this code should be called from pure switchdev as well as from DSA, we must find a way to determine the bonding mask not by looking directly at the net_device lowers of the bonding interface, since those could have different private structures. We keep a pointer to the bonding upper interface, if present, in struct ocelot_port. Then the bonding mask becomes the bitwise OR of all ports that have the same bonding upper interface. This adds a duplication of functionality with the current "lags" array, but the duplication will be short-lived, since further patches will remove the latter completely. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Alexandre Belloni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: use ipv6 in the aggregation codeVladimir Oltean1-1/+4
IPv6 header information is not currently part of the entropy source for the 4-bit aggregation code used for LAG offload, even though it could be. The hardware reference manual says about these fields: ANA::AGGR_CFG.AC_IP6_TCPUDP_PORT_ENA Use IPv6 TCP/UDP port when calculating aggregation code. Configure identically for all ports. Recommended value is 1. ANA::AGGR_CFG.AC_IP6_FLOW_LBL_ENA Use IPv6 flow label when calculating AC. Configure identically for all ports. Recommended value is 1. Integration with the xmit_hash_policy of the bonding interface is TBD. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Alexandre Belloni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: don't refuse bonding interfaces we can't offloadVladimir Oltean3-28/+17
Since switchdev/DSA exposes network interfaces that fulfill many of the same user space expectations that dedicated NICs do, it makes sense to not deny bonding interfaces with a bonding policy that we cannot offload, but instead allow the bonding driver to select the egress interface in software. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Alexandre Belloni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: use a switch-case statement in ocelot_netdevice_eventVladimir Oltean1-23/+45
Make ocelot's net device event handler more streamlined by structuring it in a similar way with others. The inspiration here was dsa_slave_netdevice_event. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Alexandre Belloni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: mscc: ocelot: rename ocelot_netdevice_port_event to ↵Vladimir Oltean1-32/+27
ocelot_netdevice_changeupper ocelot_netdevice_port_event treats a single event, NETDEV_CHANGEUPPER. So we can remove the check for the type of event, and rename the function to be more suggestive, since there already is a function with a very similar name of ocelot_netdevice_event. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: hns3: replace macro of max qset number with specificationGuangbin Huang6-6/+12
The max qset number is a fixed value now and it is defined by a macro. In order to support other value in different kinds of device, it is better to use specification queried from firmware to replace macro. Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-02-06net: hns3: debugfs add max tm rate specification printGuangbin Huang1-0/+1
In order to add a method to check the specification of max tm rate for debugging, function hns3_dbg_dev_specs() adds this value print. Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>