path: root/drivers/net
Age | Commit message | Author | Files | Lines
2018-04-01mlxsw: spectrum: Don't use resource ID of 0Petr Machata1-1/+1
In commit 145307460ba9 ("devlink: Remove top_hierarchy arg to devlink_resource_register"), the "top_hierarchy" parameter to devlink_resource_register() was removed in favor of using the parameter "parent_resource_id" exclusively to determine who the parent is. The root node's resource ID for this purpose is DEVLINK_RESOURCE_ID_PARENT_TOP with the value 0. It is therefore problematic that the resource MLXSW_SP_RESOURCE_KVD has also ID of 0. Fix this by numbering driver-specific resources from 1. Fixes: 145307460ba9 ("devlink: Remove top_hierarchy arg to devlink_resource_register") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
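The collision described above can be illustrated with a small sketch. Only DEVLINK_RESOURCE_ID_PARENT_TOP (value 0) and the KVD resource name come from the commit message; the enum, its other members and the helper are hypothetical names used for illustration, not the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value stated in the commit message: the root of the resource tree is ID 0. */
#define DEVLINK_RESOURCE_ID_PARENT_TOP 0

/* Hypothetical driver resource IDs, numbered from 1 so that none of them
 * can be mistaken for the root. Only the KVD name comes from the commit. */
enum sketch_resource_id {
	SKETCH_RESOURCE_KVD = 1,
	SKETCH_RESOURCE_KVD_LINEAR,
	SKETCH_RESOURCE_KVD_HASH_SINGLE,
};

/* A resource is top-level when its parent ID equals the root ID. */
bool sketch_is_top_level(uint64_t parent_id)
{
	return parent_id == DEVLINK_RESOURCE_ID_PARENT_TOP;
}
```

With KVD numbered 0, a child registered with KVD as its parent would have been indistinguishable from a top-level resource; numbering from 1 removes that ambiguity.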
2018-04-01mlxsw: spectrum: Pass mlxsw_core as arg of mlxsw_sp_kvdl_resources_register()Jiri Pirko3-4/+4
Pass struct mlxsw_core instead of devlink since it is nicer within mlxsw code and we need both structs in mlxsw_sp_kvdl_resources_register() anyway. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01mlxsw: Move "resources_query_enable" out of mlxsw_config_profileJiri Pirko6-12/+8
As struct mlxsw_config_profile is mapped to the payload of the FW command of the same name, resources_query_enable flag does not belong there. Move it to struct mlxsw_driver. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01mlxsw: Move "used_kvd_sizes" check to mlxsw_pci_config_profileJiri Pirko3-6/+4
The check should be done directly in mlxsw_pci_config_profile, as for other profile items. Also, be consistent in naming with the rest and rename to "used_kvd_sizes". Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01mlxsw: core: Fix arg name of MLXSW_CORE_RES_VALID and MLXSW_CORE_RES_GETJiri Pirko1-4/+4
First arg of these helpers should be "mlxsw_core". Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01mlxsw: remove kvd_hash_granularity from config profile structJiri Pirko2-4/+2
This should not be part of the struct, as the struct fields are tightly coupled with the FW command payload of the same name. Just use the "granularity" define directly, as in other places. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01mlxsw: spectrum: Change KVD linear parts from list to arrayJiri Pirko1-143/+92
The parts info is an array. The parts copy this info array, yet they are a list. So make the indexing according to the id and change the list of parts into an array of parts. This helps to eliminate lookups and constructs like mlxsw_sp_kvdl_part_update() (took me some non-trivial time to figure out what is going on there). Alongside that, introduce a helper macro to define the parts infos. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01mlxsw: Constify devlink_resource_opsJiri Pirko2-4/+4
devlink_resource_ops should be const as the arg of register function is also const. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01mlxsw: spectrum_kvdl: Fix handling of resource_size_paramJiri Pirko1-33/+14
Current code uses global variables, adjusts them and passes pointer down to devlink. With every other mlxsw_core instance, the previously passed pointer values are rewritten. Fix this by de-globalizing the variables. Fixes: 7f47b19bd744 ("mlxsw: spectrum_kvdl: Add support for per part occupancy") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
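A minimal sketch of the problem class, assuming a simplified model: if the size parameters live in one global and every instance registers a pointer to it, the second instance overwrites what the first one registered. Keeping the parameters inside per-instance state avoids that. All struct and function names below are hypothetical and are not the mlxsw or devlink API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the size parameters handed to devlink. */
struct sketch_size_params {
	uint64_t size_min;
	uint64_t size_max;
};

/* Per-instance state: the params live inside the instance, not in a global. */
struct sketch_kvdl {
	struct sketch_size_params params;
};

/* Fill this instance's own copy and return the pointer that would be
 * registered; other instances are unaffected. */
const struct sketch_size_params *
sketch_kvdl_params_init(struct sketch_kvdl *kvdl, uint64_t min, uint64_t max)
{
	assert(min <= max);
	kvdl->params.size_min = min;
	kvdl->params.size_max = max;
	return &kvdl->params;
}
```

Two instances initialized this way hold distinct pointers and distinct values, which is the property the global-variable version lacked.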
2018-04-01mlxsw: spectrum_acl: Fix flex actions header ifndef define constructJiri Pirko1-2/+2
Fix copy&paste error in flex actions header ifndef define construct Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31cxgb4: LLD driver changes to support TLSAtul Gupta3-15/+131
Read the Inline TLS capability from firmware. Determine the area reserved for storing the keys. Dump the Inline TLS tx and rx records count. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Reviewed-by: Michael Werner <werner@chelsio.com> Reviewed-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31cxgb4: Inline TLS FW InterfaceAtul Gupta3-6/+283
Key area size in hw-config file. CPL struct for TLS request and response. Work request for Inline TLS. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Reviewed-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller14-115/+794
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-31 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add raw BPF tracepoint API in order to have a BPF program type that can access kernel internal arguments of the tracepoints in their raw form similar to kprobes based BPF programs. This infrastructure also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which returns an anon-inode backed fd for the tracepoint object that allows for automatic detach of the BPF program resp. unregistering of the tracepoint probe on fd release, from Alexei. 2) Add new BPF cgroup hooks at bind() and connect() entry in order to allow BPF programs to reject, inspect or modify user space passed struct sockaddr, and as well a hook at post bind time once the port has been allocated. They are used in FB's container management engine for implementing policy, replacing fragile LD_PRELOAD wrapper intercepting bind() and connect() calls that only works in limited scenarios like glibc based apps but not for other runtimes in containerized applications, from Andrey. 3) BPF_F_INGRESS flag support has been added to sockmap programs for their redirect helper call bringing it in line with cls_bpf based programs. Support is added for both variants of sockmap programs, meaning for tx ULP hooks as well as recv skb hooks, from John. 4) Various improvements on BPF side for the nfp driver, besides others this work adds BPF map update and delete helper call support from the datapath, JITing of 32 and 64 bit XADD instructions as well as offload support of bpf_get_prandom_u32() call. Initial implementation of nfp packet cache has been tackled that optimizes memory access (see merge commit for further details), from Jakub and Jiong. 5) Removal of struct bpf_verifier_env argument from the print_bpf_insn() API has been done in order to prepare to use print_bpf_insn() soon out of perf tool directly. 
This makes the print_bpf_insn() API more generic and pushes the env into private data. bpftool is adjusted as well with the print_bpf_insn() argument removal, from Jiri. 6) Couple of cleanups and prep work for the upcoming BTF (BPF Type Format). The latter will reuse the current BPF verifier log as well, thus bpf_verifier_log() is further generalized, from Martin. 7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read and write support has been added in similar fashion to existing IPv6 IPV6_TCLASS socket option we already have, from Nikita. 8) Fixes in recent sockmap scatterlist API usage, which did not use sg_init_table() for initialization thus triggering a BUG_ON() in scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and uses a small helper sg_init_marker() to properly handle the affected cases, from Prashant. 9) Let the BPF core follow IDR code convention and therefore use the idr_preload() and idr_preload_end() helpers, which would also help idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory pressure, from Shaohua. 10) Last but not least, a spelling fix in an error message for the BPF cookie UID helper under BPF sample code, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Add ULP calls to stop and restart IRQs.Michael Chan3-17/+90
When the driver needs to re-initialize the IRQ vectors, we make the new ulp_irq_stop() call to tell the RDMA driver to disable and free the IRQ vectors. After IRQ vectors have been re-initialized, we make the ulp_irq_restart() call to tell the RDMA driver that IRQs can be restarted. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Reserve completion rings and MSIX for bnxt_re RDMA driver.Michael Chan3-16/+65
Add additional logic to reserve completion rings for the bnxt_re driver when it requests MSIX vectors. The function bnxt_cp_rings_in_use() will return the total number of completion rings used by both drivers that need to be reserved. If the network interface is up, we will close and open the NIC to reserve the new set of completion rings and re-initialize the vectors. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Refactor bnxt_need_reserve_rings().Michael Chan1-32/+25
Refactor bnxt_need_reserve_rings() slightly so that __bnxt_reserve_rings() can call it and remove some duplicated code. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Add IRQ remapping logic.Michael Chan1-17/+42
Add remapping logic so that bnxt_en can use any arbitrary MSIX vectors. This will allow the driver to reserve one range of MSIX vectors to be used by both bnxt_en and bnxt_re. bnxt_en can now skip over the MSIX vectors used by bnxt_re. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
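The idea of skipping over the RDMA driver's vectors inside one reserved range can be sketched as follows. This is an illustration under assumptions, not the bnxt_en implementation: the layout struct, its field names and the mapping function are hypothetical.

```c
#include <assert.h>

/* Hypothetical layout: one reserved range of `total` MSIX vectors, of which
 * [ulp_base, ulp_base + ulp_count) belongs to the RDMA driver. */
struct sketch_msix_layout {
	int total;
	int ulp_base;
	int ulp_count;
};

/* Map the n-th network ring to a vector index, skipping the RDMA range.
 * Returns -1 when no vector is left in the reserved range. */
int sketch_ring_to_msix(const struct sketch_msix_layout *l, int ring)
{
	int vec = ring;

	assert(ring >= 0);
	if (vec >= l->ulp_base)
		vec += l->ulp_count;
	return vec < l->total ? vec : -1;
}
```

With 8 vectors and the RDMA driver holding vectors 2 to 4, network rings 0 and 1 keep vectors 0 and 1, while ring 2 is remapped to vector 5.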
2018-03-31bnxt_en: Change IRQ assignment for RDMA driver.Michael Chan3-3/+61
In the current code, the range of MSIX vectors allocated for the RDMA driver is disjoint from the network driver. This creates a problem for the new firmware ring reservation scheme. The new scheme requires the reserved completion rings/MSIX vectors to be in a contiguous range. Change the logic to allocate RDMA MSIX vectors to be contiguous with the vectors used by bnxt_en on new firmware using the new scheme. The new function bnxt_get_num_msix() calculates the exact number of vectors needed by both drivers. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Improve ring allocation logic.Michael Chan2-15/+21
Currently, the driver code makes some assumptions about the group index and the map index of rings. This makes the code more difficult to understand and less flexible. Improve it by adding the grp_idx and map_idx fields explicitly to the bnxt_ring_struct as a union. The grp_idx is initialized for each tx ring and rx agg ring during init time. We do the same for the map_idx for each cmpl ring. The grp_idx ties the tx ring to the ring group. The map_idx is the doorbell index of the ring. With this new infrastructure, we can change the ring index mapping scheme easily in the future. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Improve valid bit checking in firmware response message.Michael Chan2-5/+18
When firmware sends a DMA response to the driver, the last byte of the message will be set to 1 to indicate that the whole response is valid. The driver waits for the message to be valid before reading the message. The firmware spec allows these response messages to increase in length by adding new fields to the end of these messages. The older spec's valid location may become a new field in a newer spec. To guarantee compatibility, the driver should zero the valid byte before interpreting the entire message so that any new fields not implemented by the older spec will be read as zero. For messages that are forwarded to VFs, we need to set the length and re-instate the valid bit so the VF will see the valid response. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
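The valid-byte handling described above can be sketched in a few lines. The function names are hypothetical and the response is modelled as a plain byte buffer; the only facts taken from the commit message are that the last byte of the message is set to 1 when the response is valid, that the driver zeroes it before interpreting the message, and that it is re-instated before forwarding to a VF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check the valid marker in the last byte of a response of length `len`
 * (the length reported for this message), then clear it so that a reader
 * built against a longer layout sees zero there instead of the marker. */
bool sketch_resp_consume_valid(uint8_t *resp, size_t len)
{
	assert(len > 0);
	if (resp[len - 1] != 1)
		return false;	/* response is not complete yet */
	resp[len - 1] = 0;
	return true;
}

/* Before forwarding to a VF, put the marker back so the VF sees a valid
 * response. */
void sketch_resp_restore_valid(uint8_t *resp, size_t len)
{
	assert(len > 0);
	resp[len - 1] = 1;
}
```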
2018-03-31bnxt_en: Improve resource accounting for SRIOV.Michael Chan1-10/+8
When VFs are created, the current code subtracts the maximum VF resources from the PF's pool. This under-estimates the resources remaining in the PF pool. Instead, we should subtract the minimum VF resources. The VF minimum resources are guaranteed to the VFs and only these should be subtracted from the PF's pool. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
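The accounting change amounts to simple arithmetic, sketched here with a hypothetical helper (not the driver's code): only the guaranteed minimum per VF is removed from the PF pool.

```c
/* Resources left in the PF pool after guaranteeing `vf_min` to each of
 * `num_vfs` VFs. Clamped at zero instead of wrapping around. */
unsigned int sketch_pf_remaining(unsigned int pf_total, unsigned int num_vfs,
				 unsigned int vf_min)
{
	unsigned int reserved = num_vfs * vf_min;

	return reserved < pf_total ? pf_total - reserved : 0;
}
```

As an example with made-up numbers: a pool of 128 rings with 8 VFs that are each guaranteed 2 rings leaves 112 for the PF. Subtracting a per-VF maximum of 8 instead would have left only 64.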
2018-03-31bnxt_en: Check max_tx_scheduler_inputs value from firmware.Michael Chan3-2/+19
When checking for the maximum pre-set TX channels for ethtool -l, we need to check the current max_tx_scheduler_inputs parameter from firmware. This parameter specifies the max input for the internal QoS nodes currently available to this function. The function's TX rings will be capped by this parameter. By adding this logic, we provide a more accurate pre-set max TX channels to the user. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Add extended port statistics supportVasundhara Volam3-2/+81
Gather periodic extended port statistics, if the device is PF and link is up. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Include additional hardware port statistics in ethtool -S.Vasundhara Volam1-0/+5
Include additional hardware port statistics in ethtool -S, which are useful for debugging. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Add support for ndo_set_vf_trustVasundhara Volam4-9/+37
Trusted VFs are allowed to modify MAC address, even when PF has assigned one. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: fix clear flags in ethtool reset handlingScott Branden1-2/+6
Clear flags when reset command processed successfully for components specified. Fixes: 6502ad5963a5 ("bnxt_en: Add ETH_RESET_AP support") Signed-off-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Use a dedicated VNIC mode for RDMA.Michael Chan2-4/+15
If the RDMA driver is registered, use a new VNIC mode that allows RDMA traffic to be seen on the netdev in promiscuous mode. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bnxt_en: Adjust default rings for multi-port NICs.Michael Chan1-3/+9
Change the default ring logic to select default number of rings to be up to 8 per port if the default rings x NIC ports <= total CPUs. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
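One possible reading of this rule, as a sketch only: cap the per-port default at 8 and shrink it so that the default ring count times the number of ports does not exceed the CPU count. The function name and the exact rounding are assumptions; the driver's logic may differ in detail.

```c
#include <assert.h>

#define SKETCH_DFLT_RINGS_CAP 8	/* "up to 8 per port" from the commit message */

/* Default rings per port so that rings * ports <= CPUs where possible,
 * never less than 1 and never more than the cap. */
int sketch_default_rings(int online_cpus, int port_count)
{
	int per_port;

	assert(online_cpus > 0 && port_count > 0);
	per_port = online_cpus / port_count;
	if (per_port < 1)
		per_port = 1;
	return per_port < SKETCH_DFLT_RINGS_CAP ? per_port : SKETCH_DFLT_RINGS_CAP;
}
```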
2018-03-31bnxt_en: Update firmware interface to 1.9.1.15.Michael Chan4-103/+210
Minor changes, such as new extended port statistics. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31Merge tag 'mlx5-updates-2018-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller9-443/+402
Saeed Mahameed says:
====================
mlx5-updates-2018-03-30

This series contains updates to mlx5 core and mlx5e netdev drivers. The main highlight of this series is the RX optimizations for striding RQ path, introduced by Tariq.

The first four patches are trivial misc cleanups:
- Spelling mistake fix
- Dead code removal
- Warning messages

RX optimizations for striding RQ:

1) RX refactoring, cleanups and micro optimizations
- MTU calculation simplifications, obsoletes some WQEs-to-packets translation functions and helps delete ~60 LOC.
- Do not busy-wait a pending UMR completion.
- Post the new values of UMR WQE inline, instead of using a data pointer.
- Use pre-initialized structures to save calculations in datapath.

2) Use linear SKB in Striding RQ "build_skb". Using linear SKB has many advantages:
- Saves a memcpy of the headers.
- No page-boundary checks in datapath.
- No filler CQEs.
- Significantly smaller CQ.
- SKB data continuously resides in linear part, and not split to small amount (linear part) and large amount (fragment). This saves datapath cycles in driver and improves utilization of SKB fragments in GRO.
- The fragments of a resulting GRO SKB follow the IP forwarding assumption of equal-size fragments.

Implementation details: HW writes the packets to the beginning of a stride, i.e. does not keep headroom. To overcome this we make sure we can extend backwards and use the last bytes of stride i-1. Extra care is needed for stride 0 as it has no preceding stride. We make sure headroom bytes are available by shifting the buffer pointer passed to HW by headroom bytes.

This configuration now becomes default, whenever capable. Of course, this implies turning LRO off.

Performance testing: ConnectX-5, single core, single RX ring, default MTU.

UDP packet rate, early drop in TC layer:
--------------------------------------------
| pkt size | before    | after     | ratio |
--------------------------------------------
| 1500byte | 4.65 Mpps | 5.96 Mpps | 1.28x |
|  500byte | 5.23 Mpps | 5.97 Mpps | 1.14x |
|   64byte | 5.94 Mpps | 5.96 Mpps | 1.00x |
--------------------------------------------
TCP streams: ~20% gain

3) Support XDP over Striding RQ: Now that linear SKB is supported over Striding RQ, we can support XDP by setting stride size to PAGE_SIZE and headroom to XDP_PACKET_HEADROOM. Striding RQ is capable of a higher packet-rate than conventional RQ.

Performance testing: ConnectX-5, 24 rings, default MTU. CQE compression ON (to reduce completions BW in PCI).

XDP_DROP packet rate:
--------------------------------------------------
| pkt size | XDP rate   | 100GbE linerate | pct% |
--------------------------------------------------
|   64byte | 126.2 Mpps |      148.0 Mpps |  85% |
|  128byte |  80.0 Mpps |       84.8 Mpps |  94% |
|  256byte |  42.7 Mpps |       42.7 Mpps | 100% |
|  512byte |  23.4 Mpps |       23.4 Mpps | 100% |
--------------------------------------------------

4) Remove mlx5 page_ref bulking in Striding RQ and use page_ref_inc only when needed. Without this bulking, we have:
- no atomic ops on WQE allocation or free
- one atomic op per SKB
- In the default MTU configuration (1500, stride size is 2K), the non-bulking method executes 2 atomic ops as before
- For larger MTUs with stride size of 4K, the non-bulking method executes only a single op.
- For XDP (stride size of 4K, no SKBs), the non-bulking method has no atomic ops per packet at all.

Performance testing: ConnectX-5, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. Single core packet rate (64 bytes). Early drop in TC: no degradation. XDP_DROP: before: 14,270,188 pps, after: 20,503,603 pps, 43% improvement.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31hv_netvsc: Clean up extra parameter from rndis_filter_receive_data()Haiyang Zhang1-7/+9
The variables, msg and data, have the same value. This patch removes the extra one. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31ethernet: hisilicon: hns: hns_dsaf_mac: Use generic eth_broadcast_addrJoe Perches1-4/+2
Rather than use an on-stack array to copy a broadcast address, use the generic eth_broadcast_addr function to save a trivial amount of object code. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
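For readers outside the kernel tree, the effect of the helper can be sketched in userspace C: it fills a 6-byte Ethernet address with 0xff, so no temporary on-stack broadcast array and copy are needed. The sketch_ names are stand-ins so the example compiles without kernel headers.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6	/* length of an Ethernet address in bytes */

/* Userspace stand-in for the generic helper: set the address to
 * ff:ff:ff:ff:ff:ff in place. */
void sketch_eth_broadcast_addr(uint8_t *addr)
{
	memset(addr, 0xff, SKETCH_ETH_ALEN);
}
```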
2018-03-31net: Do not take net_rwsem in __rtnl_link_unregister()Kirill Tkhai2-0/+4
This function calls call_netdevice_notifier(), which may also take net_rwsem. So, we can't use net_rwsem here. This patch makes callers of this function take pernet_ops_rwsem, like register_netdevice_notifier() does. This will protect the modifications of net_namespace_list, and allows notifiers to take it (they won't have to care about context). Since __rtnl_link_unregister() is used on module load and unload (which are not frequent operations), this looks better to me than making every call_netdevice_notifier() always execute in a "protected net_namespace_list" context. Also, this fixes the problem we dealt with in 328fbe747ad4 "Close race between {un, }register_netdevice_notifier and ...", and guarantees that __rtnl_link_unregister() does not skip an exiting net. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: hns3: remove unnecessary pci_set_drvdata() and devm_kfree()Wei Yongjun1-4/+0
There is no need for explicit calls of devm_kfree(), as the allocated memory will be freed during driver's detach. The driver core clears the driver data to NULL after device_release. Thus, it is not needed to manually clear the device driver data to NULL. So remove the unnecessary pci_set_drvdata() and devm_kfree(). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31netdevsim: Change nsim_devlink_setup to return error to callerDavid Ahern3-8/+15
Change nsim_devlink_setup to return any error back to the caller and update nsim_init to handle it. Requested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add ndo_set_rx_mode callback implementation for VFVadim Lomovtsev1-1/+109
The ndo_set_rx_mode() callback is called from atomic context, which causes message response timeouts during VF to PF communication via MSIx. To get rid of that, we copy the passed mc list, parse the flags and queue handling of the kernel request to an ordered workqueue. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add workqueue control structures for handle ndo_set_rx_mode requestVadim Lomovtsev1-0/+17
The kernel calls the ndo_set_rx_mode() callback from atomic context, which causes messaging timeouts between VF and PF (as they’re implemented via MSIx). So in order to handle ndo_set_rx_mode() we need to move the processing out of atomic context. This commit implements the necessary workqueue related structures to let the VF queue kernel request processing in non-atomic context later. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add XCAST messages handlers for PFVadim Lomovtsev1-4/+41
This commit is to add message handling for ndo_set_rx_mode() callback at PF side. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add new messages for handle ndo_set_rx_mode callbackVadim Lomovtsev1-0/+12
The kernel calls the ndo_set_rx_mode() callback supplying it with all necessary info, such as device state flags, multicast MAC address list and so on. Since we have only 128 bits to communicate with the PF, we need to initiate several requests to the PF, each with a small/short operation based on the input data. So this commit implements the following PF message codes along with new data structures for them: NIC_MBOX_MSG_RESET_XCAST to flush all filters configured for this particular network interface (VF); NIC_MBOX_MSG_ADD_MCAST to add a new MAC address to the DMAC filter registers for this particular network interface (VF); NIC_MBOX_MSG_SET_XCAST to apply the filtering configuration to the filter control register. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
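How one rx-mode request might be split into several 128-bit mailbox messages can be sketched as below. The three message names mirror the commit message; the numeric codes, the struct layout and the builder function are placeholders for illustration and do not reflect the driver's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Message kinds named after the commit; the values are placeholders. */
enum sketch_mbox_msg {
	SKETCH_MSG_RESET_XCAST = 1,	/* flush all filters of this VF */
	SKETCH_MSG_ADD_MCAST,		/* add one MAC to the DMAC filters */
	SKETCH_MSG_SET_XCAST,		/* apply the filtering mode */
};

/* One mailbox message: 128 bits in total (hypothetical layout). */
struct sketch_mbox {
	uint8_t msg;
	uint8_t pad[7];
	uint64_t data;	/* MAC address or filter mode */
};

_Static_assert(sizeof(struct sketch_mbox) == 16, "mailbox must be 128 bits");

/* Expand one rx-mode request into a sequence: reset, one add per MAC,
 * then set. Returns the number of messages written, or 0 if `cap` is
 * too small to hold the whole sequence. */
size_t sketch_build_rx_mode(struct sketch_mbox *out, size_t cap,
			    const uint64_t *macs, size_t n_macs, uint64_t mode)
{
	size_t i, n = 0;

	if (cap < n_macs + 2)
		return 0;
	out[n++] = (struct sketch_mbox){ .msg = SKETCH_MSG_RESET_XCAST };
	for (i = 0; i < n_macs; i++)
		out[n++] = (struct sketch_mbox){ .msg = SKETCH_MSG_ADD_MCAST,
						 .data = macs[i] };
	out[n++] = (struct sketch_mbox){ .msg = SKETCH_MSG_SET_XCAST,
					 .data = mode };
	return n;
}
```

A request carrying N multicast addresses therefore costs N + 2 small messages in this model.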
2018-03-31net: thunderx: add multicast filter management supportVadim Lomovtsev2-1/+153
The ThunderX NIC can be partitioned into up to 128 VFs, each of which is presented to the system. Each VF is mapped to a BGX:LMAC pair and is configured by the kernel individually. Several VFs may therefore be mapped onto the same BGX:LMAC pair, which can cause several multicast filtering configuration requests to the LMAC with the same MAC addresses. This commit adds ThunderX NIC BGX filtering manipulation routines. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add MAC address filter tracking for LMACVadim Lomovtsev1-14/+30
The ThunderX NIC has two Ethernet interfaces (BGX), each of which can have up to four Logical MACs configured. Each BGX has 32 filters that can be configured for filtering ingress packets. The number of filters available to a particular LMAC ranges from 8 (if four LMACs are configured per BGX) up to 32 (if only one LMAC is configured per BGX). At the same time the NIC can present up to 128 VFs to the OS as network interfaces, each of which the kernel will configure with a set of MAC addresses for filtering. So to prevent duplicates in the BGX filter registers from different network interfaces, all filter configuration requests must be cached and tracked before they are applied to the BGX filter registers. This commit updates the LMAC structures with control fields for allocating/releasing the filter tracking list and implements dmac array allocation/release per LMAC. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
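The filter budget and the duplicate tracking can be sketched as follows. The figures 32 filters per BGX and up to four LMACs come from the commit message; the even split between LMACs, the struct and the function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BGX_FILTERS 32	/* filters per BGX, from the commit message */

/* Filters available to each LMAC, assuming an even split between LMACs. */
int sketch_filters_per_lmac(int lmac_count)
{
	assert(lmac_count >= 1 && lmac_count <= 4);
	return SKETCH_BGX_FILTERS / lmac_count;
}

/* Hypothetical per-LMAC tracking list of already programmed addresses. */
struct sketch_lmac {
	uint64_t dmac[SKETCH_BGX_FILTERS];
	int used;
	int max;
};

/* Returns 1 if the address was newly tracked, 0 if it was already
 * present (a duplicate from another VF), -1 if the list is full. */
int sketch_lmac_track(struct sketch_lmac *l, uint64_t mac)
{
	int i;

	assert(l->max <= SKETCH_BGX_FILTERS);
	for (i = 0; i < l->used; i++)
		if (l->dmac[i] == mac)
			return 0;
	if (l->used >= l->max)
		return -1;
	l->dmac[l->used++] = mac;
	return 1;
}
```

Only a return value of 1 would lead to a write to the filter registers in this model, so the same address requested by several VFs consumes a single filter.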
2018-03-31net: thunderx: move filter register related macro into proper placeVadim Lomovtsev2-11/+13
The ThunderX NIC has a set of registers which allow configuring the filter policy for ingress packets. There are three possible modes of filtering multicasts, broadcasts and unicasts: accept all, reject all, and accept only what the filter allows. The current implementation has an enum with all of them and two generic macros: one for enabling filtering at all (CAM_ACCEPT) and one for enabling/disabling broadcast packets, which should also be corrected in order to represent the register bits properly. All these values are private to the driver and there is no need to ‘publish’ them via a header file. This commit moves the filtering register manipulation values from the header file into the source file, explicitly assigning exact register values to them for use when configuring the registers. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: stmmac: dwmac-meson8b: Add support for the Meson8m2 SoCMartin Blumenstingl1-2/+3
The Meson8m2 SoC uses a similar (potentially even identical) register layout as the Meson8b and GXBB SoCs for the dwmac glue. Add a new compatible string and update the module description to indicate support for these SoCs. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30net/mlx5e: RX, Recycle buffer of UMR WQEsTariq Toukan1-2/+9
Upon a new UMR post, check if the WQE buffer contains a previous UMR WQE. If so, modify the dynamic fields instead of a whole WQE overwrite. This saves a memcpy. In current setting, after 2 WQ cycles (12 UMR posts), this will always be the case. No degradation sensed. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Keep single pre-initialized UMR WQE per RQTariq Toukan3-19/+13
All UMR WQEs of an RQ share many common fields. We use pre-initialized structures to save calculations in datapath. One field (xlt_offset) was the only reason we saved a pre-initialized copy per WQE index. Here we remove its initialization (move its calculation to datapath), and reduce the number of copies to one-per-RQ. A very small datapath calculation is added, it occurs once per a MPWQE (i.e. once every 256KB), but reduces memory consumption and gives better cache utilization. Performance testing: Tested packet rate, no degradation sensed. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Remove page_ref bulking in Striding RQTariq Toukan2-32/+16
When many packets reside on the same page, the bulking of page_ref modifications reduces the total number of atomic operations executed.

Besides the necessary 2 operations on page alloc/free, we have the following extra ops per page:
- one on WQE allocation (bump refcnt to maximum possible),
- zero ops for SKBs,
- one on WQE free,
a constant of two operations in total, no matter how many packets/SKBs actually populate the page.

Without this bulking, we have:
- no ops on WQE allocation or free,
- one op per SKB.

Comparing the two methods when PAGE_SIZE is 4K:
- As mentioned above, the bulking method always executes 2 operations, not more, but not less.
- In the default MTU configuration (1500, stride size is 2K), the non-bulking method executes 2 ops as well.
- For larger MTUs with stride size of 4K, the non-bulking method executes only a single op.
- For XDP (stride size of 4K, no SKBs), the non-bulking method executes no ops at all!

Hence, to optimize the flows with linear SKB and XDP over Striding RQ, we here remove the page_ref bulking method.

Performance testing: ConnectX-5, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. Single core packet rate (64 bytes). Early drop in TC: no degradation. XDP_DROP: before: 14,270,188 pps, after: 20,503,603 pps, 43% improvement.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
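The comparison above can be written down as a small counting model. It assumes one SKB per stride and a fully populated page, and counts only the extra page_ref operations beyond the alloc/free pair; the function names are hypothetical.

```c
#include <assert.h>

/* Bulking: one op on WQE allocation and one on WQE free, regardless of
 * how many SKBs populate the page. */
int sketch_ops_bulking(void)
{
	return 2;
}

/* Non-bulking: one op per SKB. With one SKB per stride, a page holds
 * page_size / stride_size SKBs; XDP builds no SKBs at all. */
int sketch_ops_non_bulking(int page_size, int stride_size, int is_xdp)
{
	assert(stride_size > 0 && stride_size <= page_size);
	if (is_xdp)
		return 0;
	return page_size / stride_size;
}
```

For a 4K page this gives 2 ops with 2K strides, 1 op with 4K strides, and 0 ops for XDP, matching the three cases listed in the commit message.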
2018-03-30net/mlx5e: Support XDP over Striding RQTariq Toukan3-8/+26
Add XDP support over Striding RQ. Now that linear SKB is supported over Striding RQ, we can support XDP by setting stride size to PAGE_SIZE and headroom to XDP_PACKET_HEADROOM. Upon a MPWQE free, do not release pages that are being XDP xmit, they will be released upon completions. Striding RQ is capable of a higher packet-rate than conventional RQ. A performance gain is expected for all cases that had a HW packet-rate bottleneck. This is the case whenever using many flows that distribute to many cores.

Performance testing: ConnectX-5, 24 rings, default MTU. CQE compression ON (to reduce completions BW in PCI).

XDP_DROP packet rate:
--------------------------------------------------
| pkt size | XDP rate   | 100GbE linerate | pct% |
--------------------------------------------------
|   64byte | 126.2 Mpps |      148.0 Mpps |  85% |
|  128byte |  80.0 Mpps |       84.8 Mpps |  94% |
|  256byte |  42.7 Mpps |       42.7 Mpps | 100% |
|  512byte |  23.4 Mpps |       23.4 Mpps | 100% |
--------------------------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Refactor RQ XDP_TX indicationTariq Toukan2-6/+8
Make the xdp_xmit indication available for Striding RQ by taking it out of the type-specific union. This refactor is a preparation for a downstream patch that adds XDP support over Striding RQ. In addition, use a bitmap instead of a boolean for possible future flags. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Use linear SKB in Striding RQTariq Toukan3-42/+146
Current Striding RQ HW feature utilizes the RX buffers so that there is no wasted room between the strides. This maximises the memory utilization. This prevents the use of build_skb() (which requires headroom and tailroom), and demands to memcpy the packets headers into the skb linear part. In this patch, whenever a set of conditions holds, we apply an RQ configuration that allows combining the use of linear SKB on top of a Striding RQ.

To use build_skb() with Striding RQ, the following must hold:
1. packet does not cross a page boundary.
2. there is enough headroom and tailroom surrounding the packet.

We can satisfy 1 and 2 by configuring: stride size = MTU + headroom + tailroom. This is possible only when:
a. (MTU + headroom + tailroom) does not exceed PAGE_SIZE.
b. HW LRO is turned off.

Using linear SKB has many advantages:
- Saves a memcpy of the headers.
- No page-boundary checks in datapath.
- No filler CQEs.
- Significantly smaller CQ.
- SKB data continuously resides in linear part, and not split to small amount (linear part) and large amount (fragment). This saves datapath cycles in driver and improves utilization of SKB fragments in GRO.
- The fragments of a resulting GRO SKB follow the IP forwarding assumption of equal-size fragments.

Some implementation details: HW writes the packets to the beginning of a stride, i.e. does not keep headroom. To overcome this we make sure we can extend backwards and use the last bytes of stride i-1. Extra care is needed for stride 0 as it has no preceding stride. We make sure headroom bytes are available by shifting the buffer pointer passed to HW by headroom bytes.

This configuration now becomes default, whenever capable. Of course, this implies turning LRO off.

Performance testing: ConnectX-5, single core, single RX ring, default MTU.

UDP packet rate, early drop in TC layer:
--------------------------------------------
| pkt size | before    | after     | ratio |
--------------------------------------------
| 1500byte | 4.65 Mpps | 5.96 Mpps | 1.28x |
|  500byte | 5.23 Mpps | 5.97 Mpps | 1.14x |
|   64byte | 5.94 Mpps | 5.96 Mpps | 1.00x |
--------------------------------------------
TCP streams: ~20% gain

Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
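The eligibility rule for linear SKB over Striding RQ can be sketched as a predicate: the stride must hold MTU plus headroom plus tailroom, must fit within one page, and HW LRO must be off. The function names and the example figures in the usage note are assumptions, not values taken from the driver.

```c
#include <stdbool.h>

/* Stride size needed so that build_skb() has room around the packet. */
unsigned int sketch_stride_size(unsigned int mtu, unsigned int headroom,
				unsigned int tailroom)
{
	return mtu + headroom + tailroom;
}

/* Linear SKB is usable when the stride fits in a page and LRO is off. */
bool sketch_linear_skb_ok(unsigned int mtu, unsigned int headroom,
			  unsigned int tailroom, unsigned int page_size,
			  bool lro_on)
{
	return !lro_on &&
	       sketch_stride_size(mtu, headroom, tailroom) <= page_size;
}
```

With made-up figures of 64 bytes of headroom and 320 bytes of tailroom, an MTU of 1500 fits a 4K page, while an MTU of 9000 does not and would fall back to the non-linear path.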
2018-03-30net/mlx5e: Use inline MTTs in UMR WQEsTariq Toukan3-88/+38
When modifying the page mapping of a HW memory region (via a UMR post), post the new values inlined in WQE, instead of using a data pointer. This is a micro-optimization, inline UMR WQEs of different rings scale better in HW. In addition, this obsoletes a few control flows and helps delete ~50 LOC. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>