aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)AuthorFilesLines
2018-04-25ixgbe: Avoid performing unnecessary resets for macvlan offloadAlexander Duyck2-62/+135
The original implementation for macvlan offload has us performing a full port reset every time we added a new macvlan. This shouldn't be necessary and can be avoided with a few behavior changes. This patches updates the logic for the queues so that we have essentially 3 possible configurations for macvlan offload. They consist of 15 macvlans with 4 queues per macvlan, 31 macvlans with 2 queues per macvlan, and 63 macvlans with 1 queue per macvlan. As macvlans are added you will encounter up to 3 total resets if you add all the way up to 63, and after that the device will stay in the mode supporting up to 63 macvlans until the L2FW flag is cleared. Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-25ixgbe: Drop real_adapter from l2 fwd acceleration structureAlexander Duyck2-13/+16
This patch drops the real_adapter member from the fwd_adapter structure. The general idea behind the change is that the real_adapter is carrying unnecessary data since we could always just grab the adapter structure from netdev_priv(macvlan->lowerdev) if we really needed to get at it. Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-25ixgbe/fm10k: Only support macvlan offload for types that support destination ↵Alexander Duyck2-0/+15
filtering Both the ixgbe and fm10k drivers support destination filtering. Instead of adding a ton of complexity to support either source or passthru mode we can instead just avoid offloading them for now. Doing this we avoid leaking packets into interfaces that aren't meant to receive them. Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-25ixgbe/fm10k: Drop tracking stats for macvlan broadcast/multicastAlexander Duyck2-8/+6
Drop dead code now that we shouldn't be receiving broadcast or multicast frames on the queues associated to the macvlan netdev. Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-25macvlan: Use software path for offloaded local, broadcast, and multicast trafficAlexander Duyck2-31/+12
This change makes it so that we use a software path for packets that are going to be locally switched between two macvlan interfaces on the same device. In addition we resort to software replication of broadcast and multicast packets instead of offloading that to hardware. The general idea is that using the device for east/west traffic local to the system is extremely inefficient. We can only support up to whatever the PCIe limit is for any given device so this caps us at somewhere around 20G for devices supported by ixgbe. This is compounded even further when you take broadcast and multicast into account as a single 10G port can come to a crawl as a packet is replicated up to 60+ times in some cases. In order to get away from that I am implementing changes so that we handle broadcast/multicast replication and east/west local traffic all in software. Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-25macvlan: Rename fwd_priv to accel_priv and add accessor functionAlexander Duyck1-6/+4
This change renames the fwd_priv member to accel_priv as this more accurately reflects the actual purpose of this value. In addition I am adding an accessor which will allow us to further abstract this in the future if needed. Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-25ixgbe: Drop support for macvlan specific unicast listsAlexander Duyck1-31/+0
Drop the code for handling macvlan specific unicast lists. It isn't needed since we don't take any efforts to maintain it when we bring the interface up and it takes the slow path anyway. Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller7-16/+37
2018-04-24ice: Fix insufficient memory issue in ice_aq_manage_mac_readMd Fahad Iqbal Polash1-5/+17
For the MAC read operation, the device can return up to two (LAN and WoL) MAC addresses. Without access to adequate memory, the device will return an error. Fixed this by allocating the right amount of memory. Also, logic to detect and copy the LAN MAC address into the port_info structure has been added. Note that the WoL MAC address is ignored currently as the WoL feature isn't supported yet. Fixes: dc49c7723676 ("ice: Get MAC/PHY/link info and scheduler topology") Signed-off-by: Md Fahad Iqbal Polash <[email protected]> Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-24ice: Do not check INTEVENT bit for OICR interruptsBen Shelton2-6/+0
According to the hardware spec, checking the INTEVENT bit isn't a reliable way to detect if an OICR interrupt has occurred. This is because this bit can be cleared by the hardware/firmware before the interrupt service routine has run. So instead, just check for OICR events every time. Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt") Signed-off-by: Ben Shelton <[email protected]> Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-24ice: Fix incorrect comment for action typeAnirudh Venkataramanan1-1/+1
Action type 5 defines large action generic values. Fix comment to reflect that better. Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-24ice: Fix initialization for num_nodes_addedAnirudh Venkataramanan1-2/+2
ice_sched_add_nodes_to_layer is used recursively, and so we start with num_nodes_added being 0. This way, in case of an error or if num_nodes is NULL, the function just returns 0 to indicate that no nodes were added. Fixes: 5513b920a4f7 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support") Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-24igb: Fix the transmission mode of queue 0 for Qav modeVinicius Costa Gomes1-1/+16
When Qav mode is enabled, queue 0 should be kept on Stream Reservation mode. From the i210 datasheet, section 8.12.19: "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to Qav." ("QueueMode 1b" represents the Stream Reservation mode) The solution is to give queue 0 the all the credits it might need, so it has priority over queue 1. A situation where this can happen is when cbs is "installed" only on queue 1, leaving queue 0 alone. For example: $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \ hicredit 30 sendslope -980000 idleslope 20000 offload 1 Signed-off-by: Vinicius Costa Gomes <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-24ixgbevf: ensure xdp_ring resources are free'd on error exitColin Ian King1-1/+1
The current error handling for failed resource setup for xdp_ring data is a break out of the loop and returning 0 indicated everything was OK, when in fact it is not. Fix this by exiting via the error exit label err_setup_tx that will clean up the resources correctly and return and error status. Detected by CoverityScan, CID#1466879 ("Logically dead code") Fixes: 21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action") Signed-off-by: Colin Ian King <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-17xdp: transition into using xdp_frame for ndo_xdp_xmitJesper Dangaard Brouer3-25/+31
Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct xdp_buff. This brings xdp_return_frame and ndp_xdp_xmit in sync. This builds towards changing the API further to become a bulk API, because xdp_buff is not a queue-able object while xdp_frame is. V4: Adjust for commit 59655a5b6c83 ("tuntap: XDP_TX can use native XDP") V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-17xdp: transition into using xdp_frame for return APIJesper Dangaard Brouer3-12/+14
Changing API xdp_return_frame() to take struct xdp_frame as argument, seems like a natural choice. But there are some subtle performance details here that needs extra care, which is a deliberate choice. When de-referencing xdp_frame on a remote CPU during DMA-TX completion, result in the cache-line is change to "Shared" state. Later when the page is reused for RX, then this xdp_frame cache-line is written, which change the state to "Modified". This situation already happens (naturally) for, virtio_net, tun and cpumap as the xdp_frame pointer is the queued object. In tun and cpumap, the ptr_ring is used for efficiently transferring cache-lines (with pointers) between CPUs. Thus, the only option is to de-referencing xdp_frame. It is only the ixgbe driver that had an optimization, in which it can avoid doing the de-reference of xdp_frame. The driver already have TX-ring queue, which (in case of remote DMA-TX completion) have to be transferred between CPUs anyhow. In this data area, we stored a struct xdp_mem_info and a data pointer, which allowed us to avoid de-referencing xdp_frame. To compensate for this, a prefetchw is used for telling the cache coherency protocol about our access pattern. My benchmarks show that this prefetchw is enough to compensate the ixgbe driver. V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") V8: Adjust for commit bd658dda4237 ("net/mlx5e: Separate dma base address and offset in dma_sync call") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-17xdp: rhashtable with allocator ID to pointer mappingJesper Dangaard Brouer1-1/+8
Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which uses a radix tree, use a dynamic rhashtable, for creating ID to pointer lookup table, because this is faster. The problem that is being solved here is that, the xdp_rxq_info pointer (stored in xdp_buff) cannot be used directly, as the guaranteed lifetime is too short. The info is needed on a (potentially) remote CPU during DMA-TX completion time . In an xdp_frame the xdp_mem_info is stored, when it got converted from an xdp_buff, which is sufficient for the simple page refcnt based recycle schemes. For more advanced allocators there is a need to store a pointer to the registered allocator. Thus, there is a need to guard the lifetime or validity of the allocator pointer, which is done through this rhashtable ID map to pointer. The removal and validity of of the allocator and helper struct xdp_mem_allocator is guarded by RCU. The allocator will be created by the driver, and registered with xdp_rxq_info_reg_mem_model(). It is up-to debate who is responsible for freeing the allocator pointer or invoking the allocator destructor function. In any case, this must happen via RCU freeing. Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. V4: Per req of Jason Wang - Use xdp_rxq_info_reg_mem_model() in all drivers implementing XDP_REDIRECT, even-though it's not strictly necessary when allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero). V6: Per req of Alex Duyck - Introduce rhashtable_lookup() call in later patch V8: Address sparse should be static warnings (from kbuild test robot) Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-17i40e: convert to use generic xdp_frame and xdp_return_frame APIJesper Dangaard Brouer2-5/+16
Also convert driver i40e, which very recently got XDP_REDIRECT support in commit d9314c474d4f ("i40e: add support for XDP_REDIRECT"). V7: This patch got added in V7 of this patchset. Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-17ixgbe: use xdp_return_frame APIJesper Dangaard Brouer2-2/+5
Extend struct ixgbe_tx_buffer to store the xdp_mem_info. Notice that this could be optimized further by putting this into a union in the struct ixgbe_tx_buffer, but this patchset works towards removing this again. Thus, this is not done. Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-3/+5
Pull networking fixes from David Miller: 1) The sockmap code has to free socket memory on close if there is corked data, from John Fastabend. 2) Tunnel names coming from userspace need to be length validated. From Eric Dumazet. 3) arp_filter() has to take VRFs properly into account, from Miguel Fadon Perlines. 4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti. 5) Missing idr_remove() in u32_delete_key(), from Cong Wang. 6) More syzbot stuff. Several use of uninitialized value fixes all over, from Eric Dumazet. 7) Do not leak kernel memory to userspace in sctp, also from Eric Dumazet. 8) Discard frames from unused ports in DSA, from Andrew Lunn. 9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas Falcon. 10) Do not access dp83640 PHY registers prematurely after reset, from Esben Haabendal. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) vhost-net: set packet weight of tx polling to 2 * vq size net: thunderx: rework mac addresses list to u64 array inetpeer: fix uninit-value in inet_getpeer dp83640: Ensure against premature access to PHY registers after reset devlink: convert occ_get op to separate registration ARM: dts: ls1021a: Specify TBIPA register address net/fsl_pq_mdio: Allow explicit speficition of TBIPA address ibmvnic: Do not reset CRQ for Mobility driver resets ibmvnic: Fix failover case for non-redundant configuration ibmvnic: Fix reset scheduler error handling ibmvnic: Zero used TX descriptor counter on reset ibmvnic: Fix DMA mapping mistakes tipc: use the right skb in tipc_sk_fill_sock_diag() sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6 net: dsa: Discard frames from unused ports sctp: do not leak kernel memory to user space soreuseport: initialise timewait reuseport field ipv4: fix uninit-value in ip_route_output_key_hash_rcu() dccp: initialize ireq->ir_mark net: fix uninit-value in __hw_addr_add_ex() ...
2018-04-06Merge tag 'pci-v4.17-changes' of ↵Linus Torvalds1-86/+1
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - move pci_uevent_ers() out of pci.h (Michael Ellerman) - skip ASPM common clock warning if BIOS already configured it (Sinan Kaya) - fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva) - remove last user of pci_get_bus_and_slot() and the function itself (Sinan Kaya) - add decoding for 16 GT/s link speed (Jay Fang) - add interfaces to get max link speed and width (Tal Gilboa) - add pcie_bandwidth_capable() to compute max supported link bandwidth (Tal Gilboa) - add pcie_bandwidth_available() to compute bandwidth available to device (Tal Gilboa) - add pcie_print_link_status() to log link speed and whether it's limited (Tal Gilboa) - use PCI core interfaces to report when device performance may be limited by its slot instead of doing it in each driver (Tal Gilboa) - fix possible cpqphp NULL pointer dereference (Shawn Lin) - rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI hotplug (Mika Westerberg) - add support for PCI I/O port space that's neither directly accessible via CPU in/out instructions nor directly mapped into CPU physical memory space. This is fairly intrusive and includes minor changes to interfaces used for I/O space on most platforms (Zhichang Yuan, John Garry) - add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan, John Garry) - use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas) - remove possible NULL pointer dereference in of_pci_bus_find_domain_nr() (Shawn Lin) - report quirk timings with dev_info (Bjorn Helgaas) - report quirks that take longer than 10ms (Bjorn Helgaas) - add and use Altera Vendor ID (Johannes Thumshirn) - tidy Makefiles and comments (Bjorn Helgaas) - don't set up INTx if MSI or MSI-X is enabled to align cris, frv, ia64, and mn10300 with x86 (Bjorn Helgaas) - move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick Lawler) - merge pcieport_if.h into portdrv.h (Bjorn Helgaas) - move workaround for BIOS PME issue from portdrv to PCI core (Bjorn Helgaas) - completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas) - remove portdrv link order dependency (Bjorn Helgaas) - remove support for unused VC portdrv service (Bjorn Helgaas) - simplify portdrv feature permission checking (Bjorn Helgaas) - remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn Helgaas) - remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas) - use cached AER capability offset (Frederick Lawler) - don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg) - rename pcie-dpc.c to dpc.c (Bjorn Helgaas) - use generic pci_mmap_resource_range() instead of powerpc and xtensa arch-specific versions (David Woodhouse) - support arbitrary PCI host bridge offsets on sparc (Yinghai Lu) - remove System and Video ROM reservations on sparc (Bjorn Helgaas) - probe for device reset support during enumeration instead of runtime (Bjorn Helgaas) - add ACS quirk for Ampere (née APM) root ports (Feng Kan) - add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas Vincent-Cross) - protect device restore with device lock (Sinan Kaya) - handle failure of FLR gracefully (Sinan Kaya) - handle CRS (config retry status) after device resets (Sinan Kaya) - skip various config reads for SR-IOV VFs as an optimization (KarimAllah Ahmed) - consolidate VPD code in vpd.c (Bjorn Helgaas) - add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann) - add DT support for R-Car r8a7743 (Biju Das) - fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host bridge driver that causes a general protection fault (Dexuan Cui) - fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV (Dexuan Cui) - fix Hyper-V host bridge hang when ejecting a VF before setting up MSI (Dexuan Cui) - make several structures static (Fengguang Wu) - increase number of MSI IRQs supported by Synopsys DesignWare bridges from 32 to 256 (Gustavo Pimentel) - implemented multiplexed IRQ domain API and remove obsolete MSI IRQ API from DesignWare drivers (Gustavo Pimentel) - add Tegra power management support (Manikanta Maddireddy) - add Tegra loadable module support (Manikanta Maddireddy) - handle 64-bit BARs correctly in endpoint support (Niklas Cassel) - support optional regulator for HiSilicon STB (Shawn Guo) - use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla) - support power supplies for Qualcomm msm8996 (Srinivas Kandagatla) * tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits) MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver HISI LPC: Add ACPI support ACPI / scan: Do not enumerate Indirect IO host children ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings of: Add missing I/O range exception for indirect-IO devices PCI: Apply the new generic I/O management on PCI IO hosts PCI: Add fwnode handler as input param of pci_register_io_range() PCI: Remove __weak tag from pci_register_io_range() MAINTAINERS: Add missing /drivers/pci/cadence directory entry fm10k: Report PCIe link properties with pcie_print_link_status() net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth net/mlx5: Report PCIe link properties with pcie_print_link_status() net/mlx4_core: Report PCIe link properties with pcie_print_link_status() PCI: Add pcie_print_link_status() to log link speed and whether it's limited PCI: Add pcie_bandwidth_available() to compute bandwidth available to device misc: pci_endpoint_test: Handle 64-bit BARs properly PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar ...
2018-04-06ice: Bug fixes in ethtool codeAnirudh Venkataramanan1-2/+2
1) Return correct size from ice_get_regs_len. 2) Fix incorrect use of ARRAY_SIZE in ice_get_regs. Fixes: fcea6f3da546 (ice: Add stats and ethtool support) Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-06ice: Fix error return code in ice_init_hw()Wei Yongjun1-1/+3
Fix to return error code ICE_ERR_NO_MEMORY from the alloc error handling case instead of 0, as done elsewhere in this function. Fixes: dc49c7723676 ("ice: Get MAC/PHY/link info and scheduler topology") Signed-off-by: Wei Yongjun <[email protected]> Acked-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-04-03fm10k: Report PCIe link properties with pcie_print_link_status()Bjorn Helgaas1-86/+1
Previously the driver used pcie_get_minimum_link() to warn when the NIC is in a slot that can't supply as much bandwidth as the NIC could use. pcie_get_minimum_link() can be misleading because it finds the slowest link and the narrowest link (which may be different links) without considering the total bandwidth of each link. For a path with a 16 GT/s x1 link and a 2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of bandwidth, not the true available bandwidth of about 1969 MB/s for a 16 GT/s x1 link. Use pcie_print_link_status() to report PCIe link speed and possible limitations instead of implementing this in the driver itself. This finds the slowest link in the path to the device by computing the total bandwidth of each link and compares that with the capabilities of the device. Note that the driver previously used dev_warn() to suggest using a different slot, but pcie_print_link_status() uses dev_info() because if the platform has no faster slot available, the user can't do anything about the warning and may not want to be bothered with it. Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Jacob Keller <[email protected]>
2018-03-27Merge branch '40GbE' of ↵David S. Miller6-122/+181
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-03-26 This series contains updates to i40e only. Jake provides several patches which remove the need for cmpxchg64(), starting with moving I40E_FLAG_[UDP]_FILTER_SYNC from pf->flags to pf->state since they are modified during run time possibly when the RTNL lock is not held so they should be a state bits and not flags. Moved additional "flags" which should be state fields, into pf->state. Ensure we hold the RTNL lock for the entire sequence of preparing for reset and when resuming, which will protect the flags related to interrupt scheme under RTNL lock so that their modification is properly threaded. Finally, cleanup the use of cmpxchg64() since it is no longer needed. Cleaned up the holes in the feature flags created my moving some flags to the state field. Björn Töpel adds XDP_REDIRECT support as well as tweaking the page counting for XDP_REDIRECT so that it will function properly. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-03-26Merge branch '100GbE' of ↵David S. Miller24-0/+18823
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2018-03-26 This patch series adds the ice driver, which will support the Intel(R) E800 Series of network devices. This is the first phase in the release of this driver where we implement basic transmit and receive. The idea behind the multi-phase release is to aid in code review as well as testing. Subsequent phases will implement advanced features (like SR-IOV, tunnelling, flow director, QoS, etc.) that build upon the previous phase(s). Each phase will be submitted as a patch series. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-03-26i40e: add support for XDP_REDIRECTBjörn Töpel3-10/+68
The driver now acts upon the XDP_REDIRECT return action. Two new ndos are implemented, ndo_xdp_xmit and ndo_xdp_flush. XDP_REDIRECT action enables XDP program to redirect frames to other netdevs. Signed-off-by: Björn Töpel <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26i40e: tweak page counting for XDP_REDIRECTBjörn Töpel1-5/+4
This commit tweaks the page counting for XDP_REDIRECT to function properly. XDP_REDIRECT support will be added in a future commit. The current page counting scheme assumes that the reference count cannot decrease until the received frame is sent to the upper layers of the networking stack. This assumption does not hold for the XDP_REDIRECT action, since a page (pointed out by xdp_buff) can have its reference count decreased via the xdp_do_redirect call. To work around that, we now start off by a large page count and then don't allow a refcount less than two. Signed-off-by: Björn Töpel <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26i40e: re-number feature flags to remove gapsJacob Keller1-31/+26
Remove the gaps created by the recent refactor of various feature flags that have moved to the state field. Use only a u32 now that we have fewer than 32 flags in the field. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26i40e: stop using cmpxchg flow in i40e_set_priv_flags()Jacob Keller1-14/+5
Now that the only places which modify flags are either (a) during initialization prior to creating a netdevice, or (b) while holding the rtnl lock, we no longer need the cmpxchg64 call in i40e_set_priv_flags. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26i40e: hold the RTNL lock while changing interrupt schemesJacob Keller1-2/+17
When we suspend and resume, we need to clear and re-enable the interrupt scheme. This was previously not done while holding the RTNL lock, which could be problematic, because we are actually destroying and re-creating queues. Hold the RTNL lock for the entire sequence of preparing for reset, and when resuming. This additionally protects the flags related to interrupt scheme under RTNL lock so that their modification is properly threaded. This is part of a larger effort to remove the need for cmpxchg64 in i40e_set_priv_flags(). Signed-off-by: Jacob Keller <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26i40e: move client flags into state bitsJacob Keller3-19/+17
The iWarp client flags are all potentially changed when the RTNL lock is not held, so they should not be part of the pf->flags variable. Instead, move them into the state field so that we can use atomic bit operations. This is part of a larger effort to remove cmpxchg64 in i40e_set_priv_flags() Signed-off-by: Jacob Keller <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26i40e: move I40E_FLAG_TEMP_LINK_POLLING to state fieldJacob Keller2-5/+5
This flag is modified outside of the RTNL lock and thus should not be part of the pf->flags variable. Use a state bit instead, so that we can use atomic bit operations. This is part of a larger effort to remove cmpxchg64 in i40e_set_priv_flags() Signed-off-by: Jacob Keller <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26i40e: move AUTO_DISABLED flags into the state fieldJacob Keller4-26/+27
The two Flow Directory auto disable flags are used at run time to mark when the flow director features needed to be disabled. Thus the flags could change even when the RTNL lock is not held. They also have some code constructions which really should be test_and_set or test_and_clear using atomic bit operations. Create new state fields to mark this, and stop including them in pf->flags. This is part of a larger effort to remove the need for cmpxchg64 in i40e_set_priv_flags(). Signed-off-by: Jacob Keller <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26i40e: move I40E_FLAG_UDP_FILTER_SYNC to the state fieldJacob Keller2-7/+6
This flag is modified during run time, possibly even when the RTNL lock is not held. Additionally it has a few places which should be using test_and_set or test_and_clear atomic bit operations. Create a new state bit __I40E_UDP_SYNC_PENDING and use it instead of the ole I40E_FLAG_UDP_FILTER_SYNC flag. This is part of a larger effort to remove the need for using cmpxchg64 in i40e_set_priv_flags. Signed-off-by: Jacob Keller <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26i40e: move I40E_FLAG_FILTER_SYNC to a state bitJacob Keller2-8/+11
The I40E_FLAG_FILTER_SYNC flag is modified during run time possibly when the RTNL lock is not held. Thus, it should not be part of pf->flags, and instead should be using atomic bit operations in the pf->state field. Create a __I40E_MACVLAN_SYNC_PENDING state bit, and use it instead of the old I40E_FLAG_FILTER_SYNC flag. This is part of a larger effort to remove the need for cmpxchg64 in i40e_set_priv_flags(). Signed-off-by: Jacob Keller <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Implement filter sync, NDO operations and bump versionAnirudh Venkataramanan9-1/+728
This patch implements multiple pieces of functionality: 1. Added ice_vsi_sync_filters, which is called through the service task to push filter updates to the hardware. 2. Add support to enable/disable promiscuous mode on an interface. Enabling/disabling promiscuous mode on an interface results in addition/removal of a promisc filter rule through ice_vsi_sync_filters. 3. Implement handlers for ndo_set_mac_address, ndo_change_mtu, ndo_poll_controller and ndo_set_rx_mode. This patch also marks the end of the driver addition by bumping up the driver version. Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Support link events, reset and rebuildAnirudh Venkataramanan7-6/+681
Link events are posted to a PF's admin receive queue (ARQ). This patch adds the ability to detect and process link events. This patch also adds the ability to process resets. The driver can process the following resets: 1) EMP Reset (EMPR) 2) Global Reset (GLOBR) 3) Core Reset (CORER) 4) Physical Function Reset (PFR) EMPR is the largest level of reset that the driver can handle. An EMPR resets the manageability block and also the data path, including PHY and link for all the PFs. The affected PFs are notified of this event through a miscellaneous interrupt. GLOBR is a subset of EMPR. It does everything EMPR does except that it doesn't reset the manageability block. CORER is a subset of GLOBR. It does everything GLOBR does but doesn't reset PHY and link. PFR is a subset of CORER and affects only the given physical function. In other words, PFR can be thought of as a CORER for a single PF. Since only the issuing PF is affected, a PFR doesn't result in the miscellaneous interrupt being triggered. All the resets have the following in common: 1) Tx/Rx is halted and all queues are stopped. 2) All the VSIs and filters programmed for the PF are lost and have to be reprogrammed. 3) Control queue interfaces are reset and have to be reprogrammed. In the rebuild flow, control queues are reinitialized, VSIs are reallocated and filters are restored. Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Update Tx scheduler tree for VSI multi-Tx queue supportAnirudh Venkataramanan8-3/+1006
This patch adds the ability for a VSI to use multiple Tx queues. More specifically, the patch 1) Provides the ability to update the Tx scheduler tree in the firmware. The driver can configure the Tx scheduler tree by adding/removing multiple Tx queues per TC per VSI. 2) Allows a VSI to reconfigure its Tx queues during runtime. 3) Synchronizes the Tx scheduler update operations using locks. Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Add stats and ethtool supportAnirudh Venkataramanan9-5/+1828
This patch implements a watchdog task to get packet statistics from the device. This patch also adds support for the following ethtool operations: ethtool devname ethtool -s devname [msglvl N] [msglevel type on|off] ethtool -g|--show-ring devname ethtool -G|--set-ring devname [rx N] [tx N] ethtool -i|--driver devname ethtool -d|--register-dump devname [raw on|off] [hex on|off] [file name] ethtool -k|--show-features|--show-offload devname ethtool -K|--features|--offload devname feature on|off ethtool -P|--show-permaddr devname ethtool -S|--statistics devname ethtool -a|--show-pause devname ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off] ethtool -r|--negotiate devname CC: Andrew Lunn <[email protected]> CC: Jakub Kicinski <[email protected]> CC: Stephen Hemminger <[email protected]> Signed-off-by: Anirudh Venkataramanan <[email protected]> Acked-by: Stephen Hemminger <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Add support for VLANs and offloadsAnirudh Venkataramanan10-16/+1631
This patch adds support for VLANs. When a VLAN is created a switch filter is added to direct the VLAN traffic to the corresponding VSI. When a VLAN is deleted, the filter is deleted as well. This patch also adds support for the following hardware offloads. 1) VLAN tag insertion/stripping 2) Receive Side Scaling (RSS) 3) Tx checksum and TCP segmentation 4) Rx checksum Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Implement transmit and NAPI supportAnirudh Venkataramanan5-2/+1171
This patch implements ice_start_xmit (the handler for ndo_start_xmit) and related functions. ice_start_xmit ultimately calls ice_tx_map, where the Tx descriptor is built and posted to the hardware by bumping the ring tail. This patch also implements ice_napi_poll, which is invoked when there's an interrupt on the VSI's queues. The interrupt can be due to either a completed Tx or an Rx event. In case of a completed Tx/Rx event, resources are reclaimed. Additionally, in case of an Rx event, the skb is fetched and passed up to the network stack. Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Configure VSIs for Tx/RxAnirudh Venkataramanan14-6/+2729
This patch configures the VSIs to be able to send and receive packets by doing the following: 1) Initialize flexible parser to extract and include certain fields in the Rx descriptor. 2) Add Tx queues by programming the Tx queue context (implemented in ice_vsi_cfg_txqs). Note that adding the queues also enables (starts) the queues. 3) Add Rx queues by programming Rx queue context (implemented in ice_vsi_cfg_rxqs). Note that this only adds queues but doesn't start them. The rings will be started by calling ice_vsi_start_rx_rings on interface up. 4) Configure interrupts for VSI queues. 5) Implement ice_open and ice_stop. Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Add support for switch filter programmingAnirudh Venkataramanan7-2/+1935
A VSI needs traffic directed towards it. This is done by programming filter rules on the switch (embedded vSwitch) element in the hardware, which connects the VSI to the ingress/egress port. This patch introduces data structures and functions necessary to add remove or update switch rules on the switch element. This is a pretty low level function that is generic enough to add a whole range of filters. This patch also introduces two top level functions ice_add_mac and ice_remove mac which through a series of intermediate helper functions eventually call ice_aq_sw_rules to add/delete simple MAC based filters. It's worth noting that one invocation of ice_add_mac/ice_remove_mac is capable of adding/deleting multiple MAC filters. Also worth noting is the fact that the driver maintains a list of currently active filters, so every filter addition/removal causes an update to this list. This is done for a couple of reasons: 1) If two VSIs try to add the same filters, we need to detect it and do things a little differently (i.e. use VSI lists, described below) as the same filter can't be added more than once. 2) In the event of a hardware reset we can simply walk through this list and restore the filters. VSI Lists: In a multi-VSI situation, it's possible that multiple VSIs want to add the same filter rule. For example, two VSIs that want to receive broadcast traffic would both add a filter for destination MAC ff:ff:ff:ff:ff:ff. This can become cumbersome to maintain and so this is handled using a VSI list. A VSI list is resource that can be allocated in the hardware using the ice_aq_alloc_free_res admin queue command. Simply put, a VSI list can be thought of as a subscription list containing a set of VSIs to which the packet should be forwarded, should the filter match. For example, if VSI-0 has already added a broadcast filter, and VSI-1 wants to do the same thing, the filter creation flow will detect this, allocate a VSI list and update the switch rule so that broadcast traffic will now be forwarded to the VSI list which contains VSI-0 and VSI-1. Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Add support for VSI allocation and deallocationAnirudh Venkataramanan7-0/+1548
This patch introduces data structures and functions to alloc/free VSIs. The driver represents a VSI using the ice_vsi structure. Some noteworthy points about VSI allocation: 1) A VSI is allocated in the firmware using the "add VSI" admin queue command (implemented as ice_aq_add_vsi). The firmware returns an identifier for the allocated VSI. The VSI context is used to program certain aspects (loopback, queue map, etc.) of the VSI's configuration. 2) A VSI is deleted using the "free VSI" admin queue command (implemented as ice_aq_free_vsi). 3) The driver represents a VSI using struct ice_vsi. This is allocated and initialized as part of the ice_vsi_alloc flow, and deallocated as part of the ice_vsi_delete flow. 4) Once the VSI is created, a netdev is allocated and associated with it. The VSI's ring and vector related data structures are also allocated and initialized. 5) A VSI's queues can either be contiguous or scattered. To do this, the driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with the firmware's VSI queue allocation imap. If the VSI can't get a contiguous queue allocation, it will fallback to scatter. This is implemented in ice_vsi_get_qs which is called as part of the VSI setup flow. In the release flow, the VSI's queues are released and the bitmap is updated to reflect this by ice_vsi_put_qs. CC: Shannon Nelson <[email protected]> Signed-off-by: Anirudh Venkataramanan <[email protected]> Acked-by: Shannon Nelson <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Initialize PF and setup miscellaneous interruptAnirudh Venkataramanan10-1/+1025
This patch continues the initialization flow as follows: 1) Allocate and initialize necessary fields (like vsi, num_alloc_vsi, irq_tracker, etc) in the ice_pf instance. 2) Setup the miscellaneous interrupt handler. This also known as the "other interrupt causes" (OIC) handler and is used to handle non hotpath interrupts (like control queue events, link events, exceptions, etc. 3) Implement a background task to process admin queue receive (ARQ) events received by the driver. CC: Shannon Nelson <[email protected]> Signed-off-by: Anirudh Venkataramanan <[email protected]> Acked-by: Shannon Nelson <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Get MAC/PHY/link info and scheduler topologyAnirudh Venkataramanan8-0/+929
This patch adds code to continue the initialization flow as follows: 1) Get PHY/link information and store it 2) Get default scheduler tree topology and store it 3) Get the MAC address associated with the port and store it Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Get switch config, scheduler config and device capabilitiesAnirudh Venkataramanan10-1/+1082
This patch adds to the initialization flow by getting switch configuration, scheduler configuration and device capabilities. Switch configuration: On boot, an L2 switch element is created in the firmware per physical function. Each physical function is also mapped to a port, to which its switch element is connected. In other words, this switch can be visualized as an embedded vSwitch that can connect a physical function's virtual station interfaces (VSIs) to the egress/ingress port. Egress/ingress filters will be eventually created and applied on this switch element. As part of the initialization flow, the driver gets configuration data from this switch element and stores it. Scheduler configuration: The Tx scheduler is a subsystem responsible for setting and enforcing QoS. As part of the initialization flow, the driver queries and stores the default scheduler configuration for the given physical function. Device capabilities: As part of initialization, the driver has to determine what the device is capable of (ex. max queues, VSIs, etc). This information is obtained from the firmware and stored by the driver. CC: Shannon Nelson <[email protected]> Signed-off-by: Anirudh Venkataramanan <[email protected]> Acked-by: Shannon Nelson <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Start hardware initializationAnirudh Venkataramanan12-1/+854
This patch implements multiple pieces of the initialization flow as follows: 1) A reset is issued to ensure a clean device state, followed by initialization of admin queue interface. 2) Once the admin queue interface is up, clear the PF config and transition the device to non-PXE mode. 3) Get the NVM configuration stored in the device's non-volatile memory (NVM) using ice_init_nvm. CC: Shannon Nelson <[email protected]> Signed-off-by: Anirudh Venkataramanan <[email protected]> Acked-by: Shannon Nelson <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2018-03-26ice: Add support for control queuesAnirudh Venkataramanan12-2/+1458
A control queue is a hardware interface which is used by the driver to interact with other subsystems (like firmware, PHY, etc.). It is implemented as a producer-consumer ring. More specifically, an "admin queue" is a type of control queue used to interact with the firmware. This patch introduces data structures and functions to initialize and teardown control/admin queues. Once the admin queue is initialized, the driver uses it to get the firmware version. Signed-off-by: Anirudh Venkataramanan <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>