path: root/drivers/net/ethernet
Age Commit message Author Files Lines
2020-12-14octeontx2-af: Add devlink support to af driverGeorge Cherian6-2/+98
Add devlink support to AF driver. Basic devlink support is added. Currently info_get is the only supported devlink ops. devlink output looks like this: # devlink dev pci/0002:01:00.0 # devlink dev info pci/0002:01:00.0: driver octeontx2-af # Signed-off-by: Sunil Kovvuri Goutham <[email protected]> Signed-off-by: Jerin Jacob <[email protected]> Signed-off-by: George Cherian <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-14Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2-2/+4
Pull crypto updates from Herbert Xu: "API: - Add speed testing on 1420-byte blocks for networking Algorithms: - Improve performance of chacha on ARM for network packets - Improve performance of aegis128 on ARM for network packets Drivers: - Add support for Keem Bay OCS AES/SM4 - Add support for QAT 4xxx devices - Enable crypto-engine retry mechanism in caam - Enable support for crypto engine on sdm845 in qce - Add HiSilicon PRNG driver support" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (161 commits) crypto: qat - add capability detection logic in qat_4xxx crypto: qat - add AES-XTS support for QAT GEN4 devices crypto: qat - add AES-CTR support for QAT GEN4 devices crypto: atmel-i2c - select CONFIG_BITREVERSE crypto: hisilicon/trng - replace atomic_add_return() crypto: keembay - Add support for Keem Bay OCS AES/SM4 dt-bindings: Add Keem Bay OCS AES bindings crypto: aegis128 - avoid spurious references crypto_aegis128_update_simd crypto: seed - remove trailing semicolon in macro definition crypto: x86/poly1305 - Use TEST %reg,%reg instead of CMP $0,%reg crypto: x86/sha512 - Use TEST %reg,%reg instead of CMP $0,%reg crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%reg crypto: cpt - Fix sparse warnings in cptpf hwrng: ks-sa - Add dependency on IOMEM and OF crypto: lib/blake2s - Move selftest prototype into header file crypto: arm/aes-ce - work around Cortex-A57/A72 silicon errata crypto: ecdh - avoid unaligned accesses in ecdh_set_secret() crypto: ccree - rework cache parameters handling crypto: cavium - Use dma_set_mask_and_coherent to simplify code crypto: marvell/octeontx - Use dma_set_mask_and_coherent to simplify code ...
2020-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski34-111/+235
xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2-9/+0
Alexei Starovoitov says: ==================== pull-request: bpf 2020-12-10 The following pull-request contains BPF updates for your *net* tree. We've added 21 non-merge commits during the last 12 day(s) which contain a total of 21 files changed, 163 insertions(+), 88 deletions(-). The main changes are: 1) Fix propagation of 32-bit signed bounds from 64-bit bounds, from Alexei. 2) Fix ring_buffer__poll() return value, from Andrii. 3) Fix race in lwt_bpf, from Cong. 4) Fix test_offload, from Toke. 5) Various xsk fixes. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Cong Wang, Hulk Robot, Jakub Kicinski, Jean-Philippe Brucker, John Fastabend, Magnus Karlsson, Maxim Mikityanskiy, Yonghong Song ==================== Signed-off-by: David S. Miller <[email protected]>
2020-12-10Revert "macb: support the two tx descriptors on at91rm9200"Willy Tarreau2-40/+8
This reverts commit 0a4e9ce17ba77847e5a9f87eed3c0ba46e3f82eb. The code was developed and tested on an MSC313E SoC, which seems to be half-way between the AT91RM9200 and the AT91SAM9260 in that it supports both the 2-descriptors mode and a Tx ring. It turns out that after the code was merged I could notice that the controller would sometimes lock up, and only when dealing with sustained bidirectional transfers, in which case it would report a Tx overrun condition right after having reported being ready, and will stop sending even after the status is cleared (a down/up cycle fixes it though). After adding lots of traces I couldn't spot a sequence pattern allowing to predict that this situation would happen. The chip comes with no documentation and other bits are often reported with no conclusive pattern either. It is possible that my change is wrong just like it is possible that the controller on the chip is bogus or at least unpredictable based on existing docs from other chips. I do not have an RM9200 at hand to test at the moment and a few tests run on a more recent 9G20 indicate that this code path cannot be used there to test the code on a 3rd platform. Since the MSC313E works fine in the single-descriptor mode, and that people using the old RM9200 very likely favor stability over performance, better revert this patch until we can test it on the original platform this part of the driver was written for. Note that the reverted patch was actually tested on MSC313E. Cc: Nicolas Ferre <[email protected]> Cc: Claudiu Beznea <[email protected]> Cc: Daniel Palmer <[email protected]> Cc: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-10net: qualcomm: rmnet: Update rmnet device MTU based on real deviceSubash Abhinov Kasiviswanathan4-3/+90
Packets sent by rmnet to the real device have variable MAP header lengths based on the data format configured. This patch adds checks to ensure that the real device MTU is sufficient to transmit the MAP packet comprising the MAP header and the IP packet. This check is enforced when rmnet devices are created and updated and during MTU updates of both the rmnet and real device. Additionally, rmnet devices now have a default MTU configured which accounts for the real device MTU and the headroom based on the data format. Signed-off-by: Sean Tranchetti <[email protected]> Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]> Tested-by: Loic Poulain <[email protected]> Signed-off-by: David S. Miller <[email protected]>
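The MTU check described in this commit can be sketched as follows. This is a minimal userspace illustration, not the driver's code: the function names and the header sizes are hypothetical, and the real headroom depends on the configured data format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header sizes, for illustration only. */
enum { MAP_HDR_LEN = 4, MAP_CSUM_HDR_LEN = 4 };

/* Headroom the MAP encapsulation adds in front of the IP packet. */
static uint32_t map_headroom(bool has_csum_hdr)
{
    return MAP_HDR_LEN + (has_csum_hdr ? MAP_CSUM_HDR_LEN : 0);
}

/* True if MAP header + IP packet (rmnet_mtu bytes) fits in the real MTU. */
static bool rmnet_mtu_fits(uint32_t real_mtu, uint32_t rmnet_mtu,
                           uint32_t headroom)
{
    return headroom <= real_mtu && rmnet_mtu <= real_mtu - headroom;
}

/* Default rmnet MTU: whatever the real device leaves after the headroom. */
static uint32_t rmnet_default_mtu(uint32_t real_mtu, uint32_t headroom)
{
    return real_mtu > headroom ? real_mtu - headroom : 0;
}
```

Under these assumed sizes, a real device MTU of 1500 leaves at most 1496 bytes for the IP packet when only the basic MAP header is present.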
2020-12-10igc: Add new device IDSasha Neftin3-0/+3
Add new device ID for the next step of the silicon and reflect the I226_K part. Signed-off-by: Sasha Neftin <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-10net: mediatek: simplify the return expression of mtk_gmac_sgmii_path_setup()Zheng Yongjun1-6/+2
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
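The "simplify the return expression" cleanups in this log all follow one pattern. The sketch below uses hypothetical function names, not the actual driver code, to show the before and after forms and that they return the same value.

```c
/* Hypothetical helper standing in for the real setup call. */
static int do_setup(int fail)
{
    return fail ? -5 : 0;
}

/* Before: the result is stored, tested, and returned on two paths. */
static int setup_verbose(int fail)
{
    int err;

    err = do_setup(fail);
    if (err)
        return err;

    return 0;
}

/* After: the callee's result is returned directly. */
static int setup_simplified(int fail)
{
    return do_setup(fail);
}
```

Both forms return 0 on success and the callee's error code on failure, so the change is purely cosmetic.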
2020-12-10net/mlx4: simplify the return expression of mlx4_init_srq_table()Zheng Yongjun1-7/+2
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-10net: stmmac: simplify the return tc_delete_knode()Zheng Yongjun1-8/+2
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: hns3: adjust rss tc mode configure commandGuojia Liao2-1/+5
Since the max rss size of the PF may be up to 512, the max queue number of a single tc may be up to 512 too. Since the total queue number may be up to 1280, the queue offset of each tc may be more than 1024. So adjust the rss tc mode configuration command: extend the tc offset field from 10 bits to 11 bits, and extend the tc size field from 3 bits to 4 bits. Signed-off-by: Guojia Liao <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
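The field widening follows from simple arithmetic, sketched below. The helper name is hypothetical and this is not driver code: with up to 1280 queues in total, a queue offset can exceed 1023, the largest value a 10-bit field can hold.

```c
/* Number of bits needed to represent v (0 needs 0 bits). */
static unsigned int bits_needed(unsigned int v)
{
    unsigned int bits = 0;

    while (v) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```

For example, bits_needed(1023) is 10, while any offset from 1024 up to 1279 needs 11 bits.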
2020-12-09net: hns3: adjust rss indirection table configure commandGuojia Liao2-8/+19
Since the max rss size of the PF may be up to 512, adjust the command for configuring the rss indirection table to support queue ids larger than 255. The width of the queue id is extended from 8 bits to 10 bits. The high 2 bits are stored in field rss_qid_h when the queue id is larger than 255. Signed-off-by: Guojia Liao <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
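Splitting a 10-bit queue id into a low byte and two high bits can be sketched as below. This is a simplified model: the struct layout and the name rss_qid_l are assumptions for illustration (only rss_qid_h appears in the commit message), and the real command buffer may pack the bits differently.

```c
#include <assert.h>
#include <stdint.h>

#define QID_LOW_BITS  8
#define QID_LOW_MASK  0xffu
#define QID_HIGH_MASK 0x3u

/* Hypothetical per-entry layout: low 8 bits and high 2 bits kept apart. */
struct rss_qid_entry {
    uint8_t rss_qid_l;
    uint8_t rss_qid_h;
};

static struct rss_qid_entry rss_qid_split(uint16_t qid)
{
    struct rss_qid_entry e;

    assert(qid < 1024); /* only 10 bits are representable */
    e.rss_qid_l = (uint8_t)(qid & QID_LOW_MASK);
    e.rss_qid_h = (uint8_t)((qid >> QID_LOW_BITS) & QID_HIGH_MASK);
    return e;
}

static uint16_t rss_qid_join(struct rss_qid_entry e)
{
    return (uint16_t)((e.rss_qid_h << QID_LOW_BITS) | e.rss_qid_l);
}
```

Queue ids up to 255 leave the high bits at zero, so the old 8-bit encoding is a special case of this one.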
2020-12-09net: hns3: add support for max 512 rss sizeGuojia Liao5-16/+39
Currently, the driver gets the max rss size from the configuration file during initialization. Both the PF and VF share the same max rss size, which is no more than 128. For DEVICE_VERSION_V3, the max rss size for the PF can be up to 512, so there is a new field in the configuration file to store it; the old field is used for the VF. To be compatible with boards using the old configuration file, the PF will use the old field if the new one is zero. Since the rss size may be larger than 256, the type of rss_indirection_tbl of struct hclge_vport should be changed to u16 as well. Signed-off-by: Guojia Liao <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
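The compatibility fallback for old configuration files amounts to the following. This is a sketch with hypothetical names, not the driver's parsing code.

```c
#include <stdint.h>

/*
 * Pick the PF max rss size: prefer the new configuration field and fall
 * back to the old (shared) field when the new one is zero, as happens on
 * boards with an old configuration file.
 */
static uint16_t pf_max_rss_size(uint16_t new_field, uint16_t old_field)
{
    return new_field ? new_field : old_field;
}
```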
2020-12-09net: hns3: add support for hw tc offload of tc flowerJian Shen4-13/+397
Some new devices support forwarding packets to queues of a specified TC when a flow director rule is hit. So add support for configuring flow director rules by tc flower. To avoid rule conflicts, add a new flow director mode HCLGE_FD_TC_FLOWER_ACTIVE; only one mode can be active at a time. Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: hns3: add support for forwarding packet to queues of specified TC when flow director rule hitJian Shen4-5/+27
Some new devices support forwarding packets to queues of a specified TC when a flow director rule is hit. So extend the command handling to support it. Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: hns3: add support for tc mqprio offloadJian Shen5-53/+220
Currently, the HNS3 driver only supports offload for tc number and prio_tc. This patch adds support for other qopts, including queue count and offset for each tc. When tc mqprio offload is enabled, changing queue numbers by ethtool is not allowed. Due to a hardware limitation, the queue number of each tc must be a power of 2. Since the queues are not necessarily divided evenly among the tcs, hclge_get_max_channels() should return vport->alloc_tqps. Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
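The power-of-2 restriction on per-tc queue counts can be checked as in the sketch below. The function names are hypothetical and this is a userspace illustration, not the driver's validation routine.

```c
#include <stdbool.h>
#include <stddef.h>

/* A positive integer is a power of 2 iff exactly one bit is set. */
static bool queue_count_is_pow2(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Accept an mqprio request only if every tc has a power-of-2 queue count. */
static bool mqprio_queue_counts_valid(const unsigned int *count, size_t num_tc)
{
    size_t i;

    for (i = 0; i < num_tc; i++) {
        if (!queue_count_is_pow2(count[i]))
            return false;
    }
    return true;
}
```

A request with counts 4, 8, 16 would pass this check, while 4, 6 would be rejected because 6 is not a power of 2.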
2020-12-09net: hns3: refine the struct hnae3_tc_infoJian Shen8-67/+64
Currently, there are multiple members related to tc information in struct hnae3_knic_private_info. Merge them into a new struct hnae3_tc_info. Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09nfp: silence set but not used warning with IPV6=nJakub Kicinski1-1/+1
Test robot reports: drivers/net/ethernet/netronome/nfp/crypto/tls.c: In function 'nfp_net_tls_rx_resync_req': drivers/net/ethernet/netronome/nfp/crypto/tls.c:477:18: warning: variable 'ipv6h' set but not used [-Wunused-but-set-variable] 477 | struct ipv6hdr *ipv6h; | ^~~~~ In file included from include/linux/compiler_types.h:65, from <command-line>: drivers/net/ethernet/netronome/nfp/crypto/tls.c: In function 'nfp_net_tls_add': include/linux/compiler_attributes.h:208:41: warning: statement will never be executed [-Wswitch-unreachable] 208 | # define fallthrough __attribute__((__fallthrough__)) | ^~~~~~~~~~~~~ drivers/net/ethernet/netronome/nfp/crypto/tls.c:299:3: note: in expansion of macro 'fallthrough' 299 | fallthrough; | ^~~~~~~~~~~ Use the IPv6 header in the switch, it doesn't matter which header we use to read the version field. Reported-by: kernel test robot <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: stmmac: allow stmmac to probe for C45 PHY devicesWong Vee Khee1-0/+3
Assign stmmac's mdio_bus probe capabilities to MDIOBUS_C22_C45. This extends probing to C45 PHY devices on the MDIO bus. Signed-off-by: Wong Vee Khee <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net/mlx4: simplify the return expression of mlx4_init_cq_table()Zheng Yongjun1-7/+2
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09ibmvnic: fix rx buffer tracking and index management in replenish_rx_pool partial successDwip N. Banerjee1-0/+2
We observed that in the error case for batched send_subcrq_indirect() the driver does not account for the partial success case. This caused Linux to crash when free_map and pool index are inconsistent. Driver needs to update the rx pools "available" count when some batched sends worked but an error was encountered as part of the whole operation. Also track replenish_add_buff_failure for statistic purposes. Fixes: 4f0b6812e9b9a ("ibmvnic: Introduce batched RX buffer descriptor transmission") Signed-off-by: Dwip N. Banerjee <[email protected]> Reviewed-by: Dany Madden <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueDavid S. Miller9-222/+117
Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2020-12-09 This series contains updates to ice driver only. Bruce changes the allocation of ice_flow_prof_params from stack to heap to avoid excessive stack usage. Corrects a misleading comment and silences a sparse warning that is not a problem. Paul allows for HW initialization to continue if PHY abilities cannot be obtained. Jeb removes bypassing FW link override and reading Option ROM and netlist information for non-E810 devices as this is now available on other devices. Nick removes vlan_ena field as this information can be gathered by checking num_vlan. Jake combines format strings and debug prints to the same line. Simon adds a space to fix string concatenation. v4: Drop ACL patches. Change PHY abilities failure message from debug to warning. v3: Fix email address for DaveM and fix character in cover letter v2: Expand on commit message for patch 3 to show example usage/commands. Reduce number of defensive checks being done. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-12-09Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueDavid S. Miller5-34/+90
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2020-12-09 This series contains updates to igb, ixgbe, i40e, and ice drivers. Sven Auhagen fixes issues with igb XDP: return correct error value in XDP xmit back, increase header padding to include space for double VLAN, add an extack error when Rx buffer is too small for frame size, set metasize if it is set in xdp, change xdp_do_flush_map to xdp_do_flush, and update trans_start to avoid possible Tx timeout. Björn fixes an issue where an Rx buffer can be reused prematurely with XDP redirect for ixgbe, i40e, and ice drivers. The following are changes since commit 323a391a220c4a234cb1e678689d7f4c3b73f863: can: isotp: isotp_setsockopt(): block setsockopt on bound sockets and are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE ==================== Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: marvell: octeontx2: simplify the otx2_ptp_adjfine()Zheng Yongjun1-6/+1
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: stmmac: simplify the return dwmac5_rxp_disable()Zheng Yongjun1-5/+1
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: hinic: simplify the return hinic_configure_max_qnum()Zheng Yongjun1-7/+1
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: freescale: dpaa: simplify the return dpaa_eth_refill_bpools()Zheng Yongjun1-5/+1
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Acked-by: Madalin Bucur <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: cisco: enic: simplify the return vnic_cq_alloc()Zheng Yongjun1-7/+1
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: emulex: benet: simplify the return expression of be_if_create()Zheng Yongjun1-7/+1
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: marvell: octeontx2: simplify the return expression of rvu_npa_init()Zheng Yongjun1-6/+2
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: marvell: prestera: simplify the return expression of prestera_port_close()Zheng Yongjun1-6/+1
Simplify the return expression. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net/mlx4_en: Handle TX error CQEMoshe Shemesh3-7/+39
In case error CQE was found while polling TX CQ, the QP is in error state and all posted WQEs will generate error CQEs without any data transmitted. Fix it by reopening the channels, via same method used for TX timeout handling. In addition add some more info on error CQE and WQE for debug. Fixes: bd2f631d7c60 ("net/mlx4_en: Notify user when TX ring in error state") Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net/mlx4_en: Avoid scheduling restart task if it is already runningMoshe Shemesh2-8/+19
Add restarting state flag to avoid scheduling another restart task while such task is already running. Change task name from watchdog_task to restart_task to better fit the task role. Fixes: 1e338db56e5a ("mlx4_en: Fix a race at restart task") Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
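The idea of a "restarting" flag that suppresses duplicate scheduling can be sketched in userspace C11 as below. The struct and function names are hypothetical, and the driver itself uses kernel primitives rather than <stdatomic.h>; this only illustrates the test-and-set pattern.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical context: one flag plus a counter of scheduled restarts. */
struct restart_ctx {
    atomic_flag restarting;
    int scheduled;
};

/* Schedule a restart unless one is already pending or running. */
static bool request_restart(struct restart_ctx *ctx)
{
    if (atomic_flag_test_and_set(&ctx->restarting))
        return false; /* already restarting: do not schedule again */

    ctx->scheduled++; /* stands in for queueing the restart task */
    return true;
}

/* Called at the end of the restart task to allow future restarts. */
static void restart_done(struct restart_ctx *ctx)
{
    atomic_flag_clear(&ctx->restarting);
}
```

Because the test and the set happen atomically, two callers racing to request a restart cannot both succeed.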
2020-12-09net: freescale: convert comma to semicolonZheng Yongjun1-1/+1
Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
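The comma-to-semicolon cleanups in this log do not change behavior in straight-line code, since `a = 1, b = 2;` and `a = 1; b = 2;` perform the same assignments. The sketch below, using hypothetical functions, shows where the two forms do differ, which is one reason the accidental comma is worth removing: under an unbraced `if`, a comma keeps both assignments in the conditional statement.

```c
/* Comma operator: both assignments belong to the if statement. */
static int with_comma(int cond)
{
    int x = 0, y = 0;

    if (cond)
        x = 1, y = 2;

    return x + y;
}

/* Semicolon: only the first assignment is conditional. */
static int with_semicolon(int cond)
{
    int x = 0, y = 0;

    if (cond)
        x = 1;
    y = 2;

    return x + y;
}
```

With `cond` false, the comma version leaves `y` at 0 while the semicolon version still sets it to 2.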
2020-12-09net: ethernet: ti: convert comma to semicolonZheng Yongjun1-3/+3
Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09hisilicon/hns3: convert comma to semicolonZheng Yongjun1-1/+1
Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09hisilicon/hns: convert comma to semicolonZheng Yongjun1-6/+6
Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: mlx5: convert comma to semicolonZheng Yongjun1-1/+1
Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: micrel: convert comma to semicolonZheng Yongjun1-1/+1
Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: macb: add support for sama7g5 emac interfaceClaudiu Beznea1-0/+9
Add support for SAMA7G5 10/100Mbps interface. Signed-off-by: Claudiu Beznea <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: macb: add support for sama7g5 gem interfaceClaudiu Beznea1-0/+17
Add support for SAMA7G5 gigabit ethernet interface. Signed-off-by: Claudiu Beznea <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: macb: unprepare clocks in case of failureClaudiu Beznea1-6/+18
Unprepare clocks in case of any failure in fu540_c000_clk_init(). Fixes: c218ad559020 ("macb: Add support for SiFive FU540-C000") Signed-off-by: Claudiu Beznea <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: macb: add function to disable all macb clocksClaudiu Beznea1-17/+21
Add function to disable all macb clocks. Signed-off-by: Claudiu Beznea <[email protected]> Suggested-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: macb: add capability to not set the clock rateClaudiu Beznea2-9/+10
The TX clock of SAMA7G5's ethernet IPs can be provided by its generic clock or by the external clock provided by the PHY. The internal IP logic properly divides this clock depending on the link speed. The patch adds a new capability so that macb_set_tx_clock() is not called for IPs having this capability (the clock rate, in case of generic clock, is set at boot time via device tree and the driver only enables it). Signed-off-by: Claudiu Beznea <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09net: macb: add userio bits as platform configurationClaudiu Beznea2-4/+34
This is necessary for SAMA7G5 as it uses different values for PHY interface and also introduces hdfctlen bit. Signed-off-by: Claudiu Beznea <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-12-09e1000e: fix S0ix flow to allow S0i3.2 subset entryVitaly Lifshits1-4/+4
Changed a configuration in the flows to align with architecture requirements to achieve S0i3.2 substate. This helps both i219V and i219LM configurations. Also fixed a typo in the previous commit 632fbd5eb5b0 ("e1000e: fix S0ix flows for cable connected case"). Fixes: 632fbd5eb5b0 ("e1000e: fix S0ix flows for cable connected case"). Signed-off-by: Vitaly Lifshits <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-09ice: avoid premature Rx buffer reuseBjörn Töpel1-9/+22
The page recycle code, incorrectly, relied on that a page fragment could not be freed inside xdp_do_redirect(). This assumption leads to that page fragments that are used by the stack/XDP redirect can be reused and overwritten. To avoid this, store the page count prior invoking xdp_do_redirect(). Fixes: efc2214b6047 ("ice: Add support for XDP") Reported-and-analyzed-by: Li RongQing <[email protected]> Signed-off-by: Björn Töpel <[email protected]> Tested-by: George Kuruvinakunnel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-12-09ixgbe: avoid premature Rx buffer reuseBjörn Töpel1-7/+17
The page recycle code, incorrectly, relied on that a page fragment could not be freed inside xdp_do_redirect(). This assumption leads to that page fragments that are used by the stack/XDP redirect can be reused and overwritten. To avoid this, store the page count prior invoking xdp_do_redirect(). Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect") Reported-and-analyzed-by: Li RongQing <[email protected]> Signed-off-by: Björn Töpel <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-12-09i40e: avoid premature Rx buffer reuseBjörn Töpel1-7/+20
The page recycle code, incorrectly, relied on that a page fragment could not be freed inside xdp_do_redirect(). This assumption leads to that page fragments that are used by the stack/XDP redirect can be reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Longer explanation:

Intel NICs have a recycle mechanism. The main idea is that a page is split into two parts. One part is owned by the driver, one part might be owned by someone else, such as the stack.

t0: Page is allocated, and put on the Rx ring
              +---------------
used by NIC ->| upper buffer
(rx_buffer)   +---------------
              | lower buffer
              +---------------
  page count == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX

t1: Buffer is received, and passed to the stack (e.g.)
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------
  page count == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 1

t2: Buffer is received, and redirected
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------

Now, prior calling xdp_do_redirect():
  page count == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

This means that buffer *cannot* be flipped/reused, because the skb is still using it.

The problem arises when xdp_do_redirect() actually frees the segment. Then we get:
  page count == USHRT_MAX - 1
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

From a recycle perspective, the buffer can be flipped and reused, which means that the skb data area is passed to the Rx HW ring!

To work around this, the page count is stored prior calling xdp_do_redirect().

Note that this is not optimal, since the NIC could actually reuse the "lower buffer" again. However, then we need to track whether XDP_REDIRECT consumed the buffer or not.
Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT") Reported-and-analyzed-by: Li RongQing <[email protected]> Signed-off-by: Björn Töpel <[email protected]> Tested-by: George Kuruvinakunnel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
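The t2 scenario in this commit message can be replayed with a small model. This is a simplified userspace sketch with hypothetical names: it assumes the reuse test is "page count minus pagecnt_bias is at most 1", which matches the numbers in the message but leaves out everything else the driver checks.

```c
#include <limits.h>
#include <stdbool.h>

/* Simplified recycle test: reusable if nobody else holds a reference. */
static bool can_reuse(unsigned int page_count, unsigned int pagecnt_bias)
{
    return page_count - pagecnt_bias <= 1;
}

/* Buggy flow: page count is sampled after the redirect freed a fragment. */
static bool reuse_decision_buggy(void)
{
    unsigned int bias = USHRT_MAX - 2;
    unsigned int count_after_redirect = USHRT_MAX - 1;

    return can_reuse(count_after_redirect, bias);
}

/* Fixed flow: page count is stored before calling the redirect. */
static bool reuse_decision_fixed(void)
{
    unsigned int bias = USHRT_MAX - 2;
    unsigned int count_before_redirect = USHRT_MAX;

    return can_reuse(count_before_redirect, bias);
}
```

In this model the buggy flow decides the page is reusable even though the skb still owns the upper buffer, while the fixed flow refuses to reuse it.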
2020-12-09igb: avoid transmit queue timeout in xdp pathSven Auhagen1-0/+5
Since we share the transmit queue with the network stack, it is possible that we run into a transmit queue timeout. This will reset the queue. This happens under high load when XDP is using the transmit queue pretty much exclusively. netdev_start_xmit() sets the trans_start variable of the transmit queue to jiffies which is later utilized by dev_watchdog(), so to avoid timeout, let stack know that XDP xmit happened by bumping the trans_start within XDP Tx routines to jiffies. Fixes: 9cbc948b5a20 ("igb: add XDP support") Acked-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Sven Auhagen <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
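The interaction between trans_start and the watchdog can be modelled as below. This is a deliberately simplified sketch with hypothetical names: the real dev_watchdog() applies further conditions (for example, it only considers stopped queues) and uses wrap-safe jiffies comparisons.

```c
#include <stdbool.h>

/* Hypothetical model of a transmit queue's last-transmit timestamp. */
struct txq_model {
    unsigned long trans_start;
};

/* Watchdog view: the queue looks hung if nothing was sent for too long. */
static bool watchdog_expired(const struct txq_model *q, unsigned long now,
                             unsigned long timeout)
{
    return now - q->trans_start > timeout;
}

/* XDP Tx path: record that a transmit happened at time 'now'. */
static void xdp_xmit_mark(struct txq_model *q, unsigned long now)
{
    q->trans_start = now;
}
```

If only the stack's transmit path updated trans_start, a queue kept busy by XDP alone would look idle to the watchdog; marking the timestamp from the XDP Tx routine avoids that.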