aboutsummaryrefslogtreecommitdiff
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/device_drivers/can/ctu/ctucanfd-driver.rst3
-rw-r--r--Documentation/networking/device_drivers/ethernet/amd/pds_core.rst139
-rw-r--r--Documentation/networking/device_drivers/ethernet/index.rst2
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/e100.rst7
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/e1000.rst9
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/e1000e.rst7
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/fm10k.rst7
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/i40e.rst11
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/iavf.rst7
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/ice.rst9
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/igb.rst7
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/igbvf.rst7
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/ixgb.rst468
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst7
-rw-r--r--Documentation/networking/device_drivers/ethernet/intel/ixgbevf.rst7
-rw-r--r--Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst26
-rw-r--r--Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst68
-rw-r--r--Documentation/networking/devlink/ice.rst15
-rw-r--r--Documentation/networking/devlink/mlx5.rst12
-rw-r--r--Documentation/networking/driver.rst156
-rw-r--r--Documentation/networking/ethtool-netlink.rst51
-rw-r--r--Documentation/networking/index.rst2
-rw-r--r--Documentation/networking/ip-sysctl.rst9
-rw-r--r--Documentation/networking/napi.rst254
-rw-r--r--Documentation/networking/page_pool.rst1
-rw-r--r--Documentation/networking/rxrpc.rst17
-rw-r--r--Documentation/networking/tls-handshake.rst217
-rw-r--r--Documentation/networking/xdp-rx-metadata.rst7
28 files changed, 881 insertions, 651 deletions
diff --git a/Documentation/networking/device_drivers/can/ctu/ctucanfd-driver.rst b/Documentation/networking/device_drivers/can/ctu/ctucanfd-driver.rst
index 1a4fc6607582..1661d13174d5 100644
--- a/Documentation/networking/device_drivers/can/ctu/ctucanfd-driver.rst
+++ b/Documentation/networking/device_drivers/can/ctu/ctucanfd-driver.rst
@@ -229,8 +229,7 @@ frames for a while. This has a potential to avoid the costly round of
enabling interrupts, handling an incoming IRQ in ISR, re-enabling the
softirq and switching context back to softirq.
-More detailed documentation of NAPI may be found on the pages of Linux
-Foundation `<https://wiki.linuxfoundation.org/networking/napi>`_.
+See :ref:`Documentation/networking/napi.rst <napi>` for more information.
Integrating the core to Xilinx Zynq
-----------------------------------
diff --git a/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst b/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
new file mode 100644
index 000000000000..9e8a16c44102
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
@@ -0,0 +1,139 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+========================================================
+Linux Driver for the AMD/Pensando(R) DSC adapter family
+========================================================
+
+Copyright(c) 2023 Advanced Micro Devices, Inc
+
+Identifying the Adapter
+=======================
+
+To find if one or more AMD/Pensando PCI Core devices are installed on the
+host, check for the PCI devices::
+
+ # lspci -d 1dd8:100c
+ b5:00.0 Processing accelerators: Pensando Systems Device 100c
+ b6:00.0 Processing accelerators: Pensando Systems Device 100c
+
+If such devices are listed as above, then the pds_core.ko driver should find
+and configure them for use. There should be log entries in the kernel
+messages such as these::
+
+ $ dmesg | grep pds_core
+ pds_core 0000:b5:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
+ pds_core 0000:b5:00.0: FW: 1.60.0-73
+ pds_core 0000:b6:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
+ pds_core 0000:b6:00.0: FW: 1.60.0-73
+
+Driver and firmware version information can be gathered with devlink::
+
+ $ devlink dev info pci/0000:b5:00.0
+ pci/0000:b5:00.0:
+ driver pds_core
+ serial_number FLM18420073
+ versions:
+ fixed:
+ asic.id 0x0
+ asic.rev 0x0
+ running:
+ fw 1.51.0-73
+ stored:
+ fw.goldfw 1.15.9-C-22
+ fw.mainfwa 1.60.0-73
+ fw.mainfwb 1.60.0-57
+
+Info versions
+=============
+
+The ``pds_core`` driver reports the following versions
+
+.. list-table:: devlink info versions implemented
+ :widths: 5 5 90
+
+ * - Name
+ - Type
+ - Description
+ * - ``fw``
+ - running
+ - Version of firmware running on the device
+ * - ``fw.goldfw``
+ - stored
+ - Version of firmware stored in the goldfw slot
+ * - ``fw.mainfwa``
+ - stored
+ - Version of firmware stored in the mainfwa slot
+ * - ``fw.mainfwb``
+ - stored
+ - Version of firmware stored in the mainfwb slot
+ * - ``asic.id``
+ - fixed
+ - The ASIC type for this device
+ * - ``asic.rev``
+ - fixed
+ - The revision of the ASIC for this device
+
+Parameters
+==========
+
+The ``pds_core`` driver implements the following generic
+parameters for controlling the functionality to be made available
+as auxiliary_bus devices.
+
+.. list-table:: Generic parameters implemented
+ :widths: 5 5 8 82
+
+ * - Name
+ - Mode
+ - Type
+ - Description
+ * - ``enable_vnet``
+ - runtime
+ - Boolean
+ - Enables vDPA functionality through an auxiliary_bus device
+
+Firmware Management
+===================
+
+The ``flash`` command can update a the DSC firmware. The downloaded firmware
+will be saved into either of firmware bank 1 or bank 2, whichever is not
+currently in use, and that bank will used for the next boot::
+
+ # devlink dev flash pci/0000:b5:00.0 \
+ file pensando/dsc_fw_1.63.0-22.tar
+
+Health Reporters
+================
+
+The driver supports a devlink health reporter for FW status::
+
+ # devlink health show pci/0000:2b:00.0 reporter fw
+ pci/0000:2b:00.0:
+ reporter fw
+ state healthy error 0 recover 0
+ # devlink health diagnose pci/0000:2b:00.0 reporter fw
+ Status: healthy State: 1 Generation: 0 Recoveries: 0
+
+Enabling the driver
+===================
+
+The driver is enabled via the standard kernel configuration system,
+using the make command::
+
+ make oldconfig/menuconfig/etc.
+
+The driver is located in the menu structure at:
+
+ -> Device Drivers
+ -> Network device support (NETDEVICES [=y])
+ -> Ethernet driver support
+ -> AMD devices
+ -> AMD/Pensando Ethernet PDS_CORE Support
+
+Support
+=======
+
+For general Linux networking support, please use the netdev mailing
+list, which is monitored by AMD/Pensando personnel::
+
diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst
index 392969ac88ad..417ca514a4d0 100644
--- a/Documentation/networking/device_drivers/ethernet/index.rst
+++ b/Documentation/networking/device_drivers/ethernet/index.rst
@@ -14,6 +14,7 @@ Contents:
3com/vortex
amazon/ena
altera/altera_tse
+ amd/pds_core
aquantia/atlantic
chelsio/cxgb
cirrus/cs89x0
@@ -31,7 +32,6 @@ Contents:
intel/fm10k
intel/igb
intel/igbvf
- intel/ixgb
intel/ixgbe
intel/ixgbevf
intel/i40e
diff --git a/Documentation/networking/device_drivers/ethernet/intel/e100.rst b/Documentation/networking/device_drivers/ethernet/intel/e100.rst
index 3d4a9ba21946..5dee1b53e977 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/e100.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/e100.rst
@@ -151,8 +151,7 @@ NAPI
NAPI (Rx polling mode) is supported in the e100 driver.
-See https://wiki.linuxfoundation.org/networking/napi for more
-information on NAPI.
+See :ref:`Documentation/networking/napi.rst <napi>` for more information.
Multiple Interfaces on Same Ethernet Broadcast Network
------------------------------------------------------
@@ -181,8 +180,6 @@ Support
For general information, go to the Intel support website at:
https://www.intel.com/support/
-or the Intel Wired Networking project hosted by Sourceforge at:
-http://sourceforge.net/projects/e1000
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/intel/e1000.rst b/Documentation/networking/device_drivers/ethernet/intel/e1000.rst
index 4aaae0f7d6ba..52a7fb9ce8d9 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/e1000.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/e1000.rst
@@ -451,13 +451,8 @@ Support
=======
For general information, go to the Intel support website at:
-
- http://support.intel.com
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
- http://sourceforge.net/projects/e1000
+http://support.intel.com
If an issue is identified with the released source code on the supported
kernel with a supported adapter, email the specific information related
-to the issue to [email protected]
+to the issue to [email protected].
diff --git a/Documentation/networking/device_drivers/ethernet/intel/e1000e.rst b/Documentation/networking/device_drivers/ethernet/intel/e1000e.rst
index f49cd370e7bf..d8f810afdd49 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/e1000e.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/e1000e.rst
@@ -371,13 +371,8 @@ NOTE: Wake on LAN is only supported on port A for the following devices:
Support
=======
For general information, go to the Intel support website at:
-
https://www.intel.com/support/
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-https://sourceforge.net/projects/e1000
-
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/intel/fm10k.rst b/Documentation/networking/device_drivers/ethernet/intel/fm10k.rst
index 9258ef6f515c..396a2c8c3db1 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/fm10k.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/fm10k.rst
@@ -130,13 +130,8 @@ the Intel Ethernet Controller XL710.
Support
=======
For general information, go to the Intel support website at:
-
https://www.intel.com/support/
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-https://sourceforge.net/projects/e1000
-
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/intel/i40e.rst b/Documentation/networking/device_drivers/ethernet/intel/i40e.rst
index ac35bd472bdc..4fbaa1a2d674 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/i40e.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/i40e.rst
@@ -399,8 +399,8 @@ operate only in full duplex and only at their native speed.
NAPI
----
NAPI (Rx polling mode) is supported in the i40e driver.
-For more information on NAPI, see
-https://wiki.linuxfoundation.org/networking/napi
+
+See :ref:`Documentation/networking/napi.rst <napi>` for more information.
Flow Control
------------
@@ -759,13 +759,8 @@ enabled when setting up DCB on your switch.
Support
=======
For general information, go to the Intel support website at:
-
https://www.intel.com/support/
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-https://sourceforge.net/projects/e1000
-
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/intel/iavf.rst b/Documentation/networking/device_drivers/ethernet/intel/iavf.rst
index 151af0a8da9c..eb926c3bd4cd 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/iavf.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/iavf.rst
@@ -319,13 +319,8 @@ This is caused by the way the Linux kernel reports this stressed condition.
Support
=======
For general information, go to the Intel support website at:
-
https://support.intel.com
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-https://sourceforge.net/projects/e1000
-
If an issue is identified with the released source code on the supported kernel
with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/intel/ice.rst b/Documentation/networking/device_drivers/ethernet/intel/ice.rst
index 5efea4dd1251..69695e5511f4 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/ice.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/ice.rst
@@ -817,10 +817,10 @@ NOTE:
NAPI
----
+
This driver supports NAPI (Rx polling mode).
-For more information on NAPI, see
-https://wiki.linuxfoundation.org/networking/napi
+See :ref:`Documentation/networking/napi.rst <napi>` for more information.
MACVLAN
-------
@@ -1026,12 +1026,9 @@ Support
For general information, go to the Intel support website at:
https://www.intel.com/support/
-or the Intel Wired Networking project hosted by Sourceforge at:
-https://sourceforge.net/projects/e1000
-
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
Trademarks
diff --git a/Documentation/networking/device_drivers/ethernet/intel/igb.rst b/Documentation/networking/device_drivers/ethernet/intel/igb.rst
index d46289e182cf..fbd590b6a0d6 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/igb.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/igb.rst
@@ -201,13 +201,8 @@ NOTE: This feature is exclusive to i210 models.
Support
=======
For general information, go to the Intel support website at:
-
https://www.intel.com/support/
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-https://sourceforge.net/projects/e1000
-
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/intel/igbvf.rst b/Documentation/networking/device_drivers/ethernet/intel/igbvf.rst
index 40fa210c5e14..11a9017f3069 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/igbvf.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/igbvf.rst
@@ -53,13 +53,8 @@ https://www.kernel.org/pub/software/network/ethtool/
Support
=======
For general information, go to the Intel support website at:
-
https://www.intel.com/support/
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-https://sourceforge.net/projects/e1000
-
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/intel/ixgb.rst b/Documentation/networking/device_drivers/ethernet/intel/ixgb.rst
deleted file mode 100644
index c6a233e68ad6..000000000000
--- a/Documentation/networking/device_drivers/ethernet/intel/ixgb.rst
+++ /dev/null
@@ -1,468 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0+
-
-=====================================================================
-Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection
-=====================================================================
-
-October 1, 2018
-
-
-Contents
-========
-
-- In This Release
-- Identifying Your Adapter
-- Command Line Parameters
-- Improving Performance
-- Additional Configurations
-- Known Issues/Troubleshooting
-- Support
-
-
-
-In This Release
-===============
-
-This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R)
-Network Connection. This driver includes support for Itanium(R)2-based
-systems.
-
-For questions related to hardware requirements, refer to the documentation
-supplied with your 10 Gigabit adapter. All hardware requirements listed apply
-to use with Linux.
-
-The following features are available in this kernel:
- - Native VLANs
- - Channel Bonding (teaming)
- - SNMP
-
-Channel Bonding documentation can be found in the Linux kernel source:
-/Documentation/networking/bonding.rst
-
-The driver information previously displayed in the /proc filesystem is not
-supported in this release. Alternatively, you can use ethtool (version 1.6
-or later), lspci, and iproute2 to obtain the same information.
-
-Instructions on updating ethtool can be found in the section "Additional
-Configurations" later in this document.
-
-
-Identifying Your Adapter
-========================
-
-The following Intel network adapters are compatible with the drivers in this
-release:
-
-+------------+------------------------------+----------------------------------+
-| Controller | Adapter Name | Physical Layer |
-+============+==============================+==================================+
-| 82597EX | Intel(R) PRO/10GbE LR/SR/CX4 | - 10G Base-LR (fiber) |
-| | Server Adapters | - 10G Base-SR (fiber) |
-| | | - 10G Base-CX4 (copper) |
-+------------+------------------------------+----------------------------------+
-
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
- https://support.intel.com
-
-
-Command Line Parameters
-=======================
-
-If the driver is built as a module, the following optional parameters are
-used by entering them on the command line with the modprobe command using
-this syntax::
-
- modprobe ixgb [<option>=<VAL1>,<VAL2>,...]
-
-For example, with two 10GbE PCI adapters, entering::
-
- modprobe ixgb TxDescriptors=80,128
-
-loads the ixgb driver with 80 TX resources for the first adapter and 128 TX
-resources for the second adapter.
-
-The default value for each parameter is generally the recommended setting,
-unless otherwise noted.
-
-Copybreak
----------
-:Valid Range: 0-XXXX
-:Default Value: 256
-
- This is the maximum size of packet that is copied to a new buffer on
- receive.
-
-Debug
------
-:Valid Range: 0-16 (0=none,...,16=all)
-:Default Value: 0
-
- This parameter adjusts the level of debug messages displayed in the
- system logs.
-
-FlowControl
------------
-:Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
-:Default Value: 1 if no EEPROM, otherwise read from EEPROM
-
- This parameter controls the automatic generation(Tx) and response(Rx) to
- Ethernet PAUSE frames. There are hardware bugs associated with enabling
- Tx flow control so beware.
-
-RxDescriptors
--------------
-:Valid Range: 64-4096
-:Default Value: 1024
-
- This value is the number of receive descriptors allocated by the driver.
- Increasing this value allows the driver to buffer more incoming packets.
- Each descriptor is 16 bytes. A receive buffer is also allocated for
- each descriptor and can be either 2048, 4056, 8192, or 16384 bytes,
- depending on the MTU setting. When the MTU size is 1500 or less, the
- receive buffer size is 2048 bytes. When the MTU is greater than 1500 the
- receive buffer size will be either 4056, 8192, or 16384 bytes. The
- maximum MTU size is 16114.
-
-TxDescriptors
--------------
-:Valid Range: 64-4096
-:Default Value: 256
-
- This value is the number of transmit descriptors allocated by the driver.
- Increasing this value allows the driver to queue more transmits. Each
- descriptor is 16 bytes.
-
-RxIntDelay
-----------
-:Valid Range: 0-65535 (0=off)
-:Default Value: 72
-
- This value delays the generation of receive interrupts in units of
- 0.8192 microseconds. Receive interrupt reduction can improve CPU
- efficiency if properly tuned for specific network traffic. Increasing
- this value adds extra latency to frame reception and can end up
- decreasing the throughput of TCP traffic. If the system is reporting
- dropped receives, this value may be set too high, causing the driver to
- run out of available receive descriptors.
-
-TxIntDelay
-----------
-:Valid Range: 0-65535 (0=off)
-:Default Value: 32
-
- This value delays the generation of transmit interrupts in units of
- 0.8192 microseconds. Transmit interrupt reduction can improve CPU
- efficiency if properly tuned for specific network traffic. Increasing
- this value adds extra latency to frame transmission and can end up
- decreasing the throughput of TCP traffic. If this value is set too high,
- it will cause the driver to run out of available transmit descriptors.
-
-XsumRX
-------
-:Valid Range: 0-1
-:Default Value: 1
-
- A value of '1' indicates that the driver should enable IP checksum
- offload for received packets (both UDP and TCP) to the adapter hardware.
-
-RxFCHighThresh
---------------
-:Valid Range: 1,536-262,136 (0x600 - 0x3FFF8, 8 byte granularity)
-:Default Value: 196,608 (0x30000)
-
- Receive Flow control high threshold (when we send a pause frame)
-
-RxFCLowThresh
--------------
-:Valid Range: 64-262,136 (0x40 - 0x3FFF8, 8 byte granularity)
-:Default Value: 163,840 (0x28000)
-
- Receive Flow control low threshold (when we send a resume frame)
-
-FCReqTimeout
-------------
-:Valid Range: 1-65535
-:Default Value: 65535
-
- Flow control request timeout (how long to pause the link partner's tx)
-
-IntDelayEnable
---------------
-:Value Range: 0,1
-:Default Value: 1
-
- Interrupt Delay, 0 disables transmit interrupt delay and 1 enables it.
-
-
-Improving Performance
-=====================
-
-With the 10 Gigabit server adapters, the default Linux configuration will
-very likely limit the total available throughput artificially. There is a set
-of configuration changes that, when applied together, will increase the ability
-of Linux to transmit and receive data. The following enhancements were
-originally acquired from settings published at https://www.spec.org/web99/ for
-various submitted results using Linux.
-
-NOTE:
- These changes are only suggestions, and serve as a starting point for
- tuning your network performance.
-
-The changes are made in three major ways, listed in order of greatest effect:
-
-- Use ip link to modify the mtu (maximum transmission unit) and the txqueuelen
- parameter.
-- Use sysctl to modify /proc parameters (essentially kernel tuning)
-- Use setpci to modify the MMRBC field in PCI-X configuration space to increase
- transmit burst lengths on the bus.
-
-NOTE:
- setpci modifies the adapter's configuration registers to allow it to read
- up to 4k bytes at a time (for transmits). However, for some systems the
- behavior after modifying this register may be undefined (possibly errors of
- some kind). A power-cycle, hard reset or explicitly setting the e6 register
- back to 22 (setpci -d 8086:1a48 e6.b=22) may be required to get back to a
- stable configuration.
-
-- COPY these lines and paste them into ixgb_perf.sh:
-
-::
-
- #!/bin/bash
- echo "configuring network performance , edit this file to change the interface
- or device ID of 10GbE card"
- # set mmrbc to 4k reads, modify only Intel 10GbE device IDs
- # replace 1a48 with appropriate 10GbE device's ID installed on the system,
- # if needed.
- setpci -d 8086:1a48 e6.b=2e
- # set the MTU (max transmission unit) - it requires your switch and clients
- # to change as well.
- # set the txqueuelen
- # your ixgb adapter should be loaded as eth1 for this to work, change if needed
- ip li set dev eth1 mtu 9000 txqueuelen 1000 up
- # call the sysctl utility to modify /proc/sys entries
- sysctl -p ./sysctl_ixgb.conf
-
-- COPY these lines and paste them into sysctl_ixgb.conf:
-
-::
-
- # some of the defaults may be different for your kernel
- # call this file with sysctl -p <this file>
- # these are just suggested values that worked well to increase throughput in
- # several network benchmark tests, your mileage may vary
-
- ### IPV4 specific settings
- # turn TCP timestamp support off, default 1, reduces CPU use
- net.ipv4.tcp_timestamps = 0
- # turn SACK support off, default on
- # on systems with a VERY fast bus -> memory interface this is the big gainer
- net.ipv4.tcp_sack = 0
- # set min/default/max TCP read buffer, default 4096 87380 174760
- net.ipv4.tcp_rmem = 10000000 10000000 10000000
- # set min/pressure/max TCP write buffer, default 4096 16384 131072
- net.ipv4.tcp_wmem = 10000000 10000000 10000000
- # set min/pressure/max TCP buffer space, default 31744 32256 32768
- net.ipv4.tcp_mem = 10000000 10000000 10000000
-
- ### CORE settings (mostly for socket and UDP effect)
- # set maximum receive socket buffer size, default 131071
- net.core.rmem_max = 524287
- # set maximum send socket buffer size, default 131071
- net.core.wmem_max = 524287
- # set default receive socket buffer size, default 65535
- net.core.rmem_default = 524287
- # set default send socket buffer size, default 65535
- net.core.wmem_default = 524287
- # set maximum amount of option memory buffers, default 10240
- net.core.optmem_max = 524287
- # set number of unprocessed input packets before kernel starts dropping them; default 300
- net.core.netdev_max_backlog = 300000
-
-Edit the ixgb_perf.sh script if necessary to change eth1 to whatever interface
-your ixgb driver is using and/or replace '1a48' with appropriate 10GbE device's
-ID installed on the system.
-
-NOTE:
- Unless these scripts are added to the boot process, these changes will
- only last only until the next system reboot.
-
-
-Resolving Slow UDP Traffic
---------------------------
-If your server does not seem to be able to receive UDP traffic as fast as it
-can receive TCP traffic, it could be because Linux, by default, does not set
-the network stack buffers as large as they need to be to support high UDP
-transfer rates. One way to alleviate this problem is to allow more memory to
-be used by the IP stack to store incoming data.
-
-For instance, use the commands::
-
- sysctl -w net.core.rmem_max=262143
-
-and::
-
- sysctl -w net.core.rmem_default=262143
-
-to increase the read buffer memory max and default to 262143 (256k - 1) from
-defaults of max=131071 (128k - 1) and default=65535 (64k - 1). These variables
-will increase the amount of memory used by the network stack for receives, and
-can be increased significantly more if necessary for your application.
-
-
-Additional Configurations
-=========================
-
-Configuring the Driver on Different Distributions
--------------------------------------------------
-Configuring a network driver to load properly when the system is started is
-distribution dependent. Typically, the configuration process involves adding
-an alias line to /etc/modprobe.conf as well as editing other system startup
-scripts and/or configuration files. Many popular Linux distributions ship
-with tools to make these changes for you. To learn the proper way to
-configure a network device for your system, refer to your distribution
-documentation. If during this process you are asked for the driver or module
-name, the name for the Linux Base Driver for the Intel 10GbE Family of
-Adapters is ixgb.
-
-Viewing Link Messages
----------------------
-Link messages will not be displayed to the console if the distribution is
-restricting system messages. In order to see network driver link messages on
-your console, set dmesg to eight by entering the following::
-
- dmesg -n 8
-
-NOTE: This setting is not saved across reboots.
-
-Jumbo Frames
-------------
-The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
-enabled by changing the MTU to a value larger than the default of 1500.
-The maximum value for the MTU is 16114. Use the ip command to
-increase the MTU size. For example::
-
- ip li set dev ethx mtu 9000
-
-The maximum MTU setting for Jumbo Frames is 16114. This value coincides
-with the maximum Jumbo Frames size of 16128.
-
-Ethtool
--------
-The driver utilizes the ethtool interface for driver configuration and
-diagnostics, as well as displaying statistical information. The ethtool
-version 1.6 or later is required for this functionality.
-
-The latest release of ethtool can be found from
-https://www.kernel.org/pub/software/network/ethtool/
-
-NOTE:
- The ethtool version 1.6 only supports a limited set of ethtool options.
- Support for a more complete ethtool feature set can be enabled by
- upgrading to the latest version.
-
-NAPI
-----
-NAPI (Rx polling mode) is supported in the ixgb driver.
-
-See https://wiki.linuxfoundation.org/networking/napi for more information on
-NAPI.
-
-
-Known Issues/Troubleshooting
-============================
-
-NOTE:
- After installing the driver, if your Intel Network Connection is not
- working, verify in the "In This Release" section of the readme that you have
- installed the correct driver.
-
-Cable Interoperability Issue with Fujitsu XENPAK Module in SmartBits Chassis
-----------------------------------------------------------------------------
-Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4
-Server adapter is connected to a Fujitsu XENPAK CX4 module in a SmartBits
-chassis using 15 m/24AWG cable assemblies manufactured by Fujitsu or Leoni.
-The CRC errors may be received either by the Intel(R) PRO/10GbE CX4
-Server adapter or the SmartBits. If this situation occurs using a different
-cable assembly may resolve the issue.
-
-Cable Interoperability Issues with HP Procurve 3400cl Switch Port
------------------------------------------------------------------
-Excessive CRC errors may be observed if the Intel(R) PRO/10GbE CX4 Server
-adapter is connected to an HP Procurve 3400cl switch port using short cables
-(1 m or shorter). If this situation occurs, using a longer cable may resolve
-the issue.
-
-Excessive CRC errors may be observed using Fujitsu 24AWG cable assemblies that
-Are 10 m or longer or where using a Leoni 15 m/24AWG cable assembly. The CRC
-errors may be received either by the CX4 Server adapter or at the switch. If
-this situation occurs, using a different cable assembly may resolve the issue.
-
-Jumbo Frames System Requirement
--------------------------------
-Memory allocation failures have been observed on Linux systems with 64 MB
-of RAM or less that are running Jumbo Frames. If you are using Jumbo
-Frames, your system may require more than the advertised minimum
-requirement of 64 MB of system memory.
-
-Performance Degradation with Jumbo Frames
------------------------------------------
-Degradation in throughput performance may be observed in some Jumbo frames
-environments. If this is observed, increasing the application's socket buffer
-size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
-See the specific application manual and /usr/src/linux*/Documentation/
-networking/ip-sysctl.txt for more details.
-
-Allocating Rx Buffers when Using Jumbo Frames
----------------------------------------------
-Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
-the available memory is heavily fragmented. This issue may be seen with PCI-X
-adapters or with packet split disabled. This can be reduced or eliminated
-by changing the amount of available memory for receive buffer allocation, by
-increasing /proc/sys/vm/min_free_kbytes.
-
-Multiple Interfaces on Same Ethernet Broadcast Network
-------------------------------------------------------
-Due to the default ARP behavior on Linux, it is not possible to have
-one system on two IP networks in the same Ethernet broadcast domain
-(non-partitioned switch) behave as expected. All Ethernet interfaces
-will respond to IP traffic for any IP address assigned to the system.
-This results in unbalanced receive traffic.
-
-If you have multiple interfaces in a server, do either of the following:
-
- - Turn on ARP filtering by entering::
-
- echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
-
- - Install the interfaces in separate broadcast domains - either in
- different switches or in a switch partitioned to VLANs.
-
-UDP Stress Test Dropped Packet Issue
---------------------------------------
-Under small packets UDP stress test with 10GbE driver, the Linux system
-may drop UDP packets due to the fullness of socket buffers. You may want
-to change the driver's Flow Control variables to the minimum value for
-controlling packet reception.
-
-Tx Hangs Possible Under Stress
-------------------------------
-Under stress conditions, if TX hangs occur, turning off TSO
-"ethtool -K eth0 tso off" may resolve the problem.
-
-
-Support
-=======
-For general information, go to the Intel support website at:
-
-https://www.intel.com/support/
-
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-https://sourceforge.net/projects/e1000
-
-If an issue is identified with the released source code on a supported kernel
-with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst b/Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst
index 0a233b17c664..1e5f16993f69 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/ixgbe.rst
@@ -545,13 +545,8 @@ on the Intel Ethernet Controller XL710.
Support
=======
For general information, go to the Intel support website at:
-
https://www.intel.com/support/
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-https://sourceforge.net/projects/e1000
-
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/intel/ixgbevf.rst b/Documentation/networking/device_drivers/ethernet/intel/ixgbevf.rst
index 76bbde736f21..08dc0d368a48 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/ixgbevf.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/ixgbevf.rst
@@ -55,13 +55,8 @@ VLANs: There is a limit of a total of 64 shared VLANs to 1 or more VFs.
Support
=======
For general information, go to the Intel support website at:
-
https://www.intel.com/support/
-or the Intel Wired Networking project hosted by Sourceforge at:
-
-https://sourceforge.net/projects/e1000
-
If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 4cd8e869762b..6b2d1fe74ecf 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -346,32 +346,6 @@ the software port.
- The number of receive packets with CQE compression on ring i [#accel]_.
- Acceleration
- * - `rx[i]_cache_reuse`
- - The number of events of successful reuse of a page from a driver's
- internal page cache.
- - Acceleration
-
- * - `rx[i]_cache_full`
- - The number of events of full internal page cache where driver can't put a
- page back to the cache for recycling (page will be freed).
- - Acceleration
-
- * - `rx[i]_cache_empty`
- - The number of events where cache was empty - no page to give. Driver
- shall allocate new page.
- - Acceleration
-
- * - `rx[i]_cache_busy`
- - The number of events where cache head was busy and cannot be recycled.
- Driver allocated new page.
- - Acceleration
-
- * - `rx[i]_cache_waive`
- - The number of cache evacuation. This can occur due to page move to
- another NUMA node or page was pfmemalloc-ed and should be freed as soon
- as possible.
- - Acceleration
-
* - `rx[i]_arfs_err`
- Number of flow rules that failed to be added to the flow table.
- Error
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
index 9b5c40ba7f0d..3a7a714cc08f 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
@@ -122,6 +122,41 @@ users try to enable them.
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
+hairpin_num_queues: Number of hairpin queues
+--------------------------------------------
+We refer to a TC NIC rule that involves forwarding as "hairpin".
+
+Hairpin queues are mlx5 hardware specific implementation for hardware
+forwarding of such packets.
+
+- Show the number of hairpin queues::
+
+ $ devlink dev param show pci/0000:06:00.0 name hairpin_num_queues
+ pci/0000:06:00.0:
+ name hairpin_num_queues type driver-specific
+ values:
+ cmode driverinit value 2
+
+- Change the number of hairpin queues::
+
+ $ devlink dev param set pci/0000:06:00.0 name hairpin_num_queues value 4 cmode driverinit
+
+hairpin_queue_size: Size of the hairpin queues
+----------------------------------------------
+Control the size of the hairpin queues.
+
+- Show the size of the hairpin queues::
+
+ $ devlink dev param show pci/0000:06:00.0 name hairpin_queue_size
+ pci/0000:06:00.0:
+ name hairpin_queue_size type driver-specific
+ values:
+ cmode driverinit value 1024
+
+- Change the size (in packets) of the hairpin queues::
+
+ $ devlink dev param set pci/0000:06:00.0 name hairpin_queue_size value 512 cmode driverinit
+
Health reporters
================
@@ -222,3 +257,36 @@ User commands examples:
$ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
NOTE: This command can run only on PF.
+
+vnic reporter
+-------------
+The vnic reporter implements only the `diagnose` callback.
+It is responsible for querying the vnic diagnostic counters from fw and displaying
+them in realtime.
+
+Description of the vnic counters:
+total_q_under_processor_handle: number of queues in an error state due to
+an async error or errored command.
+send_queue_priority_update_flow: number of QP/SQ priority/SL update
+events.
+cq_overrun: number of times CQ entered an error state due to an
+overflow.
+async_eq_overrun: number of times an EQ mapped to async events was
+overrun.
+comp_eq_overrun: number of times an EQ mapped to completion events was
+overrun.
+quota_exceeded_command: number of commands issued and failed due to quota
+exceeded.
+invalid_command: number of commands issued and failed dues to any reason
+other than quota exceeded.
+nic_receive_steering_discard: number of packets that completed RX flow
+steering but were discarded due to a mismatch in flow table.
+
+User commands examples:
+- Diagnose PF/VF vnic counters
+ $ devlink health diagnose pci/0000:82:00.1 reporter vnic
+- Diagnose representor vnic counters (performed by supplying devlink port of the
+ representor, which can be obtained via devlink port command)
+ $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic
+
+NOTE: This command can run over all interfaces such as PF/VF and representor ports.
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index 10f282c2117c..2f60e34ab926 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -7,6 +7,21 @@ ice devlink support
This document describes the devlink features implemented by the ``ice``
device driver.
+Parameters
+==========
+
+.. list-table:: Generic parameters implemented
+
+ * - Name
+ - Mode
+ - Notes
+ * - ``enable_roce``
+ - runtime
+ - mutually exclusive with ``enable_iwarp``
+ * - ``enable_iwarp``
+ - runtime
+ - mutually exclusive with ``enable_roce``
+
Info versions
=============
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 3321117cf605..202798d6501e 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -72,6 +72,18 @@ parameters.
Default: disabled
+ * - ``hairpin_num_queues``
+ - u32
+ - driverinit
+ - We refer to a TC NIC rule that involves forwarding as "hairpin".
+ Hairpin queues are mlx5 hardware specific implementation for hardware
+ forwarding of such packets.
+
+ Control the number of hairpin queues.
+ * - ``hairpin_queue_size``
+ - u32
+ - driverinit
+ - Control the size (in packets) of the hairpin queues.
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``
diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index 64f7236ff10b..4f5dfa9c022e 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -4,94 +4,124 @@
Softnet Driver Issues
=====================
-Transmit path guidelines:
+Probing guidelines
+==================
-1) The ndo_start_xmit method must not return NETDEV_TX_BUSY under
- any normal circumstances. It is considered a hard error unless
- there is no way your device can tell ahead of time when its
- transmit function will become busy.
+Address validation
+------------------
- Instead it must maintain the queue properly. For example,
- for a driver implementing scatter-gather this means::
+Any hardware layer address you obtain for your device should
+be verified. For example, for ethernet check it with
+linux/etherdevice.h:is_valid_ether_addr()
+
+Close/stop guidelines
+=====================
+
+Quiescence
+----------
+
+After the ndo_stop routine has been called, the hardware must
+not receive or transmit any data. All in flight packets must
+be aborted. If necessary, poll or wait for completion of
+any reset commands.
+
+Auto-close
+----------
+
+The ndo_stop routine will be called by unregister_netdevice
+if device is still UP.
+
+Transmit path guidelines
+========================
+
+Stop queues in advance
+----------------------
+
+The ndo_start_xmit method must not return NETDEV_TX_BUSY under
+any normal circumstances. It is considered a hard error unless
+there is no way your device can tell ahead of time when its
+transmit function will become busy.
+
+Instead it must maintain the queue properly. For example,
+for a driver implementing scatter-gather this means:
+
+.. code-block:: c
+
+ static u32 drv_tx_avail(struct drv_ring *dr)
+ {
+ u32 used = READ_ONCE(dr->prod) - READ_ONCE(dr->cons);
+
+ return dr->tx_ring_size - (used & bp->tx_ring_mask);
+ }
static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
struct drv *dp = netdev_priv(dev);
+ struct netdev_queue *txq;
+ struct drv_ring *dr;
+ int idx;
- lock_tx(dp);
- ...
- /* This is a hard error log it. */
- if (TX_BUFFS_AVAIL(dp) <= (skb_shinfo(skb)->nr_frags + 1)) {
+ idx = skb_get_queue_mapping(skb);
+ dr = dp->tx_rings[idx];
+ txq = netdev_get_tx_queue(dev, idx);
+
+ //...
+ /* This should be a very rare race - log it. */
+ if (drv_tx_avail(dr) <= skb_shinfo(skb)->nr_frags + 1) {
netif_stop_queue(dev);
- unlock_tx(dp);
- printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
- dev->name);
+ netdev_warn(dev, "Tx Ring full when queue awake!\n");
return NETDEV_TX_BUSY;
}
- ... queue packet to card ...
- ... update tx consumer index ...
-
- if (TX_BUFFS_AVAIL(dp) <= (MAX_SKB_FRAGS + 1))
- netif_stop_queue(dev);
-
- ...
- unlock_tx(dp);
- ...
- return NETDEV_TX_OK;
- }
-
- And then at the end of your TX reclamation event handling::
+ //... queue packet to card ...
- if (netif_queue_stopped(dp->dev) &&
- TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1))
- netif_wake_queue(dp->dev);
+ netdev_tx_sent_queue(txq, skb->len);
- For a non-scatter-gather supporting card, the three tests simply become::
+ //... update tx producer index using WRITE_ONCE() ...
- /* This is a hard error log it. */
- if (TX_BUFFS_AVAIL(dp) <= 0)
+ if (!netif_txq_maybe_stop(txq, drv_tx_avail(dr),
+ MAX_SKB_FRAGS + 1, 2 * MAX_SKB_FRAGS))
+ dr->stats.stopped++;
- and::
+ //...
+ return NETDEV_TX_OK;
+ }
- if (TX_BUFFS_AVAIL(dp) == 0)
+And then at the end of your TX reclamation event handling:
- and::
+.. code-block:: c
- if (netif_queue_stopped(dp->dev) &&
- TX_BUFFS_AVAIL(dp) > 0)
- netif_wake_queue(dp->dev);
+ //... update tx consumer index using WRITE_ONCE() ...
-2) An ndo_start_xmit method must not modify the shared parts of a
- cloned SKB.
+ netif_txq_completed_wake(txq, cmpl_pkts, cmpl_bytes,
+ drv_tx_avail(dr), 2 * MAX_SKB_FRAGS);
-3) Do not forget that once you return NETDEV_TX_OK from your
- ndo_start_xmit method, it is your driver's responsibility to free
- up the SKB and in some finite amount of time.
+Lockless queue stop / wake helper macros
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- For example, this means that it is not allowed for your TX
- mitigation scheme to let TX packets "hang out" in the TX
- ring unreclaimed forever if no new TX packets are sent.
- This error can deadlock sockets waiting for send buffer room
- to be freed up.
+.. kernel-doc:: include/net/netdev_queues.h
+ :doc: Lockless queue stopping / waking helpers.
- If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
- must not keep any reference to that SKB and you must not attempt
- to free it up.
+No exclusive ownership
+----------------------
-Probing guidelines:
+An ndo_start_xmit method must not modify the shared parts of a
+cloned SKB.
-1) Any hardware layer address you obtain for your device should
- be verified. For example, for ethernet check it with
- linux/etherdevice.h:is_valid_ether_addr()
+Timely completions
+------------------
-Close/stop guidelines:
+Do not forget that once you return NETDEV_TX_OK from your
+ndo_start_xmit method, it is your driver's responsibility to free
+up the SKB and in some finite amount of time.
-1) After the ndo_stop routine has been called, the hardware must
- not receive or transmit any data. All in flight packets must
- be aborted. If necessary, poll or wait for completion of
- any reset commands.
+For example, this means that it is not allowed for your TX
+mitigation scheme to let TX packets "hang out" in the TX
+ring unreclaimed forever if no new TX packets are sent.
+This error can deadlock sockets waiting for send buffer room
+to be freed up.
-2) The ndo_stop routine will be called by unregister_netdevice
- if device is still UP.
+If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
+must not keep any reference to that SKB and you must not attempt
+to free it up.
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index e1bc6186d7ea..2540c70952ff 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -860,22 +860,24 @@ Request contents:
Kernel response contents:
- ==================================== ====== ===========================
- ``ETHTOOL_A_RINGS_HEADER`` nested reply header
- ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring
- ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini ring
- ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo ring
- ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring
- ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
- ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
- ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring
- ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
- ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring
- ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data split
- ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE
- ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode
- ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode
- ==================================== ====== ===========================
+ ======================================= ====== ===========================
+ ``ETHTOOL_A_RINGS_HEADER`` nested reply header
+ ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo ring
+ ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring
+ ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring
+ ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
+ ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring
+ ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data split
+ ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE
+ ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode
+ ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode
+ ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push buffer
+ ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX`` u32 max size of TX push buffer
+ ======================================= ====== ===========================
``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` indicates whether the device is usable with
page-flipping TCP zero-copy receive (``getsockopt(TCP_ZEROCOPY_RECEIVE)``).
@@ -891,6 +893,18 @@ through MMIO writes, thus reducing the latency. However, enabling this feature
may increase the CPU cost. Drivers may enforce additional per-packet
eligibility checks (e.g. on packet size).
+``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` specifies the maximum number of bytes of a
+transmitted packet a driver can push directly to the underlying device
+('push' mode). Pushing some of the payload bytes to the device has the
+advantages of reducing latency for small packets by avoiding DMA mapping (same
+as ``ETHTOOL_A_RINGS_TX_PUSH`` parameter) as well as allowing the underlying
+device to process packet headers ahead of fetching its payload.
+This can help the device to make fast actions based on the packet's headers.
+This is similar to the "tx-copybreak" parameter, which copies the packet to a
+preallocated DMA memory area instead of mapping new memory. However,
+tx-push-buff parameter copies the packet directly to the device to allow the
+device to take faster actions on the packet.
+
RINGS_SET
=========
@@ -908,6 +922,7 @@ Request contents:
``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE
``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode
``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode
+ ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push buffer
==================================== ====== ===========================
Kernel checks that requested ring sizes do not exceed limits reported by
@@ -1084,6 +1099,10 @@ such that the corresponding bit in ``ethtool_ops::supported_coalesce_params``
is not set), regardless of their values. Driver may impose additional
constraints on coalescing parameters and their values.
+Compared to requests issued via the ``ioctl()`` netlink version of this request
+will try harder to make sure that values specified by the user have been applied
+and may call the driver twice.
+
PAUSE_GET
=========
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 4ddcae33c336..a164ff074356 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -36,6 +36,7 @@ Contents:
scaling
tls
tls-offload
+ tls-handshake
nfc
6lowpan
6pack
@@ -73,6 +74,7 @@ Contents:
mpls-sysctl
mptcp-sysctl
multiqueue
+ napi
netconsole
netdev-features
netdevices
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 87dd1c5283e6..6ec06a33688a 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -340,6 +340,8 @@ tcp_app_win - INTEGER
Reserve max(window/2^tcp_app_win, mss) of window for application
buffer. Value 0 is special, it means that nothing is reserved.
+ Possible values are [0, 31], inclusive.
+
Default: 31
tcp_autocorking - BOOLEAN
@@ -2719,6 +2721,13 @@ echo_ignore_anycast - BOOLEAN
Default: 0
+error_anycast_as_unicast - BOOLEAN
+ If set to 1, then the kernel will respond with ICMP Errors
+ resulting from requests sent to it over the IPv6 protocol destined
+ to anycast address essentially treating anycast as unicast.
+
+ Default: 0
+
xfrm6_gc_thresh - INTEGER
(Obsolete since linux-4.14)
The threshold at which we will start garbage collecting for IPv6
diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
new file mode 100644
index 000000000000..a7a047742e93
--- /dev/null
+++ b/Documentation/networking/napi.rst
@@ -0,0 +1,254 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+.. _napi:
+
+====
+NAPI
+====
+
+NAPI is the event handling mechanism used by the Linux networking stack.
+The name NAPI no longer stands for anything in particular [#]_.
+
+In basic operation the device notifies the host about new events
+via an interrupt.
+The host then schedules a NAPI instance to process the events.
+The device may also be polled for events via NAPI without receiving
+interrupts first (:ref:`busy polling<poll>`).
+
+NAPI processing usually happens in the software interrupt context,
+but there is an option to use :ref:`separate kernel threads<threaded>`
+for NAPI processing.
+
+All in all NAPI abstracts away from the drivers the context and configuration
+of event (packet Rx and Tx) processing.
+
+Driver API
+==========
+
+The two most important elements of NAPI are the struct napi_struct
+and the associated poll method. struct napi_struct holds the state
+of the NAPI instance while the method is the driver-specific event
+handler. The method will typically free Tx packets that have been
+transmitted and process newly received packets.
+
+.. _drv_ctrl:
+
+Control API
+-----------
+
+netif_napi_add() and netif_napi_del() add/remove a NAPI instance
+from the system. The instances are attached to the netdevice passed
+as argument (and will be deleted automatically when netdevice is
+unregistered). Instances are added in a disabled state.
+
+napi_enable() and napi_disable() manage the disabled state.
+A disabled NAPI can't be scheduled and its poll method is guaranteed
+to not be invoked. napi_disable() waits for ownership of the NAPI
+instance to be released.
+
+The control APIs are not idempotent. Control API calls are safe against
+concurrent use of datapath APIs but an incorrect sequence of control API
+calls may result in crashes, deadlocks, or race conditions. For example,
+calling napi_disable() multiple times in a row will deadlock.
+
+Datapath API
+------------
+
+napi_schedule() is the basic method of scheduling a NAPI poll.
+Drivers should call this function in their interrupt handler
+(see :ref:`drv_sched` for more info). A successful call to napi_schedule()
+will take ownership of the NAPI instance.
+
+Later, after NAPI is scheduled, the driver's poll method will be
+called to process the events/packets. The method takes a ``budget``
+argument - drivers can process completions for any number of Tx
+packets but should only process up to ``budget`` number of
+Rx packets. Rx processing is usually much more expensive.
+
+In other words, it is recommended to ignore the budget argument when
+performing TX buffer reclamation to ensure that the reclamation is not
+arbitrarily bounded; however, it is required to honor the budget argument
+for RX processing.
+
+.. warning::
+
+ The ``budget`` argument may be 0 if core tries to only process Tx completions
+ and no Rx packets.
+
+The poll method returns the amount of work done. If the driver still
+has outstanding work to do (e.g. ``budget`` was exhausted)
+the poll method should return exactly ``budget``. In that case,
+the NAPI instance will be serviced/polled again (without the
+need to be scheduled).
+
+If event processing has been completed (all outstanding packets
+processed) the poll method should call napi_complete_done()
+before returning. napi_complete_done() releases the ownership
+of the instance.
+
+.. warning::
+
+ The case of finishing all events and using exactly ``budget``
+ must be handled carefully. There is no way to report this
+ (rare) condition to the stack, so the driver must either
+ not call napi_complete_done() and wait to be called again,
+ or return ``budget - 1``.
+
+ If the ``budget`` is 0 napi_complete_done() should never be called.
+
+Call sequence
+-------------
+
+Drivers should not make assumptions about the exact sequencing
+of calls. The poll method may be called without the driver scheduling
+the instance (unless the instance is disabled). Similarly,
+it's not guaranteed that the poll method will be called, even
+if napi_schedule() succeeded (e.g. if the instance gets disabled).
+
+As mentioned in the :ref:`drv_ctrl` section - napi_disable() and subsequent
+calls to the poll method only wait for the ownership of the instance
+to be released, not for the poll method to exit. This means that
+drivers should avoid accessing any data structures after calling
+napi_complete_done().
+
+.. _drv_sched:
+
+Scheduling and IRQ masking
+--------------------------
+
+Drivers should keep the interrupts masked after scheduling
+the NAPI instance - until NAPI polling finishes any further
+interrupts are unnecessary.
+
+Drivers which have to mask the interrupts explicitly (as opposed
+to IRQ being auto-masked by the device) should use the napi_schedule_prep()
+and __napi_schedule() calls:
+
+.. code-block:: c
+
+ if (napi_schedule_prep(&v->napi)) {
+ mydrv_mask_rxtx_irq(v->idx);
+ /* schedule after masking to avoid races */
+ __napi_schedule(&v->napi);
+ }
+
+IRQ should only be unmasked after a successful call to napi_complete_done():
+
+.. code-block:: c
+
+ if (budget && napi_complete_done(&v->napi, work_done)) {
+ mydrv_unmask_rxtx_irq(v->idx);
+ return min(work_done, budget - 1);
+ }
+
+napi_schedule_irqoff() is a variant of napi_schedule() which takes advantage
+of guarantees given by being invoked in IRQ context (no need to
+mask interrupts). Note that PREEMPT_RT forces all interrupts
+to be threaded so the interrupt may need to be marked ``IRQF_NO_THREAD``
+to avoid issues on real-time kernel configurations.
+
+Instance to queue mapping
+-------------------------
+
+Modern devices have multiple NAPI instances (struct napi_struct) per
+interface. There is no strong requirement on how the instances are
+mapped to queues and interrupts. NAPI is primarily a polling/processing
+abstraction without specific user-facing semantics. That said, most networking
+devices end up using NAPI in fairly similar ways.
+
+NAPI instances most often correspond 1:1:1 to interrupts and queue pairs
+(queue pair is a set of a single Rx and single Tx queue).
+
+In less common cases a NAPI instance may be used for multiple queues
+or Rx and Tx queues can be serviced by separate NAPI instances on a single
+core. Regardless of the queue assignment, however, there is usually still
+a 1:1 mapping between NAPI instances and interrupts.
+
+It's worth noting that the ethtool API uses a "channel" terminology where
+each channel can be either ``rx``, ``tx`` or ``combined``. It's not clear
+what constitutes a channel; the recommended interpretation is to understand
+a channel as an IRQ/NAPI which services queues of a given type. For example,
+a configuration of 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel is expected
+to utilize 3 interrupts, 2 Rx and 2 Tx queues.
+
+User API
+========
+
+User interactions with NAPI depend on NAPI instance ID. The instance IDs
+are only visible to the user thru the ``SO_INCOMING_NAPI_ID`` socket option.
+It's not currently possible to query IDs used by a given device.
+
+Software IRQ coalescing
+-----------------------
+
+NAPI does not perform any explicit event coalescing by default.
+In most scenarios batching happens due to IRQ coalescing which is done
+by the device. There are cases where software coalescing is helpful.
+
+NAPI can be configured to arm a repoll timer instead of unmasking
+the hardware interrupts as soon as all packets are processed.
+The ``gro_flush_timeout`` sysfs configuration of the netdevice
+is reused to control the delay of the timer, while
+``napi_defer_hard_irqs`` controls the number of consecutive empty polls
+before NAPI gives up and goes back to using hardware IRQs.
+
+.. _poll:
+
+Busy polling
+------------
+
+Busy polling allows a user process to check for incoming packets before
+the device interrupt fires. As is the case with any busy polling it trades
+off CPU cycles for lower latency (production uses of NAPI busy polling
+are not well known).
+
+Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
+selected sockets or using the global ``net.core.busy_poll`` and
+``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
+also exists.
+
+IRQ mitigation
+---------------
+
+While busy polling is supposed to be used by low latency applications,
+a similar mechanism can be used for IRQ mitigation.
+
+Very high request-per-second applications (especially routing/forwarding
+applications and especially applications using AF_XDP sockets) may not
+want to be interrupted until they finish processing a request or a batch
+of packets.
+
+Such applications can pledge to the kernel that they will perform a busy
+polling operation periodically, and the driver should keep the device IRQs
+permanently masked. This mode is enabled by using the ``SO_PREFER_BUSY_POLL``
+socket option. To avoid system misbehavior the pledge is revoked
+if ``gro_flush_timeout`` passes without any busy poll call.
+
+The NAPI budget for busy polling is lower than the default (which makes
+sense given the low latency intention of normal busy polling). This is
+not the case with IRQ mitigation, however, so the budget can be adjusted
+with the ``SO_BUSY_POLL_BUDGET`` socket option.
+
+.. _threaded:
+
+Threaded NAPI
+-------------
+
+Threaded NAPI is an operating mode that uses dedicated kernel
+threads rather than software IRQ context for NAPI processing.
+The configuration is per netdevice and will affect all
+NAPI instances of that device. Each NAPI instance will spawn a separate
+thread (called ``napi/${ifc-name}-${napi-id}``).
+
+It is recommended to pin each kernel thread to a single CPU, the same
+CPU as the CPU which services the interrupt. Note that the mapping
+between IRQs and NAPI instances may not be trivial (and is driver
+dependent). The NAPI instance IDs will be assigned in the opposite
+order than the process IDs of the kernel threads.
+
+Threaded NAPI is controlled by writing 0/1 to the ``threaded`` file in
+netdev's sysfs directory.
+
+.. rubric:: Footnotes
+
+.. [#] NAPI was originally referred to as New API in 2.4 Linux.
diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 30f1344e7cca..873efd97f822 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -165,6 +165,7 @@ Registration
pp_params.pool_size = DESC_NUM;
pp_params.nid = NUMA_NO_NODE;
pp_params.dev = priv->dev;
+ pp_params.napi = napi; /* only if locking is tied to NAPI */
pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
page_pool = page_pool_create(&pp_params);
diff --git a/Documentation/networking/rxrpc.rst b/Documentation/networking/rxrpc.rst
index ec1323d92c96..e807e18ba32a 100644
--- a/Documentation/networking/rxrpc.rst
+++ b/Documentation/networking/rxrpc.rst
@@ -848,14 +848,21 @@ The kernel interface functions are as follows:
returned. The caller now holds a reference on this and it must be
properly ended.
- (#) End a client call::
+ (#) Shut down a client call::
- void rxrpc_kernel_end_call(struct socket *sock,
+ void rxrpc_kernel_shutdown_call(struct socket *sock,
+ struct rxrpc_call *call);
+
+ This is used to shut down a previously begun call. The user_call_ID is
+ expunged from AF_RXRPC's knowledge and will not be seen again in
+ association with the specified call.
+
+ (#) Release the ref on a client call::
+
+ void rxrpc_kernel_put_call(struct socket *sock,
struct rxrpc_call *call);
- This is used to end a previously begun call. The user_call_ID is expunged
- from AF_RXRPC's knowledge and will not be seen again in association with
- the specified call.
+ This is used to release the caller's ref on an rxrpc call.
(#) Send data through a call::
diff --git a/Documentation/networking/tls-handshake.rst b/Documentation/networking/tls-handshake.rst
new file mode 100644
index 000000000000..a2817a88e905
--- /dev/null
+++ b/Documentation/networking/tls-handshake.rst
@@ -0,0 +1,217 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+In-Kernel TLS Handshake
+=======================
+
+Overview
+========
+
+Transport Layer Security (TLS) is a Upper Layer Protocol (ULP) that runs
+over TCP. TLS provides end-to-end data integrity and confidentiality in
+addition to peer authentication.
+
+The kernel's kTLS implementation handles the TLS record subprotocol, but
+does not handle the TLS handshake subprotocol which is used to establish
+a TLS session. Kernel consumers can use the API described here to
+request TLS session establishment.
+
+There are several possible ways to provide a handshake service in the
+kernel. The API described here is designed to hide the details of those
+implementations so that in-kernel TLS consumers do not need to be
+aware of how the handshake gets done.
+
+
+User handshake agent
+====================
+
+As of this writing, there is no TLS handshake implementation in the
+Linux kernel. To provide a handshake service, a handshake agent
+(typically in user space) is started in each network namespace where a
+kernel consumer might require a TLS handshake. Handshake agents listen
+for events sent from the kernel that indicate a handshake request is
+waiting.
+
+An open socket is passed to a handshake agent via a netlink operation,
+which creates a socket descriptor in the agent's file descriptor table.
+If the handshake completes successfully, the handshake agent promotes
+the socket to use the TLS ULP and sets the session information using the
+SOL_TLS socket options. The handshake agent returns the socket to the
+kernel via a second netlink operation.
+
+
+Kernel Handshake API
+====================
+
+A kernel TLS consumer initiates a client-side TLS handshake on an open
+socket by invoking one of the tls_client_hello() functions. First, it
+fills in a structure that contains the parameters of the request:
+
+.. code-block:: c
+
+ struct tls_handshake_args {
+ struct socket *ta_sock;
+ tls_done_func_t ta_done;
+ void *ta_data;
+ unsigned int ta_timeout_ms;
+ key_serial_t ta_keyring;
+ key_serial_t ta_my_cert;
+ key_serial_t ta_my_privkey;
+ unsigned int ta_num_peerids;
+ key_serial_t ta_my_peerids[5];
+ };
+
+The @ta_sock field references an open and connected socket. The consumer
+must hold a reference on the socket to prevent it from being destroyed
+while the handshake is in progress. The consumer must also have
+instantiated a struct file in sock->file.
+
+
+@ta_done contains a callback function that is invoked when the handshake
+has completed. Further explanation of this function is in the "Handshake
+Completion" sesction below.
+
+The consumer can fill in the @ta_timeout_ms field to force the servicing
+handshake agent to exit after a number of milliseconds. This enables the
+socket to be fully closed once both the kernel and the handshake agent
+have closed their endpoints.
+
+Authentication material such as x.509 certificates, private certificate
+keys, and pre-shared keys are provided to the handshake agent in keys
+that are instantiated by the consumer before making the handshake
+request. The consumer can provide a private keyring that is linked into
+the handshake agent's process keyring in the @ta_keyring field to prevent
+access of those keys by other subsystems.
+
+To request an x.509-authenticated TLS session, the consumer fills in
+the @ta_my_cert and @ta_my_privkey fields with the serial numbers of
+keys containing an x.509 certificate and the private key for that
+certificate. Then, it invokes this function:
+
+.. code-block:: c
+
+ ret = tls_client_hello_x509(args, gfp_flags);
+
+The function returns zero when the handshake request is under way. A
+zero return guarantees the callback function @ta_done will be invoked
+for this socket. The function returns a negative errno if the handshake
+could not be started. A negative errno guarantees the callback function
+@ta_done will not be invoked on this socket.
+
+
+To initiate a client-side TLS handshake with a pre-shared key, use:
+
+.. code-block:: c
+
+ ret = tls_client_hello_psk(args, gfp_flags);
+
+However, in this case, the consumer fills in the @ta_my_peerids array
+with serial numbers of keys containing the peer identities it wishes
+to offer, and the @ta_num_peerids field with the number of array
+entries it has filled in. The other fields are filled in as above.
+
+
+To initiate an anonymous client-side TLS handshake use:
+
+.. code-block:: c
+
+ ret = tls_client_hello_anon(args, gfp_flags);
+
+The handshake agent presents no peer identity information to the remote
+during this type of handshake. Only server authentication (ie the client
+verifies the server's identity) is performed during the handshake. Thus
+the established session uses encryption only.
+
+
+Consumers that are in-kernel servers use:
+
+.. code-block:: c
+
+ ret = tls_server_hello_x509(args, gfp_flags);
+
+or
+
+.. code-block:: c
+
+ ret = tls_server_hello_psk(args, gfp_flags);
+
+The argument structure is filled in as above.
+
+
+If the consumer needs to cancel the handshake request, say, due to a ^C
+or other exigent event, the consumer can invoke:
+
+.. code-block:: c
+
+ bool tls_handshake_cancel(sock);
+
+This function returns true if the handshake request associated with
+@sock has been canceled. The consumer's handshake completion callback
+will not be invoked. If this function returns false, then the consumer's
+completion callback has already been invoked.
+
+
+Handshake Completion
+====================
+
+When the handshake agent has completed processing, it notifies the
+kernel that the socket may be used by the consumer again. At this point,
+the consumer's handshake completion callback, provided in the @ta_done
+field in the tls_handshake_args structure, is invoked.
+
+The synopsis of this function is:
+
+.. code-block:: c
+
+ typedef void (*tls_done_func_t)(void *data, int status,
+ key_serial_t peerid);
+
+The consumer provides a cookie in the @ta_data field of the
+tls_handshake_args structure that is returned in the @data parameter of
+this callback. The consumer uses the cookie to match the callback to the
+thread waiting for the handshake to complete.
+
+The success status of the handshake is returned via the @status
+parameter:
+
++------------+----------------------------------------------+
+| status | meaning |
++============+==============================================+
+| 0 | TLS session established successfully |
++------------+----------------------------------------------+
+| -EACCESS | Remote peer rejected the handshake or |
+| | authentication failed |
++------------+----------------------------------------------+
+| -ENOMEM | Temporary resource allocation failure |
++------------+----------------------------------------------+
+| -EINVAL | Consumer provided an invalid argument |
++------------+----------------------------------------------+
+| -ENOKEY | Missing authentication material |
++------------+----------------------------------------------+
+| -EIO | An unexpected fault occurred |
++------------+----------------------------------------------+
+
+The @peerid parameter contains the serial number of a key containing the
+remote peer's identity or the value TLS_NO_PEERID if the session is not
+authenticated.
+
+A best practice is to close and destroy the socket immediately if the
+handshake failed.
+
+
+Other considerations
+--------------------
+
+While a handshake is under way, the kernel consumer must alter the
+socket's sk_data_ready callback function to ignore all incoming data.
+Once the handshake completion callback function has been invoked, normal
+receive operation can be resumed.
+
+Once a TLS session is established, the consumer must provide a buffer
+for and then examine the control message (CMSG) that is part of every
+subsequent sock_recvmsg(). Each control message indicates whether the
+received message data is TLS record data or session metadata.
+
+See tls.rst for details on how a kTLS consumer recognizes incoming
+(decrypted) application data, alerts, and handshake packets once the
+socket has been promoted to use the TLS ULP.
diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
index aac63fc2d08b..25ce72af81c2 100644
--- a/Documentation/networking/xdp-rx-metadata.rst
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -23,10 +23,13 @@ metadata is supported, this set will grow:
An XDP program can use these kfuncs to read the metadata into stack
variables for its own consumption. Or, to pass the metadata on to other
consumers, an XDP program can store it into the metadata area carried
-ahead of the packet.
+ahead of the packet. Not all packets will necessary have the requested
+metadata available in which case the driver returns ``-ENODATA``.
Not all kfuncs have to be implemented by the device driver; when not
-implemented, the default ones that return ``-EOPNOTSUPP`` will be used.
+implemented, the default ones that return ``-EOPNOTSUPP`` will be used
+to indicate the device driver have not implemented this kfunc.
+
Within an XDP frame, the metadata layout (accessed via ``xdp_buff``) is
as follows::