Diffstat (limited to 'Documentation/networking')
26 files changed, 1936 insertions, 686 deletions
diff --git a/Documentation/networking/bareudp.rst b/Documentation/networking/bareudp.rst index b9d04ee6dac1..621cb9575c8f 100644 --- a/Documentation/networking/bareudp.rst +++ b/Documentation/networking/bareudp.rst @@ -6,16 +6,17 @@ Bare UDP Tunnelling Module Documentation There are various L3 encapsulation standards using UDP being discussed to leverage the UDP based load balancing capability of different networks. -MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them. +MPLSoUDP (https://tools.ietf.org/html/rfc7510) is one among them. The Bareudp tunnel module provides a generic L3 encapsulation support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a UDP tunnel. Special Handling ---------------- + The bareudp device supports special handling for MPLS & IP as they can have multiple ethertypes. -MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). +The MPLS protocol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6). This special handling can be enabled only for ethertypes ETH_P_IP & ETH_P_MPLS_UC with a flag called multiproto mode. @@ -52,7 +53,7 @@ be enabled explicitly with the "multiproto" flag. 3) Device Usage The bareudp device could be used along with OVS or flower filter in TC. -The OVS or TC flower layer must set the tunnel information in SKB dst field before -sending packet buffer to the bareudp device for transmission. On reception the -bareudp device extracts and stores the tunnel information in SKB dst field before +The OVS or TC flower layer must set the tunnel information in the SKB dst field before +sending the packet buffer to the bareudp device for transmission. On reception, the +bareUDP device extracts and stores the tunnel information in the SKB dst field before passing the packet buffer to the network stack. 
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst index e774b48de9f5..7c8d22d68682 100644 --- a/Documentation/networking/bonding.rst +++ b/Documentation/networking/bonding.rst @@ -2916,6 +2916,17 @@ from the bond (``ifenslave -d bond0 eth0``). The bonding driver will then restore the MAC addresses that the slaves had before they were enslaved. +9. What bonding modes support native XDP? +------------------------------------------ + + * balance-rr (0) + * active-backup (1) + * balance-xor (2) + * 802.3ad (4) + +Note that the vlan+srcmac hash policy does not support native XDP. +For other bonding modes, the XDP program must be loaded with generic mode. + 16. Resources and Links ======================= diff --git a/Documentation/networking/cdc_mbim.rst b/Documentation/networking/cdc_mbim.rst index 37f968acc473..8404a3f794f3 100644 --- a/Documentation/networking/cdc_mbim.rst +++ b/Documentation/networking/cdc_mbim.rst @@ -51,7 +51,7 @@ Such userspace applications includes, but are not limited to: - mbimcli (included with the libmbim [3] library), and - ModemManager [4] -Establishing a MBIM IP session reequires at least these actions by the +Establishing a MBIM IP session requires at least these actions by the management application: - open the control channel diff --git a/Documentation/networking/device_drivers/ethernet/intel/ice.rst b/Documentation/networking/device_drivers/ethernet/intel/ice.rst index 934752f675ba..3c46a48d99ba 100644 --- a/Documentation/networking/device_drivers/ethernet/intel/ice.rst +++ b/Documentation/networking/device_drivers/ethernet/intel/ice.rst @@ -101,6 +101,37 @@ example, if Rx packets are 10 and Netdev (software statistics) displays rx_bytes as "X", then ethtool (hardware statistics) will display rx_bytes as "X+40" (4 bytes CRC x 10 packets). 
+ethtool reset +------------- +The driver supports 3 types of resets: + +- PF reset - resets only components associated with the given PF, does not + impact other PFs + +- CORE reset - whole adapter is affected, reset all PFs + +- GLOBAL reset - same as CORE but mac and phy components are also reinitialized + +These are mapped to ethtool reset flags as follow: + +- PF reset: + + # ethtool --reset <ethX> irq dma filter offload + +- CORE reset: + + # ethtool --reset <ethX> irq-shared dma-shared filter-shared offload-shared \ + ram-shared + +- GLOBAL reset: + + # ethtool --reset <ethX> irq-shared dma-shared filter-shared offload-shared \ + mac-shared phy-shared ram-shared + +In switchdev mode you can reset a VF using port representor: + + # ethtool --reset <repr> irq dma filter offload + Viewing Link Messages --------------------- diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst index 1e196cb9ce25..af7db0e91f6b 100644 --- a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst +++ b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst @@ -14,6 +14,7 @@ Contents - `Basic packet flow`_ - `Devlink health reporters`_ - `Quality of service`_ +- `RVU representors`_ Overview ======== @@ -340,3 +341,93 @@ Setup HTB offload # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 2 quantum 188416 # tc class add dev <interface> parent 1: classid 1:3 htb rate 10Gbit prio 2 quantum 32768 + + +RVU Representors +================ + +RVU representor driver adds support for creation of representor devices for +RVU PFs' VFs in the system. Representor devices are created when user enables +the switchdev mode. +Switchdev mode can be enabled either before or after setting up SRIOV numVFs. +All representor devices share a single NIXLF but each has a dedicated Rx/Tx +queues. 
RVU PF representor driver registers a separate netdev for each +Rx/Tx queue pair. + +Current HW does not support built-in switch which can do L2 learning and +forwarding packets between representee and representor. Hence, packet path +between representee and it's representor is achieved by setting up appropriate +NPC MCAM filters. +Transmit packets matching these filters will be loopbacked through hardware +loopback channel/interface (i.e, instead of sending them out of MAC interface). +Which will again match the installed filters and will be forwarded. +This way representee => representor and representor => representee packet +path is achieved. These rules get installed when representors are created +and gets active/deactivate based on the representor/representee interface state. + +Usage example: + + - Change device to switchdev mode:: + + # devlink dev eswitch set pci/0002:1c:00.0 mode switchdev + + - List of representor devices on the system:: + + # ip link show + Rpf1vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether f6:43:83:ee:26:21 brd ff:ff:ff:ff:ff:ff + Rpf1vf1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 12:b2:54:0e:24:54 brd ff:ff:ff:ff:ff:ff + Rpf1vf2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 4a:12:c4:4c:32:62 brd ff:ff:ff:ff:ff:ff + Rpf1vf3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether ca:cb:68:0e:e2:6e brd ff:ff:ff:ff:ff:ff + Rpf2vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 06:cc:ad:b4:f0:93 brd ff:ff:ff:ff:ff:ff + + +To delete the representors devices from the system. Change the device to legacy mode. 
+ + - Change device to legacy mode:: + + # devlink dev eswitch set pci/0002:1c:00.0 mode legacy + +RVU representors can be managed using devlink ports +(see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface. + + - Show devlink ports of representors:: + + # devlink port + pci/0002:1c:00.0/0: type eth netdev Rpf1vf0 flavour physical port 0 splittable false + pci/0002:1c:00.0/1: type eth netdev Rpf1vf1 flavour pcivf controller 0 pfnum 1 vfnum 1 external false splittable false + pci/0002:1c:00.0/2: type eth netdev Rpf1vf2 flavour pcivf controller 0 pfnum 1 vfnum 2 external false splittable false + pci/0002:1c:00.0/3: type eth netdev Rpf1vf3 flavour pcivf controller 0 pfnum 1 vfnum 3 external false splittable false + +Function attributes +=================== + +The RVU representor support function attributes for representors. +Port function configuration of the representors are supported through devlink eswitch port. + +MAC address setup +----------------- + +RVU representor driver support devlink port function attr mechanism to setup MAC +address. (refer to Documentation/networking/devlink/devlink-port.rst) + + - To setup MAC address for port 2:: + + # devlink port function set pci/0002:1c:00.0/2 hw_addr 5c:a1:1b:5e:43:11 + # devlink port show pci/0002:1c:00.0/2 + pci/0002:1c:00.0/2: type eth netdev Rpf1vf2 flavour pcivf controller 0 pfnum 1 vfnum 2 external false splittable false + function: + hw_addr 5c:a1:1b:5e:43:11 + + +TC offload +========== + +The rvu representor driver implements support for offloading tc rules using port representors. 
+ + - Drop packets with vlan id 3:: + + # tc filter add dev Rpf1vf0 protocol 802.1Q parent ffff: flower vlan_id 3 vlan_ethtype ipv4 skip_sw action drop + + - Redirect packets with vlan id 5 and IPv4 packets to eth1, after stripping vlan header.:: + + # tc filter add dev Rpf1vf0 ingress protocol 802.1Q flower vlan_id 5 vlan_ethtype ipv4 skip_sw action vlan pop action mirred ingress redirect dev eth1 diff --git a/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst index 32ff114f5c26..04e0595bb0a7 100644 --- a/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst +++ b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst @@ -27,3 +27,46 @@ driver takes over. devlink dev info provides version information for all three components. In addition to the version the hg commit hash of the build is included as a separate entry. + +Statistics +---------- + +RPC (Rx parser) +~~~~~~~~~~~~~~~ + + - ``rpc_unkn_etype``: frames containing unknown EtherType + - ``rpc_unkn_ext_hdr``: frames containing unknown IPv6 extension header + - ``rpc_ipv4_frag``: frames containing IPv4 fragment + - ``rpc_ipv6_frag``: frames containing IPv6 fragment + - ``rpc_ipv4_esp``: frames with IPv4 ESP encapsulation + - ``rpc_ipv6_esp``: frames with IPv6 ESP encapsulation + - ``rpc_tcp_opt_err``: frames which encountered TCP option parsing error + - ``rpc_out_of_hdr_err``: frames where header was larger than parsable region + - ``ovr_size_err``: oversized frames + +PCIe +~~~~ + +The fbnic driver exposes PCIe hardware performance statistics through debugfs +(``pcie_stats``). These statistics provide insights into PCIe transaction +behavior and potential performance bottlenecks. + +1. 
PCIe Transaction Counters: + + These counters track PCIe transaction activity: + - ``pcie_ob_rd_tlp``: Outbound read Transaction Layer Packets count + - ``pcie_ob_rd_dword``: DWORDs transferred in outbound read transactions + - ``pcie_ob_wr_tlp``: Outbound write Transaction Layer Packets count + - ``pcie_ob_wr_dword``: DWORDs transferred in outbound write + transactions + - ``pcie_ob_cpl_tlp``: Outbound completion TLP count + - ``pcie_ob_cpl_dword``: DWORDs transferred in outbound completion TLPs + +2. PCIe Resource Monitoring: + + These counters indicate PCIe resource exhaustion events: + - ``pcie_ob_rd_no_tag``: Read requests dropped due to tag unavailability + - ``pcie_ob_rd_no_cpl_cred``: Read requests dropped due to completion + credit exhaustion + - ``pcie_ob_rd_no_np_cred``: Read requests dropped due to non-posted + credit exhaustion diff --git a/Documentation/networking/device_drivers/wwan/t7xx.rst b/Documentation/networking/device_drivers/wwan/t7xx.rst index f346f5f85f15..e07de7700dfc 100644 --- a/Documentation/networking/device_drivers/wwan/t7xx.rst +++ b/Documentation/networking/device_drivers/wwan/t7xx.rst @@ -7,12 +7,13 @@ ============================================ t7xx driver for MTK PCIe based T700 5G modem ============================================ -The t7xx driver is a WWAN PCIe host driver developed for linux or Chrome OS platforms -for data exchange over PCIe interface between Host platform & MediaTek's T700 5G modem. -The driver exposes an interface conforming to the MBIM protocol [1]. Any front end -application (e.g. Modem Manager) could easily manage the MBIM interface to enable -data communication towards WWAN. The driver also provides an interface to interact -with the MediaTek's modem via AT commands. +The t7xx driver is a WWAN PCIe host driver developed for linux or Chrome OS +platforms for data exchange over PCIe interface between Host platform & +MediaTek's T700 5G modem. 
+The driver exposes an interface conforming to the MBIM protocol [1]. Any front +end application (e.g. Modem Manager) could easily manage the MBIM interface to +enable data communication towards WWAN. The driver also provides an interface +to interact with the MediaTek's modem via AT commands. Basic usage =========== @@ -45,8 +46,8 @@ The driver provides sysfs interfaces to userspace. t7xx_mode --------- -The sysfs interface provides userspace with access to the device mode, this interface -supports read and write operations. +The sysfs interface provides userspace with access to the device mode, this +interface supports read and write operations. Device mode: @@ -67,6 +68,28 @@ Write from userspace to set the device mode. :: $ echo fastboot_switching > /sys/bus/pci/devices/${bdf}/t7xx_mode +t7xx_debug_ports +---------------- +The sysfs interface provides userspace with access to enable/disable the debug +ports, this interface supports read and write operations. + +Debug port status: + +- ``1`` represents enable debug ports +- ``0`` represents disable debug ports + +Currently supported debug ports (ADB/MIPC). + +Read from userspace to get the current debug ports status. + +:: + $ cat /sys/bus/pci/devices/${bdf}/t7xx_debug_ports + +Write from userspace to set the debug ports status. + +:: + $ echo 1 > /sys/bus/pci/devices/${bdf}/t7xx_debug_ports + Management application development ================================== The driver and userspace interfaces are described below. The MBIM protocol is @@ -139,6 +162,25 @@ Please note that driver needs to be reloaded to export /dev/wwan0fastboot0 port, because device needs a cold reset after enter ``fastboot_switching`` mode. +ADB port userspace ABI +---------------------- + +/dev/wwan0adb0 character device +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The driver exposes a ADB protocol interface by implementing ADB WWAN Port. +The userspace end of the ADB channel pipe is a /dev/wwan0adb0 character device. 
+Application shall use this interface for ADB protocol communication. + +MIPC port userspace ABI +----------------------- + +/dev/wwan0mipc0 character device +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The driver exposes a diagnostic interface by implementing MIPC (Modem +Information Process Center) WWAN Port. The userspace end of the MIPC channel +pipe is a /dev/wwan0mipc0 character device. +Application shall use this interface for MTK modem diagnostic communication. + The MediaTek's T700 modem supports the 3GPP TS 27.007 [4] specification. References @@ -164,3 +206,9 @@ speak the Mobile Interface Broadband Model (MBIM) protocol"* [5] *fastboot "a mechanism for communicating with bootloaders"* - https://android.googlesource.com/platform/system/core/+/refs/heads/main/fastboot/README.md + +[6] *ADB (Android Debug Bridge) "a mechanism to keep track of Android devices +and emulators instances connected to or running on a given host developer +machine with ADB protocol"* + +- https://android.googlesource.com/platform/packages/modules/adb/+/refs/heads/main/README.md diff --git a/Documentation/networking/devlink/octeontx2.rst b/Documentation/networking/devlink/octeontx2.rst index d33a90dd44bf..84206537aedb 100644 --- a/Documentation/networking/devlink/octeontx2.rst +++ b/Documentation/networking/devlink/octeontx2.rst @@ -40,6 +40,27 @@ The ``octeontx2 AF`` driver implements the following driver-specific parameters. - runtime - Use to set the quantum which hardware uses for scheduling among transmit queues. Hardware uses weighted DWRR algorithm to schedule among all transmit queues. + * - ``npc_mcam_high_zone_percent`` + - u8 + - runtime + - Use to set the number of high priority zone entries in NPC MCAM that can be allocated + by a user, out of the three priority zone categories high, mid and low. + * - ``npc_def_rule_cntr`` + - bool + - runtime + - Use to enable or disable hit counters for the default rules in NPC MCAM. 
+ Its not guaranteed that counters gets enabled and mapped to all the default rules, + since the counters are scarce and driver follows a best effort approach. + The default rule serves as the primary packet steering rule for a specific PF or VF, + based on its DMAC address which is installed by AF driver as part of its initialization. + Sample command to read hit counters for default rule from debugfs is as follows, + cat /sys/kernel/debug/cn10k/npc/mcam_rules + * - ``nix_maxlf`` + - u16 + - runtime + - Use to set the maximum number of LFs in NIX hardware block. This would be useful + to increase the availability of default resources allocated to enabled LFs like + MCAM entries for example. The ``octeontx2 PF`` driver implements the following driver-specific parameters. diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst index a55bf21f671c..d95363645331 100644 --- a/Documentation/networking/devmem.rst +++ b/Documentation/networking/devmem.rst @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner. Failure to do so will exhaust the limited dmabuf that is bound to the RX queue and will lead to packet drops. +The user must pass no more than 128 tokens, with no more than 1024 total frags +among the token->token_count across all the tokens. If the user provides more +than 1024 frags, the kernel will free up to 1024 frags and return early. + +The kernel returns the number of actual frags freed. The number of frags freed +can be less than the tokens provided by the user in case of: + +(a) an internal kernel leak bug. +(b) the user passed more than 1024 frags. Implementation & Caveats ======================== diff --git a/Documentation/networking/diagnostic/index.rst b/Documentation/networking/diagnostic/index.rst new file mode 100644 index 000000000000..86488aa46b48 --- /dev/null +++ b/Documentation/networking/diagnostic/index.rst @@ -0,0 +1,17 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +====================== +Networking Diagnostics +====================== + +.. toctree:: + :maxdepth: 2 + + twisted_pair_layer1_diagnostics.rst + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/networking/diagnostic/twisted_pair_layer1_diagnostics.rst b/Documentation/networking/diagnostic/twisted_pair_layer1_diagnostics.rst new file mode 100644 index 000000000000..c9be5cc7e113 --- /dev/null +++ b/Documentation/networking/diagnostic/twisted_pair_layer1_diagnostics.rst @@ -0,0 +1,767 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Diagnostic Concept for Investigating Twisted Pair Ethernet Variants at OSI Layer 1 +================================================================================== + +Introduction +------------ + +This documentation is designed for two primary audiences: + +1. **Users and System Administrators**: For those dealing with real-world + Ethernet issues, this guide provides a practical, step-by-step + troubleshooting flow to help identify and resolve common problems in Twisted + Pair Ethernet at OSI Layer 1. If you're facing unstable links, speed drops, + or mysterious network issues, jump right into the step-by-step guide and + follow it through to find your solution. + +2. **Kernel Developers**: For developers working with network drivers and PHY + support, this documentation outlines the diagnostic process and highlights + areas where the Linux kernel’s diagnostic interfaces could be extended or + improved. By understanding the diagnostic flow, developers can better + prioritize future enhancements. 
+ +Step-by-Step Diagnostic Guide from Linux (General Ethernet) +----------------------------------------------------------- + +This diagnostic guide covers common Ethernet troubleshooting scenarios, +focusing on **link stability and detection** across different Ethernet +environments, including **Single-Pair Ethernet (SPE)** and **Multi-Pair +Ethernet (MPE)**, as well as power delivery technologies like **PoDL** (Power +over Data Line) and **PoE** (Clause 33 PSE). + +The guide is designed to help users diagnose physical layer (Layer 1) issues on +systems running **Linux kernel version 6.11 or newer**, utilizing **ethtool +version 6.10 or later** and **iproute2 version 6.4.0 or later**. + +In this guide, we assume that users may have **limited or no access to the link +partner** and will focus on diagnosing issues locally. + +Diagnostic Scenarios +~~~~~~~~~~~~~~~~~~~~ + +- **Link is up and stable, but no data transfer**: If the link is stable but + there are issues with data transmission, refer to the **OSI Layer 2 + Troubleshooting Guide**. + +- **Link is unstable**: Link resets, speed drops, or other fluctuations + indicate potential issues at the hardware or physical layer. + +- **No link detected**: The interface is up, but no link is established. + +Verify Interface Status +~~~~~~~~~~~~~~~~~~~~~~~ + +Begin by verifying the status of the Ethernet interface to check if it is +administratively up. Unlike `ethtool`, which provides information on the link +and PHY status, it does not show the **administrative state** of the interface. +To check this, you should use the `ip` command, which describes the interface +state within the angle brackets `"<>"` in its output. + +For example, in the output `<NO-CARRIER,BROADCAST,MULTICAST,UP>`, the important +keywords are: + +- **UP**: The interface is in the administrative "UP" state. +- **NO-CARRIER**: The interface is administratively up, but no physical link is + detected. 
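As a rough illustration of reading the flags inside the angle brackets, the following Python sketch classifies one line of `ip link show` output. The helper name `classify_ip_link_line` and the labels it returns are assumptions made for this example; they are not part of iproute2 or the kernel.

```python
import re


def classify_ip_link_line(line):
    """Classify the first line of `ip link show dev <interface>` output.

    Returns one of the illustrative labels "admin-down", "no-carrier"
    or "carrier", based only on the flags inside the angle brackets.
    """
    match = re.search(r"<([^>]*)>", line)
    if match is None:
        raise ValueError("no <...> flag list found in line")
    # Compare whole flags, so that LOWER_UP is not mistaken for UP.
    flags = set(match.group(1).split(","))
    if "UP" not in flags:
        return "admin-down"   # interface is administratively down
    if "NO-CARRIER" in flags:
        return "no-carrier"   # administratively up, no physical link
    return "carrier"          # administratively up, NO-CARRIER absent


print(classify_ip_link_line(
    "4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ..."))
```

Splitting the flag list on commas, rather than searching for the substring `UP`, keeps flags such as `LOWER_UP` from being misread as the administrative state.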
+ +If the output shows `<BROADCAST,MULTICAST>`, this indicates the interface is in +the administrative "DOWN" state. + +- **Command:** `ip link show dev <interface>` + +- **Expected Output:** + + .. code-block:: bash + + 4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ... + link/ether 88:14:2b:00:96:f2 brd ff:ff:ff:ff:ff:ff + +- **Interpreting the Output:** + + - **Administrative UP State**: + + - If the output contains **"UP"**, the interface is administratively up, + and the system is trying to establish a physical link. + + - If you also see **"NO-CARRIER"**, it means the physical link has not been + detected, indicating potential Layer 1 issues like a cable fault, + misconfiguration, or no connection at the link partner. In this case, + proceed to the **Inspect Link Status and PHY Configuration** section. + + - **Administrative DOWN State**: + + - If the output lacks **"UP"** and shows only states like + **"<BROADCAST,MULTICAST>"**, it means the interface is administratively + down. In this case, bring the interface up using the following command: + + .. code-block:: bash + + ip link set dev <interface> up + +- **Next Steps**: + + - If the interface is **administratively up** but shows **NO-CARRIER**, + proceed to the **Inspect Link Status and PHY Configuration** section to + troubleshoot potential physical layer issues. + + - If the interface was **administratively down** and you have brought it up, + ensure to **repeat this verification step** to confirm the new state of the + interface before proceeding + + - **If the interface is up and the link is detected**: + + - If the output shows **"UP"** and there is **no `NO-CARRIER`**, the + interface is administratively up, and the physical link has been + successfully established. If everything is working as expected, the Layer + 1 diagnostics are complete, and no further action is needed. 
+ + - If the interface is up and the link is detected but **no data is being + transferred**, the issue is likely beyond Layer 1, and you should proceed + with diagnosing the higher layers of the OSI model. This may involve + checking Layer 2 configurations (such as VLANs or MAC address issues), + Layer 3 settings (like IP addresses, routing, or ARP), or Layer 4 and + above (firewalls, services, etc.). + + - If the **link is unstable** or **frequently resetting or dropping**, this + may indicate a physical layer issue such as a faulty cable, interference, + or power delivery problems. In this case, proceed with the next step in + this guide. + +Inspect Link Status and PHY Configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Use `ethtool -I` to check the link status, PHY configuration, supported link +modes, and additional statistics such as the **Link Down Events** counter. This +step is essential for diagnosing Layer 1 problems such as speed mismatches, +duplex issues, and link instability. + +For both **Single-Pair Ethernet (SPE)** and **Multi-Pair Ethernet (MPE)** +devices, you will use this step to gather key details about the link. **SPE** +links generally support a single speed and mode without autonegotiation (with +the exception of **10BaseT1L**), while **MPE** devices typically support +multiple link modes and autonegotiation. + +- **Command:** `ethtool -I <interface>` + +- **Example Output for SPE Interface (Non-autonegotiation)**: + + .. 
code-block:: bash + + Settings for spe4: + Supported ports: [ TP ] + Supported link modes: 100baseT1/Full + Supported pause frame use: No + Supports auto-negotiation: No + Supported FEC modes: Not reported + Advertised link modes: Not applicable + Advertised pause frame use: No + Advertised auto-negotiation: No + Advertised FEC modes: Not reported + Speed: 100Mb/s + Duplex: Full + Auto-negotiation: off + master-slave cfg: forced slave + master-slave status: slave + Port: Twisted Pair + PHYAD: 6 + Transceiver: external + MDI-X: Unknown + Supports Wake-on: d + Wake-on: d + Link detected: yes + SQI: 7/7 + Link Down Events: 2 + +- **Example Output for MPE Interface (Autonegotiation)**: + + .. code-block:: bash + + Settings for eth1: + Supported ports: [ TP MII ] + Supported link modes: 10baseT/Half 10baseT/Full + 100baseT/Half 100baseT/Full + Supported pause frame use: Symmetric Receive-only + Supports auto-negotiation: Yes + Supported FEC modes: Not reported + Advertised link modes: 10baseT/Half 10baseT/Full + 100baseT/Half 100baseT/Full + Advertised pause frame use: Symmetric Receive-only + Advertised auto-negotiation: Yes + Advertised FEC modes: Not reported + Link partner advertised link modes: 10baseT/Half 10baseT/Full + 100baseT/Half 100baseT/Full + Link partner advertised pause frame use: Symmetric Receive-only + Link partner advertised auto-negotiation: Yes + Link partner advertised FEC modes: Not reported + Speed: 100Mb/s + Duplex: Full + Auto-negotiation: on + Port: Twisted Pair + PHYAD: 10 + Transceiver: internal + MDI-X: Unknown + Supports Wake-on: pg + Wake-on: p + Link detected: yes + Link Down Events: 1 + +- **Next Steps**: + + - Record the output provided by `ethtool`, particularly noting the + **master-slave status**, **speed**, **duplex**, and other relevant fields. + This information will be useful for further analysis or troubleshooting. + Once the **ethtool** output has been collected and stored, move on to the + next diagnostic step. 
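One possible way to record the fields mentioned above is to turn the `ethtool -I` text into a dictionary. The sketch below is only an illustration: the function name `parse_ethtool_fields` is hypothetical, the sample text is an excerpt of the SPE example output, and the parser assumes the simple "Field: value" layout shown in the examples rather than any guaranteed ethtool output format.

```python
def parse_ethtool_fields(text):
    """Turn `ethtool -I <interface>` text into a {field: value} dict.

    Assumption: each field is printed as "Name: value", and a line
    without a colon continues the previous field (for example a second
    row of link modes).
    """
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value:
            last_key = key.strip()
            fields[last_key] = value
        elif not sep and last_key is not None:
            # Continuation line: append to the previous field.
            fields[last_key] += " " + line
        # A header such as "Settings for spe4:" has no value; skip it.
    return fields


# Excerpt of the SPE example output shown above.
SAMPLE = """Settings for spe4:
        Speed: 100Mb/s
        Duplex: Full
        master-slave status: slave
        Link detected: yes
        SQI: 7/7
        Link Down Events: 2
"""

fields = parse_ethtool_fields(SAMPLE)
link_down_events = int(fields["Link Down Events"])
print(fields["Speed"], fields["Duplex"],
      fields["master-slave status"], link_down_events)
```

Storing two such snapshots taken some time apart and comparing their `Link Down Events` values is one simple way to see whether the link dropped in between.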
+ +Check Power Delivery (PoDL or PoE) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If it is known that **PoDL** or **PoE** is **not implemented** on the system, +or the **PSE** (Power Sourcing Equipment) is managed by proprietary user-space +software or external tools, you can skip this step. In such cases, verify power +delivery through alternative methods, such as checking hardware indicators +(LEDs), using multimeters, or consulting vendor-specific software for +monitoring power status. + +If **PoDL** or **PoE** is implemented and managed directly by Linux, follow +these steps to ensure power is being delivered correctly: + +- **Command:** `ethtool --show-pse <interface>` + +- **Expected Output Examples**: + + 1. **PSE Not Supported**: + + If no PSE is attached or the interface does not support PSE, the following + output is expected: + + .. code-block:: bash + + netlink error: No PSE is attached + netlink error: Operation not supported + + 2. **PoDL (Single-Pair Ethernet)**: + + When PoDL is implemented, you might see the following attributes: + + .. code-block:: bash + + PSE attributes for eth1: + PoDL PSE Admin State: enabled + PoDL PSE Power Detection Status: delivering power + + 3. **PoE (Clause 33 PSE)**: + + For standard PoE, the output may look like this: + + .. code-block:: bash + + PSE attributes for eth1: + Clause 33 PSE Admin State: enabled + Clause 33 PSE Power Detection Status: delivering power + Clause 33 PSE Available Power Limit: 18000 + +- **Adjust Power Limit (if needed)**: + + - Sometimes, the available power limit may not be sufficient for the link + partner. You can increase the power limit as needed. + + - **Command:** `ethtool --set-pse <interface> c33-pse-avail-pw-limit <limit>` + + Example: + + .. code-block:: bash + + ethtool --set-pse eth1 c33-pse-avail-pw-limit 18000 + ethtool --show-pse eth1 + + **Expected Output** after adjusting the power limit: + + .. 
code-block:: bash + + Clause 33 PSE Available Power Limit: 18000 + + +- **Next Steps**: + + - **PoE or PoDL Not Used**: If **PoE** or **PoDL** is not implemented or used + on the system, proceed to the next diagnostic step, as power delivery is + not relevant for this setup. + + - **PoE or PoDL Controlled Externally**: If **PoE** or **PoDL** is used but + is not managed by the Linux kernel's **PSE-PD** framework (i.e., it is + controlled by proprietary user-space software or external tools), this part + is out of scope for this documentation. Please consult vendor-specific + documentation or external tools for monitoring and managing power delivery. + + - **PSE Admin State Disabled**: + + - If the `PSE Admin State:` is **disabled**, enable it by running one of + the following commands: + + .. code-block:: bash + + ethtool --set-pse <devname> podl-pse-admin-control enable + + or, for Clause 33 PSE (PoE): + + ethtool --set-pse <devname> c33-pse-admin-control enable + + - After enabling the PSE Admin State, return to the start of the **Check + Power Delivery (PoDL or PoE)** step to recheck the power delivery status. + + - **Power Not Delivered**: If the `Power Detection Status` shows something + other than "delivering power" (e.g., `over current`), troubleshoot the + **PSE**. Check for potential issues such as a short circuit in the cable, + insufficient power delivery, or a fault in the PSE itself. + + - **Power Delivered but No Link**: If power is being delivered but no link is + established, proceed with further diagnostics by performing **Cable + Diagnostics** or reviewing the **Inspect Link Status and PHY + Configuration** steps to identify any underlying issues with the physical + link or settings. + +Cable Diagnostics +~~~~~~~~~~~~~~~~~ + +Use `ethtool` to test for physical layer issues such as cable faults. The test +results can vary depending on the cable's condition, the technology in use, and +the state of the link partner. 
The results from the cable test will help in +diagnosing issues like open circuits, shorts, impedance mismatches, and +noise-related problems. + +- **Command:** `ethtool --cable-test <interface>` + +The following are the typical outputs for **Single-Pair Ethernet (SPE)** and +**Multi-Pair Ethernet (MPE)**: + +- **For Single-Pair Ethernet (SPE)**: + - **Expected Output (SPE)**: + + .. code-block:: bash + + Cable test completed for device eth1. + Pair A, fault length: 25.00m + Pair A code Open Circuit + + This indicates an open circuit or cable fault at the reported distance, but + results can be influenced by the link partner's state. Refer to the + **"Troubleshooting Based on Cable Test Results"** section for further + interpretation of these results. + +- **For Multi-Pair Ethernet (MPE)**: + - **Expected Output (MPE)**: + + .. code-block:: bash + + Cable test completed for device eth0. + Pair A code OK + Pair B code OK + Pair C code Open Circuit + + Here, Pair C is reported as having an open circuit, while Pairs A and B are + functioning correctly. However, if autonegotiation is in use on Pairs A and + B, the cable test may be disrupted. Refer to the **"Troubleshooting Based on + Cable Test Results"** section for a detailed explanation of these issues and + how to resolve them. + +For detailed descriptions of the different possible cable test results, please +refer to the **"Troubleshooting Based on Cable Test Results"** section. + +Troubleshooting Based on Cable Test Results +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +After running the cable test, the results can help identify specific issues in +the physical connection. However, it is important to note that **cable testing +results heavily depend on the capabilities and characteristics of both the +local hardware and the link partner**. The accuracy and reliability of the +results can vary significantly between different hardware implementations. 
+ +In some cases, this can introduce **blind spots** in the current cable testing +implementation, where certain results may not accurately reflect the actual +physical state of the cable. For example: + +- An **Open Circuit** result might not only indicate a damaged or disconnected + cable but also occur if the cable is properly attached to a powered-down link + partner. + +- Some PHYs may report a **Short within Pair** if the link partner is in + **forced slave mode**, even though there is no actual short in the cable. + +To help users interpret the results more effectively, it could be beneficial to +extend the **kernel UAPI** (User API) to provide additional context or +**possible variants** of issues based on the hardware’s characteristics. Since +these quirks are often hardware-specific, the **kernel driver** would be an +ideal source of such information. By providing flags or hints related to +potential false positives for each test result, users would have a better +understanding of what to verify and where to investigate further. + +Until such improvements are made, users should be aware of these limitations +and manually verify cable issues as needed. Physical inspections may help +resolve uncertainties related to false positive results. + +The results can be one of the following: + +- **OK**: + + - The cable is functioning correctly, and no issues were detected. + + - **Next Steps**: If you are still experiencing issues, it might be related + to higher-layer problems, such as duplex mismatches or speed negotiation, + which are not physical-layer issues. + + - **Special Case for `BaseT1` (1000/100/10BaseT1)**: In `BaseT1` systems, an + "OK" result typically also means that the link is up and likely in **slave + mode**, since cable tests usually only pass in this mode. For some + **10BaseT1L** PHYs, an "OK" result may occur even if the cable is too long + for the PHY's configured range (for example, when the range is configured + for short-distance mode). 
+ +- **Open Circuit**: + + - An **Open Circuit** result typically indicates that the cable is damaged or + disconnected at the reported fault length. Consider these possibilities: + + - If the link partner is in **admin down** state or powered off, you might + still get an "Open Circuit" result even if the cable is functional. + + - **Next Steps**: Inspect the cable at the fault length for visible damage + or loose connections. Verify the link partner is powered on and in the + correct mode. + +- **Short within Pair**: + + - A **Short within Pair** indicates an unintended connection within the same + pair of wires, typically caused by physical damage to the cable. + + - **Next Steps**: Replace or repair the cable and check for any physical + damage or improperly crimped connectors. + +- **Short to Another Pair**: + + - A **Short to Another Pair** means the wires from different pairs are + shorted, which could occur due to physical damage or incorrect wiring. + + - **Next Steps**: Replace or repair the damaged cable. Inspect the cable for + incorrect terminations or pinched wiring. + +- **Impedance Mismatch**: + + - **Impedance Mismatch** indicates a reflection caused by an impedance + discontinuity in the cable. This can happen when a part of the cable has + abnormal impedance (e.g., when different cable types are spliced together + or when there is a defect in the cable). + + - **Next Steps**: Check the cable quality and ensure consistent impedance + throughout its length. Replace any sections of the cable that do not meet + specifications. + +- **Noise**: + + - **Noise** means that the Time Domain Reflectometry (TDR) test could not + complete due to excessive noise on the cable, which can be caused by + interference from electromagnetic sources. + + - **Next Steps**: Identify and eliminate sources of electromagnetic + interference (EMI) near the cable. Consider using shielded cables or + rerouting the cable away from noise sources. 
+ +- **Resolution Not Possible**: + + - **Resolution Not Possible** means that the TDR test could not detect the + issue due to the resolution limitations of the test or because the fault is + beyond the distance that the test can measure. + + - **Next Steps**: Inspect the cable manually if possible, or use alternative + diagnostic tools that can handle greater distances or higher resolution. + +- **Unknown**: + + - An **Unknown** result may occur when the test cannot classify the fault or + when a specific issue is outside the scope of the tool's detection + capabilities. + + - **Next Steps**: Re-run the test, verify the link partner's state, and inspect + the cable manually if necessary. + +Verify Link Partner PHY Configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If the cable test passes but the link is still not functioning correctly, it’s +essential to verify the configuration of the link partner’s PHY. Mismatches in +speed, duplex settings, or master-slave roles can cause connection issues. + +Autonegotiation Mismatch +^^^^^^^^^^^^^^^^^^^^^^^^ + +- If both link partners support autonegotiation, ensure that autonegotiation is + enabled on both sides and that all supported link modes are advertised. A + mismatch can lead to connectivity problems or suboptimal performance. + +- **Quick Fix:** Reset autonegotiation to the default settings, which will + advertise all default link modes: + + .. code-block:: bash + + ethtool -s <interface> autoneg on + +- **Command to check configuration:** `ethtool <interface>` + +- **Expected Output:** Ensure that both sides advertise compatible link modes. + If autonegotiation is off, verify that both link partners are configured for + the same speed and duplex. + + The following example shows a case where the local PHY advertises fewer link + modes than it supports. This will reduce the number of overlapping link modes + with the link partner. 
In the worst case, there will be no common link modes, + and the link will not be created: + + .. code-block:: bash + + Settings for eth0: + Supported link modes: 1000baseT/Full, 100baseT/Full + Advertised link modes: 1000baseT/Full + Speed: 1000Mb/s + Duplex: Full + Auto-negotiation: on + +Combined Mode Mismatch (Autonegotiation on One Side, Forced on the Other) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- One possible issue occurs when one side is using **autonegotiation** (as in + most modern systems), and the other side is set to a **forced link mode** + (e.g., older hardware with single-speed hubs). In such cases, modern PHYs + will attempt to detect the forced mode on the other side. If the link is + established, you may notice: + + - **No or empty "Link partner advertised link modes"**. + + - **"Link partner advertised auto-negotiation:"** will be **"no"** or not + present. + +- This type of detection does not always work reliably: + + - Typically, the modern PHY will default to **Half Duplex**, even if the link + partner is actually configured for **Full Duplex**. + + - Some PHYs may not work reliably if the link partner switches from one + forced mode to another. In this case, only a down/up cycle may help. + +- **Next Steps**: Set both sides to the same fixed speed and duplex mode to + avoid potential detection issues. + + .. code-block:: bash + + ethtool -s <interface> speed 1000 duplex full autoneg off + +Master/Slave Role Mismatch (BaseT1 and 1000BaseT PHYs) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- In **BaseT1** systems (e.g., 1000BaseT1, 100BaseT1), link establishment + requires that one device is configured as **master** and the other as + **slave**. A mismatch in this master-slave configuration can prevent the link + from being established. However, **1000BaseT** also supports configurable + master/slave roles and can face similar issues. 
+ +- **Role Preference in 1000BaseT**: The **1000BaseT** specification allows link + partners to negotiate master-slave roles or role preferences during + autonegotiation. Some PHYs have hardware limitations or bugs that prevent + them from functioning properly in certain roles. In such cases, drivers may + force these PHYs into a specific role (e.g., **forced master** or **forced + slave**) or try a weaker option by setting preferences. If both link partners + have the same issue and are forced into the same mode (e.g., both forced into + master mode), they will not be able to establish a link. + +- **Next Steps**: Ensure that one side is configured as **master** and the + other as **slave** to avoid this issue, particularly when hardware + limitations are involved, or try the weaker **preferred** option instead of + **forced**. Check for any driver-related restrictions or forced modes. + +- **Command to force master/slave mode**: + + .. code-block:: bash + + ethtool -s <interface> master-slave forced-master + + or: + + .. code-block:: bash + + ethtool -s <interface> master-slave forced-master speed 1000 duplex full autoneg off + + +- **Check the current master/slave status**: + + .. code-block:: bash + + ethtool <interface> + + Example Output: + + .. code-block:: bash + + master-slave cfg: forced-master + master-slave status: master + +- **Hardware Bugs and Driver Forcing**: If a known hardware issue forces the + PHY into a specific mode, it’s essential to check the driver source code or + hardware documentation for details. Ensure that the roles are compatible + across both link partners, and if both PHYs are forced into the same mode, + adjust one side accordingly to resolve the mismatch. + +Monitor Link Resets and Speed Drops +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If the link is unstable, showing frequent resets or speed drops, this may +indicate issues with the cable, PHY configuration, or environmental factors. 
+While there is still no completely unified way in Linux to directly monitor +downshift events or link speed changes via user space tools, both the Linux +kernel logs and `ethtool` can provide valuable insights, especially if the +driver supports reporting such events. + +- **Monitor Kernel Logs for Link Resets and Speed Drops**: + + - The Linux kernel will print link status changes, including downshift + events, in the system logs. These messages typically include speed changes, + duplex mode, and downshifted link speed (if the driver supports it). + + - **Command to monitor kernel logs in real-time:** + + .. code-block:: bash + + dmesg -w | grep "Link is Up\|Link is Down" + + - Example Output (if a downshift occurs): + + .. code-block:: bash + + eth0: Link is Up - 100Mbps/Full (downshifted) - flow control rx/tx + eth0: Link is Down + + This indicates that the link has been established but has downshifted from + a higher speed. + + - **Note**: Not all drivers or PHYs support downshift reporting, so you may + not see this information for all devices. + +- **Monitor Link Down Events Using `ethtool`**: + + - Starting with the latest kernel and `ethtool` versions, you can track + **Link Down Events** using the `ethtool -I` command. This will provide + counters for link drops, helping to diagnose link instability issues if + supported by the driver. + + - **Command to monitor link down events:** + + .. code-block:: bash + + ethtool -I <interface> + + - Example Output (if supported): + + .. code-block:: bash + + PSE attributes for eth1: + Link Down Events: 5 + + This indicates that the link has dropped 5 times. Frequent link down events + may indicate cable or environmental issues that require further + investigation. + +- **Check Link Status and Speed**: + + - Even though downshift counts or events are not easily tracked, you can + still use `ethtool` to manually check the current link speed and status. 
+ + - **Command:** `ethtool <interface>` + + - **Expected Output:** + + .. code-block:: bash + + Speed: 1000Mb/s + Duplex: Full + Auto-negotiation: on + Link detected: yes + + Any inconsistencies in the expected speed or duplex setting could indicate + an issue. + +- **Disable Energy-Efficient Ethernet (EEE) for Diagnostics**: + + - **EEE** (Energy-Efficient Ethernet) can be a source of link instability due + to transitions in and out of low-power states. For diagnostic purposes, it + may be useful to **temporarily** disable EEE to determine if it is + contributing to link instability. This is **not a generic recommendation** + for disabling power management. + + - **Next Steps**: Disable EEE and monitor if the link becomes stable. If + disabling EEE resolves the issue, report the bug so that the driver can be + fixed. + + - **Command:** + + .. code-block:: bash + + ethtool --set-eee <interface> eee off + + - **Important**: If disabling EEE resolves the instability, the issue should + be reported to the maintainers as a bug, and the driver should be corrected + to handle EEE properly without causing instability. Disabling EEE + permanently should not be seen as a solution. + +- **Monitor Error Counters**: + + - While some NIC drivers and PHYs provide error counters, there is no unified + set of PHY-specific counters across all hardware. Additionally, not all + PHYs provide useful information related to errors like CRC errors, frame + drops, or link flaps. Therefore, this step is dependent on the specific + hardware and driver support. + + - **Next Steps**: Use `ethtool -S <interface>` to check if your driver + provides useful error counters. In some cases, counters may provide + information about errors like link flaps or physical layer problems (e.g., + excessive CRC errors), but results can vary significantly depending on the + PHY. + + - **Command:** `ethtool -S <interface>` + + - **Example Output (if supported)**: + + .. 
code-block:: bash + + rx_crc_errors: 123 + tx_errors: 45 + rx_frame_errors: 78 + + - **Note**: If no meaningful error counters are available or if counters are + not supported, you may need to rely on physical inspections (e.g., cable + condition) or kernel log messages (e.g., link up/down events) to further + diagnose the issue. + +When All Else Fails... +~~~~~~~~~~~~~~~~~~~~~~ + +So you've checked the cables, monitored the logs, disabled EEE, and still... +nothing? Don’t worry, you’re not alone. Sometimes, Ethernet gremlins just don’t +want to cooperate. + +But before you throw in the towel (or the Ethernet cable), take a deep breath. +It’s always possible that: + +1. Your PHY has a unique, undocumented personality. + +2. The problem is lying dormant, waiting for just the right moment to magically + resolve itself (hey, it happens!). + +3. Or, it could be that the ultimate solution simply hasn’t been invented yet. + +If none of the above bring you comfort, there’s one final step: contribute! If +you've uncovered new or unusual issues, or have creative diagnostic methods, +feel free to share your findings and extend this documentation. Together, we +can hunt down every elusive network issue - one twisted pair at a time. + +Remember: sometimes the solution is just a reboot away, but if not, it’s time to +dig deeper - or report that bug! 
+ diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index 295563e91082..b25926071ece 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -236,6 +236,7 @@ Userspace to kernel: ``ETHTOOL_MSG_MM_GET`` get MAC merge layer state ``ETHTOOL_MSG_MM_SET`` set MAC merge layer parameters ``ETHTOOL_MSG_MODULE_FW_FLASH_ACT`` flash transceiver module firmware + ``ETHTOOL_MSG_PHY_GET`` get Ethernet PHY information ===================================== ================================= Kernel to userspace: @@ -283,6 +284,8 @@ Kernel to userspace: ``ETHTOOL_MSG_PLCA_NTF`` PLCA RS parameters ``ETHTOOL_MSG_MM_GET_REPLY`` MAC merge layer status ``ETHTOOL_MSG_MODULE_FW_FLASH_NTF`` transceiver module flash updates + ``ETHTOOL_MSG_PHY_GET_REPLY`` Ethernet PHY information + ``ETHTOOL_MSG_PHY_NTF`` Ethernet PHY information change ======================================== ================================= ``GET`` requests are sent by userspace applications to retrieve device diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 803dfc1efb75..46c178e564b3 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -14,6 +14,7 @@ Contents: can can_ucan_protocol device_drivers/index + diagnostic/index dsa/index devlink/index caif/index diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index eacf8983e230..dcbb6f6caf6d 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -2170,6 +2170,12 @@ nexthop_compat_mode - BOOLEAN understands the new API, this sysctl can be disabled to achieve full performance benefits of the new API by disabling the nexthop expansion and extraneous notifications. + + Note that as a backward-compatible mode, dumping of modern features + might be incomplete or wrong. 
For example, resilient groups will not be + shown as such, but rather as just a list of next hops. Also weights that + do not fit into 8 bits will show incorrectly. + Default: true (backward compat mode) fib_notify_on_flag_change - INTEGER diff --git a/Documentation/networking/kapi.rst b/Documentation/networking/kapi.rst index ea55f462cefa..98682b9a13ee 100644 --- a/Documentation/networking/kapi.rst +++ b/Documentation/networking/kapi.rst @@ -104,6 +104,9 @@ Driver Support .. kernel-doc:: include/linux/netdevice.h :internal: +.. kernel-doc:: include/net/net_shaper.h + :internal: + PHY Support ----------- diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst index dfa5d549be9c..02720dd71a76 100644 --- a/Documentation/networking/napi.rst +++ b/Documentation/networking/napi.rst @@ -192,6 +192,33 @@ is reused to control the delay of the timer, while ``napi_defer_hard_irqs`` controls the number of consecutive empty polls before NAPI gives up and goes back to using hardware IRQs. +The above parameters can also be set on a per-NAPI basis using netlink via +netdev-genl. When used with netlink and configured on a per-NAPI basis, the +parameters mentioned above use hyphens instead of underscores: +``gro-flush-timeout`` and ``napi-defer-hard-irqs``. + +Per-NAPI configuration can be done programmatically in a user application +or by using a script included in the kernel source tree: +``tools/net/ynl/cli.py``. + +For example, using the script: + +.. code-block:: bash + + $ kernel-source/tools/net/ynl/cli.py \ + --spec Documentation/netlink/specs/netdev.yaml \ + --do napi-set \ + --json='{"id": 345, + "defer-hard-irqs": 111, + "gro-flush-timeout": 11111}' + +Similarly, the parameter ``irq-suspend-timeout`` can be set using netlink +via netdev-genl. There is no global sysfs parameter for this value. + +``irq-suspend-timeout`` is used to determine how long an application can +completely suspend IRQs. 
It is used in combination with SO_PREFER_BUSY_POLL, +which can be set on a per-epoll context basis with the ``EPIOCSPARAMS`` ioctl. + .. _poll: Busy polling @@ -207,6 +234,46 @@ selected sockets or using the global ``net.core.busy_poll`` and ``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling also exists. +epoll-based busy polling +------------------------ + +It is possible to trigger packet processing directly from calls to +``epoll_wait``. In order to use this feature, a user application must ensure +all file descriptors which are added to an epoll context have the same NAPI ID. + +If the application uses a dedicated acceptor thread, the application can obtain +the NAPI ID of the incoming connection using SO_INCOMING_NAPI_ID and then +distribute that file descriptor to a worker thread. The worker thread would add +the file descriptor to its epoll context. This would ensure each worker thread +has an epoll context with FDs that have the same NAPI ID. + +Alternatively, if the application uses SO_REUSEPORT, a bpf or ebpf program can +be inserted to distribute incoming connections to threads such that each thread +is only given incoming connections with the same NAPI ID. Care must be taken to +handle cases where a system may have multiple NICs. + +In order to enable busy polling, there are two choices: + +1. ``/proc/sys/net/core/busy_poll`` can be set with a time in microseconds to busy + loop waiting for events. This is a system-wide setting and will cause all + epoll-based applications to busy poll when they call epoll_wait. This may + not be desirable as many applications may not have the need to busy poll. + +2. Applications using recent kernels can issue an ioctl on the epoll context + file descriptor to set (``EPIOCSPARAMS``) or get (``EPIOCGPARAMS``) ``struct + epoll_params``, which user programs can define as follows: + +.. 
code-block:: c + + struct epoll_params { + uint32_t busy_poll_usecs; + uint16_t busy_poll_budget; + uint8_t prefer_busy_poll; + + /* pad the struct to a multiple of 64bits */ + uint8_t __pad; + }; + IRQ mitigation --------------- @@ -222,12 +289,111 @@ Such applications can pledge to the kernel that they will perform a busy polling operation periodically, and the driver should keep the device IRQs permanently masked. This mode is enabled by using the ``SO_PREFER_BUSY_POLL`` socket option. To avoid system misbehavior the pledge is revoked -if ``gro_flush_timeout`` passes without any busy poll call. +if ``gro_flush_timeout`` passes without any busy poll call. For epoll-based +busy polling applications, the ``prefer_busy_poll`` field of ``struct +epoll_params`` can be set to 1 and the ``EPIOCSPARAMS`` ioctl can be issued to +enable this mode. See the above section for more details. The NAPI budget for busy polling is lower than the default (which makes sense given the low latency intention of normal busy polling). This is not the case with IRQ mitigation, however, so the budget can be adjusted -with the ``SO_BUSY_POLL_BUDGET`` socket option. +with the ``SO_BUSY_POLL_BUDGET`` socket option. For epoll-based busy polling +applications, the ``busy_poll_budget`` field can be adjusted to the desired value +in ``struct epoll_params`` and set on a specific epoll context using the ``EPIOCSPARAMS`` +ioctl. See the above section for more details. + +It is important to note that choosing a large value for ``gro_flush_timeout`` +will defer IRQs to allow for better batch processing, but will induce latency +when the system is not fully loaded. Choosing a small value for +``gro_flush_timeout`` can cause interference of the user application which is +attempting to busy poll by device IRQs and softirq processing. This value +should be chosen carefully with these tradeoffs in mind. 
epoll-based busy +polling applications may be able to limit how much user processing happens +by choosing an appropriate value for ``maxevents``. + +Users may want to consider an alternate approach, IRQ suspension, to help deal +with these tradeoffs. + +IRQ suspension +-------------- + +IRQ suspension is a mechanism wherein device IRQs are masked while epoll +triggers NAPI packet processing. + +While application calls to epoll_wait successfully retrieve events, the kernel will +defer the IRQ suspension timer. If the kernel does not retrieve any events +while busy polling (for example, because network traffic levels subsided), IRQ +suspension is disabled and the IRQ mitigation strategies described above are +engaged. + +This allows users to balance CPU consumption with network processing +efficiency. + +To use this mechanism: + + 1. The per-NAPI config parameter ``irq-suspend-timeout`` should be set to the + maximum time (in nanoseconds) the application can have its IRQs + suspended. This is done using netlink, as described above. This timeout + serves as a safety mechanism to restart driver IRQ processing if + the application has stalled. This value should be chosen so that it covers + the amount of time the user application needs to process data from its + call to epoll_wait, noting that applications can control how much data + they retrieve by setting ``maxevents`` when calling epoll_wait. + + 2. The sysfs parameter or per-NAPI config parameters ``gro_flush_timeout`` + and ``napi_defer_hard_irqs`` can be set to low values. They will be used + to defer IRQs after busy poll has found no data. + + 3. The ``prefer_busy_poll`` flag must be set to true. This can be done using + the ``EPIOCSPARAMS`` ioctl as described above. + + 4. The application uses epoll as described above to trigger NAPI packet + processing. 
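Step 3 of the list above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not part of the patch: the helper names ``fill_busy_poll_params`` and ``enable_prefer_busy_poll`` are hypothetical, and the fallback definitions are assumed to mirror the kernel's UAPI header for C libraries that do not yet provide ``struct epoll_params`` and the ioctl numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>

/*
 * Assumption: when the C library headers lack these definitions, the
 * values below match include/uapi/linux/eventpoll.h.
 */
#ifndef EPIOCSPARAMS
struct epoll_params {
	uint32_t busy_poll_usecs;
	uint16_t busy_poll_budget;
	uint8_t prefer_busy_poll;

	/* pad the struct to a multiple of 64bits */
	uint8_t __pad;
};
#define EPOLL_IOC_TYPE 0x8A
#define EPIOCSPARAMS _IOW(EPOLL_IOC_TYPE, 0x01, struct epoll_params)
#define EPIOCGPARAMS _IOR(EPOLL_IOC_TYPE, 0x02, struct epoll_params)
#endif

/* Hypothetical helper: fill the parameters for one epoll context. */
static void fill_busy_poll_params(struct epoll_params *p, uint32_t usecs,
				  uint16_t budget, int prefer)
{
	assert(p != NULL);
	memset(p, 0, sizeof(*p));	/* also zeroes the padding byte */
	p->busy_poll_usecs = usecs;
	p->busy_poll_budget = budget;
	p->prefer_busy_poll = prefer ? 1 : 0;
}

/*
 * Hypothetical helper: set prefer_busy_poll on the epoll context epfd.
 * Returns 0 on success or a negative errno value on failure (for
 * example on kernels that do not implement the ioctl).
 */
static int enable_prefer_busy_poll(int epfd, uint32_t usecs, uint16_t budget)
{
	struct epoll_params params;

	fill_busy_poll_params(&params, usecs, budget, 1);
	if (ioctl(epfd, EPIOCSPARAMS, &params) == -1)
		return -errno;
	return 0;
}
```

An application would call something like ``enable_prefer_busy_poll(epfd, 200, 64)`` once, after creating the epoll context and before entering its ``epoll_wait`` loop; the values 200 and 64 are placeholders, not recommendations. ``irq-suspend-timeout`` itself is still configured separately through netlink, as in step 1.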
+ +As mentioned above, as long as subsequent calls to epoll_wait return events to +userland, the ``irq-suspend-timeout`` is deferred and IRQs are disabled. This +allows the application to process data without interference. + +Once a call to epoll_wait results in no events being found, IRQ suspension is +automatically disabled and the ``gro_flush_timeout`` and +``napi_defer_hard_irqs`` mitigation mechanisms take over. + +It is expected that ``irq-suspend-timeout`` will be set to a value much larger +than ``gro_flush_timeout`` as ``irq-suspend-timeout`` should suspend IRQs for +the duration of one userland processing cycle. + +While it is not strictly necessary to use ``napi_defer_hard_irqs`` and +``gro_flush_timeout`` to use IRQ suspension, their use is strongly +recommended. + +IRQ suspension causes the system to alternate between polling mode and +IRQ-driven packet delivery. During busy periods, ``irq-suspend-timeout`` +overrides ``gro_flush_timeout`` and keeps the system busy polling, but when +epoll finds no events, the settings of ``gro_flush_timeout`` and +``napi_defer_hard_irqs`` determine the next step. + +There are essentially three possible loops for network processing and +packet delivery: + +1) hardirq -> softirq -> napi poll; basic interrupt delivery +2) timer -> softirq -> napi poll; deferred irq processing +3) epoll -> busy-poll -> napi poll; busy looping + +Loop 2 can take control from Loop 1, if ``gro_flush_timeout`` and +``napi_defer_hard_irqs`` are set. + +If ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` are set, Loops 2 +and 3 "wrestle" with each other for control. + +During busy periods, ``irq-suspend-timeout`` is used as the timer in Loop 2, +which essentially tilts network processing in favour of Loop 3. + +If ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` are not set, Loop 3 +cannot take control from Loop 1. 
+ +Therefore, setting ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` is +the recommended usage, because otherwise setting ``irq-suspend-timeout`` +might not have any discernible effect. .. _threaded: diff --git a/Documentation/networking/net_cachelines/inet_connection_sock.rst b/Documentation/networking/net_cachelines/inet_connection_sock.rst index 7a911dc95652..4a15627fc93b 100644 --- a/Documentation/networking/net_cachelines/inet_connection_sock.rst +++ b/Documentation/networking/net_cachelines/inet_connection_sock.rst @@ -5,46 +5,48 @@ inet_connection_sock struct fast path usage breakdown ===================================================== +=================================== ====================== =================== =================== ======================================================================================================================================================== Type Name fastpath_tx_access fastpath_rx_access comment -..struct ..inet_connection_sock -struct_inet_sock icsk_inet read_mostly read_mostly tcp_init_buffer_space,tcp_init_transfer,tcp_finish_connect,tcp_connect,tcp_send_rcvq,tcp_send_syn_data -struct_request_sock_queue icsk_accept_queue - - -struct_inet_bind_bucket icsk_bind_hash read_mostly - tcp_set_state -struct_inet_bind2_bucket icsk_bind2_hash read_mostly - tcp_set_state,inet_put_port -unsigned_long icsk_timeout read_mostly - inet_csk_reset_xmit_timer,tcp_connect -struct_timer_list icsk_retransmit_timer read_mostly - inet_csk_reset_xmit_timer,tcp_connect -struct_timer_list icsk_delack_timer read_mostly - inet_csk_reset_xmit_timer,tcp_connect -u32 icsk_rto read_write - tcp_cwnd_validate,tcp_schedule_loss_probe,tcp_connect_init,tcp_connect,tcp_write_xmit,tcp_push_one -u32 icsk_rto_min - - -u32 icsk_delack_max - - -u32 icsk_pmtu_cookie read_write - tcp_sync_mss,tcp_current_mss,tcp_send_syn_data,tcp_connect_init,tcp_connect -struct_tcp_congestion_ops icsk_ca_ops read_write - 
tcp_cwnd_validate,tcp_tso_segs,tcp_ca_dst_init,tcp_connect_init,tcp_connect,tcp_write_xmit -struct_inet_connection_sock_af_ops icsk_af_ops read_mostly - tcp_finish_connect,tcp_send_syn_data,tcp_mtup_init,tcp_mtu_check_reprobe,tcp_mtu_probe,tcp_connect_init,tcp_connect,__tcp_transmit_skb -struct_tcp_ulp_ops* icsk_ulp_ops - - -void* icsk_ulp_data - - -u8:5 icsk_ca_state read_write - tcp_cwnd_application_limited,tcp_set_ca_state,tcp_enter_cwr,tcp_tso_should_defer,tcp_mtu_probe,tcp_schedule_loss_probe,tcp_write_xmit,__tcp_transmit_skb -u8:1 icsk_ca_initialized read_write - tcp_init_transfer,tcp_init_congestion_control,tcp_init_transfer,tcp_finish_connect,tcp_connect -u8:1 icsk_ca_setsockopt - - -u8:1 icsk_ca_dst_locked write_mostly - tcp_ca_dst_init,tcp_connect_init,tcp_connect -u8 icsk_retransmits write_mostly - tcp_connect_init,tcp_connect -u8 icsk_pending read_write - inet_csk_reset_xmit_timer,tcp_connect,tcp_check_probe_timer,__tcp_push_pending_frames,tcp_rearm_rto,tcp_event_new_data_sent,tcp_event_new_data_sent -u8 icsk_backoff write_mostly - tcp_write_queue_purge,tcp_connect_init -u8 icsk_syn_retries - - -u8 icsk_probes_out - - -u16 icsk_ext_hdr_len read_mostly - __tcp_mtu_to_mss,tcp_mtu_to_rss,tcp_mtu_probe,tcp_write_xmit,tcp_mtu_to_mss, -struct_icsk_ack_u8 pending read_write read_write inet_csk_ack_scheduled,__tcp_cleanup_rbuf,tcp_cleanup_rbuf,inet_csk_clear_xmit_timer,tcp_event_ack-sent,inet_csk_reset_xmit_timer -struct_icsk_ack_u8 quick read_write write_mostly tcp_dec_quickack_mode,tcp_event_ack_sent,__tcp_transmit_skb,__tcp_select_window,__tcp_cleanup_rbuf -struct_icsk_ack_u8 pingpong - - -struct_icsk_ack_u8 retry write_mostly read_write inet_csk_clear_xmit_timer,tcp_rearm_rto,tcp_event_new_data_sent,tcp_write_xmit,__tcp_send_ack,tcp_send_ack, -struct_icsk_ack_u8 ato read_mostly write_mostly tcp_dec_quickack_mode,tcp_event_ack_sent,__tcp_transmit_skb,__tcp_send_ack,tcp_send_ack -struct_icsk_ack_unsigned_long timeout read_write read_write 
inet_csk_reset_xmit_timer,tcp_connect -struct_icsk_ack_u32 lrcvtime read_write - tcp_finish_connect,tcp_connect,tcp_event_data_sent,__tcp_transmit_skb -struct_icsk_ack_u16 rcv_mss write_mostly read_mostly __tcp_select_window,__tcp_cleanup_rbuf,tcp_initialize_rcv_mss,tcp_connect_init -struct_icsk_mtup_int search_high read_write - tcp_mtup_init,tcp_sync_mss,tcp_connect_init,tcp_mtu_check_reprobe,tcp_write_xmit -struct_icsk_mtup_int search_low read_write - tcp_mtu_probe,tcp_mtu_check_reprobe,tcp_write_xmit,tcp_sync_mss,tcp_connect_init,tcp_mtup_init -struct_icsk_mtup_u32:31 probe_size read_write - tcp_mtup_init,tcp_connect_init,__tcp_transmit_skb -struct_icsk_mtup_u32:1 enabled read_write - tcp_mtup_init,tcp_sync_mss,tcp_connect_init,tcp_mtu_probe,tcp_write_xmit -struct_icsk_mtup_u32 probe_timestamp read_write - tcp_mtup_init,tcp_connect_init,tcp_mtu_check_reprobe,tcp_mtu_probe -u32 icsk_probes_tstamp - - -u32 icsk_user_timeout - - -u64[104/sizeof(u64)] icsk_ca_priv - - +=================================== ====================== =================== =================== ======================================================================================================================================================== +struct inet_sock icsk_inet read_mostly read_mostly tcp_init_buffer_space,tcp_init_transfer,tcp_finish_connect,tcp_connect,tcp_send_rcvq,tcp_send_syn_data +struct request_sock_queue icsk_accept_queue +struct inet_bind_bucket icsk_bind_hash read_mostly tcp_set_state +struct inet_bind2_bucket icsk_bind2_hash read_mostly tcp_set_state,inet_put_port +unsigned_long icsk_timeout read_mostly inet_csk_reset_xmit_timer,tcp_connect +struct timer_list icsk_retransmit_timer read_mostly inet_csk_reset_xmit_timer,tcp_connect +struct timer_list icsk_delack_timer read_mostly inet_csk_reset_xmit_timer,tcp_connect +u32 icsk_rto read_write tcp_cwnd_validate,tcp_schedule_loss_probe,tcp_connect_init,tcp_connect,tcp_write_xmit,tcp_push_one +u32 icsk_rto_min +u32 
icsk_delack_max
+u32 icsk_pmtu_cookie read_write tcp_sync_mss,tcp_current_mss,tcp_send_syn_data,tcp_connect_init,tcp_connect
+struct tcp_congestion_ops icsk_ca_ops read_write tcp_cwnd_validate,tcp_tso_segs,tcp_ca_dst_init,tcp_connect_init,tcp_connect,tcp_write_xmit
+struct inet_connection_sock_af_ops icsk_af_ops read_mostly tcp_finish_connect,tcp_send_syn_data,tcp_mtup_init,tcp_mtu_check_reprobe,tcp_mtu_probe,tcp_connect_init,tcp_connect,__tcp_transmit_skb
+struct tcp_ulp_ops* icsk_ulp_ops
+void* icsk_ulp_data
+u8:5 icsk_ca_state read_write tcp_cwnd_application_limited,tcp_set_ca_state,tcp_enter_cwr,tcp_tso_should_defer,tcp_mtu_probe,tcp_schedule_loss_probe,tcp_write_xmit,__tcp_transmit_skb
+u8:1 icsk_ca_initialized read_write tcp_init_transfer,tcp_init_congestion_control,tcp_init_transfer,tcp_finish_connect,tcp_connect
+u8:1 icsk_ca_setsockopt
+u8:1 icsk_ca_dst_locked write_mostly tcp_ca_dst_init,tcp_connect_init,tcp_connect
+u8 icsk_retransmits write_mostly tcp_connect_init,tcp_connect
+u8 icsk_pending read_write inet_csk_reset_xmit_timer,tcp_connect,tcp_check_probe_timer,__tcp_push_pending_frames,tcp_rearm_rto,tcp_event_new_data_sent,tcp_event_new_data_sent
+u8 icsk_backoff write_mostly tcp_write_queue_purge,tcp_connect_init
+u8 icsk_syn_retries
+u8 icsk_probes_out
+u16 icsk_ext_hdr_len read_mostly __tcp_mtu_to_mss,tcp_mtu_to_rss,tcp_mtu_probe,tcp_write_xmit,tcp_mtu_to_mss,
+struct icsk_ack_u8 pending read_write read_write inet_csk_ack_scheduled,__tcp_cleanup_rbuf,tcp_cleanup_rbuf,inet_csk_clear_xmit_timer,tcp_event_ack-sent,inet_csk_reset_xmit_timer
+struct icsk_ack_u8 quick read_write write_mostly tcp_dec_quickack_mode,tcp_event_ack_sent,__tcp_transmit_skb,__tcp_select_window,__tcp_cleanup_rbuf
+struct icsk_ack_u8 pingpong
+struct icsk_ack_u8 retry write_mostly read_write inet_csk_clear_xmit_timer,tcp_rearm_rto,tcp_event_new_data_sent,tcp_write_xmit,__tcp_send_ack,tcp_send_ack,
+struct icsk_ack_u8 ato read_mostly write_mostly
tcp_dec_quickack_mode,tcp_event_ack_sent,__tcp_transmit_skb,__tcp_send_ack,tcp_send_ack
+struct icsk_ack_unsigned_long timeout read_write read_write inet_csk_reset_xmit_timer,tcp_connect
+struct icsk_ack_u32 lrcvtime read_write tcp_finish_connect,tcp_connect,tcp_event_data_sent,__tcp_transmit_skb
+struct icsk_ack_u16 rcv_mss write_mostly read_mostly __tcp_select_window,__tcp_cleanup_rbuf,tcp_initialize_rcv_mss,tcp_connect_init
+struct icsk_mtup_int search_high read_write tcp_mtup_init,tcp_sync_mss,tcp_connect_init,tcp_mtu_check_reprobe,tcp_write_xmit
+struct icsk_mtup_int search_low read_write tcp_mtu_probe,tcp_mtu_check_reprobe,tcp_write_xmit,tcp_sync_mss,tcp_connect_init,tcp_mtup_init
+struct icsk_mtup_u32:31 probe_size read_write tcp_mtup_init,tcp_connect_init,__tcp_transmit_skb
+struct icsk_mtup_u32:1 enabled read_write tcp_mtup_init,tcp_sync_mss,tcp_connect_init,tcp_mtu_probe,tcp_write_xmit
+struct icsk_mtup_u32 probe_timestamp read_write tcp_mtup_init,tcp_connect_init,tcp_mtu_check_reprobe,tcp_mtu_probe
+u32 icsk_probes_tstamp
+u32 icsk_user_timeout
+u64[104/sizeof(u64)] icsk_ca_priv
+=================================== ====================== =================== =================== ========================================================================================================================================================
diff --git a/Documentation/networking/net_cachelines/inet_sock.rst b/Documentation/networking/net_cachelines/inet_sock.rst
index 595d7ef5fc8b..b11bf48fa2b3 100644
--- a/Documentation/networking/net_cachelines/inet_sock.rst
+++ b/Documentation/networking/net_cachelines/inet_sock.rst
@@ -5,40 +5,42 @@
 inet_sock struct fast path usage breakdown
 ==========================================
+======================= ===================== =================== =================== ======================================================================================================
 Type Name fastpath_tx_access fastpath_rx_access comment
-..struct ..inet_sock
-struct_sock sk read_mostly read_mostly tcp_init_buffer_space,tcp_init_transfer,tcp_finish_connect,tcp_connect,tcp_send_rcvq,tcp_send_syn_data
-struct_ipv6_pinfo* pinet6 - -
-be16 inet_sport read_mostly - __tcp_transmit_skb
-be32 inet_daddr read_mostly - ip_select_ident_segs
-be32 inet_rcv_saddr - -
-be16 inet_dport read_mostly - __tcp_transmit_skb
-u16 inet_num - -
-be32 inet_saddr - -
-s16 uc_ttl read_mostly - __ip_queue_xmit/ip_select_ttl
-u16 cmsg_flags - -
-struct_ip_options_rcu* inet_opt read_mostly - __ip_queue_xmit
-u16 inet_id read_mostly - ip_select_ident_segs
-u8 tos read_mostly - ip_queue_xmit
-u8 min_ttl - -
-u8 mc_ttl - -
-u8 pmtudisc - -
-u8:1 recverr - -
-u8:1 is_icsk - -
-u8:1 freebind - -
-u8:1 hdrincl - -
-u8:1 mc_loop - -
-u8:1 transparent - -
-u8:1 mc_all - -
-u8:1 nodefrag - -
-u8:1 bind_address_no_port - -
-u8:1 recverr_rfc4884 - -
-u8:1 defer_connect read_mostly - tcp_sendmsg_fastopen
-u8 rcv_tos - -
-u8 convert_csum - -
-int uc_index - -
-int mc_index - -
-be32 mc_addr - -
-struct_ip_mc_socklist* mc_list - -
-struct_inet_cork_full cork read_mostly - __tcp_transmit_skb
-struct local_port_range - -
+======================= ===================== =================== =================== ======================================================================================================
+struct sock sk read_mostly read_mostly tcp_init_buffer_space,tcp_init_transfer,tcp_finish_connect,tcp_connect,tcp_send_rcvq,tcp_send_syn_data
+struct ipv6_pinfo* pinet6
+be16 inet_sport read_mostly __tcp_transmit_skb
+be32 inet_daddr read_mostly ip_select_ident_segs
+be32 inet_rcv_saddr
+be16 inet_dport read_mostly __tcp_transmit_skb
+u16 inet_num
+be32 inet_saddr
+s16 uc_ttl read_mostly __ip_queue_xmit/ip_select_ttl
+u16 cmsg_flags
+struct ip_options_rcu* inet_opt read_mostly __ip_queue_xmit
+u16 inet_id read_mostly ip_select_ident_segs
+u8 tos read_mostly ip_queue_xmit
+u8 min_ttl
+u8 mc_ttl
+u8 pmtudisc
+u8:1 recverr
+u8:1 is_icsk
+u8:1
freebind
+u8:1 hdrincl
+u8:1 mc_loop
+u8:1 transparent
+u8:1 mc_all
+u8:1 nodefrag
+u8:1 bind_address_no_port
+u8:1 recverr_rfc4884
+u8:1 defer_connect read_mostly tcp_sendmsg_fastopen
+u8 rcv_tos
+u8 convert_csum
+int uc_index
+int mc_index
+be32 mc_addr
+struct ip_mc_socklist* mc_list
+struct inet_cork_full cork read_mostly __tcp_transmit_skb
+struct local_port_range
+======================= ===================== =================== =================== ======================================================================================================
diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
index 22b07c814f4a..15e31ece675f 100644
--- a/Documentation/networking/net_cachelines/net_device.rst
+++ b/Documentation/networking/net_cachelines/net_device.rst
@@ -5,181 +5,188 @@
 net_device struct fast path usage breakdown
 ===========================================
-Type Name fastpath_tx_access fastpath_rx_access Comments
-..struct ..net_device
-unsigned_long:32 priv_flags read_mostly - __dev_queue_xmit(tx)
-unsigned_long:1 lltx read_mostly - HARD_TX_LOCK,HARD_TX_TRYLOCK,HARD_TX_UNLOCK(tx)
-char name[16] - -
-struct_netdev_name_node* name_node
-struct_dev_ifalias* ifalias
-unsigned_long mem_end
-unsigned_long mem_start
-unsigned_long base_addr
-unsigned_long state read_mostly read_mostly netif_running(dev)
-struct_list_head dev_list
-struct_list_head napi_list
-struct_list_head unreg_list
-struct_list_head close_list
-struct_list_head ptype_all read_mostly - dev_nit_active(tx)
-struct_list_head ptype_specific read_mostly deliver_ptype_list_skb/__netif_receive_skb_core(rx)
-struct adj_list
-unsigned_int flags read_mostly read_mostly __dev_queue_xmit,__dev_xmit_skb,ip6_output,__ip6_finish_output(tx);ip6_rcv_core(rx)
-xdp_features_t xdp_features
-struct_net_device_ops* netdev_ops read_mostly - netdev_core_pick_tx,netdev_start_xmit(tx)
-struct_xdp_metadata_ops* xdp_metadata_ops
-int ifindex
- read_mostly ip6_rcv_core
-unsigned_short gflags
-unsigned_short hard_header_len read_mostly read_mostly ip6_xmit(tx);gro_list_prepare(rx)
-unsigned_int mtu read_mostly - ip_finish_output2
-unsigned_short needed_headroom read_mostly - LL_RESERVED_SPACE/ip_finish_output2
-unsigned_short needed_tailroom
-netdev_features_t features read_mostly read_mostly HARD_TX_LOCK,netif_skb_features,sk_setup_caps(tx);netif_elide_gro(rx)
-netdev_features_t hw_features
-netdev_features_t wanted_features
-netdev_features_t vlan_features
-netdev_features_t hw_enc_features - - netif_skb_features
-netdev_features_t mpls_features
-netdev_features_t gso_partial_features read_mostly gso_features_check
-unsigned_int min_mtu
-unsigned_int max_mtu
-unsigned_short type
-unsigned_char min_header_len
-unsigned_char name_assign_type
-int group
-struct_net_device_stats stats
-struct_net_device_core_stats* core_stats
-atomic_t carrier_up_count
-atomic_t carrier_down_count
-struct_iw_handler_def* wireless_handlers
-struct_iw_public_data* wireless_data
-struct_ethtool_ops* ethtool_ops
-struct_l3mdev_ops* l3mdev_ops
-struct_ndisc_ops* ndisc_ops
-struct_xfrmdev_ops* xfrmdev_ops
-struct_tlsdev_ops* tlsdev_ops
-struct_header_ops* header_ops read_mostly - ip_finish_output2,ip6_finish_output2(tx)
-unsigned_char operstate
-unsigned_char link_mode
-unsigned_char if_port
-unsigned_char dma
-unsigned_char perm_addr[32]
-unsigned_char addr_assign_type
-unsigned_char addr_len
-unsigned_char upper_level
-unsigned_char lower_level
-unsigned_short neigh_priv_len
-unsigned_short padded
-unsigned_short dev_id
-unsigned_short dev_port
-spinlock_t addr_list_lock
-int irq
-struct_netdev_hw_addr_list uc
-struct_netdev_hw_addr_list mc
-struct_netdev_hw_addr_list dev_addrs
-struct_kset* queues_kset
-struct_list_head unlink_list
-unsigned_int promiscuity
-unsigned_int allmulti
-bool uc_promisc
-unsigned_char nested_level
-struct_in_device* ip_ptr read_mostly read_mostly __in_dev_get
-struct_inet6_dev* ip6_ptr read_mostly
read_mostly __in6_dev_get
-struct_vlan_info* vlan_info
-struct_dsa_port* dsa_ptr
-struct_tipc_bearer* tipc_ptr
-void* atalk_ptr
-void* ax25_ptr
-struct_wireless_dev* ieee80211_ptr
-struct_wpan_dev* ieee802154_ptr
-struct_mpls_dev* mpls_ptr
-struct_mctp_dev* mctp_ptr
-unsigned_char* dev_addr
-struct_netdev_queue* _rx read_mostly - netdev_get_rx_queue(rx)
-unsigned_int num_rx_queues
-unsigned_int real_num_rx_queues - read_mostly get_rps_cpu
-struct_bpf_prog* xdp_prog - read_mostly netif_elide_gro()
-unsigned_long gro_flush_timeout - read_mostly napi_complete_done
-u32 napi_defer_hard_irqs - read_mostly napi_complete_done
-unsigned_int gro_max_size - read_mostly skb_gro_receive
-unsigned_int gro_ipv4_max_size - read_mostly skb_gro_receive
-rx_handler_func_t* rx_handler read_mostly - __netif_receive_skb_core
-void* rx_handler_data read_mostly -
-struct_netdev_queue* ingress_queue read_mostly -
-struct_bpf_mprog_entry tcx_ingress - read_mostly sch_handle_ingress
-struct_nf_hook_entries* nf_hooks_ingress
-unsigned_char broadcast[32]
-struct_cpu_rmap* rx_cpu_rmap
-struct_hlist_node index_hlist
-struct_netdev_queue* _tx read_mostly - netdev_get_tx_queue(tx)
-unsigned_int num_tx_queues - -
-unsigned_int real_num_tx_queues read_mostly - skb_tx_hash,netdev_core_pick_tx(tx)
-unsigned_int tx_queue_len
-spinlock_t tx_global_lock
-struct_xdp_dev_bulk_queue__percpu* xdp_bulkq
-struct_xps_dev_maps* xps_maps[2] read_mostly - __netif_set_xps_queue
-struct_bpf_mprog_entry tcx_egress read_mostly - sch_handle_egress
-struct_nf_hook_entries* nf_hooks_egress read_mostly -
-struct_hlist_head qdisc_hash[16]
-struct_timer_list watchdog_timer
-int watchdog_timeo
-u32 proto_down_reason
-struct_list_head todo_list
-int__percpu* pcpu_refcnt
-refcount_t dev_refcnt
-struct_ref_tracker_dir refcnt_tracker
-struct_list_head link_watch_list
-enum:8 reg_state
-bool dismantle
-enum:16 rtnl_link_state
-bool needs_free_netdev
-void*priv_destructor struct_net_device
-struct_netpoll_info* npinfo -
read_mostly napi_poll/napi_poll_lock
-possible_net_t nd_net - read_mostly (dev_net)napi_busy_loop,tcp_v(4/6)_rcv,ip(v6)_rcv,ip(6)_input,ip(6)_input_finish
-void* ml_priv
-enum_netdev_ml_priv_type ml_priv_type
-struct_pcpu_lstats__percpu* lstats read_mostly dev_lstats_add()
-struct_pcpu_sw_netstats__percpu* tstats read_mostly dev_sw_netstats_tx_add()
-struct_pcpu_dstats__percpu* dstats
-struct_garp_port* garp_port
-struct_mrp_port* mrp_port
-struct_dm_hw_stat_delta* dm_private
-struct_device dev - -
-struct_attribute_group* sysfs_groups[4]
-struct_attribute_group* sysfs_rx_queue_group
-struct_rtnl_link_ops* rtnl_link_ops
-unsigned_int gso_max_size read_mostly - sk_dst_gso_max_size
-unsigned_int tso_max_size
-u16 gso_max_segs read_mostly - gso_max_segs
-u16 tso_max_segs
-unsigned_int gso_ipv4_max_size read_mostly - sk_dst_gso_max_size
-struct_dcbnl_rtnl_ops* dcbnl_ops
-s16 num_tc read_mostly - skb_tx_hash
-struct_netdev_tc_txq tc_to_txq[16] read_mostly - skb_tx_hash
-u8 prio_tc_map[16]
-unsigned_int fcoe_ddp_xid
-struct_netprio_map* priomap
-struct_phy_device* phydev
-struct_sfp_bus* sfp_bus
-struct_lock_class_key* qdisc_tx_busylock
-bool proto_down
-unsigned:1 wol_enabled
-unsigned:1 threaded - - napi_poll(napi_enable,dev_set_threaded)
-unsigned_long:1 see_all_hwtstamp_requests
-unsigned_long:1 change_proto_down
-unsigned_long:1 netns_local
-unsigned_long:1 fcoe_mtu
-struct_list_head net_notifier_list
-struct_macsec_ops* macsec_ops
-struct_udp_tunnel_nic_info* udp_tunnel_nic_info
-struct_udp_tunnel_nic* udp_tunnel_nic
-unsigned_int xdp_zc_max_segs
-struct_bpf_xdp_entity xdp_state[3]
-u8 dev_addr_shadow[32]
-netdevice_tracker linkwatch_dev_tracker
-netdevice_tracker watchdog_dev_tracker
-netdevice_tracker dev_registered_tracker
-struct_rtnl_hw_stats64* offload_xstats_l3
-struct_devlink_port* devlink_port
-struct_dpll_pin* dpll_pin
+=================================== =========================== =================== ===================
===================================================================================
+Type Name fastpath_tx_access fastpath_rx_access Comments
+=================================== =========================== =================== =================== ===================================================================================
+unsigned_long:32 priv_flags read_mostly __dev_queue_xmit(tx)
+unsigned_long:1 lltx read_mostly HARD_TX_LOCK,HARD_TX_TRYLOCK,HARD_TX_UNLOCK(tx)
+char name[16]
+struct netdev_name_node* name_node
+struct dev_ifalias* ifalias
+unsigned_long mem_end
+unsigned_long mem_start
+unsigned_long base_addr
+unsigned_long state read_mostly read_mostly netif_running(dev)
+struct list_head dev_list
+struct list_head napi_list
+struct list_head unreg_list
+struct list_head close_list
+struct list_head ptype_all read_mostly dev_nit_active(tx)
+struct list_head ptype_specific read_mostly deliver_ptype_list_skb/__netif_receive_skb_core(rx)
+struct adj_list
+unsigned_int flags read_mostly read_mostly __dev_queue_xmit,__dev_xmit_skb,ip6_output,__ip6_finish_output(tx);ip6_rcv_core(rx)
+xdp_features_t xdp_features
+struct net_device_ops* netdev_ops read_mostly netdev_core_pick_tx,netdev_start_xmit(tx)
+struct xdp_metadata_ops* xdp_metadata_ops
+int ifindex read_mostly ip6_rcv_core
+unsigned_short gflags
+unsigned_short hard_header_len read_mostly read_mostly ip6_xmit(tx);gro_list_prepare(rx)
+unsigned_int mtu read_mostly ip_finish_output2
+unsigned_short needed_headroom read_mostly LL_RESERVED_SPACE/ip_finish_output2
+unsigned_short needed_tailroom
+netdev_features_t features read_mostly read_mostly HARD_TX_LOCK,netif_skb_features,sk_setup_caps(tx);netif_elide_gro(rx)
+netdev_features_t hw_features
+netdev_features_t wanted_features
+netdev_features_t vlan_features
+netdev_features_t hw_enc_features netif_skb_features
+netdev_features_t mpls_features
+netdev_features_t gso_partial_features read_mostly gso_features_check
+unsigned_int min_mtu
+unsigned_int
max_mtu
+unsigned_short type
+unsigned_char min_header_len
+unsigned_char name_assign_type
+int group
+struct net_device_stats stats
+struct net_device_core_stats* core_stats
+atomic_t carrier_up_count
+atomic_t carrier_down_count
+struct iw_handler_def* wireless_handlers
+struct ethtool_ops* ethtool_ops
+struct l3mdev_ops* l3mdev_ops
+struct ndisc_ops* ndisc_ops
+struct xfrmdev_ops* xfrmdev_ops
+struct tlsdev_ops* tlsdev_ops
+struct header_ops* header_ops read_mostly ip_finish_output2,ip6_finish_output2(tx)
+unsigned_char operstate
+unsigned_char link_mode
+unsigned_char if_port
+unsigned_char dma
+unsigned_char perm_addr[32]
+unsigned_char addr_assign_type
+unsigned_char addr_len
+unsigned_char upper_level
+unsigned_char lower_level
+unsigned_short neigh_priv_len
+unsigned_short padded
+unsigned_short dev_id
+unsigned_short dev_port
+spinlock_t addr_list_lock
+int irq
+struct netdev_hw_addr_list uc
+struct netdev_hw_addr_list mc
+struct netdev_hw_addr_list dev_addrs
+struct kset* queues_kset
+struct list_head unlink_list
+unsigned_int promiscuity
+unsigned_int allmulti
+bool uc_promisc
+unsigned_char nested_level
+struct in_device* ip_ptr read_mostly read_mostly __in_dev_get
+struct hlist_head fib_nh_head
+struct inet6_dev* ip6_ptr read_mostly read_mostly __in6_dev_get
+struct vlan_info* vlan_info
+struct dsa_port* dsa_ptr
+struct tipc_bearer* tipc_ptr
+void* atalk_ptr
+void* ax25_ptr
+struct wireless_dev* ieee80211_ptr
+struct wpan_dev* ieee802154_ptr
+struct mpls_dev* mpls_ptr
+struct mctp_dev* mctp_ptr
+unsigned_char* dev_addr
+struct netdev_queue* _rx read_mostly netdev_get_rx_queue(rx)
+unsigned_int num_rx_queues
+unsigned_int real_num_rx_queues read_mostly get_rps_cpu
+struct bpf_prog* xdp_prog read_mostly netif_elide_gro()
+unsigned_long gro_flush_timeout read_mostly napi_complete_done
+u32 napi_defer_hard_irqs read_mostly napi_complete_done
+unsigned_int gro_max_size read_mostly skb_gro_receive
+unsigned_int gro_ipv4_max_size read_mostly skb_gro_receive
+rx_handler_func_t* rx_handler read_mostly __netif_receive_skb_core
+void* rx_handler_data read_mostly
+struct netdev_queue* ingress_queue read_mostly
+struct bpf_mprog_entry tcx_ingress read_mostly sch_handle_ingress
+struct nf_hook_entries* nf_hooks_ingress
+unsigned_char broadcast[32]
+struct cpu_rmap* rx_cpu_rmap
+struct hlist_node index_hlist
+struct netdev_queue* _tx read_mostly netdev_get_tx_queue(tx)
+unsigned_int num_tx_queues
+unsigned_int real_num_tx_queues read_mostly skb_tx_hash,netdev_core_pick_tx(tx)
+unsigned_int tx_queue_len
+spinlock_t tx_global_lock
+struct xdp_dev_bulk_queue__percpu* xdp_bulkq
+struct xps_dev_maps* xps_maps[2] read_mostly __netif_set_xps_queue
+struct bpf_mprog_entry tcx_egress read_mostly sch_handle_egress
+struct nf_hook_entries* nf_hooks_egress read_mostly
+struct hlist_head qdisc_hash[16]
+struct timer_list watchdog_timer
+int watchdog_timeo
+u32 proto_down_reason
+struct list_head todo_list
+int__percpu* pcpu_refcnt
+refcount_t dev_refcnt
+struct ref_tracker_dir refcnt_tracker
+struct list_head link_watch_list
+enum:8 reg_state
+bool dismantle
+enum:16 rtnl_link_state
+bool needs_free_netdev
+void*priv_destructor struct net_device
+struct netpoll_info* npinfo read_mostly napi_poll/napi_poll_lock
+possible_net_t nd_net read_mostly (dev_net)napi_busy_loop,tcp_v(4/6)_rcv,ip(v6)_rcv,ip(6)_input,ip(6)_input_finish
+void* ml_priv
+enum_netdev_ml_priv_type ml_priv_type
+struct pcpu_lstats__percpu* lstats read_mostly dev_lstats_add()
+struct pcpu_sw_netstats__percpu* tstats read_mostly dev_sw_netstats_tx_add()
+struct pcpu_dstats__percpu* dstats
+struct garp_port* garp_port
+struct mrp_port* mrp_port
+struct dm_hw_stat_delta* dm_private
+struct device dev
+struct attribute_group* sysfs_groups[4]
+struct attribute_group* sysfs_rx_queue_group
+struct rtnl_link_ops* rtnl_link_ops
+unsigned_int gso_max_size read_mostly sk_dst_gso_max_size
+unsigned_int tso_max_size
+u16 gso_max_segs read_mostly gso_max_segs
+u16 tso_max_segs
+unsigned_int gso_ipv4_max_size read_mostly sk_dst_gso_max_size
+struct dcbnl_rtnl_ops* dcbnl_ops
+s16 num_tc read_mostly skb_tx_hash
+struct netdev_tc_txq tc_to_txq[16] read_mostly skb_tx_hash
+u8 prio_tc_map[16]
+unsigned_int fcoe_ddp_xid
+struct netprio_map* priomap
+struct phy_device* phydev
+struct sfp_bus* sfp_bus
+struct lock_class_key* qdisc_tx_busylock
+bool proto_down
+unsigned:1 wol_enabled
+unsigned:1 threaded napi_poll(napi_enable,dev_set_threaded)
+unsigned_long:1 see_all_hwtstamp_requests
+unsigned_long:1 change_proto_down
+unsigned_long:1 netns_local
+unsigned_long:1 fcoe_mtu
+struct list_head net_notifier_list
+struct macsec_ops* macsec_ops
+struct udp_tunnel_nic_info* udp_tunnel_nic_info
+struct udp_tunnel_nic* udp_tunnel_nic
+unsigned_int xdp_zc_max_segs
+struct bpf_xdp_entity xdp_state[3]
+u8 dev_addr_shadow[32]
+netdevice_tracker linkwatch_dev_tracker
+netdevice_tracker watchdog_dev_tracker
+netdevice_tracker dev_registered_tracker
+struct rtnl_hw_stats64* offload_xstats_l3
+struct devlink_port* devlink_port
+struct dpll_pin* dpll_pin
 struct hlist_head page_pools
 struct dim_irq_moder* irq_moder
+u64 max_pacing_offload_horizon
+struct_napi_config* napi_config
+unsigned_long gro_flush_timeout
+u32 napi_defer_hard_irqs
+struct hlist_head neighbours[2]
+=================================== =========================== =================== =================== ===================================================================================
diff --git a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
index 9b87089a84c6..629da6dc6d74 100644
--- a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
+++ b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
@@ -5,154 +5,156 @@
 netns_ipv4 struct fast path usage breakdown
 ===========================================
+=============================== ============================================ ===================
=================== =================================================
 Type Name fastpath_tx_access fastpath_rx_access comment
-..struct ..netns_ipv4
-struct_inet_timewait_death_row tcp_death_row
-struct_udp_table* udp_table
-struct_ctl_table_header* forw_hdr
-struct_ctl_table_header* frags_hdr
-struct_ctl_table_header* ipv4_hdr
-struct_ctl_table_header* route_hdr
-struct_ctl_table_header* xfrm4_hdr
-struct_ipv4_devconf* devconf_all
-struct_ipv4_devconf* devconf_dflt
-struct_ip_ra_chain ra_chain
-struct_mutex ra_mutex
-struct_fib_rules_ops* rules_ops
-struct_fib_table fib_main
-struct_fib_table fib_default
-unsigned_int fib_rules_require_fldissect
-bool fib_has_custom_rules
-bool fib_has_custom_local_routes
-bool fib_offload_disabled
-atomic_t fib_num_tclassid_users
-struct_hlist_head* fib_table_hash
-struct_sock* fibnl
-struct_sock* mc_autojoin_sk
-struct_inet_peer_base* peers
-struct_fqdir* fqdir
-u8 sysctl_icmp_echo_ignore_all
-u8 sysctl_icmp_echo_enable_probe
-u8 sysctl_icmp_echo_ignore_broadcasts
-u8 sysctl_icmp_ignore_bogus_error_responses
-u8 sysctl_icmp_errors_use_inbound_ifaddr
-int sysctl_icmp_ratelimit
-int sysctl_icmp_ratemask
-u32 ip_rt_min_pmtu - -
-int ip_rt_mtu_expires - -
-int ip_rt_min_advmss - -
-struct_local_ports ip_local_ports - -
-u8 sysctl_tcp_ecn - -
-u8 sysctl_tcp_ecn_fallback - -
-u8 sysctl_ip_default_ttl - - ip4_dst_hoplimit/ip_select_ttl
-u8 sysctl_ip_no_pmtu_disc - -
-u8 sysctl_ip_fwd_use_pmtu read_mostly - ip_dst_mtu_maybe_forward/ip_skb_dst_mtu
-u8 sysctl_ip_fwd_update_priority - - ip_forward
-u8 sysctl_ip_nonlocal_bind - -
-u8 sysctl_ip_autobind_reuse - -
-u8 sysctl_ip_dynaddr - -
-u8 sysctl_ip_early_demux - read_mostly ip(6)_rcv_finish_core
-u8 sysctl_raw_l3mdev_accept - -
-u8 sysctl_tcp_early_demux - read_mostly ip(6)_rcv_finish_core
-u8 sysctl_udp_early_demux
-u8 sysctl_nexthop_compat_mode - -
-u8 sysctl_fwmark_reflect - -
-u8 sysctl_tcp_fwmark_accept - -
-u8 sysctl_tcp_l3mdev_accept - -
-u8 sysctl_tcp_mtu_probing - -
-int
sysctl_tcp_mtu_probe_floor - -
-int sysctl_tcp_base_mss - -
-int sysctl_tcp_min_snd_mss read_mostly - __tcp_mtu_to_mss(tcp_write_xmit)
-int sysctl_tcp_probe_threshold - - tcp_mtu_probe(tcp_write_xmit)
-u32 sysctl_tcp_probe_interval - - tcp_mtu_check_reprobe(tcp_write_xmit)
-int sysctl_tcp_keepalive_time - -
-int sysctl_tcp_keepalive_intvl - -
-u8 sysctl_tcp_keepalive_probes - -
-u8 sysctl_tcp_syn_retries - -
-u8 sysctl_tcp_synack_retries - -
-u8 sysctl_tcp_syncookies - - generated_on_syn
-u8 sysctl_tcp_migrate_req - - reuseport
-u8 sysctl_tcp_comp_sack_nr - - __tcp_ack_snd_check
-int sysctl_tcp_reordering - read_mostly tcp_may_raise_cwnd/tcp_cong_control
-u8 sysctl_tcp_retries1 - -
-u8 sysctl_tcp_retries2 - -
-u8 sysctl_tcp_orphan_retries - -
-u8 sysctl_tcp_tw_reuse - - timewait_sock_ops
-int sysctl_tcp_fin_timeout - - TCP_LAST_ACK/tcp_rcv_state_process
-unsigned_int sysctl_tcp_notsent_lowat read_mostly - tcp_notsent_lowat/tcp_stream_memory_free
-u8 sysctl_tcp_sack - - tcp_syn_options
-u8 sysctl_tcp_window_scaling - - tcp_syn_options,tcp_parse_options
-u8 sysctl_tcp_timestamps
-u8 sysctl_tcp_early_retrans read_mostly - tcp_schedule_loss_probe(tcp_write_xmit)
-u8 sysctl_tcp_recovery - - tcp_fastretrans_alert
-u8 sysctl_tcp_thin_linear_timeouts - - tcp_retrans_timer(on_thin_streams)
-u8 sysctl_tcp_slow_start_after_idle - - unlikely(tcp_cwnd_validate-network-not-starved)
-u8 sysctl_tcp_retrans_collapse - -
-u8 sysctl_tcp_stdurg - - unlikely(tcp_check_urg)
-u8 sysctl_tcp_rfc1337 - -
-u8 sysctl_tcp_abort_on_overflow - -
-u8 sysctl_tcp_fack - -
-int sysctl_tcp_max_reordering - - tcp_check_sack_reordering
-int sysctl_tcp_adv_win_scale - - tcp_init_buffer_space
-u8 sysctl_tcp_dsack - - partial_packet_or_retrans_in_tcp_data_queue
-u8 sysctl_tcp_app_win - - tcp_win_from_space
-u8 sysctl_tcp_frto - - tcp_enter_loss
-u8 sysctl_tcp_nometrics_save - - TCP_LAST_ACK/tcp_update_metrics
-u8 sysctl_tcp_no_ssthresh_metrics_save - - TCP_LAST_ACK/tcp_(update/init)_metrics
+=============================== ============================================ =================== =================== =================================================
+struct_inet_timewait_death_row tcp_death_row
+struct_udp_table* udp_table
+struct_ctl_table_header* forw_hdr
+struct_ctl_table_header* frags_hdr
+struct_ctl_table_header* ipv4_hdr
+struct_ctl_table_header* route_hdr
+struct_ctl_table_header* xfrm4_hdr
+struct_ipv4_devconf* devconf_all
+struct_ipv4_devconf* devconf_dflt
+struct_ip_ra_chain ra_chain
+struct_mutex ra_mutex
+struct_fib_rules_ops* rules_ops
+struct_fib_table fib_main
+struct_fib_table fib_default
+unsigned_int fib_rules_require_fldissect
+bool fib_has_custom_rules
+bool fib_has_custom_local_routes
+bool fib_offload_disabled
+atomic_t fib_num_tclassid_users
+struct_hlist_head* fib_table_hash
+struct_sock* fibnl
+struct_sock* mc_autojoin_sk
+struct_inet_peer_base* peers
+struct_fqdir* fqdir
+u8 sysctl_icmp_echo_ignore_all
+u8 sysctl_icmp_echo_enable_probe
+u8 sysctl_icmp_echo_ignore_broadcasts
+u8 sysctl_icmp_ignore_bogus_error_responses
+u8 sysctl_icmp_errors_use_inbound_ifaddr
+int sysctl_icmp_ratelimit
+int sysctl_icmp_ratemask
+u32 ip_rt_min_pmtu
+int ip_rt_mtu_expires
+int ip_rt_min_advmss
+struct_local_ports ip_local_ports
+u8 sysctl_tcp_ecn
+u8 sysctl_tcp_ecn_fallback
+u8 sysctl_ip_default_ttl ip4_dst_hoplimit/ip_select_ttl
+u8 sysctl_ip_no_pmtu_disc
+u8 sysctl_ip_fwd_use_pmtu read_mostly ip_dst_mtu_maybe_forward/ip_skb_dst_mtu
+u8 sysctl_ip_fwd_update_priority ip_forward
+u8 sysctl_ip_nonlocal_bind
+u8 sysctl_ip_autobind_reuse
+u8 sysctl_ip_dynaddr
+u8 sysctl_ip_early_demux read_mostly ip(6)_rcv_finish_core
+u8 sysctl_raw_l3mdev_accept
+u8 sysctl_tcp_early_demux read_mostly ip(6)_rcv_finish_core
+u8 sysctl_udp_early_demux
+u8 sysctl_nexthop_compat_mode
+u8 sysctl_fwmark_reflect
+u8 sysctl_tcp_fwmark_accept
+u8 sysctl_tcp_l3mdev_accept read_mostly __inet6_lookup_established/inet_request_bound_dev_if
+u8 sysctl_tcp_mtu_probing
+int
sysctl_tcp_mtu_probe_floor
+int sysctl_tcp_base_mss
+int sysctl_tcp_min_snd_mss read_mostly __tcp_mtu_to_mss(tcp_write_xmit)
+int sysctl_tcp_probe_threshold tcp_mtu_probe(tcp_write_xmit)
+u32 sysctl_tcp_probe_interval tcp_mtu_check_reprobe(tcp_write_xmit)
+int sysctl_tcp_keepalive_time
+int sysctl_tcp_keepalive_intvl
+u8 sysctl_tcp_keepalive_probes
+u8 sysctl_tcp_syn_retries
+u8 sysctl_tcp_synack_retries
+u8 sysctl_tcp_syncookies generated_on_syn
+u8 sysctl_tcp_migrate_req reuseport
+u8 sysctl_tcp_comp_sack_nr __tcp_ack_snd_check
+int sysctl_tcp_reordering read_mostly tcp_may_raise_cwnd/tcp_cong_control
+u8 sysctl_tcp_retries1
+u8 sysctl_tcp_retries2
+u8 sysctl_tcp_orphan_retries
+u8 sysctl_tcp_tw_reuse timewait_sock_ops
+int sysctl_tcp_fin_timeout TCP_LAST_ACK/tcp_rcv_state_process
+unsigned_int sysctl_tcp_notsent_lowat read_mostly tcp_notsent_lowat/tcp_stream_memory_free
+u8 sysctl_tcp_sack tcp_syn_options
+u8 sysctl_tcp_window_scaling tcp_syn_options,tcp_parse_options
+u8 sysctl_tcp_timestamps
+u8 sysctl_tcp_early_retrans read_mostly tcp_schedule_loss_probe(tcp_write_xmit)
+u8 sysctl_tcp_recovery tcp_fastretrans_alert
+u8 sysctl_tcp_thin_linear_timeouts tcp_retrans_timer(on_thin_streams)
+u8 sysctl_tcp_slow_start_after_idle unlikely(tcp_cwnd_validate-network-not-starved)
+u8 sysctl_tcp_retrans_collapse
+u8 sysctl_tcp_stdurg unlikely(tcp_check_urg)
+u8 sysctl_tcp_rfc1337
+u8 sysctl_tcp_abort_on_overflow
+u8 sysctl_tcp_fack
+int sysctl_tcp_max_reordering tcp_check_sack_reordering
+int sysctl_tcp_adv_win_scale tcp_init_buffer_space
+u8 sysctl_tcp_dsack partial_packet_or_retrans_in_tcp_data_queue
+u8 sysctl_tcp_app_win tcp_win_from_space
+u8 sysctl_tcp_frto tcp_enter_loss
+u8 sysctl_tcp_nometrics_save TCP_LAST_ACK/tcp_update_metrics
+u8 sysctl_tcp_no_ssthresh_metrics_save TCP_LAST_ACK/tcp_(update/init)_metrics
 u8 sysctl_tcp_moderate_rcvbuf read_mostly read_mostly tcp_tso_should_defer(tx);tcp_rcv_space_adjust(rx)
-u8 sysctl_tcp_tso_win_divisor read_mostly -
tcp_tso_should_defer(tcp_write_xmit)
-u8 sysctl_tcp_workaround_signed_windows - - tcp_select_window
-int sysctl_tcp_limit_output_bytes read_mostly - tcp_small_queue_check(tcp_write_xmit)
-int sysctl_tcp_challenge_ack_limit - -
-int sysctl_tcp_min_rtt_wlen read_mostly - tcp_ack_update_rtt
-u8 sysctl_tcp_min_tso_segs - - unlikely(icsk_ca_ops-written)
-u8 sysctl_tcp_tso_rtt_log read_mostly - tcp_tso_autosize
-u8 sysctl_tcp_autocorking read_mostly - tcp_push/tcp_should_autocork
-u8 sysctl_tcp_reflect_tos - - tcp_v(4/6)_send_synack
-int sysctl_tcp_invalid_ratelimit - -
-int sysctl_tcp_pacing_ss_ratio - - default_cong_cont(tcp_update_pacing_rate)
-int sysctl_tcp_pacing_ca_ratio - - default_cong_cont(tcp_update_pacing_rate)
-int sysctl_tcp_wmem[3] read_mostly - tcp_wmem_schedule(sendmsg/sendpage)
-int sysctl_tcp_rmem[3] - read_mostly __tcp_grow_window(tx),tcp_rcv_space_adjust(rx)
-unsigned_int sysctl_tcp_child_ehash_entries
-unsigned_long sysctl_tcp_comp_sack_delay_ns - - __tcp_ack_snd_check
-unsigned_long sysctl_tcp_comp_sack_slack_ns - - __tcp_ack_snd_check
-int sysctl_max_syn_backlog - -
-int sysctl_tcp_fastopen - -
-struct_tcp_congestion_ops tcp_congestion_control - - init_cc
-struct_tcp_fastopen_context tcp_fastopen_ctx - -
-unsigned_int sysctl_tcp_fastopen_blackhole_timeout - -
-atomic_t tfo_active_disable_times - -
-unsigned_long tfo_active_disable_stamp - -
-u32 tcp_challenge_timestamp - -
-u32 tcp_challenge_count - -
-u8 sysctl_tcp_plb_enabled - -
-u8 sysctl_tcp_plb_idle_rehash_rounds - -
-u8 sysctl_tcp_plb_rehash_rounds - -
-u8 sysctl_tcp_plb_suspend_rto_sec - -
-int sysctl_tcp_plb_cong_thresh - -
-int sysctl_udp_wmem_min
-int sysctl_udp_rmem_min
-u8 sysctl_fib_notify_on_flag_change
-u8 sysctl_udp_l3mdev_accept
-u8 sysctl_igmp_llm_reports
-int sysctl_igmp_max_memberships
-int sysctl_igmp_max_msf
-int sysctl_igmp_qrv
-struct_ping_group_range ping_group_range
-atomic_t dev_addr_genid
-unsigned_int sysctl_udp_child_hash_entries
-unsigned_long*
sysctl_local_reserved_ports
-int sysctl_ip_prot_sock
-struct_mr_table* mrt
-struct_list_head mr_tables
-struct_fib_rules_ops* mr_rules_ops
-u32 sysctl_fib_multipath_hash_fields
-u8 sysctl_fib_multipath_use_neigh
-u8 sysctl_fib_multipath_hash_policy
-struct_fib_notifier_ops* notifier_ops
-unsigned_int fib_seq
-struct_fib_notifier_ops* ipmr_notifier_ops
-unsigned_int ipmr_seq
-atomic_t rt_genid
-siphash_key_t ip_id_key
+u8 sysctl_tcp_tso_win_divisor read_mostly tcp_tso_should_defer(tcp_write_xmit)
+u8 sysctl_tcp_workaround_signed_windows tcp_select_window
+int sysctl_tcp_limit_output_bytes read_mostly tcp_small_queue_check(tcp_write_xmit)
+int sysctl_tcp_challenge_ack_limit
+int sysctl_tcp_min_rtt_wlen read_mostly tcp_ack_update_rtt
+u8 sysctl_tcp_min_tso_segs unlikely(icsk_ca_ops-written)
+u8 sysctl_tcp_tso_rtt_log read_mostly tcp_tso_autosize
+u8 sysctl_tcp_autocorking read_mostly tcp_push/tcp_should_autocork
+u8 sysctl_tcp_reflect_tos tcp_v(4/6)_send_synack
+int sysctl_tcp_invalid_ratelimit
+int sysctl_tcp_pacing_ss_ratio default_cong_cont(tcp_update_pacing_rate)
+int sysctl_tcp_pacing_ca_ratio default_cong_cont(tcp_update_pacing_rate)
+int sysctl_tcp_wmem[3] read_mostly tcp_wmem_schedule(sendmsg/sendpage)
+int sysctl_tcp_rmem[3] read_mostly __tcp_grow_window(tx),tcp_rcv_space_adjust(rx)
+unsigned_int sysctl_tcp_child_ehash_entries
+unsigned_long sysctl_tcp_comp_sack_delay_ns __tcp_ack_snd_check
+unsigned_long sysctl_tcp_comp_sack_slack_ns __tcp_ack_snd_check
+int sysctl_max_syn_backlog
+int sysctl_tcp_fastopen
+struct_tcp_congestion_ops tcp_congestion_control init_cc
+struct_tcp_fastopen_context tcp_fastopen_ctx
+unsigned_int sysctl_tcp_fastopen_blackhole_timeout
+atomic_t tfo_active_disable_times
+unsigned_long tfo_active_disable_stamp
+u32 tcp_challenge_timestamp
+u32 tcp_challenge_count
+u8 sysctl_tcp_plb_enabled
+u8 sysctl_tcp_plb_idle_rehash_rounds
+u8 sysctl_tcp_plb_rehash_rounds
+u8 sysctl_tcp_plb_suspend_rto_sec
+int sysctl_tcp_plb_cong_thresh
+int
sysctl_udp_wmem_min +int sysctl_udp_rmem_min +u8 sysctl_fib_notify_on_flag_change +u8 sysctl_udp_l3mdev_accept +u8 sysctl_igmp_llm_reports +int sysctl_igmp_max_memberships +int sysctl_igmp_max_msf +int sysctl_igmp_qrv +struct_ping_group_range ping_group_range +atomic_t dev_addr_genid +unsigned_int sysctl_udp_child_hash_entries +unsigned_long* sysctl_local_reserved_ports +int sysctl_ip_prot_sock +struct_mr_table* mrt +struct_list_head mr_tables +struct_fib_rules_ops* mr_rules_ops +u32 sysctl_fib_multipath_hash_fields +u8 sysctl_fib_multipath_use_neigh +u8 sysctl_fib_multipath_hash_policy +struct_fib_notifier_ops* notifier_ops +unsigned_int fib_seq +struct_fib_notifier_ops* ipmr_notifier_ops +unsigned_int ipmr_seq +atomic_t rt_genid +siphash_key_t ip_id_key +=============================== ============================================ =================== =================== ================================================= diff --git a/Documentation/networking/net_cachelines/snmp.rst b/Documentation/networking/net_cachelines/snmp.rst index 6a071538566c..90ca2d92547d 100644 --- a/Documentation/networking/net_cachelines/snmp.rst +++ b/Documentation/networking/net_cachelines/snmp.rst @@ -5,131 +5,133 @@ netns_ipv4 enum fast path usage breakdown =========================================== +============== ===================================== =================== =================== ================================================== Type Name fastpath_tx_access fastpath_rx_access comment -..enum -unsigned_long LINUX_MIB_TCPKEEPALIVE write_mostly - tcp_keepalive_timer -unsigned_long LINUX_MIB_DELAYEDACKS write_mostly - tcp_delack_timer_handler,tcp_delack_timer -unsigned_long LINUX_MIB_DELAYEDACKLOCKED write_mostly - tcp_delack_timer_handler,tcp_delack_timer -unsigned_long LINUX_MIB_TCPAUTOCORKING write_mostly - tcp_push,tcp_sendmsg_locked -unsigned_long LINUX_MIB_TCPFROMZEROWINDOWADV write_mostly - tcp_select_window,tcp_transmit-skb -unsigned_long 
LINUX_MIB_TCPTOZEROWINDOWADV write_mostly - tcp_select_window,tcp_transmit-skb -unsigned_long LINUX_MIB_TCPWANTZEROWINDOWADV write_mostly - tcp_select_window,tcp_transmit-skb -unsigned_long LINUX_MIB_TCPORIGDATASENT write_mostly - tcp_write_xmit -unsigned_long LINUX_MIB_TCPHPHITS - write_mostly tcp_rcv_established,tcp_v4_do_rcv,tcp_v6_do_rcv -unsigned_long LINUX_MIB_TCPRCVCOALESCE - write_mostly tcp_try_coalesce,tcp_queue_rcv,tcp_rcv_established -unsigned_long LINUX_MIB_TCPPUREACKS - write_mostly tcp_ack,tcp_rcv_established -unsigned_long LINUX_MIB_TCPHPACKS - write_mostly tcp_ack,tcp_rcv_established -unsigned_long LINUX_MIB_TCPDELIVERED - write_mostly tcp_newly_delivered,tcp_ack,tcp_rcv_established -unsigned_long LINUX_MIB_SYNCOOKIESSENT -unsigned_long LINUX_MIB_SYNCOOKIESRECV -unsigned_long LINUX_MIB_SYNCOOKIESFAILED -unsigned_long LINUX_MIB_EMBRYONICRSTS -unsigned_long LINUX_MIB_PRUNECALLED -unsigned_long LINUX_MIB_RCVPRUNED -unsigned_long LINUX_MIB_OFOPRUNED -unsigned_long LINUX_MIB_OUTOFWINDOWICMPS -unsigned_long LINUX_MIB_LOCKDROPPEDICMPS -unsigned_long LINUX_MIB_ARPFILTER -unsigned_long LINUX_MIB_TIMEWAITED -unsigned_long LINUX_MIB_TIMEWAITRECYCLED -unsigned_long LINUX_MIB_TIMEWAITKILLED -unsigned_long LINUX_MIB_PAWSACTIVEREJECTED -unsigned_long LINUX_MIB_PAWSESTABREJECTED -unsigned_long LINUX_MIB_DELAYEDACKLOST -unsigned_long LINUX_MIB_LISTENOVERFLOWS -unsigned_long LINUX_MIB_LISTENDROPS -unsigned_long LINUX_MIB_TCPRENORECOVERY -unsigned_long LINUX_MIB_TCPSACKRECOVERY -unsigned_long LINUX_MIB_TCPSACKRENEGING -unsigned_long LINUX_MIB_TCPSACKREORDER -unsigned_long LINUX_MIB_TCPRENOREORDER -unsigned_long LINUX_MIB_TCPTSREORDER -unsigned_long LINUX_MIB_TCPFULLUNDO -unsigned_long LINUX_MIB_TCPPARTIALUNDO -unsigned_long LINUX_MIB_TCPDSACKUNDO -unsigned_long LINUX_MIB_TCPLOSSUNDO -unsigned_long LINUX_MIB_TCPLOSTRETRANSMIT -unsigned_long LINUX_MIB_TCPRENOFAILURES -unsigned_long LINUX_MIB_TCPSACKFAILURES -unsigned_long LINUX_MIB_TCPLOSSFAILURES -unsigned_long 
LINUX_MIB_TCPFASTRETRANS -unsigned_long LINUX_MIB_TCPSLOWSTARTRETRANS -unsigned_long LINUX_MIB_TCPTIMEOUTS -unsigned_long LINUX_MIB_TCPLOSSPROBES -unsigned_long LINUX_MIB_TCPLOSSPROBERECOVERY -unsigned_long LINUX_MIB_TCPRENORECOVERYFAIL -unsigned_long LINUX_MIB_TCPSACKRECOVERYFAIL -unsigned_long LINUX_MIB_TCPRCVCOLLAPSED -unsigned_long LINUX_MIB_TCPDSACKOLDSENT -unsigned_long LINUX_MIB_TCPDSACKOFOSENT -unsigned_long LINUX_MIB_TCPDSACKRECV -unsigned_long LINUX_MIB_TCPDSACKOFORECV -unsigned_long LINUX_MIB_TCPABORTONDATA -unsigned_long LINUX_MIB_TCPABORTONCLOSE -unsigned_long LINUX_MIB_TCPABORTONMEMORY -unsigned_long LINUX_MIB_TCPABORTONTIMEOUT -unsigned_long LINUX_MIB_TCPABORTONLINGER -unsigned_long LINUX_MIB_TCPABORTFAILED -unsigned_long LINUX_MIB_TCPMEMORYPRESSURES -unsigned_long LINUX_MIB_TCPMEMORYPRESSURESCHRONO -unsigned_long LINUX_MIB_TCPSACKDISCARD -unsigned_long LINUX_MIB_TCPDSACKIGNOREDOLD -unsigned_long LINUX_MIB_TCPDSACKIGNOREDNOUNDO -unsigned_long LINUX_MIB_TCPSPURIOUSRTOS -unsigned_long LINUX_MIB_TCPMD5NOTFOUND -unsigned_long LINUX_MIB_TCPMD5UNEXPECTED -unsigned_long LINUX_MIB_TCPMD5FAILURE -unsigned_long LINUX_MIB_SACKSHIFTED -unsigned_long LINUX_MIB_SACKMERGED -unsigned_long LINUX_MIB_SACKSHIFTFALLBACK -unsigned_long LINUX_MIB_TCPBACKLOGDROP -unsigned_long LINUX_MIB_PFMEMALLOCDROP -unsigned_long LINUX_MIB_TCPMINTTLDROP -unsigned_long LINUX_MIB_TCPDEFERACCEPTDROP -unsigned_long LINUX_MIB_IPRPFILTER -unsigned_long LINUX_MIB_TCPTIMEWAITOVERFLOW -unsigned_long LINUX_MIB_TCPREQQFULLDOCOOKIES -unsigned_long LINUX_MIB_TCPREQQFULLDROP -unsigned_long LINUX_MIB_TCPRETRANSFAIL -unsigned_long LINUX_MIB_TCPBACKLOGCOALESCE -unsigned_long LINUX_MIB_TCPOFOQUEUE -unsigned_long LINUX_MIB_TCPOFODROP -unsigned_long LINUX_MIB_TCPOFOMERGE -unsigned_long LINUX_MIB_TCPCHALLENGEACK -unsigned_long LINUX_MIB_TCPSYNCHALLENGE -unsigned_long LINUX_MIB_TCPFASTOPENACTIVE -unsigned_long LINUX_MIB_TCPFASTOPENACTIVEFAIL -unsigned_long LINUX_MIB_TCPFASTOPENPASSIVE -unsigned_long 
LINUX_MIB_TCPFASTOPENPASSIVEFAIL -unsigned_long LINUX_MIB_TCPFASTOPENLISTENOVERFLOW -unsigned_long LINUX_MIB_TCPFASTOPENCOOKIEREQD -unsigned_long LINUX_MIB_TCPFASTOPENBLACKHOLE -unsigned_long LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES -unsigned_long LINUX_MIB_BUSYPOLLRXPACKETS -unsigned_long LINUX_MIB_TCPSYNRETRANS -unsigned_long LINUX_MIB_TCPHYSTARTTRAINDETECT -unsigned_long LINUX_MIB_TCPHYSTARTTRAINCWND -unsigned_long LINUX_MIB_TCPHYSTARTDELAYDETECT -unsigned_long LINUX_MIB_TCPHYSTARTDELAYCWND -unsigned_long LINUX_MIB_TCPACKSKIPPEDSYNRECV -unsigned_long LINUX_MIB_TCPACKSKIPPEDPAWS -unsigned_long LINUX_MIB_TCPACKSKIPPEDSEQ -unsigned_long LINUX_MIB_TCPACKSKIPPEDFINWAIT2 -unsigned_long LINUX_MIB_TCPACKSKIPPEDTIMEWAIT -unsigned_long LINUX_MIB_TCPACKSKIPPEDCHALLENGE -unsigned_long LINUX_MIB_TCPWINPROBE -unsigned_long LINUX_MIB_TCPMTUPFAIL -unsigned_long LINUX_MIB_TCPMTUPSUCCESS -unsigned_long LINUX_MIB_TCPDELIVEREDCE -unsigned_long LINUX_MIB_TCPACKCOMPRESSED -unsigned_long LINUX_MIB_TCPZEROWINDOWDROP -unsigned_long LINUX_MIB_TCPRCVQDROP -unsigned_long LINUX_MIB_TCPWQUEUETOOBIG -unsigned_long LINUX_MIB_TCPFASTOPENPASSIVEALTKEY -unsigned_long LINUX_MIB_TCPTIMEOUTREHASH -unsigned_long LINUX_MIB_TCPDUPLICATEDATAREHASH -unsigned_long LINUX_MIB_TCPDSACKRECVSEGS -unsigned_long LINUX_MIB_TCPDSACKIGNOREDDUBIOUS -unsigned_long LINUX_MIB_TCPMIGRATEREQSUCCESS -unsigned_long LINUX_MIB_TCPMIGRATEREQFAILURE -unsigned_long __LINUX_MIB_MAX +============== ===================================== =================== =================== ================================================== +unsigned_long LINUX_MIB_TCPKEEPALIVE write_mostly tcp_keepalive_timer +unsigned_long LINUX_MIB_DELAYEDACKS write_mostly tcp_delack_timer_handler,tcp_delack_timer +unsigned_long LINUX_MIB_DELAYEDACKLOCKED write_mostly tcp_delack_timer_handler,tcp_delack_timer +unsigned_long LINUX_MIB_TCPAUTOCORKING write_mostly tcp_push,tcp_sendmsg_locked +unsigned_long LINUX_MIB_TCPFROMZEROWINDOWADV write_mostly 
tcp_select_window,tcp_transmit-skb +unsigned_long LINUX_MIB_TCPTOZEROWINDOWADV write_mostly tcp_select_window,tcp_transmit-skb +unsigned_long LINUX_MIB_TCPWANTZEROWINDOWADV write_mostly tcp_select_window,tcp_transmit-skb +unsigned_long LINUX_MIB_TCPORIGDATASENT write_mostly tcp_write_xmit +unsigned_long LINUX_MIB_TCPHPHITS write_mostly tcp_rcv_established,tcp_v4_do_rcv,tcp_v6_do_rcv +unsigned_long LINUX_MIB_TCPRCVCOALESCE write_mostly tcp_try_coalesce,tcp_queue_rcv,tcp_rcv_established +unsigned_long LINUX_MIB_TCPPUREACKS write_mostly tcp_ack,tcp_rcv_established +unsigned_long LINUX_MIB_TCPHPACKS write_mostly tcp_ack,tcp_rcv_established +unsigned_long LINUX_MIB_TCPDELIVERED write_mostly tcp_newly_delivered,tcp_ack,tcp_rcv_established +unsigned_long LINUX_MIB_SYNCOOKIESSENT +unsigned_long LINUX_MIB_SYNCOOKIESRECV +unsigned_long LINUX_MIB_SYNCOOKIESFAILED +unsigned_long LINUX_MIB_EMBRYONICRSTS +unsigned_long LINUX_MIB_PRUNECALLED +unsigned_long LINUX_MIB_RCVPRUNED +unsigned_long LINUX_MIB_OFOPRUNED +unsigned_long LINUX_MIB_OUTOFWINDOWICMPS +unsigned_long LINUX_MIB_LOCKDROPPEDICMPS +unsigned_long LINUX_MIB_ARPFILTER +unsigned_long LINUX_MIB_TIMEWAITED +unsigned_long LINUX_MIB_TIMEWAITRECYCLED +unsigned_long LINUX_MIB_TIMEWAITKILLED +unsigned_long LINUX_MIB_PAWSACTIVEREJECTED +unsigned_long LINUX_MIB_PAWSESTABREJECTED +unsigned_long LINUX_MIB_DELAYEDACKLOST +unsigned_long LINUX_MIB_LISTENOVERFLOWS +unsigned_long LINUX_MIB_LISTENDROPS +unsigned_long LINUX_MIB_TCPRENORECOVERY +unsigned_long LINUX_MIB_TCPSACKRECOVERY +unsigned_long LINUX_MIB_TCPSACKRENEGING +unsigned_long LINUX_MIB_TCPSACKREORDER +unsigned_long LINUX_MIB_TCPRENOREORDER +unsigned_long LINUX_MIB_TCPTSREORDER +unsigned_long LINUX_MIB_TCPFULLUNDO +unsigned_long LINUX_MIB_TCPPARTIALUNDO +unsigned_long LINUX_MIB_TCPDSACKUNDO +unsigned_long LINUX_MIB_TCPLOSSUNDO +unsigned_long LINUX_MIB_TCPLOSTRETRANSMIT +unsigned_long LINUX_MIB_TCPRENOFAILURES +unsigned_long LINUX_MIB_TCPSACKFAILURES +unsigned_long 
LINUX_MIB_TCPLOSSFAILURES +unsigned_long LINUX_MIB_TCPFASTRETRANS +unsigned_long LINUX_MIB_TCPSLOWSTARTRETRANS +unsigned_long LINUX_MIB_TCPTIMEOUTS +unsigned_long LINUX_MIB_TCPLOSSPROBES +unsigned_long LINUX_MIB_TCPLOSSPROBERECOVERY +unsigned_long LINUX_MIB_TCPRENORECOVERYFAIL +unsigned_long LINUX_MIB_TCPSACKRECOVERYFAIL +unsigned_long LINUX_MIB_TCPRCVCOLLAPSED +unsigned_long LINUX_MIB_TCPDSACKOLDSENT +unsigned_long LINUX_MIB_TCPDSACKOFOSENT +unsigned_long LINUX_MIB_TCPDSACKRECV +unsigned_long LINUX_MIB_TCPDSACKOFORECV +unsigned_long LINUX_MIB_TCPABORTONDATA +unsigned_long LINUX_MIB_TCPABORTONCLOSE +unsigned_long LINUX_MIB_TCPABORTONMEMORY +unsigned_long LINUX_MIB_TCPABORTONTIMEOUT +unsigned_long LINUX_MIB_TCPABORTONLINGER +unsigned_long LINUX_MIB_TCPABORTFAILED +unsigned_long LINUX_MIB_TCPMEMORYPRESSURES +unsigned_long LINUX_MIB_TCPMEMORYPRESSURESCHRONO +unsigned_long LINUX_MIB_TCPSACKDISCARD +unsigned_long LINUX_MIB_TCPDSACKIGNOREDOLD +unsigned_long LINUX_MIB_TCPDSACKIGNOREDNOUNDO +unsigned_long LINUX_MIB_TCPSPURIOUSRTOS +unsigned_long LINUX_MIB_TCPMD5NOTFOUND +unsigned_long LINUX_MIB_TCPMD5UNEXPECTED +unsigned_long LINUX_MIB_TCPMD5FAILURE +unsigned_long LINUX_MIB_SACKSHIFTED +unsigned_long LINUX_MIB_SACKMERGED +unsigned_long LINUX_MIB_SACKSHIFTFALLBACK +unsigned_long LINUX_MIB_TCPBACKLOGDROP +unsigned_long LINUX_MIB_PFMEMALLOCDROP +unsigned_long LINUX_MIB_TCPMINTTLDROP +unsigned_long LINUX_MIB_TCPDEFERACCEPTDROP +unsigned_long LINUX_MIB_IPRPFILTER +unsigned_long LINUX_MIB_TCPTIMEWAITOVERFLOW +unsigned_long LINUX_MIB_TCPREQQFULLDOCOOKIES +unsigned_long LINUX_MIB_TCPREQQFULLDROP +unsigned_long LINUX_MIB_TCPRETRANSFAIL +unsigned_long LINUX_MIB_TCPBACKLOGCOALESCE +unsigned_long LINUX_MIB_TCPOFOQUEUE +unsigned_long LINUX_MIB_TCPOFODROP +unsigned_long LINUX_MIB_TCPOFOMERGE +unsigned_long LINUX_MIB_TCPCHALLENGEACK +unsigned_long LINUX_MIB_TCPSYNCHALLENGE +unsigned_long LINUX_MIB_TCPFASTOPENACTIVE +unsigned_long LINUX_MIB_TCPFASTOPENACTIVEFAIL +unsigned_long 
LINUX_MIB_TCPFASTOPENPASSIVE +unsigned_long LINUX_MIB_TCPFASTOPENPASSIVEFAIL +unsigned_long LINUX_MIB_TCPFASTOPENLISTENOVERFLOW +unsigned_long LINUX_MIB_TCPFASTOPENCOOKIEREQD +unsigned_long LINUX_MIB_TCPFASTOPENBLACKHOLE +unsigned_long LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES +unsigned_long LINUX_MIB_BUSYPOLLRXPACKETS +unsigned_long LINUX_MIB_TCPSYNRETRANS +unsigned_long LINUX_MIB_TCPHYSTARTTRAINDETECT +unsigned_long LINUX_MIB_TCPHYSTARTTRAINCWND +unsigned_long LINUX_MIB_TCPHYSTARTDELAYDETECT +unsigned_long LINUX_MIB_TCPHYSTARTDELAYCWND +unsigned_long LINUX_MIB_TCPACKSKIPPEDSYNRECV +unsigned_long LINUX_MIB_TCPACKSKIPPEDPAWS +unsigned_long LINUX_MIB_TCPACKSKIPPEDSEQ +unsigned_long LINUX_MIB_TCPACKSKIPPEDFINWAIT2 +unsigned_long LINUX_MIB_TCPACKSKIPPEDTIMEWAIT +unsigned_long LINUX_MIB_TCPACKSKIPPEDCHALLENGE +unsigned_long LINUX_MIB_TCPWINPROBE +unsigned_long LINUX_MIB_TCPMTUPFAIL +unsigned_long LINUX_MIB_TCPMTUPSUCCESS +unsigned_long LINUX_MIB_TCPDELIVEREDCE +unsigned_long LINUX_MIB_TCPACKCOMPRESSED +unsigned_long LINUX_MIB_TCPZEROWINDOWDROP +unsigned_long LINUX_MIB_TCPRCVQDROP +unsigned_long LINUX_MIB_TCPWQUEUETOOBIG +unsigned_long LINUX_MIB_TCPFASTOPENPASSIVEALTKEY +unsigned_long LINUX_MIB_TCPTIMEOUTREHASH +unsigned_long LINUX_MIB_TCPDUPLICATEDATAREHASH +unsigned_long LINUX_MIB_TCPDSACKRECVSEGS +unsigned_long LINUX_MIB_TCPDSACKIGNOREDDUBIOUS +unsigned_long LINUX_MIB_TCPMIGRATEREQSUCCESS +unsigned_long LINUX_MIB_TCPMIGRATEREQFAILURE +unsigned_long __LINUX_MIB_MAX +============== ===================================== =================== =================== ================================================== diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst index 1c154cbd1848..1f79765072b1 100644 --- a/Documentation/networking/net_cachelines/tcp_sock.rst +++ b/Documentation/networking/net_cachelines/tcp_sock.rst @@ -5,153 +5,155 @@ tcp_sock struct fast path usage breakdown 
========================================= +============================= ======================= =================== =================== ================================================================================================================================================================================================================== Type Name fastpath_tx_access fastpath_rx_access Comments -..struct ..tcp_sock -struct_inet_connection_sock inet_conn +============================= ======================= =================== =================== ================================================================================================================================================================================================================== +struct inet_connection_sock inet_conn u16 tcp_header_len read_mostly read_mostly tcp_bound_to_half_wnd,tcp_current_mss(tx);tcp_rcv_established(rx) -u16 gso_segs read_mostly - tcp_xmit_size_goal +u16 gso_segs read_mostly tcp_xmit_size_goal __be32 pred_flags read_write read_mostly tcp_select_window(tx);tcp_rcv_established(rx) -u64 bytes_received - read_write tcp_rcv_nxt_update(rx) -u32 segs_in - read_write tcp_v6_rcv(rx) -u32 data_segs_in - read_write tcp_v6_rcv(rx) +u64 bytes_received read_write tcp_rcv_nxt_update(rx) +u32 segs_in read_write tcp_v6_rcv(rx) +u32 data_segs_in read_write tcp_v6_rcv(rx) u32 rcv_nxt read_mostly read_write tcp_cleanup_rbuf,tcp_send_ack,tcp_inq_hint,tcp_transmit_skb,tcp_receive_window(tx);tcp_v6_do_rcv,tcp_rcv_established,tcp_data_queue,tcp_receive_window,tcp_rcv_nxt_update(write)(rx) -u32 copied_seq - read_mostly tcp_cleanup_rbuf,tcp_rcv_space_adjust,tcp_inq_hint -u32 rcv_wup - read_write __tcp_cleanup_rbuf,tcp_receive_window,tcp_receive_established +u32 copied_seq read_mostly tcp_cleanup_rbuf,tcp_rcv_space_adjust,tcp_inq_hint +u32 rcv_wup read_write __tcp_cleanup_rbuf,tcp_receive_window,tcp_receive_established u32 snd_nxt read_write read_mostly 
tcp_rate_check_app_limited,__tcp_transmit_skb,tcp_event_new_data_sent(write)(tx);tcp_rcv_established,tcp_ack,tcp_clean_rtx_queue(rx) -u32 segs_out read_write - __tcp_transmit_skb -u32 data_segs_out read_write - __tcp_transmit_skb,tcp_update_skb_after_send -u64 bytes_sent read_write - __tcp_transmit_skb -u64 bytes_acked - read_write tcp_snd_una_update/tcp_ack -u32 dsack_dups +u32 segs_out read_write __tcp_transmit_skb +u32 data_segs_out read_write __tcp_transmit_skb,tcp_update_skb_after_send +u64 bytes_sent read_write __tcp_transmit_skb +u64 bytes_acked read_write tcp_snd_una_update/tcp_ack +u32 dsack_dups u32 snd_una read_mostly read_write tcp_wnd_end,tcp_urg_mode,tcp_minshall_check,tcp_cwnd_validate(tx);tcp_ack,tcp_may_update_window,tcp_clean_rtx_queue(write),tcp_ack_tstamp(rx) -u32 snd_sml read_write - tcp_minshall_check,tcp_minshall_update -u32 rcv_tstamp - read_mostly tcp_ack -u32 lsndtime read_write - tcp_slow_start_after_idle_check,tcp_event_data_sent -u32 last_oow_ack_time -u32 compressed_ack_rcv_nxt +u32 snd_sml read_write tcp_minshall_check,tcp_minshall_update +u32 rcv_tstamp read_mostly tcp_ack +u32 lsndtime read_write tcp_slow_start_after_idle_check,tcp_event_data_sent +u32 last_oow_ack_time +u32 compressed_ack_rcv_nxt u32 tsoffset read_mostly read_mostly tcp_established_options(tx);tcp_fast_parse_options(rx) -struct_list_head tsq_node - - -struct_list_head tsorted_sent_queue read_write - tcp_update_skb_after_send -u32 snd_wl1 - read_mostly tcp_may_update_window +struct list_head tsq_node +struct list_head tsorted_sent_queue read_write tcp_update_skb_after_send +u32 snd_wl1 read_mostly tcp_may_update_window u32 snd_wnd read_mostly read_mostly tcp_wnd_end,tcp_tso_should_defer(tx);tcp_fast_path_on(rx) -u32 max_window read_mostly - tcp_bound_to_half_wnd,forced_push +u32 max_window read_mostly tcp_bound_to_half_wnd,forced_push u32 mss_cache read_mostly read_mostly 
tcp_rate_check_app_limited,tcp_current_mss,tcp_sync_mss,tcp_sndbuf_expand,tcp_tso_should_defer(tx);tcp_update_pacing_rate,tcp_clean_rtx_queue(rx) u32 window_clamp read_mostly read_write tcp_rcv_space_adjust,__tcp_select_window -u32 rcv_ssthresh read_mostly - __tcp_select_window +u32 rcv_ssthresh read_mostly __tcp_select_window u8 scaling_ratio read_mostly read_mostly tcp_win_from_space -struct tcp_rack -u16 advmss - read_mostly tcp_rcv_space_adjust -u8 compressed_ack -u8:2 dup_ack_counter -u8:1 tlp_retrans +struct tcp_rack +u16 advmss read_mostly tcp_rcv_space_adjust +u8 compressed_ack +u8:2 dup_ack_counter +u8:1 tlp_retrans u8:1 tcp_usec_ts read_mostly read_mostly -u32 chrono_start read_write - tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data) -u32[3] chrono_stat read_write - tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data) -u8:2 chrono_type read_write - tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data) -u8:1 rate_app_limited - read_write tcp_rate_gen -u8:1 fastopen_connect -u8:1 fastopen_no_cookie -u8:1 is_sack_reneg - read_mostly tcp_skb_entail,tcp_ack -u8:2 fastopen_client_fail -u8:4 nonagle read_write - tcp_skb_entail,tcp_push_pending_frames -u8:1 thin_lto -u8:1 recvmsg_inq -u8:1 repair read_mostly - tcp_write_xmit -u8:1 frto -u8 repair_queue - - -u8:2 save_syn -u8:1 syn_data -u8:1 syn_fastopen -u8:1 syn_fastopen_exp -u8:1 syn_fastopen_ch -u8:1 syn_data_acked -u8:1 is_cwnd_limited read_mostly - tcp_cwnd_validate,tcp_is_cwnd_limited -u32 tlp_high_seq - read_mostly tcp_ack -u32 tcp_tx_delay -u64 tcp_wstamp_ns read_write - tcp_pacing_check,tcp_tso_should_defer,tcp_update_skb_after_send +u32 chrono_start read_write tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data) +u32[3] chrono_stat read_write tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data) +u8:2 chrono_type read_write tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data) 
+u8:1 rate_app_limited read_write tcp_rate_gen +u8:1 fastopen_connect +u8:1 fastopen_no_cookie +u8:1 is_sack_reneg read_mostly tcp_skb_entail,tcp_ack +u8:2 fastopen_client_fail +u8:4 nonagle read_write tcp_skb_entail,tcp_push_pending_frames +u8:1 thin_lto +u8:1 recvmsg_inq +u8:1 repair read_mostly tcp_write_xmit +u8:1 frto +u8 repair_queue +u8:2 save_syn +u8:1 syn_data +u8:1 syn_fastopen +u8:1 syn_fastopen_exp +u8:1 syn_fastopen_ch +u8:1 syn_data_acked +u8:1 is_cwnd_limited read_mostly tcp_cwnd_validate,tcp_is_cwnd_limited +u32 tlp_high_seq read_mostly tcp_ack +u32 tcp_tx_delay +u64 tcp_wstamp_ns read_write tcp_pacing_check,tcp_tso_should_defer,tcp_update_skb_after_send u64 tcp_clock_cache read_write read_write tcp_mstamp_refresh(tcp_write_xmit/tcp_rcv_space_adjust),__tcp_transmit_skb,tcp_tso_should_defer;timer u64 tcp_mstamp read_write read_write tcp_mstamp_refresh(tcp_write_xmit/tcp_rcv_space_adjust)(tx);tcp_rcv_space_adjust,tcp_rate_gen,tcp_clean_rtx_queue,tcp_ack_update_rtt/tcp_time_stamp(rx);timer u32 srtt_us read_mostly read_write tcp_tso_should_defer(tx);tcp_update_pacing_rate,__tcp_set_rto,tcp_rtt_estimator(rx) -u32 mdev_us read_write - tcp_rtt_estimator -u32 mdev_max_us -u32 rttvar_us - read_mostly __tcp_set_rto +u32 mdev_us read_write tcp_rtt_estimator +u32 mdev_max_us +u32 rttvar_us read_mostly __tcp_set_rto u32 rtt_seq read_write tcp_rtt_estimator -struct_minmax rtt_min - read_mostly tcp_min_rtt/tcp_rate_gen,tcp_min_rtttcp_update_rtt_min +struct minmax rtt_min read_mostly tcp_min_rtt/tcp_rate_gen,tcp_min_rtttcp_update_rtt_min u32 packets_out read_write read_write tcp_packets_in_flight(tx/rx);tcp_slow_start_after_idle_check,tcp_nagle_check,tcp_rate_skb_sent,tcp_event_new_data_sent,tcp_cwnd_validate,tcp_write_xmit(tx);tcp_ack,tcp_clean_rtx_queue,tcp_update_pacing_rate(rx) -u32 retrans_out - read_mostly tcp_packets_in_flight,tcp_rate_check_app_limited -u32 max_packets_out - read_write tcp_cwnd_validate -u32 cwnd_usage_seq - read_write tcp_cwnd_validate 
-u16 urg_data - read_mostly tcp_fast_path_check -u8 ecn_flags read_write - tcp_ecn_send -u8 keepalive_probes -u32 reordering read_mostly - tcp_sndbuf_expand -u32 reord_seen +u32 retrans_out read_mostly tcp_packets_in_flight,tcp_rate_check_app_limited +u32 max_packets_out read_write tcp_cwnd_validate +u32 cwnd_usage_seq read_write tcp_cwnd_validate +u16 urg_data read_mostly tcp_fast_path_check +u8 ecn_flags read_write tcp_ecn_send +u8 keepalive_probes +u32 reordering read_mostly tcp_sndbuf_expand +u32 reord_seen u32 snd_up read_write read_mostly tcp_mark_urg,tcp_urg_mode,__tcp_transmit_skb(tx);tcp_clean_rtx_queue(rx) -struct_tcp_options_received rx_opt read_mostly read_write tcp_established_options(tx);tcp_fast_path_on,tcp_ack_update_window,tcp_is_sack,tcp_data_queue,tcp_rcv_established,tcp_ack_update_rtt(rx) -u32 snd_ssthresh - read_mostly tcp_update_pacing_rate +struct tcp_options_received rx_opt read_mostly read_write tcp_established_options(tx);tcp_fast_path_on,tcp_ack_update_window,tcp_is_sack,tcp_data_queue,tcp_rcv_established,tcp_ack_update_rtt(rx) +u32 snd_ssthresh read_mostly tcp_update_pacing_rate u32 snd_cwnd read_mostly read_mostly tcp_snd_cwnd,tcp_rate_check_app_limited,tcp_tso_should_defer(tx);tcp_update_pacing_rate -u32 snd_cwnd_cnt -u32 snd_cwnd_clamp -u32 snd_cwnd_used -u32 snd_cwnd_stamp -u32 prior_cwnd -u32 prr_delivered +u32 snd_cwnd_cnt +u32 snd_cwnd_clamp +u32 snd_cwnd_used +u32 snd_cwnd_stamp +u32 prior_cwnd +u32 prr_delivered u32 prr_out read_mostly read_mostly tcp_rate_skb_sent,tcp_newly_delivered(tx);tcp_ack,tcp_rate_gen,tcp_clean_rtx_queue(rx) u32 delivered read_mostly read_write tcp_rate_skb_sent, tcp_newly_delivered(tx);tcp_ack, tcp_rate_gen, tcp_clean_rtx_queue (rx) u32 delivered_ce read_mostly read_write tcp_rate_skb_sent(tx);tcp_rate_gen(rx) -u32 lost - read_mostly tcp_ack +u32 lost read_mostly tcp_ack u32 app_limited read_write read_mostly tcp_rate_check_app_limited,tcp_rate_skb_sent(tx);tcp_rate_gen(rx) -u64 first_tx_mstamp 
read_write - tcp_rate_skb_sent -u64 delivered_mstamp read_write - tcp_rate_skb_sent -u32 rate_delivered - read_mostly tcp_rate_gen -u32 rate_interval_us - read_mostly rate_delivered,rate_app_limited +u64 first_tx_mstamp read_write tcp_rate_skb_sent +u64 delivered_mstamp read_write tcp_rate_skb_sent +u32 rate_delivered read_mostly tcp_rate_gen +u32 rate_interval_us read_mostly rate_delivered,rate_app_limited u32 rcv_wnd read_write read_mostly tcp_select_window,tcp_receive_window,tcp_fast_path_check -u32 write_seq read_write - tcp_rate_check_app_limited,tcp_write_queue_empty,tcp_skb_entail,forced_push,tcp_mark_push -u32 notsent_lowat read_mostly - tcp_stream_memory_free -u32 pushed_seq read_write - tcp_mark_push,forced_push +u32 write_seq read_write tcp_rate_check_app_limited,tcp_write_queue_empty,tcp_skb_entail,forced_push,tcp_mark_push +u32 notsent_lowat read_mostly tcp_stream_memory_free +u32 pushed_seq read_write tcp_mark_push,forced_push u32 lost_out read_mostly read_mostly tcp_left_out(tx);tcp_packets_in_flight(tx/rx);tcp_rate_check_app_limited(rx) u32 sacked_out read_mostly read_mostly tcp_left_out(tx);tcp_packets_in_flight(tx/rx);tcp_clean_rtx_queue(rx) -struct_hrtimer pacing_timer -struct_hrtimer compressed_ack_timer -struct_sk_buff* lost_skb_hint read_mostly tcp_clean_rtx_queue -struct_sk_buff* retransmit_skb_hint read_mostly - tcp_clean_rtx_queue -struct_rb_root out_of_order_queue - read_mostly tcp_data_queue,tcp_fast_path_check -struct_sk_buff* ooo_last_skb -struct_tcp_sack_block[1] duplicate_sack -struct_tcp_sack_block[4] selective_acks -struct_tcp_sack_block[4] recv_sack_cache -struct_sk_buff* highest_sack read_write - tcp_event_new_data_sent -int lost_cnt_hint -u32 prior_ssthresh -u32 high_seq -u32 retrans_stamp -u32 undo_marker -int undo_retrans -u64 bytes_retrans -u32 total_retrans -u32 rto_stamp -u16 total_rto -u16 total_rto_recoveries -u32 total_rto_time -u32 urg_seq - - -unsigned_int keepalive_time -unsigned_int keepalive_intvl -int linger2 -u8 
bpf_sock_ops_cb_flags -u8:1 bpf_chg_cc_inprogress -u16 timeout_rehash -u32 rcv_ooopack -u32 rcv_rtt_last_tsecr -struct rcv_rtt_est - read_write tcp_rcv_space_adjust,tcp_rcv_established -struct rcvq_space - read_write tcp_rcv_space_adjust -struct mtu_probe -u32 plb_rehash -u32 mtu_info -bool is_mptcp -bool smc_hs_congested -bool syn_smc -struct_tcp_sock_af_ops* af_specific -struct_tcp_md5sig_info* md5sig_info -struct_tcp_fastopen_request* fastopen_req -struct_request_sock* fastopen_rsk -struct_saved_syn* saved_syn
\ No newline at end of file +struct hrtimer pacing_timer +struct hrtimer compressed_ack_timer +struct sk_buff* lost_skb_hint read_mostly tcp_clean_rtx_queue +struct sk_buff* retransmit_skb_hint read_mostly tcp_clean_rtx_queue +struct rb_root out_of_order_queue read_mostly tcp_data_queue,tcp_fast_path_check +struct sk_buff* ooo_last_skb +struct tcp_sack_block[1] duplicate_sack +struct tcp_sack_block[4] selective_acks +struct tcp_sack_block[4] recv_sack_cache +struct sk_buff* highest_sack read_write tcp_event_new_data_sent +int lost_cnt_hint +u32 prior_ssthresh +u32 high_seq +u32 retrans_stamp +u32 undo_marker +int undo_retrans +u64 bytes_retrans +u32 total_retrans +u32 rto_stamp +u16 total_rto +u16 total_rto_recoveries +u32 total_rto_time +u32 urg_seq +unsigned_int keepalive_time +unsigned_int keepalive_intvl +int linger2 +u8 bpf_sock_ops_cb_flags +u8:1 bpf_chg_cc_inprogress +u16 timeout_rehash +u32 rcv_ooopack +u32 rcv_rtt_last_tsecr +struct rcv_rtt_est read_write tcp_rcv_space_adjust,tcp_rcv_established +struct rcvq_space read_write tcp_rcv_space_adjust +struct mtu_probe +u32 plb_rehash +u32 mtu_info +bool is_mptcp +bool smc_hs_congested +bool syn_smc +struct tcp_sock_af_ops* af_specific +struct tcp_md5sig_info* md5sig_info +struct tcp_fastopen_request* fastopen_req +struct request_sock* fastopen_rsk +struct saved_syn* saved_syn +============================= ======================= =================== =================== ================================================================================================================================================================================================================== diff --git a/Documentation/networking/net_dim.rst b/Documentation/networking/net_dim.rst index 8908fd7b0a8d..4377998e6826 100644 --- a/Documentation/networking/net_dim.rst +++ b/Documentation/networking/net_dim.rst @@ -156,7 +156,7 @@ usage is not complete but it should make the outline of the usage clear. 
                           my_entity->bytes, &dim_sample);
 	/* Call net DIM */
-	net_dim(&my_entity->dim, dim_sample);
+	net_dim(&my_entity->dim, &dim_sample);
 	...
   }
diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
index 8199e6917671..b37bfbfc7d79 100644
--- a/Documentation/networking/timestamping.rst
+++ b/Documentation/networking/timestamping.rst
@@ -194,6 +194,20 @@ SOF_TIMESTAMPING_OPT_ID:
   among all possibly concurrently outstanding timestamp requests for
   that socket.
 
+  The process can optionally override the default generated ID, by
+  passing a specific ID with control message SCM_TS_OPT_ID (not
+  supported for TCP sockets)::
+
+    struct msghdr *msg;
+    ...
+    cmsg = CMSG_FIRSTHDR(msg);
+    cmsg->cmsg_level = SOL_SOCKET;
+    cmsg->cmsg_type = SCM_TS_OPT_ID;
+    cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
+    *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
+    err = sendmsg(fd, msg, 0);
+
+
 SOF_TIMESTAMPING_OPT_ID_TCP:
   Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
   timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
diff --git a/Documentation/networking/tipc.rst b/Documentation/networking/tipc.rst
index ab63d298cca2..9b375b9b9981 100644
--- a/Documentation/networking/tipc.rst
+++ b/Documentation/networking/tipc.rst
@@ -112,7 +112,7 @@ More Information
 
 - How to contribute to TIPC:
 
--  http://tipc.io/contacts.html
+  http://tipc.io/contacts.html
 
 - More details about TIPC specification:
 
diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst
index 5f0dea3d571e..7354d48cdf92 100644
--- a/Documentation/networking/tls-offload.rst
+++ b/Documentation/networking/tls-offload.rst
@@ -51,7 +51,7 @@ and send them to the device for encryption and transmission.
 RX
 --
 
-On the receive side if the device handled decryption and authentication
+On the receive side, if the device handled decryption and authentication
 successfully, the driver will set the decrypted bit in the associated
 :c:type:`struct sk_buff <sk_buff>`. The packets reach the TCP stack and
 are handled normally. ``ktls`` is informed when data is queued to the socket
@@ -120,8 +120,9 @@ before installing the connection state in the kernel.
 RX
 --
 
-In RX direction local networking stack has little control over the segmentation,
-so the initial records' TCP sequence number may be anywhere inside the segment.
+In the RX direction, the local networking stack has little control over
+segmentation, so the initial records' TCP sequence number may be anywhere
+inside the segment.
 
 Normal operation
 ================
@@ -138,8 +139,8 @@ There are no guarantees on record length or record segmentation. In particular
 segments may start at any point of a record and contain any number of records.
 Assuming segments are received in order, the device should be able to perform
 crypto operations and authentication regardless of segmentation. For this
-to be possible device has to keep small amount of segment-to-segment state.
-This includes at least:
+to be possible, the device has to keep a small amount of segment-to-segment
+state. This includes at least:
 
  * partial headers (if a segment carried only a part of the TLS header)
  * partial data block
@@ -175,12 +176,12 @@ and packet transformation functions) the device validates the Layer 4
 checksum and performs a 5-tuple lookup to find any TLS connection the packet
 may belong to (technically a 4-tuple
 lookup is sufficient - IP addresses and TCP port numbers, as the protocol
-is always TCP). If connection is matched device confirms if the TCP sequence
-number is the expected one and proceeds to TLS handling (record delineation,
-decryption, authentication for each record in the packet). The device leaves
-the record framing unmodified, the stack takes care of record decapsulation.
-Device indicates successful handling of TLS offload in the per-packet context
-(descriptor) passed to the host.
+is always TCP). If the packet is matched to a connection, the device confirms
+if the TCP sequence number is the expected one and proceeds to TLS handling
+(record delineation, decryption, authentication for each record in the packet).
+The device leaves the record framing unmodified, the stack takes care of record
+decapsulation. Device indicates successful handling of TLS offload in the
+per-packet context (descriptor) passed to the host.
 
 Upon reception of a TLS offloaded packet, the driver sets
 the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
@@ -439,7 +440,7 @@ by the driver:
  * ``rx_tls_resync_req_end`` - number of times the TLS async resync request
    properly ended with providing the HW tracked tcp-seq.
  * ``rx_tls_resync_req_skip`` - number of times the TLS async resync request
-   procedure was started by not properly ended.
+   procedure was started but not properly ended.
  * ``rx_tls_resync_res_ok`` - number of times the TLS resync response call to
    the driver was successfully handled.
  * ``rx_tls_resync_res_skip`` - number of times the TLS resync response call to
@@ -507,8 +508,8 @@ in packets as seen on the wire.
 Transport layer transparency
 ----------------------------
 
-The device should not modify any packet headers for the purpose
-of the simplifying TLS offload.
+For the purpose of simplifying TLS offload, the device should not modify any
+packet headers.
 
 The device should not depend on any packet headers beyond what
 is strictly necessary for TLS offload.
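
Illustrative note on the tls-offload.rst RX hunk above: the text says the device matches a packet to a TLS connection by a 4-tuple (IP addresses and TCP ports), then confirms the TCP sequence number is the expected one before proceeding to TLS handling. The C sketch below models only that decision. Every name in it (``flow_key``, ``tls_rx_conn``, ``rx_classify``, the verdict enum) is hypothetical and is not kernel or driver API. It assumes IPv4 addresses and a linear table scan purely for brevity, and it says nothing about what a real device does after an out-of-order segment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple key; the protocol is always TCP, so it is omitted. */
struct flow_key {
	uint32_t saddr, daddr;	/* IPv4 only, an assumption for brevity */
	uint16_t sport, dport;
};

/* Hypothetical per-connection RX state kept by the device. */
struct tls_rx_conn {
	struct flow_key key;
	uint32_t expected_seq;	/* next in-order TCP sequence number */
};

enum rx_verdict {
	RX_NOT_TLS,		/* no offloaded connection matches the packet */
	RX_OUT_OF_ORDER,	/* matched, but the sequence is not the expected one */
	RX_TLS_HANDLE,		/* matched and in order: proceed to TLS handling */
};

static bool flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* A linear scan stands in for whatever lookup structure real hardware uses. */
static enum rx_verdict rx_classify(struct tls_rx_conn *tbl, size_t n,
				   const struct flow_key *key,
				   uint32_t seq, uint32_t payload_len)
{
	for (size_t i = 0; i < n; i++) {
		if (!flow_key_equal(&tbl[i].key, key))
			continue;
		if (seq != tbl[i].expected_seq)
			return RX_OUT_OF_ORDER;
		/* Advance past this segment; unsigned arithmetic wraps mod 2^32. */
		tbl[i].expected_seq = seq + payload_len;
		return RX_TLS_HANDLE;
	}
	return RX_NOT_TLS;
}
```

A caller would fill ``tbl`` when a connection is offloaded and call ``rx_classify()`` once per received segment. Only the ``RX_TLS_HANDLE`` case corresponds to the record delineation, decryption and authentication path described in the hunk.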