diff options
Diffstat (limited to 'Documentation/networking')
| -rw-r--r-- | Documentation/networking/bonding.rst | 20 | ||||
| -rw-r--r-- | Documentation/networking/can.rst | 2 | ||||
| -rw-r--r-- | Documentation/networking/device_drivers/can/can327.rst | 331 | ||||
| -rw-r--r-- | Documentation/networking/device_drivers/can/index.rst | 1 | ||||
| -rw-r--r-- | Documentation/networking/device_drivers/ethernet/index.rst | 2 | ||||
| -rw-r--r-- | Documentation/networking/device_drivers/ethernet/intel/ice.rst | 9 | ||||
| -rw-r--r-- | Documentation/networking/device_drivers/ethernet/neterion/vxge.rst | 115 | ||||
| -rw-r--r-- | Documentation/networking/device_drivers/ethernet/wangxun/txgbe.rst | 20 | ||||
| -rw-r--r-- | Documentation/networking/devlink/devlink-selftests.rst | 38 | ||||
| -rw-r--r-- | Documentation/networking/devlink/index.rst | 1 | ||||
| -rw-r--r-- | Documentation/networking/devlink/mlxsw.rst | 24 | ||||
| -rw-r--r-- | Documentation/networking/dsa/dsa.rst | 363 | ||||
| -rw-r--r-- | Documentation/networking/ip-sysctl.rst | 87 | ||||
| -rw-r--r-- | Documentation/networking/sfp-phylink.rst | 6 | ||||
| -rw-r--r-- | Documentation/networking/smc-sysctl.rst | 13 | ||||
| -rw-r--r-- | Documentation/networking/tls.rst | 47 |
16 files changed, 862 insertions, 217 deletions
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst index 43be3782e5df..7823a069a903 100644 --- a/Documentation/networking/bonding.rst +++ b/Documentation/networking/bonding.rst @@ -780,6 +780,17 @@ peer_notif_delay value is 0 which means to match the value of the link monitor interval. +prio + Slave priority. A higher number means higher priority. + The primary slave has the highest priority. This option also + follows the primary_reselect rules. + + This option could only be configured via netlink, and is only valid + for active-backup(1), balance-tlb (5) and balance-alb (6) mode. + The valid value range is a signed 32 bit integer. + + The default value is 0. + primary A string (eth0, eth2, etc) specifying which slave is the @@ -1971,15 +1982,6 @@ uses the response as an indication that the link is operating. This gives some assurance that traffic is actually flowing to and from one or more peers on the local network. -The ARP monitor relies on the device driver itself to verify -that traffic is flowing. In particular, the driver must keep up to -date the last receive time, dev->last_rx. Drivers that use NETIF_F_LLTX -flag must also update netdev_queue->trans_start. If they do not, then the -ARP monitor will immediately fail any slaves using that driver, and -those slaves will stay down. If networking monitoring (tcpdump, etc) -shows the ARP requests and replies on the network, then it may be that -your device driver is not updating last_rx and trans_start. - 7.2 Configuring Multiple ARP Targets ------------------------------------ diff --git a/Documentation/networking/can.rst b/Documentation/networking/can.rst index f34cb0e4460e..ebc822e605f5 100644 --- a/Documentation/networking/can.rst +++ b/Documentation/networking/can.rst @@ -168,7 +168,7 @@ reflect the correct [#f1]_ traffic on the node the loopback of the sent data has to be performed right after a successful transmission. If the CAN network interface is not capable of performing the loopback for some reason the SocketCAN core can do this task as a fallback solution. -See :ref:`socketcan-local-loopback1` for details (recommended). +See :ref:`socketcan-local-loopback2` for details (recommended). The loopback functionality is enabled by default to reflect standard networking behaviour for CAN applications. Due to some requests from diff --git a/Documentation/networking/device_drivers/can/can327.rst b/Documentation/networking/device_drivers/can/can327.rst new file mode 100644 index 000000000000..b87bfbe5d51c --- /dev/null +++ b/Documentation/networking/device_drivers/can/can327.rst @@ -0,0 +1,331 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) + +can327: ELM327 driver for Linux SocketCAN +========================================== + +Authors +-------- + +Max Staudt <[email protected]> + + + +Motivation +----------- + +This driver aims to lower the initial cost for hackers interested in +working with CAN buses. + +CAN adapters are expensive, few, and far between. +ELM327 interfaces are cheap and plentiful. +Let's use ELM327s as CAN adapters. + + + +Introduction +------------- + +This driver is an effort to turn abundant ELM327 based OBD interfaces +into full fledged (as far as possible) CAN interfaces. + +Since the ELM327 was never meant to be a stand alone CAN controller, +the driver has to switch between its modes as quickly as possible in +order to fake full-duplex operation. + +As such, can327 is a best effort driver. However, this is more than +enough to implement simple request-response protocols (such as OBD II), +and to monitor broadcast messages on a bus (such as in a vehicle). + +Most ELM327s come as nondescript serial devices, attached via USB or +Bluetooth. The driver cannot recognize them by itself, and as such it +is up to the user to attach it in form of a TTY line discipline +(similar to PPP, SLIP, slcan, ...). + +This driver is meant for ELM327 versions 1.4b and up, see below for +known limitations in older controllers and clones. + + + +Data sheet +----------- + +The official data sheets can be found at ELM electronics' home page: + + https://www.elmelectronics.com/ + + + +How to attach the line discipline +---------------------------------- + +Every ELM327 chip is factory programmed to operate at a serial setting +of 38400 baud/s, 8 data bits, no parity, 1 stopbit. + +If you have kept this default configuration, the line discipline can +be attached on a command prompt as follows:: + + sudo ldattach \ + --debug \ + --speed 38400 \ + --eightbits \ + --noparity \ + --onestopbit \ + --iflag -ICRNL,INLCR,-IXOFF \ + 30 \ + /dev/ttyUSB0 + +To change the ELM327's serial settings, please refer to its data +sheet. This needs to be done before attaching the line discipline. + +Once the ldisc is attached, the CAN interface starts out unconfigured. +Set the speed before starting it:: + + # The interface needs to be down to change parameters + sudo ip link set can0 down + sudo ip link set can0 type can bitrate 500000 + sudo ip link set can0 up + +500000 bit/s is a common rate for OBD-II diagnostics. +If you're connecting straight to a car's OBD port, this is the speed +that most cars (but not all!) expect. + +After this, you can set out as usual with candump, cansniffer, etc. + + + +How to check the controller version +------------------------------------ + +Use a terminal program to attach to the controller. + +After issuing the "``AT WS``" command, the controller will respond with +its version:: + + >AT WS + + + ELM327 v1.4b + + > + +Note that clones may claim to be any version they like. +It is not indicative of their actual feature set. + + + + +Communication example +---------------------- + +This is a short and incomplete introduction on how to talk to an ELM327. +It is here to guide understanding of the controller's and the driver's +limitation (listed below) as well as manual testing. + + +The ELM327 has two modes: + +- Command mode +- Reception mode + +In command mode, it expects one command per line, terminated by CR. +By default, the prompt is a "``>``", after which a command can be +entered:: + + >ATE1 + OK + > + +The init script in the driver switches off several configuration options +that are only meaningful in the original OBD scenario the chip is meant +for, and are actually a hindrance for can327. + + +When a command is not recognized, such as by an older version of the +ELM327, a question mark is printed as a response instead of OK:: + + >ATUNKNOWN + ? + > + +At present, can327 does not evaluate this response. See the section +below on known limitations for details. + + +When a CAN frame is to be sent, the target address is configured, after +which the frame is sent as a command that consists of the data's hex +dump:: + + >ATSH123 + OK + >DEADBEEF12345678 + OK + > + +The above interaction sends the SFF frame "``DE AD BE EF 12 34 56 78``" +with (11 bit) CAN ID ``0x123``. +For this to function, the controller must be configured for SFF sending +mode (using "``AT PB``", see code or datasheet). + + +Once a frame has been sent and wait-for-reply mode is on (``ATR1``, +configured on ``listen-only=off``), or when the reply timeout expires +and the driver sets the controller into monitoring mode (``ATMA``), +the ELM327 will send one line for each received CAN frame, consisting +of CAN ID, DLC, and data:: + + 123 8 DEADBEEF12345678 + +For EFF (29 bit) CAN frames, the address format is slightly different, +which can327 uses to tell the two apart:: + + 12 34 56 78 8 DEADBEEF12345678 + +The ELM327 will receive both SFF and EFF frames - the current CAN +config (``ATPB``) does not matter. + + +If the ELM327's internal UART sending buffer runs full, it will abort +the monitoring mode, print "BUFFER FULL" and drop back into command +mode. Note that in this case, unlike with other error messages, the +error message may appear on the same line as the last (usually +incomplete) data frame:: + + 12 34 56 78 8 DEADBEEF123 BUFFER FULL + + + +Known limitations of the controller +------------------------------------ + +- Clone devices ("v1.5" and others) + + Sending RTR frames is not supported and will be dropped silently. + + Receiving RTR with DLC 8 will appear to be a regular frame with + the last received frame's DLC and payload. + + "``AT CSM``" (CAN Silent Monitoring, i.e. don't send CAN ACKs) is + not supported, and is hard coded to ON. Thus, frames are not ACKed + while listening: "``AT MA``" (Monitor All) will always be "silent". + However, immediately after sending a frame, the ELM327 will be in + "receive reply" mode, in which it *does* ACK any received frames. + Once the bus goes silent, or an error occurs (such as BUFFER FULL), + or the receive reply timeout runs out, the ELM327 will end reply + reception mode on its own and can327 will fall back to "``AT MA``" + in order to keep monitoring the bus. + + Other limitations may apply, depending on the clone and the quality + of its firmware. + + +- All versions + + No full duplex operation is supported. The driver will switch + between input/output mode as quickly as possible. + + The length of outgoing RTR frames cannot be set. In fact, some + clones (tested with one identifying as "``v1.5``") are unable to + send RTR frames at all. + + We don't have a way to get real-time notifications on CAN errors. + While there is a command (``AT CS``) to retrieve some basic stats, + we don't poll it as it would force us to interrupt reception mode. + + +- Versions prior to 1.4b + + These versions do not send CAN ACKs when in monitoring mode (AT MA). + However, they do send ACKs while waiting for a reply immediately + after sending a frame. The driver maximizes this time to make the + controller as useful as possible. + + Starting with version 1.4b, the ELM327 supports the "``AT CSM``" + command, and the "listen-only" CAN option will take effect. + + +- Versions prior to 1.4 + + These chips do not support the "``AT PB``" command, and thus cannot + change bitrate or SFF/EFF mode on-the-fly. This will have to be + programmed by the user before attaching the line discipline. See the + data sheet for details. + + +- Versions prior to 1.3 + + These chips cannot be used at all with can327. They do not support + the "``AT D1``" command, which is necessary to avoid parsing conflicts + on incoming data, as well as distinction of RTR frame lengths. + + Specifically, this allows for easy distinction of SFF and EFF + frames, and to check whether frames are complete. While it is possible + to deduce the type and length from the length of the line the ELM327 + sends us, this method fails when the ELM327's UART output buffer + overruns. It may abort sending in the middle of the line, which will + then be mistaken for something else. + + + +Known limitations of the driver +-------------------------------- + +- No 8/7 timing. + + ELM327 can only set CAN bitrates that are of the form 500000/n, where + n is an integer divisor. + However there is an exception: With a separate flag, it may set the + speed to be 8/7 of the speed indicated by the divisor. + This mode is not currently implemented. + +- No evaluation of command responses. + + The ELM327 will reply with OK when a command is understood, and with ? + when it is not. The driver does not currently check this, and simply + assumes that the chip understands every command. + The driver is built such that functionality degrades gracefully + nevertheless. See the section on known limitations of the controller. + +- No use of hardware CAN ID filtering + + An ELM327's UART sending buffer will easily overflow on heavy CAN bus + load, resulting in the "``BUFFER FULL``" message. Using the hardware + filters available through "``AT CF xxx``" and "``AT CM xxx``" would be + helpful here, however SocketCAN does not currently provide a facility + to make use of such hardware features. + + + +Rationale behind the chosen configuration +------------------------------------------ + +``AT E1`` + Echo on + + We need this to be able to get a prompt reliably. + +``AT S1`` + Spaces on + + We need this to distinguish 11/29 bit CAN addresses received. + + Note: + We can usually do this using the line length (odd/even), + but this fails if the line is not transmitted fully to + the host (BUFFER FULL). + +``AT D1`` + DLC on + + We need this to tell the "length" of RTR frames. + + + +A note on CAN bus termination +------------------------------ + +Your adapter may have resistors soldered in which are meant to terminate +the bus. This is correct when it is plugged into a OBD-II socket, but +not helpful when trying to tap into the middle of an existing CAN bus. + +If communications don't work with the adapter connected, check for the +termination resistors on its PCB and try removing them. diff --git a/Documentation/networking/device_drivers/can/index.rst b/Documentation/networking/device_drivers/can/index.rst index 0c3cc6633559..6a8a4f74fa26 100644 --- a/Documentation/networking/device_drivers/can/index.rst +++ b/Documentation/networking/device_drivers/can/index.rst @@ -10,6 +10,7 @@ Contents: .. toctree:: :maxdepth: 2 + can327 ctu/ctucanfd-driver freescale/flexcan diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst index 4e06684d079b..7f1777173abb 100644 --- a/Documentation/networking/device_drivers/ethernet/index.rst +++ b/Documentation/networking/device_drivers/ethernet/index.rst @@ -42,7 +42,6 @@ Contents: mellanox/mlx5 microsoft/netvsc neterion/s2io - neterion/vxge netronome/nfp pensando/ionic smsc/smc9 @@ -52,6 +51,7 @@ Contents: ti/am65_nuss_cpsw_switchdev ti/tlan toshiba/spider_net + wangxun/txgbe .. only:: subproject and html diff --git a/Documentation/networking/device_drivers/ethernet/intel/ice.rst b/Documentation/networking/device_drivers/ethernet/intel/ice.rst index 67b7a701ce9e..dc2e60ced927 100644 --- a/Documentation/networking/device_drivers/ethernet/intel/ice.rst +++ b/Documentation/networking/device_drivers/ethernet/intel/ice.rst @@ -901,6 +901,15 @@ To enable/disable UDP Segmentation Offload, issue the following command:: # ethtool -K <ethX> tx-udp-segmentation [off|on] +GNSS module +----------- +Allows user to read messages from the GNSS module and write supported commands. +If the module is physically present, driver creates 2 TTYs for each supported +device in /dev, ttyGNSS_<device>:<function>_0 and _1. First one (_0) is RW and +the second one is RO. +The protocol of write commands is dependent on the GNSS module as the driver +writes raw bytes from the TTY to the GNSS i2c. Please refer to the module +documentation for details. Performance Optimization ======================== diff --git a/Documentation/networking/device_drivers/ethernet/neterion/vxge.rst b/Documentation/networking/device_drivers/ethernet/neterion/vxge.rst deleted file mode 100644 index 589c6b15c63d..000000000000 --- a/Documentation/networking/device_drivers/ethernet/neterion/vxge.rst +++ /dev/null @@ -1,115 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -============================================================================== -Neterion's (Formerly S2io) X3100 Series 10GbE PCIe Server Adapter Linux driver -============================================================================== - -.. Contents - - 1) Introduction - 2) Features supported - 3) Configurable driver parameters - 4) Troubleshooting - -1. Introduction -=============== - -This Linux driver supports all Neterion's X3100 series 10 GbE PCIe I/O -Virtualized Server adapters. - -The X3100 series supports four modes of operation, configurable via -firmware: - - - Single function mode - - Multi function mode - - SRIOV mode - - MRIOV mode - -The functions share a 10GbE link and the pci-e bus, but hardly anything else -inside the ASIC. Features like independent hw reset, statistics, bandwidth/ -priority allocation and guarantees, GRO, TSO, interrupt moderation etc are -supported independently on each function. - -(See below for a complete list of features supported for both IPv4 and IPv6) - -2. Features supported -===================== - -i) Single function mode (up to 17 queues) - -ii) Multi function mode (up to 17 functions) - -iii) PCI-SIG's I/O Virtualization - - - Single Root mode: v1.0 (up to 17 functions) - - Multi-Root mode: v1.0 (up to 17 functions) - -iv) Jumbo frames - - X3100 Series supports MTU up to 9600 bytes, modifiable using - ip command. - -v) Offloads supported: (Enabled by default) - - - Checksum offload (TCP/UDP/IP) on transmit and receive paths - - TCP Segmentation Offload (TSO) on transmit path - - Generic Receive Offload (GRO) on receive path - -vi) MSI-X: (Enabled by default) - - Resulting in noticeable performance improvement (up to 7% on certain - platforms). - -vii) NAPI: (Enabled by default) - - For better Rx interrupt moderation. - -viii)RTH (Receive Traffic Hash): (Enabled by default) - - Receive side steering for better scaling. - -ix) Statistics - - Comprehensive MAC-level and software statistics displayed using - "ethtool -S" option. - -x) Multiple hardware queues: (Enabled by default) - - Up to 17 hardware based transmit and receive data channels, with - multiple steering options (transmit multiqueue enabled by default). - -3) Configurable driver parameters: ----------------------------------- - -i) max_config_dev - Specifies maximum device functions to be enabled. - - Valid range: 1-8 - -ii) max_config_port - Specifies number of ports to be enabled. - - Valid range: 1,2 - - Default: 1 - -iii) max_config_vpath - Specifies maximum VPATH(s) configured for each device function. - - Valid range: 1-17 - -iv) vlan_tag_strip - Enables/disables vlan tag stripping from all received tagged frames that - are not replicated at the internal L2 switch. - - Valid range: 0,1 (disabled, enabled respectively) - - Default: 1 - -v) addr_learn_en - Enable learning the mac address of the guest OS interface in - virtualization environment. - - Valid range: 0,1 (disabled, enabled respectively) - - Default: 0 diff --git a/Documentation/networking/device_drivers/ethernet/wangxun/txgbe.rst b/Documentation/networking/device_drivers/ethernet/wangxun/txgbe.rst new file mode 100644 index 000000000000..eaa87dbe8848 --- /dev/null +++ b/Documentation/networking/device_drivers/ethernet/wangxun/txgbe.rst @@ -0,0 +1,20 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================================================ +Linux Base Driver for WangXun(R) 10 Gigabit PCI Express Adapters +================================================================ + +WangXun 10 Gigabit Linux driver. +Copyright (c) 2015 - 2022 Beijing WangXun Technology Co., Ltd. + + +Contents +======== + +- Support + + +Support +======= +If you got any problem, contact Wangxun support team via [email protected] +and Cc: netdev. diff --git a/Documentation/networking/devlink/devlink-selftests.rst b/Documentation/networking/devlink/devlink-selftests.rst new file mode 100644 index 000000000000..c0aa1f3aef0d --- /dev/null +++ b/Documentation/networking/devlink/devlink-selftests.rst @@ -0,0 +1,38 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) + +================= +Devlink Selftests +================= + +The ``devlink-selftests`` API allows executing selftests on the device. + +Tests Mask +========== +The ``devlink-selftests`` command should be run with a mask indicating +the tests to be executed. + +Tests Description +================= +The following is a list of tests that drivers may execute. + +.. list-table:: List of tests + :widths: 5 90 + + * - Name + - Description + * - ``DEVLINK_SELFTEST_FLASH`` + - Devices may have the firmware on non-volatile memory on the board, e.g. + flash. This particular test helps to run a flash selftest on the device. + Implementation of the test is left to the driver/firmware. + +example usage +------------- + +.. code:: shell + + # Query selftests supported on the devlink device + $ devlink dev selftests show DEV + # Query selftests supported on all devlink devices + $ devlink dev selftests show + # Executes selftests on the device + $ devlink dev selftests run DEV id flash diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst index 850715512293..e3a5f985673e 100644 --- a/Documentation/networking/devlink/index.rst +++ b/Documentation/networking/devlink/index.rst @@ -38,6 +38,7 @@ general. devlink-region devlink-resource devlink-reload + devlink-selftests devlink-trap devlink-linecard diff --git a/Documentation/networking/devlink/mlxsw.rst b/Documentation/networking/devlink/mlxsw.rst index cf857cb4ba8f..433962225bd4 100644 --- a/Documentation/networking/devlink/mlxsw.rst +++ b/Documentation/networking/devlink/mlxsw.rst @@ -58,6 +58,30 @@ The ``mlxsw`` driver reports the following versions - running - Three digit firmware version +Line card auxiliary device info versions +======================================== + +The ``mlxsw`` driver reports the following versions for line card auxiliary device + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``hw.revision`` + - fixed + - The hardware revision for this line card + * - ``ini.version`` + - running + - Version of line card INI loaded + * - ``fw.psid`` + - fixed + - Line card device PSID + * - ``fw.version`` + - running + - Three digit firmware version of line card device + Driver-specific Traps ===================== diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst index ed7fa76e7a40..d742ba6bd211 100644 --- a/Documentation/networking/dsa/dsa.rst +++ b/Documentation/networking/dsa/dsa.rst @@ -503,26 +503,108 @@ per-port PHY specific details: interface connection, MDIO bus location, etc. Driver development ================== -DSA switch drivers need to implement a dsa_switch_ops structure which will +DSA switch drivers need to implement a ``dsa_switch_ops`` structure which will contain the various members described below. -``register_switch_driver()`` registers this dsa_switch_ops in its internal list -of drivers to probe for. ``unregister_switch_driver()`` does the exact opposite. +Probing, registration and device lifetime +----------------------------------------- -Unless requested differently by setting the priv_size member accordingly, DSA -does not allocate any driver private context space. +DSA switches are regular ``device`` structures on buses (be they platform, SPI, +I2C, MDIO or otherwise). The DSA framework is not involved in their probing +with the device core. + +Switch registration from the perspective of a driver means passing a valid +``struct dsa_switch`` pointer to ``dsa_register_switch()``, usually from the +switch driver's probing function. The following members must be valid in the +provided structure: + +- ``ds->dev``: will be used to parse the switch's OF node or platform data. + +- ``ds->num_ports``: will be used to create the port list for this switch, and + to validate the port indices provided in the OF node. + +- ``ds->ops``: a pointer to the ``dsa_switch_ops`` structure holding the DSA + method implementations. + +- ``ds->priv``: backpointer to a driver-private data structure which can be + retrieved in all further DSA method callbacks. + +In addition, the following flags in the ``dsa_switch`` structure may optionally +be configured to obtain driver-specific behavior from the DSA core. Their +behavior when set is documented through comments in ``include/net/dsa.h``. + +- ``ds->vlan_filtering_is_global`` + +- ``ds->needs_standalone_vlan_filtering`` + +- ``ds->configure_vlan_while_not_filtering`` + +- ``ds->untag_bridge_pvid`` + +- ``ds->assisted_learning_on_cpu_port`` + +- ``ds->mtu_enforcement_ingress`` + +- ``ds->fdb_isolation`` + +Internally, DSA keeps an array of switch trees (group of switches) global to +the kernel, and attaches a ``dsa_switch`` structure to a tree on registration. +The tree ID to which the switch is attached is determined by the first u32 +number of the ``dsa,member`` property of the switch's OF node (0 if missing). +The switch ID within the tree is determined by the second u32 number of the +same OF property (0 if missing). Registering multiple switches with the same +switch ID and tree ID is illegal and will cause an error. Using platform data, +a single switch and a single switch tree is permitted. + +In case of a tree with multiple switches, probing takes place asymmetrically. +The first N-1 callers of ``dsa_register_switch()`` only add their ports to the +port list of the tree (``dst->ports``), each port having a backpointer to its +associated switch (``dp->ds``). Then, these switches exit their +``dsa_register_switch()`` call early, because ``dsa_tree_setup_routing_table()`` +has determined that the tree is not yet complete (not all ports referenced by +DSA links are present in the tree's port list). The tree becomes complete when +the last switch calls ``dsa_register_switch()``, and this triggers the effective +continuation of initialization (including the call to ``ds->ops->setup()``) for +all switches within that tree, all as part of the calling context of the last +switch's probe function. + +The opposite of registration takes place when calling ``dsa_unregister_switch()``, +which removes a switch's ports from the port list of the tree. The entire tree +is torn down when the first switch unregisters. + +It is mandatory for DSA switch drivers to implement the ``shutdown()`` callback +of their respective bus, and call ``dsa_switch_shutdown()`` from it (a minimal +version of the full teardown performed by ``dsa_unregister_switch()``). +The reason is that DSA keeps a reference on the master net device, and if the +driver for the master device decides to unbind on shutdown, DSA's reference +will block that operation from finalizing. + +Either ``dsa_switch_shutdown()`` or ``dsa_unregister_switch()`` must be called, +but not both, and the device driver model permits the bus' ``remove()`` method +to be called even if ``shutdown()`` was already called. Therefore, drivers are +expected to implement a mutual exclusion method between ``remove()`` and +``shutdown()`` by setting their drvdata to NULL after any of these has run, and +checking whether the drvdata is NULL before proceeding to take any action. + +After ``dsa_switch_shutdown()`` or ``dsa_unregister_switch()`` was called, no +further callbacks via the provided ``dsa_switch_ops`` may take place, and the +driver may free the data structures associated with the ``dsa_switch``. Switch configuration -------------------- -- ``tag_protocol``: this is to indicate what kind of tagging protocol is supported, - should be a valid value from the ``dsa_tag_protocol`` enum +- ``get_tag_protocol``: this is to indicate what kind of tagging protocol is + supported, should be a valid value from the ``dsa_tag_protocol`` enum. + The returned information does not have to be static; the driver is passed the + CPU port number, as well as the tagging protocol of a possibly stacked + upstream switch, in case there are hardware limitations in terms of supported + tag formats. -- ``probe``: probe routine which will be invoked by the DSA platform device upon - registration to test for the presence/absence of a switch device. For MDIO - devices, it is recommended to issue a read towards internal registers using - the switch pseudo-PHY and return whether this is a supported device. For other - buses, return a non-NULL string +- ``change_tag_protocol``: when the default tagging protocol has compatibility + problems with the master or other issues, the driver may support changing it + at runtime, either through a device tree property or through sysfs. In that + case, further calls to ``get_tag_protocol`` should report the protocol in + current use. - ``setup``: setup function for the switch, this function is responsible for setting up the ``dsa_switch_ops`` private structure with all it needs: register maps, @@ -535,7 +617,17 @@ Switch configuration fully configured and ready to serve any kind of request. It is recommended to issue a software reset of the switch during this setup function in order to avoid relying on what a previous software agent such as a bootloader/firmware - may have previously configured. + may have previously configured. The method responsible for undoing any + applicable allocations or operations done here is ``teardown``. + +- ``port_setup`` and ``port_teardown``: methods for initialization and + destruction of per-port data structures. It is mandatory for some operations + such as registering and unregistering devlink port regions to be done from + these methods, otherwise they are optional. A port will be torn down only if + it has been previously set up. It is possible for a port to be set up during + probing only to be torn down immediately afterwards, for example in case its + PHY cannot be found. In this case, probing of the DSA switch continues + without that particular port. PHY devices and link management ------------------------------- @@ -635,26 +727,198 @@ Power management ``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is disabled while being a bridge member +Address databases +----------------- + +Switching hardware is expected to have a table for FDB entries, however not all +of them are active at the same time. An address database is the subset (partition) +of FDB entries that is active (can be matched by address learning on RX, or FDB +lookup on TX) depending on the state of the port. An address database may +occasionally be called "FID" (Filtering ID) in this document, although the +underlying implementation may choose whatever is available to the hardware. + +For example, all ports that belong to a VLAN-unaware bridge (which is +*currently* VLAN-unaware) are expected to learn source addresses in the +database associated by the driver with that bridge (and not with other +VLAN-unaware bridges). During forwarding and FDB lookup, a packet received on a +VLAN-unaware bridge port should be able to find a VLAN-unaware FDB entry having +the same MAC DA as the packet, which is present on another port member of the +same bridge. At the same time, the FDB lookup process must be able to not find +an FDB entry having the same MAC DA as the packet, if that entry points towards +a port which is a member of a different VLAN-unaware bridge (and is therefore +associated with a different address database). + +Similarly, each VLAN of each offloaded VLAN-aware bridge should have an +associated address database, which is shared by all ports which are members of +that VLAN, but not shared by ports belonging to different bridges that are +members of the same VID. + +In this context, a VLAN-unaware database means that all packets are expected to +match on it irrespective of VLAN ID (only MAC address lookup), whereas a +VLAN-aware database means that packets are supposed to match based on the VLAN +ID from the classified 802.1Q header (or the pvid if untagged). + +At the bridge layer, VLAN-unaware FDB entries have the special VID value of 0, +whereas VLAN-aware FDB entries have non-zero VID values. Note that a +VLAN-unaware bridge may have VLAN-aware (non-zero VID) FDB entries, and a +VLAN-aware bridge may have VLAN-unaware FDB entries. As in hardware, the +software bridge keeps separate address databases, and offloads to hardware the +FDB entries belonging to these databases, through switchdev, asynchronously +relative to the moment when the databases become active or inactive. + +When a user port operates in standalone mode, its driver should configure it to +use a separate database called a port private database. This is different from +the databases described above, and should impede operation as standalone port +(packet in, packet out to the CPU port) as little as possible. For example, +on ingress, it should not attempt to learn the MAC SA of ingress traffic, since +learning is a bridging layer service and this is a standalone port, therefore +it would consume useless space. With no address learning, the port private +database should be empty in a naive implementation, and in this case, all +received packets should be trivially flooded to the CPU port. + +DSA (cascade) and CPU ports are also called "shared" ports because they service +multiple address databases, and the database that a packet should be associated +to is usually embedded in the DSA tag. This means that the CPU port may +simultaneously transport packets coming from a standalone port (which were +classified by hardware in one address database), and from a bridge port (which +were classified to a different address database). + +Switch drivers which satisfy certain criteria are able to optimize the naive +configuration by removing the CPU port from the flooding domain of the switch, +and just program the hardware with FDB entries pointing towards the CPU port +for which it is known that software is interested in those MAC addresses. +Packets which do not match a known FDB entry will not be delivered to the CPU, +which will save CPU cycles required for creating an skb just to drop it. + +DSA is able to perform host address filtering for the following kinds of +addresses: + +- Primary unicast MAC addresses of ports (``dev->dev_addr``). These are + associated with the port private database of the respective user port, + and the driver is notified to install them through ``port_fdb_add`` towards + the CPU port. + +- Secondary unicast and multicast MAC addresses of ports (addresses added + through ``dev_uc_add()`` and ``dev_mc_add()``). These are also associated + with the port private database of the respective user port. + +- Local/permanent bridge FDB entries (``BR_FDB_LOCAL``). These are the MAC + addresses of the bridge ports, for which packets must be terminated locally + and not forwarded. They are associated with the address database for that + bridge. + +- Static bridge FDB entries installed towards foreign (non-DSA) interfaces + present in the same bridge as some DSA switch ports. These are also + associated with the address database for that bridge. + +- Dynamically learned FDB entries on foreign interfaces present in the same + bridge as some DSA switch ports, only if ``ds->assisted_learning_on_cpu_port`` + is set to true by the driver. These are associated with the address database + for that bridge. + +For various operations detailed below, DSA provides a ``dsa_db`` structure +which can be of the following types: + +- ``DSA_DB_PORT``: the FDB (or MDB) entry to be installed or deleted belongs to + the port private database of user port ``db->dp``. +- ``DSA_DB_BRIDGE``: the entry belongs to one of the address databases of bridge + ``db->bridge``. Separation between the VLAN-unaware database and the per-VID + databases of this bridge is expected to be done by the driver. +- ``DSA_DB_LAG``: the entry belongs to the address database of LAG ``db->lag``. + Note: ``DSA_DB_LAG`` is currently unused and may be removed in the future. + +The drivers which act upon the ``dsa_db`` argument in ``port_fdb_add``, +``port_mdb_add`` etc should declare ``ds->fdb_isolation`` as true. + +DSA associates each offloaded bridge and each offloaded LAG with a one-based ID +(``struct dsa_bridge :: num``, ``struct dsa_lag :: id``) for the purposes of +refcounting addresses on shared ports. Drivers may piggyback on DSA's numbering +scheme (the ID is readable through ``db->bridge.num`` and ``db->lag.id`` or may +implement their own. + +Only the drivers which declare support for FDB isolation are notified of FDB +entries on the CPU port belonging to ``DSA_DB_PORT`` databases. +For compatibility/legacy reasons, ``DSA_DB_BRIDGE`` addresses are notified to +drivers even if they do not support FDB isolation. However, ``db->bridge.num`` +and ``db->lag.id`` are always set to 0 in that case (to denote the lack of +isolation, for refcounting purposes). + +Note that it is not mandatory for a switch driver to implement physically +separate address databases for each standalone user port. Since FDB entries in +the port private databases will always point to the CPU port, there is no risk +for incorrect forwarding decisions. In this case, all standalone ports may +share the same database, but the reference counting of host-filtered addresses +(not deleting the FDB entry for a port's MAC address if it's still in use by +another port) becomes the responsibility of the driver, because DSA is unaware +that the port databases are in fact shared. This can be achieved by calling +``dsa_fdb_present_in_other_db()`` and ``dsa_mdb_present_in_other_db()``. +The down side is that the RX filtering lists of each user port are in fact +shared, which means that user port A may accept a packet with a MAC DA it +shouldn't have, only because that MAC address was in the RX filtering list of +user port B. These packets will still be dropped in software, however. + Bridge layer ------------ +Offloading the bridge forwarding plane is optional and handled by the methods +below. They may be absent, return -EOPNOTSUPP, or ``ds->max_num_bridges`` may +be non-zero and exceeded, and in this case, joining a bridge port is still +possible, but the packet forwarding will take place in software, and the ports +under a software bridge must remain configured in the same way as for +standalone operation, i.e. have all bridging service functions (address +learning etc) disabled, and send all received packets to the CPU port only. + +Concretely, a port starts offloading the forwarding plane of a bridge once it +returns success to the ``port_bridge_join`` method, and stops doing so after +``port_bridge_leave`` has been called. Offloading the bridge means autonomously +learning FDB entries in accordance with the software bridge port's state, and +autonomously forwarding (or flooding) received packets without CPU intervention. +This is optional even when offloading a bridge port. Tagging protocol drivers +are expected to call ``dsa_default_offload_fwd_mark(skb)`` for packets which +have already been autonomously forwarded in the forwarding domain of the +ingress switch port. DSA, through ``dsa_port_devlink_setup()``, considers all +switch ports part of the same tree ID to be part of the same bridge forwarding +domain (capable of autonomous forwarding to each other). + +Offloading the TX forwarding process of a bridge is a distinct concept from +simply offloading its forwarding plane, and refers to the ability of certain +driver and tag protocol combinations to transmit a single skb coming from the +bridge device's transmit function to potentially multiple egress ports (and +thereby avoid its cloning in software). + +Packets for which the bridge requests this behavior are called data plane +packets and have ``skb->offload_fwd_mark`` set to true in the tag protocol +driver's ``xmit`` function. Data plane packets are subject to FDB lookup, +hardware learning on the CPU port, and do not override the port STP state. +Additionally, replication of data plane packets (multicast, flooding) is +handled in hardware and the bridge driver will transmit a single skb for each +packet that may or may not need replication. + +When the TX forwarding offload is enabled, the tag protocol driver is +responsible to inject packets into the data plane of the hardware towards the +correct bridging domain (FID) that the port is a part of. The port may be +VLAN-unaware, and in this case the FID must be equal to the FID used by the +driver for its VLAN-unaware address database associated with that bridge. +Alternatively, the bridge may be VLAN-aware, and in that case, it is guaranteed +that the packet is also VLAN-tagged with the VLAN ID that the bridge processed +this packet in. It is the responsibility of the hardware to untag the VID on +the egress-untagged ports, or keep the tag on the egress-tagged ones. + - ``port_bridge_join``: bridge layer function invoked when a given switch port is added to a bridge, this function should do what's necessary at the switch level to permit the joining port to be added to the relevant logical domain for it to ingress/egress traffic with other members of the bridge. + By setting the ``tx_fwd_offload`` argument to true, the TX forwarding process + of this bridge is also offloaded. - ``port_bridge_leave``: bridge layer function invoked when a given switch port is removed from a bridge, this function should do what's necessary at the switch level to deny the leaving port from ingress/egress traffic from the - remaining bridge members. When the port leaves the bridge, it should be aged - out at the switch hardware for the switch to (re) learn MAC addresses behind - this port. + remaining bridge members. - ``port_stp_state_set``: bridge layer function invoked when a given switch port STP state is computed by the bridge layer and should be propagated to switch - hardware to forward/block/learn traffic. The switch driver is responsible for - computing a STP state change based on current and asked parameters and perform - the relevant ageing based on the intersection results + hardware to forward/block/learn traffic. - ``port_bridge_flags``: bridge layer function invoked when a port must configure its settings for e.g. flooding of unknown traffic or source address @@ -667,21 +931,11 @@ Bridge layer CPU port, and flooding towards the CPU port should also be enabled, due to a lack of an explicit address filtering mechanism in the DSA core. -- ``port_bridge_tx_fwd_offload``: bridge layer function invoked after - ``port_bridge_join`` when a driver sets ``ds->num_fwd_offloading_bridges`` to - a non-zero value. Returning success in this function activates the TX - forwarding offload bridge feature for this port, which enables the tagging - protocol driver to inject data plane packets towards the bridging domain that - the port is a part of. Data plane packets are subject to FDB lookup, hardware - learning on the CPU port, and do not override the port STP state. - Additionally, replication of data plane packets (multicast, flooding) is - handled in hardware and the bridge driver will transmit a single skb for each - packet that needs replication. The method is provided as a configuration - point for drivers that need to configure the hardware for enabling this - feature. - -- ``port_bridge_tx_fwd_unoffload``: bridge layer function invoked when a driver - leaves a bridge port which had the TX forwarding offload feature enabled. +- ``port_fast_age``: bridge layer function invoked when flushing the + dynamically learned FDB entries on the port is necessary. This is called when + transitioning from an STP state where learning should take place to an STP + state where it shouldn't, or when leaving a bridge, or when address learning + is turned off via ``port_bridge_flags``. Bridge VLAN filtering --------------------- @@ -697,55 +951,44 @@ Bridge VLAN filtering allowed. - ``port_vlan_add``: bridge layer function invoked when a VLAN is configured - (tagged or untagged) for the given switch port. If the operation is not - supported by the hardware, this function should return ``-EOPNOTSUPP`` to - inform the bridge code to fallback to a software implementation. + (tagged or untagged) for the given switch port. The CPU port becomes a member + of a VLAN only if a foreign bridge port is also a member of it (and + forwarding needs to take place in software), or the VLAN is installed to the + VLAN group of the bridge device itself, for termination purposes + (``bridge vlan add dev br0 vid 100 self``). VLANs on shared ports are + reference counted and removed when there is no user left. Drivers do not need + to manually install a VLAN on the CPU port. - ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the given switch port -- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback - function that the driver has to call for each VLAN the given port is a member - of. A switchdev object is used to carry the VID and bridge flags. - - ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a Forwarding Database entry, the switch hardware should be programmed with the specified address in the specified VLAN Id in the forwarding database - associated with this VLAN ID. If the operation is not supported, this - function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to - a software implementation. - -.. note:: VLAN ID 0 corresponds to the port private database, which, in the context - of DSA, would be its port-based VLAN, used by the associated bridge device. + associated with this VLAN ID. - ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a Forwarding Database entry, the switch hardware should be programmed to delete the specified MAC address from the specified VLAN ID if it was mapped into this port forwarding database -- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback - function that the driver has to call for each MAC address known to be behind - the given port. A switchdev object is used to carry the VID and FDB info. +- ``port_fdb_dump``: bridge bypass function invoked by ``ndo_fdb_dump`` on the + physical DSA port interfaces. Since DSA does not attempt to keep in sync its + hardware FDB entries with the software bridge, this method is implemented as + a means to view the entries visible on user ports in the hardware database. + The entries reported by this function have the ``self`` flag in the output of + the ``bridge fdb show`` command. - ``port_mdb_add``: bridge layer function invoked when the bridge wants to install - a multicast database entry. If the operation is not supported, this function - should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to a - software implementation. The switch hardware should be programmed with the + a multicast database entry. The switch hardware should be programmed with the specified address in the specified VLAN ID in the forwarding database associated with this VLAN ID. -.. note:: VLAN ID 0 corresponds to the port private database, which, in the context - of DSA, would be its port-based VLAN, used by the associated bridge device. - - ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a multicast database entry, the switch hardware should be programmed to delete the specified MAC address from the specified VLAN ID if it was mapped into this port forwarding database. -- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback - function that the driver has to call for each MAC address known to be behind - the given port. A switchdev object is used to carry the VID and MDB info. - Link aggregation ---------------- diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 9f41961d11d5..56cd4ea059b2 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -202,6 +202,12 @@ neigh/default/unres_qlen - INTEGER Default: 101 +neigh/default/interval_probe_time_ms - INTEGER + The probe interval for neighbor entries with NTF_MANAGED flag, + the min value is 1. + + Default: 5000 + mtu_expires - INTEGER Time, in seconds, that cached PMTU information is kept. @@ -630,6 +636,16 @@ tcp_recovery - INTEGER Default: 0x1 +tcp_reflect_tos - BOOLEAN + For listening sockets, reuse the DSCP value of the initial SYN message + for outgoing packets. This allows to have both directions of a TCP + stream to use the same DSCP value, assuming DSCP remains unchanged for + the lifetime of the connection. + + This options affects both IPv4 and IPv6. + + Default: 0 (disabled) + tcp_reordering - INTEGER Initial reordering level of packets in a TCP stream. TCP stack can then dynamically adjust flow reordering level @@ -1052,11 +1068,7 @@ udp_rmem_min - INTEGER Default: 4K udp_wmem_min - INTEGER - Minimal size of send buffer used by UDP sockets in moderation. - Each UDP socket is able to use the size for sending data, even if - total pages of UDP sockets exceed udp_mem pressure. The unit is byte. - - Default: 4K + UDP does not have tx memory accounting and this tunable has no effect. RAW variables ============= @@ -1085,7 +1097,7 @@ cipso_cache_enable - BOOLEAN cipso_cache_bucket_size - INTEGER The CIPSO label cache consists of a fixed size hash table with each hash bucket containing a number of cache entries. This variable limits - the number of entries in each hash bucket; the larger the value the + the number of entries in each hash bucket; the larger the value is, the more CIPSO label mappings that can be cached. When the number of entries in a given hash bucket reaches this limit adding new entries causes the oldest entry in the bucket to be removed to make room. @@ -1179,7 +1191,7 @@ ip_autobind_reuse - BOOLEAN option should only be set by experts. Default: 0 -ip_dynaddr - BOOLEAN +ip_dynaddr - INTEGER If set non-zero, enables support for dynamic addresses. If set to a non-zero value larger than 1, a kernel log message will be printed when dynamic address rewriting @@ -1627,12 +1639,15 @@ arp_notify - BOOLEAN or hardware address changes. == ========================================================== -arp_accept - BOOLEAN - Define behavior for gratuitous ARP frames who's IP is not - already present in the ARP table: +arp_accept - INTEGER + Define behavior for accepting gratuitous ARP (garp) frames from devices + that are not already present in the ARP table: - 0 - don't create new entries in the ARP table - 1 - create new entries in the ARP table + - 2 - create new entries only if the source IP address is in the same + subnet as an address configured on the interface that received the + garp message. Both replies and requests type gratuitous arp will trigger the ARP table to be updated, if this setting is on. @@ -2474,27 +2489,36 @@ drop_unsolicited_na - BOOLEAN By default this is turned off. -accept_untracked_na - BOOLEAN - Add a new neighbour cache entry in STALE state for routers on receiving a - neighbour advertisement (either solicited or unsolicited) with target - link-layer address option specified if no neighbour entry is already - present for the advertised IPv6 address. Without this knob, NAs received - for untracked addresses (absent in neighbour cache) are silently ignored. +accept_untracked_na - INTEGER + Define behavior for accepting neighbor advertisements from devices that + are absent in the neighbor cache: - This is as per router-side behaviour documented in RFC9131. + - 0 - (default) Do not accept unsolicited and untracked neighbor + advertisements. - This has lower precedence than drop_unsolicited_na. + - 1 - Add a new neighbor cache entry in STALE state for routers on + receiving a neighbor advertisement (either solicited or unsolicited) + with target link-layer address option specified if no neighbor entry + is already present for the advertised IPv6 address. Without this knob, + NAs received for untracked addresses (absent in neighbor cache) are + silently ignored. - This will optimize the return path for the initial off-link communication - that is initiated by a directly connected host, by ensuring that - the first-hop router which turns on this setting doesn't have to - buffer the initial return packets to do neighbour-solicitation. - The prerequisite is that the host is configured to send - unsolicited neighbour advertisements on interface bringup. - This setting should be used in conjunction with the ndisc_notify setting - on the host to satisfy this prerequisite. + This is as per router-side behavior documented in RFC9131. - By default this is turned off. + This has lower precedence than drop_unsolicited_na. + + This will optimize the return path for the initial off-link + communication that is initiated by a directly connected host, by + ensuring that the first-hop router which turns on this setting doesn't + have to buffer the initial return packets to do neighbor-solicitation. + The prerequisite is that the host is configured to send unsolicited + neighbor advertisements on interface bringup. This setting should be + used in conjunction with the ndisc_notify setting on the host to + satisfy this prerequisite. + + - 2 - Extend option (1) to add a new neighbor cache entry only if the + source IP address is in the same subnet as an address configured on + the interface that received the neighbor advertisement. enhanced_dad - BOOLEAN Include a nonce option in the IPv6 neighbor solicitation messages used for @@ -2870,7 +2894,14 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max Default: 4K sctp_wmem - vector of 3 INTEGERs: min, default, max - Currently this tunable has no effect. + Only the first value ("min") is used, "default" and "max" are + ignored. + + min: Minimum size of send buffer that can be used by SCTP sockets. + It is guaranteed to each SCTP socket (but not association) even + under moderate memory pressure. + + Default: 4K addr_scope_policy - INTEGER Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00 diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst index 328f8d39eeb3..55b65f607a64 100644 --- a/Documentation/networking/sfp-phylink.rst +++ b/Documentation/networking/sfp-phylink.rst @@ -203,7 +203,7 @@ this documentation. The :c:func:`validate` method should mask the supplied supported mask, and ``state->advertising`` with the supported ethtool link modes. These are the new ethtool link modes, so bitmask operations must be - used. For an example, see drivers/net/ethernet/marvell/mvneta.c. + used. For an example, see ``drivers/net/ethernet/marvell/mvneta.c``. The :c:func:`mac_link_state` method is used to read the link state from the MAC, and report back the settings that the MAC is currently @@ -224,7 +224,7 @@ this documentation. function should modify the state and only take the link down when absolutely necessary to change the MAC configuration. An example of how to do this can be found in :c:func:`mvneta_mac_config` in - drivers/net/ethernet/marvell/mvneta.c. + ``drivers/net/ethernet/marvell/mvneta.c``. For further information on these methods, please see the inline documentation in :c:type:`struct phylink_mac_ops <phylink_mac_ops>`. @@ -281,4 +281,4 @@ as necessary. For information describing the SFP cage in DT, please see the binding documentation in the kernel source tree -``Documentation/devicetree/bindings/net/sff,sfp.txt`` +``Documentation/devicetree/bindings/net/sff,sfp.yaml``. diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst index 0987fd1bc220..742e90e6d822 100644 --- a/Documentation/networking/smc-sysctl.rst +++ b/Documentation/networking/smc-sysctl.rst @@ -21,3 +21,16 @@ autocorking_size - INTEGER know how/when to uncork their sockets. Default: 64K + +smcr_buf_type - INTEGER + Controls which type of sndbufs and RMBs to use in later newly created + SMC-R link group. Only for SMC-R. + + Default: 0 (physically contiguous sndbufs and RMBs) + + Possible values: + + - 0 - Use physically contiguous buffers + - 1 - Use virtually contiguous buffers + - 2 - Mixed use of the two types. Try physically contiguous buffers first. + If not available, use virtually contiguous buffers then. diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst index 8cb2cd4e2a80..658ed3a71e1b 100644 --- a/Documentation/networking/tls.rst +++ b/Documentation/networking/tls.rst @@ -214,6 +214,44 @@ of calling send directly after a handshake using gnutls. Since it doesn't implement a full record layer, control messages are not supported. +Optional optimizations +---------------------- + +There are certain condition-specific optimizations the TLS ULP can make, +if requested. Those optimizations are either not universally beneficial +or may impact correctness, hence they require an opt-in. +All options are set per-socket using setsockopt(), and their +state can be checked using getsockopt() and via socket diag (``ss``). + +TLS_TX_ZEROCOPY_RO +~~~~~~~~~~~~~~~~~~ + +For device offload only. Allow sendfile() data to be transmitted directly +to the NIC without making an in-kernel copy. This allows true zero-copy +behavior when device offload is enabled. + +The application must make sure that the data is not modified between being +submitted and transmission completing. In other words this is mostly +applicable if the data sent on a socket via sendfile() is read-only. + +Modifying the data may result in different versions of the data being used +for the original TCP transmission and TCP retransmissions. To the receiver +this will look like TLS records had been tampered with and will result +in record authentication failures. + +TLS_RX_EXPECT_NO_PAD +~~~~~~~~~~~~~~~~~~~~ + +TLS 1.3 only. Expect the sender to not pad records. This allows the data +to be decrypted directly into user space buffers with TLS 1.3. + +This optimization is safe to enable only if the remote end is trusted, +otherwise it is an attack vector to doubling the TLS processing cost. + +If the record decrypted turns out to had been padded or is not a data +record it will be decrypted again into a kernel buffer without zero copy. +Such events are counted in the ``TlsDecryptRetry`` statistic. + Statistics ========== @@ -239,3 +277,12 @@ TLS implementation exposes the following per-namespace statistics - ``TlsDeviceRxResync`` - number of RX resyncs sent to NICs handling cryptography + +- ``TlsDecryptRetry`` - + number of RX records which had to be re-decrypted due to + ``TLS_RX_EXPECT_NO_PAD`` mis-prediction. Note that this counter will + also increment for non-data records. + +- ``TlsRxNoPadViolation`` - + number of data RX records which had to be re-decrypted due to + ``TLS_RX_EXPECT_NO_PAD`` mis-prediction. |