aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-06-28net/mlx5e: Approximate IPsec per-SA payload data bytes countLeon Romanovsky1-1/+13
ConnectX devices lack ability to count payload data byte size which is needed for SA to return to libreswan for rekeying. As a solution let's approximate that by decreasing headers size from total size counted by flow steering. The calculation doesn't take into account any other headers which can be in the packet (e.g. IP extensions). Fixes: 5a6cddb89b51 ("net/mlx5e: Update IPsec per SA packets/bytes count") Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28net/mlx5e: Present succeeded IPsec SA bytes and packetLeon Romanovsky1-13/+23
IPsec SA statistics presents successfully decrypted and encrypted packet and bytes, and not total handled by this SA. So update the calculation logic to take into account failures. Fixes: 6fb7f9408779 ("net/mlx5e: Connect mlx5 IPsec statistics with XFRM core") Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28net/mlx5e: Add mqprio_rl cleanup and free in mlx5e_priv_cleanup()Jianbo Liu1-0/+5
In the cited commit, mqprio_rl cleanup and free are mistakenly removed in mlx5e_priv_cleanup(), and it causes the leakage of host memory and firmware SCHEDULING_ELEMENT objects while changing eswitch mode. So, add them back. Fixes: 0bb7228f7096 ("net/mlx5e: Fix mqprio_rl handling on devlink reload") Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Dragos Tatulea <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28net/mlx5: E-switch, Create ingress ACL when neededChris Mi1-8/+29
Currently, ingress acl is used for three features. It is created only when vport metadata match and prio tag are enabled. But active-backup lag mode also uses it. It is independent of vport metadata match and prio tag. And vport metadata match can be disabled using the following devlink command: # devlink dev param set pci/0000:08:00.0 name esw_port_metadata \ value false cmode runtime If ingress acl is not created, will hit panic when creating drop rule for active-backup lag mode. If always create it, there will be about 5% performance degradation. Fix it by creating ingress acl when needed. If esw_port_metadata is true, ingress acl exists, then create drop rule using existing ingress acl. If esw_port_metadata is false, create ingress acl and then create drop rule. Fixes: 1749c4c51c16 ("net/mlx5: E-switch, add drop rule support to ingress ACL") Signed-off-by: Chris Mi <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28net/mlx5: Use max_num_eqs_24b when setting max_io_eqsDaniel Jurgens1-5/+17
Due a bug in the device max_num_eqs doesn't always reflect a written value. As a result, setting max_io_eqs may not work but appear successful. Instead write max_num_eqs_24b, which reflects correct value. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28net/mlx5: Use max_num_eqs_24b capability if setDaniel Jurgens3-6/+12
A new capability with more bits is added. If it's set use that value as the maximum number of EQs available. This cap is also writable by the vhca_resource_manager to allow limiting the number of EQs available to SFs and VFs. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Reviewed-by: William Tu <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28net/mlx5: IFC updates for changing max EQsDaniel Jurgens1-1/+5
Expose new capability to support changing the number of EQs available to other functions. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Reviewed-by: William Tu <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28Merge branch 'net-selftests-mirroring-cleanup' into mainDavid S. Miller20-538/+355
Petr Machata says: ==================== selftest: Clean-up and stabilize mirroring tests The mirroring selftests work by sending ICMP traffic between two hosts. Along the way, this traffic is mirrored to a gretap netdevice, and counter taps are then installed strategically along the path of the mirrored traffic to verify the mirroring took place. The problem with this is that besides mirroring the primary traffic, any other service traffic is mirrored as well. At the same time, because the tests need to work in HW-offloaded scenarios, the ability of the device to do arbitrary packet inspection should not be taken for granted. Most tests therefore simply use matchall, one uses flower to match on IP address. As a result, the selftests are noisy. mirror_test() accommodated this noisiness by giving the counters an allowance of several packets. But that only works up to a point, and on busy systems won't be always enough. In this patch set, clean up and stabilize the mirroring selftests. The original intention was to port the tests over to UDP, but the logic of ICMP ends up being so entangled in the mirroring selftests that the changes feel overly invasive. Instead, ICMP is kept, but where possible, we match on ICMP message type, thus filtering out hits by other ICMP messages. Where this is not practical (where the counter tap is put on a device that carries encapsulated packets), switch the counter condition to _at least_ X observed packets. This is less robust, but barely so -- probably the only scenario that this would not catch is something like erroneous packet duplication, which would hopefully get caught by the numerous other tests in this extensive suite. - Patches #1 to #3 clean up parameters at various helpers. - Patches #4 to #6 stabilize the mirroring selftests as described above. - Mirroring tests currently allow testing SW datapath even on HW netdevices by trapping traffic to the SW datapath. This complicates the tests a bit without a good reason: to test SW datapath, just run the selftests on the veth topology. Thus in patch #7, drop support for this dual SW/HW testing. - At this point, some cleanups were either made possible by the previous patches, or were always possible. In patches #8 to #11, realize these cleanups. - In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: mlxsw: mirror_gre: Obey TESTSPetr Machata1-5/+18
This test is unusual in that overriding TESTS does not change the tests to be run. Split the individual tests into several functions and invoke them through tests_run() as appropriate. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: libs: Drop unused functionsPetr Machata2-29/+0
Nothing calls these. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: libs: Drop slow_path_trap_install()/_uninstall()Petr Machata1-16/+0
These functions are not used anymore. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: mirror_gre_lag_lacp: Drop unnecessary codePetr Machata1-3/+0
The selftest does not use functions from mirror_gre_lib, ditch the import. It does not use arping either, so drop the require_command as well. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: mlxsw: mirror_gre: SimplifyPetr Machata1-12/+5
After the previous patch, the function test_span_failable() is always called with should_fail=1. Drop the argument and streamline the code. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: mirror: Drop dual SW/HW testingPetr Machata17-378/+77
The mirroring tests are currently run in a skip_hw and optionally a skip_sw mode. The former tests the SW datapath, the latter the HW datapath, if available. In order to be able to test SW datapath on HW loopbacks, traps are installed on ingress to get traffic from the HW datapath to the SW one. This adds an unnecessary complexity when it would be much simpler to just use a veth-based topology to test the SW datapath. Thus drop all the code that supports this dual testing. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: mirror: mirror_test(): Allow exact count of packetsPetr Machata5-9/+13
The mirroring selftests work by sending ICMP traffic between two hosts. Along the way, this traffic is mirrored to a gretap netdevice, and counter taps are then installed strategically along the path of the mirrored traffic to verify the mirroring took place. The problem with this is that besides mirroring the primary traffic, any other service traffic is mirrored as well. At the same time, because the tests need to work in HW-offloaded scenarios, the ability of the device to do arbitrary packet inspection should not be taken for granted. Most tests therefore simply use matchall, one uses flower to match on IP address. As a result, the selftests are noisy, because besides the primary ICMP traffic, any amount of other service traffic is mirrored as well. mirror_test() accommodated this noisiness by giving the counters an allowance of several packets. But in the previous patch, where possible, counter taps were changed to match only on an exact ICMP message. At least in those cases, we can demand an exact number of packets to match. Where the tap is installed on a connective netdevice, the exact matching is not practical (though with u32, anything is possible). In those places, there should still be some leeway -- and probably bigger than before, because experience shows that these tests are very noisy. To that end, change mirror_test() so that it can be either called with an exact number to expect, or with an expression. Where leeway is needed, adjust callers to pass a ">= 10" instead of mere 10. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: mirror: do_test_span_dir_ips(): Install accurate tapsPetr Machata4-20/+44
The mirroring selftests work by sending ICMP traffic between two hosts. Along the way, this traffic is mirrored to a gretap netdevice, and counter taps are then installed strategically along the path of the mirrored traffic to verify the mirroring took place. The problem with this is that besides mirroring the primary traffic, any other service traffic is mirrored as well. At the same time, because the tests need to work in HW-offloaded scenarios, the ability of the device to do arbitrary packet inspection should not be taken for granted. Most tests therefore simply use matchall, one uses flower to match on IP address. As a result, the selftests are noisy, because besides the primary ICMP traffic, any amount of other service traffic is mirrored as well. However, often the counter tap is installed at the remote end of the gretap tunnel. Since this is a SW-datapath scenario anyway, we can make the filter arbitrarily accurate. Thus in this patch, add parameters forward_type and backward_type to several mirroring test helpers, as some other helpers already have. Then change do_test_span_dir_ips() to instead of installing one generic tap and using it for test in both directions, install the tap for each direction separately, matching on the ICMP type given by these parameters. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: mirror_gre_lag_lacp: Check counters at tunnelPetr Machata1-15/+22
The test works by sending packets through a tunnel, whence they are forwarded to a LAG. One of the LAG children is removed from the LAG prior to the exercise, and the test then counts how many packets pass through the other one. The issue with this is that it counts all packets, not just the encapsulated ones. So instead add a second gretap endpoint to receive the sent packets, and check reception counters there. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: lib: tc_rule_stats_get(): Move default to argument definitionPetr Machata1-2/+2
The argument $dir has a fallback value of "ingress". Move the fallback from the usage site to the argument definition block to make the fact clearer. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: mirror: Drop direction argument from several functionsPetr Machata10-90/+69
The argument is not used by these functions except to propagate it for ultimately no purpose. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28selftests: libs: Expand "$@" where possiblePetr Machata4-30/+176
In some functions, argument-forwarding through "$@" without listing the individual arguments explicitly is fundamental to the operation of a function. E.g. xfail_on_veth() should be able to run various tests in the fail-to-xfail regime, and usage of "$@" is appropriate as an abstraction mechanism. For functions such as simple_if_init(), $@ is a handy way to pass an array. In other functions, it's merely a mechanism to save some typing, which however ends up obscuring the real arguments and makes life hard for those that end up reading the code. This patch adds some of the implicit function arguments and correspondingly expands $@'s. In several cases this will come in handy as following patches adjust the parameter lists. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28Merge branch 'net-flash-modees-firmware' into mainDavid S. Miller22-11/+1963
Danielle Ratson says: ==================== Add ability to flash modules' firmware CMIS compliant modules such as QSFP-DD might be running a firmware that can be updated in a vendor-neutral way by exchanging messages between the host and the module as described in section 7.2.2 of revision 4.0 of the CMIS standard. According to the CMIS standard, the firmware update process is done using a CDB commands sequence. CDB (Command Data Block Message Communication) reads and writes are performed on memory map pages 9Fh-AFh according to the CMIS standard, section 8.12 of revision 4.0. Add a pair of new ethtool messages that allow: * User space to trigger firmware update of transceiver modules * The kernel to notify user space about the progress of the process The user interface is designed to be asynchronous in order to avoid RTNL being held for too long and to allow several modules to be updated simultaneously. The interface is designed with CMIS compliant modules in mind, but kept generic enough to accommodate future use cases, if these arise. The kernel interface that will implement the firmware update using CDB command will include 2 layers that will be added under ethtool: * The upper layer that will be triggered from the module layer, is cmis_ fw_update. * The lower one is cmis_cdb. In the future there might be more operations to implement using CDB commands. Therefore, the idea is to keep the cmis_cdb interface clean and the cmis_fw_update specific to the cdb commands handling it. The communication between the kernel and the driver will be done using two ethtool operations that enable reading and writing the transceiver module EEPROM. The operation ethtool_ops::get_module_eeprom_by_page, that is already implemented, will be used for reading from the EEPROM the CDB reply, e.g. reading module setting, state, etc. The operation ethtool_ops::set_module_eeprom_by_page, that is added in the current patchset, will be used for writing to the EEPROM the CDB command such as start firmware image, run firmware image, etc. Therefore in order for a driver to implement module flashing, that driver needs to implement the two functions mentioned above. Patchset overview: Patch #1-#2: Implement the EEPROM writing in mlxsw. Patch #3: Define the interface between the kernel and user space. Patch #4: Add ability to notify the flashing firmware progress. Patch #5: Veto operations during flashing. Patch #6: Add extended compliance codes. Patch #7: Add the cdb layer. Patch #8: Add the fw_update layer. Patch #9: Add ability to flash transceiver modules' firmware. v8: Patch #7: * In the ethtool_cmis_wait_for_cond() evaluate the condition once more to decide if the error code should be -ETIMEDOUT or something else. * s/netdev_err/netdev_err_once. v7: Patch #4: * Return -ENOMEM instead of PTR_ERR(attr) on ethnl_module_fw_flash_ntf_put_err(). Patch #9: * Fix Warning for not unlocking the spin_lock in the error flow on module_flash_fw_work_list_add(). * Avoid the fall-through on ethnl_sock_priv_destroy(). v6: * Squash some of the last patch to patch #5 and patch #9. Patch #3: * Add paragraph in .rst file. Patch #4: * Reserve '1' more place on SKB for NUL terminator in the error message string. * Add more prints on error flow, re-write the printing function and add ethnl_module_fw_flash_ntf_put_err(). * Change the communication method so notification will be sent in unicast instead of multicast. * Add new 'struct ethnl_module_fw_flash_ntf_params' that holds the relevant info for unicast communication and use it to send notification to the specific socket. * s/nla_put_u64_64bit/nla_put_uint/ Patch #7: * In ethtool_cmis_cdb_init(), Use 'const' for the 'params' parameter. Patch #8: * Add a list field to struct ethtool_module_fw_flash for module_fw_flash_work_list that will be presented in the next patch. * Move ethtool_cmis_fw_update() cleaning to a new function that will be represented in the next patch. * Move some of the fields in struct ethtool_module_fw_flash to a separate struct, so ethtool_cmis_fw_update() will get only the relevant parameters for it. * Edit the relevant functions to get the relevant params for them. * s/CMIS_MODULE_READY_MAX_DURATION_USEC/CMIS_MODULE_READY_MAX_DURATION_MSEC Patch #9: * Add a paragraph in the commit message. * Rename labels in module_flash_fw_schedule(). * Add info to genl_sk_priv_*() and implement the relevant callbacks, in order to handle properly a scenario of closing the socket from user space before the work item was ended. * Add a list the holds all the ethtool_module_fw_flash struct that corresponds to the in progress work items. * Add a new enum for the socket types. * Use both above to identify a flashing socket, add it to the list and when closing socket affect only the flashing type. * Create a new function that will get the work item instead of ethtool_cmis_fw_update(). * Edit the relevant functions to get the relevant params for them. * The new function will call the old ethtool_cmis_fw_update(), and do the cleaning, so the existence of the list should be completely isolated in module.c. =================== Signed-off-by: David S. Miller <[email protected]>
2024-06-28ethtool: Add ability to flash transceiver modules' firmwareDanielle Ratson4-0/+334
Add the ability to flash the modules' firmware by implementing the interface between the user space and the kernel. Example from a succeeding implementation: # ethtool --flash-module-firmware swp40 file test.bin Transceiver module firmware flashing started for device swp40 Transceiver module firmware flashing in progress for device swp40 Progress: 99% Transceiver module firmware flashing completed for device swp40 In addition, add infrastructure that allows modules to set socket-specific private data. This ensures that when a socket is closed from user space during the flashing process, the right socket halts sending notifications to user space until the work item is completed. Signed-off-by: Danielle Ratson <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28ethtool: cmis_fw_update: add a layer for supporting firmware update using CDBDanielle Ratson4-1/+438
According to the CMIS standard, the firmware update process is done using a CDB commands sequence. Implement a work that will be triggered from the module layer in the next patch the will initiate and execute all the CDB commands in order, to eventually complete the firmware update process. This flashing process includes, writing the firmware image, running the new firmware image and committing it after testing, so that it will run upon reset. This work will also notify user space about the progress of the firmware update process. Signed-off-by: Danielle Ratson <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28ethtool: cmis_cdb: Add a layer for supporting CDB commandsDanielle Ratson4-1/+730
CDB (Command Data Block Message Communication) reads and writes are performed on memory map pages 9Fh-AFh according to the CMIS standard, section 8.20 of revision 5.2. Page 9Fh is used to specify the CDB command to be executed and also provides an area for a local payload (LPL). According to the CMIS standard, the firmware update process is done using a CDB commands sequence that will be implemented in the next patch. The kernel interface that will implement the firmware update using CDB command will include 2 layers that will be added under ethtool: * The upper layer that will be triggered from the module layer, is cmis_fw_update. * The lower one is cmis_cdb. In the future there might be more operations to implement using CDB commands. Therefore, the idea is to keep the CDB interface clean and the cmis_fw_update specific to the CDB commands handling it. These two layers will communicate using the API the consists of three functions: - struct ethtool_cmis_cdb * ethtool_cmis_cdb_init(struct net_device *dev, struct ethtool_module_fw_flash_params *params); - void ethtool_cmis_cdb_fini(struct ethtool_cmis_cdb *cdb); - int ethtool_cmis_cdb_execute_cmd(struct net_device *dev, struct ethtool_cmis_cdb_cmd_args *args); Add the CDB layer to support initializing, finishing and executing CDB commands: * The initialization process will include creating of an ethtool_cmis_cdb instance, querying the module CDB support, entering and validating the password from user space (CMD 0x0000) and querying the module features (CMD 0x0040). * The finishing API will simply free the ethtool_cmis_cdb instance. * The executing process will write the CDB command to EEPROM using set_module_eeprom_by_page() that was presented earlier, and will process the reply from EEPROM. Signed-off-by: Danielle Ratson <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28net: sfp: Add more extended compliance codesDanielle Ratson1-0/+6
SFF-8024 is used to define various constants re-used in several SFF SFP-related specifications. Add SFF-8024 extended compliance code definitions for CMIS compliant modules and use them in the next patch to determine the firmware flashing work. Signed-off-by: Danielle Ratson <[email protected]> Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28ethtool: Veto some operations during firmware flashing processDanielle Ratson4-1/+33
Some operations cannot be performed during the firmware flashing process. For example: - Port must be down during the whole flashing process to avoid packet loss while committing reset for example. - Writing to EEPROM interrupts the flashing process, so operations like ethtool dump, module reset, get and set power mode should be vetoed. - Split port firmware flashing should be vetoed. In order to veto those scenarios, add a flag in 'struct net_device' that indicates when a firmware flash is taking place on the module and use it to prevent interruptions during the process. Signed-off-by: Danielle Ratson <[email protected]> Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28ethtool: Add flashing transceiver modules' firmware notifications abilityDanielle Ratson4-0/+154
Add progress notifications ability to user space while flashing modules' firmware by implementing the interface between the user space and the kernel. Signed-off-by: Danielle Ratson <[email protected]> Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28ethtool: Add an interface for flashing transceiver modules' firmwareDanielle Ratson5-1/+164
CMIS compliant modules such as QSFP-DD might be running a firmware that can be updated in a vendor-neutral way by exchanging messages between the host and the module as described in section 7.3.1 of revision 5.2 of the CMIS standard. Add a pair of new ethtool messages that allow: * User space to trigger firmware update of transceiver modules * The kernel to notify user space about the progress of the process The user interface is designed to be asynchronous in order to avoid RTNL being held for too long and to allow several modules to be updated simultaneously. The interface is designed with CMIS compliant modules in mind, but kept generic enough to accommodate future use cases, if these arise. Signed-off-by: Danielle Ratson <[email protected]> Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28mlxsw: Implement ethtool operation to write to a transceiver module EEPROMIdo Schimmel4-0/+93
Implement the ethtool_ops::set_module_eeprom_by_page operation to allow ethtool to write to a transceiver module EEPROM, in a similar fashion to the ethtool_ops::get_module_eeprom_by_page operation. Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28ethtool: Add ethtool operation to write to a transceiver module EEPROMIdo Schimmel1-8/+12
Ethtool can already retrieve information from a transceiver module EEPROM by invoking the ethtool_ops::get_module_eeprom_by_page operation. Add a corresponding operation that allows ethtool to write to a transceiver module EEPROM. The new write operation is purely an in-kernel API and is not exposed to user space. The purpose of this operation is not to enable arbitrary read / write access, but to allow the kernel to write to specific addresses as part of transceiver module firmware flashing. In the future, more functionality can be implemented on top of these read / write operations. Adjust the comments of the 'ethtool_module_eeprom' structure as it is no longer used only for read access. Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Danielle Ratson <[email protected]> Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28UPSTREAM: tcp: fix DSACK undo in fast recovery to call tcp_try_to_open()Neal Cardwell1-1/+1
In some production workloads we noticed that connections could sometimes close extremely prematurely with ETIMEDOUT after transmitting only 1 TLP and RTO retransmission (when we would normally expect roughly tcp_retries2 = TCP_RETR2 = 15 RTOs before a connection closes with ETIMEDOUT). From tracing we determined that these workloads can suffer from a scenario where in fast recovery, after some retransmits, a DSACK undo can happen at a point where the scoreboard is totally clear (we have retrans_out == sacked_out == lost_out == 0). In such cases, calling tcp_try_keep_open() means that we do not execute any code path that clears tp->retrans_stamp to 0. That means that tp->retrans_stamp can remain erroneously set to the start time of the undone fast recovery, even after the fast recovery is undone. If minutes or hours elapse, and then a TLP/RTO/RTO sequence occurs, then the start_ts value in retransmits_timed_out() (which is from tp->retrans_stamp) will be erroneously ancient (left over from the fast recovery undone via DSACKs). Thus this ancient tp->retrans_stamp value can cause the connection to die very prematurely with ETIMEDOUT via tcp_write_err(). The fix: we change DSACK undo in fast recovery (TCP_CA_Recovery) to call tcp_try_to_open() instead of tcp_try_keep_open(). This ensures that if no retransmits are in flight at the time of DSACK undo in fast recovery then we properly zero retrans_stamp. Note that calling tcp_try_to_open() is more consistent with other loss recovery behavior, since normal fast recovery (CA_Recovery) and RTO recovery (CA_Loss) both normally end when tp->snd_una meets or exceeds tp->high_seq and then in tcp_fastretrans_alert() the "default" switch case executes tcp_try_to_open(). Also note that by inspection this change to call tcp_try_to_open() implies at least one other nice bug fix, where now an ECE-marked DSACK that causes an undo will properly invoke tcp_enter_cwr() rather than ignoring the ECE mark. Fixes: c7d9d6a185a7 ("tcp: undo on DSACK during recovery") Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESPJohannes Berg1-1/+1
Fix the definition of BSS_CHANGED_UNSOL_BCAST_PROBE_RESP so that not all higher bits get set, 1<<31 is a signed variable, so when we do u64 changed = BSS_CHANGED_UNSOL_BCAST_PROBE_RESP; we get sign expansion, so the value is 0xffff'ffff'8000'0000 and that's clearly not desired. Use BIT_ULL() to make it unsigned as well as the right type for the change flags. Fixes: 178e9d6adc43 ("wifi: mac80211: fix unsolicited broadcast probe config") Reviewed-by: Miriam Rachel Korenblit <[email protected]> Link: https://patch.msgid.link/20240627104257.06174d291db2.Iba0d642916eb78a61f8ab2cc5ca9280783d9c1db@changeid Signed-off-by: Johannes Berg <[email protected]>
2024-06-28dt-bindings: net: realtek,rtl82xx: Document known PHY IDs as compatible stringsMarek Vasut1-0/+23
Extract known PHY IDs from Linux kernel realtek PHY driver and convert them into supported compatible string list for this DT binding document. Signed-off-by: Marek Vasut <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-28can: m_can: Constify struct m_can_opsChristophe JAILLET4-4/+4
'struct m_can_ops' is not modified in these drivers. Constifying this structure moves some data to a read-only section, so increase overall security. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 4806 520 0 5326 14ce drivers/net/can/m_can/m_can_pci.o After: ===== text data bss dec hex filename 4862 464 0 5326 14ce drivers/net/can/m_can/m_can_pci.o Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/all/a17b96d1be5341c11f263e1e45c9de1cb754e416.1719172843.git.christophe.jaillet@wanadoo.fr Signed-off-by: Marc Kleine-Budde <[email protected]>
2024-06-28Merge patch series "can: rcar_canfd: Small improvements and cleanups"Marc Kleine-Budde1-25/+16
Geert Uytterhoeven <[email protected]> says: This series contains some improvements and cleanups for the R-Car CAN-FD driver. It has been tested on R-Car V4H (White Hawk and White Hawk Single). Link: https://lore.kernel.org/all/[email protected] [mkl: fixed typo in cover letter] Signed-off-by: Marc Kleine-Budde <[email protected]>
2024-06-28can: rcar_canfd: Remove superfluous parentheses in address calculationsGeert Uytterhoeven1-7/+7
There is no need to wrap simple variables or multiplications inside parentheses. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Wolfram Sang <[email protected]> Reviewed-by: Vincent Mailhol <[email protected]> Link: https://lore.kernel.org/all/b5aee80895fa029070fd37d1d837cf1c0ecb52dc.1716973640.git.geert+renesas@glider.be Signed-off-by: Marc Kleine-Budde <[email protected]>
2024-06-28can: rcar_canfd: Improve printing of global operational stateGeert Uytterhoeven1-2/+3
Replace the printing of internal numerical values by the printing of strings reflecting their meaning, to make the message self-explanatory. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Wolfram Sang <[email protected]> Reviewed-by: Vincent Mailhol <[email protected]> Link: https://lore.kernel.org/all/14c8c5ce026e9fec128404706d1c73c8ffa11ced.1716973640.git.geert+renesas@glider.be Signed-off-by: Marc Kleine-Budde <[email protected]>
2024-06-28can: rcar_canfd: Simplify clock handlingGeert Uytterhoeven1-17/+7
The main CAN clock is either the internal CANFD clock, or the external CAN clock. Hence replace the two-valued enum by a simple boolean flag. Consolidate all CANFD clock handling inside a single branch. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Wolfram Sang <[email protected]> Reviewed-by: Vincent Mailhol <[email protected]> Link: https://lore.kernel.org/all/2cf38c10b83c8e5c04d68b17a930b6d9dbf66f40.1716973640.git.geert+renesas@glider.be Signed-off-by: Marc Kleine-Budde <[email protected]>
2024-06-27tools/power turbostat: Add local build_bug.h header for snapshot targetPatryk Wlazlyn2-2/+6
Fixes compilation errors for Makefile snapshot target described in: commit 231ce08b662a ("tools/power turbostat: Add "snapshot:" Makefile target") Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-06-27tools/power turbostat: Fix unc freq columns not showing with '-q' or '-l'Adam Hawley1-8/+8
Commit 78464d7681f7 ("tools/power turbostat: Add columns for clustered uncore frequency") introduced 'probe_intel_uncore_frequency_cluster()' in a way which prevents printing uncore frequency columns if either of the '-q' or '-l' options are used. Systems which do not have multiple uncore frequencies per package are unaffected by this regression. Fix the function so that uncore frequency columns are shown when either the '-l' or '-q' option is used by checking if 'quiet' is true after adding counters for the uncore frequency columns. Fixes: 78464d7681f7 ("tools/power turbostat: Add columns for clustered uncore frequency") Signed-off-by: Adam Hawley <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-06-27tools/power turbostat: option '-n' is ambiguousDavid Arcari1-1/+1
In some cases specifying the '-n' command line argument will cause turbostat to fail. For instance 'turbostat -n 1' works fine; however, 'turbostat -n 1 -d' will fail. This is the result of the first call to getopt_long_only() where "MP" is specified as the optstring. This can be easily fixed by changing the optstring from "MP" to "MPn:" to remove ambiguity between the arguments. tools/power turbostat: option '-n' is ambiguous; possibilities: '-num_iterations' '-no-msr' '-no-perf' Fixes: a0e86c90b83c ("tools/power turbostat: Add --no-perf option") Signed-off-by: David Arcari <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-06-27Merge tag 'v6.10-p4' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pyll crypto fix from Herbert Xu: "Fix a build failure in qat" * tag 'v6.10-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: qat - fix linking errors when PCI_IOV is disabled
2024-06-27Merge tag 'drm-fixes-2024-06-28' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds18-24/+174
Pull drm fixes from Dave Airlie: "Regular fixes, mostly amdgpu with some minor fixes in other places, along with a fix for a very narrow UAF race in the pid handover code. core: - fix refcounting race on pid handover fbdev: - Fix fb_info when vmalloc is used, regression from CONFIG_DRM_FBDEV_LEAK_PHYS_SMEM. amdgpu: - SMU 14.x fix - vram info parsing fix - mode1 reset fix - LTTPR fix - Virtual display fix - Avoid spurious error in PSP init i915: - Fix potential UAF due to race on fence register revocation nouveau - nouveau tv mode fixes panel: - Add KOE TX26D202VM0BWA timings" * tag 'drm-fixes-2024-06-28' of https://gitlab.freedesktop.org/drm/kernel: drm/drm_file: Fix pid refcounting race drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_ld_modes drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_hd_modes drm/amdgpu: Don't show false warning for reg list drm/amdgpu: avoid using null object of framebuffer drm/amd/display: Send DP_TOTAL_LTTPR_CNT during detection if LTTPR is present drm/amdgpu: Fix pci state save during mode-1 reset drm/amdgpu/atomfirmware: fix parsing of vram_info drm/amd/swsmu: add MALL init support workaround for smu_v14_0_1 drm/i915/gt: Fix potential UAF by revoke of fence registers drm/panel: simple: Add missing display timing flags for KOE TX26D202VM0BWA drm/fbdev-dma: Only set smem_start is enable per module option
2024-06-27net: thunderx: Unembed netdev structureBreno Leitao1-6/+15
Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_devices from struct lmac by converting them into pointers, and allocating them dynamically. Use the leverage alloc_netdev() to allocate the net_device object at bgx_lmac_enable(). The free of the device occurs at bgx_lmac_disable(). Do not free_netdevice() if bgx_lmac_enable() fails after lmac->netdev is allocated, since bgx_lmac_disable() is called if bgx_lmac_enable() fails, and lmac->netdev will be freed there (similarly to lmac->dmacs). Link: https://lore.kernel.org/all/[email protected]/ [1] Signed-off-by: Breno Leitao <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-27Revert "net: micro-optimize skb_datagram_iter"Sagi Grimberg1-2/+2
This reverts commit 934c29999b57b835d65442da6f741d5e27f3b584. This triggered a usercopy BUG() in systems with HIGHMEM, reported by the test robot in: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Sagi Grimberg <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-27net: phy: phy_device: Fix PHY LED blinking code commentMarek Vasut1-1/+1
Fix copy-paste error in the code comment. The code refers to LED blinking configuration, not brightness configuration. It was likely copied from comment above this one which does refer to brightness configuration. Fixes: 4e901018432e ("net: phy: phy_device: Call into the PHY driver to set LED blinking") Signed-off-by: Marek Vasut <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-28drm/drm_file: Fix pid refcounting raceJann Horn1-5/+3
<[email protected]>, Maxime Ripard <[email protected]>, Thomas Zimmermann <[email protected]> filp->pid is supposed to be a refcounted pointer; however, before this patch, drm_file_update_pid() only increments the refcount of a struct pid after storing a pointer to it in filp->pid and dropping the dev->filelist_mutex, making the following race possible: process A process B ========= ========= begin drm_file_update_pid mutex_lock(&dev->filelist_mutex) rcu_replace_pointer(filp->pid, <pid B>, 1) mutex_unlock(&dev->filelist_mutex) begin drm_file_update_pid mutex_lock(&dev->filelist_mutex) rcu_replace_pointer(filp->pid, <pid A>, 1) mutex_unlock(&dev->filelist_mutex) get_pid(<pid A>) synchronize_rcu() put_pid(<pid B>) *** pid B reaches refcount 0 and is freed here *** get_pid(<pid B>) *** UAF *** synchronize_rcu() put_pid(<pid A>) As far as I know, this race can only occur with CONFIG_PREEMPT_RCU=y because it requires RCU to detect a quiescent state in code that is not explicitly calling into the scheduler. This race leads to use-after-free of a "struct pid". It is probably somewhat hard to hit because process A has to pass through a synchronize_rcu() operation while process B is between mutex_unlock() and get_pid(). Fix it by ensuring that by the time a pointer to the current task's pid is stored in the file, an extra reference to the pid has been taken. This fix also removes the condition for synchronize_rcu(); I think that optimization is unnecessary complexity, since in that case we would usually have bailed out on the lockless check above. Fixes: 1c7a387ffef8 ("drm: Update file owner during use") Cc: <[email protected]> Signed-off-by: Jann Horn <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
2024-06-27Merge branch 'selftests-net-switch-pmtu-sh-to-use-the-internal-ovs-script'Jakub Kicinski3-67/+451
Aaron Conole says: ==================== selftests: net: Switch pmtu.sh to use the internal ovs script. Currently, if a user wants to run pmtu.sh and cover all the provided test cases, they need to install the Open vSwitch userspace utilities. This dependency is difficult for users as well as CI environments, because the userspace build and setup may require lots of support and devel packages to be installed, system setup to be correct, and things like permissions and selinux policies to be properly configured. The kernel selftest suite includes an ovs-dpctl.py utility which can interact with the openvswitch module directly. This lets developers and CI environments run without needing too many extra dependencies - just the pyroute2 python package. This series enhances the ovs-dpctl utility to provide support for set() and tunnel() flow specifiers, better ipv6 handling support, and the ability to add tunnel vports, and LWT interfaces. Finally, it modifies the pmtu.sh script to call the ovs-dpctl.py utility rather than the typical OVS userspace utilities. The pmtu.sh can still fall back on the Open vSwitch userspace utilities if the ovs-dpctl.py script can't be used. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-27selftests: net: add config for openvswitchAaron Conole1-0/+5
The pmtu testing will require that the OVS module is installed, so do that. Reviewed-by: Simon Horman <[email protected]> Tested-by: Simon Horman <[email protected]> Signed-off-by: Aaron Conole <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-27selftests: net: Use the provided dpctl rather than the vswitchd for tests.Aaron Conole2-23/+124
The current pmtu test infrastucture requires an installed copy of the ovs-vswitchd userspace. This means that any automated or constrained environments may not have the requisite tools to run the tests. However, the pmtu tests don't require any special classifier processing. Indeed they are only using the vswitchd in the most basic mode - as a NORMAL switch. However, the ovs-dpctl kernel utility can now program all the needed basic flows to allow traffic to traverse the tunnels and provide support for at least testing some basic pmtu scenarios. More complicated flow pipelines can be added to the internal ovs test infrastructure, but that is work for the future. For now, enable the most common cases - wide mega flows with no other prerequisites. Enhance the pmtu testing to try testing using the internal utility, first. As a fallback, if the internal utility isn't running, then try with the ovs-vswitchd userspace tools. Additionally, make sure that when the pyroute2 package is not available the ovs-dpctl utility will error out to properly signal an error has occurred and skip using the internal utility. Reviewed-by: Stefano Brivio <[email protected]> Signed-off-by: Aaron Conole <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>