aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx5/main.c
AgeCommit message (Collapse)AuthorFilesLines
2024-09-13RDMA/mlx5: Use IB set_netdev and get_netdev functionsChiara Meiohas1-54/+129
The IB layer provides a common interface to store and get net devices associated to an IB device port (ib_device_set_netdev() and ib_device_get_netdev()). Previously, mlx5_ib stored and managed the associated net devices internally. Replace internal net device management in mlx5_ib with ib_device_set_netdev() when attaching/detaching a net device and ib_device_get_netdev() when retrieving the net device. Export ib_device_get_netdev(). For mlx5 representors/PFs/VFs and lag creation we replace the netdev assignments with the IB set/get netdev functions. In active-backup mode lag the active slave net device is stored in the lag itself. To assure the net device stored in a lag bond IB device is the active slave we implement the following: - mlx5_core: when modifying the slave of a bond we send the internal driver event MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE. - mlx5_ib: when catching the event call ib_device_set_netdev() This patch also ensures the correct IB events are sent in switchdev lag. While at it, when in multiport eswitch mode, only a single IB device is created for all ports. The said IB device will receive all netdev events of its VFs once loaded, thus to avoid overwriting the mapping of PF IB device to PF netdev, ignore NETDEV_REGISTER events if the ib device has already been mapped to a netdev. Signed-off-by: Chiara Meiohas <[email protected]> Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13RDMA/mlx5: Initialize phys_port_cnt earlier in RDMA device creationChiara Meiohas1-1/+2
phys_port_cnt of the IB device must be initialized before calling ib_device_set_netdev(). Previously, phys_port_cnt was initialized in the mlx5_ib init function. Remove this initialization to allow setting it separately, providing the flexibility to call ib_device_set_netdev before registering the IB device. Signed-off-by: Chiara Meiohas <[email protected]> Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13RDMA/mlx5: Obtain upper net device only when neededMark Bloch1-1/+1
Report the upper device's state as the RDMA port state only in RoCE LAG or switchdev LAG. Fixes: 27f9e0ccb6da ("net/mlx5: Lag, Add single RDMA device in multiport mode") Signed-off-by: Mark Bloch <[email protected]> Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13RDMA/mlx5: Check RoCE LAG status before getting netdevMark Bloch1-6/+13
Check if RoCE LAG is active before calling the LAG layer for netdev. This clarifies if LAG is active. No behavior changes with this patch. Signed-off-by: Mark Bloch <[email protected]> Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13RDMA/mlx5: Consider the query_vuid cap for data_directYishai Hadas1-2/+4
Consider also the query_vuid cap before enabling the data_direct functionality. This may prevent a syndrome from the FW in case the query_vuid command is not supported. (e.g. migratable VF) Signed-off-by: Yishai Hadas <[email protected]> Reviewed-by: Gal Shalom <[email protected]> Link: https://patch.msgid.link/274c4f6f1ac0b1078243dd296695a49dbe58e7d1.1725907637.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-11RDMA/mlx5: Add support for DMABUF MR registrations with Data-directYishai Hadas1-0/+11
Add support for DMABUF MR registrations with Data-direct device. Upon userspace calling to register a DMABUF MR with the data direct bit set, the below algorithm will be followed. 1) Obtain a pinned DMABUF umem from the IB core using the user input parameters (FD, offset, length) and the DMA PF device. The DMA PF device is needed to allow the IOMMU to enable the DMA PF to access the user buffer over PCI. 2) Create a KSM MKEY by setting its entries according to the user buffer VA to IOVA mapping, with the MKEY being the data direct device-crossed MKEY. This KSM MKEY is umrable and will be used as part of the MR cache. The PD for creating it is the internal device 'data direct' kernel one. 3) Create a crossing MKEY that points to the KSM MKEY using the crossing access mode. 4) Manage the KSM MKEY by adding it to a list of 'data direct' MKEYs managed on the mlx5_ib device. 5) Return the crossing MKEY to the user, created with its supplied PD. Upon DMA PF unbind flow, the driver will revoke the KSM entries. The final deregistration will occur under the hood once the application deregisters its MKEY. Notes: - This version supports only the PINNED UMEM mode, so there is no dependency on ODP. - The IOVA supplied by the application must be system page aligned due to HW translations of KSM. - The crossing MKEY will not be umrable or part of the MR cache, as we cannot change its crossed (i.e. KSM) MKEY over UMR. Signed-off-by: Yishai Hadas <[email protected]> Link: https://patch.msgid.link/1f99d8020ed540d9702b9e2252a145a439609ba6.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-11RDMA/mlx5: Add the initialization flow to utilize the 'data direct' deviceYishai Hadas1-0/+90
Add the NET device initialization flow to utilize the 'data direct' device. When a NET mlx5_ib device is capable of 'data direct', the following sequence of actions will occur: - Find its affiliated 'data direct' VUID via a firmware command. - Create its own private PD and 'data direct' mkey. - Register to be notified when its 'data direct' driver is probed or removed. The DMA device of the affiliated 'data direct' device, including the private PD and the 'data direct' mkey, will be used later during MR registrations that request the data direct functionality. Signed-off-by: Yishai Hadas <[email protected]> Link: https://patch.msgid.link/b11fa87b2a65bce4db8d40341bb6cee490fa4d06.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-11RDMA/mlx5: Introduce the 'data direct' driverYishai Hadas1-0/+24
Introduce the 'data direct' driver for a ConnectX-8 Data Direct device. The 'data direct' driver functions as the affiliated DMA device for one or more capable mlx5_ib devices. This DMA device, as the name suggests, is used exclusively for DMA operations. It can be considered a DMA engine managed by a PF/VF, lacking network capabilities and having minimal overall capabilities. Consequently, the DMA NIC PF will not be exposed to or directly used by software applications. The driver will not have any direct interface or interaction with the firmware (no command interface, no capabilities, etc.). It will operate solely over PCI to enable its DMA functionality. Registration and un-registration of the driver are handled as part of the mlx5_ib initialization and exit processes, as the mlx5_ib devices will effectively be its clients. The driver will serve as the DMA device for accessing another PCI device to achieve optimal performance (both on the same NUMA node, P2P access, etc.). Upon probing, it will read its VUID over PCI to handle mlx5_ib device registrations with the same VUID. Upon removal, it will notify its clients to allow them to clean up the resources that were mmaped with its DMA device. Signed-off-by: Yishai Hadas <[email protected]> Link: https://patch.msgid.link/b77edecfd476c3f445da96ab6aef499ae47b2829.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-07-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-58/+278
Pull rdma updates from Jason Gunthorpe: "Usual collection of small improvements and fixes: - Bug fixes and minor improvments in efa, irdma, mlx4, mlx5, rxe, hf1, qib, ocrdma - bnxt_re support for MSN, which is a new retransmit logic - Initial mana support for RC qps - Use after free bug and cleanups in iwcm - Reduce resource usage in mlx5 when RDMA verbs features are not used - New verb to drain shared recieve queues, similar to normal recieve queues. This is necessary to allow ULPs a clean shutdown. Used in the iscsi rdma target - mlx5 support for more than 16 bits of doorbell indexes - Doorbell moderation support for bnxt_re - IB multi-plane support for mlx5 - New EFA adaptor PCI IDs - RDMA_NAME_ASSIGN_TYPE_USER to hint to userspace that it shouldn't rename the device - A collection of hns bugs - Fix long standing bug in bnxt_re with incorrect endian handling of immediate data" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits) IB/hfi1: Constify struct flag_table RDMA/mana_ib: Set correct device into ib bnxt_re: Fix imm_data endianness RDMA: Fix netdev tracker in ib_device_set_netdev RDMA/hns: Fix mbx timing out before CMD execution is completed RDMA/hns: Fix insufficient extend DB for VFs. RDMA/hns: Fix undifined behavior caused by invalid max_sge RDMA/hns: Fix shift-out-bounds when max_inline_data is 0 RDMA/hns: Fix missing pagesize and alignment check in FRMR RDMA/hns: Fix unmatch exception handling when init eq table fails RDMA/hns: Fix soft lockup under heavy CEQE load RDMA/hns: Check atomic wr length RDMA/ocrdma: Don't inline statistics functions RDMA/core: Introduce "name_assign_type" for an IB device RDMA/qib: Fix truncation compilation warnings in qib_verbs.c RDMA/qib: Fix truncation compilation warnings in qib_init.c RDMA/efa: Add EFA 0xefa3 PCI ID RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports RDMA/mlx5: Add plane index support when querying PTYS registers ...
2024-07-15Merge tag 'aux-sysfs-irqs' of ↵Jakub Kicinski1-16/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== aux-sysfs-irqs Shay Says: ========== Introduce auxiliary bus IRQs sysfs Today, PCI PFs and VFs, which are anchored on the PCI bus, display their IRQ information in the <pci_device>/msi_irqs/<irq_num> sysfs files. PCI subfunctions (SFs) are similar to PFs and VFs and these SFs are anchored on the auxiliary bus. However, these PCI SFs lack such IRQ information on the auxiliary bus, leaving users without visibility into which IRQs are used by the SFs. This absence makes it impossible to debug situations and to understand the source of interrupts/SFs for performance tuning and debug. Additionally, the SFs are multifunctional devices supporting RDMA, network devices, clocks, and more, similar to their peer PCI PFs and VFs. Therefore, it is desirable to have SFs' IRQ information available at the bus/device level. To overcome the above limitations, this short series extends the auxiliary bus to display IRQ information in sysfs, similar to that of PFs and VFs. It adds an 'irqs' directory under the auxiliary device and includes an <irq_num> sysfs file within it. For example: $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/ 50 51 52 53 54 55 56 57 58 Patch summary: patch-1 adds auxiliary bus to support irqs used by auxiliary device patch-2 mlx5 driver using exposing irqs for PCI SF devices via auxiliary bus ========== * tag 'aux-sysfs-irqs' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Expose SFs IRQs driver core: auxiliary bus: show auxiliary device IRQs RDMA/mlx5: Add Qcounters req_transport_retries_exceeded/req_rnr_retries_exceeded net/mlx5: Reimplement write combining test ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-04RDMA/core: Introduce "name_assign_type" for an IB deviceMark Zhang1-2/+3
The name_assign_type indicates how the name is provided. Currently these types are supported: - RDMA_NAME_ASSIGN_TYPE_UNKNOWN: Unknown or not set; - RDMA_NAME_ASSIGN_TYPE_USER: Name is provided by the user; The user-created sub device, rxe and siw device has this type. When filling nl device info, it is set in the new attribute RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE. User-space tools like udev "rdma_rename" could check this attribute to determine if this device needs to be renamed or not. Signed-off-by: Mark Zhang <[email protected]> Link: https://lore.kernel.org/r/522591bef9a369cc8e5dcb77787e017bffee37fe.1719837610.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-07-01RDMA/mlx5: Add plane index support when querying PTYS registersMark Zhang1-5/+7
Support the new "plane_ind" field when querying port PTYS registers. This is needed when querying the rate of a plane port. Signed-off-by: Mark Zhang <[email protected]> Link: https://lore.kernel.org/r/1f703c36306aa46917fcd88eadbb23b3e380d526.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-07-01RDMA/mlx5: Support plane device and driver APIs to add and delete itMark Zhang1-3/+113
This patch supports driver APIs "add_sub_dev" and "del_sub_dev", to add and delete a plane device respectively. A mlx5 plane device is a rdma SMI device; It provides the SMI capability through user MAD for it's parent, the logical multi-plane aggregated device. For a plane port: - It supports QP0 only; - When adding a plane device, all plane ports are added; - For some commands like mad_ifc, both plane_index and native portnum is needed; - When querying or modifying a plane port context, the native portnum must be used, as the query/modify_hca_vport_context command doesn't support plane port. Signed-off-by: Mark Zhang <[email protected]> Link: https://lore.kernel.org/r/e933cd0562aece181f8657af2ca0f5b387d0f14e.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-07-01RDMA/mlx5: Add support to multi-plane device and portMark Zhang1-9/+51
When multi-plane is supported, a logical port, which is aggregation of multiple physical plane ports, is exposed for data transmission. Compared with a normal mlx5 IB port, this logical port supports all functionalities except Subnet Management. Signed-off-by: Mark Zhang <[email protected]> Link: https://lore.kernel.org/r/7e37c06c9cb243be9ac79930cd17053903785b95.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-06-27RDMA/mlx5: Send UAR page index as ioctl attributeAkiva Goldberger1-0/+1
Add UAR page index as a driver ioctl attribute to increase the number of supported indices, previously limited to 16 bits by mlx5_ib_create_cq struct. Link: https://lore.kernel.org/r/0e18b34d7ec3b1ae02d694b0d545aed7413c0ef7.1719512393.git.leon@kernel.org Signed-off-by: Akiva Goldberger <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-06-21RDMA/mlx5: Fix unwind flow as part of mlx5_ib_stage_init_initYishai Hadas1-2/+2
Fix unwind flow as part of mlx5_ib_stage_init_init to use the correct goto upon an error. Fixes: 758ce14aee82 ("RDMA/mlx5: Implement MACsec gid addition and deletion") Signed-off-by: Yishai Hadas <[email protected]> Reviewed-by: Patrisious Haddad <[email protected]> Link: https://lore.kernel.org/r/aa40615116eda14ec9eca21d52017d632ea89188.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-06-16IB/mlx5: Allocate resources just before first QP/SRQ is createdJianbo Liu1-43/+106
Previously, all IB dev resources are initialized on driver load. As they are not always used, move the initialization to the time when they are needed. To be more specific, move PD (p0) and CQ (c0) initialization to the time when the first SRQ is created. and move SRQs(s0 and s1) initialization to the time first QP is created. To avoid concurrent creations, two new mutexes are also added. Signed-off-by: Jianbo Liu <[email protected]> Link: https://lore.kernel.org/r/98c3e53a8cc0bdfeb6dec6e5bb8b037d78ab00d8.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-06-16IB/mlx5: Create UMR QP just before first reg_mr occursJianbo Liu1-1/+2
UMR QP is not used in some cases, so move QP and its CQ creations from driver load flow to the time first reg_mr occurs, that is when MR interfaces are first called. The initialization of dev->umrc.pd and dev->umrc.lock is still done in driver load because pd is needed for mlx5_mkey_cache_init and the lock is reused to protect against the concurrent creation. When testing 4G bytes memory registration latency with rtool [1] and 8 threads in parallel, there is minor performance degradation (<5% for the max latency) is seen for the first reg_mr with this change. Link: https://github.com/paravmellanox/rtool [1] Signed-off-by: Jianbo Liu <[email protected]> Link: https://lore.kernel.org/r/55d3c4f8a542fd974d8a4c5816eccfb318a59b38.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-06-16net/mlx5: Reimplement write combining testJianbo Liu1-16/+3
The test of write combining was added before in mlx5_ib driver. It opens UD QP and posts NOP WQEs, and uses BlueFlame doorbell. When BlueFlame is used, WQEs get written directly to a PCI BAR of the device (in addition to memory) so that the device handles them without having to access memory. In this test, the WQEs written in memory are different from the ones written to the BlueFlame which request CQE update. By checking the completion reports posted on CQ, we can know if BlueFlame succeeds or not. The write combining must be supported if BlueFlame succeeds as its register is written using write combining. This patch reimplements the test in the same way, but using a pair of SQ and CQ only. It is moved to mlx5_core as a general feature used by both mlx5_core and mlx5_ib. Besides, save write combine test result of the PCI function, so that its thousands of child functions such as SF can query without paying the time and resource penalty by itself. The test function is called only after failing to get the cached result. With this enhancement, all thousands of SFs of the PF attached to same driver no longer need to perform WC check explicitly, which is already done in the system. This saves several commands per SF, thereby speeds up SF creation and also saves completion EQ creation. Signed-off-by: Jianbo Liu <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Link: https://lore.kernel.org/r/4ff5a8cc4c5b5b0d98397baa45a5019bcdbf096e.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2024-05-02RDMA/mlx5: Remove NULL check before dev_{put, hold}Jules Irenge1-2/+1
Coccinelle reports a warning WARNING: NULL check before dev_{put, hold} functions is not needed The reason is the call netdev_{put, hold} of dev_{put,hold} will check NULL There is no need to check before using dev_{put, hold} Signed-off-by: Jules Irenge <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-12RDMA/mlx5: Expose register c0 for RDMA deviceMark Bloch1-0/+24
This patch introduces improvements for matching egress traffic sent by the local device. When applicable, all egress traffic from the local vport is now tagged with the provided value. This enhancement is particularly useful for FDB steering purposes. The primary focus of this update is facilitating the transmission of traffic from the hypervisor to a VF. To achieve this, one must initiate an SQ on the hypervisor and subsequently create a rule in the FDB that matches on the eswitch manager vport and the SQN of the aforementioned SQ. Obtaining the SQN can be had from SQ opened, and the eswitch manager vport match can be substituted with the register c0 value exposed by this patch. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Michael Guralnik <[email protected]> Link: https://lore.kernel.org/r/aa4120a91c98ff1c44f1213388c744d4cb0324d6.1701871118.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-10-31IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAFGeorge Kennedy1-3/+1
In the unlikely event that workqueue allocation fails and returns NULL in mlx5_mkey_cache_init(), delete the call to mlx5r_umr_resource_cleanup() (which frees the QP) in mlx5_ib_stage_post_ib_reg_umr_init(). This will avoid attempted double free of the same QP when __mlx5_ib_add() does its cleanup. Resolves a splat: Syzkaller reported a UAF in ib_destroy_qp_user workqueue: Failed to create a rescuer kthread for wq "mkey_cache": -EINTR infiniband mlx5_0: mlx5_mkey_cache_init:981:(pid 1642): failed to create work queue infiniband mlx5_0: mlx5_ib_stage_post_ib_reg_umr_init:4075:(pid 1642): mr cache init failed -12 ================================================================== BUG: KASAN: slab-use-after-free in ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) Read of size 8 at addr ffff88810da310a8 by task repro_upstream/1642 Call Trace: <TASK> kasan_report (mm/kasan/report.c:590) ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4178) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... </TASK> Allocated by task 1642: __kmalloc (./include/linux/kasan.h:198 mm/slab_common.c:1026 mm/slab_common.c:1039) create_qp (./include/linux/slab.h:603 ./include/linux/slab.h:720 ./include/rdma/ib_verbs.h:2795 drivers/infiniband/core/verbs.c:1209) ib_create_qp_kernel (drivers/infiniband/core/verbs.c:1347) mlx5r_umr_resource_init (drivers/infiniband/hw/mlx5/umr.c:164) mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4070) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... Freed by task 1642: __kmem_cache_free (mm/slub.c:1826 mm/slub.c:3809 mm/slub.c:3822) ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2112) mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4076 drivers/infiniband/hw/mlx5/main.c:4065) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... Fixes: 04876c12c19e ("RDMA/mlx5: Move init and cleanup of UMR to umr.c") Link: https://lore.kernel.org/r/[email protected] Suggested-by: Leon Romanovsky <[email protected]> Signed-off-by: George Kennedy <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-31Merge tag 'v6.6' into rdma.git for-nextJason Gunthorpe1-1/+1
Resolve conflict by taking the spin_lock hunk from for-next: https://lore.kernel.org/r/[email protected] Required for the next patch. Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-24RDMA/mlx5: Change the key being sent for MPV device affiliationPatrisious Haddad1-1/+1
Change the key that we send from IB driver to EN driver regarding the MPV device affiliation, since at that stage the IB device is not yet initialized, so its index would be zero for different IB devices and cause wrong associations between unrelated master and slave devices. Instead use a unique value from inside the core device which is already initialized at this stage. Fixes: 0d293714ac32 ("RDMA/mlx5: Send events from IB driver about device affiliation state") Signed-off-by: Patrisious Haddad <[email protected]> Link: https://lore.kernel.org/r/ac7e66357d963fc68d7a419515180212c96d137d.1697705185.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-10-04IPsec packet offload support in multiport RoCE devicesLeon Romanovsky1-0/+17
This series from Patrisious extends mlx5 to support IPsec packet offload in multiport devices (MPV, see [1] for more details). These devices have single flow steering logic and two netdev interfaces, which require extra logic to manage IPsec configurations as they performed on netdevs. Thanks [1] https://lore.kernel.org/linux-rdma/[email protected]/ Link: https://lore.kernel.org/all/[email protected] Signed-of-by: Leon Romanovsky <[email protected]> * mlx5-next: (576 commits) net/mlx5: Handle IPsec steering upon master unbind/bind net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic net/mlx5: Add create alias flow table function to ipsec roce net/mlx5: Implement alias object allow and create functions net/mlx5: Add alias flow table bits net/mlx5: Store devcom pointer inside IPsec RoCE net/mlx5: Register mlx5e priv to devcom in MPV mode RDMA/mlx5: Send events from IB driver about device affiliation state net/mlx5: Introduce ifc bits for migration in a chunk mode Linux 6.6-rc3 ...
2023-10-02RDMA/mlx5: Send events from IB driver about device affiliation statePatrisious Haddad1-0/+17
Send blocking events from IB driver whenever the device is done being affiliated or if it is removed from an affiliation. This is useful since now the EN driver can register to those event and know when a device is affiliated or not. Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.1695296682.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-09-26IB/mlx5: Rename 400G_8X speed to comply to naming conventionPatrisious Haddad1-1/+1
Rename 400G_8X speed to comply to naming convention. Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Mark Zhang <[email protected]> Link: https://lore.kernel.org/r/ac98447cac8379a43fbdb36d56e5fb2b741a97ff.1695204156.git.leon@kernel.org Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2023-09-26IB/mlx5: Add support for 800G_8X lane speedPatrisious Haddad1-0/+4
Add a check for 800G_8X speed when querying PTYS and report it back correctly when needed. Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Mark Zhang <[email protected]> Link: https://lore.kernel.org/r/26fd0b6e1fac071c3eb779657bb3d8ba47f47c4f.1695204156.git.leon@kernel.org Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2023-09-26RDMA/mlx5: Fix NULL string errorShay Drory1-1/+1
checkpath is complaining about NULL string, change it to 'Unknown'. Fixes: 37aa5c36aa70 ("IB/mlx5: Add UARs write-combining and non-cached mapping") Signed-off-by: Shay Drory <[email protected]> Link: https://lore.kernel.org/r/8638e5c14fadbde5fa9961874feae917073af920.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]>
2023-08-24Merge branch 'mlx5-next' of ↵Jakub Kicinski1-9/+32
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== mlx5 MACsec RoCEv2 support From Patrisious: This series extends previously added MACsec offload support to cover RoCE traffic either. In order to achieve that, we need configure MACsec with offload between the two endpoints, like below: REMOTE_MAC=10:70:fd:43:71:c0 * ip addr add 1.1.1.1/16 dev eth2 * ip link set dev eth2 up * ip link add link eth2 macsec0 type macsec encrypt on * ip macsec offload macsec0 mac * ip macsec add macsec0 tx sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16 * ip macsec add macsec0 rx port 1 address $REMOTE_MAC * ip macsec add macsec0 rx port 1 address $REMOTE_MAC sa 0 pn 1 on key 01 ead3664f508eb06c40ac7104cdae4ce5 * ip addr add 10.1.0.1/16 dev macsec0 * ip link set dev macsec0 up And in a similar manner on the other machine, while noting the keys order would be reversed and the MAC address of the other machine. RDMA traffic is separated through relevant GID entries and in case of IP ambiguity issue - meaning we have a physical GIDs and a MACsec GIDs with the same IP/GID, we disable our physical GID in order to force the user to only use the MACsec GID. v0: https://lore.kernel.org/netdev/[email protected]/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion net/mlx5: Add RoCE MACsec steering infrastructure in core net/mlx5: Configure MACsec steering for ingress RoCEv2 traffic net/mlx5: Configure MACsec steering for egress RoCEv2 traffic IB/core: Reorder GID delete code for RoCE net/mlx5: Add MACsec priorities in RDMA namespaces RDMA/mlx5: Implement MACsec gid addition and deletion net/mlx5: Maintain fs_id xarray per MACsec device inside macsec steering net/mlx5: Remove netdevice from MACsec steering net/mlx5e: Move MACsec flow steering and statistics database from ethernet to core net/mlx5e: Rename MACsec flow steering functions/parameters to suit core naming style net/mlx5: Remove dependency of macsec flow steering on ethernet net/mlx5e: Move MACsec flow steering operations to be used as core library macsec: add functions to get macsec real netdevice and check offload ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-20RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletionPatrisious Haddad1-1/+5
Add RoCE MACsec rules when a gid is added for the MACsec netdevice and handle their cleanup when the gid is removed or the MACsec SA is deleted. Also support alias IP for the MACsec device, as long as we don't have more ips than what the gid table can hold. In addition handle the case where a gid is added but there are still no SAs added for the MACsec device, so the rules are added later on when the SAs are added. Signed-off-by: Patrisious Haddad <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2023-08-20RDMA/mlx5: Implement MACsec gid addition and deletionPatrisious Haddad1-9/+28
Handle MACsec IP ambiguity issue, since mlx5 hw can't support programming both the MACsec and the physical gid when they have the same IP address, because it wouldn't know to whom to steer the traffic. Hence in such case we delete the physical gid from the hw gid table, which would then cause all traffic sent over it to fail, and we'll only be able to send traffic over the MACsec gid. Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Raed Salem <[email protected]> Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2023-08-07net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max()Maher Sanalla1-1/+1
To accurately represent its purpose, rename the function that retrieves the value of maximum vectors from mlx5_comp_vectors_count() to mlx5_comp_vectors_max(). Signed-off-by: Maher Sanalla <[email protected]> Reviewed-by: Shay Drory <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-06-11RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functionsMaher Sanalla1-0/+3
Delay drop data is initiated for PFs that have the capability of rq_delay_drop and are in roce profile. However, PFs with RAW ethernet profile do not initiate delay drop data on function load, causing kernel panic if delay drop struct members are accessed later on in case a dropless RQ is created. Thus, stage the delay drop initialization as part of RAW ethernet PF loading process. Fixes: b5ca15ad7e61 ("IB/mlx5: Add proper representors support") Signed-off-by: Maher Sanalla <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Link: https://lore.kernel.org/r/2e9d386785043d48c38711826eb910315c1de141.1685960567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-03-20IB/mlx5: Add support for 400G_8X lane speedMaher Sanalla1-0/+4
Currently, when driver queries PTYS to report which link speed is being used on its RoCE ports, it does not check the case of having 400Gbps transmitted over 8 lanes. Thus it fails to report the said speed and instead it defaults to report 10G over 4 lanes. Add a check for the said speed when querying PTYS and report it back correctly when needed. Fixes: 08e8676f1607 ("IB/mlx5: Add support for 50Gbps per lane link modes") Signed-off-by: Maher Sanalla <[email protected]> Reviewed-by: Aya Levin <[email protected]> Reviewed-by: Saeed Mahameed <[email protected]> Link: https://lore.kernel.org/r/ec9040548d119d22557d6a4b4070d6f421701fd4.1678973994.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-11/+13
Pull rdma updates from Jason Gunthorpe: "Quite a small cycle this time, even with the rc8. I suppose everyone went to sleep over xmas. - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana - inline CQE support for hns - Have mlx5 display device error codes - Pinned DMABUF support for irdma - Continued rxe cleanups, particularly converting the MRs to use xarray - Improvements to what can be cached in the mlx5 mkey cache" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits) IB/mlx5: Extend debug control for CC parameters IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors IB/hfi1: Fix math bugs in hfi1_can_pin_pages() RDMA/irdma: Add support for dmabuf pin memory regions RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys RDMA/rxe: Fix missing memory barriers in rxe_queue.h RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet RDMA/rxe: Remove rxe_alloc() RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size Subject: RDMA/rxe: Handle zero length rdma iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry() RDMA/mlx5: Use rdma_umem_for_each_dma_block() RDMA/umem: Remove unused 'work' member from struct ib_umem RDMA/irdma: Cap MSIX used to online CPUs + 1 RDMA/mlx5: Check reg_create() create for errors RDMA/restrack: Correct spelling RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish() ...
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-17Merge mlx5-next into rdma.git for-nextJason Gunthorpe1-29/+59
Synchronize the shared mlx5 branch with net: - From Jiri: fixe a deadlock in mlx5_ib's netdev notifier unregister. - From Mark and Patrisious: add IPsec RoCEv2 support. - From Or: Rely on firmware to get special mkeys * branch mlx5-next: RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys net/mlx5: Configure IPsec steering for egress RoCEv2 traffic net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic net/mlx5: Add IPSec priorities in RDMA namespaces net/mlx5: Implement new destination type TABLE_TYPE net/mlx5: Introduce new destination type TABLE_TYPE RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister net/mlx5e: Propagate an internal event in case uplink netdev changes net/mlx5e: Fix trap event handling Signed-off-by: Jason Gunthorpe <[email protected]>
2023-02-17RDMA/mlx5: Use query_special_contexts for mkeysOr Har-Toov1-5/+5
Use query_sepcial_contexts to get the correct value of mkeys such as null_mkey, terminate_scatter_list_mkey and dump_fill_mkey, as FW will change them in certain configurations. Link: https://lore.kernel.org/r/000236f0a9487d48809f87bcc3620a3964b2d3d3.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <[email protected]> Reviewed-by: Michael Guralnik <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan1-2/+2
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [[email protected]: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Sebastian Reichel <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arjun Roy <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Punit Agrawal <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-08RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregisterJiri Pirko1-24/+54
When removing a network namespace with mlx5 devlink instance being in it, following callchain is performed: cleanup_net (takes down_read(&pernet_ops_rwsem) devlink_pernet_pre_exit() devlink_reload() mlx5_devlink_reload_down() mlx5_unload_one_devl_locked() mlx5_detach_device() del_adev() mlx5r_remove() __mlx5_ib_remove() mlx5_ib_roce_cleanup() mlx5_remove_netdev_notifier() unregister_netdevice_notifier (takes down_write(&pernet_ops_rwsem) This deadlocks. Resolve this by converting to register_netdevice_notifier_dev_net() which does not take pernet_ops_rwsem and moves the notifier block around according to netdev it takes as arg. Use previously introduced netdev added/removed events to track uplink netdev to be used for register_netdevice_notifier_dev_net() purposes. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-02-06RDMA/mlx5: Remove impossible check of mkey cache cleanup failureLeon Romanovsky1-6/+1
mlx5_mkey_cache_cleanup() can't fail and can be changed to be void. Link: https://lore.kernel.org/r/1acd9528995d083114e7dec2a2afc59436406583.1675328463.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-01-15RDMA/mlx: Calling qp event handler in workqueue contextMark Zhang1-0/+7
Move the call of qp event handler from atomic to workqueue context, so that the handler is able to block. This is needed by following patches. Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Patrisious Haddad <[email protected]> Link: https://lore.kernel.org/r/0cd17b8331e445f03942f4bb28d447f24ac5669d.1672821186.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]>
2022-10-06Merge tag 'v6.0' into rdma.git for-nextJason Gunthorpe1-19/+17
Trvial merge conflicts against rdma.git for-rc resolved matching linux-next: drivers/infiniband/hw/hns/hns_roce_hw_v2.c drivers/infiniband/hw/hns/hns_roce_main.c https://lore.kernel.org/r/[email protected] Signed-off-by: Jason Gunthorpe <[email protected]>
2022-09-05RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profileMaher Sanalla1-1/+1
When the RDMA auxiliary driver probes, it sets its profile based on devlink driverinit value. The latter might not be in sync with FW yet (In case devlink reload is not performed), thus causing a mismatch between RDMA driver and FW. This results in the following FW syndrome when the RDMA driver tries to adjust RoCE state, which fails the probe: "0xC1F678 | modify_nic_vport_context: roce_en set on a vport that doesn't support roce" To prevent this, select the PF profile based on FW RoCE capability instead of relying on devlink driverinit value. To provide backward compatibility of the RoCE disable feature, on older FW's where roce_rw is not set (FW RoCE capability is read-only), keep the current behavior e.g., rely on devlink driverinit value. Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Reviewed-by: Shay Drory <[email protected]> Reviewed-by: Michael Guralnik <[email protected]> Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: Maher Sanalla <[email protected]> Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]>
2022-08-23IB/mlx5: Remove duplicate header inclusion related to ODPDaisuke Matsuda1-2/+1
rdma/ib_umem.h and rdma/ib_verbs.h are included by rdma/ib_umem_odp.h. This patch removes the redundant entries. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Daisuke Matsuda <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2022-08-16RDMA/mlx5: Use the proper number of portsMark Bloch1-18/+16
The cited commit allowed the driver to operate over HCAs that have 4 physical ports. Use the number of ports of the RDMA device in the for loop instead of using the struct size. Fixes: 4cd14d44b11d ("net/mlx5: Support devices with more than 2 ports") Link: https://lore.kernel.org/r/a54a56c2ede16044a29d119209b35189c662ac72.1659944855.git.leonro@nvidia.com Signed-off-by: Mark Bloch <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2022-08-16RDMA/mlx5: Don't compare mkey tags in DEVX indirect mkeyAharon Landau1-0/+3
According to the ib spec: If the CI supports the Base Memory Management Extensions defined in this specification, the L_Key format must consist of: 24 bit index in the most significant bits of the R_Key, and 8 bit key in the least significant bits of the R_Key Through a successful Allocate L_Key verb invocation, the CI must let the consumer own the key portion of the returned R_Key Therefore, when creating a mkey using DEVX, the consumer is allowed to change the key part. The kernel should compare only the index part of a R_Key to determine equality with another R_Key. Adding capability in order not to break backward compatibility. Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Link: https://lore.kernel.org/r/3d669aacea85a3a15c3b3b953b3eaba3f80ef9be.1659255945.git.leonro@nvidia.com Signed-off-by: Aharon Landau <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2022-07-27RDMA/mlx5: Rename the mkey cache variables and functionsAharon Landau1-2/+2
After replacing the MR cache with an Mkey cache, rename the variables and functions to fit the new meaning. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Aharon Landau <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2022-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-112/+12
Pull rdma updates from Jason Gunthorpe: "Small collection of incremental improvement patches: - Minor code cleanup patches, comment improvements, etc from static tools - Clean the some of the kernel caps, reducing the historical stealth uAPI leftovers - Bug fixes and minor changes for rdmavt, hns, rxe, irdma - Remove unimplemented cruft from rxe - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer - flush_workqueue(system_unbound_wq) removal - Ensure rxe waits for objects to be unused before allowing the core to free them - Several rc quality bug fixes for hfi1" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits) RDMA/rtrs-clt: Fix one kernel-doc comment RDMA/hfi1: Remove all traces of diagpkt support RDMA/hfi1: Consolidate software versions RDMA/hfi1: Remove pointless driver version RDMA/hfi1: Fix potential integer multiplication overflow errors RDMA/hfi1: Prevent panic when SDMA is disabled RDMA/hfi1: Prevent use of lock before it is initialized RDMA/rxe: Fix an error handling path in rxe_get_mcg() IB/core: Fix typo in comment RDMA/core: Fix typo in comment IB/hf1: Fix typo in comment IB/qib: Fix typo in comment IB/iser: Fix typo in comment RDMA/mlx4: Avoid flush_scheduled_work() usage IB/isert: Avoid flush_scheduled_work() usage RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr() RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq() RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx() RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx() RDMA/irdma: Add SW mechanism to generate completions on error ...