path: root/drivers/infiniband/hw
Age | Commit message | Author | Files | Lines
2024-11-03 | RDMA/bnxt_re: Remove some dead code | Christophe JAILLET | 1 | -19/+0
If the probe succeeds, then auxiliary_get_drvdata() can't return a NULL pointer. So several NULL checks can be removed to simplify code. Signed-off-by: Christophe JAILLET <[email protected]> Link: https://patch.msgid.link/f02eb630734ee530315dce9f60b078f631ae93d0.1730477345.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <[email protected]>
2024-11-03 | RDMA/bnxt_re: Fix some error handling paths in bnxt_re_probe() | Christophe JAILLET | 1 | -0/+8
If bnxt_re_add_device() fails, 'en_info' still needs to be freed, as already done in the .remove() function. The commit in Fixes incorrectly removed this call, certainly because it was expecting the .remove() function was called anyway. But if the probe fails, the remove function is not called. There is no need to call bnxt_re_remove() as it was done before, kfree() is enough. Fixes: a5e099e0c464 ("RDMA/bnxt_re: Fix an error path in bnxt_re_add_device") Signed-off-by: Christophe JAILLET <[email protected]> Link: https://patch.msgid.link/9e48ff955ae55fc39a9eb1eb590d374539eab5ba.1730477345.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <[email protected]>
2024-10-21 | RDMA/bnxt_re: synchronize the qp-handle table array | Selvin Xavier | 3 | -4/+15
There is a race between the CREQ tasklet and destroy qp when accessing the qp-handle table. There is a chance of reading a valid qp-handle in the CREQ tasklet handler while the QP is already moving ahead with the destruction. Fixing this race by implementing a table-lock to synchronize the access. Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") Fixes: 84cf229f4001 ("RDMA/bnxt_re: Fix the qp table indexing") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-21 | RDMA/bnxt_re: Fix the usage of control path spin locks | Selvin Xavier | 1 | -15/+10
Control path completion processing always runs in tasklet context. To synchronize with the posting thread, there is no need to use the irq variant of spin lock. Use spin_lock_bh instead. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-21 | RDMA/mlx5: Round max_rd_atomic/max_dest_rd_atomic up instead of down | Patrisious Haddad | 1 | -2/+2
After the cited commit below max_dest_rd_atomic and max_rd_atomic values are being rounded down to the next power of 2. As opposed to the old behavior and mlx4 driver where they used to be rounded up instead. In order to stay consistent with older code and other drivers, revert to using fls round function which rounds up to the next power of 2. Fixes: f18e26af6aba ("RDMA/mlx5: Convert modify QP to use MLX5_SET macros") Link: https://patch.msgid.link/r/d85515d6ef21a2fa8ef4c8293dce9b58df8a6297.1728550179.git.leon@kernel.org Signed-off-by: Patrisious Haddad <[email protected]> Reviewed-by: Maher Sanalla <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
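The difference between the two rounding behaviours can be sketched in userspace C. This is only an illustration: fls_u32() is a hypothetical stand-in for the kernel's fls(), and applying fls() to n - 1 to round up is an assumption about how the helper is used, not a copy of the driver code.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's fls():
 * returns the 1-based index of the highest set bit, or 0 when x == 0. */
static int fls_u32(uint32_t x)
{
    int pos = 0;

    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* log2 rounded down: largest e with 2^e <= n (requires n >= 1). */
static int log2_round_down(uint32_t n)
{
    return fls_u32(n) - 1;
}

/* log2 rounded up: smallest e with 2^e >= n (requires n >= 1). */
static int log2_round_up(uint32_t n)
{
    return fls_u32(n - 1);
}
```

For a requested value of 5, rounding down gives exponent 2 (4 outstanding operations, fewer than requested), while rounding up gives exponent 3 (8). Exact powers of two give the same exponent either way.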
2024-10-21 | RDMA/cxgb4: Dump vendor specific QP details | Leon Romanovsky | 1 | -0/+1
Restore the missing functionality to dump vendor specific QP details, which was mistakenly removed in the commit mentioned in Fixes line. Fixes: 5cc34116ccec ("RDMA: Add dedicated QP resource tracker function") Link: https://patch.msgid.link/r/ed9844829135cfdcac7d64285688195a5cd43f82.1728323026.git.leonro@nvidia.com Reported-by: Dr. David Alan Gilbert <[email protected]> Closes: https://lore.kernel.org/all/Zv_4qAxuC0dLmgXP@gallifrey Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/bnxt_re: Fix the GID table length | Kalesh AP | 1 | -1/+8
GID table length is reported by FW. The gid index which is passed to the driver during modify_qp/create_ah is restricted by the sgid_index field of struct ib_global_route. sgid_index is u8 and the max sgid possible is 256. Each GID entry in HW will have 2 GID entries in the kernel gid table. So we can support twice the gid table size reported by FW. Also, restrict the max GID to 256 also. Fixes: 847b97887ed4 ("RDMA/bnxt_re: Restrict the max_gids to 256") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
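The sizing rule described above (twice the FW-reported length, capped at 256) could be sketched as follows. The macro and function names are hypothetical and are not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical name: upper bound taken from the commit message (max 256 GIDs). */
#define MAX_SGID_ENTRIES 256u

/* Each HW GID entry maps to two kernel GID table entries, so the usable
 * table length is double the FW-reported length, capped at 256. */
static uint32_t effective_gid_table_len(uint32_t fw_reported_len)
{
    uint64_t doubled = 2ull * fw_reported_len;

    return doubled > MAX_SGID_ENTRIES ? MAX_SGID_ENTRIES : (uint32_t)doubled;
}
```

With these assumptions, a FW-reported length of 64 yields 128 entries, while 128 or more yields the cap of 256.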
2024-10-11 | RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages | Bhargava Chenna Marreddy | 1 | -16/+3
Avoid memory corruption while setting up Level-2 PBL pages for the non MR resources when num_pages > 256K. There will be a single PDE page address (contiguous pages in the case of > PAGE_SIZE), but, current logic assumes multiple pages, leading to invalid memory access after 256K PBL entries in the PDE. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Bhargava Chenna Marreddy <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/bnxt_re: Change the sequence of updating the CQ toggle value | Chandramohan Akula | 2 | -7/+6
Currently the CQ toggle value in the shared page (read by the userlib) is updated as part of the cqn_handler. There is a potential race of application calling the CQ ARM doorbell immediately and using the old toggle value. Change the sequence of updating CQ toggle value to update in the bnxt_qplib_service_nq function immediately after reading the toggle value to be in sync with the HW updated value. Fixes: e275919d9669 ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Chandramohan Akula <[email protected]> Reviewed-by: Selvin Xavier <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/bnxt_re: Fix an error path in bnxt_re_add_device | Kalesh AP | 1 | -9/+3
In bnxt_re_add_device(), when register netdev notifier fails, driver is not unregistering the IB device in the error cleanup path. Also, removed the duplicate cleanup in error path of bnxt_re_probe. Fixes: 94a9dc6ac8f7 ("RDMA/bnxt_re: Group all operations under add_device and remove_device") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop | Selvin Xavier | 1 | -0/+9
Driver waits indefinitely for the fifo occupancy to go below a threshold as soon as the pacing interrupt is received. This can cause soft lockup on one of the processors, if the rate of DB is very high. Add a loop count for FPGA and exit the __wait_for_fifo_occupancy_below_th if the loop is taking more time. Pacing will be continuing until the occupancy is below the threshold. This is ensured by the checks in bnxt_re_pacing_timer_exp and further scheduling the work for pacing based on the fifo occupancy. Fixes: 2ad4e6303a6d ("RDMA/bnxt_re: Implement doorbell pacing algorithm") Link: https://patch.msgid.link/r/[email protected] Reviewed-by: Kalesh AP <[email protected]> Reviewed-by: Chandramohan Akula <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
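The general idea of bounding a busy-wait with a loop count can be sketched in userspace C as below. Everything here is hypothetical (the callback type, the fake FIFO, the function names); it illustrates the technique and is not the body of __wait_for_fifo_occupancy_below_th.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback that returns the current FIFO occupancy. */
typedef uint32_t (*read_occupancy_fn)(void *ctx);

/* Poll until occupancy drops below the threshold, but give up after
 * max_loops reads so the caller can never spin forever. */
static bool wait_for_occupancy_below(read_occupancy_fn read_occ, void *ctx,
                                     uint32_t threshold, uint32_t max_loops)
{
    for (uint32_t i = 0; i < max_loops; i++) {
        if (read_occ(ctx) < threshold)
            return true;
    }
    return false; /* still above threshold; a later pass can retry */
}

/* Fake FIFO for illustration: each read reports the level, then drains it. */
struct fake_fifo {
    uint32_t level;
    uint32_t drain_per_read;
};

static uint32_t fake_fifo_read(void *ctx)
{
    struct fake_fifo *f = ctx;
    uint32_t cur = f->level;

    f->level = cur > f->drain_per_read ? cur - f->drain_per_read : 0;
    return cur;
}
```

When the helper returns false, the caller simply leaves pacing active and checks again later, which matches the commit's description of the timer and work item re-evaluating the occupancy.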
2024-10-11 | RDMA/bnxt_re: Fix a possible NULL pointer dereference | Kalesh AP | 1 | -3/+3
There is a possibility of a NULL pointer dereference in the failure path of bnxt_re_add_device(). To address that, moved the update of "rdev->adev" to bnxt_re_dev_add(). Fixes: dee3da3422d5 ("RDMA/bnxt_re: Change aux driver data to en_info to hold more information") Link: https://patch.msgid.link/r/[email protected] Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/linux-rdma/CAH-L+nMCwymKGqf5pd8-FZNhxEkDD=kb6AoCaE6fAVi7b3e5Qw@mail.gmail.com/T/#t Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/bnxt_re: Return more meaningful error | Kalesh AP | 1 | -1/+1
When the HWRM command fails, driver currently returns -EFAULT(Bad address). This does not look correct. Modified to return -EIO(I/O error). Fixes: cc1ec769b87c ("RDMA/bnxt_re: Fixing the Control path command and response handling") Fixes: 65288a22ddd8 ("RDMA/bnxt_re: use shadow qd while posting non blocking rcfw command") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/bnxt_re: Fix incorrect dereference of srq in async event | Kashyap Desai | 1 | -2/+5
Currently driver is not getting correct srq. Dereference only if qplib has a valid srq. Fixes: b02fd3f79ec3 ("RDMA/bnxt_re: Report async events and errors") Link: https://patch.msgid.link/r/[email protected] Reviewed-by: Saravanan Vajravel <[email protected]> Reviewed-by: Chandramohan Akula <[email protected]> Signed-off-by: Kashyap Desai <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/bnxt_re: Fix out of bound check | Kalesh AP | 1 | -1/+1
Driver exports pacing stats only on GenP5 and P7 adapters. But while parsing the pacing stats, driver has a check for "rdev->dbr_pacing". This caused a trace when KASAN is enabled. BUG: KASAN: slab-out-of-bounds in bnxt_re_get_hw_stats+0x2b6a/0x2e00 [bnxt_re] Write of size 8 at addr ffff8885942a6340 by task modprobe/4809 Fixes: 8b6573ff3420 ("bnxt_re: Update the debug counters for doorbell pacing") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/bnxt_re: Fix the max CQ WQEs for older adapters | Abhishek Mohapatra | 2 | -0/+3
Older adapters don't support the MAX CQ WQEs reported by older FW. So restrict the value reported to 1M always for older adapters. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Abhishek Mohapatra <[email protected]> Reviewed-by: Chandramohan Akula <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/irdma: Fix misspelling of "accept*" | Alexander Zubkov | 1 | -1/+1
There is "accept*" misspelled as "accpet*" in the comments. Fix the spelling. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Alexander Zubkov <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-11 | RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP | Anumula Murali Mohan Reddy | 1 | -5/+4
ip_dev_find() always returns the real net_device address, whether traffic is running on a vlan or on the real device. If traffic is over a vlan, filling the endpoint structure with the real ndev and then attempting to send a connect request results in an RDMA_CM_EVENT_UNREACHABLE error. This patch fixes the issue by using vlan_dev_real_dev(). Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Anumula Murali Mohan Reddy <[email protected]> Signed-off-by: Potnuri Bharat Teja <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-10-08 | RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode | Selvin Xavier | 1 | -1/+5
max_sw_wqe used for static wqe mode should be same as the max_wqe. Calculate the max_sw_wqe only for the variable WQE mode. Fixes: de1d364c3815 ("RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-10-08 | RDMA/bnxt_re: Add a check for memory allocation | Kalesh AP | 1 | -0/+2
__alloc_pbl() can return error when memory allocation fails. Driver is not checking the status on one of the instances. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Link: https://patch.msgid.link/r/[email protected] Reviewed-by: Selvin Xavier <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-10-08 | RDMA/bnxt_re: Fix incorrect AVID type in WQE structure | Saravanan Vajravel | 1 | -1/+1
Driver uses internal data structure to construct WQE frame. It used avid type as u16 which can accommodate up to 64K AVs. When outstanding AVID crosses 64K, driver truncates AVID and hence it uses incorrect AVID to WR. This leads to WR failure due to invalid AV ID and QP is moved to error state with reason set to 19 (INVALID AVID). When RDMA CM path is used, this issue hits QP1 and it is moved to error state Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://patch.msgid.link/r/[email protected] Reviewed-by: Selvin Xavier <[email protected]> Reviewed-by: Chandramohan Akula <[email protected]> Signed-off-by: Saravanan Vajravel <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
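The truncation described above is easy to reproduce in plain C. The helper names are hypothetical, and storing the ID in a 32-bit field is an assumption about the shape of the fix rather than a quote from the patch.

```c
#include <stdint.h>

/* Storing an AV ID in a 16-bit field keeps only the low 16 bits. */
static uint16_t store_avid_u16(uint32_t avid)
{
    return (uint16_t)avid;
}

/* A wider field (assumed here to be 32 bits) keeps the ID intact. */
static uint32_t store_avid_u32(uint32_t avid)
{
    return avid;
}
```

For example, AV ID 65539 (64K + 3) becomes 3 when stored as a 16-bit value, so the WQE would reference a different, possibly invalid, AV.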
2024-10-08 | RDMA/bnxt_re: Fix a possible memory leak | Kalesh AP | 1 | -1/+4
In bnxt_re_setup_chip_ctx() when bnxt_qplib_map_db_bar() fails driver is not freeing the memory allocated for "rdev->chip_ctx". Fixes: 0ac20faf5d83 ("RDMA/bnxt_re: Reorg the bar mapping") Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-27 | [tree-wide] finally take no_llseek out | Al Viro | 2 | -3/+0
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-09-22 | RDMA/bnxt_re: Remove the unused variable en_dev | Jiapeng Chong | 1 | -2/+0
Variable en_dev is not effectively used, so delete it. drivers/infiniband/hw/bnxt_re/main.c:1980:22: warning: variable ‘en_dev’ set but not used. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10867 Signed-off-by: Jiapeng Chong <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-16 | RDMA/irdma: fix error message in irdma_modify_qp_roce() | Vitaliy Shevtsov | 1 | -1/+1
Use a correct field max_dest_rd_atomic instead of max_rd_atomic for the error output. Found by Linux Verification Center (linuxtesting.org) with Svace. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Vitaliy Shevtsov <[email protected]> Link: https://lore.kernel.org/stable/20240916165817.14691-1-v.shevtsov%40maxima.ru Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-16 | RDMA/cxgb4: Added NULL check for lookup_atid | Mikhail Lobanov | 1 | -0/+5
The lookup_atid() function can return NULL if the ATID is invalid or does not exist in the identifier table, which could lead to dereferencing a null pointer without a check in the `act_establish()` and `act_open_rpl()` functions. Add a NULL check to prevent null pointer dereferencing. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC") Signed-off-by: Mikhail Lobanov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-16 | RDMA/hns: Fix ah error counter in sw stat not increasing | Junxian Huang | 1 | -5/+9
There are several error cases where hns_roce_create_ah() returns directly without jumping to sw stat path, thus leading to a problem that the ah error counter does not increase. Fixes: ee20cc17e9d8 ("RDMA/hns: Support DSCP") Fixes: eb7854d63db5 ("RDMA/hns: Support SW stats with debugfs") Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13 | RDMA/bnxt_re: Recover the device when FW error is detected | Selvin Xavier | 3 | -31/+55
If the FW crashes, L2 driver gets notified and it notifies the RoCE driver. Currently driver doesn't re-initialize the device. Add support for re-initialize the RoCE device. RoCE device is removed and re-attached in the ulp_stop and ulp_start respectively. The recovery logic expects the RoCE driver to be registered with L2 driver while its being removed. So the driver avoids unregistering with L2 driver in the recovery path. Signed-off-by: Chandramohan Akula <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13 | RDMA/bnxt_re: Group all operations under add_device and remove_device | Selvin Xavier | 1 | -32/+33
Adding and removing device need to be handled from multiple contexts when Firmware error recovery is supported. So group all the add and remove operations to add_device and remove_device function. Signed-off-by: Chandramohan Akula <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13 | RDMA/bnxt_re: Use the aux device for L2 ULP callbacks | Chandramohan Akula | 1 | -6/+20
While registering with the L2 for ULP operations, use the aux device pointer as the handle. Aux device has the data bnxt_re_en_dev_info, which is used to store required information for the bnxt_re_suspend and bnxt_re_resume functions. Signed-off-by: Chandramohan Akula <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Reviewed-by: Kashyap Desai <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13 | RDMA/bnxt_re: Change aux driver data to en_info to hold more information | Chandramohan Akula | 2 | -11/+66
rdev will be destroyed and recreated during the FW error recovery scenarios. So to keep the state, if any, use an en_info structure which gets created/freed based on auxiliary device initialization/de-initialization. Signed-off-by: Chandramohan Akula <[email protected]> Reviewed-by: Kashyap Desai <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13 | RDMA/mlx5: Use IB set_netdev and get_netdev functions | Chiara Meiohas | 3 | -67/+142
The IB layer provides a common interface to store and get net devices associated to an IB device port (ib_device_set_netdev() and ib_device_get_netdev()). Previously, mlx5_ib stored and managed the associated net devices internally. Replace internal net device management in mlx5_ib with ib_device_set_netdev() when attaching/detaching a net device and ib_device_get_netdev() when retrieving the net device. Export ib_device_get_netdev(). For mlx5 representors/PFs/VFs and lag creation we replace the netdev assignments with the IB set/get netdev functions. In active-backup mode lag the active slave net device is stored in the lag itself. To assure the net device stored in a lag bond IB device is the active slave we implement the following: - mlx5_core: when modifying the slave of a bond we send the internal driver event MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE. - mlx5_ib: when catching the event call ib_device_set_netdev() This patch also ensures the correct IB events are sent in switchdev lag. While at it, when in multiport eswitch mode, only a single IB device is created for all ports. The said IB device will receive all netdev events of its VFs once loaded, thus to avoid overwriting the mapping of PF IB device to PF netdev, ignore NETDEV_REGISTER events if the ib device has already been mapped to a netdev. Signed-off-by: Chiara Meiohas <[email protected]> Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13 | RDMA/mlx5: Initialize phys_port_cnt earlier in RDMA device creation | Chiara Meiohas | 2 | -1/+3
phys_port_cnt of the IB device must be initialized before calling ib_device_set_netdev(). Previously, phys_port_cnt was initialized in the mlx5_ib init function. Remove this initialization to allow setting it separately, providing the flexibility to call ib_device_set_netdev before registering the IB device. Signed-off-by: Chiara Meiohas <[email protected]> Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13 | RDMA/mlx5: Obtain upper net device only when needed | Mark Bloch | 1 | -1/+1
Report the upper device's state as the RDMA port state only in RoCE LAG or switchdev LAG. Fixes: 27f9e0ccb6da ("net/mlx5: Lag, Add single RDMA device in multiport mode") Signed-off-by: Mark Bloch <[email protected]> Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13 | RDMA/mlx5: Check RoCE LAG status before getting netdev | Mark Bloch | 1 | -6/+13
Check if RoCE LAG is active before calling the LAG layer for netdev. This clarifies if LAG is active. No behavior changes with this patch. Signed-off-by: Mark Bloch <[email protected]> Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-13 | RDMA/mlx5: Consider the query_vuid cap for data_direct | Yishai Hadas | 1 | -2/+4
Consider also the query_vuid cap before enabling the data_direct functionality. This may prevent a syndrome from the FW in case the query_vuid command is not supported. (e.g. migratable VF) Signed-off-by: Yishai Hadas <[email protected]> Reviewed-by: Gal Shalom <[email protected]> Link: https://patch.msgid.link/274c4f6f1ac0b1078243dd296695a49dbe58e7d1.1725907637.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-11 | RDMA/mlx5: Add implicit MR handling to ODP memory scheme | Michael Guralnik | 2 | -8/+111
Implicit MRs in ODP memory scheme require allocating a private null mkey and assigning the mkey and va differently in the KSM mkey. The page faults are received on the null mkey so we also add storing the null mkey in the odp_mkey xarray. Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-11 | RDMA/mlx5: Add handling for memory scheme page fault events | Michael Guralnik | 1 | -6/+114
The memory scheme page fault event is a new approach in handling page fault on mkeys using the on-demand-paging feature. The major shift in handling the page fault in this scheme is that the HW is taking responsibility for parsing the faulted mkey instead of the previous approach where the driver would read and parse the wqes and query the mkeys to get to the direct mkey that we need to handle. Therefore, the event we get from FW in this scheme will contain the direct mkey and address we need to handle and require much less work from driver. Additionally, to optimize performance, the FW can generate the event on a memory area that is larger than the faulted memory operation is requiring, to 'prefetch' memory that is around it and will likely be used soon. Unlike previous types of page fault, the memory page scheme fault does not always require a resume command after handling the page fault as the FW can post multiple events on same mkey and will set the 'last' flag only on the page fault that requires the resume command. Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-11 | RDMA/mlx5: Split ODP mkey search logic | Michael Guralnik | 1 | -26/+39
Split the search for the ODP mkey when handling an rdma type page fault to a helper function, later to be used in other page fault types. Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-11 | RDMA/mlx5: Enforce umem boundaries for explicit ODP page faults | Michael Guralnik | 1 | -9/+16
The new memory scheme page faults are requesting the driver to fetch additional pages to the faulted memory access. This is done in order to prefetch pages before and after the area that got the page fault, assuming this will reduce the total amount of page faults. The driver should ensure it handles only the pages that are within the umem range. Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-11 | RDMA/mlx5: Add new ODP memory scheme eqe format | Michael Guralnik | 1 | -19/+29
Add new fields to support the new memory scheme page fault and extend the token field to u64 as in the new scheme the token is 48 bit. Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-11 | net/mlx5: Expose HW bits for Memory scheme ODP | Michael Guralnik | 1 | -18/+22
Expose IFC bits to support the new memory scheme on demand paging. Change the macro reading odp capabilities to be able to read from the new IFC layout and align the code in upper layers to be compiled. Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-11 | net/mlx5: Expand mkey page size to support 6 bits | Michael Guralnik | 3 | -18/+21
Protect the usage of the 6th bit with the relevant capability to ensure we are using the new page sizes with FW that supports the bit extension. Signed-off-by: Michael Guralnik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-10 | RDMA/hns: Fix restricted __le16 degrades to integer issue | Junxian Huang | 1 | -2/+2
Fix sparse warnings: restricted __le16 degrades to integer. Fixes: 5a87279591a1 ("RDMA/hns: Support hns HW stats") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-10 | IB/qib: Remove unused declarations in header file | Zhang Zekun | 1 | -4/+0
The definition of qib_rc_rnr_retry() has been removed since commit b4238e70579c ("IB/qib: Use new rdmavt timers"). Also, the definition of mr_rcu_callback() has been removed since commit 7c2e11fe2dbe ("IB/qib: Remove qp and mr functionality from qib"). So, let's remove the unused declarations. Signed-off-by: Zhang Zekun <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-10 | RDMA/hns: Optimize hem allocation performance | Junxian Huang | 1 | -4/+6
When allocating MTT hem, for each hop level of each hem that is being allocated, the driver iterates the hem list to find out whether the bt page has been allocated in this hop level. If not, allocate a new one and splice it to the list. The time complexity is O(n^2) in worst cases. Currently the allocation for-loop uses 'unit' as the step size. This actually has taken into account the reuse of last-hop-level MTT bt pages by multiple buffer pages. Thus pages of last hop level will never have been allocated, so there is no need to iterate the hem list in last hop level. Removing this unnecessary iteration can reduce the time complexity to O(n). Fixes: 38389eaa4db1 ("RDMA/hns: Add mtr support for mixed multihop addressing") Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-10 | RDMA/hns: Fix 1bit-ECC recovery address in non-4K OS | Chengchang Tang | 1 | -1/+1
The 1bit-ECC recovery address read from HW only contain bits 64:12, so it should be fixed left-shifted 12 bits when used. Currently, the driver will shift the address left by PAGE_SHIFT when used, which is wrong in non-4K OS. Fixes: 2de949abd6a5 ("RDMA/hns: Recover 1bit-ECC error of RAM on chip") Signed-off-by: Chengchang Tang <[email protected]> Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
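A small userspace sketch shows why shifting by PAGE_SHIFT is only correct on 4K-page systems. The macro and function names are hypothetical; the page shift is passed in as a parameter to mimic kernels built with different page sizes.

```c
#include <stdint.h>

/* Hypothetical name: the HW field omits the low 12 address bits. */
#define ECC_ADDR_SHIFT 12

/* Correct reconstruction: always shift by the fixed 12 bits. */
static uint64_t ecc_addr_fixed(uint64_t hw_field)
{
    return hw_field << ECC_ADDR_SHIFT;
}

/* Buggy reconstruction: shift by the kernel page shift, which is 12 only
 * for 4K pages (for example, 16 for 64K pages). */
static uint64_t ecc_addr_by_page_shift(uint64_t hw_field, unsigned int page_shift)
{
    return hw_field << page_shift;
}
```

With a HW field of 0x12345, both helpers return 0x12345000 when the page shift is 12, but the page-shift variant returns 0x123450000 with 64K pages (shift 16), which is a different address.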
2024-09-10 | RDMA/hns: Fix VF triggering PF reset in abnormal interrupt handler | Junxian Huang | 1 | -2/+5
In abnormal interrupt handler, a PF reset will be triggered even if the device is a VF. It should be a VF reset. Fixes: 2b9acb9a97fe ("RDMA/hns: Add the process of AEQ overflow for hip08") Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-10 | RDMA/hns: Fix spin_unlock_irqrestore() called with IRQs enabled | Chengchang Tang | 1 | -8/+8
Fix misuse of spin_lock_irq()/spin_unlock_irq() while spin_lock_irqsave()/spin_unlock_irqrestore() is held. This was discovered through the lock debugging, and the corresponding log is as follows: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 96 PID: 2074 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40 ... Call trace: warn_bogus_irq_restore+0x30/0x40 _raw_spin_unlock_irqrestore+0x84/0xc8 add_qp_to_list+0x11c/0x148 [hns_roce_hw_v2] hns_roce_create_qp_common.constprop.0+0x240/0x780 [hns_roce_hw_v2] hns_roce_create_qp+0x98/0x160 [hns_roce_hw_v2] create_qp+0x138/0x258 ib_create_qp_kernel+0x50/0xe8 create_mad_qp+0xa8/0x128 ib_mad_port_open+0x218/0x448 ib_mad_init_device+0x70/0x1f8 add_client_context+0xfc/0x220 enable_device_and_get+0xd0/0x140 ib_register_device.part.0+0xf4/0x1c8 ib_register_device+0x34/0x50 hns_roce_register_device+0x174/0x3d0 [hns_roce_hw_v2] hns_roce_init+0xfc/0x2c0 [hns_roce_hw_v2] __hns_roce_hw_v2_init_instance+0x7c/0x1d0 [hns_roce_hw_v2] hns_roce_hw_v2_init_instance+0x9c/0x180 [hns_roce_hw_v2] Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <[email protected]> Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-09-10 | RDMA/hns: Fix the overflow risk of hem_list_calc_ba_range() | wenglianfa | 1 | -6/+6
The max value of 'unit' and 'hop_num' is 2^24 and 2, so the value of 'step' may exceed the range of u32. Change the type of 'step' to u64. Fixes: 38389eaa4db1 ("RDMA/hns: Add mtr support for mixed multihop addressing") Signed-off-by: wenglianfa <[email protected]> Signed-off-by: Junxian Huang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
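The overflow can be illustrated with the maximum values quoted above. This sketch assumes 'step' is obtained by multiplying 'unit' once per hop level, which is an inference from the commit message and not a copy of hem_list_calc_ba_range(). The function names are hypothetical.

```c
#include <stdint.h>

/* step accumulated in 32 bits: wraps once unit^hops exceeds 2^32 - 1. */
static uint32_t calc_step_u32(uint32_t unit, int hops)
{
    uint32_t step = 1;

    for (int i = 0; i < hops; i++)
        step *= unit;
    return step;
}

/* step accumulated in 64 bits: holds 2^48 without wrapping. */
static uint64_t calc_step_u64(uint32_t unit, int hops)
{
    uint64_t step = 1;

    for (int i = 0; i < hops; i++)
        step *= unit;
    return step;
}
```

With unit = 2^24 and two hops, the 32-bit version wraps to 0, while the 64-bit version yields 2^48 as intended.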