path: root/drivers/infiniband/ulp
Age | Commit message | Author | Files | Lines
2024-10-11 | RDMA/srpt: Make slab cache names unique | Bart Van Assche | 1 | -12/+68
Since commit 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y"), slab complains about duplicate cache names. Hence this patch. The approach is as follows:
- Maintain an xarray with the slab size as index and a reference count and a kmem_cache pointer as contents. Use srpt-${slab_size} as kmem cache name.
- Use 512-byte alignment for all slabs instead of only for some of the slabs.
- Increment the reference count instead of calling kmem_cache_create().
- Decrement the reference count instead of calling kmem_cache_destroy().
Fixes: 5dabcd0456d7 ("RDMA/srpt: Add support for immediate data") Link: https://patch.msgid.link/r/[email protected] Reported-by: Shinichiro Kawasaki <[email protected]> Closes: https://lore.kernel.org/linux-block/xpe6bea7rakpyoyfvspvin2dsozjmjtjktpph7rep3h25tv7fb@ooz4cu5z6bq6/ Suggested-by: Jason Gunthorpe <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Tested-by: Shin'ichiro Kawasaki <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
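The reference-counting approach described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions, not the driver code: a small fixed table stands in for the xarray, `struct fake_cache` stands in for a kmem_cache, and the names `pool_get()`, `pool_put()` and `pool_refcount()` are hypothetical. The sketch has no locking, which real kernel code would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kmem_cache: just a name and an object size. */
struct fake_cache {
    char name[32];
    size_t object_size;
};

/* One table slot per object size; a slot is free while refcount == 0. */
struct pool_entry {
    size_t size;
    unsigned int refcount;
    struct fake_cache *cache;
};

#define POOL_SLOTS 8
static struct pool_entry pool[POOL_SLOTS];

static struct pool_entry *pool_find(size_t size)
{
    for (size_t i = 0; i < POOL_SLOTS; i++)
        if (pool[i].refcount > 0 && pool[i].size == size)
            return &pool[i];
    return NULL;
}

/* Return the shared cache for this size, creating it on first use. */
struct fake_cache *pool_get(size_t size)
{
    struct pool_entry *e = pool_find(size);

    if (e) {
        e->refcount++;  /* reuse instead of creating a duplicate name */
        return e->cache;
    }
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (pool[i].refcount != 0)
            continue;
        e = &pool[i];
        e->cache = malloc(sizeof(*e->cache));
        if (!e->cache)
            return NULL;
        /* The size is part of the name, so each name exists only once. */
        snprintf(e->cache->name, sizeof(e->cache->name), "srpt-%zu", size);
        e->cache->object_size = size;
        e->size = size;
        e->refcount = 1;
        return e->cache;
    }
    return NULL;  /* table full */
}

/* Drop one reference; destroy the cache when the last user is gone. */
void pool_put(size_t size)
{
    struct pool_entry *e = pool_find(size);

    if (!e)
        return;
    if (--e->refcount == 0) {
        free(e->cache);
        e->cache = NULL;
    }
}

/* Number of current users of the cache for this size (0 if none). */
unsigned int pool_refcount(size_t size)
{
    struct pool_entry *e = pool_find(size);

    return e ? e->refcount : 0;
}
```

Two callers that both ask for `pool_get(512)` receive the same object named `srpt-512`, and the object is freed only after both have called `pool_put(512)`.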
2024-09-10 | IB/iser: Remove unused declaration in header file | Zhang Zekun | 1 | -4/+0
The definition of iser_finalize_rdma_unaligned_sg() has been removed since commit dd0107a08996 ("IB/iser: set block queue_virt_boundary"). Let's remove the unused declaration in header file. Signed-off-by: Zhang Zekun <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Kalesh AP <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Acked-by: Max Gurtovoy <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-28 | RDMA/rtrs-clt: Remove an extra space | Jack Wang | 1 | -1/+1
No functional changes. Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Alexei Pastuchov <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-28 | RDMA/rtrs-clt: Do local invalidate after write io completion | Jack Wang | 1 | -13/+7
Do the local invalidate after write io completion instead of chaining it to the write WRs. This avoids the chained usage of WRs and fixes the local protection error on the LOCAL INVALIDATE WR. Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-28 | RDMA/rtrs: Register ib event handler | Grzegorz Prajsner | 5 | -2/+58
Use ib_register_event_handler() to register event handlers on both the client and the server side. For now, all these handlers do is print the type of the incoming event. Signed-off-by: Grzegorz Prajsner <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-28 | RDMA/rtrs-srv: Avoid null pointer deref during path establishment | Md Haris Iqbal | 1 | -2/+11
For RTRS path establishment, RTRS client initiates and completes con_num of connections. After establishing all its connections, the information is exchanged between the client and server through the info_req message. During this exchange, it is essential that all connections have been established, and the state of the RTRS srv path is CONNECTED. So add these sanity checks, to make sure we detect and abort process in error scenarios to avoid null pointer deref. Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-28 | RDMA/rtrs-clt: Print request type for errors | Jack Wang | 1 | -2/+4
Extend the output to print also the request type. Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-28 | RDMA/rtrs-clt: Reset cid to con_num - 1 to stay in bounds | Md Haris Iqbal | 1 | -0/+6
In the function init_conns(), if something fails after the for loop that calls create_con() and create_cm(), the cleanup for loop after the destroy label accesses out-of-bounds memory, because cid is equal to clt_path->s.con_num at that point. This commit resets cid to clt_path->s.con_num - 1, so that the later cleanup loop stays in bounds. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
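The off-by-one described in this commit can be illustrated with a small userspace sketch. It is a simplified model, not the driver code: `struct path_sketch`, `init_conns_sketch()` and the `fail_after_loop` switch are hypothetical names introduced for the example, and the real create/destroy calls are replaced by a boolean array.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CON 4

/* Hypothetical model of a path with con_num connection slots. */
struct path_sketch {
    int con_num;
    bool con_created[MAX_CON];
};

/*
 * Returns 0 on success and -1 on failure. *highest_cleaned receives the
 * largest index visited by the cleanup loop, or -1 if no cleanup ran.
 */
int init_conns_sketch(struct path_sketch *p, bool fail_after_loop,
                      int *highest_cleaned)
{
    int cid, i;

    *highest_cleaned = -1;
    if (p->con_num < 1 || p->con_num > MAX_CON)
        return -1;

    for (cid = 0; cid < p->con_num; cid++)
        p->con_created[cid] = true;  /* stands in for the create calls */

    /*
     * After a complete loop cid == con_num, one past the last valid
     * index. Reset it so that a later failure cleans up in bounds.
     */
    cid = p->con_num - 1;

    if (!fail_after_loop)
        return 0;

    /* Cleanup in creation order, visiting indices 0..cid only. */
    for (i = 0; i <= cid; i++) {
        assert(i < p->con_num);
        p->con_created[i] = false;
        *highest_cleaned = i;
    }
    return -1;
}
```

Without the reset, the cleanup loop in this model would run up to index `con_num` and trip the bounds assertion.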
2024-08-28 | RDMA/rtrs-clt: Reuse need_inval from mr | Jack Wang | 2 | -10/+9
mr has a member need_inval, which can be used to indicate if local invalidate is needed, switch to it and remove need_inv from rtrs_clt_io_req. Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-28 | RDMA/rtrs: Reset hb_missed_cnt after receiving other traffic from peer | Jack Wang | 2 | -1/+3
Reset hb_missed_cnt after receiving other traffic from the peer, so that the heartbeat is more robust against high load on the host or network. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
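The idea of resetting the missed-heartbeat counter on any incoming traffic can be sketched as follows. This is a simplified userspace model and an assumption about the general pattern, not the rtrs implementation: `struct hb_sketch`, `hb_timer_tick()` and `hb_on_traffic()` are hypothetical names, and the exact comparison against the maximum in the driver may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical heartbeat state for one path. */
struct hb_sketch {
    unsigned int missed_cnt;  /* heartbeat periods without any traffic */
    unsigned int missed_max;  /* tolerated number of missed periods */
};

/* Called once per heartbeat period; returns true when the peer is
 * considered unreachable. */
bool hb_timer_tick(struct hb_sketch *hb)
{
    hb->missed_cnt++;
    return hb->missed_cnt > hb->missed_max;
}

/* Called for any message received from the peer, not only heartbeat
 * replies: regular IO traffic also proves that the peer is alive. */
void hb_on_traffic(struct hb_sketch *hb)
{
    hb->missed_cnt = 0;
}
```

With `missed_max` set to 2, two quiet periods followed by any received message restart the count, so a busy peer that delays its heartbeat replies is not declared dead while it is still sending data.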
2024-08-28 | RDMA/rtrs-clt: Rate limit errors in IO path | Jack Wang | 1 | -3/+3
On network errors, a large number of these logs are printed due to all the inflight IOs, rate limit them so they do not clutter kernel log. Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-28 | RDMA/rtrs-clt: Fix need_inv setting in error case | Jack Wang | 1 | -11/+9
In some cases need_inv can be missed for write requests, additionally driver has to handle missing invalidates for write requests. While at it, remove the else case from write invalidate path as it is possible to reach there. Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-28 | RDMA/rtrs: For HB error add additional clt/srv specific logging | Md Haris Iqbal | 2 | -0/+6
In case of an HB error, we need to know the specific path on which it happened, for better debugging. Since the clt/srv path structures are not available in rtrs.c, this needs to be done in the individual HB error handlers. This commit adds that logging. A sample kernel log output after this commit:
rtrs_core L357: <blya>: HB missed max reached.
rtrs_server L717: <blya>: HB err handler for path=ip:x.x.x.x@ip:x.x.x.x
.
.
rtrs_core L357: <blya>: HB missed max reached.
rtrs_client L1519: <blya>: HB err handler for path=ip:x.x.x.x@ip:x.x.x.x
Signed-off-by: Md Haris Iqbal <[email protected]> Reviewed-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-08-19 | RDMA/ipoib: Remove unused declarations | Zhang Zekun | 1 | -4/+0
There are some declarations without function definitions, listed below:
1. ipoib_ib_tx_timer_func() has been removed since commit 8966e28d2e40 ("IB/ipoib: Use NAPI in UD/TX flows")
2. ipoib_pkey_event() has been removed since commit ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events")
3. ipoib_mcast_dev_down() has been removed since commit 988bd50300ef ("IPoIB: Fix memory leak of multicast group structures")
4. ipoib_pkey_open() has been removed since commit dd57c9308aff ("IB/ipoib: Avoid multicast join attempts with invalid P_key")
Remove these unused declarations. Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Zhang Zekun <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-06-26 | IB/isert: remove the handling of last WQE reached event | Max Gurtovoy | 1 | -3/+0
This event is raised for QPs that are associated with a Shared RQ (SRQ). The iSER target does not support SRQ. Remove this dead code. Signed-off-by: Max Gurtovoy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2024-05-18 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma | Linus Torvalds | 2 | -4/+7
Pull rdma updates from Jason Gunthorpe: "Aside from the usual things this has an arch update for __iowrite64_copy() used by the RDMA drivers. This API was intended to generate large 64 byte MemWr TLPs on PCI. These days most processors had done this by just repeating writel() in a loop. S390 and some new ARM64 designs require a special helper to get this to generate. - Small improvements and fixes for erdma, efa, hfi1, bnxt_re - Fix a UAF crash after module unload on leaking restrack entry - Continue adding full RDMA support in mana with support for EQs, GID's and CQs - Improvements to the mkey cache in mlx5 - DSCP traffic class support in hns and several bug fixes - Cap the maximum number of MADs in the receive queue to avoid OOM - Another batch of rxe bug fixes from large scale testing - __iowrite64_copy() optimizations for write combining MMIO memory - Remove NULL checks before dev_put/hold() - EFA support for receive with immediate - Fix a recent memleaking regression in a cma error path" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (70 commits) RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw RDMA/IPoIB: Fix format truncation compilation errors bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq RDMA/efa: Support QP with unsolicited write w/ imm. receive IB/hfi1: Remove generic .ndo_get_stats64 IB/hfi1: Do not use custom stat allocator RDMA/hfi1: Use RMW accessors for changing LNKCTL2 RDMA/mana_ib: implement uapi for creation of rnic cq RDMA/mana_ib: boundary check before installing cq callbacks RDMA/mana_ib: introduce a helper to remove cq callbacks RDMA/mana_ib: create and destroy RNIC cqs RDMA/mana_ib: create EQs for RNIC CQs RDMA/core: Remove NULL check before dev_{put, hold} RDMA/ipoib: Remove NULL check before dev_{put, hold} RDMA/mlx5: Remove NULL check before dev_{put, hold} RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources. 
RDMA/core: Add an option to display driver-specific QPs in the rdmatool RDMA/efa: Add shutdown notifier RDMA/mana_ib: Fix missing ret value IB/mlx5: Use __iowrite64_copy() for write combining stores ...
2024-05-12 | RDMA/IPoIB: Fix format truncation compilation errors | Leon Romanovsky | 1 | -2/+6
Truncate the device name to store IPoIB VLAN name. [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 allmodconfig [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 W=1 drivers/infiniband/ulp/ipoib/ drivers/infiniband/ulp/ipoib/ipoib_vlan.c: In function ‘ipoib_vlan_add’: drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:52: error: ‘%04x’ directive output may be truncated writing 4 bytes into a region of size between 0 and 15 [-Werror=format-truncation=] 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:48: note: directive argument in the range [0, 65535] 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~~~~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:9: note: ‘snprintf’ output between 6 and 21 bytes into a destination of size 16 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 188 | ppriv->dev->name, pkey); | ~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:244: drivers/infiniband/ulp/ipoib/ipoib_vlan.o] Error 1 make[6]: *** Waiting for unfinished jobs.... Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Link: https://lore.kernel.org/r/e9d3e1fef69df4c9beaf402cc3ac342bad680791.1715240029.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
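The warning above arises because a parent name of up to 15 characters plus `.` plus four hex digits does not fit into a 16-byte destination. One way to make the bound explicit, shown here as a userspace sketch that is not necessarily the approach the commit takes, is to limit the parent name with a `%.*s` precision. `NAME_SIZE` and `format_child_name()` are hypothetical names for the example.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_SIZE 16  /* same size as the destination in the warning */

/*
 * Write "<parent>.<pkey as 4 hex digits>" into out, truncating the
 * parent name so that the suffix always fits. Returns the number of
 * characters written, not counting the terminating NUL.
 */
int format_child_name(char out[NAME_SIZE], const char *parent,
                      unsigned int pkey)
{
    /* Reserve 1 byte for '.', 4 for the hex digits and 1 for the NUL. */
    int max_parent = NAME_SIZE - 6;

    return snprintf(out, NAME_SIZE, "%.*s.%04x",
                    max_parent, parent, pkey & 0xffffu);
}
```

For a short parent such as `ib0` and P_Key 0x8001 this yields `ib0.8001`. A parent name longer than ten characters is cut to its first ten, so the result is at most 15 characters plus the NUL.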
2024-05-07 | net: annotate writes on dev->mtu from ndo_change_mtu() | Eric Dumazet | 1 | -2/+2
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Simon Horman <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Shannon Nelson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02 | RDMA/ipoib: Remove NULL check before dev_{put, hold} | Jules Irenge | 1 | -2/+1
Coccinelle reports the warning "WARNING: NULL check before dev_{put, hold} functions is not needed". The reason is that netdev_{put, hold}, which dev_{put, hold} calls, already checks for NULL, so there is no need to check before calling dev_{put, hold}. Signed-off-by: Jules Irenge <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-03-18 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma | Linus Torvalds | 3 | -5/+3
Pull rdma updates from Jason Gunthorpe: "Very small update this cycle: - Minor code improvements in fi, rxe, ipoib, mana, cxgb4, mlx5, irdma, rxe, rtrs, mana - Simplify the hns hem mechanism - Fix EFA's MSI-X allocation in resource constrained configurations - Fix a KASN splat in srpt - Narrow hns's congestion control selection to QPs granularity and allow userspace to select it - Solve a parallel module loading race between the CM module and a driver module - Flexible array cleanup - Dump hns's SCC Conext to 'rdma res' for debugging - Make mana build page lists for HW objects that require a 0 offset correctly - Stuck CM ID debugging" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (29 commits) RDMA/cm: add timeout to cm_destroy_id wait RDMA/mana_ib: Use virtual address in dma regions for MRs RDMA/mana_ib: Fix bug in creation of dma regions RDMA/hns: Append SCC context to the raw dump of QPC RDMA/uverbs: Avoid -Wflex-array-member-not-at-end warnings RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity RDMA/rtrs-clt: Check strnlen return len in sysfs mpath_policy_store() RDMA/uverbs: Remove flexible arrays from struct *_filter RDMA/device: Fix a race between mad_client and cm_client init RDMA/hns: Fix mis-modifying default congestion control algorithm RDMA/rxe: Remove unused 'iova' parameter from rxe_mr_init_user RDMA/srpt: Do not register event handler until srpt device is fully setup RDMA/irdma: Remove duplicate assignment RDMA/efa: Limit EQs to available MSI-X vectors RDMA/mlx5: Delete unused mlx5_ib_copy_pas prototype RDMA/cxgb4: Delete unused c4iw_ep_redirect prototype RDMA/mana_ib: Introduce mana_ib_install_cq_cb helper function RDMA/mana_ib: Introduce mana_ib_get_netdev helper function RDMA/mana_ib: Introduce mdev_to_gc helper function RDMA/hns: Simplify 'struct hns_roce_hem' allocation ...
2024-02-26 | rtnetlink: prepare nla_put_iflink() to run under RCU | Eric Dumazet | 1 | -2/+2
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. This patch prepares dev_get_iflink() and nla_put_iflink() to run either with RTNL or RCU held. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-25 | RDMA/rtrs-clt: Check strnlen return len in sysfs mpath_policy_store() | Alexey Kodanev | 1 | -1/+1
strnlen() may return 0 (e.g. for "\0\n" string), it's better to check the result of strnlen() before using 'len - 1' expression for the 'buf' array index. Detected using the static analysis tool - Svace. Fixes: dc3b66a0ce70 ("RDMA/rtrs-clt: Add a minimum latency multipath policy") Signed-off-by: Alexey Kodanev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: Jack Wang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
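The check described in this commit can be sketched in userspace C: test that the length is non-zero before indexing with `len - 1`. The function name `trimmed_len()` is hypothetical, and the sketch only models the newline trimming, not the sysfs store function itself.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the length of buf (at most max bytes are examined) without a
 * single trailing newline. The len > 0 test guards the buf[len - 1]
 * access: for an empty string, len - 1 would wrap around to SIZE_MAX.
 */
size_t trimmed_len(const char *buf, size_t max)
{
    size_t len = strnlen(buf, max);

    if (len > 0 && buf[len - 1] == '\n')
        len--;
    return len;
}
```

For the input `"\0\n"` mentioned above, `strnlen()` returns 0 and the function returns 0 without reading before the start of the buffer.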
2024-02-14 | RDMA/srpt: fix function pointer cast warnings | Arnd Bergmann | 1 | -4/+5
clang-16 notices that srpt_qp_event() gets called through an incompatible pointer here: drivers/infiniband/ulp/srpt/ib_srpt.c:1815:5: error: cast from 'void (*)(struct ib_event *, struct srpt_rdma_ch *)' to 'void (*)(struct ib_event *, void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1815 | = (void(*)(struct ib_event *, void*))srpt_qp_event; Change srpt_qp_event() to use the correct prototype and adjust the argument inside of it. Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
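The general fix pattern, giving the handler the exact prototype the caller expects and converting the `void *` argument inside the handler, can be sketched as follows. All type and function names here (`struct event_sketch`, `struct channel_sketch`, `event_handler_t`, `channel_event()`, `dispatch()`) are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the event and the per-channel context. */
struct event_sketch {
    int type;
};

struct channel_sketch {
    int last_event_type;
};

/* The callback type the caller expects: the context is a void pointer. */
typedef void (*event_handler_t)(struct event_sketch *, void *);

/*
 * The handler has exactly the expected prototype, so no function
 * pointer cast is needed. The void pointer is converted to the concrete
 * type inside the function instead.
 */
void channel_event(struct event_sketch *ev, void *context)
{
    struct channel_sketch *ch = context;

    ch->last_event_type = ev->type;
}

/* Invoke a handler through the generic callback type. */
void dispatch(event_handler_t handler, struct event_sketch *ev,
              void *context)
{
    handler(ev, context);
}
```

Calling `dispatch(channel_event, &ev, &ch)` compiles without a cast, which is the property that -Wcast-function-type-strict checks for.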
2024-02-05 | RDMA/srpt: Support specifying the srpt_service_guid parameter | Bart Van Assche | 1 | -2/+6
Make loading ib_srpt with this parameter set work. The current behavior is that setting that parameter while loading the ib_srpt kernel module triggers the following kernel crash: BUG: kernel NULL pointer dereference, address: 0000000000000000 Call Trace: <TASK> parse_one+0x18c/0x1d0 parse_args+0xe1/0x230 load_module+0x8de/0xa60 init_module_from_file+0x8b/0xd0 idempotent_init_module+0x181/0x240 __x64_sys_finit_module+0x5a/0xb0 do_syscall_64+0x5f/0xe0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Cc: LiHonggang <[email protected]> Reported-by: LiHonggang <[email protected]> Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2024-02-04 | RDMA/srpt: Do not register event handler until srpt device is fully setup | William Kucharski | 1 | -2/+1
Upon rare occasions, KASAN reports a use-after-free Write in srpt_refresh_port(). This seems to be because an event handler is registered before the srpt device is fully setup and a race condition upon error may leave a partially setup event handler in place. Instead, only register the event handler after srpt device initialization is complete. Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: William Kucharski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-01-25 | RDMA/ipoib: Print symbolic error name instead of error code | Christian Heusel | 1 | -2/+1
Utilize the %pe print specifier to get the symbolic error name as a string (i.e "-ENOMEM") in the log message instead of the error code to increase its readability. This change was suggested in https://lore.kernel.org/all/[email protected]/ Signed-off-by: Christian Heusel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Dan Carpenter <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-01-04 | IB/iser: Prevent invalidating wrong MR | Sergey Gorenko | 4 | -8/+8
The iser_reg_resources structure has two pointers to MR but only one mr_valid field. The implementation assumes that we use only *sig_mr when pi_enable is true. Otherwise, we use only *mr. However, this assumption is only sometimes correct. Read commands without protection information occur even when pi_enable is true. For example, the following SCSI commands have a Data-In buffer but never have protection information: READ CAPACITY (16), INQUIRY, MODE SENSE(6), MAINTENANCE IN. So, we use *sig_mr for some SCSI commands and *mr for the other SCSI commands. In most cases, it works fine because the remote invalidation is applied. However, there are two cases when the remote invalidation is not applicable. 1. Small write commands when all data is sent as an immediate. 2. The target does not support the remote invalidation feature. The lazy invalidation is used if the remote invalidation is impossible. Since, at the lazy invalidation, we always invalidate the MR we want to use, the wrong MR may be invalidated. To fix the issue, we need a field per MR that indicates the MR needs invalidation. Since the ib_mr structure already has such a field, let's use ib_mr.need_inval instead of iser_reg_resources.mr_valid. Fixes: b76a439982f8 ("IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handover") Link: https://lore.kernel.org/r/[email protected] Acked-by: Max Gurtovoy <[email protected]> Signed-off-by: Sergey Gorenko <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-12-26 | IB/iser: iscsi_iser.h: fix kernel-doc warning and spellos | Randy Dunlap | 1 | -3/+2
Drop one kernel-doc comment to prevent a warning: iscsi_iser.h:313: warning: Excess struct member 'mr' description in 'iser_device' and spell 2 words correctly (buffer and deferred). Signed-off-by: Randy Dunlap <[email protected]> Cc: Sagi Grimberg <[email protected]> Cc: Max Gurtovoy <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Sagi Grimberg <[email protected]> Acked-by: Max Gurtovoy <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-12 | IB/ipoib: Fix mcast list locking | Daniel Vacek | 1 | -5/+1
Releasing the `priv->lock` while iterating the `priv->multicast_list` in `ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to remove the items while in the middle of iteration. If the mcast is removed while the lock was dropped, the for loop spins forever resulting in a hard lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel): Task A (kworker/u72:2 below) | Task B (kworker/u72:0 below) -----------------------------------+----------------------------------- ipoib_mcast_join_task(work) | ipoib_ib_dev_flush_light(work) spin_lock_irq(&priv->lock) | __ipoib_ib_dev_flush(priv, ...) list_for_each_entry(mcast, | ipoib_mcast_dev_flush(dev = priv->dev) &priv->multicast_list, list) | ipoib_mcast_join(dev, mcast) | spin_unlock_irq(&priv->lock) | | spin_lock_irqsave(&priv->lock, flags) | list_for_each_entry_safe(mcast, tmcast, | &priv->multicast_list, list) | list_del(&mcast->list); | list_add_tail(&mcast->list, &remove_list) | spin_unlock_irqrestore(&priv->lock, flags) spin_lock_irq(&priv->lock) | | ipoib_mcast_remove_list(&remove_list) (Here, `mcast` is no longer on the | list_for_each_entry_safe(mcast, tmcast, `priv->multicast_list` and we keep | remove_list, list) spinning on the `remove_list` of | >>> wait_for_completion(&mcast->done) the other thread which is blocked | and the list is still valid on | it's stack.) Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent eventual sleeps. Unfortunately we could not reproduce the lockup and confirm this fix but based on the code review I think this fix should address such lockups. 
crash> bc 31 PID: 747 TASK: ff1c6a1a007e8000 CPU: 31 COMMAND: "kworker/u72:2" -- [exception RIP: ipoib_mcast_join_task+0x1b1] RIP: ffffffffc0944ac1 RSP: ff646f199a8c7e00 RFLAGS: 00000002 RAX: 0000000000000000 RBX: ff1c6a1a04dc82f8 RCX: 0000000000000000 work (&priv->mcast_task{,.work}) RDX: ff1c6a192d60ac68 RSI: 0000000000000286 RDI: ff1c6a1a04dc8000 &mcast->list RBP: ff646f199a8c7e90 R8: ff1c699980019420 R9: ff1c6a1920c9a000 R10: ff646f199a8c7e00 R11: ff1c6a191a7d9800 R12: ff1c6a192d60ac00 mcast R13: ff1c6a1d82200000 R14: ff1c6a1a04dc8000 R15: ff1c6a1a04dc82d8 dev priv (&priv->lock) &priv->multicast_list (aka head) ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib] #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967 crash> rx ff646f199a8c7e68 ff646f199a8c7e68: ff1c6a1a04dc82f8 <<< work = &priv->mcast_task.work crash> list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000 (empty) crash> ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000 mcast_task.work.func = 0xffffffffc0944910 <ipoib_mcast_join_task>, mcast_mutex.owner.counter = 0xff1c69998efec000 crash> b 8 PID: 8 TASK: ff1c69998efec000 CPU: 33 COMMAND: "kworker/u72:0" -- #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646 #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib] #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib] #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib] #7 [ff646f1980153e98] process_one_work+0x1a7 at ffffffff9bf10967 crash> rx ff646f1980153e68 ff646f1980153e68: ff1c6a1a04dc83f0 <<< work = &priv->flush_light crash> ipoib_dev_priv.flush_light.func,broadcast ff1c6a1a04dc8000 flush_light.func = 0xffffffffc0943820 <ipoib_ib_dev_flush_light>, broadcast = 0x0, The mcast(s) on the `remove_list` (the remaining part of the ex `priv->multicast_list`): crash> 
list -s ipoib_mcast.done.done ipoib_mcast.list -H ff646f1980153e10 | paste - - ff1c6a192bd0c200 done.done = 0x0, ff1c6a192d60ac00 done.done = 0x0, Reported-by: Yuya Fujita-bishamonten <[email protected]> Signed-off-by: Daniel Vacek <[email protected]> Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-12 | Expose c0 and SW encap ICM for RDMA | Leon Romanovsky | 2 | -14/+30
These two series from Mark and Shun extend RDMA mlx5 API. Mark's series provides c0 register used to match egress traffic sent by local device. Shun's series adds new type for ICM area. Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-26 | RDMA/IPoIB: Add tx timeout work to recover queue stop situation | Jack Wang | 3 | -3/+60
We sometimes run into TX timeouts from IPoIB, after which the queue seems to be stopped and cannot recover. A diff against Mellanox OFED shows that the Mellanox driver has timeout work to recover in such a case. Add TX timeout work/NAPI work to recover from this situation. Also increase watchdog_timeo to 10 seconds, to be more tolerant of errors. Signed-off-by: Jack Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-26 | RDMA/IPoIB: Fix error code return in ipoib_mcast_join | Jack Wang | 1 | -0/+1
Return the error code in case of ib_sa_join_multicast fail. Signed-off-by: Jack Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22 | RDMA/rtrs: Use %pe to print errors | Supriti Singh | 1 | -2/+2
While printing error, replace %ld by %pe. %pe prints a string whereas %ld would print an error code. Signed-off-by: Supriti Singh <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22 | RDMA/rtrs-clt: Use %pe to print errors | Supriti Singh | 1 | -6/+4
While printing error, replace %ld by %pe. %pe prints a string whereas %ld would print an error code. Signed-off-by: Supriti Singh <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22 | RDMA/rtrs-clt: Remove the warnings for req in_use check | Jack Wang | 1 | -1/+1
During a write request we chain the WRs: memory registration, RDMA write, local invalidate. If only the last WR fails to post because of a send queue overrun, the server can still send back the reply, while the client has already set req->in_use to false on the error path in rtrs_clt_req after rtrs_post_rdma_write_sg returned an error. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22 | RDMA/rtrs-clt: Fix the max_send_wr setting | Jack Wang | 1 | -1/+1
For each write request, we need Request, Response Memory Registration, Local Invalidate. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22 | RDMA/rtrs-srv: Destroy path files after making sure no IOs in-flight | Md Haris Iqbal | 1 | -1/+2
Destroying path files may lead to the freeing of rdma_stats. This creates the following race. An IO is in-flight, or has just passed the session state check in process_read/process_write. The close_work gets triggered and the function rtrs_srv_close_work() starts and does destroy path which frees the rdma_stats. After this the function process_read/process_write resumes and tries to update the stats through the function rtrs_srv_update_rdma_stats This commit solves the problem by moving the destroy path function to a later point. This point makes sure any inflights are completed. This is done by qp drain, and waiting for all in-flights through ops_id. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Santosh Kumar Pradhan <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22 | RDMA/rtrs-srv: Free srv_mr iu only when always_invalidate is true | Md Haris Iqbal | 1 | -1/+4
Since srv_mr->iu is allocated and used only when always_invalidate is true, free it only when always_invalidate is true. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22 | RDMA/rtrs-srv: Check return values while processing info request | Md Haris Iqbal | 1 | -6/+18
While processing the info request, it can happen that the srv_path goes to the CLOSING state because of any of the error events from RDMA. That state change should be picked up while trying to change the state in process_info_req, by checking the return value. In case the state change call in process_info_req fails, we fail the processing. We should also check the return value for rtrs_srv_path_up, since it sends a link event to the client above, and the client can fail for any reason. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-clt: Start hb after path_upJack Wang1-2/+1
If we start the heartbeat (hb) too early, it can confuse the server side into closing the session. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-srv: Do not unconditionally enable irqJack Wang1-2/+3
When IO is completed, rtrs can be called in softirq context, where unconditionally enabling irq could cause a panic. To be on the safe side, use spin_lock_irqsave and spin_unlock_irqrestore instead. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Florian-Ewald Mueller <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
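The difference between the two locking styles can be shown with a userspace model in which a single boolean stands in for the CPU interrupt state. This is only an illustration: the model_* functions are hypothetical stand-ins for the kernel's spin_lock_irq()/spin_unlock_irq() and spin_lock_irqsave()/spin_unlock_irqrestore(), and the assumption that the old code used the unconditional variant is inferred from the commit subject rather than stated in the message.

```c
#include <stdbool.h>

/* Userspace model only: one flag stands in for the CPU interrupt state. */
static bool irqs_enabled = true;

/* Models spin_lock_irq(): disables interrupts, forgets the old state. */
static void model_lock_irq(void) { irqs_enabled = false; }

/* Models spin_unlock_irq(): unconditionally re-enables interrupts. */
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave(): remembers the old state, then disables. */
static void model_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

/* Models spin_unlock_irqrestore(): puts back whatever the caller had. */
static void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* Caller enters with interrupts already off and uses the irq variant. */
bool state_after_unconditional(void)
{
    irqs_enabled = false;
    model_lock_irq();
    model_unlock_irq();
    return irqs_enabled;   /* interrupts were turned back on behind the caller */
}

/* Same caller, but using the save/restore variant. */
bool state_after_saved(void)
{
    bool flags;

    irqs_enabled = false;
    model_lock_irqsave(&flags);
    model_unlock_irqrestore(flags);
    return irqs_enabled;   /* caller's disabled state is preserved */
}
```

The save/restore pair is the safer choice whenever the calling context is not known to have interrupts enabled, which is the situation the message describes for IO completion.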
2023-11-19RDMA/rtrs-clt: Add warning logs for RDMA eventsMd Haris Iqbal1-0/+2
Some RDMA CM events can trigger connection close or recovery for a certain rtrs_clt_path. Such close/recovery triggers should happen after an appropriate log message, since they can lead to IO failures. Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds5-5/+11
Pull rdma updates from Jason Gunthorpe: "Nothing exciting this cycle, most of the diffstat is changing SPDX 'or' to 'OR'. Summary: - Bugfixes for hns, mlx5, and hfi1 - Hardening patches for size_*, counted_by, strscpy - rts fixes from static analysis - Dump SRQ objects in rdma netlink, with hns support - Fix a performance regression in mlx5 MR deregistration - New XDR (200Gb/lane) link speed - SRQ record doorbell latency optimization for hns - IPSEC support for mlx5 multi-port mode - ibv_rereg_mr() support for irdma - Affiliated event support for bnxt_re - Opt out for the spec compliant qkey security enforcement as we discovered SW that breaks under enforcement - Comment and trivial updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (50 commits) IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAF RDMA/mlx5: Fix mkey cache WQ flush RDMA/hfi1: Workaround truncation compilation error IB/hfi1: Fix potential deadlock on &irq_src_lock and &dd->uctxt_lock RDMA/core: Remove NULL check before dev_{put, hold} RDMA/hfi1: Remove redundant assignment to pointer ppd RDMA/mlx5: Change the key being sent for MPV device affiliation RDMA/bnxt_re: Fix clang -Wimplicit-fallthrough in bnxt_re_handle_cq_async_error() RDMA/hns: Fix init failure of RoCE VF and HIP08 RDMA/hns: Fix unnecessary port_num transition in HW stats allocation RDMA/hns: The UD mode can only be configured with DCQCN RDMA/hns: Add check for SL RDMA/hns: Fix signed-unsigned mixed comparisons RDMA/hns: Fix uninitialized ucmd in hns_roce_create_qp_common() RDMA/hns: Fix printing level of asynchronous events RDMA/core: Add support to set privileged QKEY parameter RDMA/bnxt_re: Do not report SRQ error in srq notification RDMA/bnxt_re: Report async events and errors RDMA/bnxt_re: Update HW interface headers IB/mlx5: Fix rdma counter binding for RAW QP ...
2023-11-02Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-0/+3
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc, scsi_debug) plus the usual assorted minor fixes and updates. The major change this time around is a prep patch for rethreading of the driver reset handler API not to take a scsi_cmd structure which starts to reduce various drivers' dependence on scsi_cmd in error handling" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (132 commits) scsi: ufs: core: Leave space for '\0' in utf8 desc string scsi: ufs: core: Conversion to bool not necessary scsi: ufs: core: Fix race between force complete and ISR scsi: megaraid: Fix up debug message in megaraid_abort_and_reset() scsi: aic79xx: Fix up NULL command in ahd_done() scsi: message: fusion: Initialize return value in mptfc_bus_reset() scsi: mpt3sas: Fix loop logic scsi: snic: Remove useless code in snic_dr_clean_pending_req() scsi: core: Add comment to target_destroy in scsi_host_template scsi: core: Clean up scsi_dev_queue_ready() scsi: pmcraid: Add missing scsi_device_put() in pmcraid_eh_target_reset_handler() scsi: target: core: Fix kernel-doc comment scsi: pmcraid: Fix kernel-doc comment scsi: core: Handle depopulation and restoration in progress scsi: ufs: core: Add support for parsing OPP scsi: ufs: core: Add OPP support for scaling clocks and regulators scsi: ufs: dt-bindings: common: Add OPP table scsi: scsi_debug: Add param to control sdev's allow_restart scsi: scsi_debug: Add debugfs interface to fail target reset scsi: scsi_debug: Add new error injection type: Reset LUN failed ...
2023-10-31Merge tag 'v6.6' into rdma.git for-nextJason Gunthorpe1-11/+5
Resolve conflict by taking the spin_lock hunk from for-next: https://lore.kernel.org/r/[email protected] Required for the next patch. Signed-off-by: Jason Gunthorpe <[email protected]>
2023-10-24RDMA/core: Remove NULL check before dev_{put, hold}Yang Li1-2/+1
dev_{put, hold} call netdev_{put, hold}, which already check for NULL, so there is no need to check before using dev_{put, hold}. Remove the check to silence the warning: ./drivers/infiniband/core/nldev.c:375:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7047 Signed-off-by: Yang Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-10-13scsi: target: Have drivers report if they support direct submissionsMike Christie1-0/+3
In some cases, like with multiple LUN targets or where the target has to respond to transport level requests from the receiving context it can be better to defer cmd submission to a helper thread. If the backend driver blocks on something like request/tag allocation it can block the entire target submission path and other LUs and transport IO on that session. In other cases like single LUN targets with storage that can support all the commands that the target can queue, then it's best to submit the cmd to the backend from the target's cmd receiving context. Subsequent commits will allow the user to config what they prefer, but drivers like loop can't directly submit because they can be called from a context that can't sleep. And, drivers like vhost-scsi can support direct submission, but need to keep their default behavior of deferring execution to avoid possible regressions where the backend can block. Make the drivers tell LIO core if they support direct submissions and their current default, so we can prevent users from misconfiguring the system and initialize devices correctly. Signed-off-by: Mike Christie <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>
2023-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-11/+5
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: kernel/bpf/verifier.c 829955981c55 ("bpf: Fix verifier log for async callback return values") a923819fb2c5 ("bpf: Treat first argument as return value for bpf_throw") Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-11netdev: replace napi_reschedule with napi_scheduleChristian Marangi1-2/+2
Now that napi_schedule returns a bool, we can drop napi_reschedule, which does exactly the same thing. The function comes from a very old commit bfe13f54f502 ("ibm_emac: Convert to use napi_struct independent of struct net_device") and the purpose is actually deprecated in favour of different logic. Convert every user of napi_reschedule to napi_schedule. Signed-off-by: Christian Marangi <[email protected]> Acked-by: Jeff Johnson <[email protected]> # ath10k Acked-by: Nick Child <[email protected]> # ibm Acked-by: Marc Kleine-Budde <[email protected]> # for can/dev/rx-offload.c Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Tariq Toukan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-02IB/srp: Annotate struct srp_fr_pool with __counted_byKees Cook1-1/+1
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct srp_fr_pool. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Bart Van Assche <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
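As a rough userspace illustration of the annotation, the sketch below attaches a counted_by attribute to a flexible array member when the compiler supports it and compiles to a plain flexible array otherwise. The COUNTED_BY macro, struct demo_pool and demo_pool_alloc() are hypothetical names chosen for this example; they do not reproduce the layout of struct srp_fr_pool or the kernel's own __counted_by definition.

```c
#include <stddef.h>
#include <stdlib.h>

/* Expand to the attribute only where the compiler knows it. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

/* Hypothetical pool: the counter must be declared before the array. */
struct demo_pool {
    int size;                     /* number of valid entries in desc[] */
    int desc[] COUNTED_BY(size);  /* flexible array bounded by size */
};

/* Allocates a zeroed pool with room for n descriptors, or NULL on failure. */
struct demo_pool *demo_pool_alloc(int n)
{
    struct demo_pool *p;

    if (n < 0)
        return NULL;
    p = calloc(1, sizeof(*p) + (size_t)n * sizeof(p->desc[0]));
    if (!p)
        return NULL;
    p->size = n;  /* set the counter before any access to desc[] */
    return p;
}
```

Setting the counter immediately after allocation matters, because a bounds checker that honours the annotation compares every index into desc[] against the current value of size.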