aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)AuthorFilesLines
2023-12-30RDMA/erdma: Add hardware statistics supportCheng Xu4-0/+133
First, we add a new command to query hardware statistics, and then implement two functions: ib_device_ops.alloc_hw_port_stats and ib_device_ops.get_hw_stats to allow rdma tool can get the statistics of erdma device. Signed-off-by: Cheng Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-30RDMA/erdma: Introduce dma pool for hardware responses of CMDQ requestsCheng Xu3-2/+26
Hardware response, such as the result of query statistics, may be too long to be directly accommodated within the CQE structure. To address this, we introduce a DMA pool to hold the hardware's responses of CMDQ requests. Signed-off-by: Cheng Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-26IB/iser: iscsi_iser.h: fix kernel-doc warning and spellosRandy Dunlap1-3/+2
Drop one kernel-doc comment to prevent a warning: iscsi_iser.h:313: warning: Excess struct member 'mr' description in 'iser_device' and spell 2 words correctly (buffer and deferred). Signed-off-by: Randy Dunlap <[email protected]> Cc: Sagi Grimberg <[email protected]> Cc: Max Gurtovoy <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Sagi Grimberg <[email protected]> Acked-by: Max Gurtovoy <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-20RDMA/mana_ib: Add CQ interrupt support for RAW QPLong Li3-7/+102
At probing time, the MANA core code allocates EQs for supporting interrupts on Ethernet queues. The same interrupt mechanisum is used by RAW QP. Use the same EQs for delivering interrupts on the CQ for the RAW QP. Signed-off-by: Long Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-20RDMA/mana_ib: query device capabilitiesLong Li5-16/+112
With RDMA device registered, use it to query on hardware capabilities and cache this information for future query requests to the driver. Signed-off-by: Long Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-20RDMA/mana_ib: register RDMA device with GDMALong Li3-14/+29
Software client needs to register with the RDMA management interface on the SoC to access more features, including querying device capabilities and RC queue pair. Signed-off-by: Long Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-20RDMA/bnxt_re: Fix the sparse warningsSelvin Xavier2-2/+2
Fix the following warnings reported drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:909:27: warning: invalid assignment: |= drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:909:27: left side has type restricted __le16 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:909:27: right side has type unsigned long ... drivers/infiniband/hw/bnxt_re/qplib_fp.c:1620:44: warning: invalid assignment: |= drivers/infiniband/hw/bnxt_re/qplib_fp.c:1620:44: left side has type restricted __le64 drivers/infiniband/hw/bnxt_re/qplib_fp.c:1620:44: right side has type unsigned long long Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: 07f830ae4913 ("RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters") Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-20RDMA/bnxt_re: Fix the offset for GenP7 adapters for user applicationsSelvin Xavier1-3/+5
User Doorbell page indexes start at an offset for GenP7 adapters. Fix the offset that will be used for user doorbell page indexes. Fixes: a62d68581441 ("RDMA/bnxt_re: Update the BAR offsets") Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-17RDMA/bnxt_re: Share a page to expose per CQ info with userspaceSelvin Xavier5-8/+74
Gen P7 adapters needs to share a toggle bits information received in kernel driver with the user space. User space needs this info during the request notify call back to arm the CQ. User space application can get this page using the UAPI routines. Library will mmap this page and get the toggle bits to be used in the next ARM Doorbell. Uses a hash list to map the CQ structure from the CQ ID. CQ structure is retrieved from the hash list while the library calls the UAPI routine to get the toggle page mapping. Currently the full page is mapped per CQ. This can be optimized to enable multiple CQs from the same application share the same page and different offsets in the page. Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-17RDMA/bnxt_re: Add UAPI to share a page with user spaceSelvin Xavier2-0/+106
Gen P7 adapters require to share a toggle value for CQ and SRQ. This is received by the driver as part of interrupt notifications and needs to be shared with the user space. Add a new UAPI infrastructure to get the shared page for CQ and SRQ. Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-13PCI: Add Alibaba Vendor ID to linux/pci_ids.hShuai Xue1-2/+0
The Alibaba Vendor ID (0x1ded) is now used by Alibaba elasticRDMA ("erdma") and will be shared with the upcoming PCIe PMU ("dwc_pcie_pmu"). Move the Vendor ID to linux/pci_ids.h so that it can shared by several drivers later. Signed-off-by: Shuai Xue <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> # pci_ids.h Tested-by: Ilkka Koskinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2023-12-12IB/ipoib: Fix mcast list lockingDaniel Vacek1-5/+1
Releasing the `priv->lock` while iterating the `priv->multicast_list` in `ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to remove the items while in the middle of iteration. If the mcast is removed while the lock was dropped, the for loop spins forever resulting in a hard lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel): Task A (kworker/u72:2 below) | Task B (kworker/u72:0 below) -----------------------------------+----------------------------------- ipoib_mcast_join_task(work) | ipoib_ib_dev_flush_light(work) spin_lock_irq(&priv->lock) | __ipoib_ib_dev_flush(priv, ...) list_for_each_entry(mcast, | ipoib_mcast_dev_flush(dev = priv->dev) &priv->multicast_list, list) | ipoib_mcast_join(dev, mcast) | spin_unlock_irq(&priv->lock) | | spin_lock_irqsave(&priv->lock, flags) | list_for_each_entry_safe(mcast, tmcast, | &priv->multicast_list, list) | list_del(&mcast->list); | list_add_tail(&mcast->list, &remove_list) | spin_unlock_irqrestore(&priv->lock, flags) spin_lock_irq(&priv->lock) | | ipoib_mcast_remove_list(&remove_list) (Here, `mcast` is no longer on the | list_for_each_entry_safe(mcast, tmcast, `priv->multicast_list` and we keep | remove_list, list) spinning on the `remove_list` of | >>> wait_for_completion(&mcast->done) the other thread which is blocked | and the list is still valid on | it's stack.) Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent eventual sleeps. Unfortunately we could not reproduce the lockup and confirm this fix but based on the code review I think this fix should address such lockups. crash> bc 31 PID: 747 TASK: ff1c6a1a007e8000 CPU: 31 COMMAND: "kworker/u72:2" -- [exception RIP: ipoib_mcast_join_task+0x1b1] RIP: ffffffffc0944ac1 RSP: ff646f199a8c7e00 RFLAGS: 00000002 RAX: 0000000000000000 RBX: ff1c6a1a04dc82f8 RCX: 0000000000000000 work (&priv->mcast_task{,.work}) RDX: ff1c6a192d60ac68 RSI: 0000000000000286 RDI: ff1c6a1a04dc8000 &mcast->list RBP: ff646f199a8c7e90 R8: ff1c699980019420 R9: ff1c6a1920c9a000 R10: ff646f199a8c7e00 R11: ff1c6a191a7d9800 R12: ff1c6a192d60ac00 mcast R13: ff1c6a1d82200000 R14: ff1c6a1a04dc8000 R15: ff1c6a1a04dc82d8 dev priv (&priv->lock) &priv->multicast_list (aka head) ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib] #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967 crash> rx ff646f199a8c7e68 ff646f199a8c7e68: ff1c6a1a04dc82f8 <<< work = &priv->mcast_task.work crash> list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000 (empty) crash> ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000 mcast_task.work.func = 0xffffffffc0944910 <ipoib_mcast_join_task>, mcast_mutex.owner.counter = 0xff1c69998efec000 crash> b 8 PID: 8 TASK: ff1c69998efec000 CPU: 33 COMMAND: "kworker/u72:0" -- #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646 #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib] #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib] #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib] #7 [ff646f1980153e98] process_one_work+0x1a7 at ffffffff9bf10967 crash> rx ff646f1980153e68 ff646f1980153e68: ff1c6a1a04dc83f0 <<< work = &priv->flush_light crash> ipoib_dev_priv.flush_light.func,broadcast ff1c6a1a04dc8000 flush_light.func = 0xffffffffc0943820 <ipoib_ib_dev_flush_light>, broadcast = 0x0, The mcast(s) on the `remove_list` (the remaining part of the ex `priv->multicast_list`): crash> list -s ipoib_mcast.done.done ipoib_mcast.list -H ff646f1980153e10 | paste - - ff1c6a192bd0c200 done.done = 0x0, ff1c6a192d60ac00 done.done = 0x0, Reported-by: Yuya Fujita-bishamonten <[email protected]> Signed-off-by: Daniel Vacek <[email protected]> Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-12Expose c0 and SW encap ICM for RDMALeon Romanovsky14-43/+110
These two series from Mark and Shun extend RDMA mlx5 API. Mark's series provides c0 register used to match egress traffic sent by local device. Shun's series adds new type for ICM area. Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-12RDMA/mlx5: Expose register c0 for RDMA deviceMark Bloch1-0/+24
This patch introduces improvements for matching egress traffic sent by the local device. When applicable, all egress traffic from the local vport is now tagged with the provided value. This enhancement is particularly useful for FDB steering purposes. The primary focus of this update is facilitating the transmission of traffic from the hypervisor to a VF. To achieve this, one must initiate an SQ on the hypervisor and subsequently create a rule in the FDB that matches on the eswitch manager vport and the SQN of the aforementioned SQ. Obtaining the SQN can be had from SQ opened, and the eswitch manager vport match can be substituted with the register c0 value exposed by this patch. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Michael Guralnik <[email protected]> Link: https://lore.kernel.org/r/aa4120a91c98ff1c44f1213388c744d4cb0324d6.1701871118.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-12RDMA/mlx5: Support handling of SW encap ICM areaShun Hao2-0/+6
New type for this ICM area, now the user can allocate/deallocate the new type of SW encap ICM memory, to store the encap header data which are managed by SW. Signed-off-by: Shun Hao <[email protected]> Link: https://lore.kernel.org/r/546fe43fc700240709e30acf7713ec6834d652bd.1701871118.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-11RDMA/bnxt_re: Adds MSN table capability for Gen P7 adaptersSelvin Xavier4-6/+86
GenP7 HW expects an MSN table instead of PSN table. Check for the HW retransmission capability and populate the MSN table if HW retansmission is supported. Signed-off-by: Damodharam Ammepalli <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-11RDMA/bnxt_re: Doorbell changesSelvin Xavier1-11/+35
Update the Doorbell routines to support the latest HW definitions. Use common routine to prepare the Doorbell key. Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-11RDMA/bnxt_re: Get the toggle bits from CQ completionsSelvin Xavier3-0/+8
Get the toggle bits from CQ completions. For older adapters these values are 0. Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-11RDMA/bnxt_re: Update the HW interface definitionsSelvin Xavier1-10/+57
Adds HW interface definitions to support the new chip revision. Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-11RDMA/bnxt_re: Update the BAR offsetsSelvin Xavier2-16/+10
Update the BAR offsets for handling GenP7 adapters. Use the values populated by L2 driver for getting the Doorbell offsets. Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-11RDMA/bnxt_re: Support new 5760X P7 devicesSelvin Xavier8-24/+38
Add basic support for 5760X P7 devices. Add new chip revisions. The first version support is similar to the existing P5 adapters. Extend the current support for P5 adapters to P7 also. Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-07RDMA/hns: Fix memory leak in free_mr_init()Chengchang Tang1-0/+4
When a reserved QP fails to be created, the memory of the remaining created reserved QPs is leaked. Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: Chengchang Tang <[email protected]> Signed-off-by: Junxian Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-07RDMA/hns: Remove unnecessary checks for NULL in mtr_alloc_bufs()Chengchang Tang1-1/+1
ib_umem_get() never return NULL. Fixes: 3c873161a0d7 ("RDMA/hns: Add support for addressing when hopnum is 0") Signed-off-by: Chengchang Tang <[email protected]> Signed-off-by: Junxian Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-07RDMA/hns: Add a max length of gid tableJunxian Huang1-2/+9
IB-core and rdma-core restrict the sgid_index specified by users, which is uint8_t/u8 data type, to only be within the range of 0-255, so it's meaningless to support excessively large gid_table_len. On the other hand, ib-core creates as many sysfs gid files as gid_table_len, most of which are not only useless because of the reason above, but also greatly increase the traversal time of the sysfs gid files for applications. This patch limits the maximum length of gid table to 256. Signed-off-by: Junxian Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-07RDMA/hns: Response dmac to userspaceJunxian Huang1-0/+7
While creating AH, dmac is already resolved in kernel. Response dmac to userspace so that userspace doesn't need to resolve dmac repeatedly. Signed-off-by: Junxian Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-07RDMA/hns: Rename the interruptsChengchang Tang1-3/+4
Now, different devices may have the same interrupt name, which makes it difficult for users to distinguish between these interrupts. Modify the naming style to be consistent with our network devices. Before: "hns-aeq-0" "hns-ceq-0" ... Now: "hns-0000:35:00.0-aeq-0" "hns-0000:35:00.0-ceq-0" ... Signed-off-by: Chengchang Tang <[email protected]> Signed-off-by: Junxian Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-12-04RDMA/siw: Call orq_get_current if possibleGuoqing Jiang1-1/+1
Use orq_get_current() in siw_orq_empty(). Link: https://lore.kernel.org/r/[email protected] Acked-by: Bernard Metzler <[email protected]> Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-12-04RDMA/siw: Set qp_state in siw_query_qpGuoqing Jiang1-0/+10
Run test_query_rc_qp against siw failed since siw didn't set qp_state accordingly. To address it, introduce siw_qp_state_to_ib_qp_state which convert SIW_QP_STATE_IDLE to IB_QPS_INIT which is similar as in cxgb4. rdma-core# ./build/bin/run_tests.py --dev siw0 tests.test_qp.QPTest.test_query_rc_qp -v test_query_rc_qp (tests.test_qp.QPTest) Queries an RC QP after creation. Verifies that its properties are as ... FAIL ====================================================================== FAIL: test_query_rc_qp (tests.test_qp.QPTest) Queries an RC QP after creation. Verifies that its properties are as ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/gjiang/rdma-core/tests/test_qp.py", line 284, in test_query_rc_qp self.query_qp_common_test(e.IBV_QPT_RC) File "/home/gjiang/rdma-core/tests/test_qp.py", line 265, in query_qp_common_test self.verify_qp_attrs(caps, e.IBV_QPS_INIT, qp_init_attr, qp_attr) File "/home/gjiang/rdma-core/tests/test_qp.py", line 239, in verify_qp_attrs self.assertEqual(state, attr.qp_state) AssertionError: <ibv_qp_state.IBV_QPS_INIT: 1> != 0 ---------------------------------------------------------------------- Ran 1 test in 0.057s FAILED (failures=1) Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guoqing Jiang <[email protected]> Acked-by: Bernard Metzler <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-12-04RDMA/siw: Reduce memory usage of struct siw_rx_streamGuoqing Jiang1-3/+3
We can reduce the memory of the struct by move some of it's member. Before, /* size: 144, cachelines: 3, members: 17 */ /* sum members: 124, holes: 3, sum holes: 12 */ /* sum bitfield members: 7 bits (0 bytes) */ /* padding: 7 */ /* bit_padding: 1 bits */ After /* size: 128, cachelines: 2, members: 17 */ /* padding: 3 */ /* bit_padding: 1 bits */ Link: https://lore.kernel.org/r/[email protected] Acked-by: Bernard Metzler <[email protected]> Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-12-04RDMA/siw: Move tx_cpu aheadGuoqing Jiang1-1/+1
We can reduce one cacheline for the usage of struct siw_qp. Before, /* size: 1928, cachelines: 31, members: 38 */ /* sum members: 1920, holes: 2, sum holes: 8 */ /* paddings: 4, sum paddings: 13 */ /* forced alignments: 3 */ after /* size: 1920, cachelines: 30, members: 38 */ /* paddings: 4, sum paddings: 13 */ /* forced alignments: 3 */ Link: https://lore.kernel.org/r/[email protected] Acked-by: Bernard Metzler <[email protected]> Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-12-04RDMA/irdma: Avoid free the non-cqp_request scratchShifeng Li1-3/+1
When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/[email protected] Suggested-by: "Ismail, Mustafa" <[email protected]> Signed-off-by: Shifeng Li <[email protected]> Reviewed-by: Shiraz Saleem <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-12-04RDMA/irdma: Fix support for 64k pagesMike Marciniszyn1-1/+1
Virtual QP and CQ require a 4K HW page size but the driver passes PAGE_SIZE to ib_umem_find_best_pgsz() instead. Fix this by using the appropriate 4k value in the bitmap passed to ib_umem_find_best_pgsz(). Fixes: 693a5386eff0 ("RDMA/irdma: Split mr alloc and free into new functions") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Shiraz Saleem <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-12-04RDMA/irdma: Ensure iWarp QP queue memory is OS paged alignedMike Marciniszyn1-0/+5
The SQ is shared for between kernel and used by storing the kernel page pointer and passing that to a kmap_atomic(). This then requires that the alignment is PAGE_SIZE aligned. Fix by adding an iWarp specific alignment check. Fixes: e965ef0e7b2c ("RDMA/irdma: Split QP handler into irdma_reg_user_mr_type_qp") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Shiraz Saleem <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-12-04RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgszMike Marciniszyn1-6/+0
64k pages introduce the situation in this diagram when the HCA 4k page size is being used: +-------------------------------------------+ <--- 64k aligned VA | | | HCA 4k page | | | +-------------------------------------------+ | o | | | | o | | | | o | +-------------------------------------------+ | | | HCA 4k page | | | +-------------------------------------------+ <--- Live HCA page |OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO| <--- offset | | <--- VA | MR data | +-------------------------------------------+ | | | HCA 4k page | | | +-------------------------------------------+ | o | | | | o | | | | o | +-------------------------------------------+ | | | HCA 4k page | | | +-------------------------------------------+ The VA addresses are coming from rdma-core in this diagram can be arbitrary, but for 64k pages, the VA may be offset by some number of HCA 4k pages and followed by some number of HCA 4k pages. The current iterator doesn't account for either the preceding 4k pages or the following 4k pages. Fix the issue by extending the ib_block_iter to contain the number of DMA pages like comment [1] says and by using __sg_advance to start the iterator at the first live HCA page. The changes are contained in a parallel set of iterator start and next functions that are umem aware and specific to umem since there is one user of the rdma_for_each_block() without umem. These two fixes prevents the extra pages before and after the user MR data. Fix the preceding pages by using the __sq_advance field to start at the first 4k page containing MR data. Fix the following pages by saving the number of pgsz blocks in the iterator state and downcounting on each next. This fix allows for the elimination of the small page crutch noted in the Fixes. Fixes: 10c75ccb54e4 ("RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Shiraz Saleem <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-11-28eventfd: simplify eventfd_signal()Christian Brauner1-1/+1
Ever since the eventfd type was introduced back in 2007 in commit e1ad7468c77d ("signal/timer/event: eventfd core") the eventfd_signal() function only ever passed 1 as a value for @n. There's no point in keeping that additional argument. Link: https://lore.kernel.org/r/[email protected] Acked-by: Xu Yilun <[email protected]> Acked-by: Andrew Donnellan <[email protected]> # ocxl Acked-by: Eric Farman <[email protected]> # s390 Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-11-26RDMA/IPoIB: Add tx timeout work to recover queue stop situationJack Wang3-3/+60
As we sometime run into TX timeout from IPoIB, queue seems stopped and can't recover. Diff with Mellanox OFED show Mellanox driver has timeout work to recover in such case. Add TX timeout work/NAPI work to recover such case. Also increase the watchdog_timeo to 10 seconds, so more tolerant to error. Signed-off-by: Jack Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-26RDMA/IPoIB: Fix error code return in ipoib_mcast_joinJack Wang1-0/+1
Return the error code in case of ib_sa_join_multicast fail. Signed-off-by: Jack Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/irdma: Fix UAF in irdma_sc_ccq_get_cqe_info()Shifeng Li1-3/+3
When removing the irdma driver or unplugging its aux device, the ccq queue is released before destorying the cqp_cmpl_wq queue. But in the window, there may still be completion events for wqes. That will cause a UAF in irdma_sc_ccq_get_cqe_info(). [34693.333191] BUG: KASAN: use-after-free in irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333194] Read of size 8 at addr ffff889097f80818 by task kworker/u67:1/26327 [34693.333194] [34693.333199] CPU: 9 PID: 26327 Comm: kworker/u67:1 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [34693.333200] Hardware name: SANGFOR Inspur/NULL, BIOS 4.1.13 08/01/2016 [34693.333211] Workqueue: cqp_cmpl_wq cqp_compl_worker [irdma] [34693.333213] Call Trace: [34693.333220] dump_stack+0x71/0xab [34693.333226] print_address_description+0x6b/0x290 [34693.333238] ? irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333240] kasan_report+0x14a/0x2b0 [34693.333251] irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333264] ? irdma_free_cqp_request+0x151/0x1e0 [irdma] [34693.333274] irdma_cqp_ce_handler+0x1fb/0x3b0 [irdma] [34693.333285] ? irdma_ctrl_init_hw+0x2c20/0x2c20 [irdma] [34693.333290] ? __schedule+0x836/0x1570 [34693.333293] ? strscpy+0x83/0x180 [34693.333296] process_one_work+0x56a/0x11f0 [34693.333298] worker_thread+0x8f/0xf40 [34693.333301] ? __kthread_parkme+0x78/0xf0 [34693.333303] ? rescuer_thread+0xc50/0xc50 [34693.333305] kthread+0x2a0/0x390 [34693.333308] ? kthread_destroy_worker+0x90/0x90 [34693.333310] ret_from_fork+0x1f/0x40 Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Shifeng Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: Shiraz Saleem <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/bnxt_re: Correct module description stringKalesh AP1-1/+1
The word "Driver" is repeated twice in the "modinfo bnxt_re" output description. Fix it. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs: Use %pe to print errorsSupriti Singh1-2/+2
While printing error, replace %ld by %pe. %pe prints a string whereas %ld would print an error code. Signed-off-by: Supriti Singh <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-clt: Use %pe to print errorsSupriti Singh1-6/+4
While printing error, replace %ld by %pe. %pe prints a string whereas %ld would print an error code. Signed-off-by: Supriti Singh <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-clt: Remove the warnings for req in_use checkJack Wang1-1/+1
As we chain the WR during write request: memory registration, rdma write, local invalidate, if only the last WR fail to send due to send queue overrun, the server can send back the reply, while client mark the req->in_use to false in case of error in rtrs_clt_req when error out from rtrs_post_rdma_write_sg. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-clt: Fix the max_send_wr settingJack Wang1-1/+1
For each write request, we need Request, Response Memory Registration, Local Invalidate. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-srv: Destroy path files after making sure no IOs in-flightMd Haris Iqbal1-1/+2
Destroying path files may lead to the freeing of rdma_stats. This creates the following race. An IO is in-flight, or has just passed the session state check in process_read/process_write. The close_work gets triggered and the function rtrs_srv_close_work() starts and does destroy path which frees the rdma_stats. After this the function process_read/process_write resumes and tries to update the stats through the function rtrs_srv_update_rdma_stats This commit solves the problem by moving the destroy path function to a later point. This point makes sure any inflights are completed. This is done by qp drain, and waiting for all in-flights through ops_id. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Santosh Kumar Pradhan <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-srv: Free srv_mr iu only when always_invalidate is trueMd Haris Iqbal1-1/+4
Since srv_mr->iu is allocated and used only when always_invalidate is true, free it only when always_invalidate is true. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-srv: Check return values while processing info requestMd Haris Iqbal1-6/+18
While processing info request, it could so happen that the srv_path goes to CLOSING state, cause of any of the error events from RDMA. That state change should be picked up while trying to change the state in process_info_req, by checking the return value. In case the state change call in process_info_req fails, we fail the processing. We should also check the return value for rtrs_srv_path_up, since it sends a link event to the client above, and the client can fail for any reason. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-clt: Start hb after path_upJack Wang1-2/+1
If we start hb too early, it will confuse server side to close the session. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-22RDMA/rtrs-srv: Do not unconditionally enable irqJack Wang1-2/+3
When IO is completed, rtrs can be called in softirq context, unconditionally enabling irq could cause panic. To be on safe side, use spin_lock_irqsave and spin_unlock_irqrestore instread. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Florian-Ewald Mueller <[email protected]> Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-19RDMA/rtrs-clt: Add warning logs for RDMA eventsMd Haris Iqbal1-0/+2
Some RDMA CM events can trigger connection close or recovery for a certain rtrs_clt_path. Such close/recovery triggers should happen after an appropriate log message, since they can lead to IO failures. Signed-off-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Grzegorz Prajsner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>
2023-11-19RDMA/hns: Support SW stats with debugfsJunxian Huang12-43/+220
Support SW stats with debugfs. Query output: $ cat /sys/kernel/debug/hns_roce/hns_0/sw_stat/sw_stat aeqe --- 3341 ceqe --- 0 cmds --- 6764 cmds_err --- 0 posted_mbx --- 3344 polled_mbx --- 3 mbx_event --- 3341 qp_create_err --- 0 qp_modify_err --- 0 cq_create_err --- 0 cq_modify_err --- 0 srq_create_err --- 0 srq_modify_err --- 0 xrcd_alloc_err --- 0 mr_reg_err --- 0 mr_rereg_err --- 0 ah_create_err --- 0 mmap_err --- 0 uctx_alloc_err --- 0 Signed-off-by: Junxian Huang <[email protected]> Signed-off-by: Chengchang Tang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]>