aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)AuthorFilesLines
2020-07-03RDMA/mlx5: Introduce ODP prefetch counterMaor Gottlieb2-9/+13
For debugging purpose it will be easier to understand if prefetch works okay if it has its own counter. Introduce ODP prefetch counter and count per MR the total number of prefetched pages. In addition remove comment which is not relevant anymore and anyway not in the correct place. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-02IB/sa: Resolv use-after-free in ib_nl_make_request()Divya Indi1-21/+17
There is a race condition where ib_nl_make_request() inserts the request data into the linked list but the timer in ib_nl_request_timeout() can see it and destroy it before ib_nl_send_msg() is done touching it. This could happen, for instance, if there is a long delay allocating memory during nlmsg_new() This causes a use-after-free in the send_mad() thread: [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core] [ <ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa] [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm] [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm] [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 [rds_rdma] [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850 [rds_rdma] [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma] [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm] [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr] [<ffffffff810a02f9>] process_one_work+0x169/0x4a0 [<ffffffff810a0b2b>] worker_thread+0x5b/0x560 [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50 [<ffffffff810a68fb>] kthread+0xcb/0xf0 [<ffffffff816ec49a>] ? __schedule+0x24a/0x810 [<ffffffff816ec49a>] ? __schedule+0x24a/0x810 [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180 [<ffffffff816f25a7>] ret_from_fork+0x47/0x90 [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180 The ownership rule is once the request is on the list, ownership transfers to the list and the local thread can't touch it any more, just like for the normal MAD case in send_mad(). Thus, instead of adding before send and then trying to delete after on errors, move the entire thing under the spinlock so that the send and update of the lists are atomic to the conurrent threads. Lightly reoganize things so spinlock safe memory allocations are done in the final NL send path and the rest of the setup work is done before and outside the lock. Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Divya Indi <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-02RDMA/core: Fix bogus WARN_ON during ib_unregister_device_queued()Jason Gunthorpe1-3/+8
ib_unregister_device_queued() can only be used by drivers using the new dealloc_device callback flow, and it has a safety WARN_ON to ensure drivers are using it properly. However, if unregister and register are raced there is a special destruction path that maintains the uniform error handling semantic of 'caller does ib_dealloc_device() on failure'. This requires disabling the dealloc_device callback which triggers the WARN_ON. Instead of using NULL to disable the callback use a special function pointer so the WARN_ON does not trigger. Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration") Link: https://lore.kernel.org/r/[email protected] Reported-by: [email protected] Suggested-by: Hillf Danton <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-02IB/hfi1: Do not destroy link_wq when the device is shut downKaike Wan1-5/+5
The workqueue link_wq should only be destroyed when the hfi1 driver is unloaded, not when the device is shut down. Fixes: 71d47008ca1b ("IB/hfi1: Create workqueue for link events") Link: https://lore.kernel.org/r/[email protected] Cc: <[email protected]> Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-02IB/hfi1: Do not destroy hfi1_wq when the device is shut downKaike Wan3-6/+31
The workqueue hfi1_wq is destroyed in function shutdown_device(), which is called by either shutdown_one() or remove_one(). The function shutdown_one() is called when the kernel is rebooted while remove_one() is called when the hfi1 driver is unloaded. When the kernel is rebooted, hfi1_wq is destroyed while all qps are still active, leading to a kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000102 IP: [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0 PGD 0 Oops: 0000 [#1] SMP Modules linked in: dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) ib_isert iscsi_target_mod target_core_mod ib_ucm mlx4_ib iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm rpcrdma sunrpc irqbypass crc32_pclmul ghash_clmulni_intel rdma_ucm aesni_intel ib_uverbs lrw gf128mul opa_vnic glue_helper ablk_helper ib_iser cryptd ib_umad rdma_cm iw_cm ses enclosure libiscsi scsi_transport_sas pcspkr joydev ib_ipoib(OE) scsi_transport_iscsi ib_cm sg ipmi_ssif mei_me lpc_ich i2c_i801 mei ioatdma ipmi_si dm_multipath ipmi_devintf ipmi_msghandler wmi acpi_pad acpi_power_meter hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm hfi1(OE) crct10dif_pclmul crct10dif_common crc32c_intel drm ahci mlx4_core libahci rdmavt(OE) igb megaraid_sas ib_core libata drm_panel_orientation_quirks ptp pps_core devlink dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod CPU: 19 PID: 0 Comm: swapper/19 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Phegda X2226A/S2600CW, BIOS SE5C610.86B.01.01.0024.021320181901 02/13/2018 task: ffff8a799ba0d140 ti: ffff8a799bad8000 task.ti: ffff8a799bad8000 RIP: 0010:[<ffffffff94cb7b02>] [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0 RSP: 0018:ffff8a90dde43d80 EFLAGS: 00010046 RAX: 0000000000000082 RBX: 0000000000000086 RCX: 0000000000000000 RDX: ffff8a90b924fcb8 RSI: 0000000000000000 RDI: 000000000000001b RBP: ffff8a90dde43db8 R08: ffff8a799ba0d6d8 R09: ffff8a90dde53900 R10: 0000000000000002 R11: ffff8a90dde43de8 R12: ffff8a90b924fcb8 R13: 000000000000001b R14: 0000000000000000 R15: ffff8a90d2890000 FS: 0000000000000000(0000) GS:ffff8a90dde40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000102 CR3: 0000001a70410000 CR4: 00000000001607e0 Call Trace: [<ffffffff94cb8105>] queue_work_on+0x45/0x50 [<ffffffffc03f781e>] _hfi1_schedule_send+0x6e/0xc0 [hfi1] [<ffffffffc03f78a2>] hfi1_schedule_send+0x32/0x70 [hfi1] [<ffffffffc02cf2d9>] rvt_rc_timeout+0xe9/0x130 [rdmavt] [<ffffffff94ce563a>] ? trigger_load_balance+0x6a/0x280 [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt] [<ffffffff94ca7f58>] call_timer_fn+0x38/0x110 [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt] [<ffffffff94caa3bd>] run_timer_softirq+0x24d/0x300 [<ffffffff94ca0f05>] __do_softirq+0xf5/0x280 [<ffffffff9537832c>] call_softirq+0x1c/0x30 [<ffffffff94c2e675>] do_softirq+0x65/0xa0 [<ffffffff94ca1285>] irq_exit+0x105/0x110 [<ffffffff953796c8>] smp_apic_timer_interrupt+0x48/0x60 [<ffffffff95375df2>] apic_timer_interrupt+0x162/0x170 <EOI> [<ffffffff951adfb7>] ? cpuidle_enter_state+0x57/0xd0 [<ffffffff951ae10e>] cpuidle_idle_call+0xde/0x230 [<ffffffff94c366de>] arch_cpu_idle+0xe/0xc0 [<ffffffff94cfc3ba>] cpu_startup_entry+0x14a/0x1e0 [<ffffffff94c57db7>] start_secondary+0x1f7/0x270 [<ffffffff94c000d5>] start_cpu+0x5/0x14 The solution is to destroy the workqueue only when the hfi1 driver is unloaded, not when the device is shut down. In addition, when the device is shut down, no more work should be scheduled on the workqueues and the workqueues are flushed. Fixes: 8d3e71136a08 ("IB/{hfi1, qib}: Add handling of kernel restart") Link: https://lore.kernel.org/r/[email protected] Cc: <[email protected]> Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-02RDMA/mlx5: Fix legacy IPoIB QP initializationLeon Romanovsky1-0/+4
Legacy IPoIB sets IB_QP_CREATE_NETIF_QP QP create flag and because mlx5 doesn't use this flag, the process_create_flags() failed to create IPoIB QPs. Fixes: 2978975ce7f1 ("RDMA/mlx5: Process create QP flags in one place") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-02IB/hfi1: Add explicit cast OPA_MTU_8192 to 'enum ib_mtu'Nathan Chancellor1-1/+1
Clang warns: drivers/infiniband/hw/hfi1/qp.c:198:9: warning: implicit conversion from enumeration type 'enum opa_mtu' to different enumeration type 'enum ib_mtu' [-Wenum-conversion] mtu = OPA_MTU_8192; ~ ^~~~~~~~~~~~ enum opa_mtu extends enum ib_mtu. There are typically two ways to deal with this: * Remove the expected types and just use 'int' for all parameters and types. * Explicitly cast the enums between each other. This driver chooses to do the later so do the same thing here. Fixes: 6d72344cf6c4 ("IB/ipoib: Increase ipoib Datagram mode MTU's upper limit") Link: https://lore.kernel.org/r/[email protected] Link: https://github.com/ClangBuiltLinux/linux/issues/1062 Link: https://lore.kernel.org/linux-rdma/20200527040350.GA3118979@ubuntu-s3-xlarge-x86/ Signed-off-by: Nathan Chancellor <[email protected]> Acked-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-02RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah()Jason Gunthorpe2-36/+31
ipoib_mcast_carrier_on_task() insanely open codes a rtnl_lock() such that the only time flush_workqueue() can be called is if it also clears IPOIB_FLAG_OPER_UP. Thus the flush inside ipoib_flush_ah() will deadlock if it gets unlucky enough, and lockdep doesn't help us to find it early: CPU0 CPU1 CPU2 __ipoib_ib_dev_flush() down_read(vlan_rwsem) ipoib_vlan_add() rtnl_trylock() down_write(vlan_rwsem) ipoib_mcast_carrier_on_task() while (!rtnl_trylock()) msleep(20); ipoib_flush_ah() flush_workqueue(priv->wq) Clean up the ah_reaper related functions and lifecycle to make sense: - Start/Stop of the reaper should only be done in open/stop NDOs, not in any other places - cancel and flush of the reaper should only happen in the stop NDO. cancel is only functional when combined with IPOIB_STOP_REAPER. - Non-stop places were flushing the AH's just need to flush out dead AH's synchronously and ignore the background task completely. It is fully locked and harmless to leave running. Which ultimately fixes the ABBA deadlock by removing the unnecessary flush_workqueue() from the problematic place under the vlan_rwsem. Fixes: efc82eeeae4e ("IB/ipoib: No longer use flush as a parameter") Link: https://lore.kernel.org/r/[email protected] Reported-by: Kamal Heib <[email protected]> Tested-by: Kamal Heib <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-30IB/hfi1: Convert PCIBIOS_* errors to generic -E* errorsBolarinwa Olayemi Saheed1-7/+15
pcie_speeds() and restore_pci_variables() returns PCIBIOS_ error codes from PCIe capability accessors. PCIBIOS_ error codes have positive values. Passing on these values is inconsistent with functions which return only a negative value on failure. Before passing on the return value of PCIe capability accessors, call pcibios_err_to_errno() to convert any positive PCIBIOS_ error codes to negative generic error values. Link: https://lore.kernel.org/r/[email protected] Suggested-by: Bjorn Helgaas <[email protected]> Signed-off-by: Bolarinwa Olayemi Saheed <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-26Merge branch '40GbE' of ↵David S. Miller2-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2020-06-25 This series contains updates to i40e driver and removes the individual driver versions from all of the Intel wired LAN drivers. Shiraz moves the client header so that it can easily be shared between the i40e LAN driver and i40iw RDMA driver. Jesse cleans up the unused defines, since they are just dead weight. Alek reduces the unreasonably long wait time for a PF reset after reboot by using jiffies to limit the maximum wait time for the PF reset to succeed. Added additional logging to let the user know when the driver transitions into recovery mode. Adds new device support for our 5 Gbps NICs. Todd adds a check to see if MFS is set after warm reboot and notifies the user when MFS is set to anything lower than the default value. Arkadiusz fixes a possible race condition, where were holding a spin-lock while in atomic context. v2: removed code comments that were no longer applicable in patch 2 of the series. Also removed 'inline' from patch 4 and patch 8 of the series. Also re-arranged code to be able to remove the forward function declarations. Dropped patch 9 of the series, while the author works on cleaning up the commit message. v3: Updated patch 8 description to answer Jakub's questions ==================== Signed-off-by: David S. Miller <[email protected]>
2020-06-25i40e: Move client header locationShiraz Saleem2-2/+1
Move i40e_client.h to include/linux/net/intel/* since its shared between i40iw and i40e. Signed-off-by: Shiraz Saleem <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2020-06-24RDMA/core: Delete not-used create RWQ table functionLeon Romanovsky1-39/+0
The RWQ table is used for RSS uverbs and not in used for the kernel consumers, delete ib_create_rwq_ind_table() routine that is not called at all. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/mad: Delete RMPP_STATE_CANCELING stateShay Drory1-13/+4
The cancel_delayed_work can be called under lock since it doesn't sleep. This makes the RMPP_STATE_CANCELING state not needed anymore, remove it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Shay Drory <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/mad: Change atomics to refcount APIShay Drory3-14/+14
The refcount API provides better safety than atomics API. Therefore, change atomic functions to refcount functions. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/mad: Issue complete whenever decrements agent refcountShay Drory1-7/+7
Replace calls of atomic_dec() to mad_agent_priv->refcount with calls to deref_mad_agent() in order to issue complete. Most likely the refcount is > 1 at these points, but it is difficult to prove. Performance is not important on these paths, so be obviously correct. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/hfi1: Add atomic triggered sleep/wakeupMike Marciniszyn2-20/+53
When running iperf in a two host configuration the following trace can occur: [ 319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out The issue happens because the current implementation relies on the netif txq being stopped to control the flushing of the tx list. There are two resources that the transmit logic can wait on and stop the txq: - SDMA descriptors - Ring space to hold completions The ring space is tested on the sending side and relieved when the ring is consumed in the napi tx reaping. Unfortunately, that reaping can run conncurrently with the workqueue flushing of the txlist. If the txq is started just before the workitem executes, the txlist will never be flushed, leading to the txq being stuck. Fix by: - Adding sleep/wakeup wrappers * Use an atomic to control the call to the netif routines inside the wrappers - Use another atomic to record ring space exhaustion * Only wakeup when the a ring space exhaustion has happened and it relieved Add additional wrappers to clarify the ring space resource handling. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kaike Wan <[email protected]> Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/hfi1: Correct -EBUSY handling in tx codeMike Marciniszyn1-17/+18
The current code mishandles -EBUSY in two ways: - The flow change doesn't test the return from the flush and runs on to process the current packet racing with the wakeup processing - The -EBUSY handling for a single packet inserts the tx into the txlist after the submit call, racing with the same wakeup processing Fix the first by dropping the skb and returning NETDEV_TX_OK. Fix the second by insuring the the list entry within the txreq is inited when allocated. This enables the sleep routine to detect that the txreq has used the non-list api and queue the packet to the txlist. Both flaws can lead to having the flushing thread executing in causing two threads to manipulate the txlist. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Kaike Wan <[email protected]> Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/hfi1: Fix module use count flaw due to leftover module put callsDennis Dalessandro1-17/+2
When the try_module_get calls were removed from opening and closing of the i2c debugfs file, the corresponding module_put calls were missed. This results in an inaccurate module use count that requires a power cycle to fix. Fixes: 09fbca8e6240 ("IB/hfi1: No need to use try_module_get for debugfs") Link: https://lore.kernel.org/r/[email protected] Cc: <[email protected]> Reviewed-by: Kaike Wan <[email protected]> Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24IB/hfi1: Restore kfree in dummy_netdev cleanupDennis Dalessandro1-1/+1
We need to do some rework on the dummy netdev. Calling the free_netdev() would normally make sense, and that will be addressed in an upcoming patch. For now just revert the behavior to what it was before keeping the unused variable removal part of the patch. The dd->dumm_netdev is mainly used for packet receiving through alloc_netdev_mqs() for typical net devices. A a result, it should be freed with kfree instead of free_netdev() that leads to a crash when unloading the hfi1 module: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000855b54067 P4D 8000000855b54067 PUD 84a4f5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 73 PID: 10299 Comm: modprobe Not tainted 5.6.0-rc5+ #1 Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 RIP: 0010:__hw_addr_flush+0x12/0x80 Code: 40 00 48 83 c4 08 4c 89 e7 5b 5d 41 5c e9 76 77 18 00 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 1f 48 39 df <48> 8b 2b 75 08 eb 4a 48 89 eb 48 89 c5 48 89 df e8 99 bf d0 ff 84 RSP: 0018:ffffb40e08783db8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 RDX: ffffb40e00000000 RSI: 0000000000000246 RDI: ffff88ab13662298 RBP: ffff88ab13662000 R08: 0000000000001549 R09: 0000000000001549 R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88ab13662298 R13: ffff88ab1b259e20 R14: ffff88ab1b259e42 R15: 0000000000000000 FS: 00007fb39b534740(0000) GS:ffff88b31f940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000084d3ea004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dev_addr_flush+0x15/0x30 free_netdev+0x7e/0x130 hfi1_netdev_free+0x59/0x70 [hfi1] remove_one+0x65/0x110 [hfi1] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0xec/0x1b0 driver_detach+0x46/0x90 bus_remove_driver+0x58/0xd0 pci_unregister_driver+0x26/0xa0 hfi1_mod_cleanup+0xc/0xd54 [hfi1] __x64_sys_delete_module+0x16c/0x260 ? exit_to_usermode_loop+0xa4/0xc0 do_syscall_64+0x5b/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 193ba03141bb ("IB/hfi1: Use free_netdev() in hfi1_netdev_free()") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24RDMA/ipoib: Return void from ipoib_ib_dev_stop()Kamal Heib2-4/+2
The return value from ipoib_ib_dev_stop() is always 0 - change it to be void. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24Merge branch 'raw_dumps' into rdma.git for-nextJason Gunthorpe11-161/+244
Maor Gottlieb says: ==================== The following series adds support to get the RDMA resource data in RAW format. The main motivation for doing this is to enable vendors to return the entire QP/CQ/MR data without a need from the vendor to set each field separately. ==================== Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies * branch 'raw_dumps': RDMA/mlx5: Add support to get MR resource in RAW format RDMA/mlx5: Add support to get CQ resource in RAW format RDMA/mlx5: Add support to get QP resource in RAW format RDMA: Add support to dump resource tracker in RAW format RDMA: Add dedicated CM_ID resource tracker function RDMA: Add dedicated QP resource tracker function RDMA: Add a dedicated CQ resource tracker function RDMA: Add dedicated MR resource tracker function RDMA/core: Don't call fill_res_entry for PD net/mlx5: Add support in query QP, CQ and MKEY segments net/mlx5: Export resource dump interface Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24RDMA/mlx5: Add support to get MR resource in RAW formatMaor Gottlieb3-0/+10
Add support to get MR (mkey) resource dump in RAW format. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24RDMA/mlx5: Add support to get CQ resource in RAW formatMaor Gottlieb3-0/+10
Add support to get CQ resource dump in RAW format. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24RDMA/mlx5: Add support to get QP resource in RAW formatMaor Gottlieb3-0/+79
Add a generic function to use the resource dump mechanism to get the QP resource data. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24RDMA: Add support to dump resource tracker in RAW formatMaor Gottlieb2-62/+121
Add support to get resource dump in raw format. It enable drivers to return the entire device specific QP/CQ/MR context without a need from the driver to set each field separately. The raw query returns only the device specific data, general data is still returned by using the existing queries. Example: $ rdma res show mr dev mlx5_1 mrn 2 -r -j [{"ifindex":7,"ifname":"mlx5_1", "data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA: Add dedicated CM_ID resource tracker functionMaor Gottlieb5-30/+7
In order to avoid double multiplexing of the resource when it is a cm id, add a dedicated callback function. In addition remove fill_res_entry which is not used anymore. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA: Add dedicated QP resource tracker functionMaor Gottlieb4-7/+5
In order to avoid double multiplexing of the resource when it is a QP, add a dedicated callback function. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA: Add a dedicated CQ resource tracker functionMaor Gottlieb8-22/+11
In order to avoid double multiplexing of the resource when it is a CQ, add a dedicated callback function. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA: Add dedicated MR resource tracker functionMaor Gottlieb8-49/+17
In order to avoid double multiplexing of the resource when it is a MR, add a dedicated callback function. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA/core: Don't call fill_res_entry for PDMaor Gottlieb1-8/+1
None of the drivers implement it, remove it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-22IB/mad: Fix use after free when destroying MAD agentShay Drory1-1/+1
Currently, when RMPP MADs are processed while the MAD agent is destroyed, it could result in use after free of rmpp_recv, as decribed below: cpu-0 cpu-1 ----- ----- ib_mad_recv_done() ib_mad_complete_recv() ib_process_rmpp_recv_wc() unregister_mad_agent() ib_cancel_rmpp_recvs() cancel_delayed_work() process_rmpp_data() start_rmpp() queue_delayed_work(rmpp_recv->cleanup_work) destroy_rmpp_recv() free_rmpp_recv() cleanup_work()[1] spin_lock_irqsave(&rmpp_recv->agent->lock) <-- use after free [1] cleanup_work() == recv_cleanup_handler Fix it by waiting for the MAD agent reference count becoming zero before calling to ib_cancel_rmpp_recvs(). Fixes: 9a41e38a467c ("IB/mad: Use IDR for agent IDs") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-22RDMA/rxe: Remove unused rxe_mem_map_pagesKamal Heib2-47/+0
This function is not in use - delete it. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-22RDMA/hfi1: Remove hfi1_create_qp declarationKamal Heib1-14/+0
The function isn't implemented - delete the declaration. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-22RDMA/ipoib: Return void from ipoib_mcast_stop_thread()Kamal Heib2-4/+2
The return value from ipoib_mcast_stop_thread() is always 0 - change it to be void. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-22RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udataLeon Romanovsky1-1/+1
Don't deref udata if it is NULL BUG: kernel NULL pointer dereference, address: 0000000000000030 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 SMP PTI CPU: 2 PID: 1592 Comm: python3 Not tainted 5.7.0-rc6+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:create_qp+0x39e/0xae0 [mlx5_ib] Code: c0 0d 00 00 bf 10 01 00 00 e8 be a9 e4 e0 48 85 c0 49 89 c2 0f 84 0c 07 00 00 41 8b 85 74 63 01 00 0f c8 a9 00 00 00 10 74 0a <41> 8b 46 30 0f c8 41 89 42 14 41 8b 52 18 41 0f b6 4a 1c 0f ca 89 RSP: 0018:ffffc9000067f8b0 EFLAGS: 00010206 RAX: 0000000010170000 RBX: ffff888441313000 RCX: 0000000000000000 RDX: 0000000000000200 RSI: 0000000000000000 RDI: ffff88845b1d4400 RBP: ffffc9000067fa60 R08: 0000000000000200 R09: ffff88845b1d4200 R10: ffff88845b1d4200 R11: ffff888441313000 R12: ffffc9000067f950 R13: ffff88846ac00140 R14: 0000000000000000 R15: ffff88846c2bc000 FS: 00007faa1a3c0540(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000030 CR3: 0000000446dca003 CR4: 0000000000760ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 mlx5_ib_create_qp+0x897/0xfa0 [mlx5_ib] ib_create_qp+0x9e/0x300 [ib_core] create_qp+0x92d/0xb20 [ib_uverbs] ? ib_uverbs_cq_event_handler+0x30/0x30 [ib_uverbs] ? release_resource+0x30/0x30 ib_uverbs_create_qp+0xc4/0xe0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc8/0xf0 [ib_uverbs] ib_uverbs_run_method+0x223/0x770 [ib_uverbs] ? track_pfn_remap+0xa7/0x100 ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs] ? remap_pfn_range+0x358/0x490 ib_uverbs_cmd_verbs.isra.6+0x19b/0x370 [ib_uverbs] ? rdma_umap_priv_init+0x82/0xe0 [ib_core] ? vm_mmap_pgoff+0xec/0x120 ib_uverbs_ioctl+0xc0/0x120 [ib_uverbs] ksys_ioctl+0x92/0xb0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x48/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: e383085c2425 ("RDMA/mlx5: Set ECE options during QP create") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-22RDMA/counter: Query a counter before releaseMark Zhang1-1/+3
Query a dynamically-allocated counter before release it, to update it's hwcounters and log all of them into history data. Otherwise all values of these hwcounters will be lost. Fixes: f34a55e497e8 ("RDMA/core: Get sum value of all counters when perform a sysfs stat read") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-22RDMA: Correct trivial kernel-doc inconsistenciesColton Lewis5-4/+30
Silence documentation build warnings by correcting kernel-doc comments. ./drivers/infiniband/core/verbs.c:1004: warning: Function parameter or member 'uobject' not described in 'ib_create_srq_user' ./drivers/infiniband/core/verbs.c:1004: warning: Function parameter or member 'udata' not described in 'ib_create_srq_user' ./drivers/infiniband/core/umem_odp.c:161: warning: Function parameter or member 'ops' not described in 'ib_umem_odp_alloc_child' ./drivers/infiniband/core/umem_odp.c:225: warning: Function parameter or member 'ops' not described in 'ib_umem_odp_get' ./drivers/infiniband/sw/rdmavt/ah.c:104: warning: Excess function parameter 'ah_attr' description in 'rvt_create_ah' ./drivers/infiniband/sw/rdmavt/ah.c:104: warning: Excess function parameter 'create_flags' description in 'rvt_create_ah' ./drivers/infiniband/ulp/iser/iscsi_iser.h:363: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc' ./drivers/infiniband/ulp/iser/iscsi_iser.h:377: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd0' not described in 'opa_vesw_info' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd1' not described in 'opa_vesw_info' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd2' not described in 'opa_vesw_info' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd3' not described in 'opa_vesw_info' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd4' not described in 'opa_vesw_info' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd0' not described in 'opa_per_veswport_info' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd1' not described in 'opa_per_veswport_info' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd2' not described in 'opa_per_veswport_info' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd3' not described in 'opa_per_veswport_info' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:342: warning: Function parameter or member 'reserved' not described in 'opa_veswport_summary_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd0' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd1' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd2' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd3' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd4' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd5' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd6' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd7' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd8' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd9' not described in 'opa_veswport_error_counters' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:460: warning: Function parameter or member 'reserved' not described in 'opa_vnic_vema_mad' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:485: warning: Function parameter or member 'reserved' not described in 'opa_vnic_notice_attr' ./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:500: warning: Function parameter or member 'reserved' not described in 'opa_vnic_vema_mad_trap' Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Colton Lewis <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-19RDMA/mad: Fix possible memory leak in ib_mad_post_receive_mads()Fan Guo1-0/+1
If ib_dma_mapping_error() returns non-zero value, ib_mad_post_receive_mads() will jump out of loops and return -ENOMEM without freeing mad_priv. Fix this memory-leak problem by freeing mad_priv in this case. Fixes: 2c34e68f4261 ("IB/mad: Check and handle potential DMA mapping errors") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Fan Guo <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18IB/srpt: Remove WARN_ON from srpt_cm_req_recvJing Xiangfeng1-3/+0
The callers pass the pointer '&req' or 'private_data' to srpt_cm_req_recv(), and 'private_data' is initialized in srp_send_req(). 'sdev' is allocated and stored in srpt_add_one(). It's easy to show that sdev and req are always valid. So we remove unnecessary WARN_ON. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jing Xiangfeng <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/mlx5: Fix integrity enabled QP creationMax Gurtovoy1-0/+3
create_flags checks was refactored and broke the creation on integrity enabled QPs and actually broke the NVMe/RDMA and iSER ULP's when using mlx5 driven devices. Fixes: 2978975ce7f1 ("RDMA/mlx5: Process create QP flags in one place") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPsLeon Romanovsky1-9/+1
Like any other QP type, rely on FW for the RAW_PACKET QPs to decide if ECE is supported or not. This fixes an inability to create RAW_PACKET QPs with latest rdma-core with the ECE support. Fixes: e383085c2425 ("RDMA/mlx5: Set ECE options during QP create") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/mlx5: Fix remote gid value in query QPMaor Gottlieb1-2/+1
Remote gid is not copied to the right address. Fix it by using rdma_ah_set_dgid_raw to copy the remote gid value from the QP context on query QP. Fixes: 70bd7fb87625 ("RDMA/mlx5: Remove manually crafted QP context the query call") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/mlx5: Don't access ib_qp fields in internal destroy QP pathLeon Romanovsky1-10/+18
destroy_qp_common is called for flows where QP is already created by HW. While it is called from IB/core, the ibqp.* fields will be fully initialized, but it is not the case if this function is called during QP creation. Don't rely on ibqp fields as much as possible and initialize send_cq/recv_cq as temporal solution till all drivers will be converted to IB/core QP allocation scheme. refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 5372 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 5372 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: mlx5_core_put_rsc+0x70/0x80 destroy_resource_common+0x8e/0xb0 mlx5_core_destroy_qp+0xaf/0x1d0 mlx5_ib_destroy_qp+0xeb0/0x1460 ib_destroy_qp_user+0x2d5/0x7d0 create_qp+0xed3/0x2130 ib_uverbs_create_qp+0x13e/0x190 ? ib_uverbs_ex_create_qp ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 08d53976609a ("RDMA/mlx5: Copy response to the user in one place") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/core: Check that type_attrs is not NULL prior accessLeon Romanovsky1-15/+21
In disassociate flow, the type_attrs is set to be NULL, which is in an implicit way is checked in alloc_uobj() by "if (!attrs->context)". Change the logic to rely on that check, to be consistent with other alloc_uobj() places that will fix the following kernel splat. BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 2743 Comm: python3 Not tainted 5.7.0-rc6-for-upstream-perf-2020-05-23_19-04-38-5 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:alloc_begin_fd_uobject+0x18/0xf0 [ib_uverbs] Code: 89 43 48 eb 97 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 f5 41 54 55 48 89 fd 53 48 83 ec 08 48 8b 1f <48> 8b 43 18 48 8b 80 80 00 00 00 48 3d 20 10 33 a0 74 1c 48 3d 30 RSP: 0018:ffffc90001127b70 EFLAGS: 00010282 RAX: ffffffffa0339fe0 RBX: 0000000000000000 RCX: 8000000000000007 RDX: fffffffffffffffb RSI: ffffc90001127d28 RDI: ffff88843fe1f600 RBP: ffff88843fe1f600 R08: ffff888461eb06d8 R09: ffff888461eb06f8 R10: ffff888461eb0700 R11: 0000000000000000 R12: ffff88846a5f6450 R13: ffffc90001127d28 R14: ffff88845d7d6ea0 R15: ffffc90001127cb8 FS: 00007f469bff1540(0000) GS:ffff88846f980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 0000000450018003 CR4: 0000000000760ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? xa_store+0x28/0x40 rdma_alloc_begin_uobject+0x4f/0x90 [ib_uverbs] ib_uverbs_create_comp_channel+0x87/0xf0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs] ib_uverbs_cmd_verbs.isra.8+0x96d/0xae0 [ib_uverbs] ? get_page_from_freelist+0x3bb/0xf70 ? _copy_to_user+0x22/0x30 ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs] ? __wake_up_common_lock+0x87/0xc0 ib_uverbs_ioctl+0xbc/0x130 [ib_uverbs] ksys_ioctl+0x83/0xc0 ? ksys_write+0x55/0xd0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x48/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f469ac43267 Fixes: 849e149063bd ("RDMA/core: Do not allow alloc_commit to fail") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/hns: Fix an cmd queue issue when resettingYangyang Li1-1/+1
If a IMP reset caused by some hardware errors and hns RoCE driver reset occurred at the same time, there is a possiblity that the IMP will stop dealing with command and users can't use the hardware. The logs are as follows: hns3 0000:fd:00.1: cleaned 0, need to clean 1 hns3 0000:fd:00.1: firmware version query failed -11 hns3 0000:fd:00.1: Cmd queue init failed hns3 0000:fd:00.1: Upgrade reset level hns3 0000:fd:00.1: global reset interrupt The hns NIC driver divides the reset process into 3 status: initialization, hardware resetting and softwaring restting. RoCE driver gets reset status by interfaces provided by NIC driver and commands will not be sent to the IMP if the driver is in any above status. The main reason for this issue is that there is a time gap between status 1 and 2, if the RoCE driver sends commands to the IMP during this gap, the IMP will stop working because it is not ready. To eliminate the time gap, the hns NIC driver has added a new interface in commit a4de02287abb9 ("net: hns3: provide .get_cmdq_stat interface for the client"), so RoCE driver can ensure that no commands will be sent during resetting. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yangyang Li <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/hns: Fix a calltrace when registering MR from userspaceYangyang Li4-14/+17
ibmr.device is assigned after MR is successfully registered, but both write_mtpt() and frmr_write_mtpt() accesses it during the mr registration process, which may cause the following error when trying to register MR in userspace and pbl_hop_num is set to 0. pc : hns_roce_mtr_find+0xa0/0x200 [hns_roce] lr : set_mtpt_pbl+0x54/0x118 [hns_roce_hw_v2] sp : ffff00023e73ba20 x29: ffff00023e73ba20 x28: ffff00023e73bad8 x27: 0000000000000000 x26: 0000000000000000 x25: 0000000000000002 x24: 0000000000000000 x23: ffff00023e73bad0 x22: 0000000000000000 x21: ffff0000094d9000 x20: 0000000000000000 x19: ffff8020a6bdb2c0 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0140000000000000 x12: 0040000000000041 x11: ffff000240000000 x10: 0000000000001000 x9 : 0000000000000000 x8 : ffff802fb7558480 x7 : ffff802fb7558480 x6 : 000000000003483d x5 : ffff00023e73bad0 x4 : 0000000000000002 x3 : ffff00023e73bad8 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000094d9708 Call trace: hns_roce_mtr_find+0xa0/0x200 [hns_roce] set_mtpt_pbl+0x54/0x118 [hns_roce_hw_v2] hns_roce_v2_write_mtpt+0x14c/0x168 [hns_roce_hw_v2] hns_roce_mr_enable+0x6c/0x148 [hns_roce] hns_roce_reg_user_mr+0xd8/0x130 [hns_roce] ib_uverbs_reg_mr+0x14c/0x2e0 [ib_uverbs] ib_uverbs_write+0x27c/0x3e8 [ib_uverbs] __vfs_write+0x60/0x190 vfs_write+0xac/0x1c0 ksys_write+0x6c/0xd8 __arm64_sys_write+0x24/0x30 el0_svc_common+0x78/0x130 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc Solve above issue by adding a pointer of structure hns_roce_dev as a parameter of write_mtpt() and frmr_write_mtpt(), so that both of these functions can access it before finishing MR's registration. Fixes: 9b2cf76c9f05 ("RDMA/hns: Optimize PBL buffer allocation process") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yangyang Li <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshakeLeon Romanovsky1-0/+8
Missed steps during ECE handshake left userspace application with less options for the ECE handshake. Pass ECE options in the additional transitions. Fixes: 50aec2c3135e ("RDMA/mlx5: Return ECE data after modify QP") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/cma: Protect bind_list and listen_list while finding matching cm idMark Zhang1-0/+18
The bind_list and listen_list must be accessed under a lock, add the missing locking around the access in cm_ib_id_from_event() In addition add lockdep asserts to make it clearer what the locking semantic is here. general protection fault: 0000 [#1] SMP NOPTI CPU: 226 PID: 126135 Comm: kworker/226:1 Tainted: G OE 4.12.14-150.47-default #1 SLE15 Hardware name: Cray Inc. Windom/Windom, BIOS 0.8.7 01-10-2020 Workqueue: ib_cm cm_work_handler [ib_cm] task: ffff9c5a60a1d2c0 task.stack: ffffc1d91f554000 RIP: 0010:cma_ib_req_handler+0x3f1/0x11b0 [rdma_cm] RSP: 0018:ffffc1d91f557b40 EFLAGS: 00010286 RAX: deacffffffffff30 RBX: 0000000000000001 RCX: ffff9c2af5bb6000 RDX: 00000000000000a9 RSI: ffff9c5aa4ed2f10 RDI: ffffc1d91f557b08 RBP: ffffc1d91f557d90 R08: ffff9c340cc80000 R09: ffff9c2c0f901900 R10: 0000000000000000 R11: 0000000000000001 R12: deacffffffffff30 R13: ffff9c5a48aeec00 R14: ffffc1d91f557c30 R15: ffff9c5c2eea3688 FS: 0000000000000000(0000) GS:ffff9c5c2fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00002b5cc03fa320 CR3: 0000003f8500a000 CR4: 00000000003406e0 Call Trace: ? rdma_addr_cancel+0xa0/0xa0 [ib_core] ? cm_process_work+0x28/0x140 [ib_cm] cm_process_work+0x28/0x140 [ib_cm] ? cm_get_bth_pkey.isra.44+0x34/0xa0 [ib_cm] cm_work_handler+0xa06/0x1a6f [ib_cm] ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __switch_to+0x7c/0x4b0 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 process_one_work+0x1da/0x400 worker_thread+0x2b/0x3f0 ? process_one_work+0x400/0x400 kthread+0x118/0x140 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x22/0x40 Code: 00 66 83 f8 02 0f 84 ca 05 00 00 49 8b 84 24 d0 01 00 00 48 85 c0 0f 84 68 07 00 00 48 2d d0 01 00 00 49 89 c4 0f 84 59 07 00 00 <41> 0f b7 44 24 20 49 8b 77 50 66 83 f8 0a 75 9e 49 8b 7c 24 28 Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532Michal Kalderon1-2/+11
Private data passed to iwarp_cm_handler is copied for connection request / response, but ignored otherwise. If junk is passed, it is stored in the event and used later in the event processing. The driver passes an old junk pointer during connection close which leads to a use-after-free on event processing. Set private data to NULL for events that don 't have private data. BUG: KASAN: use-after-free in ucma_event_handler+0x532/0x560 [rdma_ucm] kernel: Read of size 4 at addr ffff8886caa71200 by task kworker/u128:1/5250 kernel: kernel: Workqueue: iw_cm_wq cm_work_handler [iw_cm] kernel: Call Trace: kernel: dump_stack+0x8c/0xc0 kernel: print_address_description.constprop.0+0x1b/0x210 kernel: ? ucma_event_handler+0x532/0x560 [rdma_ucm] kernel: ? ucma_event_handler+0x532/0x560 [rdma_ucm] kernel: __kasan_report.cold+0x1a/0x33 kernel: ? ucma_event_handler+0x532/0x560 [rdma_ucm] kernel: kasan_report+0xe/0x20 kernel: check_memory_region+0x130/0x1a0 kernel: memcpy+0x20/0x50 kernel: ucma_event_handler+0x532/0x560 [rdma_ucm] kernel: ? __rpc_execute+0x608/0x620 [sunrpc] kernel: cma_iw_handler+0x212/0x330 [rdma_cm] kernel: ? iw_conn_req_handler+0x6e0/0x6e0 [rdma_cm] kernel: ? enqueue_timer+0x86/0x140 kernel: ? _raw_write_lock_irq+0xd0/0xd0 kernel: cm_work_handler+0xd3d/0x1070 [iw_cm] Fixes: e411e0587e0d ("RDMA/qedr: Add iWARP connection management functions") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-18RDMA/efa: Set maximum pkeys device attributeGal Pressman1-0/+1
The max_pkeys device attribute was not set in query device verb, set it to one in order to account for the default pkey (0xffff). This information is exposed to userspace and can cause malfunction Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Firas JahJah <[email protected]> Reviewed-by: Yossi Leybovich <[email protected]> Signed-off-by: Gal Pressman <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>