path: root/drivers/infiniband
Age  Commit message  Author  Files  Lines
2018-09-25IB/mlx5: Set uid as part of QP creationYishai Hadas1-0/+1
Set uid as part of QP creation so that the firmware can manage the QP object in a secured way. The uid for the destroy and the modify commands is set by mlx5_core. This allows a QP that was created by a verbs application to be used by the DEVX flow when the uids are equal. Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-25IB/mlx5: Use uid as part of PD commandsYishai Hadas4-3/+24
Use uid as part of PD commands so that the firmware can manage the PD object in a secured way. For example, when a QP is created its uid must match the uid of the CQ which it uses. The next patches in this series will use the uid from the PD, then will come a patch to set the uid on the PD so that all objects will work properly in one change. Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-24RDMA/bnxt_re: Fix system crash during RDMA resource initializationSelvin Xavier1-55/+38
bnxt_re_ib_reg acquires and releases the rtnl lock whenever it accesses the L2 driver. The following sequence can trigger a crash:

Acquires the rtnl_lock -> Registers roce driver callback with L2 driver -> release the rtnl lock
bnxt_re acquires the rtnl_lock -> Request for MSIx vectors -> release the rtnl_lock

Issue happens when bnxt_re proceeds with remaining part of initialization and L2 driver invokes bnxt_ulp_irq_stop as a part of bnxt_open_nic. The crash is in bnxt_qplib_nq_stop_irq as the NQ structures are not initialized yet,

<snip>
[ 3551.726647] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 3551.726656] IP: [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
[ 3551.726674] PGD 0
[ 3551.726679] Oops: 0002 1 SMP
...
[ 3551.726822] Hardware name: Dell Inc. PowerEdge R720/08RW36, BIOS 2.4.3 07/09/2014
[ 3551.726826] task: ffff97e30eec5ee0 ti: ffff97e3173bc000 task.ti: ffff97e3173bc000
[ 3551.726829] RIP: 0010:[<ffffffffc0840ee9>] [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
...
[ 3551.726872] Call Trace:
[ 3551.726886] [<ffffffffc082cb9e>] bnxt_re_stop_irq+0x4e/0x70 [bnxt_re]
[ 3551.726899] [<ffffffffc07d6a53>] bnxt_ulp_irq_stop+0x43/0x70 [bnxt_en]
[ 3551.726908] [<ffffffffc07c82f4>] bnxt_reserve_rings+0x174/0x1e0 [bnxt_en]
[ 3551.726917] [<ffffffffc07cafd8>] __bnxt_open_nic+0x368/0x9a0 [bnxt_en]
[ 3551.726925] [<ffffffffc07cb62b>] bnxt_open_nic+0x1b/0x50 [bnxt_en]
[ 3551.726934] [<ffffffffc07cc62f>] bnxt_setup_mq_tc+0x11f/0x260 [bnxt_en]
[ 3551.726943] [<ffffffffc07d5f58>] bnxt_dcbnl_ieee_setets+0xb8/0x1f0 [bnxt_en]
[ 3551.726954] [<ffffffff890f983a>] dcbnl_ieee_set+0x9a/0x250
[ 3551.726966] [<ffffffff88fd6d21>] ? __alloc_skb+0xa1/0x2d0
[ 3551.726972] [<ffffffff890f72fa>] dcb_doit+0x13a/0x210
[ 3551.726981] [<ffffffff89003ff7>] rtnetlink_rcv_msg+0xa7/0x260
[ 3551.726989] [<ffffffff88ffdb00>] ? rtnl_unicast+0x20/0x30
[ 3551.726996] [<ffffffff88bf9dc8>] ? __kmalloc_node_track_caller+0x58/0x290
[ 3551.727002] [<ffffffff890f7326>] ? dcb_doit+0x166/0x210
[ 3551.727007] [<ffffffff88fd6d0d>] ? __alloc_skb+0x8d/0x2d0
[ 3551.727012] [<ffffffff89003f50>] ? rtnl_newlink+0x880/0x880
...
[ 3551.727104] [<ffffffff8911f7d5>] system_call_fastpath+0x1c/0x21
...
[ 3551.727164] RIP [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
[ 3551.727175] RSP <ffff97e3173bf788>
[ 3551.727177] CR2: 0000000000000000

Avoid this inconsistent state and system crash by acquiring the rtnl lock for the entire duration of device initialization. Re-factor the code to remove the rtnl lock from the individual function and acquire and release it from the caller.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Fixes: 6e04b1035689 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-21Merge branch 'mlx5-vport-loopback' into rdma.getDoug Ledford3-40/+133
For dependencies, branch based on 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git mlx5 mcast/ucast loopback control enhancements from Leon Romanovsky: ==================== This is short series from Mark which extends handling of loopback traffic. Originally mlx5 IB dynamically enabled/disabled both unicast and multicast based on number of users. However RAW ethernet QPs need more granular access. ==================== Fixed failed automerge in mlx5_ib.h (minor context conflict issue) mlx5-vport-loopback branch: RDMA/mlx5: Enable vport loopback when user context or QP mandate RDMA/mlx5: Allow creating RAW ethernet QP with loopback support RDMA/mlx5: Refactor transport domain bookkeeping logic net/mlx5: Rename incorrect naming in IFC file Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/mlx5: Enable vport loopback when user context or QP mandateMark Bloch3-19/+59
A user can create a QP which can accept loopback traffic, but that's not enough. We need to enable loopback on the vport as well. Currently vport loopback is enabled only when more than one user is using the IB device. Update the logic to consider whether a QP which supports loopback was created; if so, enable vport loopback even if there is only a single user. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/mlx5: Allow creating RAW ethernet QP with loopback supportMark Bloch2-14/+50
Expose two new flags: MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC and MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC. These flags can be used at creation time to allow a QP to receive loopback traffic (unicast and multicast). We store the state in the QP so that the destroy path knows which flags the QP was created with. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/mlx5: Refactor transport domain bookkeeping logicMark Bloch2-19/+36
In preparation to enable loopback on a single user context move the logic that enables/disables loopback to separate functions and group variables under a single struct. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-22net/mlx5: Rename incorrect naming in IFC fileMark Bloch1-2/+2
Remove a trailing underscore from the multicast/unicast names. Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2018-09-21RDMA/cxgb4: remove redundant null pointer check before kfree_skbzhong jiang2-4/+2
kfree_skb already handles a NULL pointer, hence it is safe to remove the redundant NULL pointer check before kfree_skb. Signed-off-by: zhong jiang <[email protected]> Acked-by: Steve Wise <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21IB/mlx4: Remove unnecessary parenthesesNathan Chancellor1-1/+1
Clang warns when more than one set of parentheses are used in single conditional statements. drivers/infiniband/hw/mlx4/mcg.c:676:16: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((method == IB_MGMT_METHOD_GET_RESP)) { ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mcg.c:676:16: note: remove extraneous parentheses around the comparison to silence this warning if ((method == IB_MGMT_METHOD_GET_RESP)) { ~ ^ ~ drivers/infiniband/hw/mlx4/mcg.c:676:16: note: use '=' to turn this equality comparison into an assignment if ((method == IB_MGMT_METHOD_GET_RESP)) { ^~ = Remove the unnecessary parentheses to silence this warning. Reported-by: Nick Desaulniers <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21IB/nes: Remove unnecessary parenthesesNathan Chancellor1-1/+1
Clang warns when more than one set of parentheses are used in single conditional statements. drivers/infiniband/hw/nes/nes_hw.c:1446:27: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] } while ((temp_phy_data2 == temp_phy_data)); ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~ drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: remove extraneous parentheses around the comparison to silence this warning } while ((temp_phy_data2 == temp_phy_data)); ~ ^ ~ drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: use '=' to turn this equality comparison into an assignment } while ((temp_phy_data2 == temp_phy_data)); ^~ = Remove the unnecessary parentheses to silence this warning. Reported-by: Nick Desaulniers <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
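The change made by the two parentheses patches above can be sketched in userspace C. This is only an illustration: `METHOD_GET_RESP` and `is_get_resp()` are hypothetical stand-ins, not the driver's code, and the warning-triggering form is kept in a comment so the sketch compiles cleanly.

```c
/* Hypothetical placeholder for the management method constant in the driver. */
#define METHOD_GET_RESP 0x81

/*
 * Clang's -Wparentheses-equality fires on the doubled form:
 *
 *     if ((method == METHOD_GET_RESP))
 *
 * because the extra parentheses are the usual way to mark an intentional
 * assignment inside a condition. A single pair states the comparison plainly.
 */
int is_get_resp(int method)
{
	if (method == METHOD_GET_RESP)
		return 1;
	return 0;
}
```

The behavior is identical in both forms; only the diagnostic goes away.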
2018-09-21RDMA/uverbs: Get rid of ucontext->tgidJason Gunthorpe2-5/+0
Nothing uses this now, just delete it. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/umem: Avoid synchronize_srcu in the ODP MR destruction pathJason Gunthorpe1-2/+8
synchronize_rcu is slow enough that it should be avoided on the syscall path when user space is destroying MRs. After all the rework we can now trivially do this by having call_srcu kfree the per_mm. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/umem: Handle a half-complete start/end sequenceJason Gunthorpe1-13/+26
mmu_notifier_unregister() can race between an invalidate_start/end and cause the invalidate_end to be skipped. This causes an imbalance in the locking, which lockdep complains about. This is not actually a bug, as we immediately kfree the memory holding the lock, but it is simple enough to fix. Mark when the notifier is being destroyed and abort the start callback. This can be done under the lock we already obtained, and can re-purpose the invalidate_range test we already have. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/umem: Get rid of per_mm->notifier_countJason Gunthorpe1-95/+18
This is intrinsically racy and the scheme is simply unnecessary. New MR registration can wait for any ongoing invalidation to fully complete.

      CPU0                              CPU1
                                  if (atomic_read())
 if (atomic_dec_and_test() &&
     !list_empty())
  { /* not taken */ }
                                       list_add()

Putting the new UMEM into some kind of purgatory until another invalidate rolls through..

Instead hold the read side of the umem_rwsem across the paired start/end and get rid of the racy 'deferred add' approach. Since all umem's in the rbt are always ready to go, also get rid of the mn_counters_active stuff.

Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/umem: Use umem->owning_mm inside ODPJason Gunthorpe4-148/+170
Since ODP had a single struct mmu_notifier located in the ucontext it could only handle a single MM at a time, and this prevented it from using the new owning_mm system. With the prior rework it is now simple to let ODP track multiple MMs per ucontext. Finish the job so that the per_mm is allocated on a mm by mm basis, and freed when the last umem is dropped from the ucontext. As a side effect the new saner locking removes the lockdep splat about nesting the umem_rwsem between mmu_notifier_unregister and ib_umem_odp_release. It also makes ODP work with multiple processes, across fork, etc. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/umem: Move all the ODP related stuff out of ucontext and into per_mmJason Gunthorpe3-81/+98
This is the first step to make ODP use the owning_mm that is now part of struct ib_umem. Each ODP umem is linked to a single per_mm structure, which in turn is linked to a single mm via the embedded mmu_notifier. This first patch introduces the structure and reworks everything to use it. This also needs to introduce tgid into the ib_ucontext_per_mm, as get_user_pages_remote() requires the originating task for statistics tracking. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/umem: Get rid of struct ib_umem.odp_dataJason Gunthorpe5-23/+24
This no longer has any use, we can use container_of to get to the umem_odp, and a simple flag to indicate if this is an odp MR. Remove the few remaining references to it. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-21RDMA/umem: Make ib_umem_odp into a sub structure of ib_umemJason Gunthorpe3-76/+65
These two structures are linked together, use the container_of pattern instead of a double allocation to make the code simpler and easier to follow. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
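A minimal userspace sketch of the container_of pattern this patch relies on. The `struct umem` and `struct umem_odp` types and the `to_umem_odp()` helper are hypothetical, heavily reduced stand-ins and do not reproduce the kernel's definitions; the macro is a simplified re-creation of the kernel's `container_of()` without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace re-creation of container_of(): step back from a
 * member pointer to the start of the structure that embeds it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical reduced stand-in for the generic umem. */
struct umem {
	unsigned long length;
	int is_odp;		/* simple flag marking an ODP umem */
};

/* Hypothetical reduced stand-in for the ODP variant. */
struct umem_odp {
	struct umem umem;	/* embedded, so one allocation covers both */
	int page_shift;
};

/* Recover the enclosing ODP structure from a pointer to its embedded umem. */
struct umem_odp *to_umem_odp(struct umem *umem)
{
	assert(umem->is_odp);	/* only valid for umems embedded in a umem_odp */
	return container_of(umem, struct umem_odp, umem);
}
```

Because the generic structure is embedded rather than pointed to, no second allocation or back-pointer is needed to get from one to the other.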
2018-09-21RDMA/umem: Use ib_umem_odp in all function signatures connected to ODPJason Gunthorpe5-95/+105
All of these functions already require the ODP version of the umem struct, make this very clear by having the signature require it. This paves the way to using the container_of() pattern to link umem_odp and umem together. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-20IB/hfi1: Fix destroy_qp hang after a link downMichael J. Ruhl3-9/+41
rvt_destroy_qp() cannot complete until all in process packets have been released from the underlying hardware. If a link down event occurs, an application can hang with a kernel stack similar to:

cat /proc/<app PID>/stack
quiesce_qp+0x178/0x250 [hfi1]
rvt_reset_qp+0x23d/0x400 [rdmavt]
rvt_destroy_qp+0x69/0x210 [rdmavt]
ib_destroy_qp+0xba/0x1c0 [ib_core]
nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
process_one_work+0x17a/0x440
worker_thread+0x126/0x3c0
kthread+0xcf/0xe0
ret_from_fork+0x58/0x90
0xffffffffffffffff

quiesce_qp() waits until all outstanding packets have been freed. This wait should be momentary. During a link down event, the cleanup handling does not ensure that all packets caught by the link down are flushed properly. This is caused by the fact that the freeze path and the link down event are handled the same. This is not correct. The freeze path waits until the HFI is unfrozen and then restarts PIO. A link down is not a freeze event. The link down path cannot restart the PIO until link is restored. If the PIO path is restarted before the link comes up, the application (QP) using the PIO path will hang (until link is restored).

Fix by separating the linkdown path from the freeze path and use the link down path for link down events. Close a race condition in sc_disable() by acquiring both the progress and release locks. Close a race condition in sc_stop() by moving the setting of the flag bits under the alloc lock.

Cc: <[email protected]> # 4.9.x+
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Michael J. Ruhl <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-20IB/hfi1: Fix context recovery when PBC has an UnsupportedVLMichael J. Ruhl1-2/+7
If a packet stream uses an UnsupportedVL (virtual lane), the send engine will not send the packet, and it will not indicate that an error has occurred. This will cause the packet stream to block. HFI has 8 virtual lanes available for packet streams. Each lane can be enabled or disabled using the UnsupportedVL mask. If a lane is disabled, adding a packet to the send context must be disallowed. The current mask for determining unsupported VLs defaults to 0 (allow all). This is incorrect. Only the VLs that are defined should be allowed. Determine which VLs are disabled (mtu == 0), and set the appropriate unsupported bit in the mask. The correct mask will allow the send engine to error on the invalid VL, and error recovery will work correctly. Cc: <[email protected]> # 4.9.x+ Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Mike Marciniszyn <[email protected]> Reviewed-by: Lukasz Odzioba <[email protected]> Signed-off-by: Michael J. Ruhl <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
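The mask computation described above might look roughly like the following userspace sketch. The function name, the plain array of per-lane MTUs, and the one-bit-per-lane layout are assumptions made for illustration; the real driver's register layout is not shown in the commit message.

```c
#include <stdint.h>

#define NUM_VLS 8	/* the commit message states 8 virtual lanes */

/*
 * Build an "unsupported" mask from per-lane MTUs: bit i is set when
 * lane i is disabled (mtu == 0). A mask of 0 would allow every lane,
 * which is the incorrect default the patch replaces.
 */
uint8_t unsupported_vl_mask(const unsigned int mtu[NUM_VLS])
{
	uint8_t mask = 0;

	for (int i = 0; i < NUM_VLS; i++) {
		if (mtu[i] == 0)
			mask |= (uint8_t)(1u << i);
	}
	return mask;
}
```

With lanes 0 and 2 enabled and the rest disabled, every bit except bits 0 and 2 ends up set, so a packet queued on any other lane can be rejected instead of silently blocking.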
2018-09-20IB/hfi1: Invalid user input can result in crashMichael J. Ruhl1-1/+1
If the number of packets in a user sdma request does not match the actual iovectors being sent, sdma_cleanup can be called on an uninitialized request structure, resulting in a crash similar to this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1]
PGD 8000001044f61067 PUD 1052706067 PMD 0
Oops: 0000 [#1] SMP
CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G OE ------------ 3.10.0-862.el7.x86_64 #1
Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000
RIP: 0010:[<ffffffffc0ae8bb7>] [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1]
RSP: 0018:ffff8b2ed1f9bab0 EFLAGS: 00010286
RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000
RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000
RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06
R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000
R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540
FS: 00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0
Call Trace:
[<ffffffffc0b0570d>] user_sdma_send_pkts+0xdcd/0x1990 [hfi1]
[<ffffffff9fe75fb0>] ? gup_pud_range+0x140/0x290
[<ffffffffc0ad3105>] ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1]
[<ffffffffc0b0777b>] hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1]
[<ffffffffc0ac193a>] hfi1_aio_write+0xba/0x110 [hfi1]
[<ffffffffa001a2bb>] do_sync_readv_writev+0x7b/0xd0
[<ffffffffa001bede>] do_readv_writev+0xce/0x260
[<ffffffffa022b089>] ? tty_ldisc_deref+0x19/0x20
[<ffffffffa02268c0>] ? n_tty_ioctl+0xe0/0xe0
[<ffffffffa001c105>] vfs_writev+0x35/0x60
[<ffffffffa001c2bf>] SyS_writev+0x7f/0x110
[<ffffffffa051f7d5>] system_call_fastpath+0x1c/0x21
Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb <48> 8b 51 08 49 89 d4 83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02
RIP [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1]
RSP <ffff8b2ed1f9bab0>
CR2: 0000000000000008

There are two exit points from user_sdma_send_pkts(). One (free_tx) merely frees the slab entry and one (free_txreq) cleans the sdma_txreq prior to freeing the slab entry. The free_txreq variation can only be called after one of the sdma_init*() variations has been called. In the panic case, the slab entry had been allocated but not inited. Fix the issue by exiting through free_tx thus avoiding sdma_clean().

Cc: <[email protected]> # 4.9.x+
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Lukasz Odzioba <[email protected]>
Signed-off-by: Michael J. Ruhl <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-20IB/hfi1: Fix SL array bounds checkIra Weiny1-1/+7
The SL specified by a user needs to be a valid SL. Add a range check to the user specified SL value which protects from running off the end of the SL to SC table. CC: [email protected] Fixes: 7724105686e7 ("IB/hfi1: add driver files") Signed-off-by: Ira Weiny <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
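The kind of range check described here can be sketched as follows. The table size of 32 and the `sl_to_sc()` helper are assumptions for this illustration only and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SL_TABLE_SIZE 32	/* assumed size for this sketch */

/*
 * Translate a user-supplied SL to an SC. Returns 0 and stores the SC on
 * success, or -1 when sl would index past the end of the table; in that
 * case *sc is left untouched.
 */
int sl_to_sc(const uint8_t table[SL_TABLE_SIZE], unsigned int sl, uint8_t *sc)
{
	assert(table != NULL && sc != NULL);

	if (sl >= SL_TABLE_SIZE)	/* reject before touching the table */
		return -1;

	*sc = table[sl];
	return 0;
}
```

The important property is that the comparison happens before the array access, so a hostile or buggy value never reaches the index expression.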
2018-09-20RDMA/uverbs: Fix validity check for modify QPMajd Dibbiny1-23/+45
Uverbs shouldn't enforce QP state in the command unless the user set the QP state bit in the attribute mask. In addition, only copy qp attr fields which have the corresponding bit set in the attribute mask over to the internal attr structure. Fixes: 88de869bbe4f ("RDMA/uverbs: Ensure validity of current QP state value") Fixes: bc38a6abdd5a ("[PATCH] IB uverbs: core implementation") Signed-off-by: Majd Dibbiny <[email protected]> Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
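The "copy only what the mask selects" rule from this fix can be illustrated with a small sketch. The `ATTR_*` bits, `struct qp_attr`, and `copy_masked_attrs()` are hypothetical names chosen for the example; the real code works on the uverbs command and the QP attribute structure with their own attribute mask flags.

```c
/* Hypothetical attribute mask bits for this sketch. */
#define ATTR_STATE (1u << 0)
#define ATTR_PORT  (1u << 1)
#define ATTR_QKEY  (1u << 2)

/* Hypothetical reduced attribute structure. */
struct qp_attr {
	unsigned int state;
	unsigned int port;
	unsigned int qkey;
};

/*
 * Copy a field from the user-supplied src only when its bit is set in
 * mask; fields the caller did not select are neither validated nor
 * copied, so stale or garbage values in src cannot leak into dst.
 */
void copy_masked_attrs(struct qp_attr *dst, const struct qp_attr *src,
		       unsigned int mask)
{
	if (mask & ATTR_STATE)
		dst->state = src->state;
	if (mask & ATTR_PORT)
		dst->port = src->port;
	if (mask & ATTR_QKEY)
		dst->qkey = src->qkey;
}
```

A caller that passes `ATTR_PORT | ATTR_QKEY` leaves `dst->state` untouched, whatever `src->state` happens to contain.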
2018-09-20RDMA/usnic: Do not use ucontext->tgidJason Gunthorpe2-48/+46
Update this driver to match the code it copies from umem.c which no longer uses tgid. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-20RDMA/umem: Do not use current->tgid to track the mm_structJason Gunthorpe1-41/+36
This is just wrong, the process that calls into the reg_mr is the process associated with the umem, and that does not have to be the same process that created the context. When this code was first written mmgrab() didn't exist, however these days we can just directly hold the mm_struct pointer in the umem and have no ambiguity when it comes to releasing the umem as to which mm it was associated with. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-20RDMA/ucontext: Get rid of the old disassociate flowJason Gunthorpe1-41/+10
The disassociate_ucontext function in every driver is now empty, so we don't need this ugly and wrong code that was messing with tgids. rdma_user_mmap_io does this same work in a better way. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-20RDMA/hns: Use rdma_user_mmap_ioJason Gunthorpe2-88/+21
Rely on the new core code helper to map BAR memory from the driver. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-20RDMA/mlx5: Use rdma_user_mmap_ioJason Gunthorpe2-125/+7
Rely on the new core code helper to map BAR memory from the driver. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-20RDMA/mlx4: Use rdma_user_mmap_ioJason Gunthorpe2-123/+24
Rely on the new core code helper to map BAR memory from the driver. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-20RDMA/ucontext: Add a core API for mmaping driver IO memoryJason Gunthorpe4-1/+230
To support disassociation and PCI hot unplug, we have to track all the VMAs that refer to the device IO memory. When disassociation occurs the VMAs have to be revised to point to the zero page, not the IO memory, to allow the physical HW to be unplugged. The three drivers supporting this implemented three different versions of this algorithm, all leaving something to be desired. This new common implementation has a few differences from the driver versions:

- Track all VMAs, including splitting/truncating/etc. Tie the lifetime of the private data allocation to the lifetime of the vma. This avoids any tricks with setting vm_ops which Linus didn't like. (see link)
- Support multiple mms, and support properly tracking mmaps triggered by processes other than the one first opening the uverbs fd. This makes fork behavior of disassociation enabled drivers the same as fork support in normal drivers.
- Don't use crazy get_task stuff.
- Simplify the approach to racing between vm_ops close and disassociation, fixing the related bugs most of the driver implementations had.

Since we are in core code the tracking list can be placed in struct ib_uverbs_ufile, which has a lifetime strictly longer than any VMAs created by mmap on the uverbs FD.

Link: https://www.spinics.net/lists/stable/msg248747.html
Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.com
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2018-09-19RDMA/hns: Move all prints out of irq handleliuyixian2-132/+97
Under some configurations, printing inside the aeq handler can cause timeouts that trigger unnecessary interrupts. Thus, move all prints out of the aeq handler to a work queue. Signed-off-by: liuyixian <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-19IB/srp: Avoid that sg_reset -d ${srp_device} triggers an infinite loopBart Van Assche1-3/+3
Use different loop variables for the inner and outer loop. This avoids that an infinite loop occurs if there are more RDMA channels than target->req_ring_size. Fixes: d92c0da71a35 ("IB/srp: Add multichannel support") Cc: <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
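The shape of the fix can be sketched with a plain nested loop. `visit_all_requests()` and its parameters are hypothetical and only mirror the structure described in the commit message (an outer loop over channels, an inner loop over a request ring).

```c
/*
 * Visit every request slot on every channel and return the number of
 * visits. The outer and inner loops use distinct indices (i and j).
 *
 * If the inner loop reused i, it would leave i == req_ring_size each
 * time it finished, so with enough channels the outer condition
 * i < ch_count could stay true forever.
 */
unsigned int visit_all_requests(unsigned int ch_count,
				unsigned int req_ring_size)
{
	unsigned int visited = 0;

	for (unsigned int i = 0; i < ch_count; i++) {
		for (unsigned int j = 0; j < req_ring_size; j++)
			visited++;
	}
	return visited;
}
```

With separate indices the function always terminates after `ch_count * req_ring_size` visits, regardless of which of the two counts is larger.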
2018-09-19RDMA/uverbs: Fix error unwind in ib_uverbs_add_oneJason Gunthorpe1-13/+10
The error path has several mistakes:

- cdev_del should not be called if cdev_device_add fails
- We must call put_device on all the goto exit paths as that is what frees the uapi, SRCU and the struct itself.

While we are here consolidate all the uvdev_dev init that cannot fail at the top.

Fixes: c5c4d92e70f3 ("RDMA/uverbs: Use cdev_device_add() instead of cdev_add()")
Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
2018-09-19RDMA/core: Properly return the error code of rdma_set_src_addr_rcuYueHaibing1-6/+7
rdma_set_src_addr_rcu should check whether copy_src_l2_addr fails, rather than always returning 0. Also copy_src_l2_addr should return 'ret' as its return value when rdma_translate_ip fails. Fixes: c31d4b2ddf07 ("RDMA/core: Protect against changing dst->dev during destination resolve") Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-19RDMA/i40iw: Fix incorrect iterator typeHåkon Bugge1-1/+1
Commit f27b4746f378 ("i40iw: add connection management code") uses an incorrect rcu iterator, whilst holding the rtnl_lock. Since the critical region invokes i40iw_manage_qhash(), which is a sleeping function, the rcu locking and traversal cannot be used. Signed-off-by: Håkon Bugge <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-19RDMA/uverbs: Remove is_closed from ib_uverbs_fileJason Gunthorpe2-7/+2
This does nothing but indicate if the uverbs_file is in the device's list, use list_del_init instead. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
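Why a separate flag becomes redundant can be seen in a small userspace re-creation of a circular doubly linked list. All names below are hypothetical and only imitate the behavior of the kernel's list helpers: after a "delete and re-initialize", the node points at itself, so membership can be read from the node directly.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list node (userspace imitation). */
struct list_node {
	struct list_node *next, *prev;
};

/* An initialized, unlinked node (or an empty head) points at itself. */
void list_node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* Insert n right after head. */
void list_node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and re-initialize it, so a later check or delete is safe. */
void list_node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_node_init(n);
}

/* True while n sits on some list; replaces a separate "closed" flag. */
bool list_node_is_linked(const struct list_node *n)
{
	return n->next != n;
}
```

Calling the delete helper twice is harmless in this sketch, because the second call operates on a self-pointing node.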
2018-09-18Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-7/+4
Two new tls tests added in parallel in both net and net-next. Used Stephen Rothwell's linux-next resolution. Signed-off-by: David S. Miller <[email protected]>
2018-09-13ucma: fix a use-after-free in ucma_resolve_ip()Cong Wang1-0/+2
There is a race condition between ucma_close() and ucma_resolve_ip():

CPU0                                    CPU1
ucma_resolve_ip():                      ucma_close():

ctx = ucma_get_ctx(file, cmd.id);

                                        list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
                                                mutex_lock(&mut);
                                                idr_remove(&ctx_idr, ctx->id);
                                                mutex_unlock(&mut);
                                                ...
                                                mutex_lock(&mut);
                                                if (!ctx->closing) {
                                                        mutex_unlock(&mut);
                                                        rdma_destroy_id(ctx->cm_id);
                                                ...
                                                ucma_free_ctx(ctx);

ret = rdma_resolve_addr();
ucma_put_ctx(ctx);

Before idr_remove(), ucma_get_ctx() could still find the ctx and after rdma_destroy_id(), rdma_resolve_addr() may still access id_priv pointer. Also, ucma_put_ctx() may use ctx after ucma_free_ctx() too.

ucma_close() should call ucma_put_ctx() too which tests the refcnt and waits for the last one releasing it. The similar pattern is already used by ucma_destroy_id().

Reported-and-tested-by: [email protected]
Reported-by: [email protected]
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2018-09-13IB/ipoib: Log sysfs 'dev_id' accesses from userspaceArseny Maslennikov1-0/+31
Some tools may currently be using only the deprecated attribute; let's print an elaborate and clear deprecation notice to kmsg. To do that, we have to replace the whole sysfs file, since we inherit the original one from netdev. Signed-off-by: Arseny Maslennikov <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-13IB/ipoib: Use dev_port to expose network interface port numbersArseny Maslennikov1-0/+2
Some InfiniBand network devices have multiple ports on the same PCI function. This initializes the `dev_port' sysfs field of those network interfaces with their port number. Prior to this the kernel erroneously used the `dev_id' sysfs field of those network interfaces to convey the port number to userspace. The use of `dev_id' was considered correct until Linux 3.15, when another field, `dev_port', was defined for this particular purpose and `dev_id' was reserved for distinguishing stacked ifaces (e.g: VLANs) with the same hardware address as their parent device. Similar fixes to net/mlx4_en and many other drivers, which started exporting this information through `dev_id' before 3.15, were accepted into the kernel 4 years ago. See 76a066f2a2a0 (`net/mlx4_en: Expose port number through sysfs'). Signed-off-by: Arseny Maslennikov <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2018-09-12Merge tag 'pci-v4.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds1-7/+4
Pull PCI fixes from Bjorn Helgaas:

- Add Tyrel Datwyler as maintainer for PPC64 RPA hotplug (Tyrel Datwyler)
- Add Gustavo Pimentel as DesignWare PCI maintainer (Joao Pinto)
- Fix a Switchtec Spectre v1 vulnerability (Gustavo A. R. Silva)
- Revert an unnecessary Intel 300 ACS quirk (Mika Westerberg)
- Fix pciehp hot-add/powerfault detection that left indicators in wrong state (Keith Busch)
- Fix pci_reset_bus() logic error (Dennis Dalessandro)
- Revert IB/hfi1 PCI reset change that caused a deadlock (Dennis Dalessandro)
- Allow enabling PASID on Root Complex Integrated Endpoints (Felix Kuehling)

* tag 'pci-v4.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix enabling of PASID on RC integrated endpoints
  IB/hfi1,PCI: Allow bus reset while probing
  PCI: Fix faulty logic in pci_reset_bus()
  PCI: pciehp: Fix hot-add vs powerfault detection order
  switchtec: Fix Spectre v1 vulnerability
  Revert "PCI: Add ACS quirk for Intel 300 series"
  MAINTAINERS: Add Gustavo Pimentel as DesignWare PCI maintainer
  MAINTAINERS: Add entries for PPC64 RPA PCI hotplug drivers
2018-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller9-12/+33
2018-09-12RDMA/core: Consider net ns of gid attribute for RoCEParav Pandit4-16/+70
When resolving destination address or route, when net namespace is unavailable, refer to the net namespace of the netdevice of the SGID attribute. This is typically the case for requests arriving from the network for RoCE ports. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-12RDMA/core: Introduce rdma_read_gid_attr_ndev_rcu() to check GID attributeParav Pandit2-0/+35
Introduce an API, rdma_read_gid_attr_ndev_rcu(), that returns the GID attribute's netdevice if it is in the UP state, for accessing the netdevice's fields such as net namespace and ifindex. This is useful for users who intend to access netdevice fields under the rcu lock. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-12RDMA/core: Simplify roce_resolve_route_from_path()Parav Pandit3-53/+41
Currently RoCE route resolve functionality is split between two functions: roce_resolve_route_from_path() and its helper function rdma_resolve_ip_route(). Because of this, multiple sockaddr src structures are created in both functions, with rdma_dev_addr acting as the interface between the two for checks. Since RoCE is the only user of rdma_resolve_ip_route(), combine the functionality of both functions into roce_resolve_route_from_path() and further reduce the scope of rdma_dev_addr to core/addr.c. This also allows addr_resolve() to be extended in a subsequent patch to consider netdev properties of the GID in a safer way under the rcu lock. Additionally, src and dst addresses were always provided, so skip the src addr NULL pointer check as they are present on the stack now. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-12RDMA/core: Protect against changing dst->dev during destination resolveParav Pandit1-15/+46
While resolving an address, during route lookup and while performing src address translation in loopback mode, hold the rcu lock so that if the netdevice is moving to a different net namespace, or being unregistered, this can be synchronized with net/core/dev.c, i.e. change_net_namespace() -> dev_close_many() -> rt6_uncached_list_flush_dev(), which would change dst->dev to the loopback device of the given net namespace. Therefore, hold the rcu lock and sync with synchronize_net() of change_net_namespace() to ensure that the netdevice cannot get freed while dst->dev is being used. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-12RDMA/core: Refer to network type instead of device typeParav Pandit1-19/+16
Set and refer to rdma_dev_addr network type instead of dst->ndev to reduce dependency on accessing dst netdevice. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-09-12RDMA/core: Use common code flow for IPv4/6 for addr resolveParav Pandit1-17/+15
Use common code flow for resolving neighbour and for finding source addresses. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>