aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)AuthorFilesLines
2020-05-27IB/ipoib: Fix double free of skb in case of multicast traffic in CM modeValentine Fatiev4-12/+26
When connected mode is set, and we have connected and datagram traffic in parallel, ipoib might crash with double free of datagram skb. The current mechanism assumes that the order in the completion queue is the same as the order of sent packets for all QPs. Order is kept only for specific QP, in case of mixed UD and CM traffic we have few QPs (one UD and few CM's) in parallel. The problem: ---------------------------------------------------------- Transmit queue: ----------------- UD skb pointer kept in queue itself, CM skb kept in spearate queue and uses transmit queue as a placeholder to count the number of total transmitted packets. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ tail head Completion queue (problematic scenario) - the order not the same as in the transmit queue: 1 2 3 4 5 6 7 8 9 ------------------------------------ ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5 ------------------------------------ 1. CM1 'wc' processing - skb freed in cm separate ring. - tx_tail of transmit queue increased although UD2 is not freed. Now driver assumes UD2 index is already freed and it could be used for new transmitted skb. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL NL UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ ^ (Bad)tail head (Bad - Could be used for new SKB) In this case (due to heavy load) UD2 skb pointer could be replaced by new transmitted packet UD_NEW, as the driver assumes its free. At this point we will have to process two 'wc' with same index but we have only one pointer to free. During second attempt to free the same skb we will have NULL pointer exception. 2. UD2 'wc' processing - skb freed according the index we got from 'wc', but it was already overwritten by mistake. So actually the skb that was released is the skb of the new transmitted packet and not the original one. 3. UD_NEW 'wc' processing - attempt to free already freed skb. NUll pointer exception. The fix: ----------------------------------------------------------------------- The fix is to stop using the UD ring as a placeholder for CM packets, the cyclic ring variables tx_head and tx_tail will manage the UD tx_ring, a new cyclic variables global_tx_head and global_tx_tail are introduced for managing and counting the overall outstanding sent packets, then the send queue will be stopped and waken based on these variables only. Note that no locking is needed since global_tx_head is updated in the xmit flow and global_tx_tail is updated in the NAPI flow only. A previous attempt tried to use one variable to count the outstanding sent packets, but it did not work since xmit and NAPI flows can run at the same time and the counter will be updated wrongly. Thus, we use the same simple cyclic head and tail scheme that we have today for the UD tx_ring. Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Valentine Fatiev <[email protected]> Signed-off-by: Alaa Hleihel <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Acked-by: Doug Ledford <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/mlx5: Return ECE data after modify QPLeon Romanovsky2-4/+46
After users sets the ECE option, FW will return the agreed/supported bits through an output structures of modify QP stages for regular QPs or through create QP for the DCT. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/mlx5: Set ECE options during modify QPLeon Romanovsky4-14/+23
The most common way to set ECE option will be during modify QP command in INIT2RTR, RTR2RTS and RTS2RTS stages, so update mlx5 to support it. The new bit in the comp_mask is needed to mark that kernel supports ECE and can receive data instead of "reserved" field in the struct mlx5_ib_modify_qp. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/mlx5: Convert modify QP to use MLX5_SET macrosLeon Romanovsky1-106/+97
Instead of hand crafted mlx5_qp_context and mlx5_qp_path use common MLX5_SET() macros. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Maor Gottlieb <[email protected]> Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/mlx5: Remove manually crafted QP context the query callLeon Romanovsky1-73/+56
As a preparation to removal hand crafted mlx5_qp_context, convert query_qp_attr() to use proper MLX5_GET() macros. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/mlx5: Use direct modify QP implementationLeon Romanovsky1-6/+11
As a preparation to removal hand crafted mlx5_qp_context, convert counter code to use mlx5_cmd_exec_in() directly. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/mlx5: Set ECE options during QP createLeon Romanovsky1-9/+66
Allow users to ask creation of QPs with specific ECE options. Such early set even before RDMA-CM connection is established is useful if user knows exactly which option he needs. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/mlx5: Get ECE options from FW during create QPLeon Romanovsky3-11/+17
Supported ECE options are returned from FW in the create_qp phase and zero means that field is not valid. Such default value allows us to reuse reserved field without worries about comp_mask. Update create QP API to return ECE options. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/cma: Provide ECE reject reasonLeon Romanovsky5-10/+25
IBTA declares "vendor option not supported" reject reason in REJ messages if passive side doesn't want to accept proposed ECE options. Due to the fact that ECE is managed by userspace, there is a need to let users to provide such rejected reason. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/cma: Connect ECE to rdma_acceptLeon Romanovsky2-3/+30
The rdma_accept() is called by both passive and active sides of CMID connection to mark readiness to start data transfer. For passive side, this is called explicitly, for active side, it is called implicitly while receiving REP message. Provide ECE data to rdma_accept function needed for passive side to send that REP message. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/cm: Send and receive ECE parameter over the wireLeon Romanovsky2-5/+42
ECE parameters are exchanged through REQ->REP/SIDR_REP messages, this patch adds the data to provide to other side of CMID communication channel. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/ucma: Deliver ECE parameters through UCMA eventsLeon Romanovsky1-1/+5
Passive side of CMID connection receives ECE request through REQ message and needs to respond with relevant REP message which will be forwarded to active side. The UCMA events interface is responsible for such communication with the user space (librdmacm). Extend it to provide ECE wire data. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/ucma: Extend ucma_connect to receive ECE parametersLeon Romanovsky3-3/+33
Active side of CMID initiates connection through librdmacm's rdma_connect() and kernel's ucma_connect(). Extend UCMA interface to handle those new parameters. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27Merge branch 'mellanox/mlx5-next' into rdma.git for/nextJason Gunthorpe1-1/+4
From the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in following patches * branch 'mellanox/mlx5-next': net/mlx5: Add ability to read and write ECE options net/mlx5: Add support for RDMA TX FT headers modifying net/mlx5: Move iseg access helper routines close to mlx5_core driver net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27IB/mlx5: Fix DEVX support for MLX5_CMD_OP_INIT2INIT_QP commandMark Zhang1-0/+4
The commit citied in the Fixes line wasn't complete and solved only part of the problems. Update the mlx5_ib to properly support MLX5_CMD_OP_INIT2INIT_QP command in the DEVX, that is required when modify the QP tx_port_affinity. Fixes: 819f7427bafd ("RDMA/mlx5: Add init2init as a modify command") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/core: Fix double destruction of uobjectJason Gunthorpe1-7/+13
Fix use after free when user user space request uobject concurrently for the same object, within the RCU grace period. In that case, remove_handle_idr_uobject() is called twice and we will have an extra put on the uobject which cause use after free. Fix it by leaving the uobject write locked after it was removed from the idr. Call to rdma_lookup_put_uobject with UVERBS_LOOKUP_DESTROY instead of UVERBS_LOOKUP_WRITE will do the work. refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 1381 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 1381 Comm: syz-executor.0 Not tainted 5.5.0-rc3 #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x94/0xce panic+0x234/0x56f __warn+0x1cc/0x1e1 report_bug+0x200/0x310 fixup_bug.part.11+0x32/0x80 do_error_trap+0xd3/0x100 do_invalid_op+0x31/0x40 invalid_op+0x1e/0x30 RIP: 0010:refcount_warn_saturate+0xfe/0x1a0 Code: 0f 0b eb 9b e8 23 f6 6d ff 80 3d 6c d4 19 03 00 75 8d e8 15 f6 6d ff 48 c7 c7 c0 02 55 bd c6 05 57 d4 19 03 01 e8 a2 58 49 ff <0f> 0b e9 6e ff ff ff e8 f6 f5 6d ff 80 3d 42 d4 19 03 00 0f 85 5c RSP: 0018:ffffc90002df7b98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88810f6a193c RCX: ffffffffba649009 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88811b0283cc RBP: 0000000000000003 R08: ffffed10236060e3 R09: ffffed10236060e3 R10: 0000000000000001 R11: ffffed10236060e2 R12: ffff88810f6a193c R13: ffffc90002df7d60 R14: 0000000000000000 R15: ffff888116ae6a08 uverbs_uobject_put+0xfd/0x140 __uobj_perform_destroy+0x3d/0x60 ib_uverbs_close_xrcd+0x148/0x170 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f759d122c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bfa8 RCX: 0000000000465b49 RDX: 000000000000000c RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f759d1236bc R13: 00000000004ca27c R14: 000000000070de40 R15: 00000000ffffffff Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: 0x39400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: 7452a3c745a2 ("IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27RDMA/core: Use sizeof_field() helperGustavo A. R. Silva4-10/+10
Make use of the sizeof_field() helper instead of an open-coded version. Link: https://lore.kernel.org/r/20200527144152.GA22605@embeddedor Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/ipoib: Remove can_sleep parameter from iboib_mcast_allocKamal Heib1-6/+5
can_sleep is always 0 when iboib_mcast_alloc() is called, so remove it and use GFP_ATOMIC instead of GFP_KERNEL. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/iw_cxgb4: cleanup device debugfs entries on ULD removePotnuri Bharat Teja1-0/+1
Remove device specific debugfs entries immediately if LLD detaches a particular ULD device in case of fatal PCI errors. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Potnuri Bharat Teja <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/pvrdma: Fix missing pci disable in pvrdma_pci_probe()Qiushi Wu1-1/+1
In function pvrdma_pci_probe(), pdev was not disabled in one error path. Thus replace the jump target “err_free_device” by "err_disable_pdev". Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Qiushi Wu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Make the end of sge process more clearYixian Liu1-2/+2
Instead of i with the sge number of wr will make the comparision more clear, that is, when the sge number in wr is small than the maximum supported sge number in the queue, then a stop sge needed to be filled at the end of sges in wr. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yixian Liu <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Simplify process related to poll cqLang Cheng1-8/+3
Set hns_roce_v2_cq_set_ci to inline type and remove unnecessary next_cqe_sw_v2(). Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lang Cheng <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Remove redundant parameters from free_srq/qp_wrid()Wenpeng Liang2-7/+6
The redundant parameters "hr_dev" need to be removed from free_kernel_wrid() and free_srq_wrid(). Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wenpeng Liang <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Remove redundant type cast for general pointersWeihang Li3-137/+73
There is no need to do a type cast on genernal pointers, they could be assigned to any type of variables. In addition, optimize initialization of some variables and adjust order of them. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Optimize the usage of MTRXi Wang2-27/+30
Currently, the MTR region is configed before hns_roce_mtr_map() is invoked, but in some scenarios, the region is configed at MTR creation, the caller need to store this config and call hns_roce_mtr_map() later. So optimize the usage by wrapping the MTR region config into MTR. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Xi Wang <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Refactor the QP context filling process related to WQE buffer ↵Xi Wang1-115/+149
configure Split the code related to WQE buffer configure from the QPC filling process into two functions: config_qp_sq_buf() and config_qp_rq_buf(), this will make the code more readable. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Xi Wang <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Change variables representing quantity to unsignedWeihang Li2-10/+11
Number of sge/eqe is always non-negative, they should be defined in type of unsigned. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Change all page_shift to unsignedWeihang Li5-24/+27
page_shift is used to calculate the page size, it's always non-negative, and should be in type of unsigned. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Rename QP buffer related functionXi Wang1-4/+4
Rename the function related to QP buffer to make the code more readable. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Xi Wang <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Remove unused code about assertYangyang Li2-5/+0
The codes related to assert are no longer used and need to be deleted. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yangyang Li <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Optimize post and poll processLang Cheng1-13/+14
Add unlikely() and likely() to optimize main I/O process code. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lang Cheng <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Add CQ flag instead of independent enable flagLang Cheng3-11/+11
It's easier to understand and maintain enable flags of cq using a single field in type of u32 than defining a field for every flags in the structure hns_roce_cq, and we can add new flags for features more conveniently in the future. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lang Cheng <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-25RDMA/hns: Let software PI/CI grow naturallyLang Cheng1-5/+4
The hardware can truncate PI/CI when posting or polling, the driver does not need to do truncation. Therefore keep the software's PI/CI consistent with it in the hardware. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lang Cheng <[email protected]> Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-22RDMA/rtrs: Get rid of the do_next_path while_next_path macrosDanil Kipnis1-16/+13
The macros do_each_path/while_each_path lead to a smatch warning: drivers/infiniband/ulp/rtrs/rtrs-clt.c:1196 rtrs_clt_failover_req() warn: inconsistent indenting drivers/infiniband/ulp/rtrs/rtrs-clt.c:2890 rtrs_clt_request() warn: inconsistent indenting Also checkpatch complains: ERROR: Macros with multiple statements should be enclosed in a do - while loop The macros are used only in two places: for a normal IO path and for the failover path triggered after errors. Get rid of the macros and just use a for loop iterating over the list of paths in both places. It is easier to read and also less lines of code. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/[email protected] Reported-by: kbuild test robot <[email protected]> Signed-off-by: Danil Kipnis <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-22RDMA/rtrs: server: Use already dereferenced rtrs_sess structureMd Haris Iqbal1-2/+2
The rtrs_sess structure has already been extracted above from the rtrs_srv_sess structure. Use that to avoid redundant dereferencing. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Md Haris Iqbal <[email protected]> Acked-by: Danil Kipnis <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-22IB/cma: Fix ports memory leak in cma_configfsMaor Gottlieb1-0/+13
The allocated ports structure in never freed. The free function should be called by release_cma_ports_group, but the group is never released since we don't remove its default group. Remove default groups when device group is deleted. Fixes: 045959db65c6 ("IB/cma: Add configfs for rdma_cm") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21RDMA/mlx5: Fix NULL pointer dereference in destroy_prefetch_workMaor Gottlieb1-0/+1
q_deferred_work isn't initialized when creating an explicit ODP memory region. This can lead to a NULL pointer dereference when user performs asynchronous prefetch MR. Fix it by initializing q_deferred_work for explicit ODP. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 4 PID: 6074 Comm: kworker/u16:6 Not tainted 5.7.0-rc1-for-upstream-perf-2020-04-17_07-03-39-64 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib] RIP: 0010:__wake_up_common+0x49/0x120 Code: 04 89 54 24 0c 89 4c 24 08 74 0a 41 f6 01 04 0f 85 8e 00 00 00 48 8b 47 08 48 83 e8 18 4c 8d 67 08 48 8d 50 18 49 39 d4 74 66 <48> 8b 70 18 31 db 4c 8d 7e e8 eb 17 49 8b 47 18 48 8d 50 e8 49 8d RSP: 0000:ffffc9000097bd88 EFLAGS: 00010082 RAX: ffffffffffffffe8 RBX: ffff888454cd9f90 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff888454cd9f90 RBP: ffffc9000097bdd0 R08: 0000000000000000 R09: ffffc9000097bdd0 R10: 0000000000000000 R11: 0000000000000001 R12: ffff888454cd9f98 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000044c19e002 CR4: 0000000000760ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: __wake_up_common_lock+0x7a/0xc0 destroy_prefetch_work+0x5a/0x60 [mlx5_ib] mlx5_ib_prefetch_mr_work+0x64/0x80 [mlx5_ib] process_one_work+0x15b/0x360 worker_thread+0x49/0x3d0 kthread+0xf5/0x130 ? rescuer_thread+0x310/0x310 ? kthread_bind+0x10/0x10 ret_from_fork+0x1f/0x30 Fixes: de5ed007a03d ("IB/mlx5: Fix implicit ODP race") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/uverbs: Introduce create/destroy QP commands over ioctlYishai Hadas5-41/+405
Introduce create/destroy QP commands over the ioctl interface to let it be extended to get an asynchronous event FD. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/uverbs: Introduce create/destroy WQ commands over ioctlYishai Hadas5-24/+198
Introduce create/destroy WQ commands over the ioctl interface to let it be extended to get an asynchronous event FD. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/uverbs: Introduce create/destroy SRQ commands over ioctlYishai Hadas5-33/+238
Introduce create/destroy SRQ commands over the ioctl interface to let it be extended to get an asynchronous event FD. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/uverbs: Extend CQ to get its own asynchronous event FDYishai Hadas2-3/+24
Extend CQ to get its own asynchronous event FD. The event FD is an optional attribute, in case wasn't given the ufile event FD will be used. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/uverbs: Refactor related objects to use their own asynchronous event FDYishai Hadas4-9/+39
Refactor related objects to use their own asynchronous event FD. The ufile event FD will be the default in case an object won't have its own event FD. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21RDMA/core: Allow the ioctl layer to abort a fully created uobjectJason Gunthorpe9-57/+63
While creating a uobject every create reaches a point where the uobject is fully initialized. For ioctls that go on to copy_to_user this means they need to open code the destruction of a fully created uobject - ie the RDMA_REMOVE_DESTROY sort of flow. Open coding this creates bugs, eg the CQ does not properly flush the events list when it does its error unwind. Provide a uverbs_finalize_uobj_create() function which indicates that the uobject is fully initialized and that abort should call to destroy_hw to destroy the uobj->object and related. Methods can call this function if they go on to have error cases after setting uobj->object. Once done those error cases can simply do return, without an error unwind. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21Merge tag 'v5.7-rc6' into rdma.git for-nextJason Gunthorpe22-71/+115
Linux 5.7-rc6 Conflict in drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c resolved by deleting dr_cq_event, matching how netdev resolved it. Required for dependencies in the following patches. Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/hfi1: Enable the transmit side of the datagram ipoib netdevPiotr Stankiewicz2-0/+3
This patch hooks the transmit side of the datagram netdev with ipoib by setting the rdma_netdev_get_params function for the hfi1 ib_device_ops structue. It also enables the receiving side by adding the AIP capability into the default capabilities. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Marciniszyn <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Piotr Stankiewicz <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/ipoib: Add capability to switch between datagram and connected modeGary Leshner1-8/+10
This is the prerequisite modification to the ipoib ulp to allow a rdma netdev to obtain the default ndo ops for init/uninit/open/close. This is accomplished by setting the netdev ops field within the callback function passed to the netdev allocation routine which in turn was passed into the rdma netdev allocation routine. This allows the rdma netdev to call back into the ulp to create the resources required for connected mode operation. Additionally as the ulp is not re-entrant, when switching modes, the number of real tx queues is set to 1 for the connected mode. For datagram mode the number of real tx queues is set to the actual number of tx queues specified at the netdev's allocation. For the internal ulp netdev the number of tx queues defaults to 1. It is up to the rdma netdev to specify the actual number it can support. When the driver does not support a rdma netdev for acceleration, (-ENOTSUPPORTED return code or the verbs function for allocation is NULL) the ipoib ulp functions are unaffected by using the internal netdev allocated by the ipoib ulp. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Marciniszyn <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Gary Leshner <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/hfi1: Add packet histogram trace eventGrzegorz Andrejczuk3-1/+43
Add a simple trace event taking context number and building simple histogram to print packets distribution between contexts. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Marciniszyn <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Grzegorz Andrejczuk <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/{hfi1, ipoib, rdma}: Broadcast ping sent packets which exceeded mtu sizeGary Leshner3-0/+6
When in connected mode ipoib sent broadcast pings which exceeded the mtu size for broadcast addresses. Add an mtu attribute to the rdma_netdev structure which ipoib sets to its mcast mtu size. The RDMA netdev uses this value to determine if the skb length is too long for the mtu specified and if it is, drops the packet and logs an error about the errant packet. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Marciniszyn <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Gary Leshner <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/hfi1: Activate the dummy netdevGrzegorz Andrejczuk11-342/+178
As described in earlier patches, ipoib netdev will share receive contexts with existing VNIC netdev through a dummy netdev. The following changes are made to achieve that: - Set up netdev receive contexts after user contexts. A function is added to count the available netdev receive contexts. - Add functions to set/get receive map table free index. - Rename NUM_VNIC_MAP_ENTRIES as NUM_NETDEV_MAP_ENTRIES. - Let the dummy netdev own the receive contexts instead of VNIC. - Allocate the dummy netdev when the hfi1 device is added and free it when the device is removed. - Initialize AIP RSM rules when the IpoIb rxq is initialized and remove the rules when it is de-initialized. - Convert VNIC to use the dummy netdev. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Marciniszyn <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Sadanand Warrier <[email protected]> Signed-off-by: Grzegorz Andrejczuk <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-21IB/hfi1: Add rx functions for dummy netdevGrzegorz Andrejczuk5-2/+414
This patch adds the rx functions for the dummy netdev: - Functions to allocate/free the dummy netdev. - Functions to allocate/free receiving contexts for the netdev. - Functions to initialize/de-initialize the receive queue. - Functions to enable/disable the receive queue. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Marciniszyn <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Sadanand Warrier <[email protected]> Signed-off-by: Grzegorz Andrejczuk <[email protected]> Signed-off-by: Kaike Wan <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>