Age | Commit message (Collapse) | Author | Files | Lines |
|
When connected mode is set, and we have connected and datagram traffic in
parallel, ipoib might crash with double free of datagram skb.
The current mechanism assumes that the order in the completion queue is
the same as the order of sent packets for all QPs. Order is kept only for
specific QP, in case of mixed UD and CM traffic we have few QPs (one UD and
few CM's) in parallel.
The problem:
----------------------------------------------------------
Transmit queue:
-----------------
UD skb pointer kept in queue itself, CM skb kept in spearate queue and
uses transmit queue as a placeholder to count the number of total
transmitted packets.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127
------------------------------------------------------------
NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
------------------------------------------------------------
^ ^
tail head
Completion queue (problematic scenario) - the order not the same as in
the transmit queue:
1 2 3 4 5 6 7 8 9
------------------------------------
ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5
------------------------------------
1. CM1 'wc' processing
- skb freed in cm separate ring.
- tx_tail of transmit queue increased although UD2 is not freed.
Now driver assumes UD2 index is already freed and it could be used for
new transmitted skb.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127
------------------------------------------------------------
NL NL UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
------------------------------------------------------------
^ ^ ^
(Bad)tail head
(Bad - Could be used for new SKB)
In this case (due to heavy load) UD2 skb pointer could be replaced by new
transmitted packet UD_NEW, as the driver assumes its free. At this point
we will have to process two 'wc' with same index but we have only one
pointer to free.
During second attempt to free the same skb we will have NULL pointer
exception.
2. UD2 'wc' processing
- skb freed according the index we got from 'wc', but it was already
overwritten by mistake. So actually the skb that was released is the
skb of the new transmitted packet and not the original one.
3. UD_NEW 'wc' processing
- attempt to free already freed skb. NUll pointer exception.
The fix:
-----------------------------------------------------------------------
The fix is to stop using the UD ring as a placeholder for CM packets, the
cyclic ring variables tx_head and tx_tail will manage the UD tx_ring, a
new cyclic variables global_tx_head and global_tx_tail are introduced for
managing and counting the overall outstanding sent packets, then the send
queue will be stopped and waken based on these variables only.
Note that no locking is needed since global_tx_head is updated in the xmit
flow and global_tx_tail is updated in the NAPI flow only. A previous
attempt tried to use one variable to count the outstanding sent packets,
but it did not work since xmit and NAPI flows can run at the same time and
the counter will be updated wrongly. Thus, we use the same simple cyclic
head and tail scheme that we have today for the UD tx_ring.
Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Valentine Fatiev <[email protected]>
Signed-off-by: Alaa Hleihel <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
After users sets the ECE option, FW will return the agreed/supported bits
through an output structures of modify QP stages for regular QPs or
through create QP for the DCT.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mark Zhang <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The most common way to set ECE option will be during modify QP command in
INIT2RTR, RTR2RTS and RTS2RTS stages, so update mlx5 to support it.
The new bit in the comp_mask is needed to mark that kernel supports ECE
and can receive data instead of "reserved" field in the struct
mlx5_ib_modify_qp.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mark Zhang <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Instead of hand crafted mlx5_qp_context and mlx5_qp_path use common
MLX5_SET() macros.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Maor Gottlieb <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
As a preparation to removal hand crafted mlx5_qp_context, convert
query_qp_attr() to use proper MLX5_GET() macros.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mark Zhang <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
As a preparation to removal hand crafted mlx5_qp_context, convert counter
code to use mlx5_cmd_exec_in() directly.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mark Zhang <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Allow users to ask creation of QPs with specific ECE options. Such early
set even before RDMA-CM connection is established is useful if user knows
exactly which option he needs.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mark Zhang <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Supported ECE options are returned from FW in the create_qp phase and zero
means that field is not valid. Such default value allows us to reuse
reserved field without worries about comp_mask.
Update create QP API to return ECE options.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mark Zhang <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
IBTA declares "vendor option not supported" reject reason in REJ messages
if passive side doesn't want to accept proposed ECE options.
Due to the fact that ECE is managed by userspace, there is a need to let
users to provide such rejected reason.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The rdma_accept() is called by both passive and active sides of CMID
connection to mark readiness to start data transfer. For passive side,
this is called explicitly, for active side, it is called implicitly while
receiving REP message.
Provide ECE data to rdma_accept function needed for passive side to send
that REP message.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
ECE parameters are exchanged through REQ->REP/SIDR_REP messages, this
patch adds the data to provide to other side of CMID communication
channel.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Passive side of CMID connection receives ECE request through REQ message
and needs to respond with relevant REP message which will be forwarded to
active side.
The UCMA events interface is responsible for such communication with the
user space (librdmacm). Extend it to provide ECE wire data.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Active side of CMID initiates connection through librdmacm's
rdma_connect() and kernel's ucma_connect(). Extend UCMA interface to
handle those new parameters.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
From the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Required for dependencies in following patches
* branch 'mellanox/mlx5-next':
net/mlx5: Add ability to read and write ECE options
net/mlx5: Add support for RDMA TX FT headers modifying
net/mlx5: Move iseg access helper routines close to mlx5_core driver
net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The commit citied in the Fixes line wasn't complete and solved
only part of the problems. Update the mlx5_ib to properly support
MLX5_CMD_OP_INIT2INIT_QP command in the DEVX, that is required when
modify the QP tx_port_affinity.
Fixes: 819f7427bafd ("RDMA/mlx5: Add init2init as a modify command")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Zhang <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Fix use after free when user user space request uobject concurrently for
the same object, within the RCU grace period.
In that case, remove_handle_idr_uobject() is called twice and we will have
an extra put on the uobject which cause use after free. Fix it by leaving
the uobject write locked after it was removed from the idr.
Call to rdma_lookup_put_uobject with UVERBS_LOOKUP_DESTROY instead of
UVERBS_LOOKUP_WRITE will do the work.
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 1381 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 1381 Comm: syz-executor.0 Not tainted 5.5.0-rc3 #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x94/0xce
panic+0x234/0x56f
__warn+0x1cc/0x1e1
report_bug+0x200/0x310
fixup_bug.part.11+0x32/0x80
do_error_trap+0xd3/0x100
do_invalid_op+0x31/0x40
invalid_op+0x1e/0x30
RIP: 0010:refcount_warn_saturate+0xfe/0x1a0
Code: 0f 0b eb 9b e8 23 f6 6d ff 80 3d 6c d4 19 03 00 75 8d e8 15 f6 6d ff 48 c7 c7 c0 02 55 bd c6 05 57 d4 19 03 01 e8 a2 58 49 ff <0f> 0b e9 6e ff ff ff e8 f6 f5 6d ff 80 3d 42 d4 19 03 00 0f 85 5c
RSP: 0018:ffffc90002df7b98 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88810f6a193c RCX: ffffffffba649009
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88811b0283cc
RBP: 0000000000000003 R08: ffffed10236060e3 R09: ffffed10236060e3
R10: 0000000000000001 R11: ffffed10236060e2 R12: ffff88810f6a193c
R13: ffffc90002df7d60 R14: 0000000000000000 R15: ffff888116ae6a08
uverbs_uobject_put+0xfd/0x140
__uobj_perform_destroy+0x3d/0x60
ib_uverbs_close_xrcd+0x148/0x170
ib_uverbs_write+0xaa5/0xdf0
__vfs_write+0x7c/0x100
vfs_write+0x168/0x4a0
ksys_write+0xc8/0x200
do_syscall_64+0x9c/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465b49
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f759d122c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000073bfa8 RCX: 0000000000465b49
RDX: 000000000000000c RSI: 0000000020000080 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f759d1236bc
R13: 00000000004ca27c R14: 000000000070de40 R15: 00000000ffffffff
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: 0x39400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Fixes: 7452a3c745a2 ("IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Make use of the sizeof_field() helper instead of an open-coded version.
Link: https://lore.kernel.org/r/20200527144152.GA22605@embeddedor
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
can_sleep is always 0 when iboib_mcast_alloc() is called, so remove it and
use GFP_ATOMIC instead of GFP_KERNEL.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kamal Heib <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Remove device specific debugfs entries immediately if LLD detaches a
particular ULD device in case of fatal PCI errors.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Potnuri Bharat Teja <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In function pvrdma_pci_probe(), pdev was not disabled in one error
path. Thus replace the jump target “err_free_device” by
"err_disable_pdev".
Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Qiushi Wu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Instead of i with the sge number of wr will make the comparision more
clear, that is, when the sge number in wr is small than the maximum
supported sge number in the queue, then a stop sge needed to be filled at
the end of sges in wr.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yixian Liu <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Set hns_roce_v2_cq_set_ci to inline type and remove unnecessary
next_cqe_sw_v2().
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The redundant parameters "hr_dev" need to be removed from
free_kernel_wrid() and free_srq_wrid().
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wenpeng Liang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
There is no need to do a type cast on genernal pointers, they could be
assigned to any type of variables. In addition, optimize initialization of
some variables and adjust order of them.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Currently, the MTR region is configed before hns_roce_mtr_map() is
invoked, but in some scenarios, the region is configed at MTR creation,
the caller need to store this config and call hns_roce_mtr_map() later. So
optimize the usage by wrapping the MTR region config into MTR.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
configure
Split the code related to WQE buffer configure from the QPC filling
process into two functions: config_qp_sq_buf() and config_qp_rq_buf(),
this will make the code more readable.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Number of sge/eqe is always non-negative, they should be defined in type
of unsigned.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
page_shift is used to calculate the page size, it's always non-negative,
and should be in type of unsigned.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Rename the function related to QP buffer to make the code more readable.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The codes related to assert are no longer used and need to be deleted.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yangyang Li <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Add unlikely() and likely() to optimize main I/O process code.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
It's easier to understand and maintain enable flags of cq using a single
field in type of u32 than defining a field for every flags in the
structure hns_roce_cq, and we can add new flags for features more
conveniently in the future.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The hardware can truncate PI/CI when posting or polling, the driver does
not need to do truncation. Therefore keep the software's PI/CI consistent
with it in the hardware.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The macros do_each_path/while_each_path lead to a smatch warning:
drivers/infiniband/ulp/rtrs/rtrs-clt.c:1196 rtrs_clt_failover_req() warn: inconsistent indenting
drivers/infiniband/ulp/rtrs/rtrs-clt.c:2890 rtrs_clt_request() warn: inconsistent indenting
Also checkpatch complains:
ERROR: Macros with multiple statements should be enclosed in a do - while loop
The macros are used only in two places: for a normal IO path and for the
failover path triggered after errors.
Get rid of the macros and just use a for loop iterating over the list of
paths in both places. It is easier to read and also less lines of code.
Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Danil Kipnis <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The rtrs_sess structure has already been extracted above from the
rtrs_srv_sess structure. Use that to avoid redundant dereferencing.
Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Md Haris Iqbal <[email protected]>
Acked-by: Danil Kipnis <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The allocated ports structure in never freed. The free function should be
called by release_cma_ports_group, but the group is never released since
we don't remove its default group.
Remove default groups when device group is deleted.
Fixes: 045959db65c6 ("IB/cma: Add configfs for rdma_cm")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
q_deferred_work isn't initialized when creating an explicit ODP memory
region. This can lead to a NULL pointer dereference when user performs
asynchronous prefetch MR. Fix it by initializing q_deferred_work for
explicit ODP.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 4 PID: 6074 Comm: kworker/u16:6 Not tainted 5.7.0-rc1-for-upstream-perf-2020-04-17_07-03-39-64 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib]
RIP: 0010:__wake_up_common+0x49/0x120
Code: 04 89 54 24 0c 89 4c 24 08 74 0a 41 f6 01 04 0f 85 8e 00 00 00 48 8b 47 08 48 83 e8 18 4c 8d 67 08 48 8d 50 18 49 39 d4 74 66 <48> 8b 70 18 31 db 4c 8d 7e e8 eb 17 49 8b 47 18 48 8d 50 e8 49 8d
RSP: 0000:ffffc9000097bd88 EFLAGS: 00010082
RAX: ffffffffffffffe8 RBX: ffff888454cd9f90 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff888454cd9f90
RBP: ffffc9000097bdd0 R08: 0000000000000000 R09: ffffc9000097bdd0
R10: 0000000000000000 R11: 0000000000000001 R12: ffff888454cd9f98
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000044c19e002 CR4: 0000000000760ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
__wake_up_common_lock+0x7a/0xc0
destroy_prefetch_work+0x5a/0x60 [mlx5_ib]
mlx5_ib_prefetch_mr_work+0x64/0x80 [mlx5_ib]
process_one_work+0x15b/0x360
worker_thread+0x49/0x3d0
kthread+0xf5/0x130
? rescuer_thread+0x310/0x310
? kthread_bind+0x10/0x10
ret_from_fork+0x1f/0x30
Fixes: de5ed007a03d ("IB/mlx5: Fix implicit ODP race")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Introduce create/destroy QP commands over the ioctl interface to let it
be extended to get an asynchronous event FD.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Introduce create/destroy WQ commands over the ioctl interface to let it
be extended to get an asynchronous event FD.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Introduce create/destroy SRQ commands over the ioctl interface to let it
be extended to get an asynchronous event FD.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Extend CQ to get its own asynchronous event FD.
The event FD is an optional attribute, in case wasn't given the ufile
event FD will be used.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Refactor related objects to use their own asynchronous event FD.
The ufile event FD will be the default in case an object won't have its own
event FD.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
While creating a uobject every create reaches a point where the uobject is
fully initialized. For ioctls that go on to copy_to_user this means they
need to open code the destruction of a fully created uobject - ie the
RDMA_REMOVE_DESTROY sort of flow.
Open coding this creates bugs, eg the CQ does not properly flush the
events list when it does its error unwind.
Provide a uverbs_finalize_uobj_create() function which indicates that the
uobject is fully initialized and that abort should call to destroy_hw to
destroy the uobj->object and related.
Methods can call this function if they go on to have error cases after
setting uobj->object. Once done those error cases can simply do return,
without an error unwind.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Linux 5.7-rc6
Conflict in drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
resolved by deleting dr_cq_event, matching how netdev resolved it.
Required for dependencies in the following patches.
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This patch hooks the transmit side of the datagram netdev with
ipoib by setting the rdma_netdev_get_params function for the
hfi1 ib_device_ops structue. It also enables the receiving side
by adding the AIP capability into the default capabilities.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Piotr Stankiewicz <[email protected]>
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This is the prerequisite modification to the ipoib ulp to allow a
rdma netdev to obtain the default ndo ops for init/uninit/open/close.
This is accomplished by setting the netdev ops field within the
callback function passed to the netdev allocation routine which
in turn was passed into the rdma netdev allocation routine.
This allows the rdma netdev to call back into the ulp to create the
resources required for connected mode operation.
Additionally as the ulp is not re-entrant, when switching modes,
the number of real tx queues is set to 1 for the connected mode.
For datagram mode the number of real tx queues is set to the
actual number of tx queues specified at the netdev's allocation.
For the internal ulp netdev the number of tx queues defaults to 1.
It is up to the rdma netdev to specify the actual number it can support.
When the driver does not support a rdma netdev for acceleration,
(-ENOTSUPPORTED return code or the verbs function for allocation is
NULL) the ipoib ulp functions are unaffected by using the internal
netdev allocated by the ipoib ulp.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Gary Leshner <[email protected]>
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Add a simple trace event taking context number and building simple
histogram to print packets distribution between contexts.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Grzegorz Andrejczuk <[email protected]>
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
When in connected mode ipoib sent broadcast pings which exceeded the mtu
size for broadcast addresses.
Add an mtu attribute to the rdma_netdev structure which ipoib sets to its
mcast mtu size.
The RDMA netdev uses this value to determine if the skb length is too long
for the mtu specified and if it is, drops the packet and logs an error
about the errant packet.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Gary Leshner <[email protected]>
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
As described in earlier patches, ipoib netdev will share receive
contexts with existing VNIC netdev through a dummy netdev. The
following changes are made to achieve that:
- Set up netdev receive contexts after user contexts. A function is
added to count the available netdev receive contexts.
- Add functions to set/get receive map table free index.
- Rename NUM_VNIC_MAP_ENTRIES as NUM_NETDEV_MAP_ENTRIES.
- Let the dummy netdev own the receive contexts instead of VNIC.
- Allocate the dummy netdev when the hfi1 device is added and free it
when the device is removed.
- Initialize AIP RSM rules when the IpoIb rxq is initialized and
remove the rules when it is de-initialized.
- Convert VNIC to use the dummy netdev.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Sadanand Warrier <[email protected]>
Signed-off-by: Grzegorz Andrejczuk <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This patch adds the rx functions for the dummy netdev:
- Functions to allocate/free the dummy netdev.
- Functions to allocate/free receiving contexts for the netdev.
- Functions to initialize/de-initialize the receive queue.
- Functions to enable/disable the receive queue.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Sadanand Warrier <[email protected]>
Signed-off-by: Grzegorz Andrejczuk <[email protected]>
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|