|
Some variables are always assigned before they are used, so remove the
unnecessary initial assignments.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lijun Ou <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
drivers/infiniband/hw/bnxt_re/main.c:1012:25:
warning: variable ‘qplib_ctx’ set but not used [-Wunused-but-set-variable]
Fixes: f86b31c6a28f ("RDMA/bnxt_re: Static NQ depth allocation")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Currently it triggers a WARN_ON and then goes ahead and destroys the
uobject anyhow, leaking any driver memory.
The only place that leaks driver memory should be during FD close() in
uverbs_destroy_ufile_hw().
Drivers are only allowed to fail destroy uobjects if they guarantee
destroy will eventually succeed. uverbs_destroy_ufile_hw() provides the
loop to give the driver that chance.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Romain Perier <[email protected]>
Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
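The tasklet conversion described above relies on recovering the enclosing driver object from a pointer to its embedded tasklet. A minimal userspace sketch of that pattern follows. The struct tasklet_struct here is a stand-in rather than the kernel definition, and every demo_ name is hypothetical.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct tasklet_struct (illustration only). */
struct tasklet_struct {
	void (*callback)(struct tasklet_struct *t);
};

/* Hypothetical driver object that embeds its tasklet. */
struct demo_dev {
	int irq_count;
	struct tasklet_struct tasklet;
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for tasklet_setup(): only records the callback. */
static void demo_tasklet_setup(struct tasklet_struct *t,
			       void (*callback)(struct tasklet_struct *))
{
	t->callback = callback;
}

/* New-style callback: receives the tasklet pointer, not an unsigned long. */
static void demo_tasklet_fn(struct tasklet_struct *t)
{
	struct demo_dev *dev = demo_container_of(t, struct demo_dev, tasklet);

	dev->irq_count++;
}

/* Set up the tasklet, invoke its callback once, and return the counter. */
static int demo_run_once(void)
{
	struct demo_dev dev = { .irq_count = 0 };

	demo_tasklet_setup(&dev.tasklet, demo_tasklet_fn);
	dev.tasklet.callback(&dev.tasklet);
	return dev.irq_count;
}
```

Because the callback derives its context from the tasklet pointer, no separate unsigned long cookie has to be stored or cast.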
|
|
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Romain Perier <[email protected]>
Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Romain Perier <[email protected]>
Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Romain Perier <[email protected]>
Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Romain Perier <[email protected]>
Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The SRQ can be destroyed right before mlx5_cmd_get_srq is called.
In such case the latter will return NULL instead of expected SRQ.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In ucma_process_join(), if the call to xa_alloc() fails, the function will
return without freeing mc. Fix this by jumping to the correct line.
In the process I renamed the jump labels to something more memorable for
extra clarity.
Link: https://lore.kernel.org/r/[email protected]
Addresses-Coverity-ID: 1496814 ("Resource leak")
Fixes: 95fe51096b7a ("RDMA/ucma: Remove mc_list and rely on xarray")
Signed-off-by: Alex Dewar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
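The leak fix above follows the usual goto-based unwind pattern: each failure jumps to a label that releases exactly what has been acquired so far. A minimal userspace sketch, assuming a hypothetical demo_id_alloc() in place of xa_alloc() and a counter to make leaks observable:

```c
#include <stdlib.h>

/* Number of outstanding allocations, used to observe leaks. */
static int demo_live_allocs;

static void *demo_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/* Hypothetical stand-in for an ID allocation that may fail. */
static int demo_id_alloc(int should_fail)
{
	return should_fail ? -12 : 0; /* -12 mirrors -ENOMEM */
}

static int demo_process_join(int fail_id_alloc)
{
	void *mc = demo_alloc(64);
	int ret;

	if (!mc)
		return -12;

	ret = demo_id_alloc(fail_id_alloc);
	if (ret)
		goto err_free_mc; /* descriptive label names what is undone */

	/* Success path: a real caller would keep mc; freed here so the
	 * sketch itself does not leak. */
	demo_free(mc);
	return 0;

err_free_mc:
	demo_free(mc);
	return ret;
}
```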
|
|
When __ethtool_get_link_ksettings() returns SPEED_UNKNOWN, providers that
use ib_get_eth_speed() report a wrong speed and width. Fix that by
defaulting netdev_speed to SPEED_1000 when the value returned from
__ethtool_get_link_ksettings() is SPEED_UNKNOWN.
Fixes: d41861942fc5 ("IB/core: Add generic function to extract IB speed from netdev")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kamal Heib <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
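The defaulting rule above can be sketched as a small helper. This is a userspace illustration only. The two constants carry the same values as SPEED_1000 and SPEED_UNKNOWN in the ethtool UAPI header, and the function name is hypothetical.

```c
/* Same numeric values as the ethtool UAPI SPEED_1000 / SPEED_UNKNOWN. */
#define DEMO_SPEED_1000    1000
#define DEMO_SPEED_UNKNOWN (-1)

/* Fall back to 1000 Mb/s when the link speed could not be determined. */
static int demo_effective_speed(int reported_speed)
{
	if (reported_speed == DEMO_SPEED_UNKNOWN)
		return DEMO_SPEED_1000;
	return reported_speed;
}
```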
|
|
It's not safe to access the next CQ in list_for_each_entry() after
invoking ib_free_cq(), because the CQ has already been freed in the current
iteration. Use list_for_each_entry_safe() instead.
Fixes: c7ff819aefea ("RDMA/core: Introduce shared CQ pool API")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
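The idea behind the _safe iterator is to read the next pointer before the current element is freed. A minimal userspace sketch with a hand-rolled singly linked list follows; it does not use the kernel list API, and all demo_ names are hypothetical.

```c
#include <stdlib.h>

struct demo_cq {
	int id;
	struct demo_cq *next;
};

/* Build a list of n elements; stops early if allocation fails. */
static struct demo_cq *demo_build(int n)
{
	struct demo_cq *head = NULL;

	for (int i = 0; i < n; i++) {
		struct demo_cq *cq = malloc(sizeof(*cq));

		if (!cq)
			break;
		cq->id = i;
		cq->next = head;
		head = cq;
	}
	return head;
}

/* Free every element and return how many were freed. */
static int demo_free_all(struct demo_cq *head)
{
	struct demo_cq *cq, *next;
	int freed = 0;

	for (cq = head; cq; cq = next) {
		next = cq->next; /* cache the successor before freeing cq */
		free(cq);
		freed++;
	}
	return freed;
}
```

Writing the loop step as cq = cq->next would dereference memory that has already been freed, which is the bug class this commit removes.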
|
|
As qedr driver supports both RoCE and iWarp, make sure to set the
max_pkeys only when running in RoCE mode.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kamal Heib <[email protected]>
Acked-by: Michal Kalderon <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This function has a lot of gotos which could be replaced by simple
returns, making the function tidier and less bug prone.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Dewar <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Commit 36a8f01cd24b ("IB/qib: Add congestion control agent
implementation") erroneously marked a couple of switch cases as /*
FALLTHROUGH */, which were later converted to fallthrough statements by
commit df561f6688fe ("treewide: Use fallthrough pseudo-keyword"). This
triggered a Coverity warning about unreachable code.
Remove the fallthrough statements.
Link: https://lore.kernel.org/r/[email protected]
Addresses-Coverity: ("Unreachable code")
Fixes: 36a8f01cd24b ("IB/qib: Add congestion control agent implementation")
Signed-off-by: Alex Dewar <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Required due to dependencies in following patches.
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Change rxe pools to use kzalloc instead of kmem_cache to allocate memory
for rxe objects. The pools are not really necessary and they trigger
hardened user copy warnings as the ioctl framework copies the QP number
directly to userspace.
Also the general project to move object allocation to the core code will
eventually clean these out anyhow.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Add SPDX headers to all rxe .c and .h files.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Drivers that fail destroy can cause uverbs to leak uobjects. Drivers are
required to always eventually destroy their uobjects, so trigger a WARN_ON
to detect this driver bug.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Leon Romanovsky <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The UDP source port number in RoCE v2 is used to create entropy for
network routers (ECMP), load balancers and 802.3ad link aggregation
switching that are not aware of RoCE IB headers. Now that the IB core
provides an interface to get a hashed value for it, the hns driver no
longer needs a fixed value in the QPC and UD WQE, and the port number is
set dynamically.
For the QPC of RC, the value is hashed from the flow_label if the user
passes one in, or from the remote qpn and local qpn otherwise. For the WQE
of UD, it is set according to fl or to a random value.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
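A minimal userspace sketch of how a RoCE v2 UDP source port can be derived from a 20-bit flow label, or from the two QPNs when no flow label is supplied. The folding steps and the 0xC000 lower bound are assumptions modeled on the IB core helpers, not a copy of them, and all demo_ names are hypothetical.

```c
#include <stdint.h>

#define DEMO_FLOWLABEL_MASK 0x000FFFFFu /* flow label is 20 bits wide */
#define DEMO_UDP_SPORT_MIN  0xC000u     /* assumed lower bound of the port range */

/* Mix local and remote QPN into a 20-bit pseudo flow label. */
static uint32_t demo_calc_flow_label(uint32_t lqpn, uint32_t rqpn)
{
	uint64_t v = (uint64_t)lqpn * rqpn;

	v ^= v >> 20;
	v ^= v >> 40;
	return (uint32_t)(v & DEMO_FLOWLABEL_MASK);
}

/* Fold the 20-bit flow label into 14 bits and map it into the port range. */
static uint16_t demo_flow_label_to_udp_sport(uint32_t fl)
{
	uint32_t fl_low = fl & 0x03FFFu;
	uint32_t fl_high = fl & 0xFC000u;

	fl_low ^= fl_high >> 14;
	return (uint16_t)(fl_low | DEMO_UDP_SPORT_MIN);
}
```

With these definitions every result lies in 0xC000..0xFFFF, so flows with different labels spread across that range instead of sharing one fixed port.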
|
|
Fixed several minor checkpatch warnings in existing rxe source.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
It should be considered an illegal operation if the ULP attempts to modify
a QP from a state other than the current hardware state. Otherwise, the ULP
can modify some fields of QPC at any time. For example, for a QP in the RTS
state, modifying it from RTR to RTS can change the PSN, which is never
expected.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
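The check described above can be sketched as follows: when the caller states a current state, it must match what the hardware actually holds. This is a userspace illustration under assumed names; the enum, mask bit, and structures are hypothetical, not the verbs definitions.

```c
#include <errno.h>

enum demo_qp_state {
	DEMO_QPS_RESET,
	DEMO_QPS_INIT,
	DEMO_QPS_RTR,
	DEMO_QPS_RTS,
};

#define DEMO_QP_CUR_STATE 0x1 /* hypothetical "cur_qp_state is valid" bit */

struct demo_qp {
	enum demo_qp_state hw_state; /* state the hardware is really in */
};

struct demo_qp_attr {
	enum demo_qp_state cur_qp_state; /* state the caller claims */
	enum demo_qp_state qp_state;     /* state the caller requests */
};

static int demo_modify_qp(struct demo_qp *qp,
			  const struct demo_qp_attr *attr, int attr_mask)
{
	/* Reject a transition whose claimed source state is not the real one. */
	if ((attr_mask & DEMO_QP_CUR_STATE) &&
	    attr->cur_qp_state != qp->hw_state)
		return -EINVAL;

	qp->hw_state = attr->qp_state;
	return 0;
}
```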
|
|
The driver crashes when destroy_qp is retried after returning an error.
This is because the qp entry was removed from the qp list during
the first call.
Remove qp from the list only if destroy_qp returns success.
The driver will still trigger a WARN_ON due to the memory leaking, but at
least it isn't corrupting memory too.
Fixes: 8dae419f9ec7 ("RDMA/bnxt_re: Refactor queue pair creation code")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
When computing the first psn entry, the driver checks for page alignment. If
the address is not page aligned, it attempts to compute the offset in that
page for later use by using the ALIGN macro. The ALIGN macro does not return
the offset in bytes but the requested aligned address, and hence cannot be
used directly as an offset. Since the driver was using the address itself
instead of the offset, it produced an invalid address when filling the psn
buffer.
Fix the driver to use the PAGE_MASK macro to calculate the offset.
Fixes: fddcbbb02af4 ("RDMA/bnxt_re: Simplify obtaining queue entry from hw ring")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Naresh Kumar PBS <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
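The difference between rounding an address up and extracting its in-page offset can be shown with a short userspace sketch. It assumes a 4096-byte page, and the demo_ macros only mirror the shape of the kernel's ALIGN and PAGE_MASK.

```c
/* Assumed page size for the illustration. */
#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Round x up to the next multiple of a (a must be a power of two). */
#define DEMO_ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Returns an aligned ADDRESS, which is not an offset. */
static unsigned long demo_align_up(unsigned long addr)
{
	return DEMO_ALIGN(addr, DEMO_PAGE_SIZE);
}

/* Returns the byte OFFSET of addr within its page. */
static unsigned long demo_page_offset(unsigned long addr)
{
	return addr & ~DEMO_PAGE_MASK;
}
```

For the address 0x12345, rounding up gives 0x13000 while the in-page offset is 0x345, so storing the first value where the second is expected points well outside the intended buffer position.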
|
|
Some adapters report more than 256 gid entries. Restrict it to 256 for
now.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Naresh Kumar PBS <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
At first, the driver allocates memory for the NQ based on
qplib_ctx->cq_count and qplib_ctx->srqc_count. Later, when creating the
ring, it uses a static value of 128K - 1.
Fix this by using a static value for now.
Fixes: b08fe048a69d ("RDMA/bnxt_re: Refactor net ring allocation function")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Naresh Kumar PBS <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
qp->id can be a value outside the max number of qp. Indexing the qp table
with the id can cause an out-of-bounds crash. So change the qp table
indexing to (qp->id % max_qp -1).
Allocate one extra entry for QP1. Some adapters create one more than the
max_qp requested to accommodate QP1. If the qp->id is 1, store the
information in the last entry of the qp table.
Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
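A minimal userspace sketch of the table indexing described above. It assumes the expression is read as qp_id % (max_qp - 1), that the table holds max_qp + 1 entries, and that max_qp is greater than 1. The function names are hypothetical.

```c
/* Table size: one extra slot beyond max_qp, reserved for QP1. */
static unsigned int demo_qp_tbl_size(unsigned int max_qp)
{
	return max_qp + 1;
}

/* Map a QP id to a table index that is always inside the table. */
static unsigned int demo_qp_tbl_index(unsigned int qp_id, unsigned int max_qp)
{
	/* QP1 always uses the last entry of the table. */
	if (qp_id == 1)
		return demo_qp_tbl_size(max_qp) - 1;

	/* Assumed reading of the modulo: wrap into the leading entries. */
	return qp_id % (max_qp - 1);
}
```

Under these assumptions the wrapped indices can never collide with the slot kept for QP1.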
|
|
QP1 Rx CQE reports transparent VLAN ID in the completion and this is used
while reporting the completion for received MAD packet. Check if the vlan
id is configured before reporting it in the work completion.
Fixes: 84511455ac5b ("RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
If the pkey_table is not available (which is the case when RoCE is not
supported), the cited commit caused a regression where mlx4_devices
without RoCE are not created.
Fix this by returning a pkey table length of zero in procedure
eth_link_query_port() if the pkey-table length reported by the device is
zero.
Link: https://lore.kernel.org/r/[email protected]
Cc: <[email protected]>
Fixes: 1901b91f9982 ("IB/core: Fix potential NULL pointer dereference in pkey cache")
Fixes: fa417f7b520e ("IB/mlx4: Add support for IBoE")
Signed-off-by: Mark Bloch <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
To avoid the following kernel panic when calling kmem_cache_create() with
a NULL pointer from pool_cache(), block rxe_param_set_add() from
running if the rdma_rxe module is not initialized.
BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
Call Trace:
rxe_alloc+0xc8/0x160 [rdma_rxe]
rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
__ib_alloc_pd+0xcb/0x160 [ib_core]
ib_mad_init_device+0x296/0x8b0 [ib_core]
add_client_context+0x11a/0x160 [ib_core]
enable_device_and_get+0xdc/0x1d0 [ib_core]
ib_register_device+0x572/0x6b0 [ib_core]
? crypto_create_tfm+0x32/0xe0
? crypto_create_tfm+0x7a/0xe0
? crypto_alloc_tfm+0x58/0xf0
rxe_register_device+0x19d/0x1c0 [rdma_rxe]
rxe_net_add+0x3d/0x70 [rdma_rxe]
? dev_get_by_name_rcu+0x73/0x90
rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
parse_args+0x179/0x370
? ref_module+0x1b0/0x1b0
load_module+0x135e/0x17e0
? ref_module+0x1b0/0x1b0
? __do_sys_init_module+0x13b/0x180
__do_sys_init_module+0x13b/0x180
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7f9137ed296e
This can be triggered if a user tries to use the 'module option' which is
not actually a real module option but some idiotic (and thankfully now
obsolete) sysfs interface.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kamal Heib <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
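The guard described above amounts to refusing the parameter handler until module initialization has completed. A minimal userspace sketch follows; the flag, the function names, and the choice of -EAGAIN as the error code are assumptions for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Set once initialization (pools, caches, and so on) has finished. */
static bool demo_initialized;
static int demo_devices_added;

static void demo_module_init(void)
{
	demo_initialized = true;
}

/* Hypothetical stand-in for a "set add" parameter handler. */
static int demo_param_set_add(const char *ifname)
{
	/* Refuse to touch uninitialized state when called too early. */
	if (!demo_initialized)
		return -EAGAIN;

	if (ifname == NULL || ifname[0] == '\0')
		return -EINVAL;

	demo_devices_added++;
	return 0;
}
```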
|
|
Now that the query_pkey() isn't mandatory by the RDMA core, this callback
can be removed from the usnic provider. The libfabric userspace never
touches the pkey.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kamal Heib <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
When page_address() fails, umem should be freed just like when
rxe_mem_alloc() fails.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dinghao Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Use cancel_work_sync() to ensure that the wq is not running and simply
assign NULL to ctx->cm_id to indicate if the work ran or not. Delete the
close_wq since flush_workqueue() is no longer needed.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
When a new connection is established the RDMA CM creates a new cm_id and
passes it through to the event handler. However inside the UCMA the new ID
is not assigned a ucma_context until the user retrieves the event from a
syscall.
This creates a weird edge condition where a cm_id's context can continue
to point at the listening_id that created it, and a number of additional
edge conditions on event list clean up related to destroying half created
IDs.
There is also a race condition in ucma_get_events() where the
cm_id->context is being assigned without holding the handler_mutex.
Simplify all of this by creating the ucma_context inside the event handler
itself and eliminating the edge case of a half created cm_id. All cm_id's
can be uniformly destroyed via __destroy_id() or via the close_work.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Since the backlog is now an atomic, the file->mut is now only protecting
the event_list and ctx_list. Narrow its scope to make that clear.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
There is no reason to grab the file->mut just to do this inc/dec work. Use
an atomic.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
All entry points to the rdma_cm from a ULP must be single threaded,
even this error unwind. Add the missing locking.
Fixes: 7c11910783a1 ("RDMA/ucma: Put a lock around every call to the rdma_cm layer")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This value is locked under the file->mut, ensure it is held whenever
touching it.
The case in ucma_migrate_id() is a race, while in ucma_free_uctx() it is
already not possible for the write side to run, the movement is just for
clarity.
Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
ctx->file is changed under the file->mut lock by ucma_migrate_id(), which
is impossible to lock correctly. Instead change ctx->file under the
handler_lock and ctx_table lock and revise all places touching ctx->file
to use this locking when reading ctx->file.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The only reader of destroying is inside a handler under the handler_mutex,
so directly use the handler_mutex when setting it instead of the larger
file->mut.
As the refcount could be zero here, and the cm_id already freed, an
additional refcount grab around the locking is required to touch the
cm_id.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In almost all cases rdma_accept() is called under the handler_mutex by
ULPs from their handler callbacks. The one exception was ucma which did
not get the handler_mutex.
To improve the understandability of the locking scheme, obtain the mutex
for ucma as well.
This improves how ucma works by allowing it to directly use handler_mutex
for some of its internal locking against the handler callbacks instead of
the global file->mut lock.
There does not seem to be a serious bug here, other than that a DISCONNECT
event can be delivered concurrently with accept succeeding.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
It is not really necessary to keep a linked list of mcs associated with
each context when we can just scan the xarray to find the right things.
This removes another overloading of file->mut by relying on the xarray
locking for mc instead.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The store to ctx->cm_id was based on the idea that _ucma_find_context()
would not return the ctx until it was fully set up.
Without locking this doesn't work properly.
Split things so that the xarray is allocated with NULL to reserve the ID
and once everything is final set the cm_id and store.
Along the way this shows that the error unwind in ucma_get_event() if a
new ctx is created is wrong, fix it up.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
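The reserve-then-publish scheme described above can be sketched with a plain array standing in for the xarray: the ID is reserved with a NULL entry, and the pointer only becomes visible to lookups once the object is complete. This is a single-threaded userspace illustration with hypothetical names and no locking.

```c
#include <stddef.h>

#define DEMO_TABLE_SIZE 8

struct demo_ctx {
	int id;
	void *cm_id;
};

struct demo_table {
	int reserved[DEMO_TABLE_SIZE];            /* slot is taken */
	struct demo_ctx *entry[DEMO_TABLE_SIZE];  /* NULL until published */
};

/* Reserve a free ID without making any object visible; -1 if full. */
static int demo_reserve(struct demo_table *t)
{
	for (int i = 0; i < DEMO_TABLE_SIZE; i++) {
		if (!t->reserved[i]) {
			t->reserved[i] = 1;
			t->entry[i] = NULL;
			return i;
		}
	}
	return -1;
}

/* Publish the fully initialized object under its reserved ID. */
static void demo_publish(struct demo_table *t, int id, struct demo_ctx *ctx)
{
	t->entry[id] = ctx;
}

/* Lookups see NULL for reserved-but-unpublished IDs. */
static struct demo_ctx *demo_find(const struct demo_table *t, int id)
{
	if (id < 0 || id >= DEMO_TABLE_SIZE)
		return NULL;
	return t->entry[id];
}
```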
|
|
ucma_close() is open coding the tail end of ucma_destroy_id(), consolidate
this duplicated code into a function.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
During the file_operations release function it is already not possible
that write() can be running concurrently, remove the extra locking
around the ctx_list.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Both ucma_destroy_id() and ucma_close_id() (triggered from an event via a
wq) can drive the refcount to zero. ucma_get_ctx() was wrongly assuming
that the refcount can only go to zero from ucma_destroy_id() which also
removes it from the xarray.
Use refcount_inc_not_zero() instead.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
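The semantics of an increment that refuses to resurrect a zero refcount can be sketched in userspace with C11 atomics. This is not the kernel's refcount_t implementation, only an illustration of the compare-and-swap loop behind the idea; the function name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still alive (count != 0). */
static bool demo_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false; /* count already hit zero: object is being destroyed */
}
```

A lookup that gets false back must treat the entry as not found, since another path has already dropped the last reference.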
|
|
When DCT QPs work in RoCE LAG mode:
1. DCT creation is allowed only when it is supported
2. The "port" of a DCT QP is assigned in a round-robin way
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Zhang <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
DCI QP supports tx_affinity as well.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Zhang <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Surface the operation of MAD exchanges during connection
establishment. Some samples:
[root@klimt ~]# trace-cmd report -F ib_cma
cpus=4
kworker/0:4-123 [000] 60.677388: icm_send_rep: local_id=1965336542 remote_id=1096195961 state=REQ_RCVD lap_state=LAP_UNINIT
kworker/u8:11-391 [002] 60.678808: icm_send_req: local_id=1982113758 remote_id=0 state=IDLE lap_state=LAP_UNINIT
kworker/0:4-123 [000] 60.679652: icm_send_rtu: local_id=1982113758 remote_id=1079418745 state=REP_RCVD lap_state=LAP_UNINIT
nfsd-1954 [001] 60.691350: icm_send_rep: local_id=1998890974 remote_id=1129750393 state=MRA_REQ_SENT lap_state=LAP_UNINIT
nfsd-1954 [003] 62.017931: icm_send_drep: local_id=1998890974 remote_id=1129750393 state=TIMEWAIT lap_state=LAP_UNINIT
Link: https://lore.kernel.org/r/159767240197.2968.12048458026453596018.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In the interest of converging on a common instrumentation infrastructure,
modernize the pr_debug() call sites added by commit 119bf81793ea ("IB/cm:
Add debug prints to ib_cm"). The new tracepoints appear in a new "ib_cma"
subsystem.
The conversion is somewhat mechanical. Someone more familiar with the
semantics of the recorded information might suggest additional data
capture.
Some benefits include:
- Tracepoints enable "always on" reporting of these errors
- The error records are structured and compact
- Tracepoints provide hooks for eBPF scripts
Sample output:
nfsd-1954 [003] 62.017901: icm_dreq_skipped: local_id=1998890974 remote_id=1129750393 state=DREQ_RCVD lap_state=LAP_UNINIT
Link: https://lore.kernel.org/r/159767239665.2968.10613294222688696646.stgit@klimt.1015granger.net
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|