Age | Commit message (Collapse) | Author | Files | Lines |
|
ucmd in hns_roce_create_qp_common() are not initialized. But it works fine
until new member sdb_addr is added to struct hns_roce_ib_create_qp.
If the user-mode driver uses an old version ABI, then the value of the new
member will be undefined after ib_copy_from_udata().
This patch fixes it by initialize this variable to 0. And the default value
of the new member sdb_addr will be 0 which is invalid.
Fixes: 0425e3e6e0c7 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Signed-off-by: Chengchang Tang <[email protected]>
Signed-off-by: Junxian Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
The current driver will print all asynchronous events. Some of the
print levels are set improperly, e.g. SRQ limit reach and SRQ last
wqe reach, which may also occur during normal operation of the software.
Currently, the information of these event is printed as a warning,
which causes a large amount of printing even during normal use of the
application. As a result, the service performance deteriorates.
This patch fixes the printing storms by modifying the print level.
Fixes: b00a92c8f2ca ("RDMA/hns: Move all prints out of irq handle")
Signed-off-by: Chengchang Tang <[email protected]>
Signed-off-by: Junxian Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Add netlink command that enables/disables privileged QKEY by default.
It is disabled by default, since according to IB spec only privileged
users are allowed to use privileged QKEY.
According to the IB specification rel-1.6, section 3.5.3:
"QKEYs with the most significant bit set are considered controlled
QKEYs, and a HCA does not allow a consumer to arbitrarily specify a
controlled QKEY."
Using rdma tool,
$rdma system set privileged-qkey on
When enabled non-privileged users would be able to use
controlled QKEYs which are considered privileged.
Using rdma tool,
$rdma system set privileged-qkey off
When disabled only privileged users would be able to use
controlled QKEYs.
You can also use the command below to check the parameter state:
$rdma system show
netns shared privileged-qkey off copy-on-fork on
Signed-off-by: Patrisious Haddad <[email protected]>
Link: https://lore.kernel.org/r/90398be70a9d23d2aa9d0f9fd11d2c264c1be534.1696848201.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Convert to using the new inode timestamp accessor functions.
Signed-off-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
In the SRQ notification handler, do not report the SRQ_ERROR
in the default event case, as there was no error.
Signed-off-by: Chandramohan Akula <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Report QP, SRQ and CQ async events and errors.
Signed-off-by: Chandramohan Akula <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Updating the HW structures for the affiliated event and error
reporting. Newly added interface structures will be used in the
followup patch.
Signed-off-by: Chandramohan Akula <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Previously when we had a RAW QP, we bound a counter to it when it moved
to INIT state, using the counter context inside RQC.
But when we try to modify that counter later in RTS state we used
modify QP which tries to change the counter inside QPC instead of RQC.
Now we correctly modify the counter set_id inside of RQC instead of QPC
for the RAW QP.
Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter")
Signed-off-by: Patrisious Haddad <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Link: https://lore.kernel.org/r/2e5ab6713784a8fe997d19c508187a0dfecf2dfc.1696847964.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
In some cases, like with multiple LUN targets or where the target has to
respond to transport level requests from the receiving context it can be
better to defer cmd submission to a helper thread. If the backend driver
blocks on something like request/tag allocation it can block the entire
target submission path and other LUs and transport IO on that session.
In other cases like single LUN targets with storage that can support all
the commands that the target can queue, then it's best to submit the cmd
to the backend from the target's cmd receiving context.
Subsequent commits will allow the user to config what they prefer, but
drivers like loop can't directly submit because they can be called from a
context that can't sleep. And, drivers like vhost-scsi can support direct
submission, but need to keep their default behavior of deferring execution
to avoid possible regressions where the backend can block.
Make the drivers tell LIO core if they support direct submissions and their
current default, so we can prevent users from misconfiguring the system and
initialize devices correctly.
Signed-off-by: Mike Christie <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
This PR is collected from
https://lore.kernel.org/all/[email protected]
This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).
These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations as they performed
on netdevs.
[1] https://lore.kernel.org/linux-rdma/[email protected]/
* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Handle IPsec steering upon master unbind/bind
net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
net/mlx5: Add create alias flow table function to ipsec roce
net/mlx5: Implement alias object allow and create functions
net/mlx5: Add alias flow table bits
net/mlx5: Store devcom pointer inside IPsec RoCE
net/mlx5: Register mlx5e priv to devcom in MPV mode
RDMA/mlx5: Send events from IB driver about device affiliation state
net/mlx5: Introduce ifc bits for migration in a chunk mode
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
kernel/bpf/verifier.c
829955981c55 ("bpf: Fix verifier log for async callback return values")
a923819fb2c5 ("bpf: Treat first argument as return value for bpf_throw")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Now that napi_schedule return a bool, we can drop napi_reschedule that
does the same exact function. The function comes from a very old commit
bfe13f54f502 ("ibm_emac: Convert to use napi_struct independent of struct
net_device") and the purpose is actually deprecated in favour of
different logic.
Convert every user of napi_reschedule to napi_schedule.
Signed-off-by: Christian Marangi <[email protected]>
Acked-by: Jeff Johnson <[email protected]> # ath10k
Acked-by: Nick Child <[email protected]> # ibm
Acked-by: Marc Kleine-Budde <[email protected]> # for can/dev/rx-offload.c
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)
Remove sentinel from iwcm_ctl_table and ucma_ctl_table
Signed-off-by: Joel Granados <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
Add support for reregister MR verb API by doing a de-register
followed by a register MR with the new attributes. Reuse resources
like iwmr handle and HW stag where possible.
Signed-off-by: Sindhu Devale <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Like any other set command, require admin permissions to do it.
Cc: [email protected]
Fixes: 2b34c5580226 ("RDMA/core: Add command to set ib_core device net namspace sharing mode")
Link: https://lore.kernel.org/r/75d329fdd7381b52cbdf87910bef16c9965abb1f.1696443438.git.leon@kernel.org
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Fix typos.
Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/169643338101.8035.6826446669479247727.stgit@manet.1015granger.net
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).
These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations as they performed
on netdevs.
Thanks
[1] https://lore.kernel.org/linux-rdma/[email protected]/
Link: https://lore.kernel.org/all/[email protected]
Signed-of-by: Leon Romanovsky <[email protected]>
* mlx5-next: (576 commits)
net/mlx5: Handle IPsec steering upon master unbind/bind
net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
net/mlx5: Add create alias flow table function to ipsec roce
net/mlx5: Implement alias object allow and create functions
net/mlx5: Add alias flow table bits
net/mlx5: Store devcom pointer inside IPsec RoCE
net/mlx5: Register mlx5e priv to devcom in MPV mode
RDMA/mlx5: Send events from IB driver about device affiliation state
net/mlx5: Introduce ifc bits for migration in a chunk mode
Linux 6.6-rc3
...
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct tid_rb_node.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Dennis Dalessandro <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mthca_icm_table.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Jason Gunthorpe <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct srp_fr_pool.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Bart Van Assche <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct siw_pbl.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Bernard Metzler <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct usnic_uiom_chunk.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Christian Benvenuti <[email protected]>
Cc: Nelson Escobar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ib_pkey_cache.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Jason Gunthorpe <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: "Håkon Bugge" <[email protected]>
Cc: Avihai Horon <[email protected]>
Cc: Anand Khoje <[email protected]>
Cc: Mark Bloch <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
During execution of mlx5_mkey_cache_cleanup(), there is a guarantee
that MR are not registered and/or destroyed. It means that we don't
need newly introduced cache disable flag.
Fixes: 374012b00457 ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup")
Link: https://lore.kernel.org/r/c7e9c9f98c8ae4a7413d97d9349b29f5b0a23dbe.1695921626.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Initialize the structure to 0 so that it's fields won't have random
values. For example fields like rec.traffic_class (as well as
rec.flow_label and rec.sl) is used to generate the user AH through:
cma_iboe_join_multicast
cma_make_mc_event
ib_init_ah_from_mcmember
And a random traffic_class causes a random IP DSCP in RoCEv2.
Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join")
Signed-off-by: Mark Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Compared with normal doorbell, using record doorbell can shorten the
process of ringing the doorbell and reduce the latency.
Add a flag HNS_ROCE_CAP_FLAG_SRQ_RECORD_DB to allow FW to
enable/disable SRQ record doorbell.
If the flag above is set, allocate the dma buffer for SRQ record
doorbell and write the buffer address into SRQC during SRQ creation.
For userspace SRQ, add a flag HNS_ROCE_RSP_SRQ_CAP_RECORD_DB to notify
userspace whether the SRQ record doorbell is enabled.
Signed-off-by: Yangyang Li <[email protected]>
Signed-off-by: Junxian Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Send blocking events from IB driver whenever the device is done being
affiliated or if it is removed from an affiliation.
This is useful since now the EN driver can register to those event and
know when a device is affiliated or not.
Signed-off-by: Patrisious Haddad <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
The IBTA specification 1.7 has new speed - XDR, supporting signaling
rate of 200Gb.
Ethtool support of IPoIB driver translates IB speed to signaling rate.
Added translation of XDR IB type to rate of 200Gb Ethernet speed.
Signed-off-by: Patrisious Haddad <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Link: https://lore.kernel.org/r/ca252b79b7114af967de3d65f9a38992d4d87a14.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Adjust mlx5 function which maps the speed rate from IB spec values
to internal driver values to be able to handle speeds up to 800Gb.
Signed-off-by: Patrisious Haddad <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Link: https://lore.kernel.org/r/301c803d8486b0df8aefad3fb3cc10dc58671985.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Rename 400G_8X speed to comply to naming convention.
Signed-off-by: Patrisious Haddad <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Link: https://lore.kernel.org/r/ac98447cac8379a43fbdb36d56e5fb2b741a97ff.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Add a check for 800G_8X speed when querying PTYS and report it back
correctly when needed.
Signed-off-by: Patrisious Haddad <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Link: https://lore.kernel.org/r/26fd0b6e1fac071c3eb779657bb3d8ba47f47c4f.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Under MAD query port, Report NDR speed when NDR is supported in the port
capability mask.
Signed-off-by: Or Har-Toov <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Link: https://lore.kernel.org/r/d30bdec2a66a8a2edd1d84ee61453c58cf346b43.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Add new IBTA speed XDR, the new rate that was added to Infiniband spec
as part of XDR and supporting signaling rate of 200Gb.
In order to report that value to rdma-core, add new u32 field to
query_port response.
Signed-off-by: Or Har-Toov <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Link: https://lore.kernel.org/r/9d235fc600a999e8274010f0e18b40fa60540e6c.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Currently, mkeys are managed via xarray. This implementation leads to
a degradation in cases many MRs are unregistered in parallel, due to xarray
internal implementation, for example: deregistration 1M MRs via 64 threads
is taking ~15% more time[1].
Hence, implement mkeys management via LIFO queue, which solved the
degradation.
[1]
2.8us in kernel v5.19 compare to 3.2us in kernel v6.4
Signed-off-by: Shay Drory <[email protected]>
Link: https://lore.kernel.org/r/fde3d4cfab0f32f0ccb231cd113298256e1502c5.1695283384.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Fix the deadlock by refactoring the MR cache cleanup flow to flush the
workqueue without holding the rb_lock.
This adds a race between cache cleanup and creation of new entries which
we solve by denied creation of new entries after cache cleanup started.
Lockdep:
WARNING: possible circular locking dependency detected
[ 2785.326074 ] 6.2.0-rc6_for_upstream_debug_2023_01_31_14_02 #1 Not tainted
[ 2785.339778 ] ------------------------------------------------------
[ 2785.340848 ] devlink/53872 is trying to acquire lock:
[ 2785.341701 ] ffff888124f8c0c8 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0xc8/0x900
[ 2785.343403 ]
[ 2785.343403 ] but task is already holding lock:
[ 2785.344464 ] ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]
[ 2785.346273 ]
[ 2785.346273 ] which lock already depends on the new lock.
[ 2785.346273 ]
[ 2785.347720 ]
[ 2785.347720 ] the existing dependency chain (in reverse order) is:
[ 2785.349003 ]
[ 2785.349003 ] -> #1 (&dev->cache.rb_lock){+.+.}-{3:3}:
[ 2785.350160 ] __mutex_lock+0x14c/0x15c0
[ 2785.350962 ] delayed_cache_work_func+0x2d1/0x610 [mlx5_ib]
[ 2785.352044 ] process_one_work+0x7c2/0x1310
[ 2785.352879 ] worker_thread+0x59d/0xec0
[ 2785.353636 ] kthread+0x28f/0x330
[ 2785.354370 ] ret_from_fork+0x1f/0x30
[ 2785.355135 ]
[ 2785.355135 ] -> #0 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}:
[ 2785.356515 ] __lock_acquire+0x2d8a/0x5fe0
[ 2785.357349 ] lock_acquire+0x1c1/0x540
[ 2785.358121 ] __flush_work+0xe8/0x900
[ 2785.358852 ] __cancel_work_timer+0x2c7/0x3f0
[ 2785.359711 ] mlx5_mkey_cache_cleanup+0xfb/0x250 [mlx5_ib]
[ 2785.360781 ] mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x16/0x30 [mlx5_ib]
[ 2785.361969 ] __mlx5_ib_remove+0x68/0x120 [mlx5_ib]
[ 2785.362960 ] mlx5r_remove+0x63/0x80 [mlx5_ib]
[ 2785.363870 ] auxiliary_bus_remove+0x52/0x70
[ 2785.364715 ] device_release_driver_internal+0x3c1/0x600
[ 2785.365695 ] bus_remove_device+0x2a5/0x560
[ 2785.366525 ] device_del+0x492/0xb80
[ 2785.367276 ] mlx5_detach_device+0x1a9/0x360 [mlx5_core]
[ 2785.368615 ] mlx5_unload_one_devl_locked+0x5a/0x110 [mlx5_core]
[ 2785.369934 ] mlx5_devlink_reload_down+0x292/0x580 [mlx5_core]
[ 2785.371292 ] devlink_reload+0x439/0x590
[ 2785.372075 ] devlink_nl_cmd_reload+0xaef/0xff0
[ 2785.372973 ] genl_family_rcv_msg_doit.isra.0+0x1bd/0x290
[ 2785.374011 ] genl_rcv_msg+0x3ca/0x6c0
[ 2785.374798 ] netlink_rcv_skb+0x12c/0x360
[ 2785.375612 ] genl_rcv+0x24/0x40
[ 2785.376295 ] netlink_unicast+0x438/0x710
[ 2785.377121 ] netlink_sendmsg+0x7a1/0xca0
[ 2785.377926 ] sock_sendmsg+0xc5/0x190
[ 2785.378668 ] __sys_sendto+0x1bc/0x290
[ 2785.379440 ] __x64_sys_sendto+0xdc/0x1b0
[ 2785.380255 ] do_syscall_64+0x3d/0x90
[ 2785.381031 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 2785.381967 ]
[ 2785.381967 ] other info that might help us debug this:
[ 2785.381967 ]
[ 2785.383448 ] Possible unsafe locking scenario:
[ 2785.383448 ]
[ 2785.384544 ] CPU0 CPU1
[ 2785.385383 ] ---- ----
[ 2785.386193 ] lock(&dev->cache.rb_lock);
[ 2785.386940 ] lock((work_completion)(&(&ent->dwork)->work));
[ 2785.388327 ] lock(&dev->cache.rb_lock);
[ 2785.389425 ] lock((work_completion)(&(&ent->dwork)->work));
[ 2785.390414 ]
[ 2785.390414 ] *** DEADLOCK ***
[ 2785.390414 ]
[ 2785.391579 ] 6 locks held by devlink/53872:
[ 2785.392341 ] #0: ffffffff84c17a50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
[ 2785.393630 ] #1: ffff888142280218 (&devlink->lock_key){+.+.}-{3:3}, at: devlink_get_from_attrs_lock+0x12d/0x2d0
[ 2785.395324 ] #2: ffff8881422d3c38 (&dev->lock_key){+.+.}-{3:3}, at: mlx5_unload_one_devl_locked+0x4a/0x110 [mlx5_core]
[ 2785.397322 ] #3: ffffffffa0e59068 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_detach_device+0x60/0x360 [mlx5_core]
[ 2785.399231 ] #4: ffff88810e3cb0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600
[ 2785.400864 ] #5: ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib]
Fixes: b95845178328 ("RDMA/mlx5: Change the cache structure to an RB-tree")
Signed-off-by: Shay Drory <[email protected]>
Signed-off-by: Michael Guralnik <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
checkpath is complaining about NULL string, change it to 'Unknown'.
Fixes: 37aa5c36aa70 ("IB/mlx5: Add UARs write-combining and non-cached mapping")
Signed-off-by: Shay Drory <[email protected]>
Link: https://lore.kernel.org/r/8638e5c14fadbde5fa9961874feae917073af920.1695203958.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
The mutex was not unlocked on some of the error flows.
Moved the unlock location to include all the error flow scenarios.
Fixes: e1f4a52ac171 ("RDMA/mlx5: Create an indirect flow table for steering anchor")
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Hamdan Igbaria <[email protected]>
Link: https://lore.kernel.org/r/1244a69d783da997c0af0b827c622eb00495492e.1695203958.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
After the change to use dynamic cache structure, new cache entries
can be added and the mkey allocation can no longer assume that all
mkeys created for the cache have access_flags equal to zero.
Example of a flow that exposes the issue:
A user registers MR with RO on a HCA that cannot UMR RO and the mkey is
created outside of the cache. When the user deregisters the MR, a new
cache entry is created to store mkeys with RO.
Later, the user registers 2 MRs with RO. The first MR is reused from the
new cache entry. When we try to get the second mkey from the cache we see
the entry is empty so we go to the MR cache mkey allocation flow which
would have allocated a mkey with no access flags, resulting the user getting
a MR without RO.
Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow")
Reviewed-by: Edward Srouji <[email protected]>
Signed-off-by: Michael Guralnik <[email protected]>
Link: https://lore.kernel.org/r/8a802700b82def3ace3f77cd7a9ad9d734af87e7.1695203958.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Use user_backed_iter() to see if iterator is UBUF/IOVEC rather than poking
inside the iterator.
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Dennis Dalessandro <[email protected]>
cc: Jason Gunthorpe <[email protected]>
cc: Leon Romanovsky <[email protected]>
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
In order to be sure that 'buff' is never truncated, its size should be
12, not 11.
When building with W=1, this fixes the following warnings:
drivers/infiniband/hw/mlx4/sysfs.c: In function ‘add_port_entries’:
drivers/infiniband/hw/mlx4/sysfs.c:268:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=]
268 | sprintf(buff, "%d", i);
| ^
drivers/infiniband/hw/mlx4/sysfs.c:268:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11
268 | sprintf(buff, "%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/mlx4/sysfs.c:286:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=]
286 | sprintf(buff, "%d", i);
| ^
drivers/infiniband/hw/mlx4/sysfs.c:286:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11
286 | sprintf(buff, "%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~
Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device")
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/0bb1443eb47308bc9be30232cc23004c4d4cf43e.1695448530.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
`strncpy` is deprecated for use on NUL-terminated destination strings [1]
and as such we should prefer more robust and less ambiguous string
interfaces.
We know `txselect_list` is expected to be NUL-terminated based on its
use in `param_get_string()`:
| int param_get_string(char *buffer, const struct kernel_param *kp)
| {
| const struct kparam_string *kps = kp->str;
| return scnprintf(buffer, PAGE_SIZE, "%s\n", kps->string);
| }
Note that `txselect_list` is assigned to `kp_txselect`'s string field:
| static struct kparam_string kp_txselect = {
| .string = txselect_list,
| .maxlen = MAX_ATTEN_LEN
| };
Wherein it is then assigned the set and get methods:
| module_param_call(txselect, setup_txselect, param_get_string,
| &kp_txselect, S_IWUSR | S_IRUGO);
Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: [email protected]
Signed-off-by: Justin Stitt <[email protected]>
Link: https://lore.kernel.org/r/20230921-strncpy-drivers-infiniband-hw-qib-qib_iba7322-c-v1-1-373727763f5b@google.com
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
We see that `buf` is expected to be NUL-terminated based on it's use
within a trace event wherein `is_misc_err_name` and `is_various_name`
map to `is_name` through `is_table`:
| TRACE_EVENT(hfi1_interrupt,
| TP_PROTO(struct hfi1_devdata *dd, const struct is_table *is_entry,
| int src),
| TP_ARGS(dd, is_entry, src),
| TP_STRUCT__entry(DD_DEV_ENTRY(dd)
| __array(char, buf, 64)
| __field(int, src)
| ),
| TP_fast_assign(DD_DEV_ASSIGN(dd);
| is_entry->is_name(__entry->buf, 64,
| src - is_entry->start);
| __entry->src = src;
| ),
| TP_printk("[%s] source: %s [%d]", __get_str(dev), __entry->buf,
| __entry->src)
| );
Considering the above, a suitable replacement is `strscpy_pad` due to
the fact that it guarantees NUL-termination on the destination buffer
while maintaining the NUL-padding behavior that strncpy provides.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Cc: [email protected]
Signed-off-by: Justin Stitt <[email protected]>
Link: https://lore.kernel.org/r/20230921-strncpy-drivers-infiniband-hw-hfi1-chip-c-v1-1-37afcf4964d9@google.com
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
`strncpy` is deprecated for use on NUL-terminated destination strings [1]
and as such we should prefer more robust and less ambiguous string
interfaces.
A suitable replacement is `strscpy_pad` due to the fact that it
guarantees NUL-termination on the destination buffer.
It is unclear to me whether `i40iw_client.name` requires NUL-padding but
have opted to keep the NUL-padding behavior that strncpy provides to
ensure no functional change.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Cc: [email protected]
Signed-off-by: Justin Stitt <[email protected]>
Link: https://lore.kernel.org/r/20230921-strncpy-drivers-infiniband-hw-irdma-i40iw_if-c-v1-1-22d87aef7186@google.com
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
rc_qp_count and ud_qp_count is not decremented during qp destroy.
Fix this.
Fixes: cb95709e0dca ("bnxt_re: Update the hw counters for resource stats")
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Flag that indicate control path command completion should be cleared
only after copying the command response data. As soon as the is_in_used
flag is clear, the waiting thread can proceed with wrong response
data. This wrong data is causing multiple issues like wrong lkey
used in data traffic and wrong AH Id etc.
Use a memory barrier to ensure that the response data
is copied and visible to the process waiting on a different
cpu core before clearing the is_in_used flag.
Clear the is_in_used after copying the command response.
Fixes: bcfee4ce3e01 ("RDMA/bnxt_re: remove redundant cmdq_bitmap")
Signed-off-by: Saravanan Vajravel <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
The SRQ restrack attributes come from the context maintained by ROCEE.
Example:
$ rdma res show srq -jp -dd
[ {
"ifindex": 0,
"ifname": "hns_0",
"srqn": 0,
"type": "BASIC",
"lqpn": [ "14-15","22-23" ],
"pdn": 2,
"pid": 1224,
"comm": "ib_send_bw",{
"drv_srqn": 0,
"drv_wqe_cnt": 512,
"drv_max_gs": 2,
"drv_xrcdn": 0
}
} ]
$ rdma res show srq link hns_0 -jpr
[ {
"ifindex": 0,
"ifname": "hns_0",
"data": [ 149,0,0,0,0,0,0,0,0,0,0,0,119,101,120,99,0,
46,62,31,0,0,0,0,3,0,0,1,0,58,62,31,0,0,0,0,
30,159,15,0,0,0,64,5,0,0,0,0,0,0,0,0,0,0,0,
9,0,0,0,0,0,0,0,0 ]
} ]
Signed-off-by: wenglianfa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Add support to dump SRQ resource in raw format. It enable drivers to
return the entire device specific SRQ context without setting each
field separately.
Example:
$ rdma res show srq -r
dev hns3 149000...
$ rdma res show srq -j -r
[{"ifindex":0,"ifname":"hns3","data":[149,0,0,...]}]
Signed-off-by: wenglianfa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Add a dedicated callback function for SRQ resource tracker.
Signed-off-by: wenglianfa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of
custom masking and shifting, and remove extract_width() which only
wraps that FIELD_GET().
Signed-off-by: Ilpo Järvinen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
No functionality change. The variable which is not initialized fully
will introduce potential risks.
Signed-off-by: Zhu Yanjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|