Age | Commit message (Collapse) | Author | Files | Lines |
|
Change the switch with one case into a simple if statement so the code is
less confusing.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
All callers for process_mad allocate MAD structures with proper sizes,
there is no need to recheck it.
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This function always returns 0, so just use void and remove the bogus
checking at the only call site.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Ensure that MAD output buffer is zero-based allocated in all the callers
of process_mad and remove the various memset()'s from the drivers.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Function always returns zero, just delete it.
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Trivial cleanup to fix the following warning:
drivers/infiniband/hw/qib/qib_iba6120.c:1420: warning: bad line:
Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Delete never implemented and used MAD functions.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Although the mentioned patch fixes a use-after-free bug, it introduces a
hang during shutdown. Since the latter is worse, revert this patch.
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Honggang Li <[email protected]>
Fixes: 9b64f7d0bb0a ("RDMA/srpt: Postpone HCA removal until after configfs directory removal")
Signed-off-by: Bart Van Assche <[email protected]>
Acked-by: Honggang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
There is no need to return always zero for function which is not
supported.
Fixes: ac1b36e55a51 ("qedr: Add support for user context verbs")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kamal Heib <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
There is no need to return always zero for function which is not
supported.
Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kamal Heib <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
There is no need to return always zero for function which is not
supported.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kamal Heib <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Improve return code from ib_modify_port() by doing the following:
- Use "-EOPNOTSUPP" instead "-ENOSYS" which is the proper return code
- Allow only fake IB_PORT_CM_SUP manipulation for RoCE providers that
didn't implement the modify_port callback, otherwise return
"-EOPNOTSUPP"
Fixes: 61e0962d5221 ("IB: Avoid ib_modify_port() failure for RoCE devices")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kamal Heib <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This patch adds the iWARP specific doorbells to the doorbell recovery
mechanism.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Use the doorbell recovery mechanism to register rdma related doorbells
that will be restored in case there is a doorbell overflow attention.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Remove all functions related to mmap from qedr and use the common API.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Remove the functions related to managing the mmap_xa database. This code
is now common in ib_core.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: Bernard Metzler <[email protected]>
Reviewed-by: Bernard Metzler <[email protected]>
Tested-by: Bernard Metzler <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Remove the functions related to managing the mmap_xa database. This code
was replaced with common code in ib_core.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The rdma_user_mmap_io interface created a common interface for drivers to
correctly map hw resources and zap them once the ucontext is destroyed
enabling the drivers to safely free the hw resources.
However, this meant the drivers need to delay freeing the resource to the
ucontext destroy phase to ensure they were no longer mapped. The new
mechanism for a common way of handling user/driver address mapping enabled
notifying the driver if all umap_priv mappings were removed, and enabled
freeing the hw resources when they are done with and not delay it until
ucontext destroy.
Since not all drivers use the mechanism, NULL can be sent to the
rdma_user_mmap_io interface to continue working as before. Drivers that
use the mmap_xa interface can pass the entry being mapped to the
rdma_user_mmap_io function to be linked together.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Create some common API's for adding entries to a xa_mmap. Searching for
an entry and freeing one.
The general approach is copied from the EFA driver and improved to be more
general and do more to help the drivers. Integration with the core allows
a reference counted scheme with a free function so that the driver can
know when its mmaps are all gone.
This significant new functionality will be helpful for drivers to have the
correct lifetime model for mmap objects.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
There is a spelling mistake in a esw_warn warning message. Fix it.
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently when a call to esw_vport_create_legacy_ingress_acl_group
fails the error exit path to label 'out' will cause a kvfree on the
uninitialized pointer spec. Fix this by ensuring pointer spec is
initialized to NULL to avoid this issue.
Addresses-Coverity: ("Uninitialized pointer read")
Fixes: 10652f39943e ("net/mlx5: Refactor ingress acl configuration")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Move functionality that is called by the driver, which is
related to umap, to a new file that will be linked in ib_core.
This is a first step in later enabling ib_uverbs to be optional.
vm_ops is now initialized in ib_uverbs_mmap instead of
priv_init to avoid having to move all the rdma_umap functions
as well.
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Instead of deciding a given device is virtual function or
not based on a device is PF or not, use already defined
MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf().
This enables to clearly identify PF, VF and non virtual functions.
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently on ECPF, metadata is enabled on the ECPF vport = 0xfffe
(manager vport).
Metadata when supported, must be enabled on own vport which is
used to pass metadata to vport of NIC Rx Flow Table.
Due to this error, traffic tagged by ingress ACL is not processed
correctly at NIC rx flow table level which is supposed to work
on metadata tag.
Hence, instead of working on eswitch manager vport, always working on
eswitch own vport regardless of PF or ECPF.
Given that mlx5_eswitch_query/modify_esw_vport_context() is used to
access other vport in legacy mode and own vport settings in switchdev mode,
extend low level API to explicitly specify other_vport.
Fixes: c1286050cf47 ("net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager")
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Drop, untagged, spoof check and untagged spoof check flow groups are
limited to legacy mode only.
Therefore, following refactoring is done to
(a) improve code readability
(b) have better code split between legacy and offloads mode
1. Move legacy flow groups under legacy structure
2. Add validity check for group deletion
3. Restrict scope of esw_vport_disable_ingress_acl to legacy mode
4. Rename esw_vport_enable_ingress_acl() to
esw_vport_create_ingress_acl_table() and limit its scope to
table creation
5. Introduce legacy flow groups creation helper
esw_legacy_create_ingress_acl_groups() and keep its scope to legacy mode
6. Reduce offloads ingress groups from 4 to just 1 metadata group
per vport
7. Removed redundant IS_ERR_OR_NULL as entries are marked NULL on free.
8. Shortern error message to remove redundant 'E-switch'
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Now that there is clear separation for acl setup/cleanup between legacy
and offloads mode, limit metdata disablement to offloads mode.
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently legacy mode enables ACL while enabling vport, while offloads
mode enable ACL when moving to offloads mode.
Bring consistency to both modes by enabling/disabling ACL when
enabling/disabling a vport.
It also eliminates creating ingress ACL table on unused ECPF vport in
offloads mode.
Signed-off-by: Vu Pham <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Introduce and use per vport ACL tables creation and destroy APIs, so that
subsequently patch can use them during enabling/disabling a vport.
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
It is better to create/destroy ACL related drop counters where the actual
drop rule ACLs are created/destroyed, so that ACL configuration is self
contained for ingress and egress.
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Introduce and use per vport ACL tables creation and destroy APIs, so that
subsequently patch can use them during enabling/disabling a vport in
unified way for legacy vs offloads mode.
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
In subsequent patch, esw_enable_vport() could fail and return error.
Prepare code to handle such error.
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When eswitch is disabled, vport event handler is unregistered.
This unregistration already synchronizes with running EQ event handler
in below code flow.
mlx5_eswitch_disable()
mlx5_eswitch_event_handlers_unregister()
mlx5_eq_notifier_unregister()
atomic_notifier_chain_unregister()
synchronize_rcu()
notifier_callchain
eswitch_vport_event()
queue_work()
Additionally vport->enabled flag is set under state_lock during
esw_enable_vport() but is not read under state_lock in
(a) esw_disable_vport() and (b) under atomic context
eswitch_vport_event().
It is also necessary to synchronize with already scheduled vport event.
This is already achieved using below sequence.
mlx5_eswitch_event_handlers_unregister()
[..]
flush_workqueue()
Hence,
(a) Remove vport->enabled check in eswitch_vport_event() which
doesn't make any sense.
(b) Remove redundant flush_workqueue() on every vport disable.
(c) Keep esw_disable_vport() symmetric with esw_enable_vport() for
state_lock.
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
To improve code readability, move legacy drop counters and droup rule
under legacy structure.
While at it,
(a) prefix drop flow counters helper with legacy_.
(b) nullify the rule pointers only if they were valid.
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Metadata fields are offload mode specific.
To improve code readability, move metadata under offloads structure.
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
fdb_table is used for both legacy and offloads mode.
It was incorrect to comment that fdb_table is legacy specific.
Hence, fix the comment to reflect that fdb_table is used in legacy and
offloads mode.
Fixes: 131ce7014043 ("net/mlx5: E-Switch, Remove redundant mc_promisc NULL check")
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently esw_enable_vport() does vport check for zero to enable drop
counters regardless of execution on ECPF/PF.
While esw_disable_vport() considers such scenario.
To keep consistency across code for checking for manager_vport,
introduce and use mlx5_esw_is_manager_vport() to check if a specified
vport is eswitch manager vport or not.
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Between legacy mode and switchdev mode, only two fields are changed,
vlan_tag and flow action.
Hence to avoid duplicte code between two modes, introduce and and use
helper function to configure allowed VLAN rule.
While at it, get rid of duplicate debug message.
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Changing the function name esw_ingress_acl_common_config() to
esw_ingress_acl_config() to be consistent with egress config
function naming in offloads mode.
Signed-off-by: Vu Pham <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Refactor vport egress config in offloads mode
Refactoring vport egress configuration in offloads mode that
includes egress prio tag configuration.
This makes code symmetric to ingress configuration.
Signed-off-by: Vu Pham <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Changed "managerss" to "managers".
Fixes: a1b3839ac4a4 ("net/mlx5: E-Switch, Properly refer to the esw manager vport")
Signed-off-by: Qing Huang <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Linux can run in all sorts of physical machines and VMs where write
combining may or may not be supported. Currently there is no way to
reliably tell if the system supports WC, or not. The driver uses WC to
optimize posting work to the HCA, and getting this wrong in either
direction can cause a significant performance loss.
Add a test in mlx5_ib initialization process to test whether
write-combining is supported on the machine. The test will run as part of
the enable_driver callback to ensure that the test runs after the device
is setup and can create and modify the QP needed, but runs before the
device is exposed to the users.
The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame
is different from the WQE in memory, requesting CQE only on the BlueFlame
WQE. By checking whether we received a completion on one of these WQEs we
can know if BlueFlame succeeded and this write-combining must be
supported.
Change reporting of BlueFlame support to be dependent on write-combining
support instead of the FW's guess as to what the machine can do.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael Guralnik <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Returned value from mlx5_mr_cache_alloc() is checked to be error or real
pointer. Return proper error code instead of NULL which is not checked
later.
Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This is not the first attempt to fix building random configurations,
unfortunately the attempt in commit a07fc0bb483e ("RDMA/hns: Fix build
error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and
CONFIG_INFINIBAND_HNS_HIP08=y:
drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module'
Revert commits a07fc0bb483e ("RDMA/hns: Fix build error") and
a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") to get back to
the previous state, then fix the issues described there differently, by
adding more specific dependencies: INFINIBAND_HNS can now only be built-in
if at least one of HNS or HNS3 are built-in, and the individual back-ends
are only available if that code is reachable from the main driver.
Fixes: a07fc0bb483e ("RDMA/hns: Fix build error")
Fixes: a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment")
Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Jason Gunthorpe says:
====================
In order to hoist the interval tree code out of the drivers and into the
mmu_notifiers it is necessary for the drivers to not use the interval tree
for other things.
This series replaces the interval tree with an xarray and along the way
re-aligns all the locking to use a sensible SRCU model where the 'update'
step is done by modifying an xarray.
The result is overall much simpler and with less locking in the critical
path. Many functions were reworked for clarity and small details like
using 'imr' to refer to the implicit MR make the entire code flow here
more readable.
This also squashes at least two race bugs on its own, and quite possibily
more that haven't been identified.
====================
Merge conflicts with the odp statistics patch resolved.
* branch 'odp_rework':
RDMA/odp: Remove broken debugging call to invalidate_range
RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
RDMA/mlx5: Rework implicit ODP destroy
RDMA/mlx5: Avoid double lookups on the pagefault path
RDMA/mlx5: Reduce locking in implicit_mr_get_data()
RDMA/mlx5: Use an xarray for the children of an implicit ODP
RDMA/mlx5: Split implicit handling from pagefault_mr
RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
RDMA/mlx5: Rework implicit_mr_get_data
RDMA/mlx5: Delete struct mlx5_priv->mkey_table
RDMA/mlx5: Use a dedicated mkey xarray for ODP
RDMA/mlx5: Split sig_err MR data into its own xarray
RDMA/mlx5: Use SRCU properly in ODP prefetch
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
invalidate_range() also obtains the umem_mutex which is being held at this
point, so if this path were was ever called it would deadlock. Thus
conclude the debugging never triggers and rework it into a simple WARN_ON
and leave things as they are.
While here add a note to explain how we could possibly get inconsistent
page pointers.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
For creation, as soon as the umem_odp is created the notifier can be
called, however the underlying MR may not have been setup yet. This would
cause problems if mlx5_ib_invalidate_range() runs. There is some
confusing/ulocked/racy code that might by trying to solve this, but
without locks it isn't going to work right.
Instead trivially solve the problem by short-circuiting the invalidation
if there are not yet any DMA mapped pages. By definition there is nothing
to invalidate in this case.
The create code will have the umem fully setup before anything is DMA
mapped, and npages is fully locked by the umem_mutex.
For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap
the pages before destroying the MR. This drives npages to zero and
prevents similar racing with invalidate while the MR is undergoing
destruction.
Arguably it would be better if the umem was created after the MR and
destroyed before, but that would require a big rework of the MR code.
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Artemy Kovalyov <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
These mkeys are entirely internal and are never used by the HW for
page fault. They should also never be used by userspace for prefetch.
Simplify & optimize things by not including them in the xarray.
Since the prefetch path can now never see a child mkey there is no need
for the second synchronize_srcu() during imr destroy.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Artemy Kovalyov <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Use SRCU in a sensible way by removing all MRs in the implicit tree from
the two xarrays (the update operation), then a synchronize, followed by a
normal single threaded teardown.
This is only a little unusual from the normal pattern as there can still
be some work pending in the unbound wq that may also require a workqueue
flush. This is tracked with a single atomic, consolidating the redundant
existing atomics and wait queue.
For understand-ability the entire ODP implicit create/destroy flow now
largely exists in a single pair of functions within odp.c, with a few
support functions for tearing down an unused child.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Artemy Kovalyov <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Now that the locking is simplified combine pagefault_implicit_mr() with
implicit_mr_get_data() so that we sweep over the idx range only once,
and do the single xlt update at the end, after the child umems are
setup.
This avoids double iteration/xa_loads plus the sketchy failure path if the
xa_load() fails.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Artemy Kovalyov <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Now that the child MRs are stored in an xarray we can rely on the SRCU
lock to protect the xa_load and use xa_cmpxchg on the slow allocation path
to resolve races with concurrent page fault.
This reduces the scope of the critical section of umem_mutex for implicit
MRs to only cover mlx5_ib_update_xlt, and avoids taking a lock at all if
the child MR is already in the xarray. This makes it consistent with the
normal ODP MR critical section for umem_lock, and the locking approach
used for destroying an unusued implicit child MR.
The MLX5_IB_UPD_XLT_ATOMIC is no longer needed in implicit_get_child_mr()
since it is no longer called with any locks.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Artemy Kovalyov <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|