AgeCommit message (Collapse)AuthorFilesLines
2019-11-06RDMA/ocrdma: Simplify process_mad functionLeon Romanovsky1-8/+4
Change the switch with one case into a simple if statement so the code is less confusing. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
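The simplification described above can be illustrated with a minimal sketch. The constants and function names below are hypothetical and are not taken from the ocrdma driver; they only show that a switch with a single case and a plain if statement make the same decision.

```c
#include <stdint.h>

/* Hypothetical attribute ID and result codes, for illustration only. */
enum { DEMO_ATTR_PERF = 0x12 };
enum { DEMO_RESULT_FAILURE = 0, DEMO_RESULT_SUCCESS = 1 };

/* Before: a switch statement that has only one real case. */
static int demo_process_switch(uint16_t attr_id)
{
	int status;

	switch (attr_id) {
	case DEMO_ATTR_PERF:
		status = DEMO_RESULT_SUCCESS;
		break;
	default:
		status = DEMO_RESULT_FAILURE;
		break;
	}
	return status;
}

/* After: the same decision written as a simple if statement. */
static int demo_process_if(uint16_t attr_id)
{
	if (attr_id == DEMO_ATTR_PERF)
		return DEMO_RESULT_SUCCESS;
	return DEMO_RESULT_FAILURE;
}
```

Both functions return the same value for every input, so the rewrite is purely a readability change.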
2019-11-06RDMA/mad: Do not check MAD sizes in roce and ib driversLeon Romanovsky5-20/+0
All callers for process_mad allocate MAD structures with proper sizes, there is no need to recheck it. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/ocrdma: Make ocrdma_pma_counters() return voidLeon Romanovsky3-9/+4
This function always returns 0, so just use void and remove the bogus checking at the only call site. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/mad: Allocate zeroed MAD bufferLeon Romanovsky5-14/+2
Ensure that the MAD output buffer is allocated zeroed in all the callers of process_mad and remove the various memset()'s from the drivers. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
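A userspace sketch of the idea, assuming a hypothetical `demo_mad` structure and handler: the caller allocates the output buffer already zeroed, so the handler only writes the fields it actually sets and needs no memset() of its own. In the kernel the zeroing allocator would be a call such as kzalloc(); calloc() stands in for it here.

```c
#include <stdlib.h>

/* Hypothetical output buffer, standing in for a MAD structure. */
struct demo_mad {
	unsigned char data[256];
};

/* The handler relies on the caller's zeroed buffer and clears nothing. */
static void demo_process_mad(struct demo_mad *out)
{
	out->data[0] = 1; /* only the fields that carry a reply are written */
}

/* The caller owns the allocation and guarantees it starts out zeroed. */
static struct demo_mad *demo_alloc_and_process(void)
{
	struct demo_mad *out = calloc(1, sizeof(*out));

	if (!out)
		return NULL;
	demo_process_mad(out);
	return out;
}
```

Centralizing the zeroing in the callers removes one repeated step from every driver handler.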
2019-11-06RDMA/qib: Delete empty check_cc_key functionLeon Romanovsky1-11/+0
Function always returns zero, just delete it. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/qib: Delete extra lineLeon Romanovsky1-1/+0
Trivial cleanup to fix the following warning: drivers/infiniband/hw/qib/qib_iba6120.c:1420: warning: bad line: Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/mad: Delete never implemented functionsLeon Romanovsky2-59/+0
Delete MAD functions that were never implemented or used. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06Revert "RDMA/srpt: Postpone HCA removal until after configfs directory removal"Bart Van Assche1-5/+1
Although the mentioned patch fixes a use-after-free bug, it introduces a hang during shutdown. Since the latter is worse, revert this patch. Link: https://lore.kernel.org/r/[email protected] Reported-by: Honggang Li <[email protected]> Fixes: 9b64f7d0bb0a ("RDMA/srpt: Postpone HCA removal until after configfs directory removal") Signed-off-by: Bart Van Assche <[email protected]> Acked-by: Honggang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/qedr: Remove unsupported modify_port callbackKamal Heib3-9/+0
There is no need to return always zero for function which is not supported. Fixes: ac1b36e55a51 ("qedr: Add support for user context verbs") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/ocrdma: Remove unsupported modify_port callbackKamal Heib3-9/+0
There is no need to return always zero for function which is not supported. Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/hns: Remove unsupported modify_port callbackKamal Heib1-7/+0
There is no need to return always zero for function which is not supported. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/core: Fix return code when modify_port isn't supportedKamal Heib1-1/+5
Improve return code from ib_modify_port() by doing the following:
- Use "-EOPNOTSUPP" instead of "-ENOSYS", as it is the proper return code.
- Allow only fake IB_PORT_CM_SUP manipulation for RoCE providers that didn't implement the modify_port callback; otherwise return "-EOPNOTSUPP".
Fixes: 61e0962d5221 ("IB: Avoid ib_modify_port() failure for RoCE devices") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
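The return-code policy described above might look roughly like the sketch below. All types, the `DEMO_PORT_CM_SUP` bit value, and the exact mask test are assumptions made for illustration; this is one reading of the commit message, not the code in ib_core.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit standing in for IB_PORT_CM_SUP. */
#define DEMO_PORT_CM_SUP (1u << 16)

struct demo_port_modify {
	uint32_t set_mask;
	uint32_t clr_mask;
};

struct demo_device {
	bool is_roce;
	/* Optional driver callback; NULL when the driver does not implement it. */
	int (*modify_port)(struct demo_device *dev,
			   const struct demo_port_modify *pm);
};

static int demo_modify_port(struct demo_device *dev,
			    const struct demo_port_modify *pm)
{
	if (dev->modify_port)
		return dev->modify_port(dev, pm);

	/* Assumed rule: without a callback, RoCE may only touch CM_SUP. */
	if (dev->is_roce &&
	    (pm->set_mask & ~DEMO_PORT_CM_SUP) == 0 &&
	    (pm->clr_mask & ~DEMO_PORT_CM_SUP) == 0)
		return 0;

	return -EOPNOTSUPP;
}
```

With this shape, a driver that lacks the callback no longer needs a stub that always returns zero.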
2019-11-06RDMA/qedr: Add iWARP doorbell recovery supportMichal Kalderon2-6/+43
This patch adds the iWARP specific doorbells to the doorbell recovery mechanism. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/qedr: Add doorbell overflow recovery supportMichal Kalderon3-50/+300
Use the doorbell recovery mechanism to register rdma related doorbells that will be restored in case there is a doorbell overflow attention. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/qedr: Use the common mmap APIMichal Kalderon4-121/+98
Remove all functions related to mmap from qedr and use the common API. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/siw: Use the common mmap_xa helpersMichal Kalderon4-102/+114
Remove the functions related to managing the mmap_xa database. This code is now common in ib_core. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Bernard Metzler <[email protected]> Reviewed-by: Bernard Metzler <[email protected]> Tested-by: Bernard Metzler <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/efa: Use the common mmap_xa helpersMichal Kalderon3-194/+153
Remove the functions related to managing the mmap_xa database. This code was replaced with common code in ib_core. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA: Connect between the mmap entry and the umap_priv structureMichal Kalderon8-34/+70
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources. However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy. Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-06RDMA/core: Create mmap database and cookie helper functionsMichal Kalderon5-0/+275
Create some common API's for adding entries to a xa_mmap. Searching for an entry and freeing one. The general approach is copied from the EFA driver and improved to be more general and do more to help the drivers. Integration with the core allows a reference counted scheme with a free function so that the driver can know when its mmaps are all gone. This significant new functionality will be helpful for drivers to have the correct lifetime model for mmap objects. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
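The reference counted scheme with a free function can be sketched in a few lines. The structure and function names here are hypothetical and the counter is a plain integer, so this is a single-threaded illustration only; it does not reproduce the ib_core API or its locking.

```c
#include <stddef.h>

/* Hypothetical mmap entry with a reference count and a release callback. */
struct demo_mmap_entry {
	unsigned int refcount;
	void (*free_cb)(struct demo_mmap_entry *entry);
};

/* Counter used by the sample callback so callers can observe the release. */
static int demo_freed_count;

static void demo_count_free(struct demo_mmap_entry *entry)
{
	(void)entry;
	demo_freed_count++;
}

/* A new entry starts with one reference, held by whoever inserted it. */
static void demo_entry_init(struct demo_mmap_entry *entry,
			    void (*free_cb)(struct demo_mmap_entry *))
{
	entry->refcount = 1;
	entry->free_cb = free_cb;
}

/* Each active mapping takes its own reference. */
static void demo_entry_get(struct demo_mmap_entry *entry)
{
	entry->refcount++;
}

/* Dropping the last reference tells the driver its mappings are gone. */
static void demo_entry_put(struct demo_mmap_entry *entry)
{
	if (--entry->refcount == 0)
		entry->free_cb(entry);
}
```

The callback fires exactly once, when the final reference is dropped, which is the point at which a driver could release the underlying resource.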
2019-11-05net/mlx5: fix spelling mistake "metdata" -> "metadata"Colin Ian King1-1/+1
There is a spelling mistake in an esw_warn warning message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-05net/mlx5: fix kvfree of uninitialized pointer specColin Ian King1-1/+1
Currently when a call to esw_vport_create_legacy_ingress_acl_group fails the error exit path to label 'out' will cause a kvfree on the uninitialized pointer spec. Fix this by ensuring pointer spec is initialized to NULL to avoid this issue. Addresses-Coverity: ("Uninitialized pointer read") Fixes: 10652f39943e ("net/mlx5: Refactor ingress acl configuration") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
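The bug class fixed here is easy to reproduce in userspace. In the sketch below, `demo_create_group()` and `demo_configure()` are hypothetical stand-ins, and free() plays the role of kvfree(): because the pointer is initialized to NULL, an early jump to the shared exit label frees nothing instead of freeing garbage.

```c
#include <stdlib.h>

/* Hypothetical setup step that can fail before anything is allocated. */
static int demo_create_group(int should_fail)
{
	return should_fail ? -1 : 0;
}

static int demo_configure(int fail_early)
{
	int *spec = NULL; /* initialized so the shared exit path is always safe */
	int err;

	err = demo_create_group(fail_early);
	if (err)
		goto out; /* spec has not been allocated on this path */

	spec = calloc(1, sizeof(*spec));
	if (!spec) {
		err = -1;
		goto out;
	}
	*spec = 42;

out:
	free(spec); /* free(NULL) is defined to do nothing */
	return err;
}
```

Without the `= NULL` initializer, the early `goto out` would pass an indeterminate pointer to the free routine.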
2019-11-05RDMA/core: Move core content from ib_uverbs to ib_coreMichal Kalderon4-72/+86
Move functionality that is called by the driver, which is related to umap, to a new file that will be linked in ib_core. This is a first step in later enabling ib_uverbs to be optional. vm_ops is now initialized in ib_uverbs_mmap instead of priv_init to avoid having to move all the rdma_umap functions as well. Link: https://lore.kernel.org/r/[email protected] Suggested-by: Jason Gunthorpe <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-01IB/mlx5: Introduce and use mlx5_core_is_vf()Parav Pandit2-1/+6
Instead of deciding whether a given device is a virtual function based on whether it is a PF, use the already defined MLX5_COREDEV_VF by introducing a helper API, mlx5_core_is_vf(). This makes it possible to clearly identify PF, VF and non virtual functions. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: E-switch, Enable metadata on own vportParav Pandit3-23/+16
Currently on ECPF, metadata is enabled on the ECPF vport = 0xfffe (manager vport). Metadata when supported, must be enabled on own vport which is used to pass metadata to vport of NIC Rx Flow Table. Due to this error, traffic tagged by ingress ACL is not processed correctly at NIC rx flow table level which is supposed to work on metadata tag. Hence, instead of working on eswitch manager vport, always working on eswitch own vport regardless of PF or ECPF. Given that mlx5_eswitch_query/modify_esw_vport_context() is used to access other vport in legacy mode and own vport settings in switchdev mode, extend low level API to explicitly specify other_vport. Fixes: c1286050cf47 ("net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager") Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: Refactor ingress acl configurationParav Pandit3-114/+200
Drop, untagged, spoof check and untagged spoof check flow groups are limited to legacy mode only. Therefore, the following refactoring is done to (a) improve code readability and (b) have a better code split between legacy and offloads mode:
1. Move legacy flow groups under legacy structure
2. Add validity check for group deletion
3. Restrict scope of esw_vport_disable_ingress_acl to legacy mode
4. Rename esw_vport_enable_ingress_acl() to esw_vport_create_ingress_acl_table() and limit its scope to table creation
5. Introduce legacy flow groups creation helper esw_legacy_create_ingress_acl_groups() and keep its scope to legacy mode
6. Reduce offloads ingress groups from 4 to just 1 metadata group per vport
7. Remove redundant IS_ERR_OR_NULL as entries are marked NULL on free
8. Shorten error message to remove redundant 'E-switch'
Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: Restrict metadata disablement to offloads modeParav Pandit3-7/+6
Now that there is clear separation for acl setup/cleanup between legacy and offloads mode, limit metadata disablement to offloads mode. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: E-switch, Offloads shift ACL programming during enable/disable vportVu Pham3-31/+24
Currently legacy mode enables ACL while enabling vport, while offloads mode enables ACL when moving to offloads mode. Bring consistency to both modes by enabling/disabling ACL when enabling/disabling a vport. It also eliminates creating ingress ACL table on unused ECPF vport in offloads mode. Signed-off-by: Vu Pham <[email protected]> Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: E-switch, Offloads introduce and use per vport acl tables APIsParav Pandit1-17/+32
Introduce and use per vport ACL tables creation and destroy APIs, so that a subsequent patch can use them during enabling/disabling a vport. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: Move ACL drop counters life cycle close to ACL lifecycleParav Pandit1-39/+35
It is better to create/destroy ACL related drop counters where the actual drop rule ACLs are created/destroyed, so that ACL configuration is self contained for ingress and egress. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: E-switch, Legacy introduce and use per vport acl tables APIsParav Pandit1-13/+60
Introduce and use per vport ACL tables creation and destroy APIs, so that a subsequent patch can use them during enabling/disabling a vport in a unified way for legacy vs offloads mode. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: E-switch, Prepare code to handle vport enable errorParav Pandit3-19/+50
In subsequent patch, esw_enable_vport() could fail and return error. Prepare code to handle such error. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: Tide up state_lock and vport enabled flag usageParav Pandit1-9/+5
When eswitch is disabled, vport event handler is unregistered. This unregistration already synchronizes with running EQ event handler in below code flow. mlx5_eswitch_disable() mlx5_eswitch_event_handlers_unregister() mlx5_eq_notifier_unregister() atomic_notifier_chain_unregister() synchronize_rcu() notifier_callchain eswitch_vport_event() queue_work() Additionally vport->enabled flag is set under state_lock during esw_enable_vport() but is not read under state_lock in (a) esw_disable_vport() and (b) under atomic context eswitch_vport_event(). It is also necessary to synchronize with already scheduled vport event. This is already achieved using below sequence. mlx5_eswitch_event_handlers_unregister() [..] flush_workqueue() Hence, (a) Remove vport->enabled check in eswitch_vport_event() which doesn't make any sense. (b) Remove redundant flush_workqueue() on every vport disable. (c) Keep esw_disable_vport() symmetric with esw_enable_vport() for state_lock. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: Move legacy drop counter and rule under legacy structureParav Pandit2-44/+50
To improve code readability, move legacy drop counters and drop rule under legacy structure. While at it, (a) prefix drop flow counters helper with legacy_. (b) nullify the rule pointers only if they were valid. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: Move metdata fields under offloads structureParav Pandit2-18/+21
Metadata fields are offload mode specific. To improve code readability, move metadata under offloads structure. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: Correct comment for legacy fieldsParav Pandit1-1/+1
fdb_table is used for both legacy and offloads mode. It was incorrect to comment that fdb_table is legacy specific. Hence, fix the comment to reflect that fdb_table is used in legacy and offloads mode. Fixes: 131ce7014043 ("net/mlx5: E-Switch, Remove redundant mc_promisc NULL check") Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: Introduce and use mlx5_esw_is_manager_vport()Parav Pandit2-6/+13
Currently esw_enable_vport() does vport check for zero to enable drop counters regardless of execution on ECPF/PF. While esw_disable_vport() considers such scenario. To keep consistency across code for checking for manager_vport, introduce and use mlx5_esw_is_manager_vport() to check if a specified vport is eswitch manager vport or not. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: E-switch, Introduce and use vlan rule config helperParav Pandit3-68/+58
Between legacy mode and switchdev mode, only two fields are changed, vlan_tag and flow action. Hence to avoid duplicate code between the two modes, introduce and use a helper function to configure the allowed VLAN rule. While at it, get rid of the duplicate debug message. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: E-Switch, Rename ingress acl config in offloads modeVu Pham1-3/+3
Changing the function name esw_ingress_acl_common_config() to esw_ingress_acl_config() to be consistent with egress config function naming in offloads mode. Signed-off-by: Vu Pham <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: E-Switch, Rename egress config to generic nameVu Pham1-24/+26
Refactor vport egress config in offloads mode Refactoring vport egress configuration in offloads mode that includes egress prio tag configuration. This makes code symmetric to ingress configuration. Signed-off-by: Vu Pham <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-01net/mlx5: Fixed a typo in a comment in esw_del_uc_addr()Qing Huang1-1/+1
Changed "managerss" to "managers". Fixes: a1b3839ac4a4 ("net/mlx5: E-Switch, Properly refer to the esw manager vport") Signed-off-by: Qing Huang <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-10-31IB/mlx5: Test write combining supportMichael Guralnik4-3/+223
Linux can run in all sorts of physical machines and VMs where write combining may or may not be supported. Currently there is no way to reliably tell if the system supports WC, or not. The driver uses WC to optimize posting work to the HCA, and getting this wrong in either direction can cause a significant performance loss. Add a test in mlx5_ib initialization process to test whether write-combining is supported on the machine. The test will run as part of the enable_driver callback to ensure that the test runs after the device is setup and can create and modify the QP needed, but runs before the device is exposed to the users. The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame is different from the WQE in memory, requesting CQE only on the BlueFlame WQE. By checking whether we received a completion on one of these WQEs we can know if BlueFlame succeeded and thus write-combining must be supported. Change reporting of BlueFlame support to be dependent on write-combining support instead of the FW's guess as to what the machine can do. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael Guralnik <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-31RDMA/mlx5: Return proper error valueLeon Romanovsky1-1/+1
Returned value from mlx5_mr_cache_alloc() is checked to be error or real pointer. Return proper error code instead of NULL which is not checked later. Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
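The error-pointer convention behind this fix can be sketched in userspace. The `demo_*` helpers below are simplified re-implementations written for this example (the kernel provides its own ERR_PTR(), PTR_ERR() and IS_ERR()), and `demo_cache_alloc()` is a hypothetical allocator. The point is that a caller which only tests for an error pointer will treat NULL as a valid object, so the callee must return an encoded errno instead.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace versions of the kernel's error-pointer helpers. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long error)
{
	return (void *)error;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline int demo_is_err(const void *ptr)
{
	/* Error codes occupy the top DEMO_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_mr {
	int order;
};

/* Hypothetical allocator: every failure is an error pointer, never NULL. */
static struct demo_mr *demo_cache_alloc(int order)
{
	struct demo_mr *mr;

	if (order < 0)
		return demo_err_ptr(-EINVAL);

	mr = calloc(1, sizeof(*mr));
	if (!mr)
		return demo_err_ptr(-ENOMEM);

	mr->order = order;
	return mr;
}
```

A caller then needs a single check, `demo_is_err(mr)`, and can recover the errno with `demo_ptr_err(mr)`.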
2019-10-29RDMA/hns: Fix build error againArnd Bergmann2-5/+20
This is not the first attempt to fix building random configurations, unfortunately the attempt in commit a07fc0bb483e ("RDMA/hns: Fix build error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and CONFIG_INFINIBAND_HNS_HIP08=y: drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module' Revert commits a07fc0bb483e ("RDMA/hns: Fix build error") and a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") to get back to the previous state, then fix the issues described there differently, by adding more specific dependencies: INFINIBAND_HNS can now only be built-in if at least one of HNS or HNS3 are built-in, and the individual back-ends are only available if that code is reachable from the main driver. Fixes: a07fc0bb483e ("RDMA/hns: Fix build error") Fixes: a3e2d4c7e766 ("RDMA/hns: remove obsolete Kconfig comment") Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 RoCE") Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce driver") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-28Merge branch 'odp_rework' into rdma.git for-nextJason Gunthorpe11-663/+624
Jason Gunthorpe says:
====================
In order to hoist the interval tree code out of the drivers and into the mmu_notifiers it is necessary for the drivers to not use the interval tree for other things. This series replaces the interval tree with an xarray and along the way re-aligns all the locking to use a sensible SRCU model where the 'update' step is done by modifying an xarray. The result is overall much simpler and with less locking in the critical path. Many functions were reworked for clarity and small details like using 'imr' to refer to the implicit MR make the entire code flow here more readable. This also squashes at least two race bugs on its own, and quite possibly more that haven't been identified.
====================
Merge conflicts with the odp statistics patch resolved.
* branch 'odp_rework':
RDMA/odp: Remove broken debugging call to invalidate_range
RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
RDMA/mlx5: Rework implicit ODP destroy
RDMA/mlx5: Avoid double lookups on the pagefault path
RDMA/mlx5: Reduce locking in implicit_mr_get_data()
RDMA/mlx5: Use an xarray for the children of an implicit ODP
RDMA/mlx5: Split implicit handling from pagefault_mr
RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
RDMA/mlx5: Rework implicit_mr_get_data
RDMA/mlx5: Delete struct mlx5_priv->mkey_table
RDMA/mlx5: Use a dedicated mkey xarray for ODP
RDMA/mlx5: Split sig_err MR data into its own xarray
RDMA/mlx5: Use SRCU properly in ODP prefetch
Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-28RDMA/odp: Remove broken debugging call to invalidate_rangeJason Gunthorpe1-19/+19
invalidate_range() also obtains the umem_mutex which is being held at this point, so if this path were ever called it would deadlock. Thus conclude the debugging never triggers and rework it into a simple WARN_ON and leave things as they are. While here add a note to explain how we could possibly get inconsistent page pointers. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-28RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroyJason Gunthorpe3-59/+88
For creation, as soon as the umem_odp is created the notifier can be called, however the underlying MR may not have been setup yet. This would cause problems if mlx5_ib_invalidate_range() runs. There is some confusing/unlocked/racy code that might be trying to solve this, but without locks it isn't going to work right. Instead trivially solve the problem by short-circuiting the invalidation if there are not yet any DMA mapped pages. By definition there is nothing to invalidate in this case. The create code will have the umem fully setup before anything is DMA mapped, and npages is fully locked by the umem_mutex. For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap the pages before destroying the MR. This drives npages to zero and prevents similar racing with invalidate while the MR is undergoing destruction. Arguably it would be better if the umem was created after the MR and destroyed before, but that would require a big rework of the MR code. Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Artemy Kovalyov <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
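The short-circuit described above can be shown with a minimal sketch. The structure and function are hypothetical, and locking is left out entirely (per the message, npages is protected by umem_mutex in the driver); the sketch only shows the early return when no pages are DMA mapped.

```c
#include <stddef.h>

/* Hypothetical ODP state: mapped page count plus a counter for observation. */
struct demo_umem_odp {
	size_t npages;
	unsigned int hw_invalidations;
};

static void demo_invalidate_range(struct demo_umem_odp *odp)
{
	/* Nothing is DMA mapped yet (or any more), so there is nothing to do. */
	if (odp->npages == 0)
		return;

	/* Stand-in for the real work: invalidate at the HW, then unmap pages. */
	odp->hw_invalidations++;
	odp->npages = 0;
}
```

Because destroy drives npages to zero before tearing down the MR, the same check covers both the create and the destroy windows.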
2019-10-28RDMA/mlx5: Do not store implicit children in the odp_mkeys xarrayJason Gunthorpe1-30/+6
These mkeys are entirely internal and are never used by the HW for page fault. They should also never be used by userspace for prefetch. Simplify & optimize things by not including them in the xarray. Since the prefetch path can now never see a child mkey there is no need for the second synchronize_srcu() during imr destroy. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Artemy Kovalyov <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-28RDMA/mlx5: Rework implicit ODP destroyJason Gunthorpe5-66/+120
Use SRCU in a sensible way by removing all MRs in the implicit tree from the two xarrays (the update operation), then a synchronize, followed by a normal single threaded teardown. This is only a little unusual from the normal pattern as there can still be some work pending in the unbound wq that may also require a workqueue flush. This is tracked with a single atomic, consolidating the redundant existing atomics and wait queue. For understand-ability the entire ODP implicit create/destroy flow now largely exists in a single pair of functions within odp.c, with a few support functions for tearing down an unused child. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Artemy Kovalyov <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-28RDMA/mlx5: Avoid double lookups on the pagefault pathJason Gunthorpe1-106/+80
Now that the locking is simplified combine pagefault_implicit_mr() with implicit_mr_get_data() so that we sweep over the idx range only once, and do the single xlt update at the end, after the child umems are setup. This avoids double iteration/xa_loads plus the sketchy failure path if the xa_load() fails. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Artemy Kovalyov <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-28RDMA/mlx5: Reduce locking in implicit_mr_get_data()Jason Gunthorpe1-12/+26
Now that the child MRs are stored in an xarray we can rely on the SRCU lock to protect the xa_load and use xa_cmpxchg on the slow allocation path to resolve races with concurrent page fault. This reduces the scope of the critical section of umem_mutex for implicit MRs to only cover mlx5_ib_update_xlt, and avoids taking a lock at all if the child MR is already in the xarray. This makes it consistent with the normal ODP MR critical section for umem_lock, and the locking approach used for destroying an unused implicit child MR. The MLX5_IB_UPD_XLT_ATOMIC is no longer needed in implicit_get_child_mr() since it is no longer called with any locks. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Artemy Kovalyov <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
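The lookup-then-compare-exchange pattern described above can be approximated in userspace with C11 atomics. This is an analogy only: a fixed array of atomic pointers stands in for the xarray, `atomic_compare_exchange_strong()` stands in for xa_cmpxchg, and SRCU protection of readers is not modelled. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define DEMO_SLOTS 8

struct demo_child {
	unsigned long idx;
};

/* Stand-in for the xarray of child MRs; zero-initialized, so all NULL. */
static _Atomic(struct demo_child *) demo_children[DEMO_SLOTS];

static struct demo_child *demo_get_child(unsigned long idx)
{
	struct demo_child *cur, *fresh, *expected = NULL;

	if (idx >= DEMO_SLOTS)
		return NULL;

	/* Fast path: the child already exists, no lock is taken. */
	cur = atomic_load(&demo_children[idx]);
	if (cur)
		return cur;

	/* Slow path: allocate outside any lock, then try to publish. */
	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return NULL;
	fresh->idx = idx;

	if (atomic_compare_exchange_strong(&demo_children[idx], &expected, fresh))
		return fresh;

	/* Lost the race: drop our copy and use the winner's entry. */
	free(fresh);
	return expected;
}
```

On a failed exchange, `expected` is updated to the value already stored in the slot, which is why the loser can return it directly.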