path: root/drivers/infiniband/hw
Age | Commit message | Author | Files | Lines
2020-03-27 | IB/mlx5: Expose UAR object and its alloc/destroy commands | Yishai Hadas | 2 | -9/+165
Expose the UAR object and its alloc/destroy commands to be used over the ioctl interface by user space applications. This API supports both BF & NC modes and enables dynamic allocation of UARs when actually needed. As the number of driver objects was limited by the core ones when the merged tree was prepared, the number of core objects had to be decreased to enable the new UAR object usage.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Reviewed-by: Michael Guralnik <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Remove redundant judgment of qp_type | Weihang Li | 1 | -7/+0
The qp type has already been checked in check_send_valid(), so this judgment should be removed.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Remove redundant assignment of wc->smac when polling cq | Weihang Li | 1 | -8/+1
The smac field in ib_wc was used to create an AH and was then treated as the destination mac address in the UD sqwqe, but the related code that filled smac into the AH has been removed from the core. The dmac in the UD sqwqe is now parsed from the dgid in the grh, which is passed in by the ULP, so this assignment should be removed.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Remove redundant qpc setup operations | Lang Cheng | 1 | -236/+1
Before calling modify_qp_reset_to_init(), the entire qpc mask has been cleared, so it is no longer necessary to clear the specific fields in the mask.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Remove meaningless prints | Wenpeng Liang | 2 | -11/+3
The ceq and aeq are ring buffers; their consumer index is set back to zero after reaching the maximum value. The warning should be removed because it may mislead users.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wenpeng Liang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Remove definition of cq doorbell structure | Lang Cheng | 1 | -5/+0
The struct hns_roce_v2_cq_db is unused, so it should be removed.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Adjust the qp status value sequence of the hardware | Lang Cheng | 1 | -1/+1
Interchange SQD and SQE to match the protocol.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Optimize hns_roce_alloc_vf_resource() | Lijun Ou | 1 | -81/+62
The hardware capabilities should be obtained first and then used in hns_roce_alloc_vf_resource(). Also remove an unnecessary if ... else condition in it.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lijun Ou <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Simplify attribute judgment code | Lang Cheng | 1 | -2/+1
Combine attribute flags before masking them.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Fix a wrong judgment of return value | Weihang Li | 1 | -1/+1
hns_roce_alloc_mtt_range() never returns -1; ret should be checked for being non-zero instead of being -1.
Fixes: 1ceb0b11a8a2 ("RDMA/hns: Fix non-standard error codes")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | RDMA/hns: Unify format of prints | Lijun Ou | 2 | -73/+89
Use ibdev_err/dbg/warn() instead of dev_err/dbg/warn(), and convert some prints to the format "failed to do something, ret = n".
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lijun Ou <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-26 | IB/hfi1: Use scnprintf() for avoiding potential buffer overflow | Takashi Iwai | 1 | -2/+2
Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf().
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
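A minimal sketch of the pattern being fixed (buf, unit and port are placeholders, not hfi1 names): snprintf() reports the length the output would have had, so accumulating its return value can push the write offset past the buffer, while scnprintf() reports only what was actually written.

    char buf[64];
    size_t off = 0;

    /* Unsafe: a truncated first write can make off exceed sizeof(buf),
     * so sizeof(buf) - off underflows and the next call overruns. */
    off += snprintf(buf + off, sizeof(buf) - off, "unit %u ", unit);
    off += snprintf(buf + off, sizeof(buf) - off, "port %u", port);

    /* Safe: scnprintf() never returns more than it wrote, so the
     * offset always stays inside the buffer. */
    off = 0;
    off += scnprintf(buf + off, sizeof(buf) - off, "unit %u ", unit);
    off += scnprintf(buf + off, sizeof(buf) - off, "port %u", port);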
2020-03-25 | RDMA/mlx5: Block delay drop to unprivileged users | Maor Gottlieb | 1 | -0/+4
It has been discovered that this feature can globally block the RX port, so it should be allowed for highly privileged users only.
Fixes: 03404e8ae652 ("IB/mlx5: Add support to dropless RQ")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-24 | IB/mlx5: Generally use the WC auto detection test result | Yishai Hadas | 2 | -13/+5
Now that we have direct and reliable detection of WC support by the system, use it broadly. The only case we have to worry about is when the WC autodetector cannot run. For this fringe case, generally assume that WC is available, except in the well-defined case of no PAT support on x86, which is tested by calling arch_can_pci_mmap_wc(). If WC is wrongly assumed to be available, it causes a small performance hit on paths in userspace that are tuned to the assumption that WC is available, but there is no functional loss. It is very unlikely that any platforms exist that lack WC and also care about the micro-optimization of WC in the fringe case where autodetection does not work. By removing the fairly bogus CONFIG tests, this makes WC work broadly on all arches and all platforms.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Reviewed-by: Michael Guralnik <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
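A hedged sketch of the described fallback policy, not the driver's actual code (the helper name is invented; arch_can_pci_mmap_wc() is the real macro from linux/pci.h, expanding to pat_enabled() on x86 and to 0 elsewhere):

    #include <linux/pci.h>

    /* Used only when the WC autodetection test could not run. */
    static bool assume_wc_when_autodetect_unavailable(void)
    {
    #ifdef CONFIG_X86
    	/* the one well-defined failure case: x86 without PAT */
    	return arch_can_pci_mmap_wc();
    #else
    	/* everywhere else, assume WC works */
    	return true;
    #endif
    }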
2020-03-24 | RDMA/hns: Optimize mhop put flow for multi-hop addressing | Xi Wang | 1 | -100/+61
Optimize hns_roce_table_mhop_put() by encapsulating the code that clears hem into clear_mhop_hem(), which will make the code flow clearer.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-24 | RDMA/hns: Optimize mhop get flow for multi-hop addressing | Xi Wang | 1 | -115/+182
Splits hns_roce_table_mhop_get() into 4 sub-functions to make the code flow clearer.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-24 | RDMA/bnxt_re: Wait for all the CQ events before freeing CQ data structures | Selvin Xavier | 2 | -0/+74
The Destroy CQ command to firmware returns num_cnq_events in its response, which tells the driver the number of CQ events generated for this CQ. The driver should wait for all these events before freeing the CQ host structures. Also, add a routine to clean all the pending notifications for the CQs getting destroyed. This avoids the possibility of accessing the CQ data structures after they are freed.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
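The waiting pattern, as a rough sketch with hypothetical structure and field names (not the bnxt_re code itself): count the CQ events as they arrive and block the destroy path until the count reaches the number the firmware reported.

    struct my_cq {
    	atomic_t		events_seen;	/* CQ events delivered so far */
    	wait_queue_head_t	waitq;
    };

    /* event path: account for one delivered CQ event */
    static void my_cq_event_seen(struct my_cq *cq)
    {
    	atomic_inc(&cq->events_seen);
    	wake_up(&cq->waitq);
    }

    /* destroy path: firmware reported num_cnq_events in its response */
    static void my_cq_wait_for_events(struct my_cq *cq, u32 num_cnq_events)
    {
    	wait_event(cq->waitq,
    		   atomic_read(&cq->events_seen) >= num_cnq_events);
    	/* only now is it safe to free the CQ host structures */
    }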
2020-03-24 | RDMA/mlx5: Fix access to wrong pointer while performing flush due to error | Leon Romanovsky | 3 | -2/+27
The main difference between send and receive SW completions is the separate treatment of the WQ queue: for receive completions, the initial index to be flushed is stored in "tail", while for send completions it was in the now-deleted "last_poll".

CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G OE --------- -t - 4.18.0-147.el8.ppc64le #1
Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
NIP: c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000
REGS: c0000036cc9db940 TRAP: 0400 Tainted: G OE --------- -t - (4.18.0-147.el8.ppc64le)
MSR: 9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24004488 XER: 20040000
CFAR: c00800000e586af0 IRQMASK: 0
GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800
GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011
GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438
GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010
GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000
GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800
NIP [c000003c7c00a000] 0xc000003c7c00a000
LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core]
Call Trace:
[c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable)
[c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core]
[c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0
[c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760
[c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0
[c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80

Fixes: 8e3b68830186 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-24 | IB/mlx5: Fix a NULL vs IS_ERR() check | Dan Carpenter | 1 | -2/+2
The kzalloc() function returns NULL, not error pointers.
Fixes: 30f2fe40c72b ("IB/mlx5: Introduce UAPIs to manage packet pacing")
Link: https://lore.kernel.org/r/20200320132641.GF95012@mwanda
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
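The general rule, as a small before/after illustration (obj is a placeholder):

    /* before (broken): IS_ERR() is never true for kzalloc() output,
     * so the failure path is unreachable */
    obj = kzalloc(sizeof(*obj), GFP_KERNEL);
    if (IS_ERR(obj))
    	return PTR_ERR(obj);

    /* after (fixed): kzalloc() signals failure with NULL */
    obj = kzalloc(sizeof(*obj), GFP_KERNEL);
    if (!obj)
    	return -ENOMEM;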
2020-03-23 | IB/hfi1: Ensure pq is not left on waitlist | Mike Marciniszyn | 1 | -3/+22
The following warning can occur when a pq is left on the dmawait list and the pq is then freed:

WARNING: CPU: 47 PID: 3546 at lib/list_debug.c:29 __list_add+0x65/0xc0
list_add corruption. next->prev should be prev (ffff939228da1880), but was ffff939cabb52230. (next=ffff939cabb52230).
Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev drm_panel_orientation_quirks i2c_i801 mei_me lpc_ich mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci libahci i2c_algo_bit dca libata ptp pps_core crc32c_intel [last unloaded: i2c_algo_bit]
CPU: 47 PID: 3546 Comm: wrf.exe Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.41.1.el7.x86_64 #1
Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
Call Trace:
[<ffffffff91f65ac0>] dump_stack+0x19/0x1b
[<ffffffff91898b78>] __warn+0xd8/0x100
[<ffffffff91898bff>] warn_slowpath_fmt+0x5f/0x80
[<ffffffff91a1dabe>] ? ___slab_alloc+0x24e/0x4f0
[<ffffffff91b97025>] __list_add+0x65/0xc0
[<ffffffffc03926a5>] defer_packet_queue+0x145/0x1a0 [hfi1]
[<ffffffffc0372987>] sdma_check_progress+0x67/0xa0 [hfi1]
[<ffffffffc03779d2>] sdma_send_txlist+0x432/0x550 [hfi1]
[<ffffffff91a20009>] ? kmem_cache_alloc+0x179/0x1f0
[<ffffffffc0392973>] ? user_sdma_send_pkts+0xc3/0x1990 [hfi1]
[<ffffffffc0393e3a>] user_sdma_send_pkts+0x158a/0x1990 [hfi1]
[<ffffffff918ab65e>] ? try_to_del_timer_sync+0x5e/0x90
[<ffffffff91a3fe1a>] ? __check_object_size+0x1ca/0x250
[<ffffffffc0395546>] hfi1_user_sdma_process_request+0xd66/0x1280 [hfi1]
[<ffffffffc034e0da>] hfi1_aio_write+0xca/0x120 [hfi1]
[<ffffffff91a4245b>] do_sync_readv_writev+0x7b/0xd0
[<ffffffff91a4409e>] do_readv_writev+0xce/0x260
[<ffffffff918df69f>] ? pick_next_task_fair+0x5f/0x1b0
[<ffffffff918db535>] ? sched_clock_cpu+0x85/0xc0
[<ffffffff91f6b16a>] ? __schedule+0x13a/0x860
[<ffffffff91a442c5>] vfs_writev+0x35/0x60
[<ffffffff91a4447f>] SyS_writev+0x7f/0x110
[<ffffffff91f78ddb>] system_call_fastpath+0x22/0x27

The issue happens when wait_event_interruptible_timeout() returns a value <= 0; in that case, the pq is left on the list. The code continues sending packets and can potentially complete the current request with the pq still on the dmawait list, provided no descriptor shortage is seen. If the pq is torn down in that state, the sdma interrupt handler could find the now-freed pq on the list, resulting in list or memory corruption.

Fix by adding a flush routine to ensure that the pq is never on a list after processing a request.

A follow-up patch series will address the seqlock issues surfaced in:
https://lore.kernel.org/r/[email protected]
The seqlock use for sdma will then be converted to a spin lock, since the list_empty() doesn't need the protection afforded by the sequence lock currently in use.

Fixes: a0d406934a46 ("staging/rdma/hfi1: Add page lock limit check for SDMA requests")
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Kaike Wan <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-18 | RDMA/efa: Use in-kernel offsetofend() to check field availability | Leon Romanovsky | 1 | -5/+2
Remove custom and duplicated variant of offsetofend().
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Acked-by: Gal Pressman <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
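For reference, offsetofend() lives in include/linux/stddef.h; in kernels of this era it is defined essentially as:

    #define offsetofend(TYPE, MEMBER) \
    	(offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))

That is, the offset of the first byte past MEMBER, which is exactly the bound a "does userspace know about this field" check needs to compare against.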
2020-03-18 | IB/hfi1: Remove kobj from hfi1_devdata | Kaike Wan | 3 | -27/+5
The kobj field was added to the hfi1_devdata structure to manage the lifetime of the hfi1_devdata structure for PSM accesses:
commit e11ffbd57520 ("IB/hfi1: Do not free hfi1 cdev parent structure early")
Later another mechanism, user_refcount/user_comp, was introduced to provide the same functionality:
commit acd7c8fe1493 ("IB/hfi1: Fix an Oops on pci device force remove")
Remove the kobj field, as it is no longer needed.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-18 | RDMA/hns: Check if depth of qp is 0 before configure | Lang Cheng | 1 | -44/+33
The qp depth shouldn't be allowed to be set to zero; after ensuring that, the subsequent process can be simplified. Also, when the qp is changed from reset to reset, the minimum qp depth capability was used to identify hip06 hardware; this check should be changed into a more readable form.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lang Cheng <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-18 | i40iw: Report correct firmware version | Sindhu, Devale | 8 | -9/+167
The driver uses a hard-coded value for the FW version and reports an inconsistent FW version between ibv_devinfo and /sys/class/infiniband/i40iw/fw_ver. Retrieve the FW version via a Control QP (CQP) operation and report it consistently across sysfs and query device.
Fixes: d37498417947 ("i40iw: add files for iwarp interface")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Jarod Wilson <[email protected]>
Signed-off-by: Sindhu, Devale <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-18 | RDMA/hns: Optimize wqe buffer set flow for post send | Xi Wang | 1 | -248/+224
Splits hns_roce_v2_post_send() into three sub-functions: set_rc_wqe(), set_ud_wqe() and update_sq_db() to simplify the code.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-18 | RDMA/hns: Optimize base address table config flow for qp buffer | Xi Wang | 3 | -41/+21
Currently, before the qp is created, a page size needs to be calculated for the base address table to store all base addresses in the mtr. As a result, the parameter configuration of the mtr is complex. So integrate the process of calculating the base table page size into the hem related interface to simplify the process of using mtr.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-18 | RDMA/hns: Optimize the wr opcode conversion from ib to hns | Xi Wang | 1 | -27/+36
Simplify the wr opcode conversion from ib to hns by using a map table instead of the switch-case statement.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
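A sketch of the map-table idea; the IB_WR_* opcodes are the real enum ib_wr_opcode values, but the HW_OPCODE_* names, values and the helper are hypothetical stand-ins, not the hns definitions:

    static const u32 ib_to_hw_opcode[] = {
    	[IB_WR_RDMA_WRITE]	= HW_OPCODE_RDMA_WRITE,
    	[IB_WR_SEND]		= HW_OPCODE_SEND,
    	[IB_WR_RDMA_READ]	= HW_OPCODE_RDMA_READ,
    	/* ... one entry per supported ib_wr_opcode ... */
    };

    static u32 to_hw_opcode(u32 ib_opcode)
    {
    	/* indexes outside the table are invalid by construction */
    	if (ib_opcode >= ARRAY_SIZE(ib_to_hw_opcode))
    		return HW_OPCODE_INVALID;
    	return ib_to_hw_opcode[ib_opcode];
    }

Designated array initializers keep the mapping declarative, so adding an opcode is a one-line change instead of a new switch case.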
2020-03-18 | RDMA/hns: Optimize wqe buffer filling process for post send | Xi Wang | 1 | -31/+32
Encapsulate the wqe buffer processing details for the datagram seg, fast mr seg and atomic seg.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-18 | RDMA/hns: Rename wqe buffer related functions | Xi Wang | 4 | -15/+16
There are several global functions related to the wqe buffer in the hns driver that are called in different files. These symbols cannot directly represent the namespace they belong to, so add the prefix 'hns_roce_' to three wqe buffer related global functions: get_recv_wqe(), get_send_wqe(), and get_send_extend_sge().
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Weihang Li <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-17 | RDMA/bnxt_re: Remove unnecessary sched count | Selvin Xavier | 2 | -9/+0
Since the lifetime of bnxt_re_task is controlled by the kref of device, sched_count is no longer required. Remove it.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-17 | RDMA/bnxt_re: Fix lifetimes in bnxt_re_task | Jason Gunthorpe | 1 | -0/+2
A work queue cannot just rely on the ib_device not being freed; it must hold a kref on the memory so that the BNXT_RE_FLAG_IBDEV_REGISTERED check works.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
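The shape of this pattern, hedged (my_wq and the field layout are invented; bnxt_re_task is the struct named above): take the reference before queuing, drop it only when the work function is done touching the memory.

    /* queuing side: pin the ib_device's memory for the pending work */
    static void my_queue_task(struct bnxt_re_task *task)
    {
    	get_device(&task->rdev->ibdev.dev);	/* kref on the device memory */
    	queue_work(my_wq, &task->work);
    }

    static void my_task_fn(struct work_struct *work)
    {
    	struct bnxt_re_task *task =
    		container_of(work, struct bnxt_re_task, work);

    	/* ... task->rdev is guaranteed to still exist here ... */

    	put_device(&task->rdev->ibdev.dev);
    	kfree(task);
    }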
2020-03-17 | RDMA/bnxt_re: Use ib_device_try_get() | Jason Gunthorpe | 2 | -14/+14
There are a couple of places in this driver running from a work queue that need the ib_device to be registered. Instead of using a broken internal bit, rely on the new core code to guarantee device registration.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
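ib_device_try_get() takes a registration-scoped reference and fails once unregistration has started, so a work item can gate itself on it instead of a driver-private flag. A minimal sketch (the work item and struct names are hypothetical):

    static void my_work_fn(struct work_struct *work)
    {
    	struct my_dev *rdev = container_of(work, struct my_dev, work);

    	/* fails once device unregistration has begun */
    	if (!ib_device_try_get(&rdev->ibdev))
    		return;

    	/* ... the device is guaranteed registered in here ... */

    	ib_device_put(&rdev->ibdev);
    }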
2020-03-13 | RDMA/hns: Fix wrong judgments of udata->outlen | Weihang Li | 1 | -4/+4
These judgments were used to keep compatibility with older versions of userspace that don't have the field named "cap_flags" in the structure hns_roce_ib_create_cq_resp. But it is wrong to compare outlen with the size of resp if another new field is added to resp; outlen should be compared with the end offset of cap_flags in resp.
Fixes: 4f8f0d5e33dd ("RDMA/hns: Package the flow of creating cq")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Weihang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
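The shape of the fix, illustratively (resp stands for the struct hns_roce_ib_create_cq_resp instance; the surrounding code is elided):

    /* before: breaks as soon as a new field is appended to resp,
     * because old userspace passes a shorter outlen */
    if (udata->outlen >= sizeof(resp))
    	/* ... use resp.cap_flags ... */;

    /* after: test exactly the bytes this code path consumes */
    if (udata->outlen >= offsetofend(typeof(resp), cap_flags))
    	/* ... use resp.cap_flags ... */;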
2020-03-13 | Merge branch 'mlx5_mr_cache' into rdma.git for-next | Jason Gunthorpe | 4 | -283/+414
Leon Romanovsky says:
====================
This series fixes various corner cases in the mlx5_ib MR cache implementation, see specific commit messages for more information.
====================
Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies.
* branch 'mlx5_mr-cache':
  RDMA/mlx5: Allow MRs to be created in the cache synchronously
  RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
  RDMA/mlx5: Fix locking in MR cache work queue
  RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
  RDMA/mlx5: Fix MR cache size and limit debugfs
  RDMA/mlx5: Always remove MRs from the cache before destroying them
  RDMA/mlx5: Simplify how the MR cache bucket is located
  RDMA/mlx5: Rename the tracking variables for the MR cache
  RDMA/mlx5: Replace spinlock protected write with atomic var
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation
2020-03-13 | RDMA/mlx5: Allow MRs to be created in the cache synchronously | Jason Gunthorpe | 2 | -61/+87
If the cache is completely out of MRs, and we are running in cache mode, then directly, and synchronously, create an MR that is compatible with the cache bucket using a sleeping mailbox command. This ensures that the thread waiting for an MR absolutely will get one. When an MR allocated in this way is freed, it is compatible with the cache bucket and will be recycled back into it. Delete the very buggy ent->compl scheme for synchronous MR allocation.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-13 | RDMA/mlx5: Revise how the hysteresis scheme works for cache filling | Jason Gunthorpe | 2 | -15/+27
Currently, if the work queue is running then it is in 'hysteresis' mode and will fill until the cache reaches the high water mark. This implicit state is very tricky and doesn't interact well with pending. Instead of self-rescheduling the work queue after add_keys() has started to create the new MR, have the queue scheduled from reg_mr_callback() only after the requested MR has been added. This avoids the bad design of an in-rush of queued work doing back-to-back add_keys() until EAGAIN and then sleeping. The add_keys() calls will be paced one at a time as they complete, slowly filling up the cache. Also, fix pending to be manipulated only under lock.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-13 | RDMA/mlx5: Fix locking in MR cache work queue | Jason Gunthorpe | 2 | -46/+80
All of the members of mlx5_cache_ent must be accessed while holding the spinlock; add the missing spinlock in __cache_work_func(). Using cache->stopped and flush_workqueue() is an inherently racy way to shut down self-scheduling work on a queue. Replace it with ent->disabled under lock, and always check disabled before queuing any new work. Use cancel_work_sync() to shut down the queue. Use READ_ONCE/WRITE_ONCE for dev->last_add to manage concurrency, as coherency is less important here. Split fill_delay out of the bitfield: C bitfield updates are not atomic and this is just a mess. Use READ_ONCE/WRITE_ONCE, but this could also use test_bit()/set_bit().
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
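The shutdown sequence described above, in rough form (the field names follow the commit text, but this is a sketch, not the driver code):

    /* mark the entry dead under the same lock the queuing sites take ... */
    spin_lock_irq(&ent->lock);
    ent->disabled = true;	/* every queue_work() caller checks this under lock */
    spin_unlock_irq(&ent->lock);

    /* ... so any racing submitter either saw 'disabled' or has already
     * queued; cancel_work_sync() then reaps the final instance without
     * the flush_workqueue() race. */
    cancel_work_sync(&ent->work);

    /* the timestamp only needs coherence, not ordering: */
    WRITE_ONCE(dev->last_add, jiffies);
    /* elsewhere: */
    delayed = time_after(jiffies, READ_ONCE(dev->last_add) + delay);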
2020-03-13 | RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work | Jason Gunthorpe | 1 | -15/+25
Accesses to these members need to be locked. There is no reason not to hold a spinlock while calling queue_work(), so move the tests into a helper and always call it under lock. The helper should be called whenever available_mrs is adjusted.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
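A sketch of the helper pattern (the names mirror the commit text, but the workqueue lookup and the exact condition are illustrative assumptions):

    /* must be called with ent->lock held */
    static void queue_adjust_cache_entry(struct mlx5_cache_ent *ent)
    {
    	lockdep_assert_held(&ent->lock);

    	/* refill whenever the bucket drops below its water mark,
    	 * but never after shutdown has flagged it disabled */
    	if (!ent->disabled && ent->available_mrs < ent->limit)
    		queue_work(ent->dev->cache.wq, &ent->work);
    }

Since queue_work() never sleeps, calling it under the spinlock is safe, and doing so removes the window where available_mrs changes between the test and the queuing.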
2020-03-13 | RDMA/mlx5: Fix MR cache size and limit debugfs | Jason Gunthorpe | 1 | -64/+88
The size_write function is supposed to adjust total_mrs to match the user's request, but lacks locking and safety checking. total_mrs can only be adjusted by at most available_mrs; MRs already assigned to users cannot be revoked. Ensure that the user provides a target value within the range of available_mrs and within the high/low water marks. limit_write has confusing and wrong sanity checking, and doesn't have the ability to deallocate on limit reduction. Since both functions use the same algorithm to adjust available_mrs, consolidate it into one function and write it correctly. Fix the locking by holding the spinlock for all accesses to ent->X, and always fail if the user provides a malformed string.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-13 | RDMA/mlx5: Always remove MRs from the cache before destroying them | Jason Gunthorpe | 1 | -6/+13
The cache bucket tracks the total number of MRs that exist, both inside and outside of the cache. Removing an MR from the cache (by setting cache_ent to NULL) without updating total_mrs will cause the tracking to leak and be inflated. Further fix the rereg_mr path to always destroy the MR. reg_create will always overwrite all the MR data in mlx5_ib_mr, so the MR must be completely destroyed, in all cases, before this function can be called. Detach the MR from the cache and unconditionally destroy it to avoid leaking HW mkeys.
Fixes: afd1417404fb ("IB/mlx5: Use direct mkey destroy command upon UMR unreg failure")
Fixes: 56e11d628c5d ("IB/mlx5: Added support for re-registration of MRs")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-13 | RDMA/mlx5: Simplify how the MR cache bucket is located | Jason Gunthorpe | 3 | -98/+71
There are many bad APIs here that accept a cache bucket index instead of a bucket pointer. Many of the callers already have a bucket pointer, so this results in a lot of confusing uses of order2idx(). Pass the struct mlx5_cache_ent into add_keys(), remove_keys(), and alloc_cached_mr(). Once the MR is in the cache, store the cache bucket pointer directly in the MR, replacing the 'bool allocated_from_cache'. In the end there is only one place that needs to form an index from an order, alloc_mr_from_cache(). Increase the safety of this function by disallowing it from accessing cache entries in the ODP special area.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-13 | RDMA/mlx5: Rename the tracking variables for the MR cache | Jason Gunthorpe | 2 | -31/+42
The old names do not clearly indicate the intent.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-13 | RDMA/mlx5: Replace spinlock protected write with atomic var | Saeed Mahameed | 3 | -10/+3
The mkey variant calculation was spinlock protected to make it atomic; replace that with a single atomic variable.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
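The before/after shape, roughly (the field names are approximations for illustration, not the exact mlx5 members):

    /* before: a lock whose only job is to make the increment atomic */
    spin_lock_irqsave(&dev->mkey_lock, flags);
    variant = dev->mkey_key++;
    spin_unlock_irqrestore(&dev->mkey_lock, flags);

    /* after: the counter itself is atomic, no lock needed */
    variant = atomic_inc_return(&dev->mkey_var);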
2020-03-13 | {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib | Michael Guralnik | 1 | -3/+3
As mlx5_ib is the only user of mlx5_core_create_mkey_cb, move the logic inside mlx5_ib and clean up the code in mlx5_core.
Signed-off-by: Michael Guralnik <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
2020-03-13 | {IB,net}/mlx5: Assign mkey variant in mlx5_ib only | Saeed Mahameed | 3 | -10/+54
The mkey variant is not required for mlx5_core use; move the mkey variant counter to mlx5_ib.
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
2020-03-13 | {IB,net}/mlx5: Setup mkey variant before mr create command invocation | Saeed Mahameed | 1 | -5/+2
In reg_mr_callback(), mlx5_ib recalculates the mkey variant, which is wrong and will lead to using a different key variant than the one submitted to firmware on create mkey command invocation. To fix this, store the mkey variant before invoking the firmware command and use it later on completion (reg_mr_callback).
Signed-off-by: Saeed Mahameed <[email protected]>
Reviewed-by: Eli Cohen <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
2020-03-13 | RDMA/mlx5: Use offsetofend() instead of duplicated variant | Leon Romanovsky | 2 | -31/+27
Convert the mlx5 driver to use offsetofend() instead of its duplicated variant.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-13 | RDMA/mlx4: Delete duplicated offsetofend implementation | Leon Romanovsky | 1 | -6/+3
Convert mlx4 to use the in-kernel offsetofend() instead of its duplicated implementation.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-12 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | David S. Miller | 4 | -11/+13
Minor overlapping changes, nothing serious.
Signed-off-by: David S. Miller <[email protected]>
2020-03-12 | Merge branch 'ct-offload' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller | 1 | -1/+2