path: root/drivers/infiniband/sw/rdmavt
2017-04-28  IB/rdmavt/hfi1/qib: Use the MGID and MLID for multicast addressing  (Michael J. Ruhl; 1 file, -17/+44)

The InfiniBand spec states that "A multicast address is defined by a MGID and a MLID" (section 10.5), but the current code uses only the MGID to identify multicast groups. Update the driver to be compliant with this definition.

Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Dasaratharaman Chandramouli <[email protected]>
Signed-off-by: Michael J. Ruhl <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
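A minimal sketch of the idea, using a hypothetical mcast_key type rather than the real rdmavt structures: the lookup must compare both halves of the multicast address, not the MGID alone.

    #include <linux/types.h>
    #include <linux/string.h>
    #include <rdma/ib_verbs.h>

    /* Hypothetical multicast group key, for illustration only. */
    struct mcast_key {
            union ib_gid mgid;
            u16 mlid;
    };

    /* IBTA 10.5: a multicast address is the (MGID, MLID) pair, so a group
     * matches only when both components match. */
    static bool mcast_key_match(const struct mcast_key *grp,
                                const union ib_gid *mgid, u16 mlid)
    {
            return grp->mlid == mlid &&
                   !memcmp(&grp->mgid, mgid, sizeof(*mgid));
    }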
2017-04-28  IB/rdmavt: restore IRQs on error path in rvt_create_ah()  (Dan Carpenter; 1 file, -1/+1)

We need to call spin_unlock_irqrestore() instead of vanilla spin_unlock() on this error path.

Fixes: 119a8e708d16 ("IB/rdmavt: Add AH to rdmavt")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
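A minimal sketch of the corrected pattern; the ah_limits type and its fields are stand-ins loosely modeled on rdmavt's AH accounting, not the driver's actual code. A lock taken with spin_lock_irqsave() must be released with spin_unlock_irqrestore() on every exit path, including the error path.

    #include <linux/spinlock.h>
    #include <linux/err.h>

    struct ah_limits {                     /* hypothetical */
            spinlock_t lock;
            unsigned int allocated;
            unsigned int max;
    };

    static int reserve_ah(struct ah_limits *l)
    {
            unsigned long flags;

            spin_lock_irqsave(&l->lock, flags);
            if (l->allocated == l->max) {
                    /* Error path: restore IRQs, not just drop the lock. */
                    spin_unlock_irqrestore(&l->lock, flags);
                    return -ENOMEM;
            }
            l->allocated++;
            spin_unlock_irqrestore(&l->lock, flags);
            return 0;
    }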
2017-04-25  IB: Replace ib_umem page_size by page_shift  (Artemy Kovalyov; 1 file, -4/+4)

struct ib_umem holds the page size in its page_size field. It is better to store it as an exponent, because a page size is by nature a power of two and is used as a factor, a divisor, or an argument to ilog2(). Converting page_size to page_shift keeps the code portable and avoids the following error when compiling on ARM:

    ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!

CC: Selvin Xavier <[email protected]>
CC: Steve Wise <[email protected]>
CC: Lijun Ou <[email protected]>
CC: Shiraz Saleem <[email protected]>
CC: Adit Ranadive <[email protected]>
CC: Dennis Dalessandro <[email protected]>
CC: Ram Amrani <[email protected]>
Signed-off-by: Artemy Kovalyov <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Acked-by: Ram Amrani <[email protected]>
Acked-by: Shiraz Saleem <[email protected]>
Acked-by: Selvin Xavier <[email protected]>
Acked-by: Adit Ranadive <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
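A small illustration of why the shift form is preferred; umem_num_pages() and to_page_shift() are hypothetical helpers, not the ib_umem API. With a shift, the page count is computed with shifts and adds, so no 64-bit division helper (such as __aeabi_uldivmod on 32-bit ARM) is pulled in.

    #include <linux/log2.h>
    #include <linux/types.h>

    /* For a power-of-two page size, derive the exponent once. */
    static inline unsigned int to_page_shift(unsigned long page_size)
    {
            return ilog2(page_size);
    }

    /* Number of pages covering 'length' bytes, using only shift/add. */
    static inline u64 umem_num_pages(u64 length, unsigned int page_shift)
    {
            return (length + ((u64)1 << page_shift) - 1) >> page_shift;
    }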
2017-04-20  Merge branch 'k.o/for-4.12' into k.o/for-4.12-rdma-netdevice  (Doug Ledford; 7 files, -41/+317)
2017-04-05  IB/hfi1: Eliminate synchronize_rcu() in mr delete  (Mike Marciniszyn; 1 file, -16/+33)

The synchronize_rcu() call can be eliminated to improve memory deregistration performance. There are two key fields involved: the rcu pointer itself and the lkey_published field. To close the window between the rcu read of the mregion pointer and taking the reference count, the code should:

1. lkey/rkey validation (reader): read the rcu pointer and, if it is non-NULL, get a reference. For the current validation tests, use READ_ONCE() on lkey_published. Upon any failure, release the reference.
2. Remove logic (delete): ensure lkey_published is zeroed prior to setting the pointer to NULL. This requires rcu_assign_pointer() to ensure the lkey_published store is ordered before the NULL store.
3. Insert logic (add): ensure lkey_published is set, and use rcu_assign_pointer() so the pointer becomes visible only after all MR fields are written.

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
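A sketch of the ordering described above, with a hypothetical mr_entry type and a plain atomic refcount standing in for the rdmavt structures; it is illustrative, not the driver's implementation.

    #include <linux/rcupdate.h>
    #include <linux/compiler.h>
    #include <linux/atomic.h>
    #include <linux/types.h>

    struct mr_entry {                      /* hypothetical MR descriptor */
            atomic_t refcount;
            int lkey_published;
    };

    /* Reader (lkey/rkey validation): take a reference, then re-check the
     * publication flag; back off if the MR is being removed. */
    static struct mr_entry *lookup_mr(struct mr_entry __rcu **table, u32 idx)
    {
            struct mr_entry *mr;

            rcu_read_lock();
            mr = rcu_dereference(table[idx]);
            if (!mr || !atomic_inc_not_zero(&mr->refcount)) {
                    mr = NULL;
            } else if (!READ_ONCE(mr->lkey_published)) {
                    atomic_dec(&mr->refcount);
                    mr = NULL;
            }
            rcu_read_unlock();
            return mr;
    }

    /* Remover: clear the flag, then NULL the slot; rcu_assign_pointer()
     * orders the flag store before the pointer store. */
    static void unpublish_mr(struct mr_entry __rcu **table, u32 idx,
                             struct mr_entry *mr)
    {
            mr->lkey_published = 0;
            rcu_assign_pointer(table[idx], NULL);
    }

    /* Inserter: publish the pointer only after all MR fields are written. */
    static void publish_mr(struct mr_entry __rcu **table, u32 idx,
                           struct mr_entry *mr)
    {
            mr->lkey_published = 1;
            rcu_assign_pointer(table[idx], mr);
    }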
2017-04-05  IB/rdmavt: Avoid resetting wqe send_flags in unreserve  (Mike Marciniszyn; 1 file, -2/+5)

The wqe should be read only, and the superfluous reset of the RVT_SEND_RESERVE_USED flag causes an issue where reserved operations elicit a bad completion to the ULP. The maintenance of the flag is now entirely within rvt_post_one_wr(), where a reserved operation sets the flag and a non-reserved operation ensures the flag is cleared on the operation about to be posted.

Fixes: 856cc4c237ad ("IB/hfi1: Add the capability for reserved operations")
Reviewed-by: Don Hiatt <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
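A hedged fragment of what the centralized flag handling looks like in the post path; RVT_SEND_RESERVE_USED and RVT_OPERATION_USE_RESERVE are real rdmavt symbols, but the surrounding variables are assumed context, not the driver's exact code.

    /* Inside the post-one-WR path: only the posted copy of send_flags
     * carries the reserve marker; the unreserve/completion path no longer
     * touches it. */
    if (rdi->post_parms[wr->opcode].flags & RVT_OPERATION_USE_RESERVE)
            wqe->wr.send_flags |= RVT_SEND_RESERVE_USED;    /* reserved op */
    else
            wqe->wr.send_flags &= ~RVT_SEND_RESERVE_USED;   /* normal op */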
2017-04-05  IB/rdmavt, IB/hfi1: Fix timer migration regressions  (Sebastian Sanchez; 3 files, -2/+116)

RC timeout counter isn't getting incremented. Increment counter and add the trace for it.

Fixes: 87c23b4ab018 ("IB/rdmavt: Adding timer logic to rdmavt")
Reviewed-by: Brian Welty <[email protected]>
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-04-05  IB/rdmavt: Add tracing for cq entry and poll  (Mike Marciniszyn; 3 files, -0/+131)

The following fields are defined for filtering and triggering:
 - wr_id
 - status
 - opcode
 - qpn
 - length
 - idx

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-04-05  IB/rdmavt: Add additional fields to post send trace  (Mike Marciniszyn; 2 files, -4/+32)

This fix is to get additional debugging information. The following fields are added:
 - wqe
 - qpt
 - num_sge
 - ssn
 - pid
 - send_flags

These additional fields provide for more focused filtering and triggering. The patch also moves the trace to just before the wqe is posted to get the most accurate information and future proofs the code to trace all possible reserved opcodes.

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-04-05  IB/rdmavt, IB/hfi1, IB/qib: Make wc opcode translation driver dependent  (Mike Marciniszyn; 1 file, -17/+0)

The work to create a completion helper moved the translation of send wqe operations to completion opcodes into rdmavt. This precludes driver-dependent operations. Make the translation driver dependent by doing the translation in the driver prior to the rvt_qp_swqe_complete() call, using restored translation tables.

Fixes: f2dc9cdce83c ("IB/rdmavt: Add a send completion helper")
Fixes: 0771da5a6e9d ("IB/hfi1,IB/qib: Use new send completion helper")
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-03-24  infiniband: Fix alignment of mmap cookies to support VIPT caching  (Jason Gunthorpe; 1 file, -2/+2)

When vmalloc_user is used to create memory that is supposed to be mmap'd to user space, it is necessary for the mmap cookie (eg the offset) to be aligned to SHMLBA. This creates a situation where all virtual mappings of the same physical page share the same virtual cache index and guarantees VIPT coherence. Otherwise the cache is non-coherent and the kernel will not see writes by userspace when reading the shared page (or vice-versa).

Reported-by: Josh Beavers <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
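A minimal sketch of the alignment rule, assuming a per-device offset counter; next_mmap_offset() is a made-up helper, not the driver's function. Stepping the mmap cookie in SHMLBA units keeps every user mapping of a page on the same cache colour for VIPT caches.

    #include <linux/kernel.h>    /* ALIGN() */
    #include <linux/mm.h>        /* PAGE_ALIGN() */
    #include <asm/shmparam.h>    /* SHMLBA */

    /* Advance the device's mmap cookie past an object of 'size' bytes,
     * keeping the next cookie SHMLBA-aligned. */
    static u64 next_mmap_offset(u64 cur, u32 size)
    {
            return ALIGN(cur + PAGE_ALIGN(size), SHMLBA);
    }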
2017-02-27  scripts/spelling.txt: add "therfore" pattern and fix typo instances  (Masahiro Yamada; 1 file, -3/+3)

Fix typos and add the following to the scripts/spelling.txt:

    therfore||therefore

Besides, tidy up comment blocks for 80-col wrapping.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2017-02-25  Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma  (Linus Torvalds; 7 files, -259/+8)

Pull rdma DMA mapping updates from Doug Ledford:
 "Drop IB DMA mapping code and use core DMA code instead.

  Bart Van Assche noted that the ib DMA mapping code was significantly similar enough to the core DMA mapping code that with a few changes it was possible to remove the IB DMA mapping code entirely and switch the RDMA stack to use the core DMA mapping code.

  This resulted in a nice set of cleanups, but touched the entire tree and has been kept separate for that reason."

* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
  IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
  IB/core: Remove ib_device.dma_device
  nvme-rdma: Switch from dma_device to dev.parent
  RDS: net: Switch from dma_device to dev.parent
  IB/srpt: Modify a debug statement
  IB/srp: Switch from dma_device to dev.parent
  IB/iser: Switch from dma_device to dev.parent
  IB/IPoIB: Switch from dma_device to dev.parent
  IB/rxe: Switch from dma_device to dev.parent
  IB/vmw_pvrdma: Switch from dma_device to dev.parent
  IB/usnic: Switch from dma_device to dev.parent
  IB/qib: Switch from dma_device to dev.parent
  IB/qedr: Switch from dma_device to dev.parent
  IB/ocrdma: Switch from dma_device to dev.parent
  IB/nes: Remove a superfluous assignment statement
  IB/mthca: Switch from dma_device to dev.parent
  IB/mlx5: Switch from dma_device to dev.parent
  IB/mlx4: Switch from dma_device to dev.parent
  IB/i40iw: Remove a superfluous assignment statement
  IB/hns: Switch from dma_device to dev.parent
  ...
2017-02-23  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma  (Linus Torvalds; 1 file, -3/+4)

Pull Mellanox rdma updates from Doug Ledford:
 "Mellanox specific updates for the 4.11 merge window.

  Because the Mellanox code required being based on a net-next tree, I kept it separate from the remainder of the RDMA stack submission that is based on 4.10-rc3.

  This branch contains:

   - Various mlx4 and mlx5 fixes and minor changes
   - Support for adding a tag match rule to flow specs
   - Support for cvlan offload operation for raw ethernet QPs
   - A change to the core IB code to recognize raw eth capabilities and enumerate them (touches non-Mellanox code)
   - Implicit On-Demand Paging memory registration support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
  IB/mlx5: Fix configuration of port capabilities
  IB/mlx4: Take source GID by index from HW GID table
  IB/mlx5: Fix blue flame buffer size calculation
  IB/mlx4: Remove unused variable from function declaration
  IB: Query ports via the core instead of direct into the driver
  IB: Add protocol for USNIC
  IB/mlx4: Support raw packet protocol
  IB/mlx5: Support raw packet protocol
  IB/core: Add raw packet protocol
  IB/mlx5: Add implicit MR support
  IB/mlx5: Expose MR cache for mlx5_ib
  IB/mlx5: Add null_mkey access
  IB/umem: Indicate that process is being terminated
  IB/umem: Update on demand page (ODP) support
  IB/core: Add implicit MR flag
  IB/mlx5: Support creation of a WQ with scatter FCS offload
  IB/mlx5: Enable QP creation with cvlan offload
  IB/mlx5: Enable WQ creation and modification with cvlan offload
  IB/mlx5: Expose vlan offloads capabilities
  IB/uverbs: Enable QP creation with cvlan offload
  ...
2017-02-19  IB/hfi1, qib, rdmavt: Move AETH defines to rdma/ib_hdrs.h  (Don Hiatt; 2 files, -11/+11)

Rename RVT AETH defines and export in rdma/ib_hdrs.h.

Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Don Hiatt <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-02-19  IB/hfi1: Add rvt_rnr_tbl_to_usec function  (Don Hiatt; 1 file, -0/+11)

Return usec from an index into ib_rvt_rnr_table.

Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Don Hiatt <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-02-19  IB/hfi1, rdmavt: Update copy_sge to use boolean arguments  (Brian Welty; 1 file, -1/+1)

Convert copy_sge and related SGE state functions to use boolean. For determining if QP is in user mode, add helper function in rdmavt_qp.h. This is used to determine if QP needs the last byte ordering. While here, change rvt_pd.user to a boolean.

Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Brian Welty <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-02-19  IB/rdmavt: Adding timer logic to rdmavt  (Venkata Sandeep Dhanalakota; 1 file, -2/+181)

Move common timer code across targets into rdmavt for code reuse.

Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Brian Welty <[email protected]>
Signed-off-by: Venkata Sandeep Dhanalakota <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-02-19  IB/hfi1, qib, rdmavt: Move AETH credit functions into rdmavt  (Brian Welty; 2 files, -2/+192)

Add rvt_compute_aeth() and rvt_get_credit() as shared functions in rdmavt, moved from hfi1/qib logic.

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Brian Welty <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-02-19  IB/hfi1, qib, rdmavt: Move two IB event functions into rdmavt  (Brian Welty; 1 file, -0/+38)

Add rvt_rc_error() and rvt_comm_est() as shared functions in rdmavt, moved from hfi1/qib logic.

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Brian Welty <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2017-02-19  IB/rdmavt: Use per-CPU reference count for MRs  (Sebastian Sanchez; 1 file, -21/+38)

Having a per-CPU reference count for each MR prevents cache-line bouncing across the system and thus prevents bottlenecks. Use per-CPU reference counts per MR. The per-CPU reference count for FMRs is used in atomic mode to allow accurate testing of the busy state. Other MR types run in per-CPU mode until they're freed.

Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
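A sketch of this counting scheme using the kernel's percpu_ref facility; the example_mr type and helpers are assumptions for illustration, not rdmavt's actual implementation.

    #include <linux/percpu-refcount.h>
    #include <linux/completion.h>
    #include <linux/kernel.h>
    #include <linux/gfp.h>

    struct example_mr {
            struct percpu_ref refcount;
            struct completion freed;
    };

    static void example_mr_release(struct percpu_ref *ref)
    {
            struct example_mr *mr = container_of(ref, struct example_mr,
                                                 refcount);

            complete(&mr->freed);          /* last reference dropped */
    }

    static int example_mr_init(struct example_mr *mr, bool is_fmr)
    {
            init_completion(&mr->freed);
            /* FMRs start (and stay) in atomic mode so a busy test is exact;
             * other MR types get cheap per-CPU gets/puts until teardown. */
            return percpu_ref_init(&mr->refcount, example_mr_release,
                                   is_fmr ? PERCPU_REF_INIT_ATOMIC : 0,
                                   GFP_KERNEL);
    }

    /* Hot path: no shared cache line is written on get/put. */
    static inline void example_mr_get(struct example_mr *mr)
    {
            percpu_ref_get(&mr->refcount);
    }

    static inline void example_mr_put(struct example_mr *mr)
    {
            percpu_ref_put(&mr->refcount);
    }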
2017-02-14  IB: Query ports via the core instead of direct into the driver  (Or Gerlitz; 1 file, -3/+4)

Change the drivers to call ib_query_port() in their get port immutable handler instead of their own query port handler. Doing this requires setting the core cap flags of the device before the ib_query_port() call is made, since the IB core might need these caps to serve the port query. The IB core guarantees that the port attributes passed to the port query verb implementation are zeroed, hence the zeroing was removed from the drivers. This patch doesn't add any new functionality.

Signed-off-by: Or Gerlitz <[email protected]>
Reviewed-by: Matan Barak <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Acked-by: Adit Ranadive <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
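A hedged sketch of a get_port_immutable handler written this way; the driver name and the choice of RDMA_CORE_PORT_IBA_IB are assumptions, and the port_num type follows the API of this era. The core cap flags are filled in first, then ib_query_port() supplies the rest.

    #include <rdma/ib_verbs.h>
    #include <rdma/ib_mad.h>

    static int example_port_immutable(struct ib_device *ibdev, u8 port_num,
                                      struct ib_port_immutable *immutable)
    {
            struct ib_port_attr attr;
            int err;

            /* Must be set before ib_query_port(): the core may consult the
             * cap flags while serving the query. */
            immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;

            err = ib_query_port(ibdev, port_num, &attr);
            if (err)
                    return err;

            immutable->pkey_tbl_len = attr.pkey_tbl_len;
            immutable->gid_tbl_len = attr.gid_tbl_len;
            immutable->max_mad_size = IB_MGMT_MAD_SIZE;
            return 0;
    }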
2017-01-24  IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it  (Bart Van Assche; 7 files, -259/+8)

Make the rxe and rdmavt drivers use dma_virt_ops. Update the comments that refer to the source files removed by this patch. Remove struct ib_dma_mapping_ops. Remove ib_device.dma_ops.

Signed-off-by: Bart Van Assche <[email protected]>
Cc: Andrew Boyer <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Jonathan Toppins <[email protected]>
Cc: Alex Estrin <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-12-15  rdma: fix buggy code that the compiler warns about  (Linus Torvalds; 1 file, -1/+2)

Get rid of this warning:

    drivers/infiniband/sw/rdmavt/cq.c: In function ‘rvt_cq_exit’:
    drivers/infiniband/sw/rdmavt/cq.c:542:2: warning: ‘worker’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      kthread_destroy_worker(worker);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

by fixing the function to actually work.

Fixes: 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker")
Cc: Petr Mladek <[email protected]>
Cc: Doug Ledford <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-12-15  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma  (Linus Torvalds; 9 files, -187/+486)

Pull rdma updates from Doug Ledford:
 "This is the complete update for the rdma stack for this release cycle.

  Most of it is typical driver and core updates, but there is the entirely new VMWare pvrdma driver. You may have noticed that there were changes in DaveM's pull request to the bnxt Ethernet driver to support a RoCE RDMA driver. The bnxt_re driver was tentatively set to be pulled in this release cycle, but it simply wasn't ready in time and was dropped (a few review comments still to address, and some multi-arch build issues like prefetch() not working across all arches).

  Summary:

   - shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged)
   - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe
   - debug cleanups
   - new connection rejection helpers
   - SRP updates
   - various misc fixes
   - new paravirt driver from vmware"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits)
  IB: Add vmw_pvrdma driver
  IB/mlx4: fix improper return value
  IB/ocrdma: fix bad initialization
  infiniband: nes: return value of skb_linearize should be handled
  MAINTAINERS: Update Intel RDMA RNIC driver maintainers
  MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers
  IB/core: fix unmap_sg argument
  qede: fix general protection fault may occur on probe
  IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
  mlx5, calc_sq_size(): Make a debug message more informative
  mlx5: Remove a set-but-not-used variable
  mlx5: Use { } instead of { 0 } to init struct
  IB/srp: Make writing the add_target sysfs attr interruptible
  IB/srp: Make mapping failures easier to debug
  IB/srp: Make login failures easier to debug
  IB/srp: Introduce a local variable in srp_add_one()
  IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
  IB/multicast: Check ib_find_pkey() return value
  IPoIB: Avoid reading an uninitialized member variable
  IB/mad: Fix an array index check
  ...
2016-12-14  IB/rdmavt: Only put mmap_info ref if it exists  (Jim Foraker; 1 file, -1/+2)

rvt_create_qp() creates qp->ip only when a qp creation request comes from userspace (udata is not NULL). If we exceed the number of available queue pairs, however, the error path always attempts to put a kref to this structure. If the requestor is inside the kernel, this leads to a crash. Fix this by checking that qp->ip is not NULL before calling kref_put().

Signed-off-by: Jim Foraker <[email protected]>
Acked-by: Dennis Dalessandro <[email protected]>
Acked-by: Jonathan Toppins <[email protected]>
Acked-by: Alex Estrin <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
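A minimal fragment of the guarded error path; the label is abbreviated and the surrounding function is omitted, while rvt_release_mmap_info() is the release callback named by rdmavt.

    /* Error path in QP creation: qp->ip exists only when udata != NULL,
     * so a kernel-initiated request must not drop a reference that was
     * never taken. */
    bail_ip:
            if (qp->ip)
                    kref_put(&qp->ip->ref, rvt_release_mmap_info);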
2016-12-14  IB/rdmavt: Handle the kthread worker using the new API  (Petr Mladek; 1 file, -23/+11)

Use the new API to create and destroy the cq kthread worker. The API hides some implementation details. In particular, kthread_create_worker() allocates and initializes struct kthread_worker. It runs the kthread the right way and stores task_struct into the worker structure. In addition, the *on_cpu() variant binds the kthread to the given cpu and the related memory node. kthread_destroy_worker() flushes all pending works, stops the kthread and frees the structure.

This patch does not change the existing behavior. Note that we must use the on_cpu() variant because the function starts the kthread and it must bind it to the right CPU before waking. The numa node is associated for given CPU as well.

Signed-off-by: Petr Mladek <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
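A minimal usage sketch of the new API; the CPU selection, naming, and the surrounding start/stop helpers are placeholders, not the rdmavt code. The on_cpu variant creates the kthread already bound to the CPU and its memory node, and kthread_destroy_worker() handles flush, stop, and free.

    #include <linux/kthread.h>
    #include <linux/err.h>

    static struct kthread_worker *cq_worker;

    static int example_cq_worker_start(int cpu)
    {
            cq_worker = kthread_create_worker_on_cpu(cpu, 0, "example_cq_wq");
            if (IS_ERR(cq_worker))
                    return PTR_ERR(cq_worker);
            return 0;
    }

    static void example_cq_worker_stop(void)
    {
            /* Flushes pending works, stops the kthread, frees the struct. */
            kthread_destroy_worker(cq_worker);
            cq_worker = NULL;
    }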
2016-12-14  IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker  (Petr Mladek; 1 file, -14/+16)

The memory barrier is not enough to protect queuing works into a destroyed cq kthread. Just imagine the following situation:

    CPU1                                  CPU2
    ----                                  ----
    rvt_cq_enter()
      worker = cq->rdi->worker;
                                          rvt_cq_exit()
                                            rdi->worker = NULL;
                                            smp_wmb();
                                            kthread_flush_worker(worker);
                                            kthread_stop(worker->task);
                                            kfree(worker);
                                            // nothing queued yet =>
                                            // nothing flushed and
                                            // happily stopped and freed
      if (likely(worker)) {
        // true => read before CPU2 acted
        cq->notify = RVT_CQ_NONE;
        cq->triggered++;
        kthread_queue_work(worker, &cq->comptask);

    BANG: worker has been flushed/stopped/freed in the meantime.

This patch solves this by protecting the critical sections by rdi->n_cqs_lock. It seems that this lock is not much contended and looks reasonable for this purpose. One catch is that rvt_cq_enter() might be called from IRQ context. Therefore we must always take the lock with IRQs disabled to avoid a possible deadlock.

Signed-off-by: Petr Mladek <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
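A sketch of the locking described above, using a cut-down device type; cq_dev and its fields are hypothetical (rdmavt uses rdi->n_cqs_lock and rdi->worker). The same IRQ-safe lock covers both the NULL test before queuing and the teardown that NULLs the pointer.

    #include <linux/kthread.h>
    #include <linux/spinlock.h>

    struct cq_dev {                        /* hypothetical */
            spinlock_t worker_lock;        /* protects ->worker */
            struct kthread_worker *worker;
    };

    /* Enqueue side: may run in IRQ context, hence irqsave. */
    static void cq_dev_queue(struct cq_dev *dev, struct kthread_work *work)
    {
            unsigned long flags;

            spin_lock_irqsave(&dev->worker_lock, flags);
            if (dev->worker)
                    kthread_queue_work(dev->worker, work);
            spin_unlock_irqrestore(&dev->worker_lock, flags);
    }

    /* Teardown side: detach under the lock, destroy outside of it. */
    static void cq_dev_stop(struct cq_dev *dev)
    {
            struct kthread_worker *worker;
            unsigned long flags;

            spin_lock_irqsave(&dev->worker_lock, flags);
            worker = dev->worker;
            dev->worker = NULL;
            spin_unlock_irqrestore(&dev->worker_lock, flags);

            if (worker)
                    kthread_destroy_worker(worker);  /* flush, stop, free */
    }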
2016-12-11  IB/rdmavt: Add a send completion helper  (Mike Marciniszyn; 1 file, -0/+17)

This is for use by client drivers to drive send completions into a CQ. A new exported table allows for the mapping of an ib_wr_opcode into an ib_wc_opcode.

Reviewed-by: Ashutosh Dixit <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
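A hedged sketch of such a mapping table; this stand-alone version covers the common opcodes and is not the exported rdmavt table itself.

    #include <rdma/ib_verbs.h>

    /* Map a send work-request opcode to the completion opcode reported in
     * the CQE. Indexed by ib_wr_opcode. */
    static const enum ib_wc_opcode wr_to_wc_opcode[] = {
            [IB_WR_RDMA_WRITE]           = IB_WC_RDMA_WRITE,
            [IB_WR_RDMA_WRITE_WITH_IMM]  = IB_WC_RDMA_WRITE,
            [IB_WR_SEND]                 = IB_WC_SEND,
            [IB_WR_SEND_WITH_IMM]        = IB_WC_SEND,
            [IB_WR_RDMA_READ]            = IB_WC_RDMA_READ,
            [IB_WR_ATOMIC_CMP_AND_SWP]   = IB_WC_COMP_SWAP,
            [IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD,
            [IB_WR_SEND_WITH_INV]        = IB_WC_SEND,
            [IB_WR_LOCAL_INV]            = IB_WC_LOCAL_INV,
            [IB_WR_REG_MR]               = IB_WC_REG_MR,
    };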
2016-12-11  IB/hfi1: Use reference count wrapper for MRs  (Sebastian Sanchez; 1 file, -4/+4)

Some parts of the code don't use the standard driver wrapper for memory region reference counters. Use the standard driver wrapper throughout the code.

Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-12-11  IB/hfi1: Replace qp->refcount release code with standard driver wrapper  (Sebastian Sanchez; 1 file, -3/+2)

Some parts of the code don't use the standard release wrapper rvt_put_qp() for decrementing and testing the refcount to then try to use a resource. Replace this code with the standard driver wrapper.

Fixes: 4d6f85c3fa55 ("IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines")
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-12-11  IB/rdmavt: Add trace of MR segs  (Mike Marciniszyn; 3 files, -0/+117)

Add tracing of MR segment information.

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-12-11  IB/rdmavt: Fix trace hierarchy  (Dennis Dalessandro; 4 files, -137/+312)

Split rdmavt traces into separate files to preserve the original hierarchy since only one trace sub system may now be defined per header file.

Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-11-15  IB/hfi1: Optimize lkey validation structures  (Mike Marciniszyn; 1 file, -5/+5)

Profiling shows that the key validation is susceptible to cache line trading when accessing the lkey table. Fix by separating out the read mostly fields from the write fields. In addition, the shift amount, which is a function of the lkey table size, is precomputed and stored with the table pointer. Since both the shift and table pointer are in the same read mostly cacheline, this saves a cache line in this hot path.

Reviewed-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
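A structural sketch of the split; field names are illustrative rather than the exact rdmavt layout. Everything the hot validation path reads sits in one read-mostly cache line, including the precomputed shift, while add/remove state lives on its own line.

    #include <linux/cache.h>
    #include <linux/spinlock.h>
    #include <linux/rcupdate.h>
    #include <linux/types.h>

    struct example_mregion;                /* opaque here */

    struct example_lkey_table {
            /* read-mostly: one cache line serves every lkey/rkey check */
            u32 shift;                     /* precomputed from table size */
            struct example_mregion __rcu **table;
            /* write side: MR add/remove only, kept off the hot line */
            spinlock_t lock ____cacheline_aligned_in_smp;
            u32 next;
            u32 gen;
            u32 max;
    };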
2016-11-15  IB/rdmavt: rdmavt can handle non aligned page maps  (Dennis Dalessandro; 1 file, -3/+0)

The initial code for rdmavt carried with it a restriction that was a vestige from the qib driver: to dma map a page, it had to be less than a page size. This is not the case on modern hardware; both qib and hfi1 will be just fine with unaligned map requests. This fixes a 4.8 regression whereby an IPoIB transfer of more than PAGE_SIZE will hang because the dma map page call always fails. This was introduced after commit 5faba5469522 ("IB/ipoib: Report SG feature regardless of HW UD CSUM capability") added the capability to use SG by default. Rather than override this, the HW supports it, so allow SG.

Cc: Stable <[email protected]> # 4.8
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-10-11  kthread: kthread worker API cleanup  (Petr Mladek; 1 file, -5/+5)

A good practice is to prefix the names of functions by the name of the subsystem.

The kthread worker API is a mix of classic kthreads and workqueues. Each worker has a dedicated kthread. It runs a generic function that processes queued works. It is implemented as part of the kthread subsystem.

This patch renames the existing kthread worker API to use the corresponding name from the workqueues API prefixed by kthread_:

    __init_kthread_worker()  -> __kthread_init_worker()
    init_kthread_worker()    -> kthread_init_worker()
    init_kthread_work()      -> kthread_init_work()
    insert_kthread_work()    -> kthread_insert_work()
    queue_kthread_work()     -> kthread_queue_work()
    flush_kthread_work()     -> kthread_flush_work()
    flush_kthread_worker()   -> kthread_flush_worker()

Note that the names of DEFINE_KTHREAD_WORK*() macros stay as they are. It is common that the "DEFINE_" prefix has precedence over the subsystem names.

Note that INIT() macros and init() functions use different naming schemes. There is no good solution. There are several reasons for this solution:

 + "init" in the function names stands for the verb "initialize" aka "initialize worker", while "INIT" in the macro names stands for the noun "INITIALIZER" aka "worker initializer".
 + INIT() macros are used only in DEFINE() macros.
 + init() functions are used close to the other kthread() functions. It looks much better if all the functions use the same scheme.
 + There will also be kthread_destroy_worker() that will be used close to kthread_cancel_work(). It is related to the init() function. Again it looks better if all functions use the same naming scheme.
 + There are several precedents for such init() function names, e.g. amd_iommu_init_device(), free_area_init_node(), jump_label_init_type(), regmap_init_mmio_clk().
 + It is not an argument, but it was inconsistent even before.

[[email protected]: fix linux-next merge conflict]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Suggested-by: Andrew Morton <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-10-06  IB/{rxe,core,rdmavt}: Fix kernel crash for reg MR  (Parav Pandit; 1 file, -0/+17)

This patch fixes the kernel crash below on memory registration for rxe and other transport drivers which have a dma_ops extension.

The IB core invokes ib_map_sg_attrs() in a generic manner with dma attributes, which are used by the mlx5 and mthca adapters. However, in doing so it ignored the dma_ops extension of software-based transports for the sg map/unmap operation. This results in calling dma_map_sg_attrs of the hardware virtual device, resulting in a crash on a NULL reference. We extend the core to support sg_map/unmap_attrs and the transport drivers to implement those dma_ops callback functions.

Verified using perftest applications.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffff81032a75>] check_addr+0x35/0x60
    ...
    Call Trace:
     [<ffffffff81032b39>] ? nommu_map_sg+0x99/0xd0
     [<ffffffffa02b31c6>] ib_umem_get+0x3d6/0x470 [ib_core]
     [<ffffffffa01cc329>] rxe_mem_init_user+0x49/0x270 [rdma_rxe]
     [<ffffffffa01c793a>] ? rxe_add_index+0xca/0x100 [rdma_rxe]
     [<ffffffffa01c995f>] rxe_reg_user_mr+0x9f/0x130 [rdma_rxe]
     [<ffffffffa00419fe>] ib_uverbs_reg_mr+0x14e/0x2c0 [ib_uverbs]
     [<ffffffffa003d3ab>] ib_uverbs_write+0x15b/0x3b0 [ib_uverbs]
     [<ffffffff811e92a6>] ? mem_cgroup_commit_charge+0x76/0xe0
     [<ffffffff811af0a9>] ? page_add_new_anon_rmap+0x89/0xc0
     [<ffffffff8117e6c9>] ? lru_cache_add_active_or_unevictable+0x39/0xc0
     [<ffffffff811f0da8>] __vfs_write+0x28/0x120
     [<ffffffff811f1239>] ? rw_verify_area+0x49/0xb0
     [<ffffffff811f1492>] vfs_write+0xb2/0x1b0
     [<ffffffff811f27d6>] SyS_write+0x46/0xa0
     [<ffffffff814f7d32>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-10-03  IB/rdmavt: Trivial function comment corrected.  (Parav Pandit; 1 file, -1/+1)

Corrected function name in comment from qib_ to rvt_.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-10-02  IB/rdmavt, IB/hfi1: Add lockdep asserts for lock debug  (Mike Marciniszyn; 1 file, -0/+8)

This patch adds lockdep asserts in key code paths for ensuring lock correctness.

Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
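A minimal sketch of the assert pattern; example_qp and its lock names are placeholders resembling rdmavt's QP locks. With CONFIG_PROVE_LOCKING enabled, a call path that forgot a lock trips immediately; otherwise the asserts compile away.

    #include <linux/lockdep.h>
    #include <linux/spinlock.h>

    struct example_qp {                    /* hypothetical */
            spinlock_t r_lock;
            spinlock_t s_hlock;
            spinlock_t s_lock;
            /* ... state guarded by all three locks ... */
    };

    static void example_reset_qp_state(struct example_qp *qp)
    {
            lockdep_assert_held(&qp->r_lock);
            lockdep_assert_held(&qp->s_hlock);
            lockdep_assert_held(&qp->s_lock);
            /* safe to touch state that requires all three locks */
    }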
2016-10-02  IB/rdmavt: Add qp init function  (Mike Marciniszyn; 1 file, -42/+58)

Add an rvt_qp_init() to initialize specific common fields as the qp is created or reset. The routine is shared by rvt_reset_qp() and rvt_create_qp(). The intent is that lockdep assertions will only appear in rvt_reset_qp().

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-10-02  IB/rdmavt: Move reset calldown to reset path  (Mike Marciniszyn; 1 file, -6/+5)

The reset calldown is misplaced. It should only be called in the code that actually transitions the QP to reset.

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-10-02  IB/rdmavt: Correct sparse annotation  (Mike Marciniszyn; 1 file, -6/+3)

The __must_hold() is sufficient to correct the sparse context imbalance inside a function.

Per Documentation/sparse.txt:
    __must_hold - The specified lock is held on function entry and exit.

Fixes: c0a67f6ba356 ("IB/rdmavt: Annotate rvt_reset_qp()")
Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
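A small sketch of the annotation with made-up names: __must_hold() tells sparse the lock is held across the whole function, so no acquire/release imbalance is reported even though the function neither takes nor drops it.

    #include <linux/spinlock.h>
    #include <linux/compiler.h>

    static void example_reset_locked(spinlock_t *s_lock, int *state)
            __must_hold(s_lock)
    {
            /* Caller acquired s_lock and will release it after we return. */
            *state = 0;
    }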
2016-09-16  IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines  (Mike Marciniszyn; 1 file, -3/+2)

This improves readability and hides the reference count mechanism from the client drivers.

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
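A sketch of what such wrappers typically look like; the type and helper names here are illustrative stand-ins for the driver's get/put routines. The refcount mechanics stay behind two inlines, so callers never touch the atomic directly.

    #include <linux/atomic.h>
    #include <linux/wait.h>

    struct refcounted_qp {                 /* hypothetical */
            atomic_t refcount;
            wait_queue_head_t wait;        /* destroy path sleeps here */
    };

    static inline void example_get_qp(struct refcounted_qp *qp)
    {
            atomic_inc(&qp->refcount);
    }

    static inline void example_put_qp(struct refcounted_qp *qp)
    {
            if (atomic_dec_and_test(&qp->refcount))
                    wake_up(&qp->wait);    /* last user: destroy may proceed */
    }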
2016-09-16  IB/rdmavt: Don't vfree a kzalloc'ed memory region  (Colin Ian King; 1 file, -1/+1)

The userspace memory region 'mr' is allocated with kzalloc in __rvt_alloc_mr, however it is incorrectly being freed with vfree in __rvt_free_mr. Fix this by using kfree to free it.

Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
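A small sketch of the matched-allocator rule; the region type and sizes are illustrative, not the rdmavt MR layout. Memory obtained with kzalloc()/kmalloc() must be released with kfree(); vfree() is only for vmalloc-family allocations.

    #include <linux/slab.h>

    struct example_region {                /* hypothetical */
            int nmaps;
            void *map[];                   /* flexible array member */
    };

    static struct example_region *region_alloc(int n)
    {
            return kzalloc(sizeof(struct example_region) + n * sizeof(void *),
                           GFP_KERNEL);
    }

    static void region_free(struct example_region *r)
    {
            kfree(r);    /* matches kzalloc(); vfree() here would be a bug */
    }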
2016-08-22  IB/rdmavt: Fix double vfree() in rvt_create_qp() error path  (Mike Marciniszyn; 1 file, -1/+2)

The unwind logic for creating a user QP has a double vfree of the non-shared receive queue when handling a "too many qps" failure. The code unwinds the mmap info by decrementing a reference count, which will call rvt_release_mmap_info(), which in turn does the vfree() of r_rq.wq. The unwind code then does the same free. Fix by guarding the vfree() with the same test that is done in close and only do the vfree() if qp->ip is NULL.

Reviewed-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-08-04  Merge tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma  (Linus Torvalds; 5 files, -68/+315)

Pull second round of rdma updates from Doug Ledford:
 "This can be split out into just two categories:

   - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches)
   - fixes/features for the Intel hfi1 driver (everything else)

  The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request"

* tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits)
  IB/hfi1: Add cache evict LRU list
  IB/hfi1: Fix memory leak during unexpected shutdown
  IB/hfi1: Remove unneeded mm argument in remove function
  IB/hfi1: Consistently call ops->remove outside spinlock
  IB/hfi1: Use evict mmu rb operation
  IB/hfi1: Add evict operation to the mmu rb handler
  IB/hfi1: Fix TID caching actions
  IB/hfi1: Make the cache handler own its rb tree root
  IB/hfi1: Make use of mm consistent
  IB/hfi1: Fix user SDMA racy user request claim
  IB/hfi1: Fix error condition that needs to clean up
  IB/hfi1: Release node on insert failure
  IB/hfi1: Validate SDMA user iovector count
  IB/hfi1: Validate SDMA user request index
  IB/hfi1: Use the same capability state for all shared contexts
  IB/hfi1: Prevent null pointer dereference
  IB/hfi1: Rename TID mmu_rb_* functions
  IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
  IB/hfi1: Restructure hfi1_file_open
  IB/hfi1: Make iovec loop index easy to understand
  ...
2016-08-04  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma  (Linus Torvalds; 1 file, -1/+0)

Pull base rdma updates from Doug Ledford:
 "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here.

  Summary:

   - Updates/fixes for iw_cxgb4 driver
   - Updates/fixes for mlx5 driver
   - Add flow steering and RSS API
   - Add hardware stats to mlx4 and mlx5 drivers
   - Add firmware version API for RDMA driver use
   - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device)
   - Fixes for i40iw driver
   - Support for send only multicast joins in the cma layer
   - Other minor fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
  Soft RoCE driver
  IB/core: Support for CMA multicast join flags
  IB/sa: Add cached attribute containing SM information to SA port
  IB/uverbs: Fix race between uverbs_close and remove_one
  IB/mthca: Clean up error unwind flow in mthca_reset()
  IB/mthca: NULL arg to pci_dev_put is OK
  IB/hfi1: NULL arg to sc_return_credits is OK
  IB/mlx4: Add diagnostic hardware counters
  net/mlx4: Query performance and diagnostics counters
  net/mlx4: Add diagnostic counters capability bit
  Use smaller 512 byte messages for portmapper messages
  IB/ipoib: Report SG feature regardless of HW UD CSUM capability
  IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
  IB/hfi1: Disable by default
  IB/rdmavt: Disable by default
  IB/mlx5: Fix port counter ID association to QP offset
  IB/mlx5: Fix iteration overrun in GSI qps
  i40iw: Add NULL check for puda buffer
  i40iw: Change dup_ack_thresh to u8
  i40iw: Remove unnecessary check for moving CQ head
  ...
2016-08-03  IB/rdmavt: Disable by default  (Bart Van Assche; 1 file, -1/+0)

There is a strict policy in the Linux kernel that new drivers must be disabled by default. Hence leave out the "default m" line from Kconfig.

Fixes: 0194621b2253 ("IB/rdmavt: Create module framework and handle driver registration")
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Jubin John <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Mike Marciniszyn <[email protected]>
Cc: <[email protected]> # v4.6+
Signed-off-by: Doug Ledford <[email protected]>
2016-08-02  IB/rdmavt: Eliminate redundant opcode test in mr ref clear  (Ira Weiny; 1 file, -2/+1)

The use of the specific opcode test is redundant since all ack entry users correctly manipulate the mr pointer to selectively trigger the reference clearing. The overly specific test hinders the use of implementation specific operations. The change needs to get rid of the union to ensure that an atomic value is not seen as an MR pointer.

Reviewed-by: Ashutosh Dixit <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
2016-08-02  IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabled  (Jianxin Xiong; 1 file, -16/+32)

Hanging has been observed while writing a file over NFSoRDMA. Dmesg on the server contains messages like these:

    [ 931.992501] svcrdma: Error -22 posting RDMA_READ
    [ 952.076879] svcrdma: Error -22 posting RDMA_READ
    [ 982.154127] svcrdma: Error -22 posting RDMA_READ
    [ 1012.235884] svcrdma: Error -22 posting RDMA_READ
    [ 1042.319194] svcrdma: Error -22 posting RDMA_READ

Here is why: with the base memory management extension enabled, FRMR is used instead of FMR. The xprtrdma server issues each RDMA read request as the following bundle:

    (1) IB_WR_REG_MR, signaled;
    (2) IB_WR_RDMA_READ, signaled;
    (3) IB_WR_LOCAL_INV, signaled & fencing.

These requests are signaled. In order to generate completion, the fast register work request is processed by the hfi1 send engine after being posted to the work queue, and the corresponding lkey is not valid until the request is processed. However, the rdmavt driver validates lkey when the RDMA read request is posted and thus it fails immediately with error -EINVAL (-22).

This patch changes the work flow of local operations (fast register and local invalidate) so that fast register work requests are always processed immediately to ensure that the corresponding lkey is valid when subsequent work requests are posted. Local invalidate requests are processed immediately if fencing is not required and no previous local invalidate request is pending.

To allow completion generation for signaled local operations that have been processed before posting to the work queue, an internal send flag RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag and only generates completion for such requests.

Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Jianxin Xiong <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
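A hedged fragment of the decision flow described above. The helper do_local_op() and the local_ops_pending counter are assumptions used for illustration; RVT_SEND_COMPLETION_ONLY, IB_SEND_FENCE, and IB_SEND_SIGNALED are real symbols. This is a sketch of the idea, not the patch's code.

    /* At post time, decide whether a local operation can be completed
     * inline rather than queued to the send engine. */
    bool inline_ok = false;

    if (wr->opcode == IB_WR_REG_MR) {
            /* Always inline: the lkey must be valid before later WRs post. */
            inline_ok = true;
    } else if (wr->opcode == IB_WR_LOCAL_INV) {
            /* Inline only if no fence is requested and no earlier local
             * invalidate is still pending. */
            inline_ok = !(wr->send_flags & IB_SEND_FENCE) &&
                        !atomic_read(&qp->local_ops_pending); /* assumed */
    }

    if (inline_ok) {
            do_local_op(qp, wr);                   /* hypothetical helper */
            if (wr->send_flags & IB_SEND_SIGNALED)
                    /* Send engine should only generate the completion. */
                    wqe->wr.send_flags |= RVT_SEND_COMPLETION_ONLY;
    }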