| Age | Commit message | Author | Files | Lines |
|
Disable all of drivers/infiniband/hw/ and rdmavt for UML builds until
someone needs it and provides patches to support it.
This prevents build errors in hw/qib/qib_wc_x86_64.c.
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Add a RDMA VF driver for Microsoft Azure Network Adapter (MANA).
Co-developed-by: Ajay Sharma <[email protected]>
Signed-off-by: Ajay Sharma <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
Signed-off-by: Long Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Add erdma to the kernel build environment, and sort the source
order in drivers/infiniband/Kconfig.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Cheng Xu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Add Kconfig and Makefile to build irdma driver.
Remove i40iw driver and add an alias in irdma.
Remove legacy exported symbols i40e_register_client
and i40e_unregister_client from i40e as they are no
longer used.
irdma is the replacement driver that supports X722.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Dma-buf is a standard cross-driver buffer sharing mechanism that can be
used to support peer-to-peer access from RDMA devices.
Device memory exported via dma-buf is associated with a file descriptor.
This is passed to the user space as a property associated with the buffer
allocation. When the buffer is registered as a memory region, the file
descriptor is passed to the RDMA driver along with other parameters.
Implement the common code for importing dma-buf object and mapping dma-buf
pages.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jianxin Xiong <[email protected]>
Reviewed-by: Sean Hefty <[email protected]>
Acked-by: Michael J. Ruhl <[email protected]>
Acked-by: Christian Koenig <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
dma_virt_ops requires that all pages have a kernel virtual address.
Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
and make all three drivers depend on the new symbol.
Also remove the ARCH_DMA_ADDR_T_64BIT dependency, which has been obsolete
since commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config
symbol in lib/Kconfig")
Fixes: 551199aca1c3 ("lib/dma-virt: Add dma_virt_ops")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Move to use hmm_range_fault() instead of get_user_pages_remote() to improve
performance in a few aspects:
- Dropping the need to allocate and free memory to hold its output
- No longer needing put_page() to unpin the pages
- Detecting contiguous pages from the returned order, with no need to
walk and evaluate each page.
In addition, moving to hmm_range_fault() makes it possible to reduce page
faults in the system with its snapshot mode; this will be introduced in
the next patches of this series.
As part of this, clean up some flows and use the data structures required
to work with hmm_range_fault().
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Enable CQ ioctl commands by default. This functionality is mature enough
to be used over ioctl, so there is no reason to keep the EXP Kconfig
entry that enables it.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.
This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.
There are a variety of indentation styles found.
a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'
In order to convert all of them to 1 tab + 'help', I ran the
following command:
$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
Signed-off-by: Masahiro Yamada <[email protected]>
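
The effect of the sed expression above on a single line can be sketched in user-space C with POSIX regex.h. This is only an illustration under stated assumptions: the helper name convert_help_line is hypothetical and is not part of the commit, and the sketch handles one line at a time rather than editing files in place.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite one Kconfig line the way the sed
 * expression s/^[[:space:]]*---help---/\thelp/ would.
 * Returns 1 if the line was converted, 0 if it was copied unchanged.
 */
static int convert_help_line(const char *in, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    int converted = 0;

    assert(outsz > 0);

    /* Same POSIX basic regular expression that sed uses by default. */
    if (regcomp(&re, "^[[:space:]]*---help---", 0) != 0) {
        snprintf(out, outsz, "%s", in);
        return 0;
    }

    if (regexec(&re, in, 1, &m, 0) == 0) {
        /* Replace the matched prefix with one tab + "help", keep the rest. */
        snprintf(out, outsz, "\thelp%s", in + m.rm_eo);
        converted = 1;
    } else {
        snprintf(out, outsz, "%s", in);
    }

    regfree(&re);
    return converted;
}
```

All seven indentation styles listed in the message start with whitespace followed by '---help---', so they all take the first branch and come out as one tab plus 'help'. Lines that already read 'help' are copied through untouched.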
|
|
Add rtrs Makefile, Kconfig and also corresponding lines into upper layer
infiniband/ulp files.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Danil Kipnis <[email protected]>
Signed-off-by: Jack Wang <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Remove iw_cxgb3 module from kernel as the corresponding HW Chelsio T3 has
reached EOL.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Potnuri Bharat Teja <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
ODP is working with userspace VA's in the interval tree which always fit
into an unsigned long, so we can use the common code.
This comes at a cost of a 16 byte increase in ib_umem_odp struct size due
to storing the interval tree start/last in addition to the umem
addr/length. However these values were computed and are performance
critical for the interval lookup, so this seems like a worthwhile trade
off.
Removes 2k of .text from the kernel.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
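
How an inclusive start/last pair can be derived from the umem addr/length is sketched below in user-space C. This is a minimal illustration, not the kernel code: struct odp_interval and odp_interval_from_umem are hypothetical names, and rounding to page boundaries is an assumption about how the interval is computed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the interval tree node's start/last pair. */
struct odp_interval {
    unsigned long start; /* first byte covered, page aligned */
    unsigned long last;  /* last byte covered, inclusive */
};

/*
 * Derive the inclusive [start, last] interval from a userspace address
 * and length. Assumes the interval is rounded out to whole pages of
 * size (1 << page_shift).
 */
static struct odp_interval odp_interval_from_umem(unsigned long addr,
                                                  unsigned long length,
                                                  unsigned int page_shift)
{
    unsigned long mask = (1UL << page_shift) - 1;
    struct odp_interval it;

    assert(length > 0);
    /* Reject ranges whose rounded-up end would overflow unsigned long. */
    assert(length <= ULONG_MAX - mask);
    assert(addr <= ULONG_MAX - mask - length);

    it.start = addr & ~mask;                        /* round down */
    it.last = ((addr + length + mask) & ~mask) - 1; /* round up, inclusive */
    return it;
}
```

The two extra unsigned long fields account for the 16 byte increase on a 64-bit build. Storing them avoids repeating this arithmetic on every interval lookup.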
|
|
Added parameter in ib_device for enabling dynamic interrupt moderation so
that it can be configured in userspace using rdma tool.
In order to set adaptive-moderation for an ib device the command is:
rdma dev set [DEV] adaptive-moderation [on|off]
Please set on/off.
rdma dev show
0: mlx5_0: node_type ca fw 16.26.0055 node_guid 248a:0703:00a5:29d0
sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on
rdma resource show cq
dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE
adaptive-moderation off comm [ib_core]
Signed-off-by: Yamin Friedman <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Broken up commit to add the Soft iWarp RDMA driver.
Signed-off-by: Bernard Metzler <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
For dependencies in next patches.
Resolve conflicts:
- Use uverbs_get_cleared_udata() with new cq allocation flow
- Continue to delete nes despite SPDX conflict
- Resolve list appends in mlx5_command_str()
- Use u16 for vport_rule stuff
- Resolve list appends in struct ib_client
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This driver was first merged over 10 years ago and has not seen major
activity by the authors in the last 7 years. However, in that time it has
been patched 150 times to adapt it to changing kernel APIs.
Further, the hardware has several issues, like not supporting 64 bit DMA,
that make it rather uninteresting for use with modern systems and RDMA.
Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Reviewed-by: Shiraz Saleem <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This has been marked CONFIG_BROKEN for over a year now with no complaints.
Delete the whole thing for good.
The module provided the /dev/infiniband/ucmX interface.
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Pull rdma updates from Jason Gunthorpe:
"This has been a smaller cycle than normal. One new driver was
accepted, which is unusual, and at least one more driver remains in
review on the list.
Summary:
- Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4,
vmw_pvrdma
- Many patches from MatthewW converting radix tree and IDR users to
use xarray
- Introduction of tracepoints to the MAD layer
- Build large SGLs at the start for DMA mapping and get the driver to
split them
- Generally clean SGL handling code throughout the subsystem
- Support for restricting RDMA devices to net namespaces for
containers
- Progress to remove object allocation boilerplate code from drivers
- Change in how the mlx5 driver shows representor ports linked to VFs
- mlx5 uapi feature to access the on chip SW ICM memory
- Add a new driver for 'EFA'. This is HW that supports user space
packet processing through QPs in Amazon's cloud"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits)
RDMA/ipoib: Allow user space differentiate between valid dev_port
IB/core, ipoib: Do not overreact to SM LID change event
RDMA/device: Don't fire uevent before device is fully initialized
lib/scatterlist: Remove leftover from sg_page_iter comment
RDMA/efa: Add driver to Kconfig/Makefile
RDMA/efa: Add the efa module
RDMA/efa: Add EFA verbs implementation
RDMA/efa: Add common command handlers
RDMA/efa: Implement functions that submit and complete admin commands
RDMA/efa: Add the ABI definitions
RDMA/efa: Add the com service API definitions
RDMA/efa: Add the efa_com.h file
RDMA/efa: Add the efa.h header file
RDMA/efa: Add EFA device definitions
RDMA: Add EFA related definitions
RDMA/umem: Remove hugetlb flag
RDMA/bnxt_re: Use core helpers to get aligned DMA address
RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size
RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks
RDMA/umem: Add API to find best driver supported page size in an MR
...
|
|
Add EFA Makefile and Kconfig.
Signed-off-by: Gal Pressman <[email protected]>
Reviewed-by: Steve Wise <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Make the anon_inodes facility unconditional so that it can be used by core
VFS code and pidfd code.
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
[[email protected]: adapt commit message to mention pidfds]
Signed-off-by: Christian Brauner <[email protected]>
|
|
The next patch will add a dependency from ib_umem_get to ib_uverbs, so
move the required ib_umem_xxx functionality to its correct module,
ib_uverbs, and avoid a circular dependency of the form ib_core ->
ib_uverbs -> ib_core in depmod.
Since this now requires all drivers to be built modular if uverbs is
modular, hoist the test a couple of drivers had into the main Kconfig and
apply it to all drivers uniformly.
Signed-off-by: Shamir Rabinovitch <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The zap_vma_ptes() is declared but not defined on NOMMU kernels, causing a
link error for the newly added uverbs code:
drivers/infiniband/core/uverbs_main.o: In function `uverbs_user_mmap_disassociate':
uverbs_main.c:(.text+0x114c): undefined reference to `zap_vma_ptes'
drivers/infiniband/core/uverbs_main.o: In function `rdma_umap_open':
uverbs_main.c:(.text+0x53c): undefined reference to `zap_vma_ptes'
Since user access for all of our drivers depends on remapping pages to
user space, disable USER_ACCESS when there is no MMU.
Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Building UCM with CONFIG_INFINIBAND_USER_ACCESS=m results in a
set of link errors including:
drivers/infiniband/core/ucm.o: In function `ib_ucm_event_handler':
ucm.c:(.text+0x6dc): undefined reference to `ib_copy_path_rec_to_user'
drivers/infiniband/core/ucma.o: In function `ucma_event_handler':
ucma.c:(.text+0xdc0): undefined reference to `ib_copy_ah_attr_to_user'
To get it to build-test again, this makes the option itself a
tristate, which lets Kconfig figure out the dependency correctly.
Fixes: 486edfb1039d ("IB/ucm: Fix compiling ucm.c")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Even though this interface is marked CONFIG_BROKEN we still expect it to
compile, at least until we delete it completely.
Also mark INFINIBAND_USER_ACCESS_UCM with COMPILE_TEST so these situations
can be detected.
Fixes: e7ff98aefc9e ("RDMA/cma: Constify path record, ib_cm_event, listen_id pointers")
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In commit 357d23c811a7 ("Remove the obsolete libibcm library")
in rdma-core [1], we removed the obsolete library which used the
/dev/infiniband/ucmX interface.
Following multiple syzkaller reports about non-sanitized
user input in the UCMA module, a short audit reveals the same
issues in the UCM module too.
It is better to disable this interface in the kernel
before the syzkaller team invests time and energy to harden
this unused interface.
[1] https://github.com/linux-rdma/rdma-core/pull/279
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Allow INFINIBAND without INFINIBAND_ADDR_TRANS because fuzzing has been
finding a fair number of CM bugs. Provide an option to disable it.
Signed-off-by: Greg Thelen <[email protected]>
Cc: Tarick Bedeir <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Enable the ioctl() uAPI for IB by default if the standard write()
uAPI (INFINIBAND_USER_ACCESS) is enabled. Verbs that are
also available under the old write() uAPI are put inside a new
INFINIBAND_EXP_LEGACY_VERBS_NEW_UAPI Kconfig.
Reviewed-by: Yishai Hadas <[email protected]>
Signed-off-by: Matan Barak <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Pull more rdma updates from Doug Ledford:
"Items of note:
- two patches fix a regression in the 4.15 kernel. The 4.14 kernel
worked fine with NVMe over Fabrics and mlx5 adapters. That broke in
4.15. The fix is here.
- one of the patches (the endian notation patch from Lijun) looks
like a lot of lines of change, but it's mostly mechanical in
nature. It amounts to the biggest chunk of change in it (it's about
2/3rds of the overall pull request).
Summary:
- Clean up some function signatures in rxe for clarity
- Tidy the RDMA netlink header to remove unimplemented constants
- bnxt_re driver fixes, one is a regression this window.
- Minor hns driver fixes
- Various fixes from Dan Carpenter and his tool
- Fix IRQ cleanup race in HFI1
- HFI1 performance optimizations and a fix to report counters in the right units
- Fix for an IPoIB startup sequence race with the external manager
- Oops fix for the new kabi path
- Endian cleanups for hns
- Fix for mlx5 related to the new automatic affinity support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits)
net/mlx5: increase async EQ to avoid EQ overrun
mlx5: fix mlx5_get_vector_affinity to start from completion vector 0
RDMA/hns: Fix the endian problem for hns
IB/uverbs: Use the standard kConfig format for experimental
IB: Update references to libibverbs
IB/hfi1: Add 16B rcvhdr trace support
IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node
IB/core: Avoid a potential OOPs for an unused optional parameter
IB/core: Map iWarp AH type to undefined in rdma_ah_find_type
IB/ipoib: Fix for potential no-carrier state
IB/hfi1: Show fault stats in both TX and RX directions
IB/hfi1: Remove blind constants from 16B update
IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
IB/hfi1: Do not override given pcie_pset value
IB/hfi1: Optimize process_receive_ib()
IB/hfi1: Remove unnecessary fecn and becn fields
IB/hfi1: Look up ibport using a pointer in receive path
IB/hfi1: Optimize packet type comparison using 9B and bypass code paths
IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet
IB/hfi1: Remove dependence on qp->s_hdrwords
...
|
|
We really don't want people turning this on just yet, make it very
clear with capital letters.
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
These days the userspace comes from rdma-core, revise references
in the kernel to point to the current repository.
Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
- Updates to use cond_resched() instead of cond_resched_rcu_qs()
where feasible (currently everywhere except in kernel/rcu and
in kernel/torture.c). Also a couple of fixes to avoid sending
IPIs to offline CPUs.
- Updates to simplify RCU's dyntick-idle handling.
- Updates to remove almost all uses of smp_read_barrier_depends()
and read_barrier_depends().
- Miscellaneous fixes.
- Torture-test updates.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The smp_read_barrier_depends() does nothing at all except on DEC Alpha,
and no current DEC Alpha systems use Infiniband:
lkml.kernel.org/r/20171023085921.jwbntptn6ictbnvj@tower
This commit therefore makes Infiniband depend on !ALPHA and removes
the now-ineffective invocations of smp_read_barrier_depends() from
the InfiniBand driver.
Please note that this patch should not be construed as my saying that
InfiniBand's memory ordering is correct, but rather that this patch does
not in any way affect InfiniBand's correctness. In other words, the
result of applying this patch is bug-for-bug compatible with the original.
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Cree <[email protected]>
Cc: Andrea Parri <[email protected]>
[ paulmck: Removed drivers/dma/ioat/dma.c per Jason Gunthorpe's feedback. ]
Acked-by: Jason Gunthorpe <[email protected]>
|
|
If NO_DMA=y:
ERROR: "bad_dma_ops" [net/sunrpc/xprtrdma/rpcrdma.ko] undefined!
ERROR: "bad_dma_ops" [net/smc/smc.ko] undefined!
ERROR: "bad_dma_ops" [net/rds/rds_rdma.ko] undefined!
ERROR: "bad_dma_ops" [net/9p/9pnet_rdma.ko] undefined!
ERROR: "bad_dma_ops" [drivers/nvme/target/nvmet-rdma.ko] undefined!
ERROR: "bad_dma_ops" [drivers/nvme/host/nvme-rdma.ko] undefined!
ERROR: "bad_dma_ops" [drivers/infiniband/ulp/srpt/ib_srpt.ko] undefined!
ERROR: "bad_dma_ops" [drivers/infiniband/ulp/srp/ib_srp.ko] undefined!
ERROR: "bad_dma_ops" [drivers/infiniband/ulp/isert/ib_isert.ko] undefined!
ERROR: "bad_dma_ops" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
ERROR: "bad_dma_ops" [drivers/infiniband/ulp/ipoib/ib_ipoib.ko] undefined!
ERROR: "bad_dma_ops" [drivers/infiniband/core/ib_core.ko] undefined!
Before, this was handled implicitly by the dependency on PCI.
Add an explicit dependency on HAS_DMA to fix this.
Fixes: 931bc0d91639f8fb ("IB: Move PCI dependency from root KConfig to HW's KConfigs")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The infiniband subsystem causes a link failure when the umem
driver is built on MMU-less systems:
mm/mmu_notifier.o: In function `do_mmu_notifier_register':
mmu_notifier.c:(.text+0x32): undefined reference to `mm_take_all_locks'
drivers/infiniband/core/umem.o: In function `ib_umem_get':
umem.c:(.text+0x132): undefined reference to `can_do_mlock'
drivers/infiniband/core/umem_odp.o: In function `ib_umem_odp_map_dma_pages':
umem_odp.c:(.text+0x766): undefined reference to `get_user_pages_remote'
This bug has existed for a while but only became apparent in ARM
randconfig builds when the dependency on PCI was lifted, as none
of the ARM-NOMMU targets support PCI at the moment.
We could probably get the umem driver to build by providing an
alternative implementation 'can_do_mlock()' that returns false
on NOMMU-systems, but then we'd still have a problem with the
mmu-notifiers required by CONFIG_INFINIBAND_ON_DEMAND_PAGING,
so simply forbidding umem with NOMMU seems like the simplest
workaround.
Fixes: 931bc0d91639 ("IB: Move PCI dependency from root KConfig to HW's KConfigs")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
There is no reason for the entire infiniband stack to depend on PCI, so
move the dependency to the Kconfig of only the drivers that actually use PCI.
Signed-off-by: Yuval Shaia <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Add CONFIG_INFINIBAND_EXP_USER_ACCESS that enables the ioctl
interface. This interface is experimental and is subject to change.
Signed-off-by: Matan Barak <[email protected]>
Reviewed-by: Yishai Hadas <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
OPA VNIC netdev function supports Ethernet functionality over Omni-Path
fabric by encapsulating Ethernet packets inside Omni-Path packet header.
It allocates a rdma netdev device and interfaces with the network stack to
provide standard Ethernet network interfaces. It overrides HFI1 device's
netdev operations where it is required.
Reviewed-by: Dennis Dalessandro <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Niranjana Vishwanathapura <[email protected]>
Signed-off-by: Sadanand Warrier <[email protected]>
Signed-off-by: Sudeep Dutt <[email protected]>
Signed-off-by: Andrzej Kacprowski <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Makefile and Kconfig changes for enabling bnxt_re compilation
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Somnath Kotur <[email protected]>
Signed-off-by: Sriharsha Basavapatna <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This patch series adds a driver for a paravirtual RDMA device. The
device is developed for VMware's Virtual Machines and allows existing RDMA
applications to continue to use existing Verbs API when deployed in VMs
on ESXi. We recently did a presentation in the OFA Workshop [1] regarding
this device.
Description and RDMA Support
============================
The virtual device is exposed as a dual function PCIe device. One part
is a virtual network device (VMXNet3) which provides networking properties
like MAC, IP addresses to the RDMA part of the device. The networking
properties are used to register GIDs required by RDMA applications to
communicate.
These patches add support and all the required infrastructure for
letting applications use such a device. We support the mandatory Verbs API as
well as the base memory management extensions (Local Inv, Send with Inv and
Fast Register Work Requests). We currently support both Reliable Connected
and Unreliable Datagram QPs but do not support Shared Receive Queues
(SRQs).
Also, we support the following types of Work Requests:
o Send/Receive (with or without Immediate Data)
o RDMA Write (with or without Immediate Data)
o RDMA Read
o Local Invalidate
o Send with Invalidate
o Fast Register Work Requests
This version only adds support for version 1 of RoCE. We will add RoCEv2
support in a future patch. We do support registration of both MAC-based
and IP-based GIDs. I have also created a git tree for our user-level driver
[2].
Testing
=======
We have tested this internally for various types of Guest OS - Red Hat,
Centos, Ubuntu 12.04/14.04/16.04, Oracle Enterprise Linux, SLES 12
using backported versions of this driver. The tests included several
runs of the performance tests (included with OFED), Intel MPI PingPong
benchmark on OpenMPI, krping for FRWRs. Mellanox has been kind enough
to test the backported version of the driver internally on their hardware
using a VMware provided ESX build. I have also applied and tested this
with Doug's k.o/for-4.9 branch (commit 5603910b). Note, that this patch
series should be applied all together. I split out the commits so that
it may be easier to review.
PVRDMA Resources
================
[1] OFA Workshop Presentation -
https://openfabrics.org/images/eventpresos/2016presentations/102parardma.pdf
[2] Libpvrdma User-level library -
http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary
Reviewed-by: Jorgen Hansen <[email protected]>
Reviewed-by: George Zhang <[email protected]>
Reviewed-by: Aditya Sarwade <[email protected]>
Reviewed-by: Bryan Tan <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Adit Ranadive <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Adds a skeletal implementation of the qed* RoCE driver -
basically the ability to communicate with the qede driver and
receive notifications from it regarding various init/exit events.
Signed-off-by: Rajesh Borundia <[email protected]>
Signed-off-by: Ram Amrani <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This patch added Kconfig and Makefile for building RoCE module.
Signed-off-by: Wei Hu <[email protected]>
Signed-off-by: Nenglong Zhao <[email protected]>
Signed-off-by: Lijun Ou <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Soft RoCE (RXE) - The software RoCE driver
ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other hand ib_rxe registers to the Linux netdev stack
as a udp encapsulating protocol, in that case RDMA, for sending and
receiving packets over any Ethernet device. This yields a RDMA
transport over the UDP/Ethernet network layer forming a RoCEv2
compatible device.
The configuration procedure of the Soft RoCE drivers requires
binding to any existing Ethernet network device. This is done through
the /sys interface.
A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices. The use of rxe verbs in
user space requires the inclusion of librxe as a device-specific
plug-in to libibverbs. librxe is packaged separately.
Architecture:
+-----------------------------------------------------------+
| Application |
+-----------------------------------------------------------+
+-----------------------------------+
| libibverbs |
User +-----------------------------------+
+----------------+ +----------------+
| librxe | | HW RoCE lib |
+----------------+ +----------------+
+---------------------------------------------------------------+
+--------------+ +------------+
| Sockets | | RDMA ULP |
+--------------+ +------------+
+--------------+ +---------------------+
| TCP/IP | | ib_core |
+--------------+ +---------------------+
+------------+ +----------------+
Kernel | ib_rxe | | HW RoCE driver |
+------------+ +----------------+
+------------------------------------+
| NIC driver |
+------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Soft RoCE resources:
[1] https://github.com/SoftRoCE/librxe-dev librxe - source code in
Github
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library
Signed-off-by: Kamal Heib <[email protected]>
Signed-off-by: Amir Vadai <[email protected]>
Signed-off-by: Moni Shoua <[email protected]>
Reviewed-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The TODO list for the hfi1 driver was completed during 4.6. In addition
other objections raised (which are far beyond what was in the TODO list)
have been addressed as well. It is now time to remove the driver from
staging and into the drivers/infiniband sub-tree.
Reviewed-by: Jubin John <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
MAINTAINERS, Kconfig, and Makefile to build i40iw module
Signed-off-by: Faisal Latif <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This patch introduces the basics for a new module called rdma_vt. This new
driver is a software implementation of the InfiniBand verbs and aims to
replace the multiple implementations that exist and duplicate each others'
code.
While the call to actually register the device with the IB core happens in
rdma_vt, most of the work is still done in the drivers themselves. This
will change in a follow-on patch; this one just lays the groundwork
for the infrastructure.
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Users would like to control the behaviour of rdma_cm.
For example, old applications which don't set the
required RoCE gid type could be executed on RoCE V2
network types. In order to support this configuration,
we implement a configfs for rdma_cm.
In order to use the configfs, one needs to mount it and
mkdir <IB device name> inside rdma_cm directory.
The patch adds support for a single configuration file,
default_roce_mode. The mode can either be "IB/RoCE v1" or
"RoCE v2".
Signed-off-by: Matan Barak <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This adds an abstraction that allows ULPs to simply pass a completion
object and completion callback with each submitted WR and let the RDMA
core handle the nitty gritty details of how to handle completion
interrupts and poll the CQ.
In detail there is a new ib_cqe structure which just contains the
completion callback, and which can be used to get at the containing
object using container_of. It is pointed to by the WR and WC as an
alternative to the wr_id field, similar to how many ULPs already use
the field to store a pointer using casts.
A driver using the new completion callbacks allocates its CQs using
the new ib_create_cq API, which in addition to the number of CQEs and
the completion vectors also takes a mode on how we poll for CQEs.
Three modes are available: direct for drivers that never take CQ
interrupts and just poll for them, softirq to poll from softirq context
using the to be renamed blk-iopoll infrastructure which takes care of
rearming and budgeting, or a workqueue for consumers who want to be
called from user context.
Thanks a lot to Sagi Grimberg who helped reviewing the API, wrote
the current version of the workqueue code because my two previous
attempts sucked too much and converted the iSER initiator to the new
API.
Signed-off-by: Christoph Hellwig <[email protected]>
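
The completion-entry pattern described in this message can be illustrated in user space. What follows is a minimal sketch, not kernel code: every mock_* name is hypothetical, the callback signature is simplified, and container_of here is a reduced version of the kernel macro without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced user-space container_of: step back from a member to its parent. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct mock_cqe;

/* Mock work completion: points at a completion entry instead of a wr_id. */
struct mock_wc {
    struct mock_cqe *wr_cqe;
    int status;
};

/* Mock completion entry: contains only the completion callback. */
struct mock_cqe {
    void (*done)(struct mock_wc *wc);
};

/* A ULP request that embeds its completion entry. */
struct mock_request {
    int tag;
    int completed;
    struct mock_cqe cqe;
};

/* Callback: recover the containing request from the embedded entry. */
static void mock_request_done(struct mock_wc *wc)
{
    struct mock_request *req =
        container_of(wc->wr_cqe, struct mock_request, cqe);

    req->completed = 1;
}

/* What a core polling loop might do for each completion it reaps. */
static void mock_dispatch(struct mock_wc *wc)
{
    assert(wc->wr_cqe != NULL && wc->wr_cqe->done != NULL);
    wc->wr_cqe->done(wc);
}
```

Because the callback travels with the request, the polling side needs no switch on wr_id values and no pointer casts. It only calls done() and lets each ULP recover its own object.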
|
|
The ehca driver is only supported on IBM machines with a custom EBus.
As they have opted to build their newer machines using more industry
standard technology and haven't really been pushing EBus capable
machines for a while, this driver can now safely be moved to the
staging area and scheduled for eventual removal. This plan was brought
to IBM's attention and received their sign-off.
Signed-off-by: Doug Ledford <[email protected]>
|