aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)AuthorFilesLines
2018-08-15IB/core: Change filter function return type from int to boolParav Pandit2-31/+43
Filter functions returns either 0 or 1, therefore better change their return type from int to bool to reflect the same. Additionally some filter functions have suffix of _filter some doesn't. Make all filter function consistent to have __filter suffix to improve code readability. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-15IB/core: Update GID entries for netdevice whose mac address changesParav Pandit1-6/+7
Update all GID table entries of the netdevice whose MAC address changed. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-15IB/core: Add default GIDs of the bond master netdevParav Pandit1-29/+59
Currently following issues exist: 1. Default GIDs of the lower (slave) netdevice if the bond netdevice is added. Rather default GID should be of bond master netdevice. 2. Due to this, when failover event occurs FAILOVER event handler attempts to delete the GID of the upper device and tries to add the default GID of the lower device. This is incorrect behavior. To have simple and correct code: (a) Split default GIDs addition out of add_netdev_ips(). This allows easier removal in future if RoCE default GIDs are removed. (b) Add default GIDs of the bond master device by using right filter and callback function. (c) Remove unused function enum_netdev_default_gids(). Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-15IB/core: Consider adding default GIDs of bond deviceParav Pandit1-1/+23
Now that we correctly delete the default GIDs of lower devices during CHANGEUPPER event, add default GIDs of the bonding master device. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-15IB/core: Delete lower netdevice default GID entries in bonding scenarioParav Pandit1-9/+62
When NETDEV_CHANGEUPPER event occurs, lower device is not yet established as slave of the master, and when upper device is bond device, default GID entries not deleted. Due to this, when bond device is fully configured, default GID entries of bond device cannot be added as default GID entries are occupied by the lower netdevice. This is incorrect. Default GID entries should really be of bond netdevice because in all RoCE GIDs (default or IP), MAC address of the bond device will be used. It is confusing to have default GID of netdevice which is not really used for any purpose. Therefore, as first step, implement (a) filter function which filters if a CHANGEUPPER event netdevice and associated upper device is master device or not. (b) callback function which deletes the default GIDs of lower (event netdevice). Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-14IB/core: Avoid confusing del_netdev_default_ipsParav Pandit1-13/+10
Currently bond_delete_netdev_default_gids() is called by two callers. (a) del_netdev_default_ips_join() (b) del_netdev_default_ips() Both above functions changes the argument order while calling bond_delete_netdev_default_gids(). This required silly del_netdev_default_ips() wrapper. Additionally, del_netdev_default_ips() deletes default GIDs not IP based GIDs. del_netdev_default_ips() having _ips suffix is confusing. Therefore, get rid of confusing del_netdev_default_ips() and simplify bond_delete_netdev_default_gids() to follow same argument order as its caller. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-14IB/core: Add comment for change upper netevent handlingParav Pandit1-16/+39
Add comment for handling CHANGEUPPER netevent handling. To improve code readability, (a) move cmd definitions to its respective if-else branches, (b) avoid single line structure definitions. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-14qedr: Add user space support for SRQYuval Bason2-44/+191
This patch adds support for SRQ's created in user space and update qedr_affiliated_event to deal with general SRQ events. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Bason <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-14qedr: Add support for kernel mode SRQ'sYuval Bason5-13/+458
Implement the SRQ specific verbs and update the poll_cq verb to deal with SRQ completions. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Bason <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-14qedr: Add wrapping generic structure for qpidr and adjust idr routines.Yuval Bason4-29/+31
Today, we are using idr mechanism for QP's only. This patch prepares the qedr_idr stuctures and the idr routines for both QP's and SRQ's. Signed-off-by: Yuval Bason <[email protected]> Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-14IB/mlx5: Fix leaking stack memory to userspaceJason Gunthorpe1-1/+1
mlx5_ib_create_qp_resp was never initialized and only the first 4 bytes were written. Fixes: 41d902cb7c32 ("RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp") Cc: <[email protected]> Acked-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-14Merge tag 'acpi-4.19-rc1' of ↵Linus Torvalds1-6/+4
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These revert two ACPICA commits that are not needed any more, rework the property graphs support in ACPI to be more aligned with the analogous DT code, add some new quirks and remove one that isn't needed any more, add a special platform driver to enumerate multiple I2C devices hooked up to the same device object in the ACPI tables and update the battery and button drivers. Specifics: - Revert two ACPICA commits that are not needed any more (Erik Schmauss). - Rework property graph support in the ACPI device properties framework to make it behave more like the analogous DT code and update the documentation of it (Sakari Ailus). - Change the default ACPI device status after initialization to ACPI_STA_DEFAULT instead of 0 (Hans de Goede). - Add a special platform driver for enumerating multiple I2C devices hooked up to the same object in the ACPI tables (Hans de Goede). - Fix the ACPI battery driver to avoid reporting full capacity on systems without support for that and clean it up (Hans de Goede, Dmitry Rozhkov, Lucas Rangit Magasweran). - Add two system wakeup quirks to the ACPI EC driver (Aaron Ma, Mika Westerberg). - Add the touchscreen on Dell Venue Pro 7139 to the list of "always present" devices to make it work (Tristian Celestin). - Revert a special tables handling quirk for Dell XPS 9570 and Precision M5530 which is not needed any more (Kai Heng Feng). - Add support for a new OEM _OSI string to allow system vendors to work around issues with NVidia HDMI audio (Alex Hung). - Prevent the ACPI button driver from reporting excessive system wakeup events and clean it up (Ravi Chandra Sadineni, Randy Dunlap). - Clean up two minor code style issues in the ACPI core and GHES handling on ARM64 (Dongjiu Geng, John Garry, Tom Todd)" * tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits) platform/x86: Add ACPI i2c-multi-instantiate pseudo driver ACPI / x86: utils: Remove status workaround from acpi_device_always_present() ACPI / scan: Create platform device for fwnodes with multiple i2c devices ACPI / scan: Initialize status to ACPI_STA_DEFAULT ACPI / EC: Add another entry for Thinkpad X1 Carbon 6th ACPI: bus: Fix a pointer coding style issue arm64 / ACPI: clean the additional checks before calling ghes_notify_sea() ACPI / scan: Add static attribute to indirect_io_hosts[] ACPI / battery: Do not export energy_full[_design] on devices without full_charge_capacity ACPI / EC: Use ec_no_wakeup on ThinkPad X1 Yoga 3rd ACPI / battery: get rid of negations in conditions ACPI / battery: use specialized print macros ACPI / battery: reorder headers alphabetically ACPI / battery: drop inclusion of init.h ACPI: battery: remove redundant old_present check on insertion ACPI: property: graph: Update graph documentation to use generic references ACPI: property: graph: Improve graph documentation for port/ep numbering ACPI: property: graph: Fix graph documentation ACPI: property: Update documentation for hierarchical data extension 1.1 ACPI: property: Document key numbering for hierarchical data extension refs ...
2018-08-14Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ...
2018-08-14Merge branches 'acpica' and 'acpi-property'Rafael J. Wysocki1-6/+4
Merge ACPICA changes and updates of the ACPI device properties framework for 4.19. These revert two ACPICA commits that are not needed any more and modify the properties graph support in ACPI to be more in-line with the analogous DT code. * acpica: ACPICA: Update version to 20180629 ACPICA: Revert "iASL compiler: allow compilation of externals with paths that refer to existing names" ACPICA: Revert "iASL: change processing of external op namespace nodes for correctness" * acpi-property: ACPI: property: graph: Update graph documentation to use generic references ACPI: property: graph: Improve graph documentation for port/ep numbering ACPI: property: graph: Fix graph documentation ACPI: property: Update documentation for hierarchical data extension 1.1 ACPI: property: Document key numbering for hierarchical data extension refs ACPI: property: Use data node name and reg property for graphs ACPI: property: Allow direct graph endpoint references ACPI: property: Make the ACPI graph API private ACPI: property: Document hierarchical data extension references ACPI: property: Allow making references to non-device nodes ACPI: Convert ACPI reference args to generic fwnode reference args
2018-08-13IB/ucm: Fix compiling ucm.cJason Gunthorpe2-6/+6
Even though this interface is marked CONFIG_BROKEN we still expect it to compile, at least until we delete it completely. Also mark INFINIBAND_USER_ACCESS_UCM with COMPILE_TEST so these situations can be detected. Fixes: e7ff98aefc9e ("RDMA/cma: Constify path record, ib_cm_event, listen_id pointers") Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-13Merge branch 'locking-core-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking/atomics update from Thomas Gleixner: "The locking, atomics and memory model brains delivered: - A larger update to the atomics code which reworks the ordering barriers, consolidates the atomic primitives, provides the new atomic64_fetch_add_unless() primitive and cleans up the include hell. - Simplify cmpxchg() instrumentation and add instrumentation for xchg() and cmpxchg_double(). - Updates to the memory model and documentation" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) locking/atomics: Rework ordering barriers locking/atomics: Instrument cmpxchg_double*() locking/atomics: Instrument xchg() locking/atomics: Simplify cmpxchg() instrumentation locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation tools/memory-model: Rename litmus tests to comply to norm7 tools/memory-model/Documentation: Fix typo, smb->smp sched/Documentation: Update wake_up() & co. memory-barrier guarantees locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock() sched/core: Use smp_mb() in wake_woken_function() tools/memory-model: Add informal LKMM documentation to MAINTAINERS locking/atomics/Documentation: Describe atomic_set() as a write operation tools/memory-model: Make scripts executable tools/memory-model: Remove ACCESS_ONCE() from model tools/memory-model: Remove ACCESS_ONCE() from recipes locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example MAINTAINERS: Add Daniel Lustig as an LKMM reviewer tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name tools/memory-model: Add litmus test for full multicopy atomicity locking/refcount: Always allow checked forms ...
2018-08-13IB/uverbs: Do not check for device disassociation during ioctlJason Gunthorpe1-28/+13
Now that the ioctl path and uobjects are converted to use uverbs_api, it is now safe to remove the disassociation protection from the common ioctl code. This completes the work to make destroy functions continue to work even after device disassociation. Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-13IB/uverbs: Remove struct uverbs_root_spec and all supporting codeJason Gunthorpe6-742/+2
Everything now uses the uverbs_uapi data structure. Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-13IB/uverbs: Use uverbs_api to unmarshal ioctl commandsJason Gunthorpe3-269/+210
Convert the ioctl method syscall path to use the uverbs_api data structures. The new uapi structure includes all the same information, just in a different and more optimal way. - Use attr_bkey instead of 2 level radix trees for everything related to attributes. This includes the attribute storage, presence, and detection of missing mandatory attributes. - Avoid iterating over all attribute storage at finish, instead use find_first_bit with the attr_bkey to locate only those attrs that need cleanup. - Organize things to always run, and always rely on, cleanup. This avoids a bunch of tricky error unwind cases. - Locate the method using the radix tree, and locate the attributes using a very efficient incremental radix tree lookup - Use the precomputed destroy_bkey to handle uobject destruction - Use the precomputed allocation sizes and precomputed 'need_stack' to avoid maths in the fast path. This is optimal if userspace does not pass (many) unsupported attributes. Overall this results in much better codegen for the attribute accessors, everything is now stored in bitmaps or linear arrays indexed by attr_bkey. The compiler can compute attr_bkey values at compile time for all method attributes, meaning things like uverbs_attr_is_valid() now compile into single instruction bit tests. Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-13IB/uverbs: Use uverbs_alloc for allocationsJason Gunthorpe2-65/+38
Several handlers need temporary allocations for the life of the method, switch them to use the uverbs_alloc allocator. Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]>
2018-08-13IB/uverbs: Add a simple allocator to uverbs_attr_bundleJason Gunthorpe1-20/+89
This is similar in spirit to devm, it keeps track of any allocations linked to this method call and ensures they are all freed when the method exits. Further, if there is space in the internal/onstack buffer then the allocator will hand out that memory and avoid an expensive call to kalloc/kfree in the syscall path. Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]>
2018-08-10IB/uverbs: Remove the ib_uverbs_attr pointer from each attrJason Gunthorpe2-35/+64
Memory in the bundle is valuable, do not waste it holding an 8 byte pointer for the rare case of writing to a PTR_OUT. We can compute the pointer by storing a small 1 byte array offset and the base address of the uattr memory in the bundle private memory. This also means we can access the kernel's copy of the ib_uverbs_attr, so drop the copy of flags as well. Since the uattr base should be private bundle information this also de-inlines the already too big uverbs_copy_to inline and moves create_udata into uverbs_ioctl.c so they can see the private struct definition. Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]>
2018-08-10IB/uverbs: Provide implementation private memory for the uverbs_attr_bundleJason Gunthorpe1-55/+57
This already existed as the anonymous 'ctx' structure, but this was not really a useful form. Hoist this struct into bundle_priv and rework the internal things to use it instead. Move a bunch of the processing internal state into the priv and reduce the excessive use of function arguments. Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]>
2018-08-10IB/uverbs: Use uverbs_api to manage the object type inside the uobjectJason Gunthorpe3-51/+57
Currently the struct uverbs_obj_type stored in the ib_uobject is part of the .rodata segment of the module that defines the object. This is a problem if drivers define new uapi objects as we will be left with a dangling pointer after device disassociation. Switch the uverbs_obj_type for struct uverbs_api_object, which is allocated memory that is part of the uverbs_api and is guaranteed to always exist. Further this moves the 'type_class' into this memory which means access to the IDR/FD function pointers is also guaranteed. Drivers cannot define new types. This makes it safe to continue to use all uobjects, including driver defined ones, after disassociation. Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-10IB/uverbs: Build the specs into a radix tree at runtimeJason Gunthorpe5-3/+408
This radix tree datastructure is intended to replace the 'hash' structure used today for parsing ioctl methods during system calls. This first commit introduces the structure and builds it from the existing .rodata descriptions. The so-called hash arrangement is actually a 5 level open coded radix tree. This new version uses a 3 level radix tree built using the radix tree library. Overall this is much less code and much easier to build as the radix tree API allows for dynamic modification during the building. There is a small memory penalty to pay for this, but since the radix tree is allocated on a per device basis, a few kb of RAM seems immaterial considering the gained simplicity. The radix tree is similar to the existing tree, but also has a 'attr_bkey' concept, which is a small value'd index for each method attribute. This is used to simplify and improve performance of everything in the next patches. Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Reviewed-by: Michael J. Ruhl <[email protected]>
2018-08-10IB/uverbs: Have the core code create the uverbs_root_specJason Gunthorpe5-49/+50
There is no reason for drivers to do this, the core code should take of everything. The drivers will provide their information from rodata to describe their modifications to the core's base uapi specification. The core uses this to build up the runtime uapi for each device. Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Michael J. Ruhl <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]>
2018-08-09IB/uverbs: Fix reading of 32 bit flagsJason Gunthorpe1-1/+1
This is missing a zeroing of the high bits of flags, and is also not correct for big endian machines. Properly zero extend the 32 bit flags into the 64 bit stack variable. Reported-by: Michael J. Ruhl <[email protected]> Fixes: bccd06223f21 ("IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language") Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Michael J. Ruhl <[email protected]>
2018-08-08RDMA/rxe: Set wqe->status correctly if an unexpected response is receivedBart Van Assche1-0/+1
Every function that returns COMPST_ERROR must set wqe->status to another value than IB_WC_SUCCESS before returning COMPST_ERROR. Fix the only code path for which this is not yet the case. Signed-off-by: Bart Van Assche <[email protected]> Cc: <[email protected]> Reviewed-by: Yuval Shaia <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-08iw_cxgb4: pass window scale in flowc work requestPotnuri Bharat Teja2-24/+28
This will allow FW to not send more data to TP (which would then need to be buffered). Pass the negotiated TCP window scale to FW in the FLOWC WR. Also refactor send_flowc() a bit to clean it up. Signed-off-by: Steve Wise <[email protected]> Signed-off-by: Potnuri Bharat Teja <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-08RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wqLeon Romanovsky1-1/+3
[ 61.182439] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:5366:34 [ 61.183673] shift exponent 4294967288 is too large for 32-bit type 'unsigned int' [ 61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96 [ 61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 61.188315] Call Trace: [ 61.188661] dump_stack+0xc7/0x13b [ 61.190427] ubsan_epilogue+0x9/0x49 [ 61.190899] __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f [ 61.197040] mlx5_ib_create_wq+0x1c99/0x1d50 [ 61.206632] ib_uverbs_ex_create_wq+0x499/0x820 [ 61.213892] ib_uverbs_write+0x77e/0xae0 [ 61.248018] vfs_write+0x121/0x3b0 [ 61.249831] ksys_write+0xa1/0x120 [ 61.254024] do_syscall_64+0x7c/0x2a0 [ 61.256178] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 61.259211] RIP: 0033:0x7f54bab70e99 [ 61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 [ 61.268678] RSP: 002b:00007ffe1541c318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 61.271076] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54bab70e99 [ 61.273795] RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003 [ 61.276982] RBP: 00007ffe1541c330 R08: 00000000200078e0 R09: 0000000000000002 [ 61.280035] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004005c0 [ 61.283279] R13: 00007ffe1541c420 R14: 0000000000000000 R15: 0000000000000000 Cc: <[email protected]> # 4.7 Fixes: 79b20a6c3014 ("IB/mlx5: Add receive Work Queue verbs") Cc: syzkaller <[email protected]> Reported-by: Noa Osherovich <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-07IB/ucm: Initialize sgid request GID attribute pointerParav Pandit1-4/+1
sgid_attr is uninitialized on the stack, initialize it to NULL. Fixes: 398391071f25 ("IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'") Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Yossi Itigin <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-05Merge tag 'v4.18-rc6' into for-4.19/block2Jens Axboe8-27/+39
Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <[email protected]>
2018-08-05Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-5/+54
Lots of overlapping changes, mostly trivial in nature. The mlxsw conflict was resolving using the example resolution at: https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c Signed-off-by: David S. Miller <[email protected]>
2018-08-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-5/+54
Pull rdma fix from Jason Gunthorpe: "One bug for missing user input validation: refuse invalid port numbers in the modify_qp system call" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/uverbs: Expand primary and alt AV port checks
2018-08-02IB/ipoib: Consolidate checking of the proposed child interfaceJason Gunthorpe2-28/+52
Move all the checking for pkey and other validity to the __ipoib_vlan_add function. This removes the last difference from the control flow of the __ipoib_vlan_add to make the overall design simpler to understand. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Erez Shitrit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2018-08-02IB/ipoib: Maintain the child_intfs list from ndo_init/uninitJason Gunthorpe3-26/+16
This fixes a bug in the netlink path where the vlan_rwsem was not held around __ipoib_vlan_add causing the child_intfs to be manipulated unsafely. In the process this greatly simplifies the vlan_rwsem write side locking to only cover a single non-sleeping statement. This also further increases the safety of the removal ordering by holding the netdev of the parent while the child is active to ensure most bugs become either an oops on a NULL priv or a deadlock on the netdev refcount. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2018-08-02IB/ipoib: Do not remove child devices from within the ndo_uninitJason Gunthorpe3-11/+30
Switching to priv_destructor and needs_free_netdev created a subtle ordering problem in ipoib_remove_one. Now that unregister_netdev frees the netdev and priv we must ensure that the children are unregistered before trying to unregister the parent, or child unregister will use after free. The solution is to unregister the children, then parent, in the same batch all while holding the rtnl_lock. This closes all the races where a new child could have been added and ensures proper ordering. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2018-08-02IB/ipoib: Get rid of the sysfs_mutexJason Gunthorpe4-48/+65
This mutex was introduced to deal with the deadlock formed by calling unregister_netdev from within the sysfs callback of a netdev. Now that we have priv_destructor and needs_free_netdev we can switch to the more targeted solution of running the unregister from a work queue. This avoids the deadlock and gets rid of the mutex. The next patch in the series needs this mutex eliminated to create atomicity of unregisteration. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2018-08-02RDMA/netdev: Use priv_destructor for netdev cleanupJason Gunthorpe4-76/+105
Now that the unregister_netdev flow for IPoIB no longer relies on external code we can now introduce the use of priv_destructor and needs_free_netdev. The rdma_netdev flow is switched to use the netdev common priv_destructor instead of the special free_rdma_netdev and the IPOIB ULP adjusted: - priv_destructor needs to switch to point to the ULP's destructor which will then call the rdma_ndev's in the right order - We need to be careful around the error unwind of register_netdev as it sometimes calls priv_destructor on failure - ULPs need to use ndo_init/uninit to ensure proper ordering of failures around register_netdev Switching to priv_destructor is a necessary pre-requisite to using the rtnl new_link mechanism. The VNIC user for rdma_netdev should also be revised, but that is left for another patch. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Denis Drozdov <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2018-08-02IB/ipoib: Move init code to ndo_initJason Gunthorpe4-119/+114
Now that we have a proper ndo_uninit, move code that naturally pairs with the ndo_uninit into ndo_init. This allows the netdev core to natually handle ordering. This fixes the situation where register_netdev can fail before calling ndo_init, in which case it wouldn't call ndo_uninit either. Also move a bunch of duplicated init code that is shared between child and parent for clarity. Now the child and parent register functions look very similar. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2018-08-02IB/ipoib: Move all uninit code into ndo_uninitJason Gunthorpe3-32/+34
Currently uninit is sometimes done twice in error flows, and is sprinkled a bit all over the place. Improve the clarity of the design by moving all uninit only into ndo_uinit. Some duplication is removed: - Sometimes IPOIB_STOP_NEIGH_GC was done before unregister, but this duplicates the process in ipoib_neigh_hash_init - Flushing priv->wq was sometimes done before unregister, but that duplicates what has been done in ndo_uninit Uniniting the IB event queue must remain before unregister_netdev as it requires the RTNL lock to be dropped, this is moved to a helper to make that flow really clear and remove some duplication in error flows. If register_netdev fails (and ndo_init is NULL) then it almost always calls ndo_uninit, which lets us remove all the extra code from the error unwinds. The next patch in the series will close the 'almost always' hole by pairing a proper ndo_init with ndo_uninit. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2018-08-02IB/ipoib: Use cancel_delayed_work_sync for neigh-clean taskErez Shitrit2-24/+10
The neigh_reap_task is self restarting, but so long as we call cancel_delayed_work_sync() it will be guaranteed to not be running and never start again. Thus we don't need to have the racy IPOIB_STOP_NEIGH_GC bit, or the confusing mismatch of places sometimes calling flush_workqueue after the cancel. This fixes a situation where the GC work could have been left running in some rare situations. Signed-off-by: Erez Shitrit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-02IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWNJason Gunthorpe4-13/+18
This essentially duplicates the netdev's reg_state, so just use that directly. The reg_state is updated under the rntl_lock, and all places using GOING_DOWN already acquire the rtnl_lock so checking is safe. Since the only place we use GOING_DOWN is for the parent device this does not fix any bugs, but it is a step to tidy up the unregister flow so that after later patches the flow is uniform and sane. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2018-08-02iw_cxgb4: Support FW write completion WRPotnuri Bharat Teja4-2/+183
To optimize NVME-oF READ IOPs, use a specialized WQE that combines the RDMA WRITE and SEND_INV WR chain submitted by the NVME-oF target driver. This reduces uP overhead per NVME-oF IO, and results in over 10% improvement in NVME-oF 4K READ IOPs. Signed-off-by: Potnuri Bharat Teja <[email protected]> Signed-off-by: Steve Wise <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-02iw_cxgb4: RDMA write with immediate supportPotnuri Bharat Teja4-15/+79
Adds iw_cxgb4 functionality to support RDMA_WRITE_WITH_IMMEDATE opcode. Signed-off-by: Potnuri Bharat Teja <[email protected]> Signed-off-by: Steve Wise <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-02rdma/cxgb4: fix some info leaksDan Carpenter1-4/+3
In c4iw_create_qp() there are several struct members which potentially aren't inintialized like uresp.rq_key. I've fixed this code before in in commit ae1fe07f3f42 ("RDMA/cxgb4: Fix stack info leak in c4iw_create_qp()") so this time I'm just going to take a big hammer approach and memset the whole struct to zero. Hopefully, it will stay fixed this time. In c4iw_create_srq() we don't clear uresp.reserved. Fixes: 6a0b6174d35a ("rdma/cxgb4: Add support for kernel mode SRQ's") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Raju Rangoju <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-02RDMA/hns: Support flush cqe for hip08 in kernel spaceYixian Liu4-20/+240
According to IB protocol, there are some cases that work requests must return the flush error completion status through the completion queue. Due to hardware limitation, the driver needs to assist the flush process. This patch adds the support of flush cqe for hip08 in the cases that needed, such as poll cqe, post send, post recv and aeqe handle. The patch also considered the compatibility between kernel and user space. Signed-off-by: Yixian Liu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-08-02scsi: target: srp, vscsi, sbp, qla: use target_remove_sessionMike Christie1-2/+1
This converts the drivers that called transport_deregister_session_configfs and then immediately called transport_deregister_session to use target_remove_session. Signed-off-by: Mike Christie <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Chris Boot <[email protected]> Cc: Bryant G. Ly <[email protected]> Cc: Michael Cyr <[email protected]> Cc: <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2018-08-02scsi: target: rename target_alloc_sessionMike Christie1-3/+3
Rename target_alloc_session to target_setup_session to avoid confusion with the other transport session allocation function that only allocates the session and because the target_alloc_session does so much more. It allocates the session, sets up the nacl and registers the session. The next patch will then add a remove function to match the setup in this one, so it should make sense for all drivers, except iscsi, to just call those 2 functions to setup and remove a session. iscsi will continue to be the odd driver. Signed-off-by: Mike Christie <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Chris Boot <[email protected]> Cc: Bryant G. Ly <[email protected]> Cc: Michael Cyr <[email protected]> Cc: <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Andrzej Pietrasiewicz <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Juergen Gross <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2018-08-01IB/IPoIB: Set ah valid flag in multicast send flowDenis Drozdov1-0/+1
The change of ipoib_ah data structure with adding "valid" flag and checks of ah->valid in ipoib_start_xmit affected multicast packet flow. Since the multicast flow doesn't invoke path_rec_start, "ah->valid" flag remains unset, so that ipoib_start_xmit end up with neigh_refresh_path instead of sending the packet using neigh. "ah->valid" has to be set in multicast send flow. As a result IPoIB starts sending packets via neigh immediately and eliminates 60sec delay of neigh keep alive interval. The typical example of this issue are two sequential arpings: arping 11.134.208.9 -> got response (mcast_send) arping 11.134.208.9 -> no response (ah->valid = 0) Fixes: fa9391dbad4b ("RDMA/ipoib: Update paths on CLIENT_REREG/SM_CHANGE events") Signed-off-by: Denis Drozdov <[email protected]> Reviewed-by: Erez Shitrit <[email protected]> Reviewed-by: Feras Daoud <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>