aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband/core/nldev.c
AgeCommit message (Collapse)AuthorFilesLines
2020-07-29RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QPMark Zhang1-3/+0
When dumping QPs bound to a counter, raw QPs should be allowed to dump without the CAP_NET_RAW privilege. This is consistent with what "rdma res show qp" does. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-10RDMA/counter: Add PID category support in auto modeMark Zhang1-2/+6
With the "PID" category QPs have same PID will be bound to same counter; If this category is not set then QPs have different PIDs will be bound to same counter. This is implemented for 2 reasons: 1. The counter is a limited resource, while there may be dozens of applications, each of which creates several types of QPs, which means it may doesn't have enough counter. 2. The system administrator needs all QPs created by all applications with same type bound to one counter. The counter name and PID is only make sense when "PID" category are configured. This category can also be used in combine with others, e.g. QP type. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-24RDMA: Add support to dump resource tracker in RAW formatMaor Gottlieb1-62/+118
Add support to get resource dump in raw format. It enable drivers to return the entire device specific QP/CQ/MR context without a need from the driver to set each field separately. The raw query returns only the device specific data, general data is still returned by using the existing queries. Example: $ rdma res show mr dev mlx5_1 mrn 2 -r -j [{"ifindex":7,"ifname":"mlx5_1", "data":[0,4,255,254,0,0,0,0,0,0,0,0,16,28,0,216,...]}] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA: Add dedicated CM_ID resource tracker functionMaor Gottlieb1-11/+2
In order to avoid double multiplexing of the resource when it is a cm id, add a dedicated callback function. In addition remove fill_res_entry which is not used anymore. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA: Add dedicated QP resource tracker functionMaor Gottlieb1-3/+2
In order to avoid double multiplexing of the resource when it is a QP, add a dedicated callback function. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA: Add a dedicated CQ resource tracker functionMaor Gottlieb1-3/+2
In order to avoid double multiplexing of the resource when it is a CQ, add a dedicated callback function. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA: Add dedicated MR resource tracker functionMaor Gottlieb1-14/+4
In order to avoid double multiplexing of the resource when it is a MR, add a dedicated callback function. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-06-23RDMA/core: Don't call fill_res_entry for PDMaor Gottlieb1-8/+1
None of the drivers implement it, remove it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-12RDMA/core: Fix double put of resourceMaor Gottlieb1-2/+1
Do not decrease the reference count of resource tracker object twice in the error flow of res_get_common_doit. Fixes: c5dfe0ea6ffa ("RDMA/nldev: Add resource tracker doit callback") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-13RDMA/nl: Do not permit empty devices names during RDMA_NLDEV_CMD_NEWLINK/SETJason Gunthorpe1-1/+5
Empty device names cannot be added to sysfs and crash with: kobject: (00000000f9de3792): attempted to be registered with empty name! WARNING: CPU: 1 PID: 10856 at lib/kobject.c:234 kobject_add_internal+0x7ac/0x9a0 lib/kobject.c:234 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 10856 Comm: syz-executor459 Not tainted 5.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 panic+0x2e3/0x75c kernel/panic.c:221 __warn.cold+0x2f/0x3e kernel/panic.c:582 report_bug+0x289/0x300 lib/bug.c:195 fixup_bug arch/x86/kernel/traps.c:174 [inline] fixup_bug arch/x86/kernel/traps.c:169 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027 RIP: 0010:kobject_add_internal+0x7ac/0x9a0 lib/kobject.c:234 Code: 7a ca ca f9 e9 f0 f8 ff ff 4c 89 f7 e8 cd ca ca f9 e9 95 f9 ff ff e8 13 25 8c f9 4c 89 e6 48 c7 c7 a0 08 1a 89 e8 a3 76 5c f9 <0f> 0b 41 bd ea ff ff ff e9 52 ff ff ff e8 f2 24 8c f9 0f 0b e8 eb RSP: 0018:ffffc90002006eb0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff815eae46 RDI: fffff52000400dc8 RBP: ffffc90002006f08 R08: ffff8880972ac500 R09: ffffed1015d26659 R10: ffffed1015d26658 R11: ffff8880ae9332c7 R12: ffff888093034668 R13: 0000000000000000 R14: ffff8880a69d7600 R15: 0000000000000001 kobject_add_varg lib/kobject.c:390 [inline] kobject_add+0x150/0x1c0 lib/kobject.c:442 device_add+0x3be/0x1d00 drivers/base/core.c:2412 ib_register_device drivers/infiniband/core/device.c:1371 [inline] ib_register_device+0x93e/0xe40 drivers/infiniband/core/device.c:1343 rxe_register_device+0x52e/0x655 drivers/infiniband/sw/rxe/rxe_verbs.c:1231 rxe_add+0x122b/0x1661 drivers/infiniband/sw/rxe/rxe.c:302 rxe_net_add+0x91/0xf0 drivers/infiniband/sw/rxe/rxe_net.c:539 rxe_newlink+0x39/0x90 drivers/infiniband/sw/rxe/rxe.c:318 nldev_newlink+0x28a/0x430 drivers/infiniband/core/nldev.c:1538 rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:195 [inline] rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0x5d9/0x980 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Prevent empty names when checking the name provided from userspace during newlink and rename. Fixes: 3856ec4b93c9 ("RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support") Fixes: 05d940d3a3ec ("RDMA/nldev: Allow IB device rename through RDMA netlink") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Reported-and-tested-by: [email protected] Signed-off-by: Jason Gunthorpe <[email protected]>
2020-03-04RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missingMark Zhang1-0/+2
This fixes the kernel crash when a RDMA_NLDEV_CMD_STAT_SET command is received, but the QP number parameter is not available. iwpm_register_pid: Unable to send a nlmsg (client = 2) infiniband syz1: RDMA CMA: cma_listen_on_dev, error -98 general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 9754 Comm: syz-executor069 Not tainted 5.6.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nla_get_u32 include/net/netlink.h:1474 [inline] RIP: 0010:nldev_stat_set_doit+0x63c/0xb70 drivers/infiniband/core/nldev.c:1760 Code: fc 01 0f 84 58 03 00 00 e8 41 83 bf fb 4c 8b a3 58 fd ff ff 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 6d RSP: 0018:ffffc900068bf350 EFLAGS: 00010247 RAX: dffffc0000000000 RBX: ffffc900068bf728 RCX: ffffffff85b60470 RDX: 0000000000000000 RSI: ffffffff85b6047f RDI: 0000000000000004 RBP: ffffc900068bf750 R08: ffff88808c3ee140 R09: ffff8880a25e6010 R10: ffffed10144bcddc R11: ffff8880a25e6ee3 R12: 0000000000000000 R13: ffff88809acb0000 R14: ffff888092a42c80 R15: 000000009ef2e29a FS: 0000000001ff0880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4733e34000 CR3: 00000000a9b27000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:195 [inline] rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0x5d9/0x980 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4403d9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc0efbc5c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403d9 RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000004 RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8 R10: 000000000000004a R11: 0000000000000246 R12: 0000000000401c60 R13: 0000000000401cf0 R14: 0000000000000000 R15: 0000000000000000 Fixes: b389327df905 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink") Link: https://lore.kernel.org/r/[email protected] Reported-by: [email protected] Signed-off-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-01-13RDMA/core: Do not erase the type of ib_cq.uobjectJason Gunthorpe1-1/+2
This is a struct ib_ucq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-45/+96
Pull rdma updates from Jason Gunthorpe: "Again another fairly quiet cycle with few notable core code changes and the usual variety of driver bug fixes and small improvements. - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr, iw_cxgb4, vmw_pvrdma, mlx5 - Improvements in SRPT from working with iWarp - SRIOV VF support for bnxt_re - Skeleton kernel-doc files for drivers/infiniband - User visible counters for events related to ODP - Common code for tracking of mmap lifetimes so that drivers can link HW object liftime to a VMA - ODP bug fixes and rework - RDMA READ support for efa - Removal of the very old cxgb3 driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits) RDMA/hns: Delete unnecessary callback functions for cq RDMA/hns: Rename the functions used inside creating cq RDMA/hns: Redefine the member of hns_roce_cq struct RDMA/hns: Redefine interfaces used in creating cq RDMA/efa: Expose RDMA read related attributes RDMA/efa: Support remote read access in MR registration RDMA/efa: Store network attributes in device attributes IB/hfi1: remove redundant assignment to variable ret RDMA/bnxt_re: Fix missing le16_to_cpu RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series RDMA/bnxt_re: Fix Kconfig indentation IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset RDMA/cm: Use refcount_t type for refcount variable IB/mlx5: Support extended number of strides for Striding RQ IB/mlx4: Update HW GID table while adding vlan GID ...
2019-10-28Merge tag 'v5.4-rc5' into rdma.git for-nextJason Gunthorpe1-6/+4
Linux 5.4-rc5 For dependencies in the next patches Conflict resolved by keeping the delete of the unlock. Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-24RDMA/nldev: Skip counter if port doesn't matchMark Zhang1-1/+1
The counter resource should return -EAGAIN if it was requested for a different port, this is similar to how QP works if the users provides a port filter. Otherwise port filtering in netlink will return broken counter nests. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-23RDMA/core: Check that process is still alive before sending it to the usersLeon Romanovsky1-7/+21
The PID information can disappear asynchronously because the task can be killed and moved to zombie state. In this case, PID will be zero in similar way to the kernel tasks. Recognize such situation where we are asking to return orphaned object and simply skip filling PID attribute. As part of this change, document the same scenario in counter.c code. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-23RDMA/restrack: Remove PID namespace supportLeon Romanovsky1-12/+1
IB resources are bounded to IB device and file descriptors, both entities are unaware to PID namespaces and to task lifetime. The difference in model caused to unpredictable behavior for the following scenario: 1. Create FD and context 2. Share it with ephemeral child 3. Create any object and exit that child The end result of this flow, that those newly created objects will be tracked by restrack, but won't be visible for users because task_struct associated with them already exited. The right thing is to rely on net namespace only for any filtering purposes and drop PID namespace. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-22RDMA/nldev: Provide MR statisticsErez Alfasi1-6/+39
Add RDMA nldev netlink interface for dumping MR statistics information. Output example: $ ./ibv_rc_pingpong -o -P -s 500000000 local address: LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID :: $ rdma stat show mr dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Erez Alfasi <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-22RDMA/mlx5: Return ODP type per MRErez Alfasi1-0/+13
Provide an ODP explicit/implicit type as part of 'rdma -dd resource show mr' dump. For example: $ rdma -dd resource show mr dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000 pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit For non-ODP MRs, we won't print "drv_odp ..." at all. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Erez Alfasi <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-22RDMA/nldev: Allow different fill function per resourceErez Alfasi1-21/+23
So far res_get_common_{dumpit, doit} was using the default resource fill function which was defined as part of the nldev_fill_res_entry fill_entries. Add a fill function pointer as an argument allows us to use different fill function in case we want to dump different values then 'rdma resource' flow do, but still use the same existing general resources dumping flow. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Erez Alfasi <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-04RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error pathLeon Romanovsky1-6/+4
Properly unwind QP counter rebinding in case of failure. Trying to rebind the counter after unbiding it is not going to work reliably, move the unbind to the end so it doesn't have to be unwound. Fixes: b389327df905 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-10-04RDMA/core: Fix an error handling path in 'res_get_common_doit()'Christophe JAILLET1-1/+1
According to surrounding error paths, it is likely that 'goto err_get;' is expected here. Otherwise, a call to 'rdma_restrack_put(res);' would be missing. Fixes: c5dfe0ea6ffa ("RDMA/nldev: Add resource tracker doit callback") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-09-13Merge tag 'v5.3-rc8' into rdma.git for-nextJason Gunthorpe1-2/+1
To resolve dependencies in following patches mlx5_ib.h conflict resolved by keeing both hunks Linux 5.3-rc8 Signed-off-by: Jason Gunthorpe <[email protected]>
2019-08-21Merge branch 'odp_fixes' into rdma.git for-nextJason Gunthorpe1-2/+6
Jason Gunthorpe says: ==================== This is a collection of general cleanups for ODP to clarify some of the flows around umem creation and use of the interval tree. ==================== The branch is based on v5.3-rc5 due to dependencies * odp_fixes: RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address RDMA/core: Make invalidate_range a device operation RDMA/odp: Use kvcalloc for the dma_list and page_list RDMA/odp: Check for overflow when computing the umem_odp end RDMA/odp: Provide ib_umem_odp_release() to undo the allocs RDMA/odp: Split creating a umem_odp from ib_umem_get RDMA/odp: Make the three ways to create a umem_odp clear RMDA/odp: Consolidate umem_odp initialization RDMA/odp: Make it clearer when a umem is an implicit ODP umem RDMA/odp: Iterate over the whole rbtree directly RDMA/odp: Use the common interval tree library instead of generic RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB Signed-off-by: Jason Gunthorpe <[email protected]>
2019-08-20RDMA/restrack: Rewrite PID namespace check to be reliableLeon Romanovsky1-2/+1
task_active_pid_ns() is wrong API to check PID namespace because it posses some restrictions and return PID namespace where the process was allocated. It created mismatches with current namespace, which can be different. Rewrite whole rdma_is_visible_in_pid_ns() logic to provide reliable results without any relation to allocated PID namespace. Fixes: 8be565e65fa9 ("RDMA/nldev: Factor out the PID namespace check") Fixes: 6a6c306a09b5 ("RDMA/restrack: Make is_visible_in_pid_ns() as an API") Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Doug Ledford <[email protected]>
2019-08-12RDMA/core: Fix error code in stat_get_doit_qp()Dan Carpenter1-2/+6
We need to set the error codes on these paths. Currently the only possible error code is -EMSGSIZE so that's what the patch uses. Fixes: 83c2c1fcbd08 ("RDMA/nldev: Allow get counter mode through RDMA netlink") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/20190809101311.GA17867@mwanda Signed-off-by: Doug Ledford <[email protected]>
2019-07-25RDMA/core: Support netlink commands in non init_net net namespacesParav Pandit1-10/+10
Now that IB core supports RDMA device binding with specific net namespace, enable IB core to accept netlink commands in non init_net namespaces. This is done by having per net namespace netlink socket. At present only netlink device handling client RDMA_NL_NLDEV supports device handling in multiple net namespaces. Hence do not accept netlink messages for other clients in non init_net net namespaces. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-08IB/core: Work on the caller socket net namespace in nldev_newlink()Parav Pandit1-1/+1
While creating new RDMA devices based on netdevice name, consider the net namespace of the caller skb's socket similar to rest of the doit() callbacks and nldev_dellink() which deletes the RDMA device created using nldev_newlink(). Fixes: 3856ec4b93c94 ("RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support") Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-08RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlinkYamin Friedman1-0/+14
Added parameter in ib_device for enabling dynamic interrupt moderation so that it can be configured in userspace using rdma tool. In order to set adaptive-moderation for an ib device the command is: rdma dev set [DEV] adaptive-moderation [on|off] Please set on/off. rdma dev show 0: mlx5_0: node_type ca fw 16.26.0055 node_guid 248a:0703:00a5:29d0 sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on rdma resource show cq dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE adaptive-moderation off comm [ib_core] Signed-off-by: Yamin Friedman <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-05RDMA/nldev: Allow get default counter statistics through RDMA netlinkMark Zhang1-1/+97
This patch adds the ability to return the hwstats of per-port default counters (which can also be queried through sysfs nodes). Signed-off-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-05RDMA/nldev: Allow get counter mode through RDMA netlinkMark Zhang1-1/+65
Provide an option to get current counter mode through RDMA netlink. Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Majd Dibbiny <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-05RDMA/nldev: Allow counter manual mode configration through RDMA netlinkMark Zhang1-13/+98
Provide an option to allow users to manually bind a qp with a counter through RDMA netlink. Limit it to users with ADMIN capability only. Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Majd Dibbiny <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-05RDMA/netlink: Implement counter dumpit calbackMark Zhang1-0/+213
This patch adds the ability to return all available counters together with their properties and hwstats. Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Majd Dibbiny <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-05RDMA/nldev: Allow counter auto mode configration through RDMA netlinkMark Zhang1-0/+78
Provide an option to enable/disable per-port counter auto mode through RDMA netlink. Limit it to users with ADMIN capability only. Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Majd Dibbiny <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-05RDMA/restrack: Make is_visible_in_pid_ns() as an APIMark Zhang1-13/+2
Remove is_visible_in_pid_ns() from nldev.c and make it as a restrack API, so that it can be taken advantage by other parts like counter. Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Majd Dibbiny <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-06-25RDMA/netlink: Audit policy settings for netlink attributesDoug Ledford1-13/+12
For all string attributes for which we don't currently accept the element as input, we only use it as output, set the string length to RDMA_NLDEV_ATTR_EMPTY_STRING which is defined as 1. That way we will only accept a null string for that element. This will prevent someone from writing a new input routine that uses the element without also updating the policy to have a valid value. Also while there, make sure the existing entries that are valid have the correct policy, if not, correct the policy. Remove unnecessary checks for nla_strlcpy() overflow once the policy has been set correctly. Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-06-19RDMA/netlink: Resort policy arrayDoug Ledford1-75/+78
Sort the netlink policy array by netlink attribute name. This will make it easier in the future to find the entry you are looking for when you need to make changes, or to make sure you don't add the same entry twice. Fix the whitespace while we are there. Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2019-06-18RDMA: Report available cdevs through RDMA_NLDEV_CMD_GET_CHARDEVJason Gunthorpe1-0/+1
Update the struct ib_client for all modules exporting cdevs related to the ibdevice to also implement RDMA_NLDEV_CMD_GET_CHARDEV. All cdevs are now autoloadable and discoverable by userspace over netlink instead of relying on sysfs. uverbs also exposes the DRIVER_ID for drivers that are able to support driver id binding in rdma-core. Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2019-06-18RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoloadJason Gunthorpe1-0/+94
Allow userspace to issue a netlink query against the ib_device for something like "uverbs" and get back the char dev name, inode major/minor, and interface ABI information for "uverbs0". Since we are now in netlink this can also trigger a module autoload to make the uverbs device come into existence. Largely this will let us replace searching and reading inside sysfs to setup devices, and provides an alternative (using driver_id) to device name based provider binding for things like rxe. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2019-05-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-12/+15
Pull more rdma updates from Jason Gunthorpe: "This is being sent to get a fix for the gcc 9.1 build warnings, and I've also pulled in some bug fix patches that were posted in the last two weeks. - Avoid the gcc 9.1 warning about overflowing a union member - Fix the wrong callback type for a single response netlink to doit - Bug fixes from more usage of the mlx5 devx interface" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net/mlx5: Set completion EQs as shared resources IB/mlx5: Verify DEVX general object type correctly RDMA/core: Change system parameters callback from dumpit to doit RDMA: Directly cast the sockaddr union to sockaddr
2019-05-13RDMA/core: Change system parameters callback from dumpit to doitParav Pandit1-12/+15
.dumpit() callback is used for returning same type of data in the loop, e.g. loop over ports, resources, devices. However system parameters are general and standalone for whole subsystem. It means that getting system parameters should be doit callback. Fixes: cb7e0e130503 ("RDMA/core: Add interface to read device namespace sharing mode") Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-05-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-11/+101
Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle than normal. One new driver was accepted, which is unusual, and at least one more driver remains in review on the list. Summary: - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma - Many patches from MatthewW converting radix tree and IDR users to use xarray - Introduction of tracepoints to the MAD layer - Build large SGLs at the start for DMA mapping and get the driver to split them - Generally clean SGL handling code throughout the subsystem - Support for restricting RDMA devices to net namespaces for containers - Progress to remove object allocation boilerplate code from drivers - Change in how the mlx5 driver shows representor ports linked to VFs - mlx5 uapi feature to access the on chip SW ICM memory - Add a new driver for 'EFA'. This is HW that supports user space packet processing through QPs in Amazon's cloud" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits) RDMA/ipoib: Allow user space differentiate between valid dev_port IB/core, ipoib: Do not overreact to SM LID change event RDMA/device: Don't fire uevent before device is fully initialized lib/scatterlist: Remove leftover from sg_page_iter comment RDMA/efa: Add driver to Kconfig/Makefile RDMA/efa: Add the efa module RDMA/efa: Add EFA verbs implementation RDMA/efa: Add common command handlers RDMA/efa: Implement functions that submit and complete admin commands RDMA/efa: Add the ABI definitions RDMA/efa: Add the com service API definitions RDMA/efa: Add the efa_com.h file RDMA/efa: Add the efa.h header file RDMA/efa: Add EFA device definitions RDMA: Add EFA related definitions RDMA/umem: Remove hugetlb flag RDMA/bnxt_re: Use core helpers to get aligned DMA address RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks RDMA/umem: Add API to find best driver supported page size in an MR ...
2019-04-27netlink: make validation more configurable for future strictnessJohannes Berg1-18/+18
We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-27netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek1-4/+5
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <[email protected]> Acked-by: Jiri Pirko <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-04-22RDMA/core: Add a netlink command to change net namespace of rdma deviceParav Pandit1-1/+12
Provide an option to change the net namespace of a rdma device through a netlink command. When multiple rdma devices exists in a system, and when containers are used, this will limit rdma device visibility to a specified net namespace. An example command to change net namespace of mlx5_1 device to the previously created net namespace 'foo' is: $ ip netns add foo $ rdma dev set mlx5_1 netns foo Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-04-08RDMA/nldev: Return device protocolLeon Romanovsky1-1/+23
Add new RDMA_NLDEV_ATTR_DEV_PROTOCOL attribute to give ability for UDEV rules create IB device stable names based on link type protocol. The assumption that devices like mlx4 with duality in their link type under one IB device struct won't be allowed in the future. Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-03-28RDMA/core: Add command to set ib_core device net namspace sharing modeParav Pandit1-0/+25
Add netlink command that enables/disables sharing rdma device among multiple net namespaces. Using rdma tool, $rdma sys set netns shared (default mode) When rdma subsystem netns mode is set to shared mode, rdma devices will be accessible in all net namespaces. Using rdma tool, $rdma sys set netns exclusive When rdma subsystem netns mode is set to exclusive mode, devices will be accessible in only one net namespace at any given point of time. If there are any net namespaces other than default init_net exists, while executing this command, it will fail and mode cannot be changed. To change this mode, netlink command is used instead of sysctl, because netlink command allows to auto load a module. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-03-28RDMA/core: Add interface to read device namespace sharing modeParav Pandit1-0/+32
Add an interface via netlink command to query whether rdma devices are shared among multiple net namespaces or not. When using RDMAtool, it can be queried as, $rdma system show netns netns shared Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-03-28RDMA/core: Extend ib_device_get_by_index for net namespaceParav Pandit1-9/+9
Extend ib_device_get_by_index() API to check device access for net namespace for serving netlink commands. Also enforce net ns check on dumpit commands which iterate over all registered rdma devices and which don't call ib_device_get_by_index(). Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-02-22RDMA/core: Fix a WARN() messageDan Carpenter1-3/+1
The first parameter of WARN_ONCE() is a condition, then following parameters are the message. In this case, we left out the condition so it will just print the ops->type string. Fixes: 3856ec4b93c9 ("RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Majd Dibbiny <[email protected]> Reviewed-by: Steve Wise <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>