Age | Commit message | Author | Files | Lines
2018-03-27 | IB/uverbs: UAPI pointers should use __aligned_u64 type | Matan Barak | 1 | -1/+1
The ioctl() UAPIs are meant to be used by both user-space and kernel ioctl() handlers. Mostly, these UAPI structs consist of simple types, but sometimes user-space pointers are passed between user space and the kernel. We want to avoid dereferencing a user-space pointer in the kernel, so we always define RDMA_UAPI_PTR as an __aligned_u64 type.
Fixes: 1f7ff9d5d36a ('IB/uverbs: Move to new headers and make naming consistent')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
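As a rough illustration of the commit above, the sketch below carries a user-space buffer address in a UAPI struct as a 64-bit integer rather than as a pointer. The struct and helper names are hypothetical, and `sketch_aligned_u64` is a local stand-in for the kernel's `__aligned_u64` (assumes GCC/Clang attribute syntax). In the kernel such a value would be turned into a `__user` pointer (e.g. with `u64_to_user_ptr()`) and passed to `copy_from_user()`, never dereferenced directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's __aligned_u64 (GCC/Clang syntax). */
typedef uint64_t sketch_aligned_u64 __attribute__((aligned(8)));

/* Hypothetical UAPI command that carries a user-space buffer address. */
struct sketch_cmd {
	sketch_aligned_u64 buf_addr; /* user pointer, stored as an integer */
	uint32_t buf_len;
	uint32_t reserved;
};

/* 8 + 4 + 4 bytes with 8-byte alignment: identical on 32 and 64 bit. */
static_assert(sizeof(struct sketch_cmd) == 16, "layout must not vary");

/* User space: widen the pointer through uintptr_t before storing it. */
static inline sketch_aligned_u64 sketch_ptr_to_u64(const void *p)
{
	return (sketch_aligned_u64)(uintptr_t)p;
}

/* Reverse conversion; kernel code would only use the result as a
 * __user address for copy_from_user()/copy_to_user(). */
static inline void *sketch_u64_to_ptr(sketch_aligned_u64 v)
{
	return (void *)(uintptr_t)v;
}
```

Going through `uintptr_t` keeps the conversion well-defined on both 32-bit and 64-bit user space, while the field itself always occupies 8 bytes.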
2018-03-27 | Merge branch '32compat' | Jason Gunthorpe | 21 | -280/+310
The design of the uAPI intended all structs to share the same layout on 32 and 64 bit compiles. Unfortunately, over the years some errors have crept in. This series fixes all the incompatibilities. It goes along with a userspace rdma-core series that makes the providers use these structs directly and then runs various self-checks on the command formation. Those checks were combined with pahole output from 32 and 64 bit compiles to confirm that the structure layouts are the same. This series does not make implicit padding explicit, as long as the implicit padding is the same on 32 and 64 bit compiles. Finally, the issue is put to rest by using __aligned_u64 in the uapi headers: if new code copies that type, and is checked in userspace, we are unlikely to see problems in the future. Two patches break the ABI for a 32 bit kernel, one for rxe and one for mlx4. Both patches have notes, but the overall feeling from Doug and me is that providing compat is too difficult and not necessary, since for various good reasons there is no real user of a 32 bit userspace on a 32 bit kernel. The 32 bit userspace / 64 bit kernel case, however, does seem to have some real users and does need to work as designed.
* 32compat:
  RDMA: Change all uapi headers to use __aligned_u64 instead of __u64
  RDMA/rxe: Fix uABI structure layouts for 32/64 compat
  RDMA/mlx4: Fix uABI structure layouts for 32/64 compat
  RDMA/qedr: Fix uABI structure layouts for 32/64 compat
  RDMA/ucma: Fix uABI structure layouts for 32/64 compat
  RDMA: Remove minor pahole differences between 32/64
2018-03-27 | RDMA: Change all uapi headers to use __aligned_u64 instead of __u64 | Jason Gunthorpe | 19 | -276/+276
The new auditing standard for the subsystem will be to use only __aligned_u64 in uapi headers, to try to prevent 32/64 compat bugs from appearing in the future. Changing all existing usage will help ensure new developers copy the right idea. The before/after of this patch was tested using pahole on 32 and 64 bit compiles to confirm that it does not change the structure layout, so this patch is a NOP.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
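The 32/64 difference that `__aligned_u64` removes can be shown with two hypothetical structs. On i386 the System V ABI aligns a 64-bit struct member to only 4 bytes, so a plain `uint64_t` after a `uint32_t` lands at offset 4 (struct size 12), while on x86_64 it lands at offset 8 (struct size 16). Forcing 8-byte alignment, as the kernel's `__aligned_u64` does, gives one layout everywhere. `sketch_aligned_u64` is a local stand-in and assumes GCC/Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's __aligned_u64 (GCC/Clang syntax). */
typedef uint64_t sketch_aligned_u64 __attribute__((aligned(8)));

/* Plain u64: 'value' is at offset 4 on i386 but offset 8 on x86_64,
 * so 32-bit user space and a 64-bit kernel would disagree. */
struct plain_resp {
	uint32_t handle;
	uint64_t value;
};

/* Aligned u64: the compiler inserts the same 4 bytes of padding
 * after 'handle' on every architecture. */
struct aligned_resp {
	uint32_t handle;
	sketch_aligned_u64 value;
};

static_assert(offsetof(struct aligned_resp, value) == 8, "same on 32/64");
static_assert(sizeof(struct aligned_resp) == 16, "same on 32/64");
```

Compiling this with `-m32` and `-m64` and comparing the result with pahole is essentially the check the commit message describes.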
2018-03-27 | RDMA/rxe: Fix uABI structure layouts for 32/64 compat | Jason Gunthorpe | 2 | -3/+15
With 32 bit compilation, several of the fields here become misaligned. Fixing this is an ABI break for 32 bit rxe, and it is in well-used portions of the rxe ABI. To handle this we bump the ABI version, as expected. However, the user space driver doesn't handle the ABI version properly today, so all existing user space continues to work. Updated userspace will start to require the necessary kernel version. We don't expect there to be any 32 bit users of rxe. The most likely cases, such as 32-bit ARM, already generally don't work because rxe does not handle the CPU cache properly on the pages it shares with userspace.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 | RDMA/mlx4: Fix uABI structure layouts for 32/64 compat | Jason Gunthorpe | 1 | -0/+1
rss_caps in struct mlx4_uverbs_ex_query_device_resp is misaligned on 32 bit compared to 64 bit; add explicit padding. The rss caps were introduced recently and are very rarely used in user space, mainly for DPDK. We don't expect there to be a real 32 bit user, so this change is done without compat considerations.
Fixes: 09d208b258a2 ("IB/mlx4: Add report for RSS capabilities by vendor channel")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 | RDMA/qedr: Fix uABI structure layouts for 32/64 compat | Jason Gunthorpe | 1 | -0/+4
struct qedr_alloc_ucontext_resp is a different length in 32 and 64 bit compiles due to implicit compiler padding. The structs alloc_pd_uresp, create_cq_uresp and create_qp_uresp are not padded by the compiler, but in user space the compiler pads them due to the way the core and driver structs are concatenated. Make this padding explicit and consistent for future sanity. The kernel driver can already handle the user buffer being smaller than required and copies correctly, so no compat or ABI break happens from introducing the explicit padding.
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 | RDMA/ucma: Fix uABI structure layouts for 32/64 compat | Jason Gunthorpe | 2 | -2/+12
The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles. The kernel requires it to be the expected length or longer, so 32 bit builds running on a 64 bit kernel will not work. Retain full compat by having all kernels accept a struct with or without the trailing reserved field.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
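One way to accept a response buffer "with or without the trailing reserved field" is to use the offset of that field as the minimum length and copy only as much as the caller has room for. The sketch below is a user-space model with a hypothetical struct and helper; it is not the real `rdma_ucm_event_resp` or ucma code. Without `reserved`, this struct would be 20 bytes on i386 but padded to 24 on x86_64, which is the kind of mismatch the commit describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response; the real rdma_ucm_event_resp differs. */
struct sketch_event_resp {
	uint64_t uid;
	uint32_t id;
	uint32_t event;
	uint32_t status;
	uint32_t reserved; /* explicit trailing field: 24 bytes everywhere */
};

/* Older 32-bit callers size their buffer without 'reserved'. */
#define SKETCH_RESP_MIN_LEN offsetof(struct sketch_event_resp, reserved)

/* Returns the number of bytes written to 'out', or -1 if the buffer is
 * too short to hold even the legacy layout. */
static long sketch_copy_event(void *out, size_t out_len,
			      const struct sketch_event_resp *resp)
{
	size_t n;

	assert(resp != NULL);
	if (out_len < SKETCH_RESP_MIN_LEN)
		return -1;
	/* Copy the whole struct, or only the part the caller can hold. */
	n = out_len < sizeof(*resp) ? out_len : sizeof(*resp);
	memcpy(out, resp, n);
	return (long)n;
}
```

Both a 20-byte legacy buffer and a full 24-byte buffer are accepted, so neither old nor new user space breaks.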
2018-03-27 | RDMA: Remove minor pahole differences between 32/64 | Jason Gunthorpe | 2 | -0/+3
To help automatic detection we want pahole to report the same struct layouts for 32 and 64 bit compiles. These cases are all implicit padding added at the end of embedded structs as part of a union. The added reserved fields have no impact on the ABI.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23 | RDMA/ocrdma: Fix structure layout for ocrdma_alloc_pd | Jason Gunthorpe | 1 | -2/+2
The udata structs for alloc_pd cannot contain u64s due to alignment constraints. Switch the two never-used u64s to arrays of u32 to reduce the required struct alignment to 4 bytes. These reserved fields are totally unnecessary: they are never written and never read.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
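The fix above keeps the byte count of the reserved fields but changes their type. A minimal sketch with hypothetical field names (not the real ocrdma layout) shows the effect: on x86_64 a struct containing a `uint64_t` is 8-byte aligned, while the same 24 bytes expressed as `uint32_t` arrays only require 4-byte alignment, so the struct can follow a 4-byte core response without hidden padding.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Before (hypothetical): the reserved u64s force 8-byte alignment. */
struct pd_resp_u64 {
	uint32_t id;
	uint32_t dpp_enabled;
	uint64_t rsvd0;
	uint64_t rsvd1;
};

/* After (hypothetical): same size, but only 4-byte alignment. */
struct pd_resp_u32 {
	uint32_t id;
	uint32_t dpp_enabled;
	uint32_t rsvd0[2];
	uint32_t rsvd1[2];
};

static_assert(alignof(struct pd_resp_u32) == 4, "4-byte aligned");
static_assert(sizeof(struct pd_resp_u32) == sizeof(struct pd_resp_u64),
	      "wire size unchanged");
```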
2018-03-23 | iw_cxgb4: Add ib_device->get_netdev support | Steve Wise | 1 | -0/+19
This is useful to rdma ULPs.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23 | IB/cma: Resolve route only while receiving CM requests | Parav Pandit | 4 | -0/+19
Currently, a CM request for RoCE follows this flow:
rdma_create_id()
rdma_resolve_addr()
rdma_resolve_route()
For RC QPs:
rdma_connect()
  ->cma_connect_ib()
    ->ib_send_cm_req()
      ->cm_init_av_by_path()
        ->ib_init_ah_attr_from_path()
For UD QPs:
rdma_connect()
  ->cma_resolve_ib_udp()
    ->ib_send_cm_sidr_req()
      ->cm_init_av_by_path()
        ->ib_init_ah_attr_from_path()
In both flows, the route is already resolved before the CM request is sent. Therefore, the code is refactored to avoid resolving the route a second time in the ib_cm layer. ib_init_ah_attr_from_path() is extended to resolve the route only when it is not yet resolved for the RoCE link layer. This is achieved by the caller setting the route_resolved field in the path record whenever it has already resolved the route.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 | IB/core: Refer to RoCE port property instead of GID table property | Parav Pandit | 1 | -1/+1
ib_query_gid() in commit [1] refers to the RoCE GID table capability of the HCA using rdma_cap_roce_gid_table(). ib_core maintains the GID table cache regardless of the HCA provider driver's capability to maintain a RoCE GID table. Therefore, whether to return a GID table entry from the software cache or from the HCA should be decided based on whether or not the port is RoCE.
[1] commit 03db3a2d81e6 ("IB/core: Add RoCE GID table management")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 | RDMA/restrack: Remove ambiguity in resource track clean logic | Leon Romanovsky | 2 | -2/+45
The restrack clean routine had a simple but powerful WARN_ON check to see if all resources were cleared prior to releasing the device. The WARN_ON check performed very well, but the lack of information about which device caused the resource leak, and about the object type and origin, made debugging fun and challenging at the same time. The fact that all dumps were the same, because restrack_clean() is called in dealloc(), didn't help either. So let's fix the spelling error and convert WARN_ON to be more debug friendly. The dmesg cut below gives an example of how the output will look for the case fixed in patch [1]:
[ 438.421372] restrack: ------------[ cut here ]------------
[ 438.423448] restrack: BUG: RESTRACK detected leak of resources on mlx5_2
[ 438.425600] restrack: Kernel PD object allocated by mlx5_ib is not freed
[ 438.427753] restrack: Kernel CQ object allocated by mlx5_ib is not freed
[ 438.429660] restrack: ------------[ cut here ]------------
[1] https://patchwork.kernel.org/patch/10298695/
Cc: Michal Kalderon <Michal.Kalderon@cavium.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 | RDMA/hns: Fix cq record doorbell enable in kernel | Yixian Liu | 1 | -21/+17
Upon detecting that both kernel and user space support record doorbell, the kernel needs to enable this capability in hardware through db_en, and this should happen before the cq context is configured in hns_roce_cq_alloc. Currently, db_en is configured after cq alloc, and db_map_user has a similar problem.
Reported-by: Xiping Zhang <zhangxiping3@huawei.com>
Fixes: 9b44703d0a21 ("RDMA/hns: Support cq record doorbell for the user space")
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 | RDMA/cxgb3: Use structs to describe the uABI instead of opencoding | Jason Gunthorpe | 2 | -1/+8
Open coding a loose value is not acceptable for describing the uABI in RDMA. Provide the missing struct.
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 | IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs | Sinan Kaya | 1 | -2/+2
The code includes wmb() followed by writel(). writel() already has a barrier on some architectures, such as arm64. As a result, the CPU observes two barriers back to back before executing the register write. Since the code already has an explicit barrier call, change writel() to writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
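The "one barrier is enough" reasoning above can be illustrated with a loose user-space analogy. C11 atomics stand in for wmb() and the doorbell register here; they do not model device memory or MMIO ordering, so this only shows the shape of the change, not kernel semantics. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 16
static_assert((SKETCH_RING_SIZE & (SKETCH_RING_SIZE - 1)) == 0,
	      "ring size must be a power of two");

static uint32_t sketch_ring[SKETCH_RING_SIZE]; /* descriptor memory */
static _Atomic uint32_t sketch_doorbell;       /* stands in for the register */

/* Redundant ordering: an explicit fence (like wmb()) followed by a store
 * that orders again (like writel(), which has its own barrier on arm64). */
static void ring_doorbell_double(uint32_t idx, uint32_t desc)
{
	sketch_ring[idx & (SKETCH_RING_SIZE - 1)] = desc;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&sketch_doorbell, idx, memory_order_release);
}

/* Single barrier: the fence already orders the descriptor write before
 * the doorbell, so the doorbell store can be relaxed (like
 * writel_relaxed()). */
static void ring_doorbell_single(uint32_t idx, uint32_t desc)
{
	sketch_ring[idx & (SKETCH_RING_SIZE - 1)] = desc;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&sketch_doorbell, idx, memory_order_relaxed);
}
```

Both variants publish the descriptor before the doorbell; the second simply avoids paying for ordering twice.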
2018-03-19 | IB/uverbs: Enable ioctl() uAPI by default for new verbs | Matan Barak | 3 | -10/+7
Enable the ioctl() uAPI for IB by default if the standard write() uAPI (INFINIBAND_USER_ACCESS) is enabled. Verbs that are also available under the old write() uAPI are put inside a new INFINIBAND_EXP_LEGACY_VERBS_NEW_UAPI Kconfig.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | IB/uverbs: Add macros to simplify adding driver specific attributes | Matan Barak | 1 | -0/+31
Previously, adding driver specific attributes required drivers to declare the whole hierarchy: object tree, object, methods and the attributes themselves. A common use case is adding a few attributes to an existing common method. In order to simplify the driver's code, we add some macros to do all these declarations automatically.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file | Matan Barak | 5 | -177/+217
Currently, all objects are declared in uverbs_std_types. This could lead to a huge file once we implement all objects, methods and handlers. Move each object to its own file to keep the files smaller and more readable. uverbs_std_types.c will only contain the parsing tree definition and objects without any methods.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | IB/uverbs: Expose parsing tree of all common objects to providers | Matan Barak | 3 | -35/+37
The ioctl() based uverbs is built on merging feature trees. This teaches the generic parser how to parse methods according to the provider's support. To support merging with the common objects, export the common object tree to the provider drivers.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | IB/uverbs: Safely extend existing attributes | Matan Barak | 5 | -31/+104
Previously, we used UVERBS_ATTR_SPEC_F_MIN_SZ for extending existing attributes. The behavior of this flag was that the kernel accepts anything bigger than the minimum size it specified. This is unsafe, since to safely extend an attribute we need to make sure that the bytes beyond the size the kernel knows about are zero. Replace UVERBS_ATTR_SPEC_F_MIN_SZ with UVERBS_ATTR_SPEC_F_MIN_SZ_OR_ZERO, which checks that those unknown bytes are zero. In addition, attributes are now decorated with UVERBS_ATTR_TYPE and UVERBS_ATTR_STRUCT, so we can provide the minimum and known length. Users of this flag need to use the copy_from_or_zero functions/macros.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
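The "minimum size or zero" rule above can be written as a small copy routine. The sketch below is a user-space model with a hypothetical name and signature; it is not the uverbs copy_from_or_zero implementation. (Later kernels generalised the same idea for system calls as `copy_struct_from_user()`.) Shorter inputs are zero-extended, and longer inputs are accepted only when every byte the kernel does not understand is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a user-supplied attribute of 'usize' bytes into a kernel struct
 * of 'ksize' bytes (hypothetical user-space model):
 *   usize < min_size  -> reject
 *   usize <= ksize    -> copy and zero-fill the remainder
 *   usize > ksize     -> accept only if the unknown tail is all zero
 */
static int sketch_copy_from_or_zero(void *dst, size_t ksize, size_t min_size,
				    const void *src, size_t usize)
{
	size_t i;

	assert(min_size <= ksize);
	if (usize < min_size)
		return -1;
	if (usize > ksize) {
		const unsigned char *tail = (const unsigned char *)src + ksize;

		for (i = 0; i < usize - ksize; i++)
			if (tail[i] != 0)
				return -1; /* a field this kernel doesn't know */
		memcpy(dst, src, ksize);
		return 0;
	}
	memcpy(dst, src, usize);
	memset((unsigned char *)dst + usize, 0, ksize - usize);
	return 0;
}
```

Rejecting a non-zero unknown tail is what makes the extension safe: newer user space that actually sets a new field gets an error from an older kernel instead of having the request silently truncated.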
2018-03-19 | IB/uverbs: Enable compact representation of uverbs_attr_spec | Matan Barak | 2 | -14/+24
Downstream patches extend uverbs_attr_spec with new fields. In order to save space, we move the type and flags fields to the various attribute flavors contained in the union.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | IB/uverbs: Extend uverbs_ioctl header with driver_id | Matan Barak | 21 | -6/+48
Extend the uverbs_ioctl header with driver_id and another reserved field. driver_id should be used to identify the driver. Since every driver could have its own parsing tree, this is necessary for strace support. Downstream patches take the EXPERIMENTAL flag off the ioctl() IB support, so we add some reserved fields for future use.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | IB/uverbs: Move to new headers and make naming consistent | Matan Barak | 10 | -247/+396
Use macros to make names consistent in the ioctl() uAPI. The ioctl() uAPI works with an object-method hierarchy. The method part also states which handler should be executed when this method is called from user space. Therefore, we need to tie together the method, the method's id, the method's handler and the object owning this method. Previously, this was done through explicit, developer-chosen names, which makes grepping the code harder. Change the method's name, the method's handler and the object's name to be generated automatically from the ids. The headers are split so that they can be included and used by user space. One header strictly contains structures that are used directly by user-space applications, while another header is used by the internal library (i.e. libibverbs) to form the ioctl() commands. A third header simply contains the required general command structure.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
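Generating names from ids, as described above, typically relies on the preprocessor's token pasting (`##`) and stringification (`#`). The sketch below uses hypothetical macro and enum names; the real uverbs macros are different. Every symbol tied to a method is derived from its id, so grepping for the id finds the handler and the method descriptor together.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical method id; the real uverbs ids and macros differ. */
enum { SKETCH_METHOD_CQ_CREATE = 1 };

struct sketch_method {
	int method_id;
	const char *name;
	int (*handler)(void);
};

/* Derive the handler and descriptor names from the id. */
#define SKETCH_HANDLER(id) sketch_handler_##id
#define SKETCH_METHOD(id) sketch_method_##id
#define SKETCH_DECLARE_METHOD(id)                                  \
	static int SKETCH_HANDLER(id)(void);                       \
	static const struct sketch_method SKETCH_METHOD(id) = {    \
		.method_id = id, .name = #id,                      \
		.handler = SKETCH_HANDLER(id) }

/* Expands to sketch_handler_SKETCH_METHOD_CQ_CREATE and
 * sketch_method_SKETCH_METHOD_CQ_CREATE. */
SKETCH_DECLARE_METHOD(SKETCH_METHOD_CQ_CREATE);

static int SKETCH_HANDLER(SKETCH_METHOD_CQ_CREATE)(void)
{
	return 0; /* a real handler would create the CQ here */
}
```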
2018-03-19 | IB/srp: Disallow duplicate RDMA/CM connections | Bart Van Assche | 1 | -3/+0
According to the SRP standard, the INITIATOR and TARGET PORT IDENTIFIER fields from the login request specify the I_T nexus. Whether or not an SRP target closes an existing connection for an I_T nexus when a login request is received depends on the value of the MULTICHANNEL field in the login request. The SRP initiator derives the values of the INITIATOR and TARGET PORT IDENTIFIER fields from the .id_ext, .ioc_guid, .initiator_ext and .sgid members of the srp_target_port structure. This means that the .rdma_cm.dst check must be removed from srp_conn_unique(). For target ports that have multiple addresses, e.g. an IPv4 and an IPv6 address, this patch prevents the initiator from logging in alternately to each target port address every 10 seconds when a connection has been established to both addresses. An SRP target must terminate all but one connection for a given I_T nexus if the MULTICHANNEL field has not been set in the login request.
Fixes: 19f313438c77 ("IB/srp: Add RDMA/CM support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
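The uniqueness rule above compares only the fields the login identifiers are derived from and ignores the destination address. A minimal sketch with a hypothetical struct that models a subset of `srp_target_port` (it is not the real srp_conn_unique()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the fields that name the I_T nexus. */
struct sketch_target {
	uint64_t id_ext;
	uint64_t ioc_guid;
	uint64_t initiator_ext;
	uint8_t sgid[16];
	char dst_addr[48]; /* e.g. an IPv4 or IPv6 address string */
};

/* Two target ports share an I_T nexus when the login identifiers match;
 * the destination address is deliberately not compared. */
static bool sketch_same_nexus(const struct sketch_target *a,
			      const struct sketch_target *b)
{
	assert(a != NULL && b != NULL);
	return a->id_ext == b->id_ext &&
	       a->ioc_guid == b->ioc_guid &&
	       a->initiator_ext == b->initiator_ext &&
	       memcmp(a->sgid, b->sgid, sizeof(a->sgid)) == 0;
}
```

With this rule, an IPv4 and an IPv6 address of the same target port count as one nexus, so a second connection is treated as a duplicate rather than a new, competing login.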
2018-03-19 | IB/mlx5: Packet packing enhancement for RAW QP | Bodong Wang | 4 | -21/+98
Enable RAW QP to configure burst control through modify_qp. By combining burst control with rate limiting, the user can achieve the best performance and accuracy. The burst control information is passed by the user through udata. This patch also reports the burst control capability for mlx5 hardware; burst control is only marked as supported when both packet_pacing_burst_bound and packet_pacing_typical_size are supported.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | net/mlx5: Packet pacing enhancement | Bodong Wang | 4 | -33/+76
Add two new parameters, max_burst_sz and typical_pkt_size (both in bytes), to rate limit configurations.
max_burst_sz: The device will schedule bursts of packets for an SQ connected to this rate, smaller than or equal to this value. A value of 0x0 indicates that packet bursts will be limited to the device defaults. This field should be used if bursts of packets must be strictly kept under a certain value.
typical_pkt_size: When the rate limit is intended for a stream of similar packets, stating the typical packet size can improve the accuracy of the rate limiter. The expected packet size will be the same for all SQs associated with the same rate limit index.
The Ethernet driver is updated according to this change, but these two parameters will be kept as 0 due to the lack of a proper way to get the configuration from user space, which would require changing the ndo_set_tx_maxrate interface.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | RDMA/hns: Fix init resp when alloc ucontext | Yixian Liu | 1 | -1/+1
The data in resp will be copied from the kernel to userspace, so it needs to be initialized to zeros to avoid copying uninitialized stack memory.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e088a685eae9 ("RDMA/hns: Support rq record doorbell for the user space")
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
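The leak being fixed comes from fields and compiler padding that are never written before the struct is copied out. A minimal user-space sketch with a hypothetical struct (not the real hns response): clearing the whole object with memset() before filling it means unused fields and padding bytes are zero instead of stale stack contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response with 3 bytes of implicit padding after
 * 'cap_flags' that no field assignment ever touches. */
struct sketch_ucontext_resp {
	uint32_t qp_tab_size;
	uint8_t cap_flags;
	uint32_t reserved;
};

/* Fill a response destined for user space. memset() clears every byte,
 * including padding, before the known fields are set. */
static void sketch_fill_resp(struct sketch_ucontext_resp *resp,
			     uint32_t qp_tab_size)
{
	assert(resp != NULL);
	memset(resp, 0, sizeof(*resp));
	resp->qp_tab_size = qp_tab_size;
}
```

In the kernel the filled struct would then be handed to the udata copy helper; zeroing it first is the usual defence against infoleaks through padding.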
2018-03-19 | IB/core: Remove unimplemented ib_peek_cq | Parav Pandit | 1 | -12/+0
The ib_peek_cq() verb doesn't seem to be implemented in the current code. A past reference to it [1] notes that it is unimplemented. A lot of user documentation generated from kdoc refers to this unimplemented API. Therefore, remove it.
[1] http://lists.openfabrics.org/pipermail/ofw/2008-May/002465.html
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | IB/core: Use rdma_is_port_valid() | Parav Pandit | 1 | -3/+2
Use rdma_is_port_valid(), which performs the port validity check, instead of open coding the same check.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | RDMA/bnxt: Fix structure layout for bnxt_re_pd_resp | Jason Gunthorpe | 1 | -1/+6
What is going on here is a bit subtle. In the kernel there is no problem, because the struct is copied using copy_from_user, so it can safely have 8 byte alignment. In userspace, however, it must be constructed by concatenation with the ib_uverbs_alloc_pd_resp struct, because of the memory layout required to execute the command. Since ib_uverbs_alloc_pd_resp is only 4 bytes long, this causes misalignment, and user space would see unexpected padding. Currently it works around this with pointer maths. Make everything more robust by having the compiler reduce the alignment of the struct to 4. The userspace has assertions to ensure this works properly in all situations.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
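The concatenation problem above can be reproduced with hypothetical stand-in structs (not the real bnxt or uverbs definitions). On x86_64 a driver struct containing a `uint64_t` is 8-byte aligned, so placing it after a 4-byte core response inserts 4 bytes of hidden padding. One way to ask GCC/Clang to reduce the alignment to 4, shown here as an assumption rather than the exact upstream change, is `__attribute__((packed, aligned(4)))`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4-byte core response, standing in for ib_uverbs_alloc_pd_resp. */
struct sketch_core_resp {
	uint32_t pd_handle;
};

/* Driver response with natural alignment (8 on x86_64). */
struct sketch_drv_resp_natural {
	uint32_t pdid;
	uint32_t dpi;
	uint64_t dbr;
};

/* Same fields, with alignment reduced to 4 (GCC/Clang attributes). */
struct sketch_drv_resp_packed {
	uint32_t pdid;
	uint32_t dpi;
	uint64_t dbr;
} __attribute__((packed, aligned(4)));

/* User space lays the core part and the driver part out back to back. */
struct sketch_cmd_natural {
	struct sketch_core_resp core;
	struct sketch_drv_resp_natural drv; /* hidden pad on x86_64 */
};

struct sketch_cmd_packed {
	struct sketch_core_resp core;
	struct sketch_drv_resp_packed drv; /* starts right after 'core' */
};

static_assert(offsetof(struct sketch_cmd_packed, drv) == 4, "no hidden pad");
static_assert(sizeof(struct sketch_drv_resp_packed) == 16, "size kept");
```

Assertions like these, compiled for both 32 and 64 bit, are the kind of userspace check the commit message refers to.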
2018-03-19 | IB/mlx5: Set the default active rate and width to QDR and 4X | Honggang Li | 1 | -0/+3
Before commit f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed in RoCE"), the mlx5_ib driver set the default active_width and active_speed to IB_WIDTH_4X and IB_SPEED_QDR. When the RoCE port is down, the RoCE port does not negotiate the active width with the remote side, causing the active width to be zero. When running userspace ibstat to view the port status, ibstat will panic as it reads an invalid width from the sys file. This patch restores the original behavior.
Fixes: f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed in RoCE")
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 | IB/core: Set speed string to SDR for invalid active rates | Honggang Li | 1 | -0/+1
Before commit f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed in RoCE"), the mlx5_ib driver set the default active_width and active_speed to IB_WIDTH_4X and IB_SPEED_QDR. Now, active_width and active_speed are zero if the RoCE port is in the DOWN state. The speed string should be set to " SDR" instead of a blank string when active_speed is zero.
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-16 | RDMA/restrack: Don't rely on uninitialized variable in restrack_add flow | Leon Romanovsky | 1 | -1/+3
The restrack code relies on object structures being zeroed at the allocation stage. The mlx4 CQ wasn't allocated with kzalloc, which caused the following crash:
[ 137.392209] general protection fault: 0000 [#1] SMP KASAN PTI
[ 137.392972] CPU: 0 PID: 622 Comm: ibv_rc_pingpong Tainted: G W 4.16.0-rc1-00099-g00313983cda6 #11
[ 137.395079] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 137.396866] RIP: 0010:rdma_restrack_del+0xc8/0xf0
[ 137.397762] RSP: 0018:ffff8801b54e7968 EFLAGS: 00010206
[ 137.399008] RAX: 0000000000000000 RBX: ffff8801d8bcbae8 RCX: ffffffffb82314df
[ 137.400055] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: 70696b533d454741
[ 137.401103] RBP: ffff8801d90c07a0 R08: ffff8801d8bcbb00 R09: 0000000000000000
[ 137.402470] R10: 0000000000000001 R11: ffffed0036a9cf52 R12: ffff8801d90c0ad0
[ 137.403318] R13: ffff8801d853fb20 R14: ffff8801d8bcbb28 R15: 0000000000000014
[ 137.404736] FS: 00007fb415d43740(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000
[ 137.406074] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 137.407101] CR2: 00007fb41557df20 CR3: 00000001b580c001 CR4: 00000000003606b0
[ 137.408308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 137.409352] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 137.410385] Call Trace:
[ 137.411058] ib_destroy_cq+0x23/0x60
[ 137.411460] uverbs_free_cq+0x37/0xa0
[ 137.412040] remove_commit_idr_uobject+0x38/0xf0
[ 137.413042] _rdma_remove_commit_uobject+0x5c/0x160
[ 137.413782] ? lookup_get_idr_uobject+0x39/0x50
[ 137.414737] rdma_remove_commit_uobject+0x3b/0x70
[ 137.415742] ib_uverbs_destroy_cq+0x114/0x1d0
[ 137.416260] ? ib_uverbs_req_notify_cq+0x160/0x160
[ 137.417073] ? kernel_text_address+0x5c/0x90
[ 137.417805] ? __kernel_text_address+0xe/0x30
[ 137.418766] ? unwind_get_return_address+0x2f/0x50
[ 137.419558] ib_uverbs_write+0x453/0x6a0
[ 137.420220] ? show_ibdev+0x90/0x90
[ 137.420653] ? __kasan_slab_free+0x136/0x180
[ 137.421155] ? kmem_cache_free+0x78/0x1e0
[ 137.422192] ? remove_vma+0x83/0x90
[ 137.422614] ? do_munmap+0x447/0x6c0
[ 137.423045] ? vm_munmap+0xb0/0x100
[ 137.423481] ? SyS_munmap+0x1d/0x30
[ 137.424120] ? do_syscall_64+0xeb/0x250
[ 137.424984] ? entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 137.425611] ? lru_add_drain_all+0x270/0x270
[ 137.426116] ? lru_add_drain_cpu+0xa3/0x170
[ 137.426616] ? lru_add_drain+0x11/0x20
[ 137.427058] ? free_pages_and_swap_cache+0xa6/0x120
[ 137.427672] ? tlb_flush_mmu_free+0x78/0x90
[ 137.428168] ? arch_tlb_finish_mmu+0x6d/0xb0
[ 137.428680] __vfs_write+0xc4/0x350
[ 137.430917] ? kernel_read+0xa0/0xa0
[ 137.432758] ? remove_vma+0x90/0x90
[ 137.434781] ? __kasan_slab_free+0x14b/0x180
[ 137.437486] ? remove_vma+0x83/0x90
[ 137.439836] ? kmem_cache_free+0x78/0x1e0
[ 137.442195] ? percpu_counter_add_batch+0x1d/0x90
[ 137.444389] vfs_write+0xf7/0x280
[ 137.446030] SyS_write+0xa1/0x120
[ 137.447867] ? SyS_read+0x120/0x120
[ 137.449670] ? mm_fault_error+0x180/0x180
[ 137.451539] ? _cond_resched+0x16/0x50
[ 137.453697] ? SyS_read+0x120/0x120
[ 137.455883] do_syscall_64+0xeb/0x250
[ 137.457686] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 137.459595] RIP: 0033:0x7fb415637b94
[ 137.461315] RSP: 002b:00007ffdebea7d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 137.463879] RAX: ffffffffffffffda RBX: 00005565022d1bd0 RCX: 00007fb415637b94
[ 137.466519] RDX: 0000000000000018 RSI: 00007ffdebea7da0 RDI: 0000000000000003
[ 137.469543] RBP: 00007ffdebea7d98 R08: 0000000000000000 R09: 00005565022d40c0
[ 137.472479] R10: 00000000000009cf R11: 0000000000000246 R12: 00005565022d2520
[ 137.475125] R13: 00000000000003e8 R14: 0000000000000000 R15: 00007ffdebea7fd0
[ 137.477760] Code: f7 e8 dd 0d 0b ff 48 c7 43 40 00 00 00 00 48 89 df e8 0d 0b 0b ff 48 8d 7b 28 c6 03 00 e8 41 0d 0b ff 48 8b 7b 28 48 85 ff 74 06 <f0> ff 4f 48 74 10 5b 48 89 ef 5d 41 5c 41 5d 41 5e e9 32 b0 ee
[ 137.483375] RIP: rdma_restrack_del+0xc8/0xf0 RSP: ffff8801b54e7968
[ 137.486436] ---[ end trace 81835a1ea6722eed ]---
[ 137.488566] Kernel panic - not syncing: Fatal exception
[ 137.491162] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | IB/mlx4: Add Scatter FCS support over WQ creation | Guy Levi | 3 | -9/+32
By default, for Ethernet packets, the device scatters only the payload of ingress packets. The scatter FCS feature lets the user get the FCS (Ethernet's frame check sequence) in the received WR's buffer as a 4-byte trailer following the packet's payload.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | IB/mlx4: Report TSO capabilities | Yishai Hadas | 2 | -2/+29
Report the device's TSO capabilities to user space, including the max_tso size and the QP types that support it. TSO is applicable only when one of the ports is ETH and the device supports it. The uresp logic around rss_caps is updated to fix a so-far harmless bug in computing the length of the structure to copy: the code did not handle the implicit padding before rss_caps correctly. This is necessary to copy tso_caps successfully.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
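The length bug described above is easy to reproduce. If the copy length is computed by adding up field sizes, any implicit padding before an 8-byte-aligned member is missed; measuring up to the end of the member with an `offsetofend()`-style macro (modelled on the kernel helper of that name) includes it. The struct below is hypothetical, not the real mlx4 response.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modelled on the kernel's offsetofend() helper. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

struct sketch_rss_caps {
	uint64_t rx_hash_fields_mask;
	uint8_t rx_hash_function;
	uint8_t reserved[7];
};

/* Hypothetical response: on x86_64 the compiler inserts 4 bytes of
 * padding after 'max_inl_recv_sz' so that rss_caps is 8-byte aligned. */
struct sketch_query_resp {
	uint32_t comp_mask;
	uint32_t response_length;
	uint64_t hca_core_clock_offset;
	uint32_t max_inl_recv_sz;
	struct sketch_rss_caps rss_caps;
};

/* Bytes to copy when reporting everything up to and including rss_caps.
 * Summing field sizes would give 36 and cut rss_caps short on x86_64. */
static size_t sketch_resp_len_with_rss(void)
{
	return offsetofend(struct sketch_query_resp, rss_caps);
}
```

On x86_64 this returns 40, while the naive sum of field sizes is 36. The mlx4 "Fix uABI structure layouts" commit listed earlier avoids the ambiguity altogether by making such padding explicit.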
2018-03-15 | i40iw: Tear-down connection after CQP Modify QP failure | Henry Orosco | 4 | -10/+30
There is no explicit tear-down sequence initiated on connections if the Control QP OP, Modify QP to close, fails. Fix this by triggering a driver-generated Asynchronous Event (AE) on Modify QP failures and tearing down the connection on receipt of the AE. This fix can be generalized to other Modify QP failures (e.g. RTS->TERM, IDLE->RTS), as any modify failure will require a connection tear-down.
Fixes: d37498417947 ("i40iw: add files for iwarp interface")
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | i40iw: Refactor of driver generated AEs | Henry Orosco | 6 | -10/+108
The flush CQP OP can be used to optionally generate Asynchronous Events (AEs) in addition to the QP flush. Consolidate all HW AE generation code under a new function, i40iw_gen_ae, which uses the flush CQP OP to generate AEs only.
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | RDMA/cxgb4: Use structs to describe the uABI instead of opencoding | Jason Gunthorpe | 2 | -1/+8
Open coding a loose value is not acceptable for describing the uABI in RDMA. Provide the missing struct.
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | RDMA/hns: Use structs to describe the uABI instead of opencoding | Jason Gunthorpe | 2 | -1/+9
Open coding a loose value is not acceptable for describing the uABI in RDMA. Provide the missing struct.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | RDMA/i40iw: Move uapi header to include/uapi | Jason Gunthorpe | 3 | -3/+4
All of these defines are part of the uABI for the driver; this header duplicates providers/i40iw/i40iw-abi.h in rdma-core.
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | RDMA/mlx4: Move flag constants to uapi header | Jason Gunthorpe | 6 | -9/+11
MLX4_USER_DEV_CAP_LARGE_CQE (via mlx4_ib_alloc_ucontext_resp.dev_caps) and MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET (via mlx4_uverbs_ex_query_device_resp.comp_mask) are copied directly to userspace and form part of the uAPI. Move them to the uapi header where they belong.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | RDMA/rxe: Use structs to describe the uABI instead of opencoding | Jason Gunthorpe | 8 | -79/+116
Open coding pointer math is not acceptable for describing the uABI in RDMA. Provide structs for all the cases. The udata is cast to the struct as close to the verbs entry point as possible for maximum clarity. Function signatures and so forth are revised to allow for this.
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | RDMA/rxe: Get rid of confusing udata parameter to rxe_cq_chk_attr | Jason Gunthorpe | 3 | -4/+4
It isn't used and it couldn't possibly ever be used correctly.
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | RDMAVT: Fix synchronization around percpu_ref | Tejun Heo | 1 | -4/+6
rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple of issues.
* It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free.
* lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this.
To address the above issues, this patch removes the fast exit and adds an explicit synchronize_rcu().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs | Sinan Kaya | 1 file | -2/+2
The code issues wmb() followed by writel() in multiple places. On some architectures, such as arm64, writel() already includes a barrier, so the CPU ends up executing two barriers back to back before the register write. Since the code already has an explicit barrier call, change writel() to writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
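The barrier count before and after this change can be modelled in userspace. The model assumes an arm64-style writel() that is a write barrier followed by writel_relaxed(). `wmb_mock()`, `writel_mock()`, `writel_relaxed_mock()` and `fake_doorbell` are hypothetical stand-ins that only count barriers. They do not issue real barriers or touch MMIO.

```c
#include <stdint.h>

/* Hypothetical stand-ins: these count barriers, they do not order anything. */
static unsigned int barriers_issued;
static volatile uint32_t fake_doorbell;

static void wmb_mock(void)
{
	barriers_issued++;
}

static void writel_relaxed_mock(uint32_t val, volatile uint32_t *addr)
{
	*addr = val; /* plain store, no implied barrier */
}

/* Models an arm64-style writel(): a barrier, then the relaxed write. */
static void writel_mock(uint32_t val, volatile uint32_t *addr)
{
	wmb_mock();
	writel_relaxed_mock(val, addr);
}

/* Before: explicit wmb() plus writel() issues two barriers back to back. */
static void ring_doorbell_before(uint32_t val)
{
	wmb_mock();
	writel_mock(val, &fake_doorbell);
}

/* After: the explicit wmb() is kept; writel_relaxed() adds no second one. */
static void ring_doorbell_after(uint32_t val)
{
	wmb_mock();
	writel_relaxed_mock(val, &fake_doorbell);
}
```

The explicit wmb() still orders the preceding memory writes before the doorbell write. Only the redundant second barrier goes away.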
2018-03-15 | RDMA/hns: Fix cqn type and init resp | Yixian Liu | 3 files | -14/+12
This patch changes the type of cqn from u32 to u64 to keep the userspace and kernel definitions consistent, zero-initializes resp for both the CQ and the QP, and changes the condition checked on outlen to allow for future caps extensions.
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Fixes: e088a685eae9 (hns: Support rq record doorbell for the user space)
Fixes: 9b44703d0a21 (hns: Support cq record doorbell for the user space)
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
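A userspace sketch of the three changes follows: a 64-bit cqn, a zero-initialized response, and an outlen check. The names `struct cq_resp_sketch`, `cap_flags`, `sketch_copy_to_user()` and `sketch_create_cq_reply()` are hypothetical. The exact outlen condition is an assumption: here any user buffer at least as large as the response is accepted (`>=` rather than `==`), and a smaller buffer simply gets no response. This is not necessarily the hns driver's exact logic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical analogue of the kernel's __aligned_u64. */
typedef uint64_t sketch_aligned_u64 __attribute__((aligned(8)));

/* Hypothetical CQ response: cqn is 64 bits so kernel and userspace agree. */
struct cq_resp_sketch {
	sketch_aligned_u64 cqn;
	sketch_aligned_u64 cap_flags; /* hypothetical field, left zero here */
};

/* Hypothetical stand-in for copying the response into the user buffer. */
static int sketch_copy_to_user(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

static int sketch_create_cq_reply(void *outbuf, size_t outlen, uint64_t cqn)
{
	struct cq_resp_sketch resp;

	/* Zero the whole struct so no uninitialized stack bytes reach userspace. */
	memset(&resp, 0, sizeof(resp));
	resp.cqn = cqn;

	/* Assumed check: accept any buffer at least as large as resp. */
	if (outlen >= sizeof(resp))
		return sketch_copy_to_user(outbuf, &resp, sizeof(resp));

	/* Assumed behavior: a smaller (older) buffer gets no response. */
	return 0;
}
```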
2018-03-15 | IB/core: Move rdma_addr_find_l2_eth_by_grh to core_priv.h | Parav Pandit | 2 files | -5/+5
Before commit [1], rdma_addr_find_l2_eth_by_grh() was an exported function, so declaring it in include/rdma/ib_addr.h was fine. Now that its scope is limited to the ib_core module, it's better to declare it in core_priv.h.
[1] commit 1060f8653414 ("IB/{core/cm}: Fix generating a return AH for RoCEE")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | IB/cm: Introduce and use helper function to get cm_port from path | Parav Pandit | 1 file | -4/+13
Introduce and use a helper function, get_cm_port_from_path(), to get the cm_port based on the path record entry.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 | IB/core: Refactor ib_init_ah_attr_from_path() for RoCE | Parav Pandit | 2 files | -96/+108
Resolving the route of a RoCE path record is needed only for received CM requests. Therefore:
(a) ib_init_ah_attr_from_path() is first refactored to isolate the route-resolution code.
(b) Setting the dlid and path bits is not needed for RoCE.
Additionally, since AH attribute initialization is done from the path record entry, it is better to check the path record entry type for the different link layers, rather than the AH attribute type, while initializing the AH attribute itself.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
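The refactored control flow can be sketched with simplified types. All names here are hypothetical stand-ins, not the kernel's sa_path_rec or rdma_ah_attr: `struct sketch_path_rec`, `struct sketch_ah_attr`, `sketch_resolve_roce_route()` and `sketch_init_ah_attr_from_path()`. The sketch assumes the route is "not yet resolved" exactly when the path came from a received CM request. It shows three things: branching on the path record type, skipping dlid and path bits for RoCE, and calling the isolated route resolution only when needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the path record and AH attribute. */
enum sketch_rec_type { SKETCH_REC_IB, SKETCH_REC_ROCE };

struct sketch_path_rec {
	enum sketch_rec_type rec_type;
	uint16_t dlid;
	uint8_t path_bits;
	bool roce_route_resolved; /* assumed false for received CM requests */
};

struct sketch_ah_attr {
	uint16_t dlid;
	uint8_t src_path_bits;
};

static unsigned int route_resolutions; /* counts resolutions, for illustration */

/* Isolated route-resolution step (hypothetical helper). */
static int sketch_resolve_roce_route(struct sketch_path_rec *rec)
{
	route_resolutions++;
	rec->roce_route_resolved = true;
	return 0;
}

static int sketch_init_ah_attr_from_path(struct sketch_path_rec *rec,
					 struct sketch_ah_attr *ah)
{
	*ah = (struct sketch_ah_attr){ 0 };

	/* Branch on the path record type, not on the AH attribute type. */
	if (rec->rec_type == SKETCH_REC_ROCE) {
		/* dlid and path bits are not needed for RoCE */
		if (!rec->roce_route_resolved)
			return sketch_resolve_roce_route(rec);
		return 0;
	}

	ah->dlid = rec->dlid;
	ah->src_path_bits = rec->path_bits;
	return 0;
}
```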