aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-06-10Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', ↵Roland Dreier74-662/+3781
'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next
2014-06-10RDMA/cxgb4: Add support for iWARP Port Mapper user space serviceSteve Wise3-43/+262
Based on original work by Vipul Pandya. Signed-off-by: Steve Wise <[email protected]> [ Fix htons -> ntohs to make sparse happy. - Roland ] Signed-off-by: Roland Dreier <[email protected]>
2014-06-10RDMA/nes: Add support for iWARP Port Mapper user space serviceTatyana Nikolova4-64/+296
Signed-off-by: Tatyana Nikolova <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-06-10RDMA/core: Add support for iWARP Port Mapper user space serviceTatyana Nikolova9-6/+1865
This patch adds iWARP Port Mapper (IWPM) Version 2 support. The iWARP Port Mapper implementation is based on the port mapper specification section in the Sockets Direct Protocol paper - http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf Existing iWARP RDMA providers use the same IP address as the native TCP/IP stack when creating RDMA connections. They need a mechanism to claim the TCP ports used for RDMA connections to prevent TCP port collisions when other host applications use TCP ports. The iWARP Port Mapper provides a standard mechanism to accomplish this. Without this service it is possible for RDMA application to bind/listen on the same port which is already being used by native TCP host application. If that happens the incoming TCP connection data can be passed to the RDMA stack with error. The iWARP Port Mapper solution doesn't contain any changes to the existing network stack in the kernel space. All the changes are contained with the infiniband tree and also in user space. The iWARP Port Mapper service is implemented as a user space daemon process. Source for the IWPM service is located at http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary The iWARP driver (port mapper client) sends to the IWPM service the local IP address and TCP port it has received from the RDMA application, when starting a connection. The IWPM service performs a socket bind from user space to get an available TCP port, called a mapped port, and communicates it back to the client. In that sense, the IWPM service is used to map the TCP port, which the RDMA application uses to any port available from the host TCP port space. The mapped ports are used in iWARP RDMA connections to avoid collisions with native TCP stack which is aware that these ports are taken. When an RDMA connection using a mapped port is terminated, the client notifies the IWPM service, which then releases the TCP port. The message exchange between the IWPM service and the iWARP drivers (between user space and kernel space) is implemented using netlink sockets. 1) Netlink interface functions are added: ibnl_unicast() and ibnl_mulitcast() for sending netlink messages to user space 2) The signature of the existing ibnl_put_msg() is changed to be more generic 3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW corresponding to the two iWarp drivers - nes and cxgb4 which use the IWPM service 4) Enums are added to enumerate the attributes in the netlink messages, which are exchanged between the user space IWPM service and the iWARP drivers Signed-off-by: Tatyana Nikolova <[email protected]> Signed-off-by: Steve Wise <[email protected]> Reviewed-by: PJ Waskiewicz <[email protected]> [ Fold in range checking fixes and nlh_next removal as suggested by Dan Carpenter and Steve Wise. Fix sparse endianness in hash. - Roland ] Signed-off-by: Roland Dreier <[email protected]>
2014-06-09IB/mlx4: Fix gfp passing in create_qp_common()Jiri Kosina1-2/+2
There are two kzalloc() calls which were not converted to use value of gfp passed to create_qp_common() instead of using hardcoded GFP_KERNEL in 40f2287bd583 ("IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO"). Fix this by passing gfp value down properly. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-06-06IB/umad: Fix use-after-free on closeBart Van Assche1-11/+19
Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n> triggers a use-after-free. __fput() invokes f_op->release() before it invokes cdev_put(). Make sure that the ib_umad_device structure is freed by the cdev_put() call instead of f_op->release(). This avoids that changing the port mode from IB into Ethernet and back to IB followed by restarting opensmd triggers the following kernel oops: general protection fault: 0000 [#1] PREEMPT SMP RIP: 0010:[<ffffffff810cc65c>] [<ffffffff810cc65c>] module_put+0x2c/0x170 Call Trace: [<ffffffff81190f20>] cdev_put+0x20/0x30 [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0 [<ffffffff8118e35e>] ____fput+0xe/0x10 [<ffffffff810723bc>] task_work_run+0xac/0xe0 [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0 [<ffffffff814b8398>] int_signal+0x12/0x17 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051 Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Yann Droneaud <[email protected]> Cc: <[email protected]> # 3.x: 8ec0a0e6b58: IB/umad: Fix error handling Signed-off-by: Roland Dreier <[email protected]>
2014-06-05IB/core: Fix kobject leak on device register error flowHaggai Eran1-26/+26
The ports kobject isn't being released during error flow in device registration. This patch refactors the ports kobject cleanup into a single function called from both the error flow in device registration and from the unregistration function. A couple of attributes aren't being deleted (iw_stats_group, and ib_class_attributes). While this may be handled implicitly by the destruction of their kobjects, it seems better to handle all the attributes the same way. Signed-off-by: Haggai Eran <[email protected]> [ Make free_port_list_attributes() static. - Roland ] Signed-off-by: Roland Dreier <[email protected]>
2014-06-05RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_respYann Droneaud2-2/+4
The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct c4iw_alloc_ucontext_resp gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name c4iw_alloc_ucontext_resp \ drivers/infiniband/hw/cxgb4/iw_cxgb4.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100 --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100 @@ -2,9 +2,8 @@ struct c4iw_alloc_ucontext_resp { __u64 status_page_key; /* 0 8 */ __u32 status_page_size; /* 8 4 */ - /* size: 12, cachelines: 1, members: 2 */ - /* last cacheline: 12 bytes */ + /* size: 16, cachelines: 1, members: 2 */ + /* padding: 4 */ + /* last cacheline: 16 bytes */ }; This ABI disagreement will make an x86_64 kernel try to write past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to write past the i386 userspace provided buffer and the uverbs will fail. If the structure is on a page boundary and the next page is not mapped, ib_copy_to_udata() will fail and the uverb will fail. Additionally, as reported by Dan Carpenter, without the implicit padding being properly cleared, an information leak would take place in most architectures. This patch adds an explicit padding to struct c4iw_alloc_ucontext_resp, and, like 92b0ca7cb149 ("IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_alloc_ucontext() not writting this padding field to userspace. This way, x86_64 kernel will be able to write struct c4iw_alloc_ucontext_resp as expected by unpatched and patched i386 libcxgb4. Link: http://marc.info/[email protected] Link: http://marc.info/[email protected] Link: http://marc.info/?i=20140328082428.GH25192@mwanda Cc: <[email protected]> Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes") Reported-by: Yann Droneaud <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Yann Droneaud <[email protected]> Acked-by: Steve Wise <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-06-04mlx4_core: Fix GFP flags parameters to be gfp_tRoland Dreier3-3/+3
Otherwise sparse gives a bunch of warnings like drivers/net/ethernet/mellanox/mlx4/srq.c:110:66: sparse: incorrect type in argument 4 (different base types) drivers/net/ethernet/mellanox/mlx4/srq.c:110:66: expected int [signed] gfp drivers/net/ethernet/mellanox/mlx4/srq.c:110:66: got restricted gfp_t Signed-off-by: Roland Dreier <[email protected]>
2014-06-04IB/core: Fix port kobject deletion during error flowHaggai Eran1-9/+17
When encountering an error during the add_port function, adding a port to sysfs, the port kobject is freed without being deleted from sysfs. Instead of freeing it directly, the patch uses kobject_put to release the kobject and delete it. Signed-off-by: Haggai Eran <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-06-04IB/core: Remove unneeded kobject_get/put callsHaggai Eran1-5/+2
The ib_core module will call kobject_get on the parent object of each kobject it creates. This is redundant since kobject_add does that anyway. As a side effect, this patch should fix leaking the ports kobject and the device kobject during unregister flow, since the previous code didn't seem to take into account the kobject_get calls on behalf of the child kobjects. Signed-off-by: Haggai Eran <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-06-04IB/core: Fix sparse warnings about redeclared functionsRoland Dreier2-9/+9
Fix a few functions that are declared with __attribute_const__ in the ib_verbs.h header file but defined without it in verbs.c. This gets rid of the following sparse warnings: drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers Signed-off-by: Roland Dreier <[email protected]>
2014-06-03IB/mad: Fix sparse warning about gfp_t useRoland Dreier1-1/+1
Properly convert gfp_t & result to bool to fix: drivers/infiniband/core/sa_query.c:621:33: warning: incorrect type in initializer (different base types) drivers/infiniband/core/sa_query.c:621:33: expected bool [unsigned] [usertype] preload drivers/infiniband/core/sa_query.c:621:33: got restricted gfp_t Signed-off-by: Roland Dreier <[email protected]>
2014-06-02IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIOJiri Kosina16-70/+82
Modify the various routines used to allocate memory resources which serve QPs in mlx4 to get an input GFP directive. Have the Ethernet driver to use GFP_KERNEL in it's QP allocations as done prior to this commit, and the IB driver to use GFP_NOIO when the IB verbs IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided. Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-06-02IB: Add a QP creation flag to use GFP_NOIO allocationsOr Gerlitz2-3/+16
This addresses a problem where NFS client writes over IPoIB connected mode may deadlock on memory allocation/writeback. The problem is not directly memory reclamation. There is an indirect dependency between network filesystems writing back pages and ipoib_cm_tx_init() due to how a kworker is used. Page reclaim cannot make forward progress until ipoib_cm_tx_init() succeeds and it is stuck in page reclaim itself waiting for network transmission. Ordinarily this situation may be avoided by having the caller use GFP_NOFS but ipoib_cm_tx_init() does not have that information. To address this, take a general approach and add a new QP creation flag that tells the low-level hardware driver to use GFP_NOIO for the memory allocations related to the new QP. Use the new flag in the ipoib connected mode path, and if the driver doesn't support it, re-issue the QP creation without the flag. Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-06-02IB: Return error for unsupported QP creation flagsOr Gerlitz2-1/+5
Fix the usnic and thw qib drivers to err when QP creation flags that they don't understand are provided. Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-06-02IB: Allow build of hw/ and ulp/ subdirectories independentlyYann Droneaud3-17/+19
It is not possible to build only the drivers/infiniband/hw/ (or ulp/) subdirectory with command such as: $ make ARCH=x86_64 O=./obj-x86_64/ drivers/infiniband/hw/ This fails with following error messages: make[2]: Nothing to be done for `all'. make[2]: Nothing to be done for `relocs'. CHK include/config/kernel.release Using /home/ydroneaud/src/linux as source for kernel GEN /home/ydroneaud/src/linux/obj-x86_64/Makefile CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CALL /home/ydroneaud/src/linux/scripts/checksyscalls.sh /home/ydroneaud/src/linux/scripts/Makefile.build:44: /home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile: No such file or directory make[2]: *** No rule to make target `/home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile'. Stop. make[1]: *** [drivers/infiniband/hw/] Error 2 make: *** [sub-make] Error 2 This patch creates a Makefile in hw/ and ulp/ and moves each corresponding parts of drivers/infiniband/Makefile in the new Makefiles. It should not break build except if some hw/ drivers or ulp/ were allowed previously to be built while CONFIG_INFINIBAND is set to 'n', but according to drivers/infiniband/Kconfig, it's not possible. So it should be safe to apply. Signed-off-by: Yann Droneaud <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-30mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statementRoland Dreier1-14/+14
The handling of MLX4_QP_ST_MLX in verify_qp_parameters() was accidentally put inside the inner switch statement (that handles which transition of RC/UC/XRC QPs is happening). Fix this by moving the case to the outer switch statement. The compiler pointed this out with: drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'verify_qp_parameters': >> drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2875:3: warning: case value '7' not in enumerated type 'enum qp_transition' [-Wswitch] case MLX4_QP_ST_MLX: Reported-by: kbuild test robot <[email protected]> Fixes: 99ec41d0a48c ("mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPs") Signed-off-by: Roland Dreier <[email protected]>
2014-05-29RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_respYann Droneaud2-2/+3
The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct c4iw_create_cq_resp gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name c4iw_create_cq_resp \ drivers/infiniband/hw/cxgb4/iw_cxgb4.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100 --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100 @@ -14,9 +13,8 @@ struct c4iw_create_cq_resp { __u32 size; /* 28 4 */ __u32 qid_mask; /* 32 4 */ - /* size: 36, cachelines: 1, members: 6 */ - /* last cacheline: 36 bytes */ + /* size: 40, cachelines: 1, members: 6 */ + /* padding: 4 */ + /* last cacheline: 40 bytes */ }; This ABI disagreement will make an x86_64 kernel try to write past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to write past the i386 userspace provided buffer and the uverbs will fail. If the structure is on a page boundary and the next page is not mapped, ib_copy_to_udata() will fail and the uverb will fail. This patch adds an explicit padding at end of structure c4iw_create_cq_resp, and, like 92b0ca7cb149 ("IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq() not writting this padding field to userspace. This way, x86_64 kernel will be able to write struct c4iw_create_cq_resp as expected by unpatched and patched i386 libcxgb4. Link: http://marc.info/[email protected] Cc: <[email protected]> Fixes: cfdda9d764362 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC") Fixes: e24a72a3302a6 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()") Cc: Dan Carpenter <[email protected]> Signed-off-by: Yann Droneaud <[email protected]> Acked-by: Steve Wise <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-29IB/srp: Avoid problems if a header uses pr_fmtJoe Perches1-1/+1
SRP defines pr_fmt(fmt) to be "PFX fmt", and then includes a bunch of header files before it gets around to defining PFX. This causes problems if any of the header files do a pr_... and use pr_fmt(). Fix this by using KBUILD_MODNAME instead of the private PFX. Acked-by: Chris Metcalf <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-29IB/umad: Fix error handlingBart Van Assche1-22/+27
Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL or if nonseekable_open() fails. Avoid leaking a kref count, that sm_sem is kept down and also that the IB_PORT_SM capability mask is not cleared in ib_umad_sm_open() if nonseekable_open() fails. Since container_of() never returns NULL, remove the code that tests whether container_of() returns NULL. Moving the kref_get() call from the start of ib_umad_*open() to the end is safe since it is the responsibility of the caller of these functions to ensure that the cdev pointer remains valid until at least when these functions return. Signed-off-by: Bart Van Assche <[email protected]> Cc: <[email protected]> [[email protected]: rework a bit to reduce the amount of code changed] Signed-off-by: Yann Droneaud <[email protected]> [ nonseekable_open() can't actually fail, but.... - Roland ] Signed-off-by: Roland Dreier <[email protected]>
2014-05-29IB/mlx4: Add interface for selecting VFs to enable QP0 via MLX proxy QPsJack Morgenstein3-1/+141
This commit adds the sysfs interface for enabling QP0 on VFs for selected VF/port. By default, no VFs are enabled for QP0 operation. To enable QP0 operation on a VF/port, under /sys/class/infiniband/mlx4_x/iov/<b:d:f>/ports/x there are two new entries: - smi_enabled (read-only). Indicates whether smi is currently enabled for the indicated VF/port - enable_smi_admin (rw). Used by the admin to request that smi capability be enabled or disabled for the indicated VF/port. 0 = disable, 1 = enable. The requested enablement will occur at the next reset of the VF (e.g. driver restart on the VM which owns the VF). Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-29mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPsJack Morgenstein8-35/+126
This commit adds the infrastructure for enabling selected VFs to operate SMI (QP0) MADs without restriction. Additionally, for these enabled VFs, their QP0 proxy and tunnel QPs are MLX QPs. As such, they operate over VL15. Therefore, they are not affected by "credit" problems or changes in the VLArb table (which may shut down VL0). Non-enabled VFs may only create UD proxy QP0 qps (which are forced by the hypervisor to send packets using the q-key it assigns and places in the qp-context). Thus, non-enabled VFs will not pose a security risk. The hypervisor discards any privileged MADs it receives from these non-enabled VFs. By default, all VFs are NOT enabled, and must explicitly be enabled by the administrator. The sysfs interface which operates the VF enablement infrastructure is provided in the next commit. Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-29IB/mlx4: Preparation for VFs to issue/receive SMI (QP0) requests/responsesJack Morgenstein4-38/+66
Currently, VFs in SRIOV VFs are denied QP0 access. The main reason for this decision is security, since Subnet Management Datagrams (SMPs) are not restricted by network partitioning and may affect the physical network topology. Moreover, even the SM may be denied access from portions of the network by setting management keys unknown to the SM. However, it is desirable to grant SMI access to certain privileged VFs, so that certain network management activities may be conducted within virtual machines instead of the hypervisor. This commit does the following: 1. Create QP0 tunnel QPs for all VFs. 2. Discard SMI mads sent-from/received-for non-privileged VFs in the hypervisor MAD multiplex/demultiplex logic. SMI mads from/for privileged VFs are allowed to pass. 3. MAD_IFC wrapper changes/fixes. For non-privileged VFs, only host-view MAD_IFC commands are allowed, and only for SMI LID-Routed GET mads. For privileged VFs, there are no restrictions. This commit does not allow privileged VFs as yet. To determine if a VF is privileged, it calls function mlx4_vf_smi_enabled(). This function returns 0 unconditionally for now. The next two commits allow defining and activating privileged VFs. Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-29IB/mlx4: SET_PORT called by mlx4_ib_modify_port should be wrappedJack Morgenstein1-9/+17
mlx4_ib_modify_port is invoked in IB for resetting the Q_Key violations counters and for modifying the IB port capability flags. For example, when opensm is started up on the hypervisor, mlx4_ib_modify_port is called to set the port's IsSM flag. In multifunction mode, the SET_PORT command used in this flow should be wrapped (so that the PF port capability flags are also tracked, thus enabling the aggregate of all the VF/PF capability flags to be tracked properly). The procedure mlx4_SET_PORT() in main.c is also renamed to mlx4_ib_SET_PORT() to differentiate it from procedure mlx4_SET_PORT() in port.c. mlx4_ib_SET_PORT() is used exclusively by mlx4_ib_modify_port(). Finally, the CM invokes ib_modify_port() to set the IsCMSupported flag even when running over RoCE. Therefore, when RoCE is active, mlx4_ib_modify_port should return OK unconditionally (since the capability flags and qkey violations counter are not relevant). Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-29mlx4_core: Fix incorrect FLAGS1 bitmap test in mlx4_QUERY_FUNC_CAPJack Morgenstein1-1/+1
Commit eb17711bc1d6 ("net/mlx4_core: Introduce nic_info new flag in QUERY_FUNC_CAP") did: if (func_cap->flags1 & QUERY_FUNC_CAP_FLAGS1_OFFSET) { which should be: if (func_cap->flags1 & QUERY_FUNC_CAP_FLAGS1_FORCE_VLAN) { Fix that. Fixes: eb17711bc1d6 ("net/mlx4_core: Introduce nic_info new flag in QUERY_FUNC_CAP") Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-29mlx4_core: Fix memory leaks in SR-IOV error pathsDotan Barak1-0/+14
Fix a few memory leaks that happen if errors happen in SR-IOV mode. Signed-off-by: Dotan Barak <[email protected]> Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-29IB/qib: Additional Intel branding changesVinit Agnihotri1-4/+4
This patches changes user visible function names containing "qlogic" in module init and cleanup. Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Vinit Agnihotri <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-28RDMA/cxgb3: Remove a couple unneeded conditionsDan Carpenter1-4/+2
We know that "reset_tpt_entry" is false on this side of the if else statement so there is no need to check again. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-28IB/mlx4: fix unitialised variable is_mcastColin Ian King1-1/+1
Commit 297e0dad7206 ("IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing") introduced a bug where is_mcast is now no longer initialized on the non-multicast condition and so it can be any random value from the stack. This issue was detected by cppcheck: [drivers/infiniband/hw/mlx4/ah.c:103]: (error) Uninitialized variable: is_mcast Simple fix is to initialise is_mcast to zero. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-28IB/ipath: Use time_before()/_after()Manuel Schölling2-4/+4
Time comparisons must use time_after / time_before to avoid problems when jiffies wraps. Signed-off-by: Manuel Schölling <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-28IB/mlx5: Fix warning about cast of wr_id back to pointer on 32 bitsRoland Dreier1-1/+1
We need to cast wr_id to unsigned long before casting to a pointer. This fixes: drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_umr_cq_handler': >> drivers/infiniband/hw/mlx5/mr.c:724:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] context = (struct mlx5_ib_umr_context *)wc.wr_id; Reported-by: kbuild test robot <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27IB/usnic: Fix source file missing copyright and licenseUpinder Malhi1-0/+18
Prepends copyright and license to usnic_uiom_interval_tree.c Signed-off-by: Upinder Malhi <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27IB/ipath: Translate legacy diagpkt into newer extended diagpktDennis Dalessandro1-0/+4
This patch addresses an issue where the legacy diagpacket is sent in from the user, but the driver operates on only the extended diagpkt. This patch specifically initializes the extended diagpkt based on the legacy packet. Cc: <[email protected]> Reported-by: Rickard Strandqvist <[email protected]> Reviewed-by: Mike Marciniszyn <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27IB/qib: Fix port in pkey change eventMike Marciniszyn1-1/+1
The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE. As of the dual port qib QDR card, this is not necessarily correct. Change to use the port as specified in the call. Cc: <[email protected]> Reported-by: Alex Estrin <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27RDMA/cxgb3: Fix information leak in send_abort()Dan Carpenter1-0/+1
The cpl_abort_req struct has several reserved members which need to be cleared to avoid disclosing kernel information. I have added a memset() so now it matches the cxgb4 version of this function. Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Steve Wise <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27IB/mlx5: add missing padding at end of struct mlx5_ib_create_srqYann Droneaud2-1/+14
The i386 ABI disagrees with most other ABIs regarding alignment of data type larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABIs struct mlx5_ib_create_srq gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. Tool pahole could be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name mlx5_ib_create_srq \ drivers/infiniband/hw/mlx5/mlx5_ib.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100 --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100 @@ -69,7 +68,6 @@ struct mlx5_ib_create_srq { __u64 db_addr; /* 8 8 */ __u32 flags; /* 16 4 */ - /* size: 20, cachelines: 1, members: 3 */ - /* last cacheline: 20 bytes */ + /* size: 24, cachelines: 1, members: 3 */ + /* padding: 4 */ + /* last cacheline: 24 bytes */ }; ABI disagreement will make an x86_64 kernel try to read past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to read past the i386 userspace provided buffer and the uverb will fail. Anyway, if the structure lay in memory on a page boundary and next page is not mapped, ib_copy_from_udata() will fail and the uverb will fail. This patch makes create_srq_user() takes care of the input data size to handle the case where no padding was provided. This way, x86_64 kernel will be able to handle struct mlx5_ib_create_srq as sent by unpatched and patched i386 libmlx5. Link: http://marc.info/[email protected] Cc: <[email protected]> Fixes: e126ba97dba9e ("mlx5: Add driver for Mellanox Connect-IB adapter") Signed-off-by: Yann Droneaud <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27IB/mlx5: add missing padding at end of struct mlx5_ib_create_cqYann Droneaud2-1/+13
The i386 ABI disagrees with most other ABIs regarding alignment of data type larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct mlx5_ib_create_cq get padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name mlx5_ib_create_cq \ drivers/infiniband/hw/mlx5/mlx5_ib.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100 --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100 @@ -34,9 +34,8 @@ struct mlx5_ib_create_cq { __u64 db_addr; /* 8 8 */ __u32 cqe_size; /* 16 4 */ - /* size: 20, cachelines: 1, members: 3 */ - /* last cacheline: 20 bytes */ + /* size: 24, cachelines: 1, members: 3 */ + /* padding: 4 */ + /* last cacheline: 24 bytes */ }; This ABI disagreement will make an x86_64 kernel try to read past the buffer provided by an i386 binary. When boundary check will be implemented, a x86_64 kernel will refuse to read past the i386 userspace provided buffer and the uverb will fail. Anyway, if the structure lies in memory on a page boundary and next page is not mapped, ib_copy_from_udata() will fail when trying to read the 4 bytes of padding and the uverb will fail. This patch makes create_cq_user() takes care of the input data size to handle the case where no padding is provided. This way, x86_64 kernel will be able to handle struct mlx5_ib_create_cq as sent by unpatched and patched i386 libmlx5. Link: http://marc.info/[email protected] Cc: <[email protected]> Fixes: e126ba97dba9e ("mlx5: Add driver for Mellanox Connect-IB adapter") Signed-off-by: Yann Droneaud <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27IB/mlx5: Refactor UMR to have its own context structShachar Raindel2-22/+31
Instead of having the UMR context part of each memory region, allocate a struct on the stack. This allows queuing multiple UMRs that access the same memory region. Signed-off-by: Shachar Raindel <[email protected]> Signed-off-by: Haggai Eran <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27IB/mlx5: Set QP offsets and parameters for user QPs and not just for kernel QPsHaggai Eran1-0/+4
For user QPs, the creation process does not currently initialize the fields: * qp->rq.offset * qp->sq.offset * qp->sq.wqe_shift Signed-off-by: Haggai Eran <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27mlx5_core: Store MR attributes in mlx5_mr_core during creation and after UMRHaggai Eran3-1/+8
The patch stores iova, pd and size during mr creation and after UMRs that modify them. It removes the unused access flags field. Signed-off-by: Haggai Eran <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27IB/mlx5: Add MR to radix tree in reg_mr_callbackHaggai Eran1-0/+9
For memory regions that are allocated using reg_umr, the suffix of mlx5_core_create_mkey isn't being called. Instead the creation is completed in a callback function (reg_mr_callback). This means that these MRs aren't being added to the MR radix tree. Add them in the callback. Signed-off-by: Haggai Eran <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27IB/mlx5: Fix error handling in reg_umrHaggai Eran1-15/+16
If ib_post_send fails when posting the UMR work request in reg_umr, the code doesn't release the temporary pas buffer allocated, and doesn't dma_unmap it. Signed-off-by: Haggai Eran <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27mlx5_core: Copy DIF fields only when input and output space values matchSagi Grimberg1-3/+6
Some DIF implementations (SCSI initiator/target) may want to use different input/output values for application tag and/or reference tag. So in case memory/wire domain values don't match HW must not copy them. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27mlx5_core: Simplify signature handover wqe for interleaved buffersSagi Grimberg1-15/+9
No need for repetition format pattern in case the data and protection are already interleaved in the memory domain since the pattern already exists. A single key entry is sufficient and may save some extra fetch ops. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-27mlx5_core: Fix signature handover operation for interleaved buffersSagi Grimberg1-1/+4
When the data and protection are interleaved in the memory domain, no need to expand the mkey total length. At the moment no Linux user works (iSER initiator & target) in interleaved mode. This may change in the future as for SCSI pass-through devices there is no real point in target performing de-interleaving and re-interleaving of the protection data in the PT stage. Regardless, signature verbs support this mode. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-26IB/iser: Bump version to 1.4Or Gerlitz1-1/+1
Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-26IB/iser: Add missing newlines to logging messagesRoi Dayan1-4/+4
Logging messages need terminating newlines to avoid possible message interleaving. Add them. Signed-off-by: Roi Dayan <[email protected]> Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-26IB/iser: Fix a possible race in iser connection states transitionAriel Nahum1-2/+2
In some circumstances (multiple targets), RDMA_CM ESTABLISHED event and ep_disconnect may race. In this case, the iser connection state may transition to UP (after ep_disconnect transitioned it to TERMINATING), while the connection is being torn down. Upon RDMA_CM event ESTABLISHED we allow iser connection state to transition to UP only from PENDING. We also make sure to protect this state change (done under the connection lock). Signed-off-by: Ariel Nahum <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-05-26IB/iser: Simplify connection managementAriel Nahum3-91/+99
iSER relies on refcounting to manage iser connections establishment and teardown. Following commit 39ff05dbbbdb ("IB/iser: Enhance disconnection logic for multi-pathing"), iser connection maintain 3 references: - iscsi_endpoint (at creation stage) - cma_id (at connection request stage) - iscsi_conn (at bind stage) We can avoid taking explicit refcounts by correctly serializing iser teardown flows (graceful and non-graceful). Our approach is to trigger a scheduled work to handle ordered teardown by gracefully waiting for 2 cleanup stages to complete: 1. Cleanup of live pending tasks indicated by iscsi_conn_stop completion 2. Flush errors processing Each completed stage will notify a waiting worker thread when it is done to allow teardwon continuation. Since iSCSI connection establishment may trigger endpoint disconnect without a successful endpoint connect, we rely on the iscsi <-> iser binding (.conn_bind) to learn about the teardown policy we should take wrt cleanup stages. Since all cleanup worker threads are scheduled (release_wq) in .ep_disconnect it is safe to assume that when module_exit is called, all cleanup workers are already scheduled. Thus proper module unload shall flush all scheduled works before allowing safe exit, to guarantee no resources got left behind. Signed-off-by: Ariel Nahum <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>