Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
While we're at it, use permission defines instead
of octal values and rearrange a little bit.
Signed-off-by: Jenny Derzhavetz <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This patch routes a SA pathrecord query to netlink first and processes the
response appropriately. If a failure is returned, the request will be sent
through IB. The decision whether to route the request to netlink first is
determined by the presence of a listener for the local service netlink
multicast group. If the user-space local service netlink multicast group
listener is not present, the request will be sent through IB, just like
what is currently being done.
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: John Fleck <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Replace kmalloc with kzalloc so that all uninitialized fields in SA query
will be zero-ed out to avoid unintentional consequence. This prepares the
SA query structure to accept new fields in the future.
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: John Fleck <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This patch adds a function to check if listeners for a netlink multicast
group are present. It also adds a function to receive netlink response
messages.
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: John Fleck <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This patch adds netlink defines for local service client, local service
group, local service operations, and related attributes.
Signed-off-by: Kaike Wan <[email protected]>
Signed-off-by: John Fleck <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
scsi_host_alloc() not only allocates memory for a SCSI host but also
creates the scsi_eh_<n> kernel thread and the scsi_tmf_<n> workqueue.
Stop these threads if login fails by calling scsi_host_put().
Reported-by: Konstantin Krotov <[email protected]>
Fixes: fb49c8bbaae7 ("Remove an extraneous scsi_host_put() from an error path")
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Sebastian Parschauer <[email protected]>
Cc: <[email protected]> #v3.19
Signed-off-by: Doug Ledford <[email protected]>
|
|
Since version 1.0 e.g. scsi-mq has been added. Since this is
a significant change, bump the driver version and release date.
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Sebastian Parschauer <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Avoid that the following kernel warning is reported if the SRP
target system accepts fewer channels per connection than what
was requested by the initiator system:
WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xb1/0x120 [ib_srp]()
Call Trace:
[<ffffffff8105d67f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8105d6da>] warn_slowpath_null+0x1a/0x20
[<ffffffffa05419e1>] srp_destroy_qp+0xb1/0x120 [ib_srp]
[<ffffffffa05445fb>] srp_create_ch_ib+0x19b/0x420 [ib_srp]
[<ffffffffa0545257>] srp_create_target+0x7d7/0xa94 [ib_srp]
[<ffffffff8138dac0>] dev_attr_store+0x20/0x30
[<ffffffff812079ef>] sysfs_write_file+0xef/0x170
[<ffffffff81191fc4>] vfs_write+0xb4/0x130
[<ffffffff8119276f>] sys_write+0x5f/0xa0
[<ffffffff815a0a59>] system_call_fastpath+0x16/0x1b
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Sebastian Parschauer <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Signed-off-by: Doug Ledford <[email protected]>
|
|
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Sebastian Parschauer <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
When handling a device internal error, the driver is responsible to
drain the completion queue with flush errors.
In case a completion queue was assigned to multiple send queues, the
driver iterates over the send queues and generates flush errors of
inflight wqes. The driver must correctly pass the wc array with an
offset as a result of the previous send queue iteration. Not doing so
will overwrite previously set completions and return a wrong number
of polled completions which includes ones which were not correctly set.
Fixes: 35f05dabf95a (IB/mlx4: Reset flow support for IB kernel ULPs)
Signed-off-by: Ariel Nahum <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.
Fixes: fa417f7b520e ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <[email protected]>
Signed-off-by: Noa Osherovich <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The pkey mapping for RoCE must remain the default mapping:
VFs:
virtual index 0 = mapped to real index 0 (0xFFFF)
All others indices: mapped to a real pkey index containing an
invalid pkey.
PF:
virtual index i = real index i.
Don't allow users to change these mappings using files found in
sysfs.
Fixes: c1e7e466120b ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The mcg "too many pending requests" warning message fills the log
when OpenSM is downed. Demote the message from warning level to
debug level.
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
send_mad_to_wire takes the same spinlock that is taken in
the interrupt context. Therefore, it needs irqsave/restore.
Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Doug Ledford <[email protected]>
|
|
1.Change query_gid hook to return value from IB/Core GID
management APIs.
2.Get rid of all the netdev notifier chain subscription code as well
as maintenance of SGID Table in memory.
3.Implement get_netdev hook in driver.
Signed-off-by: Somnath Kotur <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Manage RoCE gid table with logic in IB/core, which is common to all
vendors, and remove the mechanism from the mlx4 IB driver.
Since management of the GID cache may lead to index mismatch with the
hardware GID table, a translation between indexes is required when
modifying a QP or creating an address handle.
Signed-off-by: Moni Shoua <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
get_netdev: get the net_device on the physical port of the IB transport port. In
port aggregation mode it is required to return the netdev of the active port.
modify_gid: note for a change in the RoCE gid cache. Handle this by writing to
the harsware GID table. It is possible that indexes in cahce and hardware tables
won't match so a translation is required when modifying a QP or creating an
address handle.
Signed-off-by: Moni Shoua <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
The mlx4 network driver was registered in the context of the 'add'
function of the core driver (called when HW should be registered).
This makes the netdev event NETDEV_REGISTER to be sent in a context
where the answer to get_protocol_dev() callback returns NULL. This may
be confusing to listeners of netdev events.
This patch is a preparation to the patch that implements the
get_netdev() callback in the IB/mlx4 driver.
Signed-off-by: Moni Shoua <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Handling bonding and other devices require us to all all GIDs of the
net-devices which are upper-devices of the RoCE port related
net-device.
Active-backup configurations imposes even more challenges as the
default GID should only be set on the active devices (this is
necessary as otherwise the same MAC could be used for several
slaves and thus several slaves will have identical GIDs).
Managing these configurations are done by listening to:
(a) NETDEV_CHANGEUPPER event
(1) if a related net-device is linked, delete all inactive
slaves default GIDs and add the upper device GIDs.
(2) if a related net-device is unlinked, delete all upper GIDs
and add the default GIDs.
(b) NETDEV_BONDING_FAILOVER:
(1) delete the bond GIDs from inactive slaves
(2) delete the inactive slave's default GIDs
(3) Add the bond GIDs to the active slave.
Signed-off-by: Matan Barak <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Smatch says that, based on the indenting, we should probably add curly
braces here.
Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management')
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
RoCE GIDs are based on IP addresses configured on Ethernet net-devices
which relate to the RDMA (RoCE) device port.
Currently, each of the low-level drivers that support RoCE (ocrdma,
mlx4) manages its own RoCE port GID table. As there's nothing which is
essentially vendor specific, we generalize that, and enhance the RDMA
core GID cache to do this job.
In order to populate the GID table, we listen for events:
(a) netdev up/down/change_addr events - if a netdev is built onto
our RoCE device, we need to add/delete its IPs. This involves
adding all GIDs related to this ndev, add default GIDs, etc.
(b) inet events - add new GIDs (according to the IP addresses)
to the table.
For programming the port RoCE GID table, providers must implement
the add_gid and del_gid callbacks.
RoCE GID management requires us to state the associated net_device
alongside the GID. This information is necessary in order to manage
the GID table. For example, when a net_device is removed, its
associated GIDs need to be removed as well.
RoCE mandates generating a default GID for each port, based on the
related net-device's IPv6 link local. In contrast to the GID based on
the regular IPv6 link-local (as we generate GID per IP address),
the default GID is also available when the net device is down (in
order to support loopback).
Locking is done as follows:
The patch modify the GID table code both for new RoCE drivers
implementing the add_gid/del_gid callbacks and for current RoCE and
IB drivers that do not. The flows for updating the table are
different, so the locking requirements are too.
While updating RoCE GID table, protection against multiple writers is
achieved via mutex_lock(&table->lock). Since writing to a table
requires us to find an entry (possible a free entry) in the table and
then modify it, this mutex protects both the find_gid and write_gid
ensuring the atomicity of the action.
Each entry in the GID cache is protected by rwlock. In RoCE, writing
(usually results from netdev notifier) involves invoking the vendor's
add_gid and del_gid callbacks, which could sleep.
Therefore, an invalid flag is added for each entry. Updates for RoCE are
done via a workqueue, thus sleeping is permitted.
In IB, updates are done in write_lock_irq(&device->cache.lock), thus
write_gid isn't allowed to sleep and add_gid/del_gid are not called.
When passing net-device into/out-of the GID cache, the device
is always passed held (dev_hold).
The code uses a single work item for updating all RDMA devices,
following a netdev or inet notifier.
The patch moves the cache from being a client (which was incorrect,
as the cache is part of the IB infrastructure) to being explicitly
initialized/freed when a device is registered/removed.
Signed-off-by: Matan Barak <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This gets rid of the weird in-between state where struct ib_device
was allocated but the kobject didn't work.
Consequently ib_device_release is now guaranteed to be called in
all situations and we needn't duplicate its kfrees on error paths.
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Some consumers of the netdev events API would like to know who is the
active slave when a NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER
events occur. For example, when managing RoCE GIDs, GIDs based on the
bond's ips should only be set on the port which corresponds to active
slave netdevice.
Signed-off-by: Matan Barak <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Some consumers of NETDEV_CHANGEUPPER event would like to know which
upper device was linked/unlinked and what operation was carried.
Add information in the notifier info block for that purpose.
Signed-off-by: Matan Barak <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
For loopback purposes, RoCE devices should have a default GID in the
port GID table, even when the interface is down. In order to do so,
we use the IPv6 link local address which would have been genenrated
for the related Ethernet netdevice when it goes up as a default GID.
addrconf_ifid_eui48 is used to gernerate this address, export it.
Signed-off-by: Matan Barak <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Fully replaced by a more generic and suitable
ib_alloc_mr.
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Ported from upstream qib commit
68c02e232b8a ("qib: Support ib_alloc_mr verb")
Tested-by: Jubin John <[email protected]>
Reviewed-by: Jubin John <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Svcrdma was incorrectly allocating fastreg MRs and page lists using
RPCSVC_MAXPAGES, which can exceed the device capabilities. So limit
the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len.
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Use ib_alloc_mr with specific parameters.
Change the existing callers.
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
This was added in a thought of uniting all mr allocation
and deallocation routines but the fact is we have a single
deallocation routine already, ib_dereg_mr.
And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
(includes only signature stuff for now).
And, fixup the only callers (iser/isert) accordingly.
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
When no matching listening ID is found for a given request, the net_dev
that was used to find the request isn't released.
Fixes: 0b3ca768fcb0 ("IB/cma: Use found net_dev for passive connections")
Signed-off-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.
Signed-off-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Use ib_cm_insert_listen to create listening IB CM IDs or share existing
ones if needed. When given a request on a specific CM ID, the code now
matches the request to the RDMA CM ID based on the request parameters, so
it no longer needs to rely on the ib_cm's private data matching
capabilities.
Signed-off-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
When receiving a new connection in cma_req_handler, we actually already
know the net_dev that is used for the connection's creation. Instead of
calling cma_translate_addr to resolve the new connection id's source
address, just use the net_dev that was found.
Signed-off-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Pass incoming request parameters through the relevant IPv4/IPv6 routing
tables and make sure the network stack is configured to handle such
requests.
Signed-off-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|
|
Instead of relying on a the ib_cm module to check an incoming CM request's
private data header, add these checks to the RDMA CM module. This allows a
following patch to to clean up the ib_cm interface and remove the code that
looks into the private headers. It will also allow supporting namespaces in
RDMA CM by making these checks namespace aware later on.
Signed-off-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
|