Age | Commit message (Collapse) | Author | Files | Lines |
|
Re-initializing the wait object in rdma_init()/rdma_fini() causes a
timing window which can lead to a deadlock during close. Once this
deadlock hits, all RDMA activity over the T4 device will be stuck.
There's no need to re-init the wait object, so remove it.
Signed-off-by: Steve Wise <[email protected]>
Cc: <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
This removes unused code found by running 'make namespacecheck';
compile tested only.
Signed-off-by: Stephen Hemminger <[email protected]>
Acked-by: Steve Wise <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Put the variables accessed together in the hot-path into common
cachelines, and separate them by RW vs RO to avoid false dirtying.
We keep a local copy of the lkey and rkey in the target to avoid
traversing pointers (and associated cache lines) to find them.
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: David Dillow <[email protected]>
|
|
We don't need protection against the SCSI stack, so use our own lock to
allow parallel progress on separate CPUs.
Signed-off-by: Bart Van Assche <[email protected]>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <[email protected]>
|
|
We only need the lock to cover list and credit manipulations, so push
those into srp_remove_req() and update the call chains.
We reorder the request removal and command completion in
srp_process_rsp() to avoid the SCSI mid-layer sending another command
before we've released our request and added any credits returned by the
target. This prevents us from returning HOST_BUSY unneccesarily.
Signed-off-by: Bart Van Assche <[email protected]>
[ broken out, small cleanups, and modified to avoid potential extraneous
HOST_BUSY returns by David Dillow ]
Signed-off-by: David Dillow <[email protected]>
|
|
We only need locks to protect our lists and number of credits available.
By pre-consuming the credit for the request, we can reduce our lock
coverage to just those areas. If we don't actually send the request,
we'll need to put the credit back into the pool.
Signed-off-by: Bart Van Assche <[email protected]>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <[email protected]>
|
|
We use req->scmnd != NULL to indicate an active request, so there's no
need to keep a separate list for them. We can afford the array iteration
during error handling, and dropping it gives us one less item that needs
lock protection.
Signed-off-by: Bart Van Assche <[email protected]>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
fs: scale mntget/mntput
fs: rename vfsmount counter helpers
fs: implement faster dentry memcmp
fs: prefetch inode data in dcache lookup
fs: improve scalability of pseudo filesystems
fs: dcache per-inode inode alias locking
fs: dcache per-bucket dcache hash locking
bit_spinlock: add required includes
kernel: add bl_list
xfs: provide simple rcu-walk ACL implementation
btrfs: provide simple rcu-walk ACL implementation
ext2,3,4: provide simple rcu-walk ACL implementation
fs: provide simple rcu-walk generic_check_acl implementation
fs: provide rcu-walk aware permission i_ops
fs: rcu-walk aware d_revalidate method
fs: cache optimise dentry and inode for rcu-walk
fs: dcache reduce branches in lookup path
fs: dcache remove d_mounted
fs: fs_struct use seqlock
fs: rcu-walk for path lookup
...
|
|
dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, how that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.
Signed-off-by: Nick Piggin <[email protected]>
|
|
dcache_lock no longer protects anything. remove it.
Signed-off-by: Nick Piggin <[email protected]>
|
|
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.
Signed-off-by: Nick Piggin <[email protected]>
|
|
Only one CPU at a time will own an RX IU, so using the address of the IU
as the work request cookie allows us to avoid taking a lock. We can
similarly prepare the TX path for lockless posting by moving the free TX
IUs to a list. This also removes the requirement that the queue sizes be
a power of 2.
Signed-off-by: Bart Van Assche <[email protected]>
[ broken out, small cleanups, and modified to avoid needing an extra field
in the IU by David Dillow]
Signed-off-by: David Dillow <[email protected]>
|
|
Signed-off-by: Bart Van Assche <[email protected]>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <[email protected]>
|
|
We can only have one task management comment outstanding, so move the
completion and status to the target port. This allows us to handle
resets of a LUN without a corresponding request having been sent.
Meanwhile, we don't need to play games with host_scribble, just use it
as the pointer it is.
This fixes a crash when we issue a bus reset using sg_reset.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=13893
Reported-by: Bart Van Assche <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: David Dillow <[email protected]>
|
|
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
net/ipv4/fib_frontend.c
|
|
Conflicts:
MAINTAINERS
arch/arm/mach-omap2/pm24xx.c
drivers/scsi/bfa/bfa_fcpim.c
Needed to update to apply fixes for which the old branch was too
outdated.
|
|
In ib_uverbs_poll_cq() code there is a potential integer overflow if
userspace passes in a large cmd.ne. The calls to kmalloc() would
allocate smaller buffers than intended, leading to memory corruption.
There iss also an information leak if resp wasn't all used.
Unprivileged userspace may call this function, although only if an
RDMA device that uses this function is present.
Fix this by copying CQ entries one at a time, which avoids the
allocation entirely, and also by moving this copying into a function
that makes sure to initialize all memory copied to userspace.
Special thanks to Jason Gunthorpe <[email protected]>
for his help and advice.
Cc: <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
[ Monkey around with things a bit to avoid bad code generation by gcc
when designated initializers are used. - Roland ]
Signed-off-by: Roland Dreier <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB: Fix information leak in marshalling code
IB/pack: Remove some unused code added by the IBoE patches
IB/mlx4: Fix IBoE link state
IB/mlx4: Fix IBoE reported link rate
mlx4_core: Workaround firmware bug in query dev cap
IB/mlx4: Fix memory ordering of VLAN insertion control bits
MAINTAINERS: Update NetEffect entry
|
|
|
|
ib_ucm_init_qp_attr() and ucma_init_qp_attr() pass struct ib_uverbs_qp_attr
with reserved, qp_state, {ah_attr,alt_ah_attr}{reserved,->grh.reserved}
fields uninitialized to copy_to_user(). This leads to leaking of
contents of kernel stack memory to userspace.
Signed-off-by: Vasiliy Kulikov <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Remove unused functions added by commit ff7f5aab354d ("IB/pack: IBoE UD
packet packing support").
Signed-off-by: Or Gerlitz <[email protected]>
|
|
Use netif_running() and netif_carrier_ok() to report link state,
exactly as is done to report Ethernet link state in sysfs.
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
The link rate is the product of the link speed in the link width. For
Etherent ports the rate is 10G, so we use 1 for the width and 4 for
speed to get the correct rate.
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
We must fully update the control segment before marking it as valid,
so that hardware doesn't start executing it before we're ready.
Signed-off-by: Eli Cohen <[email protected]>
[ Move VLAN control bit setting to before wmb(). - Roland ]
Signed-off-by: Roland Dreier <[email protected]>
|
|
dev_base_lock is the legacy way to lock the device list, and is planned
to disappear. (writers hold RTNL, readers hold RCU lock)
Convert rdma_translate_ip() and update_ipv6_gids() to RCU locking.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Roland Dreier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The big kernel lock has been removed from all these files at some point,
leaving only the #include.
Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Move the mid-layer's ->queuecommand() invocation from being locked
with the host lock to being unlocked to facilitate speeding up the
critical path for drivers who don't need this lock taken anyway.
The patch below presents a simple SCSI host lock push-down as an
equivalent transformation. No locking or other behavior should change
with this patch. All existing bugs and locking orders are preserved.
Additionally, add one parameter to queuecommand,
struct Scsi_Host *
and remove one parameter from queuecommand,
void (*done)(struct scsi_cmnd *)
Scsi_Host* is a convenient pointer that most host drivers need anyway,
and 'done' is redundant to struct scsi_cmnd->scsi_done.
Minimal code disturbance was attempted with this change. Most drivers
needed only two one-line modifications for their host lock push-down.
Signed-off-by: Jeff Garzik <[email protected]>
Acked-by: James Bottomley <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Only include the header linux/mutex.h once inside
drivers/infiniband/hw/cxgb4/iw_cxgb4.h
Signed-off-by: Jesper Juhl <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
It seems idev field in struct rtable has no special purpose, but adding
extra atomic ops.
We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).
infiniband case is solved using dst.dev instead of idev->dev
Removal of this field means routing without route cache is now using
shared data, percpu data, and only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.
About 5% speedup on routing test.
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Roland Dreier <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: Hal Rosenstock <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
split invalidate_inodes()
fs: skip I_FREEING inodes in writeback_sb_inodes
fs: fold invalidate_list into invalidate_inodes
fs: do not drop inode_lock in dispose_list
fs: inode split IO and LRU lists
fs: switch bdev inode bdi's correctly
fs: fix buffer invalidation in invalidate_list
fsnotify: use dget_parent
smbfs: use dget_parent
exportfs: use dget_parent
fs: use RCU read side protection in d_validate
fs: clean up dentry lru modification
fs: split __shrink_dcache_sb
fs: improve DCACHE_REFERENCED usage
fs: use percpu counter for nr_dentry and nr_dentry_unused
fs: simplify __d_free
fs: take dcache_lock inside __d_path
fs: do not assign default i_ino in new_inode
fs: introduce a per-cpu last_ino allocator
new helper: ihold()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (63 commits)
IB/qib: clean up properly if pci_set_consistent_dma_mask() fails
IB/qib: Allow driver to load if PCIe AER fails
IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set
IB/qib: Fix extra log level in qib_early_err()
RDMA/cxgb4: Remove unnecessary KERN_<level> use
RDMA/cxgb3: Remove unnecessary KERN_<level> use
IB/core: Add link layer type information to sysfs
IB/mlx4: Add VLAN support for IBoE
IB/core: Add VLAN support for IBoE
IB/mlx4: Add support for IBoE
mlx4_en: Change multicast promiscuous mode to support IBoE
mlx4_core: Update data structures and constants for IBoE
mlx4_core: Allow protocol drivers to find corresponding interfaces
IB/uverbs: Return link layer type to userspace for query port operation
IB/srp: Sync buffer before posting send
IB/srp: Use list_first_entry()
IB/srp: Reduce number of BUSY conditions
IB/srp: Eliminate two forward declarations
IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144
IB: Replace EXTRA_CFLAGS with ccflags-y
...
|
|
Use the new {max,min}3 macros to save some cycles and bytes on the stack.
This patch substitutes trivial nested macros with their counterpart.
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Hartley Sweeten <[email protected]>
Cc: Russell King <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Roland Dreier <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: Pekka Enberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next
|
|
Clean up properly if pci_set_consistent_dma_mask() fails.
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Some PCIe root complex chip sets don't support advanced error reporting.
Allow the driver to load OK if pci_enable_pcie_error_reporting() fails.
Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
If CONFIG_PCI_MSI is not set, and a QLE7140 is present, the pointer
"dd" is uninitialized.
Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Noticed this odd looking thing in dmesg:
ib_qib 0000:02:00.0: <3>ib_qib: Unable to enable pcie error reporting: -5
which is due to a bad use of dev_info.
Signed-off-by: Jason Gunthorpe <[email protected]>
Acked-by: Ralph Campbell <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Signed-off-by: Joe Perches <[email protected]>
Acked-by: Steve Wise <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Signed-off-by: Joe Perches <[email protected]>
Acked-by: Steve Wise <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves. For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Since an IB transport port may use either IB or Ethernet as its link layer,
add the file /sys/class/infiniband/<device>/ports/<port_num>/link_layer to
show the link layer for the port.
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
This patch allows IBoE traffic to be encapsulated in 802.1Q tagged
VLAN frames. The VLAN tag is encoded in the GID and derived from it
by a simple computation.
The netdev notifier callback is modified to catch VLAN device
addition/removal and the port's GID table is updated to reflect the
change, so that for each netdevice there is an entry in the GID table.
When the port's GID table is exhausted, GID entries will not be added.
Only children of the main interfaces can add to the GID table; if a
VLAN interface is added on another VLAN interface (e.g. "vconfig add
eth2.6 8"), then that interfaces will not add an entry to the GID
table.
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the
GID derived from a link local address in the following way:
GID[11] GID[12] contain the VLAN ID when the GID contains a VLAN.
The 3 bits user priority field of the packets are identical to the 3
bits of the SL.
In case of rdma_cm apps, the TOS field is used to generate the SL
field by doing a shift right of 5 bits effectively taking to 3 MS bits
of the TOS field.
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Add support for IBoE to mlx4_ib. The bulk of the code is handling the
new address vector fields; mlx4 needs the MAC address of a remote node
to include it in a WQE (for datagrams) or in the QP context (for
connected QPs). Address resolution is done by assuming all unicast
GIDs are either link-local IPv6 addresses.
Multicast group attach/detach needs to update the NIC's multicast
filters; but since attaching a QP to a multicast group can be done
before the QP is bound to a port, for IBoE we need to keep track of
all multicast groups that a QP is attached too before it transitions
from INIT to RTR (since it does not have a port in the INIT state).
Signed-off-by: Eli Cohen <[email protected]>
[ Many things cleaned up and otherwise monkeyed with; hope I didn't
introduce too many bugs. - Roland ]
Signed-off-by: Roland Dreier <[email protected]>
|
|
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
srp_send_tsk_mgmt() was missing the proper DMA sync calls before posting
the buffer to the device.
Signed-off-by: David Dillow <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Use the list_first_entry() macro in ib_srp instead of open-coding the equivalent,
which makes the source code slightly more descriptive. The list_first_entry()
macro itself was introduced in kernel 2.6.22.
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: David Dillow <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
As proposed by the SRP (draft) standard, ib_srp reserves one ring
element for SRP_TSK_MGMT requests. This patch makes sure that the SCSI
mid-layer never tries to queue more than (SRP request limit) - 1 SCSI
commands to ib_srp. This improves performance for targets whose request
limit is less than or equal to SRP_NORMAL_REQ_SQ_SIZE by reducing the
number of BUSY responses reported by ib_srp to the SCSI mid-layer.
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: David Dillow <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|