path: root/drivers/infiniband
Age | Commit message | Author | Files | Lines
2011-01-10 | RDMA/cxgb4: Don't re-init wait object in init/fini paths | Steve Wise | 1 | -2/+0
Re-initializing the wait object in rdma_init()/rdma_fini() causes a timing window which can lead to a deadlock during close. Once this deadlock hits, all RDMA activity over the T4 device will be stuck. There's no need to re-init the wait object, so remove it. Signed-off-by: Steve Wise <[email protected]> Cc: <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2011-01-10 | RDMA/cxgb3,cxgb4: Remove dead code | Stephen Hemminger | 5 | -89/+2
This removes unused code found by running 'make namespacecheck'; compile tested only. Signed-off-by: Stephen Hemminger <[email protected]> Acked-by: Steve Wise <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2011-01-10 | IB/srp: consolidate hot-path variables into cache lines | David Dillow | 2 | -17/+26
Put the variables accessed together in the hot-path into common cachelines, and separate them by RW vs RO to avoid false dirtying. We keep a local copy of the lkey and rkey in the target to avoid traversing pointers (and associated cache lines) to find them. Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: David Dillow <[email protected]>
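The layout idea in this message can be illustrated with a small userspace sketch. It is not the ib_srp structure: `struct demo_target`, its fields, and the 64-byte `CACHE_LINE` value are assumptions chosen for illustration. Read-mostly fields share one line, and the fields written on every command start a separate line, so writes do not dirty the line that other CPUs only read.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed line size; real hardware varies */

/* Hypothetical target structure, not the real ib_srp layout. */
struct demo_target {
    /* Read-only after setup: local copies of the keys, no pointer chasing. */
    alignas(CACHE_LINE) uint32_t lkey;
    uint32_t rkey;
    uint32_t max_iu_len;

    /* Written on every command: starts its own cache line. */
    alignas(CACHE_LINE) int32_t req_lim;
    uint32_t tx_head;
};

/* Index of the cache line that holds a given byte offset. */
static inline size_t line_of(size_t offset)
{
    return offset / CACHE_LINE;
}

/* Compile-time check that the read-write group is line aligned. */
static_assert(offsetof(struct demo_target, req_lim) % CACHE_LINE == 0,
              "RW fields must start on a cache line boundary");
```

The cost of this layout is padding: the structure grows to a multiple of the line size, which is usually acceptable for a per-target object.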
2011-01-10 | IB/srp: stop sharing the host lock with SCSI | Bart Van Assche | 2 | -23/+25
We don't need protection against the SCSI stack, so use our own lock to allow parallel progress on separate CPUs. Signed-off-by: Bart Van Assche <[email protected]> [ broken out and small cleanups by David Dillow ] Signed-off-by: David Dillow <[email protected]>
2011-01-10 | IB/srp: reduce lock coverage of command completion | Bart Van Assche | 1 | -23/+14
We only need the lock to cover list and credit manipulations, so push those into srp_remove_req() and update the call chains. We reorder the request removal and command completion in srp_process_rsp() to avoid the SCSI mid-layer sending another command before we've released our request and added any credits returned by the target. This prevents us from returning HOST_BUSY unnecessarily. Signed-off-by: Bart Van Assche <[email protected]> [ broken out, small cleanups, and modified to avoid potential extraneous HOST_BUSY returns by David Dillow ] Signed-off-by: David Dillow <[email protected]>
2011-01-10 | IB/srp: reduce local coverage for command submission and EH | Bart Van Assche | 2 | -58/+67
We only need locks to protect our lists and number of credits available. By pre-consuming the credit for the request, we can reduce our lock coverage to just those areas. If we don't actually send the request, we'll need to put the credit back into the pool. Signed-off-by: Bart Van Assche <[email protected]> [ broken out and small cleanups by David Dillow ] Signed-off-by: David Dillow <[email protected]>
2011-01-10 | IB/srp: don't move active requests to their own list | Bart Van Assche | 2 | -11/+13
We use req->scmnd != NULL to indicate an active request, so there's no need to keep a separate list for them. We can afford the array iteration during error handling, and dropping it gives us one less item that needs lock protection. Signed-off-by: Bart Van Assche <[email protected]> [ broken out and small cleanups by David Dillow ] Signed-off-by: David Dillow <[email protected]>
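A minimal sketch of the "non-NULL pointer marks an active request" convention follows. The types `demo_cmd` and `demo_req`, the ring size, and `count_active()` are hypothetical stand-ins, not the driver's code; the point is only that a linear scan of the request array can replace a separately locked list.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SIZE 4 /* hypothetical ring size */

struct demo_cmd { int tag; };               /* stand-in for a SCSI command */
struct demo_req { struct demo_cmd *scmnd; };/* NULL means the slot is idle */

/* Count in-flight requests by scanning the whole request array. */
static int count_active(const struct demo_req *ring, size_t n)
{
    int active = 0;

    assert(ring != NULL);
    for (size_t i = 0; i < n; i++)
        if (ring[i].scmnd != NULL)
            active++;
    return active;
}
```

The scan is O(n) in the ring size, which the message argues is affordable because it only happens during error handling.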
2011-01-07 | Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin | Linus Torvalds | 2 | -10/+3
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
  fs: scale mntget/mntput
  fs: rename vfsmount counter helpers
  fs: implement faster dentry memcmp
  fs: prefetch inode data in dcache lookup
  fs: improve scalability of pseudo filesystems
  fs: dcache per-inode inode alias locking
  fs: dcache per-bucket dcache hash locking
  bit_spinlock: add required includes
  kernel: add bl_list
  xfs: provide simple rcu-walk ACL implementation
  btrfs: provide simple rcu-walk ACL implementation
  ext2,3,4: provide simple rcu-walk ACL implementation
  fs: provide simple rcu-walk generic_check_acl implementation
  fs: provide rcu-walk aware permission i_ops
  fs: rcu-walk aware d_revalidate method
  fs: cache optimise dentry and inode for rcu-walk
  fs: dcache reduce branches in lookup path
  fs: dcache remove d_mounted
  fs: fs_struct use seqlock
  fs: rcu-walk for path lookup
  ...
2011-01-07 | fs: dcache rationalise dget variants | Nick Piggin | 2 | -2/+2
dget_locked was a shortcut to avoid the lazy lru manipulation when we already held dcache_lock (lru manipulation was relatively cheap at that point). However, now that the lru lock is an innermost one, we never hold it at any caller, so the lock cost can now be avoided. We already have well working lazy dcache LRU, so it should be fine to defer LRU manipulations to scan time. Signed-off-by: Nick Piggin <[email protected]>
2011-01-07 | fs: dcache remove dcache_lock | Nick Piggin | 2 | -8/+1
dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <[email protected]>
2011-01-07 | fs: dcache scale dentry refcount | Nick Piggin | 2 | -2/+2
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <[email protected]>
2011-01-05 | IB/srp: allow lockless work posting | Bart Van Assche | 2 | -44/+28
Only one CPU at a time will own an RX IU, so using the address of the IU as the work request cookie allows us to avoid taking a lock. We can similarly prepare the TX path for lockless posting by moving the free TX IUs to a list. This also removes the requirement that the queue sizes be a power of 2. Signed-off-by: Bart Van Assche <[email protected]> [ broken out, small cleanups, and modified to avoid needing an extra field in the IU by David Dillow] Signed-off-by: David Dillow <[email protected]>
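The cookie trick described above can be sketched in plain C. The work request ID in the verbs API is a 64-bit value, so a pointer fits in it; `struct demo_iu` and the two helper names below are hypothetical and only illustrate the round trip, with no lookup table and therefore no lock.

```c
#include <assert.h>
#include <stdint.h>

struct demo_iu { char buf[64]; }; /* hypothetical information unit */

/* Encode the IU address as the 64-bit work request cookie. */
static uint64_t iu_to_cookie(struct demo_iu *iu)
{
    return (uint64_t)(uintptr_t)iu;
}

/* Recover the IU from the cookie reported with the completion. */
static struct demo_iu *cookie_to_iu(uint64_t wr_id)
{
    assert(wr_id != 0); /* a zero cookie would not name a valid IU */
    return (struct demo_iu *)(uintptr_t)wr_id;
}
```

Because the completion handler gets the IU directly from the cookie, no index arithmetic on the ring is needed, which is presumably why the power-of-2 queue size requirement could be dropped.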
2011-01-05 | IB/srp: consolidate state change code | Bart Van Assche | 1 | -21/+24
Signed-off-by: Bart Van Assche <[email protected]> [ broken out and small cleanups by David Dillow ] Signed-off-by: David Dillow <[email protected]>
2011-01-05 | IB/srp: allow task management without a previous request | David Dillow | 2 | -63/+37
We can only have one task management command outstanding, so move the completion and status to the target port. This allows us to handle resets of a LUN without a corresponding request having been sent. Meanwhile, we don't need to play games with host_scribble, just use it as the pointer it is. This fixes a crash when we issue a bus reset using sg_reset. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=13893 Reported-by: Bart Van Assche <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: David Dillow <[email protected]>
2010-12-26 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller | 7 | -82/+70
Conflicts: net/ipv4/fib_frontend.c
2010-12-22 | Merge branch 'master' into for-next | Jiri Kosina | 7 | -82/+70
Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.
2010-12-08 | IB/uverbs: Handle large number of entries in poll CQ | Dan Carpenter | 1 | -43/+56
In ib_uverbs_poll_cq() code there is a potential integer overflow if userspace passes in a large cmd.ne. The calls to kmalloc() would allocate smaller buffers than intended, leading to memory corruption. There is also an information leak if resp wasn't all used. Unprivileged userspace may call this function, although only if an RDMA device that uses this function is present. Fix this by copying CQ entries one at a time, which avoids the allocation entirely, and also by moving this copying into a function that makes sure to initialize all memory copied to userspace. Special thanks to Jason Gunthorpe <[email protected]> for his help and advice. Cc: <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> [ Monkey around with things a bit to avoid bad code generation by gcc when designated initializers are used. - Roland ] Signed-off-by: Roland Dreier <[email protected]>
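To make the overflow concrete, here is a sketch of the class of bug and of one general guard against it. Note that this is not the fix the commit applies (the commit avoids the allocation altogether by copying entries one at a time); `checked_buf_size()` and its parameters are hypothetical names used only to show how a header-plus-array size can be computed without wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute hdr + n * elem without wrapping around SIZE_MAX.
 * Returns false when the true result does not fit in size_t,
 * in which case *out is left untouched.
 */
static bool checked_buf_size(size_t hdr, size_t n, size_t elem, size_t *out)
{
    assert(out != NULL);
    if (elem != 0 && n > (SIZE_MAX - hdr) / elem)
        return false; /* n * elem + hdr would overflow */
    *out = hdr + n * elem;
    return true;
}
```

Without such a check, a huge element count from userspace makes the product wrap to a small number, the allocation succeeds with a short buffer, and the later copy loop writes past its end.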
2010-12-02 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband | Linus Torvalds | 4 | -37/+11
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB: Fix information leak in marshalling code
  IB/pack: Remove some unused code added by the IBoE patches
  IB/mlx4: Fix IBoE link state
  IB/mlx4: Fix IBoE reported link rate
  mlx4_core: Workaround firmware bug in query dev cap
  IB/mlx4: Fix memory ordering of VLAN insertion control bits
  MAINTAINERS: Update NetEffect entry
2010-12-01 | Merge branches 'misc', 'mlx4' and 'nes' into for-next | Roland Dreier | 2 | -7/+7
2010-12-01 | IB: Fix information leak in marshalling code | Vasiliy Kulikov | 1 | -0/+4
ib_ucm_init_qp_attr() and ucma_init_qp_attr() pass struct ib_uverbs_qp_attr with reserved, qp_state, {ah_attr,alt_ah_attr}{reserved,->grh.reserved} fields uninitialized to copy_to_user(). This leads to leaking of contents of kernel stack memory to userspace. Signed-off-by: Vasiliy Kulikov <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
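The usual remedy for this kind of leak is to zero the whole destination before filling in the meaningful fields, so that reserved members and padding never carry stale stack contents. The sketch below shows that pattern in userspace; `struct demo_resp` and `fill_resp()` are hypothetical and much smaller than the real `struct ib_uverbs_qp_attr`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response layout with a reserved field. */
struct demo_resp {
    uint32_t qp_state;
    uint8_t  port_num;
    uint8_t  reserved[3];
};

/* Fill a response that will be copied out to an untrusted reader. */
static void fill_resp(struct demo_resp *dst, uint32_t state, uint8_t port)
{
    assert(dst != NULL);
    memset(dst, 0, sizeof(*dst)); /* clears reserved fields and padding */
    dst->qp_state = state;
    dst->port_num = port;
}
```

Assigning fields one by one without the `memset()` would leave `reserved` holding whatever was previously on the stack.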
2010-12-01 | IB/pack: Remove some unused code added by the IBoE patches | Or Gerlitz | 1 | -30/+0
Remove unused functions added by commit ff7f5aab354d ("IB/pack: IBoE UD packet packing support"). Signed-off-by: Or Gerlitz <[email protected]>
2010-12-01 | IB/mlx4: Fix IBoE link state | Eli Cohen | 1 | -1/+1
Use netif_running() and netif_carrier_ok() to report link state, exactly as is done to report Ethernet link state in sysfs. Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-12-01 | IB/mlx4: Fix IBoE reported link rate | Eli Cohen | 1 | -1/+1
The link rate is the product of the link speed and the link width. For Ethernet ports the rate is 10G, so we use 1 for the width and 4 for speed to get the correct rate. Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
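The arithmetic behind "width 1, speed 4" can be worked through with the standard InfiniBand port encodings, where the width code selects a lane count (1 = 1X, 2 = 4X, 4 = 8X, 8 = 12X) and the speed code selects a per-lane signalling rate (1 = SDR 2.5 Gb/s, 2 = DDR 5 Gb/s, 4 = QDR 10 Gb/s). The function names below are hypothetical, and rates are expressed in units of 100 Mb/s to stay in integers.

```c
#include <assert.h>

/* Lane count for an IB width code; 0 for codes this sketch does not know. */
static int width_lanes(int width_code)
{
    switch (width_code) {
    case 1: return 1;  /* 1X  */
    case 2: return 4;  /* 4X  */
    case 4: return 8;  /* 8X  */
    case 8: return 12; /* 12X */
    default: return 0;
    }
}

/* Per-lane rate in units of 100 Mb/s; 0 for unknown codes. */
static int speed_per_lane(int speed_code)
{
    switch (speed_code) {
    case 1: return 25;  /* SDR, 2.5 Gb/s */
    case 2: return 50;  /* DDR, 5 Gb/s   */
    case 4: return 100; /* QDR, 10 Gb/s  */
    default: return 0;
    }
}

/* Link rate = lanes * per-lane rate, in units of 100 Mb/s. */
static int link_rate(int width_code, int speed_code)
{
    return width_lanes(width_code) * speed_per_lane(speed_code);
}
```

With these tables, `link_rate(1, 4)` yields 100, that is 10 Gb/s: one lane at the QDR rate, which matches a 10G Ethernet port.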
2010-12-01 | IB/mlx4: Fix memory ordering of VLAN insertion control bits | Eli Cohen | 1 | -5/+5
We must fully update the control segment before marking it as valid, so that hardware doesn't start executing it before we're ready. Signed-off-by: Eli Cohen <[email protected]> [ Move VLAN control bit setting to before wmb(). - Roland ] Signed-off-by: Roland Dreier <[email protected]>
2010-11-24 | infiniband: remove dev_base_lock use | Eric Dumazet | 2 | -6/+6
dev_base_lock is the legacy way to lock the device list, and is planned to disappear. (writers hold RTNL, readers hold RCU lock) Convert rdma_translate_ip() and update_ipv6_gids() to RCU locking. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Roland Dreier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-17 | BKL: remove extraneous #include <smp_lock.h> | Arnd Bergmann | 1 | -1/+0
The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-16 | SCSI host lock push-down | Jeff Garzik | 1 | -1/+3
Move the mid-layer's ->queuecommand() invocation from being locked with the host lock to being unlocked to facilitate speeding up the critical path for drivers who don't need this lock taken anyway. The patch below presents a simple SCSI host lock push-down as an equivalent transformation. No locking or other behavior should change with this patch. All existing bugs and locking orders are preserved. Additionally, add one parameter to queuecommand, struct Scsi_Host *, and remove one parameter from queuecommand, void (*done)(struct scsi_cmnd *). Scsi_Host* is a convenient pointer that most host drivers need anyway, and 'done' is redundant to struct scsi_cmnd->scsi_done. Minimal code disturbance was attempted with this change. Most drivers needed only two one-line modifications for their host lock push-down. Signed-off-by: Jeff Garzik <[email protected]> Acked-by: James Bottomley <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-15 | infiniband: Only include mutex.h once in drivers/infiniband/hw/cxgb4/iw_cxgb4.h | Jesper Juhl | 1 | -1/+0
Only include the header linux/mutex.h once inside drivers/infiniband/hw/cxgb4/iw_cxgb4.h Signed-off-by: Jesper Juhl <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2010-11-11 | net: get rid of rtable->idev | Eric Dumazet | 1 | -4/+4
It seems the idev field in struct rtable has no special purpose other than adding extra atomic ops. We hold refcounts on the device itself (using percpu data, so pretty cheap in current kernel). The infiniband case is solved using dst.dev instead of idev->dev. Removal of this field means routing without route cache is now using shared data, percpu data, and the only potential contention is a pair of atomic ops on struct neighbour per forwarded packet. About 5% speedup on routing test. Signed-off-by: Eric Dumazet <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Sean Hefty <[email protected]> Cc: Hal Rosenstock <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-01 | tree-wide: fix comment/printk typos | Uwe Kleine-König | 2 | -2/+2
"gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2010-10-29 | convert get_sb_single() users | Al Viro | 2 | -14/+14
Signed-off-by: Al Viro <[email protected]>
2010-10-26 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 | Linus Torvalds | 2 | -0/+2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  split invalidate_inodes()
  fs: skip I_FREEING inodes in writeback_sb_inodes
  fs: fold invalidate_list into invalidate_inodes
  fs: do not drop inode_lock in dispose_list
  fs: inode split IO and LRU lists
  fs: switch bdev inode bdi's correctly
  fs: fix buffer invalidation in invalidate_list
  fsnotify: use dget_parent
  smbfs: use dget_parent
  exportfs: use dget_parent
  fs: use RCU read side protection in d_validate
  fs: clean up dentry lru modification
  fs: split __shrink_dcache_sb
  fs: improve DCACHE_REFERENCED usage
  fs: use percpu counter for nr_dentry and nr_dentry_unused
  fs: simplify __d_free
  fs: take dcache_lock inside __d_path
  fs: do not assign default i_ino in new_inode
  fs: introduce a per-cpu last_ino allocator
  new helper: ihold()
  ...
2010-10-26 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband | Linus Torvalds | 55 | -702/+2323
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (63 commits)
  IB/qib: clean up properly if pci_set_consistent_dma_mask() fails
  IB/qib: Allow driver to load if PCIe AER fails
  IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set
  IB/qib: Fix extra log level in qib_early_err()
  RDMA/cxgb4: Remove unnecessary KERN_<level> use
  RDMA/cxgb3: Remove unnecessary KERN_<level> use
  IB/core: Add link layer type information to sysfs
  IB/mlx4: Add VLAN support for IBoE
  IB/core: Add VLAN support for IBoE
  IB/mlx4: Add support for IBoE
  mlx4_en: Change multicast promiscuous mode to support IBoE
  mlx4_core: Update data structures and constants for IBoE
  mlx4_core: Allow protocol drivers to find corresponding interfaces
  IB/uverbs: Return link layer type to userspace for query port operation
  IB/srp: Sync buffer before posting send
  IB/srp: Use list_first_entry()
  IB/srp: Reduce number of BUSY conditions
  IB/srp: Eliminate two forward declarations
  IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144
  IB: Replace EXTRA_CFLAGS with ccflags-y
  ...
2010-10-26 | replace nested max/min macros with {max,min}3 macro | Hagen Paul Pfeifer | 1 | -2/+1
Use the new {max,min}3 macros to save some cycles and bytes on the stack. This patch substitutes trivial nested macros with their counterpart. Signed-off-by: Hagen Paul Pfeifer <[email protected]> Cc: Joe Perches <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hartley Sweeten <[email protected]> Cc: Russell King <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Sean Hefty <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
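As a rough illustration of the substitution, a three-way minimum or maximum can be written so that each argument is evaluated exactly once and only one temporary is kept. The kernel macros are type-generic; the functions below are an int-only simplification with hypothetical names, not the kernel's definitions.

```c
#include <assert.h>

/* Smallest of three ints, one temporary, each argument read once. */
static inline int min3_int(int a, int b, int c)
{
    int m = a < b ? a : b;

    return m < c ? m : c;
}

/* Largest of three ints, same structure as min3_int(). */
static inline int max3_int(int a, int b, int c)
{
    int m = a > b ? a : b;

    return m > c ? m : c;
}
```

A call site that previously read like a nested two-argument minimum, for example the minimum of `x` and the minimum of `y` and `z`, becomes a single `min3_int(x, y, z)`.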
2010-10-26 | Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', 'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next | Roland Dreier | 55 | -702/+2323
2010-10-26 | IB/qib: clean up properly if pci_set_consistent_dma_mask() fails | Jason Gunthorpe | 1 | -1/+3
Clean up properly if pci_set_consistent_dma_mask() fails. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-26 | IB/qib: Allow driver to load if PCIe AER fails | Ralph Campbell | 1 | -1/+3
Some PCIe root complex chip sets don't support advanced error reporting. Allow the driver to load OK if pci_enable_pcie_error_reporting() fails. Signed-off-by: Ralph Campbell <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-26 | IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set | Ralph Campbell | 1 | -0/+1
If CONFIG_PCI_MSI is not set, and a QLE7140 is present, the pointer "dd" is uninitialized. Signed-off-by: Ralph Campbell <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-26 | IB/qib: Fix extra log level in qib_early_err() | Jason Gunthorpe | 1 | -1/+1
Noticed this odd looking thing in dmesg: ib_qib 0000:02:00.0: <3>ib_qib: Unable to enable pcie error reporting: -5 which is due to a bad use of dev_info. Signed-off-by: Jason Gunthorpe <[email protected]> Acked-by: Ralph Campbell <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-26 | RDMA/cxgb4: Remove unnecessary KERN_<level> use | Joe Perches | 1 | -2/+2
Signed-off-by: Joe Perches <[email protected]> Acked-by: Steve Wise <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-26 | RDMA/cxgb3: Remove unnecessary KERN_<level> use | Joe Perches | 1 | -2/+2
Signed-off-by: Joe Perches <[email protected]> Acked-by: Steve Wise <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-25 | fs: do not assign default i_ino in new_inode | Christoph Hellwig | 2 | -0/+2
Instead of always assigning an increasing inode number in new_inode, move the call to assign it into those callers that actually need it. For now the set of callers that need it is estimated conservatively, that is, the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Al Viro <[email protected]>
2010-10-25 | IB/core: Add link layer type information to sysfs | Eli Cohen | 1 | -0/+15
Since an IB transport port may use either IB or Ethernet as its link layer, add the file /sys/class/infiniband/<device>/ports/<port_num>/link_layer to show the link layer for the port. Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-25 | IB/mlx4: Add VLAN support for IBoE | Eli Cohen | 3 | -27/+161
This patch allows IBoE traffic to be encapsulated in 802.1Q tagged VLAN frames. The VLAN tag is encoded in the GID and derived from it by a simple computation. The netdev notifier callback is modified to catch VLAN device addition/removal and the port's GID table is updated to reflect the change, so that for each netdevice there is an entry in the GID table. When the port's GID table is exhausted, GID entries will not be added. Only children of the main interfaces can add to the GID table; if a VLAN interface is added on another VLAN interface (e.g. "vconfig add eth2.6 8"), then that interface will not add an entry to the GID table. Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-25 | IB/core: Add VLAN support for IBoE | Eli Cohen | 5 | -12/+47
Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the GID derived from a link local address in the following way: GID[11] and GID[12] contain the VLAN ID when the GID contains a VLAN. The 3 bits user priority field of the packets are identical to the 3 bits of the SL. In case of rdma_cm apps, the TOS field is used to generate the SL field by doing a shift right of 5 bits, effectively taking the 3 MS bits of the TOS field. Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
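The two computations in this message, reading the VLAN ID out of GID bytes 11 and 12 and deriving the SL from the TOS byte, can be sketched as below. Two details are assumptions not stated in the message: that byte 11 is the high byte, and that any value of 0x1000 or above means "no VLAN" (a GID built from a plain MAC address carries 0xff, 0xfe in those bytes). The names `gid_vlan_id()`, `tos_to_sl()`, and `DEMO_NO_VLAN` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NO_VLAN 0xffffu /* hypothetical "GID carries no VLAN" marker */

/*
 * Read the VLAN ID from bytes 11 and 12 of a 16-byte GID.
 * Assumption: byte 11 is the high byte and valid IDs are 12 bits wide.
 */
static uint16_t gid_vlan_id(const uint8_t gid[16])
{
    uint16_t vid;

    assert(gid != NULL);
    vid = (uint16_t)((gid[11] << 8) | gid[12]);
    return vid < 0x1000 ? vid : DEMO_NO_VLAN;
}

/* The 3 most significant bits of the 8-bit TOS become the 3-bit SL. */
static uint8_t tos_to_sl(uint8_t tos)
{
    return (uint8_t)(tos >> 5);
}
```

For example, a TOS of 0xe0 maps to SL 7, while any TOS below 0x20 maps to SL 0.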
2010-10-25 | IB/mlx4: Add support for IBoE | Eli Cohen | 5 | -108/+687
Add support for IBoE to mlx4_ib. The bulk of the code is handling the new address vector fields; mlx4 needs the MAC address of a remote node to include it in a WQE (for datagrams) or in the QP context (for connected QPs). Address resolution is done by assuming all unicast GIDs are link-local IPv6 addresses. Multicast group attach/detach needs to update the NIC's multicast filters; but since attaching a QP to a multicast group can be done before the QP is bound to a port, for IBoE we need to keep track of all multicast groups that a QP is attached to before it transitions from INIT to RTR (since it does not have a port in the INIT state). Signed-off-by: Eli Cohen <[email protected]> [ Many things cleaned up and otherwise monkeyed with; hope I didn't introduce too many bugs. - Roland ] Signed-off-by: Roland Dreier <[email protected]>
2010-10-25 | IB/uverbs: Return link layer type to userspace for query port operation | Eli Cohen | 1 | -0/+2
Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-24 | IB/srp: Sync buffer before posting send | David Dillow | 1 | -0/+5
srp_send_tsk_mgmt() was missing the proper DMA sync calls before posting the buffer to the device. Signed-off-by: David Dillow <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2010-10-24 | IB/srp: Use list_first_entry() | Bart Van Assche | 1 | -1/+1
Use the list_first_entry() macro in ib_srp instead of open-coding the equivalent, which makes the source code slightly more descriptive. The list_first_entry() macro itself was introduced in kernel 2.6.22. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: David Dillow <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
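For readers unfamiliar with the macro, the following userspace sketch shows what it does: step from the list head to the first node, then recover the enclosing structure from the embedded node. All `demo_` names are hypothetical re-creations written for this example; the kernel's own definitions live in its list header and differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, in the kernel's style. */
struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* First element of a non-empty list, as the enclosing type. */
#define demo_list_first_entry(head, type, member) \
    demo_container_of((head)->next, type, member)

struct demo_iu { int id; struct list_head list; };

static void demo_list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void demo_list_add_tail(struct list_head *item, struct list_head *head)
{
    assert(item != NULL && head != NULL);
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}
```

The open-coded equivalent would spell out the container lookup on `head->next` at every call site, which is what the commit replaces with the named macro.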
2010-10-24 | IB/srp: Reduce number of BUSY conditions | Bart Van Assche | 1 | -2/+7
As proposed by the SRP (draft) standard, ib_srp reserves one ring element for SRP_TSK_MGMT requests. This patch makes sure that the SCSI mid-layer never tries to queue more than (SRP request limit) - 1 SCSI commands to ib_srp. This improves performance for targets whose request limit is less than or equal to SRP_NORMAL_REQ_SQ_SIZE by reducing the number of BUSY responses reported by ib_srp to the SCSI mid-layer. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: David Dillow <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
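The queue depth rule can be written down as a small calculation. This is a sketch under assumptions, not the driver's code: `demo_can_queue()` is a hypothetical helper, and it assumes exactly one ring element is held back for task management, as the message states.

```c
#include <assert.h>

/*
 * Number of SCSI commands the mid-layer may queue, given the target's
 * request limit and the driver's own normal send queue size.
 * One slot is reserved for a task management request.
 */
static int demo_can_queue(int req_lim, int normal_sq_size)
{
    int usable = req_lim - 1; /* hold one slot back for SRP_TSK_MGMT */

    assert(normal_sq_size >= 0);
    if (usable < 0)
        usable = 0;
    return usable < normal_sq_size ? usable : normal_sq_size;
}
```

With a request limit of 32 and a normal queue size of 63, the mid-layer is told to queue at most 31 commands, so the driver does not have to answer the 32nd with BUSY.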