aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2014-05-09blk-mq: implement new and more efficient tagging schemeJens Axboe4-97/+387
blk-mq currently uses percpu_ida for tag allocation. But that only works well if the ratio between tag space and number of CPUs is sufficiently high. For most devices and systems, that is not the case. The end result if that we either only utilize the tag space partially, or we end up attempting to fully exhaust it and run into lots of lock contention with stealing between CPUs. This is not optimal. This new tagging scheme is a hybrid bitmap allocator. It uses two tricks to both be SMP friendly and allow full exhaustion of the space: 1) We cache the last allocated (or freed) tag on a per blk-mq software context basis. This allows us to limit the space we have to search. The key element here is not caching it in the shared tag structure, otherwise we end up dirtying more shared cache lines on each allocate/free operation. 2) The tag space is split into cache line sized groups, and each context will start off randomly in that space. Even up to full utilization of the space, this divides the tag users efficiently into cache line groups, avoiding dirtying the same one both between allocators and between allocator and freeer. This scheme shows drastically better behaviour, both on small tag spaces but on large ones as well. It has been tested extensively to show better performance for all the cases blk-mq cares about. Signed-off-by: Jens Axboe <[email protected]>
2014-05-09blk-mq: initialize struct request fields individuallyChristoph Hellwig1-2/+45
This allows us to avoid a non-atomic memset over ->atomic_flags as well as killing lots of duplicate initializations. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-05-09blk-mq: update a hotplug comment for grammarJens Axboe1-4/+4
Signed-off-by: Jens Axboe <[email protected]>
2014-05-07blk-mq: add basic round-robin of what CPU to queue workqueue work onJens Axboe1-14/+31
Right now we just pick the first CPU in the mask, but that can easily overload that one. Add some basic batching and round-robin all the entries in the mask instead. Signed-off-by: Jens Axboe <[email protected]>
2014-05-05blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats()Tejun Heo1-1/+14
During the recent conversion of cgroup to kernfs, cgroup_tree_mutex which nests above both the kernfs s_active protection and cgroup_mutex is added to synchronize cgroup file type operations as cgroup_mutex needed to be grabbed from some file operations and thus can't be put above s_active protection. While this arrangement mostly worked for cgroup, this triggered the following lockdep warning. ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429 Tainted: G W ------------------------------------------------------- trinity-c173/9024 is trying to acquire lock: (blkcg_pol_mutex){+.+.+.}, at: blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455) but task is already holding lock: (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283) which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (s_active#89){++++.+}: lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) __kernfs_remove (arch/x86/include/asm/atomic.h:27 fs/kernfs/dir.c:352 fs/kernfs/dir.c:1024) kernfs_remove_by_name_ns (fs/kernfs/dir.c:1219) cgroup_addrm_files (include/linux/kernfs.h:427 kernel/cgroup.c:1074 kernel/cgroup.c:2899) cgroup_clear_dir (kernel/cgroup.c:1092 (discriminator 2)) rebind_subsystems (kernel/cgroup.c:1144) cgroup_setup_root (kernel/cgroup.c:1568) cgroup_mount (kernel/cgroup.c:1716) mount_fs (fs/super.c:1094) vfs_kern_mount (fs/namespace.c:899) do_mount (fs/namespace.c:2238 fs/namespace.c:2561) SyS_mount (fs/namespace.c:2758 fs/namespace.c:2729) tracesys (arch/x86/kernel/entry_64.S:746) -> #1 (cgroup_tree_mutex){+.+.+.}: lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587) cgroup_add_cftypes (include/linux/list.h:76 kernel/cgroup.c:3040) blkcg_policy_register (block/blk-cgroup.c:1106) throtl_init (block/blk-throttle.c:1694) do_one_initcall (init/main.c:789) kernel_init_freeable (init/main.c:854 init/main.c:863 init/main.c:882 init/main.c:1003) kernel_init (init/main.c:935) ret_from_fork (arch/x86/kernel/entry_64.S:552) -> #0 (blkcg_pol_mutex){+.+.+.}: __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587) blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455) cgroup_file_write (kernel/cgroup.c:2714) kernfs_fop_write (fs/kernfs/file.c:295) vfs_write (fs/read_write.c:532) SyS_write (fs/read_write.c:584 fs/read_write.c:576) tracesys (arch/x86/kernel/entry_64.S:746) other info that might help us debug this: Chain exists of: blkcg_pol_mutex --> cgroup_tree_mutex --> s_active#89 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#89); lock(cgroup_tree_mutex); lock(s_active#89); lock(blkcg_pol_mutex); *** DEADLOCK *** 4 locks held by trinity-c173/9024: #0: (&f->f_pos_lock){+.+.+.}, at: __fdget_pos (fs/file.c:714) #1: (sb_writers#18){.+.+.+}, at: vfs_write (include/linux/fs.h:2255 fs/read_write.c:530) #2: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:283) #3: (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283) stack backtrace: CPU: 3 PID: 9024 Comm: trinity-c173 Tainted: G W 3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429 ffffffff919687b0 ffff8805f6373bb8 ffffffff8e52cdbb 0000000000000002 ffffffff919d8400 ffff8805f6373c08 ffffffff8e51fb88 0000000000000004 ffff8805f6373c98 ffff8805f6373c08 ffff88061be70d98 ffff88061be70dd0 Call Trace: dump_stack (lib/dump_stack.c:52) print_circular_bug (kernel/locking/lockdep.c:1216) __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587) blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455) cgroup_file_write (kernel/cgroup.c:2714) kernfs_fop_write (fs/kernfs/file.c:295) vfs_write (fs/read_write.c:532) SyS_write (fs/read_write.c:584 fs/read_write.c:576) This is a highly unlikely but valid circular dependency between "echo 1 > blkcg.reset_stats" and cfq module [un]loading. cgroup is going through further locking update which will remove this complication but for now let's use trylock on blkcg_pol_mutex and retry the file operation if the trylock fails. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Sasha Levin <[email protected]> References: http://lkml.kernel.org/g/[email protected]
2014-05-05NVMe: Add tracepointsKeith Busch1-0/+1
Adding tracepoints for bio_complete and block_split into nvme to help with gathering IO info using blktrace and blkparse. Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]>
2014-05-02block/blk-throttle.c: fix return of 0/1 with return type boolFabian Frederick1-4/+4
Fix 4 coccinelle warnings. Cc: Jens Axboe <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-05-02block/blk-iopoll.c: use iop instead of iopollFabian Frederick1-2/+2
All blk_iopoll functions use iop for parent iopoll structure except blk_iopoll_complete.This also fixes one kernel-doc warning. Cc: Jens Axboe <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-05-02blk-mq: remove extra requeue traceJens Axboe1-4/+0
We already issue a blktrace requeue event in __blk_mq_requeue_request(), don't do it from the original caller as well. Signed-off-by: Jens Axboe <[email protected]>
2014-04-30block: Fix format string mismatch in cfq-iosched.cMasanari Iida1-1/+1
Fix format string mismatch in cfq_var_show() Signed-off-by: Masanari Iida <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-30blk-mq: refactor request insertion/mergingJens Axboe1-7/+15
Refactor the logic around adding a new bio to a software queue, so we nest the ctx->lock where we really need it (merge and insertion) and don't hold it when we don't (init and IO start accounting). Signed-off-by: Jens Axboe <[email protected]>
2014-04-30blk-mq remove debug BUG_ON() when draining software queuesJens Axboe1-1/+0
It's never been of any use, lets get rid of it. Signed-off-by: Jens Axboe <[email protected]>
2014-04-29blk-mq: fix waiting for reserved tagsJens Axboe3-4/+4
blk_mq_wait_for_tags() is only able to wait for "normal" tags, not reserved tags. Pass in which one we should attempt to get a tag for, so that waiting for reserved tags will work. Reserved tags are used for internal commands, which are usually serialized. Hence no waiting generally takes place, but we should ensure that it actually works if users need that functionality. Signed-off-by: Jens Axboe <[email protected]>
2014-04-25block: fold __blk_add_timer into blk_add_timerChristoph Hellwig1-23/+11
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-25blk-mq: respect rq_affinityChristoph Hellwig3-45/+6
The blk-mq code is using it's own version of the I/O completion affinity tunables, which causes a few issues: - the rq_affinity sysfs file doesn't work for blk-mq devices, even if it still is present, thus breaking existing tuning setups. - the rq_affinity = 1 mode, which is the defauly for legacy request based drivers isn't implemented at all. - blk-mq drivers don't implement any completion affinity with the default flag settings. This patches removes the blk-mq ipi_redirect flag and sysfs file, as well as the internal BLK_MQ_F_SHOULD_IPI flag and replaces it with code that respects the queue-wide rq_affinity flags and also implements the rq_affinity = 1 mode. This means I/O completion affinity can now only be tuned block-queue wide instead of per context, which seems more sensible to me anyway. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-24blk-mq: fix race with timeouts and requeue eventsJens Axboe4-18/+42
If a requeue event races with a timeout, we can get into the situation where we attempt to complete a request from the timeout handler when it's not start anymore. This causes a crash. So have the timeout handler check that REQ_ATOM_STARTED is still set on the request - if not, we ignore the event. If this happens, the request has now been marked as complete. As a consequence, we need to ensure to clear REQ_ATOM_COMPLETE in blk_mq_start_request(), as to maintain proper request state. Signed-off-by: Jens Axboe <[email protected]>
2014-04-24Revert "blk-mq: initialize req->q in allocation"Jens Axboe1-3/+1
This reverts commit 6a3c8a3ac0e68dcfc2a01f4aa1ca0edd1a1701eb. We need selective clearing of the request to make the init-at-free time completely safe. Otherwise we end up stomping on rq->atomic_flags, which we don't want to do.
2014-04-23blk-mq: fix leak of set->tagsMing Lei1-0/+1
set->tags should be freed in blk_mq_free_tag_set(). Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-21block/blk-throttle.c: add static to blk_throtl_dispatch_work_fnFabian Frederick1-1/+1
blk_throtl_dispatch_work_fn is only used in blk-throttle.c Cc: Jens Axboe <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-21blk-mq: initialize req->q in allocationMing Lei1-1/+3
The patch basically reverts the patch of(blk-mq: initialize request on allocation) in Jens's tree(already in -next), and only initialize req->q in allocation for two reasons: - presumed cache hotness on completion - blk_rq_tagged(rq) depends on reset of req->mq_ctx Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-21blk-mq: user (1 << order) to implement order_to_size()Ming Lei1-6/+1
Cc: Jörg-Volker Peetz <[email protected]> Cc: Max Filippov <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-21blk-mq: fix allocation of set->tagsMing Lei1-1/+2
type of set->tags is struct blk_mq_tags **. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-21blk-mq: free hctx->ctx_map when init failedMing Lei1-0/+1
Avoid memory leak in the failure path. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-18arch: Mass conversion of smp_mb__*()Peter Zijlstra1-2/+2
Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Cc: Linus Torvalds <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-04-16bsg: update check for rq based driver for blk-mqJens Axboe1-1/+1
bsg currently checks ->request_fn to check whether a queue can handle struct request. But with blk-mq, we don't have a request_fn yet are request based. Add a queue_is_rq_based() helper and use that in bsg, I'm guessing this is not the last place we need to update for this. Besides, it better explains what is being checked. Signed-off-by: Jens Axboe <[email protected]>
2014-04-16block: relax when to modify the timeout timerJens Axboe1-2/+13
Since we are now, by default, applying timer slack to expiry times, the logic for when to modify a timer in the block code is suboptimal. The block layer keeps a forward rolling timer per queue for all requests, and modifies this timer if a request has a shorter timeout than what the current expiry time is. However, this breaks down when our rounded timer values get applied slack. Then each new request ends up modifying the timer, since we're still a little in front of the timer + slack. Fix this by allowing a tolerance of HZ / 2, the timeout handling doesn't need to be very precise. This drastically cuts down the number of timer modifications we have to make. Signed-off-by: Jens Axboe <[email protected]>
2014-04-16block: export blk_finish_requestChristoph Hellwig1-1/+2
This allows to mirror the blk-mq code flow for more a more readable I/O completion handler in SCSI. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-16blk-mq: rename mq_flush_work struct request memberChristoph Hellwig1-3/+3
We will use this work_struct to requeue scsi commands from the completion handler as well, so give it a more generic name. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-16blk-mq: add blk_mq_requeue_requestChristoph Hellwig1-2/+16
This allows to requeue a request that has been accepted by ->queue_rq earlier. This is needed by the SCSI layer in various error conditions. The existing internal blk_mq_requeue_request is renamed to __blk_mq_requeue_request as it is a lower level building block for this funtionality. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-16blk-mq: add blk_mq_start_hw_queuesChristoph Hellwig1-0/+11
Add a helper to unconditionally kick contexts of a queue. This will be needed by the SCSI layer to provide fair queueing between multiple devices on a single host. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-16blk-mq: add blk_mq_delay_queueChristoph Hellwig2-8/+43
Add a blk-mq equivalent to blk_delay_queue so that the scsi layer can ask to be kicked again after a delay. Signed-off-by: Christoph Hellwig <[email protected]> Modified by me to kill the unnecessary preempt disable/enable in the delayed workqueue handler. Signed-off-by: Jens Axboe <[email protected]>
2014-04-16blk-mq: add async parameter to blk_mq_start_stopped_hw_queuesChristoph Hellwig1-2/+2
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-16blk-mq: bidi supportChristoph Hellwig1-2/+7
Add two unlinkely branches to make sure the resid is initialized correctly for bidi request pairs, and the second request gets properly freed. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-16blk-mq: allow drivers to hook into I/O completionChristoph Hellwig1-6/+10
Split out the bottom half of blk_mq_end_io so that drivers can perform work when they know a request has been completed, but before it has been freed. This also obsoletes blk_mq_end_io_partial as drivers can now pass any value to blk_update_request directly. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-16blk-mq: kill preempt disable/enable in blk_mq_work_fn()Jens Axboe1-2/+0
blk_mq_work_fn() is always invoked off the bounded workqueues, so it can happily preempt among the queues in that set without causing any issues for blk-mq. Signed-off-by: Jens Axboe <[email protected]>
2014-04-16blk-mq: don't use preempt_count() to check for right CPUJens Axboe1-1/+1
UP or CONFIG_PREEMPT_NONE will return 0, and what we really want to check is whether or not we are on the right CPU. So don't make PREEMPT part of this, just test the CPU in the mask directly. Signed-off-by: Jens Axboe <[email protected]>
2014-04-15blk-mq: split out tag initialization, support shared tagsChristoph Hellwig5-128/+160
Add a new blk_mq_tag_set structure that gets set up before we initialize the queue. A single blk_mq_tag_set structure can be shared by multiple queues. Signed-off-by: Christoph Hellwig <[email protected]> Modular export of blk_mq_{alloc,free}_tagset added by me. Signed-off-by: Jens Axboe <[email protected]>
2014-04-15blk-mq: initialize request on allocationChristoph Hellwig1-3/+1
If we want to share tag and request allocation between queues we cannot initialize the request at init/free time, but need to initialize it at allocation time as it might get used for different queues over its lifetime. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-15blk-mq: add ->init_request and ->exit_request methodsChristoph Hellwig1-73/+32
The current blk_mq_init_commands/blk_mq_free_commands interface has a two problems: 1) Because only the constructor is passed to blk_mq_init_commands there is no easy way to clean up when a comman initialization failed. The current code simply leaks the allocations done in the constructor. 2) There is no good place to call blk_mq_free_commands: before blk_cleanup_queue there is no guarantee that all outstanding commands have completed, so we can't free them yet. After blk_cleanup_queue the queue has usually been freed. This can be worked around by grabbing an unconditional reference before calling blk_cleanup_queue and dropping it after blk_mq_free_commands is done, although that's not exatly pretty and driver writers are guaranteed to get it wrong sooner or later. Both issues are easily fixed by making the request constructor and destructor normal blk_mq_ops methods. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-15blk-mq: make ->flush_rq fully transparent to driversChristoph Hellwig3-10/+24
Drivers shouldn't have to care about the block layer setting aside a request to implement the flush state machine. We already override the mq context and tag to make it more transparent, but so far haven't deal with the driver private data in the request. Make sure to override this as well, and while we're at it add a proper helper sitting in blk-mq.c that implements the full impersonation. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-15blk-mq: do not initialize req->specialChristoph Hellwig3-22/+4
Drivers can reach their private data easily using the blk_mq_rq_to_pdu helper and don't need req->special. By not initializing it code can be simplified nicely, and we also shave off a few more instructions from the I/O path. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-15blk-mq: initialize resid_lenChristoph Hellwig1-0/+2
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-15block: remove struct request buffer memberJens Axboe2-18/+6
This was used in the olden days, back when onions were proper yellow. Basically it mapped to the current buffer to be transferred. With highmem being added more than a decade ago, most drivers map pages out of a bio, and rq->buffer isn't pointing at anything valid. Convert old style drivers to just use bio_data(). For the discard payload use case, just reference the page in the bio. Signed-off-by: Jens Axboe <[email protected]>
2014-04-15Merge tag 'v3.15-rc1' into for-3.16/coreJens Axboe5-14/+11
We don't like this, but things have diverged with the blk-mq fixes in 3.15-rc1. So merge it in.
2014-04-12Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The first vfs pile, with deep apologies for being very late in this window. Assorted cleanups and fixes, plus a large preparatory part of iov_iter work. There's a lot more of that, but it'll probably go into the next merge window - it *does* shape up nicely, removes a lot of boilerplate, gets rid of locking inconsistencie between aio_write and splice_write and I hope to get Kent's direct-io rewrite merged into the same queue, but some of the stuff after this point is having (mostly trivial) conflicts with the things already merged into mainline and with some I want more testing. This one passes LTP and xfstests without regressions, in addition to usual beating. BTW, readahead02 in ltp syscalls testsuite has started giving failures since "mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages" - might be a false positive, might be a real regression..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) missing bits of "splice: fix racy pipe->buffers uses" cifs: fix the race in cifs_writev() ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure kill generic_file_buffered_write() ocfs2_file_aio_write(): switch to generic_perform_write() ceph_aio_write(): switch to generic_perform_write() xfs_file_buffered_aio_write(): switch to generic_perform_write() export generic_perform_write(), start getting rid of generic_file_buffer_write() generic_file_direct_write(): get rid of ppos argument btrfs_file_aio_write(): get rid of ppos kill the 5th argument of generic_file_buffered_write() kill the 4th argument of __generic_file_aio_write() lustre: don't open-code kernel_recvmsg() ocfs2: don't open-code kernel_recvmsg() drbd: don't open-code kernel_recvmsg() constify blk_rq_map_user_iov() and friends lustre: switch to kernel_sendmsg() ocfs2: don't open-code kernel_sendmsg() take iov_iter stuff to mm/iov_iter.c process_vm_access: tidy up a bit ...
2014-04-11block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERODuan Jiong1-1/+1
This patch fixes coccinelle error regarding usage of IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO. Signed-off-by: Duan Jiong <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-09block: fix regression with block enabled taggingJens Axboe4-13/+10
Martin reported that his test system would not boot with current git, it oopsed with this: BUG: unable to handle kernel paging request at ffff88046c6c9e80 IP: [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150 PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060 Oops: 0002 [#1] SMP DEBUG_PAGEALLOC Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246 Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013 Workqueue: events_unbound async_run_entry_fn task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000 RIP: 0010:[<ffffffff812971e0>] [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150 RSP: 0018:ffff880273d03a58 EFLAGS: 00010092 RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6 RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410 R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0 R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e FS: 0000000000000000(0000) GS:ffff880277b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0 Stack: ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000 ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62 ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8 Call Trace: [<ffffffff813b9e62>] scsi_request_fn+0xf2/0x510 [<ffffffff81293167>] __blk_run_queue+0x37/0x50 [<ffffffff8129ac43>] blk_execute_rq_nowait+0xb3/0x130 [<ffffffff8129ad24>] blk_execute_rq+0x64/0xf0 [<ffffffff8108d2b0>] ? bit_waitqueue+0xd0/0xd0 [<ffffffff813bba35>] scsi_execute+0xe5/0x180 [<ffffffff813bbe4a>] scsi_execute_req_flags+0x9a/0x110 [<ffffffffa01b1304>] sd_spinup_disk+0x94/0x460 [sd_mod] [<ffffffff81160000>] ? __unmap_hugepage_range+0x200/0x2f0 [<ffffffffa01b2b9a>] sd_revalidate_disk+0xaa/0x3f0 [sd_mod] [<ffffffffa01b2fb8>] sd_probe_async+0xd8/0x200 [sd_mod] [<ffffffff8107703f>] async_run_entry_fn+0x3f/0x140 [<ffffffff8106a1c5>] process_one_work+0x175/0x410 [<ffffffff8106b373>] worker_thread+0x123/0x400 [<ffffffff8106b250>] ? manage_workers+0x160/0x160 [<ffffffff8107104e>] kthread+0xce/0xf0 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff815f0bac>] ret_from_fork+0x7c/0xb0 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70 Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89 df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 <48> 89 58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d RIP [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150 RSP <ffff880273d03a58> CR2: ffff88046c6c9e80 Martin bisected and found this to be the problem patch; commit 6d113398dcf4dfcd9787a4ead738b186f7b7ff0f Author: Jan Kara <[email protected]> Date: Mon Feb 24 16:39:54 2014 +0100 block: Stop abusing rq->csd.list in blk-softirq and the problem was immediately apparent. The patch states that it is safe to reuse queuelist at completion time, since it is no longer used. However, that is not true if a device is using block enabled tagging. If that is the case, then the queuelist is reused to keep track of busy tags. If a device also ended up using softirq completions, we'd reuse ->queuelist for the IPI handling while block tagging was still using it. Boom. Fix this by adding a new ipi_list list head, and share the memory used with the request hash table. The hash table is never used after the request is moved to the dispatch list, which happens long before any potential completion of the request. Add a new request bit for this, so we don't have cases that check rq->hash while it could potentially have been reused for the IPI completion. Reported-by: Martin K. Petersen <[email protected]> Tested-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2014-04-09blk-mq: simplify blk_mq_hw_sysfs_cpus_show()Jens Axboe1-6/+2
Now that we have a cpu mask of CPUs that are mapped to a specific hardware queue, we can just iterate that to display the sysfs num-hw-queue/cpu_list file. Signed-off-by: Jens Axboe <[email protected]>
2014-04-09blk-mq: ensure that hardware queues are always run on the mapped CPUsJens Axboe1-15/+51
Instead of providing soft mappings with no guarantees on hardware queues always being run on the right CPU, switch to a hard mapping guarantee that ensure that we always run the hardware queue on (one of, if more) the mapped CPU. Signed-off-by: Jens Axboe <[email protected]>
2014-04-09block: add kblockd_schedule_delayed_work_on()Jens Axboe1-0/+7
Same function as kblockd_schedule_delayed_work(), but allow the caller to pass in a CPU that the work should be executed on. This just directly extends and maps into the workqueue API, and will be used to make the blk-mq mappings more strict. Signed-off-by: Jens Axboe <[email protected]>