aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-02-06block: Add Sed-opal libraryScott Bauer6-0/+3074
This patch implements the necessary logic to bring an Opal enabled drive out of a factory-enabled into a working Opal state. This patch set also enables logic to save a password to be replayed during a resume from suspend. Signed-off-by: Scott Bauer <[email protected]> Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-06Include: Uapi: Add user ABI for Sed/OpalScott Bauer1-0/+118
Signed-off-by: Scott Bauer <[email protected]> Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-01block: queue lock must be acquired when iterating over rlsTahsin Erdogan1-0/+2
blk_set_queue_dying() does not acquire queue lock before it calls blk_queue_for_each_rl(). This allows a racing blkg_destroy() to remove blkg->q_node from the linked list and have blk_queue_for_each_rl() loop infitely over the removed blkg->q_node list node. Signed-off-by: Tahsin Erdogan <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-01block: Update comments that refer to __bio_map_user() and bio_map_user()Bart Van Assche1-3/+3
Since __bio_map_user() and bio_map_user() have been removed, update the comments that still refer to these functions. Signed-off-by: Bart Van Assche <[email protected]> References: commit ddad8dd0a162 ("block: use blk_rq_map_user_iov to implement blk_rq_map_user") Cc: Ming Lei <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31blk-mq: don't fail allocating driver tag for stopped hw queueJens Axboe1-3/+0
We rely on blk_mq_get_driver_tag() not failing if 'wait' is true, but it currently fails in that case if the queue happens to be stopped at the time of the call. We don't need to check for stopped here, it's just assigning the tag. If the queue is stopped, we'll handle it when attempting to run the queue. This fixes a stall/crash on flush intensive workloads, where we proceed to process a flush that doesn't have a valid tag assigned. Signed-off-by: Jens Axboe <[email protected]>
2017-01-31nvme/pci: Don't mark IOD as aborted if abort wasn't sentKeith Busch1-2/+1
This patch sets the aborted flag only if an abort was sent, reducing excessive kernel message spamming for completed IO that wasn't actually aborted. Reported-by: Jens Axboe <[email protected]> Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: allow targets to use sysfsJavier González2-0/+15
In order to register through the sysfs interface, a driver needs to know its kobject. On a disk structure, this happens when the partition information is added (device_add_disk), which for lightnvm takes place after the target has been initialized. This means that on target initialization, the kboject has not been created yet. This patch adds a target function to let targets initialize their own kboject as a child of the disk kobject. Signed-off-by: Javier González <[email protected]> Added exit typedef and passed gendisk instead of void pointer for exit. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: free properly on target creation errorJavier González1-1/+1
Fix a memory leak when target creation fails. More specifically, free the entire device structure given to the target (tgt_dev). Signed-off-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: Add CRC read errorJavier González1-0/+1
Let the host differentiate between a read error and a CRC check error on the device side. Signed-off-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: use end_io callback instead of instanceMatias Bjørling6-20/+13
When the lightnvm core had the "gennvm" layer between the device and the target, there was a need for the core to be able to figure out which target it should send an end_io callback to. Leading to a "double" end_io, first for the media manager instance, and then for the target instance. Now that core and gennvm is merged, there is no longer a need for this, and a single end_io callback will do. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: add ioctls for vector I/OsMatias Bjørling4-0/+280
Enable user-space to issue vector I/O commands through ioctls. To issue a vector I/O, the ppa list with addresses is also required and must be mapped for the controller to access. For each ioctl, the result and status bits are returned as well, such that user-space can retrieve the open-channel SSD completion bits. The implementation covers the traditional use-cases of bad block management, and vectored read/write/erase. Signed-off-by: Matias Bjørling <[email protected]> Metadata implementation, test, and fixes. Signed-off-by: Simon A.F. Lund <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: reduce number of nvm_id groups to oneMatias Bjørling4-58/+47
The number of configuration groups has been limited to one in current code, even if there is support for up to four. With the introduction of the open-channel SSD 1.3 specification, only a single group is exposed onwards. Reflect this in the nvm_id structure. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: cleanup nvm transformation functionsMatias Bjørling2-89/+40
Going from target specific ppa addresses to device was accomplished by first converting target to generic ppa addresses and generic to device addresses. The conversion was either open-coded or used the built-in nvm_trans_* and nvm_map_* functions for conversion. Simplify the interface and cleanup the calls to provide clean functions that now either take a list of ppas or a nvm_rq, and is exposed through: void nvm_ppa_* - target to/from device with a list of PPAs, void nvm_rq_* - target to/from device with a nvm_rq. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: make nvm_map_* return voidMatias Bjørling1-32/+9
The only check there was done was a debugging check. Remove it and replace the return value with void to reduce error checking. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: remove nvm_get_bb_tbl and nvm_set_bb_tblMatias Bjørling2-38/+4
Since the merge of gennvm and core, there is no longer a need for the device specific bad block functions. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: remove nvm_submit_ppa* functionsMatias Bjørling2-113/+0
The nvm_submit_ppa* functions are no longer needed after gennvm and core have been merged. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: collapse nvm_erase_ppa and nvm_erase_blkMatias Bjørling2-32/+26
After gennvm and core have been merged, there are no more callers to nvm_erase_ppa. Therefore collapse the device specific and target specific erase functions. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31lightnvm: merge gennvm with coreMatias Bjørling8-1748/+606
For the first iteration of Open-Channel SSDs, it was anticipated that there could be various media managers on top of an open-channel SSD, such to allow vendors to plug in their own host-side FTLs, without the media manager in between. Now that an Open-Channel SSD is exposed as a traditional block device, there is no longer a need for this. Therefore lets merge the gennvm code with core and simplify the stack. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq: fix debugfs compilation issuesOmar Sandoval3-6/+19
This fixes a couple of problems: 1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus. 2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at all. Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option. Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree") Signed-off-by: Omar Sandoval <[email protected]> Augment Kconfig description. Signed-off-by: Jens Axboe <[email protected]>
2017-01-27block: cleanup remaining manual checks for PREFLUSH|FUAJens Axboe2-2/+2
Use op_is_flush() where applicable. Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq-sched: add flush insertion into blk_mq_sched_insert_request()Jens Axboe8-55/+89
Instead of letting the caller check this and handle the details of inserting a flush request, put the logic in the scheduler insertion function. This fixes direct flush insertion outside of the usual make_request_fn calls, like from dm via blk_insert_cloned_request(). Signed-off-by: Jens Axboe <[email protected]>
2017-01-27block: add a op_is_flush helperChristoph Hellwig7-28/+26
This centralizes the checks for bios that needs to be go into the flush state machine. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()Jens Axboe4-14/+23
When we invoke dispatch_requests(), the scheduler empties everything into the passed in list. This isn't always a good thing, since it means that we remove items that we could have potentially merged with. Change the function to dispatch single requests at the time. If we do that, we can backoff exactly at the point where the device can't consume more IO, and leave the rest with the scheduler for better merging and future dispatch decision making. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
2017-01-27blk-mq-sched: fix starvation for multiple hardware queues and shared tagsJens Axboe5-7/+41
If we have both multiple hardware queues and shared tag map between devices, we need to ensure that we propagate the hardware queue restart bit higher up. This is because we can get into a situation where we don't have any IO pending on a hardware queue, yet we fail getting a tag to start new IO. If that happens, it's not enough to mark the hardware queue as needing a restart, we need to bubble that up to the higher level queue as well. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
2017-01-27blk-mq: release driver tag on a requeue eventJens Axboe1-0/+16
We don't want to hold on to this resource when we have a scheduler attached. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
2017-01-27blk-mq: fix potential race in queue restart and driver tag allocationJens Axboe1-1/+9
Once we mark the queue as needing a restart, re-check if we can get a driver tag. This fixes a theoretical issue where the needed IO completes _after_ blk_mq_get_driver_tag() fails, but before we manage to set the restart bit. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
2017-01-27blk-mq: improve scheduler queue sync/async runningJens Axboe1-2/+4
We'll use the same criteria for whether we need to run the queue sync or async when we have a scheduler, as we do without one. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
2017-01-27blk-mq: move hctx and ctx counters from sysfs to debugfsOmar Sandoval2-64/+181
These counters aren't as out-of-place in sysfs as the other stuff, but debugfs is a slightly better home for them. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfsOmar Sandoval2-92/+132
These statistics _might_ be useful to userspace, but it's better not to commit to an ABI for these yet. Also, the dispatched file in sysfs couldn't be cleared, so make it clearable like the others in debugfs. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq: add tags and sched_tags bitmaps to debugfsOmar Sandoval1-0/+50
These can be used to debug issues like tag leaks and stuck requests. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq: move tags and sched_tags info from sysfs to debugfsOmar Sandoval4-45/+86
These are very tied to the blk-mq tag implementation, so exposing them to sysfs isn't a great idea. Move the debugging information to debugfs and add basic entries for the number of tags and the number of reserved tags to sysfs. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq: export software queue pending map to debugfsOmar Sandoval1-0/+21
This is useful for debugging problems where we've gotten stuck with requests in the software queues. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27sbitmap: add helpers for dumping to a seq_fileOmar Sandoval2-0/+121
This is useful debugging information that will be used in the blk-mq debugfs directory. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Changed 'weight' to 'busy'. Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq: add extra request information to debugfsOmar Sandoval1-1/+3
The request pointers by themselves aren't super useful. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfsOmar Sandoval2-57/+106
These lists are only useful for debugging; they definitely don't belong in sysfs. Putting them in debugfs also removes the limitation of a single page of output. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq: add hctx->{state,flags} to debugfsOmar Sandoval1-0/+42
hctx->state could come in handy for bugs where the hardware queue gets stuck in the stopped state, and hctx->flags is just useful to know. Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27blk-mq: create debugfs directory treeOmar Sandoval6-0/+201
In preparation for putting blk-mq debugging information in debugfs, create a directory tree mirroring the one in sysfs: # tree -d /sys/kernel/debug/block /sys/kernel/debug/block |-- nvme0n1 | `-- mq | |-- 0 | | `-- cpu0 | |-- 1 | | `-- cpu1 | |-- 2 | | `-- cpu2 | `-- 3 | `-- cpu3 `-- vda `-- mq `-- 0 |-- cpu0 |-- cpu1 |-- cpu2 `-- cpu3 Also add the scaffolding for the actual files that will go in here, either under the hardware queue or software queue directories. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-26blk-mq-sched: check for successful allocation before assigning tagJens Axboe1-1/+2
We don't trigger this from the normal IO path, since we always use blocking allocations from there. But Bart saw it testing multipath dm, since that is a heavy user of atomic request allocations in the map and clone path. Reported-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-26blk-mq: don't lose flags passed in to blk_mq_alloc_request()Jens Axboe2-4/+4
If we come in from blk_mq_alloc_requst() with NOWAIT set in flags, we must ensure that we don't later overwrite that in blk_mq_sched_get_request(). Initialize alloc_data->flags before passing it in. Reported-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-25blk-mq: only apply active queue tag throttling for driver tagsJens Axboe2-10/+15
If we have a scheduler attached, we have two sets of tags. We don't want to apply our active queue throttling for the scheduler side of tags, that only applies to driver tags since that's the resource we need to dispatch an IO. Signed-off-by: Jens Axboe <[email protected]>
2017-01-23cfq-iosched: Adjust one function call together with a variable assignmentMarkus Elfring1-2/+4
The script "checkpatch.pl" pointed information out like the following. ERROR: do not use assignment in if condition Thus fix the affected source code place. Signed-off-by: Markus Elfring <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-23blk-throttle: Adjust two function calls together with a variable assignmentMarkus Elfring1-2/+4
The script "checkpatch.pl" pointed information out like the following. ERROR: do not use assignment in if condition Thus fix the affected source code places. Signed-off-by: Markus Elfring <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-23block: Initialize cfqq->ioprio_class in cfq_get_queue()Alexander Potapenko1-0/+2
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of uninitialized memory in cfq_init_cfqq(): ================================================================== BUG: KMSAN: use of unitialized memory ... Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51 [<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:? [<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:? [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754 [<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857 ... origin: [<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67 [<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:? [<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:? [< inline >] allocate_slab mm/slub.c:1627 [<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641 [< inline >] new_slab_objects mm/slub.c:2407 [<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564 [< inline >] __slab_alloc mm/slub.c:2606 [< inline >] slab_alloc_node mm/slub.c:2669 [<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746 [<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850 ... ================================================================== (the line numbers are relative to 4.8-rc6, but the bug persists upstream) The uninitialized struct cfq_queue is created by kmem_cache_alloc_node() and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class before it's initialized. Signed-off-by: Alexander Potapenko <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-20blk-mq: allow resize of scheduler requestsJens Axboe3-13/+61
Add support for growing the tags associated with a hardware queue, for the scheduler tags. Currently we only support resizing within the limits of the original depth, change that so we can grow it as well by allocating and replacing the existing scheduler tag set. This is similar to how we could increase the software queue depth with the legacy IO stack and schedulers. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]>
2017-01-19blk-mq: stop hardware queue in blk_mq_delay_queue()Jens Axboe1-0/+1
The run handler we register for the delayed work requires that the queue be stopped, yet we leave that up to the caller. Let's move it into blk_mq_delay_queue() itself, so that the API is sane. This fixes a stall with SCSI, where it calls blk_mq_delay_queue() without having stopped the queue. Hence the queue is never run. Reported-by: Hannes Reinecke <[email protected]> Fixes: 70f4db639c5b ("blk-mq: add blk_mq_delay_queue") Signed-off-by: Jens Axboe <[email protected]>
2017-01-19blk-mq-tag: remove redundant check for 'data->hctx' being non-NULLJens Axboe1-4/+2
We used to pass in NULL for hctx for reserved tags, but we don't do that anymore. Hence the check for whether hctx is NULL or not is now redundant, kill it. Reported-by: Dan Carpenter <[email protected]> Fixes: a642a158aec6 ("blk-mq-tag: cleanup the normal/reserved tag allocation") Signed-off-by: Jens Axboe <[email protected]>
2017-01-19elevator: fix unnecessary put of elevator in failure caseJens Axboe1-4/+0
We already checked that e is NULL, so no point in calling elevator_put() to free it. Reported-by: Dan Carpenter <[email protected]> Fixes: dc877dbd088f ("blk-mq-sched: add framework for MQ capable IO schedulers") Signed-off-by: Jens Axboe <[email protected]>
2017-01-18blk-cgroup: don't quiesce the queue on policy activate/deactivateJens Axboe1-12/+8
There's no potential harm in quiescing the queue, but it also doesn't buy us anything. And we can't run the queue async for policy deactivate, since we could be in the path of tearing the queue down. If we schedule an async run of the queue at that time, we're racing with queue teardown AFTER having we've already torn most of it down. Reported-by: Omar Sandoval <[email protected]> Fixes: 4d199c6f1c84 ("blk-cgroup: ensure that we clear the stop bit on quiesced queues") Tested-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-18sbitmap: fix wakeup hang after sbq resizeOmar Sandoval1-5/+30
When we resize a struct sbitmap_queue, we update the wakeup batch size, but we don't update the wait count in the struct sbq_wait_states. If we resized down from a size which could use a bigger batch size, these counts could be too large and cause us to miss necessary wakeups. To fix this, update the wait counts when we resize (ensuring some careful memory ordering so that it's safe w.r.t. concurrent clears). This also fixes a theoretical issue where two threads could end up bumping the wait count up by the batch size, which could also potentially lead to hangs. Reported-by: Martin Raiber <[email protected]> Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs") Fixes: 2971c35f3588 ("blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt") Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-18sbitmap: use smp_mb__after_atomic() in sbq_wake_up()Omar Sandoval1-3/+10
We always do an atomic clear_bit() right before we call sbq_wake_up(), so we can use smp_mb__after_atomic(). While we're here, comment the memory barriers in here a little more. Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>