|
This patch implements the logic necessary to bring an Opal-enabled
drive out of its factory state into a working Opal state.
This patch set also adds logic to save a password so it can be
replayed during a resume from suspend.
Signed-off-by: Scott Bauer <[email protected]>
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Signed-off-by: Scott Bauer <[email protected]>
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
blk_set_queue_dying() does not acquire the queue lock before it calls
blk_queue_for_each_rl(). This allows a racing blkg_destroy() to
remove blkg->q_node from the linked list and leave
blk_queue_for_each_rl() looping infinitely over the removed
blkg->q_node list node.
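A minimal sketch of the fix, assuming the 4.10-era locking rules where
blkg_destroy() runs with q->queue_lock held:

    void blk_set_queue_dying(struct request_queue *q)
    {
        struct request_list *rl;

        queue_flag_set_unlocked(QUEUE_FLAG_DYING, q);

        /* (blk-mq queues wake their tag waiters elsewhere) */

        /*
         * Take the queue lock so a racing blkg_destroy() cannot unlink
         * blkg->q_node while the request lists are being walked.
         */
        spin_lock_irq(q->queue_lock);
        blk_queue_for_each_rl(rl, q) {
            if (rl->rq_pool) {
                wake_up(&rl->wait[BLK_RW_SYNC]);
                wake_up(&rl->wait[BLK_RW_ASYNC]);
            }
        }
        spin_unlock_irq(q->queue_lock);
    }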
Signed-off-by: Tahsin Erdogan <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since __bio_map_user() and bio_map_user() have been removed, update
the comments that still refer to these functions.
Signed-off-by: Bart Van Assche <[email protected]>
References: commit ddad8dd0a162 ("block: use blk_rq_map_user_iov to implement blk_rq_map_user")
Cc: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We rely on blk_mq_get_driver_tag() not failing if 'wait' is true,
but it currently fails in that case if the queue happens to be
stopped at the time of the call.
We don't need to check for stopped here, it's just assigning
the tag. If the queue is stopped, we'll handle it when
attempting to run the queue.
This fixes a stall/crash on flush intensive workloads, where
we proceed to process a flush that doesn't have a valid tag
assigned.
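An illustrative shape of the allocation path after the change (function
and field names assumed, not the exact kernel code):

    static bool get_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq,
                               bool wait)
    {
        struct blk_mq_alloc_data data = {
            .q     = rq->q,
            .hctx  = hctx,
            .flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
        };

        if (rq->tag != -1)
            return true;    /* already carries a driver tag */

        /*
         * Note: no blk_mq_hctx_stopped() bail-out here. Assigning a tag
         * only claims a dispatch resource; a stopped queue is handled
         * later, when we attempt to run it.
         */
        rq->tag = blk_mq_get_tag(&data);
        return rq->tag != -1;
    }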
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch sets the aborted flag only if an abort was sent, reducing
excessive kernel message spamming for completed IO that wasn't actually
aborted.
Reported-by: Jens Axboe <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In order to register through the sysfs interface, a driver needs to know
its kobject. On a disk structure, this happens when the partition
information is added (device_add_disk), which for lightnvm takes place
after the target has been initialized. This means that at target
initialization time, the kobject has not been created yet.
This patch adds a target function to let targets initialize their own
kobject as a child of the disk kobject.
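A sketch of the resulting hooks (typedef and field names are
assumptions, not verbatim from the patch):

    typedef int (nvm_tgt_sysfs_init_fn)(struct gendisk *);
    typedef void (nvm_tgt_sysfs_exit_fn)(struct gendisk *);

    struct nvm_tgt_type {
        /* ... existing target ops elided ... */
        nvm_tgt_sysfs_init_fn *sysfs_init;  /* called once device_add_disk()
                                               has created the disk kobject */
        nvm_tgt_sysfs_exit_fn *sysfs_exit;  /* takes the gendisk so the child
                                               kobject can be torn down */
    };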
Signed-off-by: Javier González <[email protected]>
Added exit typedef and passed gendisk instead of void pointer for exit.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fix a memory leak when target creation fails. More specifically, free
the entire device structure given to the target (tgt_dev).
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Let the host differentiate between a read error and a CRC check error on
the device side.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When the lightnvm core had the "gennvm" layer between the device and the
target, there was a need for the core to be able to figure out which
target it should send an end_io callback to, leading to a "double"
end_io: first for the media manager instance, and then for the target
instance. Now that core and gennvm are merged, there is no longer a need
for this, and a single end_io callback will do.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Enable user-space to issue vector I/O commands through ioctls. To issue
a vector I/O, the ppa list with addresses is also required and must be
mapped for the controller to access.
For each ioctl, the result and status bits are returned as well, such
that user-space can retrieve the open-channel SSD completion bits.
The implementation covers the traditional use-cases of bad block
management, and vectored read/write/erase.
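A rough user-space sketch of issuing a vector read (the ioctl name,
struct fields, and the 0x92 opcode are assumptions; the authoritative
definitions live in the uapi lightnvm header):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/lightnvm.h>  /* assumed: struct nvm_user_vio, ioctls */

    int main(void)
    {
        struct nvm_user_vio vio = { 0 };
        int fd = open("/dev/nvme0n1", O_RDWR);

        if (fd < 0)
            return 1;
        vio.opcode = 0x92;    /* assumption: open-channel vector read */
        /* vio.ppa_list / vio.nppas / data buffers set up here */
        if (ioctl(fd, NVME_NVM_IOCTL_SUBMIT_VIO, &vio) < 0)
            perror("submit vio");
        /* completion bits come back in vio.result and vio.status */
        return 0;
    }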
Signed-off-by: Matias Bjørling <[email protected]>
Metadata implementation, test, and fixes.
Signed-off-by: Simon A.F. Lund <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The number of configuration groups has been limited to one in the
current code, even though there is support for up to four. With the
introduction of the open-channel SSD 1.3 specification, only a single
group is exposed going forward. Reflect this in the nvm_id structure.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Going from target-specific ppa addresses to device addresses was
accomplished by first converting target addresses to generic ppa
addresses, and then generic addresses to device addresses. The
conversion was either open-coded or used the built-in nvm_trans_* and
nvm_map_* functions. Simplify the interface and clean up the calls to
provide clean functions that now take either a list of ppas or a
nvm_rq, and are exposed through:
void nvm_ppa_* - target to/from device with a list of PPAs,
void nvm_rq_* - target to/from device with a nvm_rq.
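The consolidated interface then looks roughly like this (argument types
assumed):

    /* target to/from device with a list of PPAs */
    void nvm_ppa_tgt_to_dev(struct nvm_tgt_dev *tgt_dev,
                            struct ppa_addr *ppa_list, int nr_ppas);
    void nvm_ppa_dev_to_tgt(struct nvm_tgt_dev *tgt_dev,
                            struct ppa_addr *ppa_list, int nr_ppas);

    /* target to/from device with a nvm_rq */
    void nvm_rq_tgt_to_dev(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd);
    void nvm_rq_dev_to_tgt(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd);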
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The only check that was performed was a debugging check. Remove it, and
change the return type to void to reduce error checking.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since the merge of gennvm and core, there is no longer a need for the
device specific bad block functions.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The nvm_submit_ppa* functions are no longer needed after gennvm and core
have been merged.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
After gennvm and core have been merged, there are no more callers to
nvm_erase_ppa. Therefore collapse the device specific and target
specific erase functions.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
For the first iteration of Open-Channel SSDs, it was anticipated that
there could be various media managers on top of an open-channel SSD,
allowing vendors to plug in their own host-side FTLs without a media
manager in between.
Now that an Open-Channel SSD is exposed as a traditional block device,
there is no longer a need for this. Therefore, let's merge the gennvm
code with core and simplify the stack.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This fixes a couple of problems:
1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus.
2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at
all.
Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option.
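The stub pattern ends up along these lines (a sketch, not the exact
header):

    #ifdef CONFIG_BLK_DEBUG_FS
    int blk_mq_debugfs_register(struct request_queue *q, const char *name);
    void blk_mq_debugfs_unregister(struct request_queue *q);
    #else
    static inline int blk_mq_debugfs_register(struct request_queue *q,
                                              const char *name)
    {
        return 0;    /* debugfs support compiled out: succeed quietly */
    }

    static inline void blk_mq_debugfs_unregister(struct request_queue *q)
    {
    }
    #endif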
Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree")
Signed-off-by: Omar Sandoval <[email protected]>
Augment Kconfig description.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use op_is_flush() where applicable.
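For reference, the helper boils down to a flag test (per blk_types.h of
that era):

    static inline bool op_is_flush(unsigned int op)
    {
        /* true for anything that must go through the flush machinery */
        return op & (REQ_FUA | REQ_PREFLUSH);
    }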
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of letting the caller check this and handle the details
of inserting a flush request, put the logic in the scheduler
insertion function. This fixes direct flush insertion outside
of the usual make_request_fn calls, like from dm via
blk_insert_cloned_request().
Signed-off-by: Jens Axboe <[email protected]>
|
|
This centralizes the checks for bios that need to go into the flush
state machine.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When we invoke dispatch_requests(), the scheduler empties everything
into the passed in list. This isn't always a good thing, since it
means that we remove items that we could have potentially merged
with.
Change the function to dispatch one request at a time. If
we do that, we can back off exactly at the point where the device
can't consume more IO, and leave the rest with the scheduler for
better merging and future dispatch decision making.
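A simplified sketch of the resulting loop (helper names per the
4.11-era blk-mq scheduler API; error handling elided):

    struct elevator_queue *e = hctx->queue->elevator;
    LIST_HEAD(rq_list);

    do {
        struct request *rq;

        rq = e->type->ops.mq.dispatch_request(hctx);
        if (!rq)
            break;    /* scheduler is empty, stop pulling */
        list_add(&rq->queuelist, &rq_list);
        /*
         * blk_mq_dispatch_rq_list() returns false once the driver can't
         * take more; anything not yet pulled stays with the scheduler,
         * where it remains visible for merging.
         */
    } while (blk_mq_dispatch_rq_list(hctx, &rq_list));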
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
If we have both multiple hardware queues and shared tag map between
devices, we need to ensure that we propagate the hardware queue
restart bit higher up. This is because we can get into a situation
where we don't have any IO pending on a hardware queue, yet we fail
getting a tag to start new IO. If that happens, it's not enough to
mark the hardware queue as needing a restart, we need to bubble
that up to the higher level queue as well.
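A sketch of the marking logic (close to, but not verbatim, the patched
helper):

    static void blk_mq_sched_mark_restart(struct blk_mq_hw_ctx *hctx)
    {
        if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
            set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
            if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
                struct request_queue *q = hctx->queue;

                /*
                 * Shared tags: the completion that frees a tag may
                 * happen on another queue, so bubble the restart up
                 * to the queue level as well.
                 */
                if (!test_bit(QUEUE_FLAG_RESTART, &q->queue_flags))
                    set_bit(QUEUE_FLAG_RESTART, &q->queue_flags);
            }
        }
    }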
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
We don't want to hold on to this resource when we have a scheduler
attached.
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
Once we mark the queue as needing a restart, re-check if we can
get a driver tag. This fixes a theoretical issue where the needed
IO completes _after_ blk_mq_get_driver_tag() fails, but before we
manage to set the restart bit.
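In the dispatch loop, that amounts to a second attempt after the bit is
set (sketch):

    if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
        blk_mq_sched_mark_restart(hctx);
        /*
         * The IO that frees our tag may have completed between the
         * failed allocation and setting the restart bit, in which
         * case nobody will restart us. Re-check before giving up.
         */
        if (!blk_mq_get_driver_tag(rq, &hctx, false))
            break;
    }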
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
We'll use the same criteria for whether we need to run the queue sync
or async when we have a scheduler, as we do without one.
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
These counters aren't as out-of-place in sysfs as the other stuff, but
debugfs is a slightly better home for them.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
These statistics _might_ be useful to userspace, but it's better not to
commit to an ABI for these yet. Also, the dispatched file in sysfs
couldn't be cleared, so make it clearable like the others in debugfs.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
These can be used to debug issues like tag leaks and stuck requests.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
These are very tied to the blk-mq tag implementation, so exposing them
to sysfs isn't a great idea. Move the debugging information to debugfs
and add basic entries for the number of tags and the number of reserved
tags to sysfs.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This is useful for debugging problems where we've gotten stuck with
requests in the software queues.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This is useful debugging information that will be used in the blk-mq
debugfs directory.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Changed 'weight' to 'busy'.
Signed-off-by: Jens Axboe <[email protected]>
|
|
The request pointers by themselves aren't super useful.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
These lists are only useful for debugging; they definitely don't belong
in sysfs. Putting them in debugfs also removes the limitation of a
single page of output.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
hctx->state could come in handy for bugs where the hardware queue gets
stuck in the stopped state, and hctx->flags is just useful to know.
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In preparation for putting blk-mq debugging information in debugfs,
create a directory tree mirroring the one in sysfs:
# tree -d /sys/kernel/debug/block
/sys/kernel/debug/block
|-- nvme0n1
| `-- mq
| |-- 0
| | `-- cpu0
| |-- 1
| | `-- cpu1
| |-- 2
| | `-- cpu2
| `-- 3
| `-- cpu3
`-- vda
`-- mq
`-- 0
|-- cpu0
|-- cpu1
|-- cpu2
`-- cpu3
Also add the scaffolding for the actual files that will go in here,
either under the hardware queue or software queue directories.
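Creating the tree is plain debugfs plumbing, roughly as follows (the
dentry field names are assumed):

    struct dentry *blk_debugfs_root;    /* created once at block init */

    int blk_mq_debugfs_register(struct request_queue *q, const char *name)
    {
        if (!blk_debugfs_root)
            return -ENOENT;

        q->debugfs_dir = debugfs_create_dir(name, blk_debugfs_root);
        if (!q->debugfs_dir)
            return -ENOMEM;
        /* per-hctx "<n>" and per-ctx "cpu<n>" dirs hang off "mq" */
        q->mq_debugfs_dir = debugfs_create_dir("mq", q->debugfs_dir);
        return q->mq_debugfs_dir ? 0 : -ENOMEM;
    }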
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We don't trigger this from the normal IO path, since we always use
blocking allocations from there. But Bart saw it testing multipath
dm, since that is a heavy user of atomic request allocations in
the map and clone path.
Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If we come in from blk_mq_alloc_request() with NOWAIT set in flags,
we must ensure that we don't later overwrite that in
blk_mq_sched_get_request(). Initialize alloc_data->flags before
passing it in.
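The fix is a one-liner in spirit: seed the allocation data with the
caller's flags (sketch):

    struct blk_mq_alloc_data alloc_data = { .flags = flags };

    /*
     * alloc_data.flags now carries BLK_MQ_REQ_NOWAIT (if the caller set
     * it) into blk_mq_sched_get_request() instead of being zeroed there.
     */
    rq = blk_mq_sched_get_request(q, NULL, op, &alloc_data);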
Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If we have a scheduler attached, we have two sets of tags. We don't
want to apply our active queue throttling for the scheduler side
of tags, that only applies to driver tags since that's the resource
we need to dispatch an IO.
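The split is visible in how tags are picked per allocation (a sketch,
close to the blk-mq helper):

    static struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
    {
        /*
         * Scheduler-side allocations are marked internal and use
         * sched_tags; only the driver-tag path below is subject to
         * active-queue throttling.
         */
        if (data->flags & BLK_MQ_REQ_INTERNAL)
            return data->hctx->sched_tags;

        return data->hctx->tags;
    }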
Signed-off-by: Jens Axboe <[email protected]>
|
|
The script "checkpatch.pl" pointed information out like the following.
ERROR: do not use assignment in if condition
Thus fix the affected source code place.
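The transformation is mechanical; an illustrative before/after (not the
actual hunk):

    /* before: assignment buried in the condition */
    if (!(q = blk_alloc_queue(GFP_KERNEL)))
        return -ENOMEM;

    /* after: assignment and test separated */
    q = blk_alloc_queue(GFP_KERNEL);
    if (!q)
        return -ENOMEM;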
Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The script "checkpatch.pl" pointed information out like the following.
ERROR: do not use assignment in if condition
Thus fix the affected source code places.
Signed-off-by: Markus Elfring <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in cfq_init_cfqq():
==================================================================
BUG: KMSAN: use of unitialized memory
...
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
[<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:?
[<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:?
[< inline >] cfq_init_cfqq block/cfq-iosched.c:3754
[<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
...
origin:
[<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
[<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:?
[<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:?
[< inline >] allocate_slab mm/slub.c:1627
[<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641
[< inline >] new_slab_objects mm/slub.c:2407
[<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
[< inline >] __slab_alloc mm/slub.c:2606
[< inline >] slab_alloc_node mm/slub.c:2669
[<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
[<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
...
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)
The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
before it's initialized.
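One plausible fix (the actual patch may differ) is to zero the
allocation, which leaves ioprio_class at IOPRIO_CLASS_NONE:

    /* __GFP_ZERO: every field, including ioprio_class, starts at 0 */
    cfqq = kmem_cache_alloc_node(cfq_pool,
                                 GFP_NOWAIT | __GFP_ZERO,
                                 cfqd->queue->node);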
Signed-off-by: Alexander Potapenko <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add support for growing the tags associated with a hardware queue, for
the scheduler tags. Currently we only support resizing within the
limits of the original depth, change that so we can grow it as well by
allocating and replacing the existing scheduler tag set.
This is similar to how we could increase the software queue depth with
the legacy IO stack and schedulers.
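A sketch of the grow path (helper names assumed from the blk-mq tag
code of that time):

    if (tdepth > tags->nr_tags) {
        struct blk_mq_tags *new;

        /* allocate a bigger scheduler tag set ... */
        new = blk_mq_alloc_rq_map(set, hctx_idx, tdepth,
                                  tags->nr_reserved_tags);
        if (!new)
            return -ENOMEM;
        /* ... and swap it in for the old one */
        blk_mq_free_rq_map(*tagsptr);
        *tagsptr = new;
    } else {
        /* shrinking within the original depth stays a bitmap resize */
        sbitmap_queue_resize(&tags->bitmap_tags, tdepth);
    }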
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
|
|
The run handler we register for the delayed work requires that the
queue be stopped, yet we leave that up to the caller. Let's move
it into blk_mq_delay_queue() itself, so that the API is sane.
This fixes a stall with SCSI, where it calls blk_mq_delay_queue()
without having stopped the queue. Hence the queue is never run.
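After the change, the function roughly reads (sketch):

    void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
    {
        /*
         * Stop the queue ourselves; the delayed work handler clears the
         * stopped state and runs the queue, so callers no longer need
         * to remember to stop it first.
         */
        blk_mq_stop_hw_queue(hctx);
        kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
                                         &hctx->delay_work,
                                         msecs_to_jiffies(msecs));
    }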
Reported-by: Hannes Reinecke <[email protected]>
Fixes: 70f4db639c5b ("blk-mq: add blk_mq_delay_queue")
Signed-off-by: Jens Axboe <[email protected]>
|
|
We used to pass in NULL for hctx for reserved tags, but we don't
do that anymore. Hence the check for whether hctx is NULL or not
is now redundant, kill it.
Reported-by: Dan Carpenter <[email protected]>
Fixes: a642a158aec6 ("blk-mq-tag: cleanup the normal/reserved tag allocation")
Signed-off-by: Jens Axboe <[email protected]>
|
|
We already checked that e is NULL, so no point in calling
elevator_put() to free it.
Reported-by: Dan Carpenter <[email protected]>
Fixes: dc877dbd088f ("blk-mq-sched: add framework for MQ capable IO schedulers")
Signed-off-by: Jens Axboe <[email protected]>
|
|
There's no potential harm in quiescing the queue, but it also doesn't
buy us anything. And we can't run the queue async for policy
deactivate, since we could be in the path of tearing the queue down.
If we schedule an async run of the queue at that time, we're racing
with queue teardown after we've already torn most of it down.
Reported-by: Omar Sandoval <[email protected]>
Fixes: 4d199c6f1c84 ("blk-cgroup: ensure that we clear the stop bit on quiesced queues")
Tested-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When we resize a struct sbitmap_queue, we update the wakeup batch size,
but we don't update the wait count in the struct sbq_wait_states. If we
resized down from a size which could use a bigger batch size, these
counts could be too large and cause us to miss necessary wakeups. To fix
this, update the wait counts when we resize (ensuring some careful
memory ordering so that it's safe w.r.t. concurrent clears).
This also fixes a theoretical issue where two threads could end up
bumping the wait count up by the batch size, which could also
potentially lead to hangs.
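A simplified sketch of the resize-side update (modeled on lib/sbitmap.c;
the ordering comment is condensed):

    static void sbq_update_wake_batch(struct sbitmap_queue *sbq,
                                      unsigned int depth)
    {
        unsigned int wake_batch = sbq_calc_wake_batch(depth);
        int i;

        if (sbq->wake_batch != wake_batch) {
            WRITE_ONCE(sbq->wake_batch, wake_batch);
            /*
             * Pairs with the barrier in sbq_wake_up(): the new batch
             * size must be visible before the wait counts are reset.
             */
            smp_mb__before_atomic();
            for (i = 0; i < SBQ_WAIT_QUEUES; i++)
                atomic_set(&sbq->ws[i].wait_cnt, 1);
        }
    }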
Reported-by: Martin Raiber <[email protected]>
Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")
Fixes: 2971c35f3588 ("blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt")
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We always do an atomic clear_bit() right before we call sbq_wake_up(),
so we can use smp_mb__after_atomic(). While we're here, comment the
memory barriers in here a little more.
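The top of sbq_wake_up() then looks like this (sketch):

    static void sbq_wake_up(struct sbitmap_queue *sbq)
    {
        /*
         * The caller just did an atomic clear_bit(), so the cheaper
         * smp_mb__after_atomic() is enough. It pairs with the barrier
         * in prepare_to_wait()/set_current_state() on the waiter side:
         * either the waker sees the waiter on the waitqueue, or the
         * waiter sees the freed bit.
         */
        smp_mb__after_atomic();

        /* ... pick a wait queue and decrement its wait count ... */
    }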
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|