Age | Commit message (Collapse) | Author | Files | Lines |
|
When CONFIG_KASAN is enabled, compilation fails:
block/sed-opal.c: In function 'sed_ioctl':
block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
Moved all the ioctl structures off the stack and dynamically allocate
using _IOC_SIZE()
Fixes: 455a7b238cd6 ("block: Add Sed-opal library")
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Scott Bauer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The IOC_OPAL_ACTIVATE_LSP took the wrong strcure which would
give us the wrong size when using _IOC_SIZE, switch it to the
right structure.
Fixes: 058f8a2 ("Include: Uapi: Add user ABI for Sed/Opal")
Signed-off-by: Scott Bauer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since function tables are a common target for attackers, it's best to keep
them in read-only memory. As such, this makes the CDROM device ops tables
const. This drops additionally n_minors, since it isn't used meaningfully,
and sets the only user of cdrom_dummy_generic_packet explicitly so the
variables can all be const.
Inspired by similar changes in grsecurity/PaX.
Signed-off-by: Kees Cook <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The old elevator= boot parameter blindly attempts to load the
same scheduler for mq and !mq devices, leading to a crash if
we specify the wrong one.
Ensure that we only apply this boot parameter to old !mq devices.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Simple cleanup to use the new APIs.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Don Brace <[email protected]>
Tested-by: Don Brace <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Inside set_status, transfer need to setup again, so
we have to drain IO before the transition, otherwise
oops may be triggered like the following:
divide error: 0000 [#1] SMP KASAN
CPU: 0 PID: 2935 Comm: loop7 Not tainted 4.10.0-rc7+ #213
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
task: ffff88006ba1e840 task.stack: ffff880067338000
RIP: 0010:transfer_xor+0x1d1/0x440 drivers/block/loop.c:110
RSP: 0018:ffff88006733f108 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800688d7000 RCX: 0000000000000059
RDX: 0000000000000000 RSI: 1ffff1000d743f43 RDI: ffff880068891c08
RBP: ffff88006733f160 R08: ffff8800688d7001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800688d7000
R13: ffff880067b7d000 R14: dffffc0000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88006d000000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006c17e0 CR3: 0000000066e3b000 CR4: 00000000001406f0
Call Trace:
lo_do_transfer drivers/block/loop.c:251 [inline]
lo_read_transfer drivers/block/loop.c:392 [inline]
do_req_filebacked drivers/block/loop.c:541 [inline]
loop_handle_cmd drivers/block/loop.c:1677 [inline]
loop_queue_work+0xda0/0x49b0 drivers/block/loop.c:1689
kthread_worker_fn+0x4c3/0xa30 kernel/kthread.c:630
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 03 83 e2 07 41 29 df 42 0f b6 04 30 4d 8d 44 24 01 38 d0 7f 08
84 c0 0f 85 62 02 00 00 44 89 f8 41 0f b6 48 ff 25 ff 01 00 00 99 <f7>
7d c8 48 63 d2 48 03 55 d0 48 89 d0 48 89 d7 48 c1 e8 03 83
RIP: transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP:
ffff88006733f108
---[ end trace 0166f7bd3b0c0933 ]---
Reported-by: Dmitry Vyukov <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
Tested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In case of error, 'err' is known to be 0 here, because of the previous
test. Set it to a -ENOMEM instead.
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch is a quick fixup of the user structures that will prevent
the structures from being different sizes on 32 and 64 bit archs.
Taking this fix will allow us to *NOT* have to do compat ioctls for
the sed code.
Signed-off-by: Scott Bauer <[email protected]>
Fixes: 19641f2d7674 ("Include: Uapi: Add user ABI for Sed/Opal")
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch implements the necessary logic to unlock an Opal
enabled device coming back from an S3.
The patch also implements the SED/Opal allocation necessary to support
the opal ioctls.
Signed-off-by: Scott Bauer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch implements the necessary logic to bring an Opal
enabled drive out of a factory-enabled into a working
Opal state.
This patch set also enables logic to save a password to
be replayed during a resume from suspend.
Signed-off-by: Scott Bauer <[email protected]>
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Signed-off-by: Scott Bauer <[email protected]>
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
blk_set_queue_dying() does not acquire queue lock before it calls
blk_queue_for_each_rl(). This allows a racing blkg_destroy() to
remove blkg->q_node from the linked list and have
blk_queue_for_each_rl() loop infitely over the removed blkg->q_node
list node.
Signed-off-by: Tahsin Erdogan <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since __bio_map_user() and bio_map_user() have been removed, update
the comments that still refer to these functions.
Signed-off-by: Bart Van Assche <[email protected]>
References: commit ddad8dd0a162 ("block: use blk_rq_map_user_iov to implement blk_rq_map_user")
Cc: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We rely on blk_mq_get_driver_tag() not failing if 'wait' is true,
but it currently fails in that case if the queue happens to be
stopped at the time of the call.
We don't need to check for stopped here, it's just assigning
the tag. If the queue is stopped, we'll handle it when
attempting to run the queue.
This fixes a stall/crash on flush intensive workloads, where
we proceed to process a flush that doesn't have a valid tag
assigned.
Signed-off-by: Jens Axboe <[email protected]>
|
|
This patch sets the aborted flag only if an abort was sent, reducing
excessive kernel message spamming for completed IO that wasn't actually
aborted.
Reported-by: Jens Axboe <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In order to register through the sysfs interface, a driver needs to know
its kobject. On a disk structure, this happens when the partition
information is added (device_add_disk), which for lightnvm takes place
after the target has been initialized. This means that on target
initialization, the kboject has not been created yet.
This patch adds a target function to let targets initialize their own
kboject as a child of the disk kobject.
Signed-off-by: Javier González <[email protected]>
Added exit typedef and passed gendisk instead of void pointer for exit.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fix a memory leak when target creation fails. More specifically, free
the entire device structure given to the target (tgt_dev).
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Let the host differentiate between a read error and a CRC check error on
the device side.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When the lightnvm core had the "gennvm" layer between the device and the
target, there was a need for the core to be able to figure out which
target it should send an end_io callback to. Leading to a "double"
end_io, first for the media manager instance, and then for the target
instance. Now that core and gennvm is merged, there is no longer a need
for this, and a single end_io callback will do.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Enable user-space to issue vector I/O commands through ioctls. To issue
a vector I/O, the ppa list with addresses is also required and must be
mapped for the controller to access.
For each ioctl, the result and status bits are returned as well, such
that user-space can retrieve the open-channel SSD completion bits.
The implementation covers the traditional use-cases of bad block
management, and vectored read/write/erase.
Signed-off-by: Matias Bjørling <[email protected]>
Metadata implementation, test, and fixes.
Signed-off-by: Simon A.F. Lund <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The number of configuration groups has been limited to one in current
code, even if there is support for up to four. With the introduction
of the open-channel SSD 1.3 specification, only a single
group is exposed onwards. Reflect this in the nvm_id structure.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Going from target specific ppa addresses to device was accomplished by
first converting target to generic ppa addresses and generic to device
addresses. The conversion was either open-coded or used the built-in
nvm_trans_* and nvm_map_* functions for conversion. Simplify the
interface and cleanup the calls to provide clean functions that now
either take a list of ppas or a nvm_rq, and is exposed through:
void nvm_ppa_* - target to/from device with a list of PPAs,
void nvm_rq_* - target to/from device with a nvm_rq.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The only check there was done was a debugging check. Remove it and
replace the return value with void to reduce error checking.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since the merge of gennvm and core, there is no longer a need for the
device specific bad block functions.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The nvm_submit_ppa* functions are no longer needed after gennvm and core
have been merged.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
After gennvm and core have been merged, there are no more callers to
nvm_erase_ppa. Therefore collapse the device specific and target
specific erase functions.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
For the first iteration of Open-Channel SSDs, it was anticipated that
there could be various media managers on top of an open-channel SSD,
such to allow vendors to plug in their own host-side FTLs, without the
media manager in between.
Now that an Open-Channel SSD is exposed as a traditional block device,
there is no longer a need for this. Therefore lets merge the gennvm code
with core and simplify the stack.
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This fixes a couple of problems:
1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus.
2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at
all.
Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option.
Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree")
Signed-off-by: Omar Sandoval <[email protected]>
Augment Kconfig description.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use op_is_flush() where applicable.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of letting the caller check this and handle the details
of inserting a flush request, put the logic in the scheduler
insertion function. This fixes direct flush insertion outside
of the usual make_request_fn calls, like from dm via
blk_insert_cloned_request().
Signed-off-by: Jens Axboe <[email protected]>
|
|
This centralizes the checks for bios that needs to be go into the flush
state machine.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When we invoke dispatch_requests(), the scheduler empties everything
into the passed in list. This isn't always a good thing, since it
means that we remove items that we could have potentially merged
with.
Change the function to dispatch single requests at the time. If
we do that, we can backoff exactly at the point where the device
can't consume more IO, and leave the rest with the scheduler for
better merging and future dispatch decision making.
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
If we have both multiple hardware queues and shared tag map between
devices, we need to ensure that we propagate the hardware queue
restart bit higher up. This is because we can get into a situation
where we don't have any IO pending on a hardware queue, yet we fail
getting a tag to start new IO. If that happens, it's not enough to
mark the hardware queue as needing a restart, we need to bubble
that up to the higher level queue as well.
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
We don't want to hold on to this resource when we have a scheduler
attached.
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
Once we mark the queue as needing a restart, re-check if we can
get a driver tag. This fixes a theoretical issue where the needed
IO completes _after_ blk_mq_get_driver_tag() fails, but before we
manage to set the restart bit.
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
We'll use the same criteria for whether we need to run the queue sync
or async when we have a scheduler, as we do without one.
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Tested-by: Hannes Reinecke <[email protected]>
|
|
These counters aren't as out-of-place in sysfs as the other stuff, but
debugfs is a slightly better home for them.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
These statistics _might_ be useful to userspace, but it's better not to
commit to an ABI for these yet. Also, the dispatched file in sysfs
couldn't be cleared, so make it clearable like the others in debugfs.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
These can be used to debug issues like tag leaks and stuck requests.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
These are very tied to the blk-mq tag implementation, so exposing them
to sysfs isn't a great idea. Move the debugging information to debugfs
and add basic entries for the number of tags and the number of reserved
tags to sysfs.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This is useful for debugging problems where we've gotten stuck with
requests in the software queues.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
This is useful debugging information that will be used in the blk-mq
debugfs directory.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Changed 'weight' to 'busy'.
Signed-off-by: Jens Axboe <[email protected]>
|
|
The request pointers by themselves aren't super useful.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
These lists are only useful for debugging; they definitely don't belong
in sysfs. Putting them in debugfs also removes the limitation of a
single page of output.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
hctx->state could come in handy for bugs where the hardware queue gets
stuck in the stopped state, and hctx->flags is just useful to know.
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In preparation for putting blk-mq debugging information in debugfs,
create a directory tree mirroring the one in sysfs:
# tree -d /sys/kernel/debug/block
/sys/kernel/debug/block
|-- nvme0n1
| `-- mq
| |-- 0
| | `-- cpu0
| |-- 1
| | `-- cpu1
| |-- 2
| | `-- cpu2
| `-- 3
| `-- cpu3
`-- vda
`-- mq
`-- 0
|-- cpu0
|-- cpu1
|-- cpu2
`-- cpu3
Also add the scaffolding for the actual files that will go in here,
either under the hardware queue or software queue directories.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We don't trigger this from the normal IO path, since we always use
blocking allocations from there. But Bart saw it testing multipath
dm, since that is a heavy user of atomic request allocations in
the map and clone path.
Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If we come in from blk_mq_alloc_requst() with NOWAIT set in flags,
we must ensure that we don't later overwrite that in
blk_mq_sched_get_request(). Initialize alloc_data->flags before
passing it in.
Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If we have a scheduler attached, we have two sets of tags. We don't
want to apply our active queue throttling for the scheduler side
of tags, that only applies to driver tags since that's the resource
we need to dispatch an IO.
Signed-off-by: Jens Axboe <[email protected]>
|
|
The script "checkpatch.pl" pointed information out like the following.
ERROR: do not use assignment in if condition
Thus fix the affected source code place.
Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|