Age  Commit message  [Author, Files, Lines]
2017-02-14  Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN  [Scott Bauer, 3 files, -90/+50]
When CONFIG_KASAN is enabled, compilation fails: block/sed-opal.c: In function 'sed_ioctl': block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Move all the ioctl structures off the stack and allocate them dynamically, using _IOC_SIZE() to determine the size. Fixes: 455a7b238cd6 ("block: Add Sed-opal library") Reported-by: Arnd Bergmann <[email protected]> Signed-off-by: Scott Bauer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
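The idea of sizing a heap buffer from the command number can be sketched in userspace as follows. This is only an illustration under stated assumptions: the `demo_*` structures, the `DEMO_IOC_*` command numbers, and `demo_copy_arg()` are hypothetical, and `memcpy()` stands in for the kernel's user-copy step. Only `_IOW()` and `_IOC_SIZE()` come from the real `<linux/ioctl.h>` header.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical payloads of different sizes; not the real sed-opal structs. */
struct demo_small { unsigned char key[32]; };
struct demo_large { unsigned char key[256]; unsigned char extra[1024]; };

/* Hypothetical command numbers; _IOW() encodes sizeof(payload) in the number. */
#define DEMO_IOC_SMALL _IOW('x', 1, struct demo_small)
#define DEMO_IOC_LARGE _IOW('x', 2, struct demo_large)

/*
 * Copy the caller's argument into a heap buffer whose size is taken from
 * the command number, so no large structure lives on the stack.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static void *demo_copy_arg(unsigned int cmd, const void *arg, size_t *len)
{
    size_t size = _IOC_SIZE(cmd);
    void *buf;

    assert(arg != NULL && len != NULL);

    buf = malloc(size);
    if (!buf)
        return NULL;

    memcpy(buf, arg, size);  /* stand-in for copying from user space */
    *len = size;
    return buf;
}
```

Because the size is read from the command number, one code path can serve every command, whatever its payload size.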
2017-02-14  uapi: sed-opal fix IOW for activate lsp to use correct struct  [Scott Bauer, 1 file, -1/+1]
IOC_OPAL_ACTIVATE_LSP took the wrong structure, which would give us the wrong size when using _IOC_SIZE. Switch it to the right structure. Fixes: 058f8a2 ("Include: Uapi: Add user ABI for Sed/Opal") Signed-off-by: Scott Bauer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-14  cdrom: Make device operations read-only  [Kees Cook, 7 files, -45/+37]
Since function tables are a common target for attackers, it's best to keep them in read-only memory. As such, this makes the CDROM device ops tables const. This additionally drops n_minors, since it isn't used meaningfully, and sets the only user of cdrom_dummy_generic_packet explicitly so the variables can all be const. Inspired by similar changes in grsecurity/PaX. Signed-off-by: Kees Cook <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
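A minimal sketch of a const operations table, assuming a made-up device type. None of the `demo_*` names exist in the CDROM code; they only illustrate the pattern of declaring the table `static const` and reaching it through a pointer-to-const.

```c
/* Hypothetical device and operations table; not the CDROM driver's types. */
struct demo_dev;

struct demo_dev_ops {
    int (*open)(struct demo_dev *dev);
    int (*status)(const struct demo_dev *dev);
};

struct demo_dev {
    const struct demo_dev_ops *ops;  /* pointer to const: no writes through it */
    int opened;
};

static int demo_open(struct demo_dev *dev)
{
    dev->opened = 1;
    return 0;
}

static int demo_status(const struct demo_dev *dev)
{
    return dev->opened;
}

/* 'static const' allows the toolchain to place the table in read-only data. */
static const struct demo_dev_ops demo_ops = {
    .open   = demo_open,
    .status = demo_status,
};

static void demo_dev_init(struct demo_dev *dev)
{
    dev->ops = &demo_ops;
    dev->opened = 0;
}
```

Any attempt to assign to `dev->ops->open` is then rejected at compile time, and the table itself is a candidate for a read-only section.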
2017-02-14  elevator: fix loading wrong elevator type for blk-mq devices  [Jens Axboe, 1 file, -4/+5]
The old elevator= boot parameter blindly attempts to load the same scheduler for mq and !mq devices, leading to a crash if we specify the wrong one. Ensure that we only apply this boot parameter to old !mq devices. Signed-off-by: Jens Axboe <[email protected]>
2017-02-13  cciss: switch to pci_irq_alloc_vectors  [Christoph Hellwig, 2 files, -43/+17]
Simple cleanup to use the new APIs. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Don Brace <[email protected]> Tested-by: Don Brace <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-13  block/loop: fix race between I/O and set_status  [Ming Lei, 1 file, -5/+12]
Inside set_status, transfer need to setup again, so we have to drain IO before the transition, otherwise oops may be triggered like the following: divide error: 0000 [#1] SMP KASAN CPU: 0 PID: 2935 Comm: loop7 Not tainted 4.10.0-rc7+ #213 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88006ba1e840 task.stack: ffff880067338000 RIP: 0010:transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP: 0018:ffff88006733f108 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800688d7000 RCX: 0000000000000059 RDX: 0000000000000000 RSI: 1ffff1000d743f43 RDI: ffff880068891c08 RBP: ffff88006733f160 R08: ffff8800688d7001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800688d7000 R13: ffff880067b7d000 R14: dffffc0000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006c17e0 CR3: 0000000066e3b000 CR4: 00000000001406f0 Call Trace: lo_do_transfer drivers/block/loop.c:251 [inline] lo_read_transfer drivers/block/loop.c:392 [inline] do_req_filebacked drivers/block/loop.c:541 [inline] loop_handle_cmd drivers/block/loop.c:1677 [inline] loop_queue_work+0xda0/0x49b0 drivers/block/loop.c:1689 kthread_worker_fn+0x4c3/0xa30 kernel/kthread.c:630 kthread+0x326/0x3f0 kernel/kthread.c:227 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 Code: 03 83 e2 07 41 29 df 42 0f b6 04 30 4d 8d 44 24 01 38 d0 7f 08 84 c0 0f 85 62 02 00 00 44 89 f8 41 0f b6 48 ff 25 ff 01 00 00 99 <f7> 7d c8 48 63 d2 48 03 55 d0 48 89 d0 48 89 d7 48 c1 e8 03 83 RIP: transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP: ffff88006733f108 ---[ end trace 0166f7bd3b0c0933 ]--- Reported-by: Dmitry Vyukov <[email protected]> Cc: [email protected] Signed-off-by: Ming Lei <[email protected]> Tested-by: Dmitry Vyukov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-07  gdrom: Add missing error code  [Christophe JAILLET, 1 file, -2/+6]
In case of error, 'err' is known to be 0 here, because of the previous test. Set it to -ENOMEM instead. Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
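The class of bug described here can be sketched as follows. The names `demo_ctx`, `demo_setup()`, and `demo_probe()` are hypothetical and do not come from the gdrom driver. The `fail_alloc` parameter exists only to exercise the error path.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_ctx {
    char *buf;
};

/* Hypothetical earlier setup step that succeeds and returns 0. */
static int demo_setup(void)
{
    return 0;
}

static int demo_probe(struct demo_ctx *ctx, int fail_alloc)
{
    int err = demo_setup();

    if (err)
        goto out;

    /* From here on err == 0, so each failure must set it explicitly. */
    ctx->buf = fail_alloc ? NULL : malloc(64);
    if (!ctx->buf) {
        err = -ENOMEM;  /* without this, the caller would see success */
        goto out;
    }

out:
    return err;
}
```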
2017-02-06  Fix SED-OPAL UAPI structs to prevent 32/64 bit size differences.  [Scott Bauer, 1 file, -17/+18]
This patch is a quick fixup of the user structures that will prevent the structures from being different sizes on 32 and 64 bit archs. Taking this fix will allow us to *NOT* have to do compat ioctls for the sed code. Signed-off-by: Scott Bauer <[email protected]> Fixes: 19641f2d7674 ("Include: Uapi: Add user ABI for Sed/Opal") Signed-off-by: Jens Axboe <[email protected]>
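Why a user-visible structure can differ in size between 32-bit and 64-bit builds, and one common remedy, can be sketched as below. Both structures are hypothetical and are not the sed-opal UAPI definitions. The remedy shown is fixed-width fields plus explicit padding, which is an assumption about the general technique rather than a description of the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Layout-sensitive: 'unsigned long' is 4 bytes on typical 32-bit ABIs and
 * 8 bytes on LP64, so size and field offsets change with the architecture.
 */
struct demo_fragile {
    unsigned long offset;
    uint8_t flag;
};

/*
 * Fixed-width fields with explicit padding: the compiler has no implicit
 * tail padding left to choose, so the layout is the same on both ABIs.
 */
struct demo_stable {
    uint64_t offset;
    uint8_t flag;
    uint8_t pad[7];
};

_Static_assert(sizeof(struct demo_stable) == 16,
               "demo_stable must be 16 bytes on every ABI");
_Static_assert(offsetof(struct demo_stable, flag) == 8,
               "flag must sit directly after the 64-bit field");
```

With identical layouts, a 32-bit process on a 64-bit kernel passes exactly the bytes the kernel expects, which is what makes a separate compat ioctl path unnecessary.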
2017-02-06  nvme: Add Support for Opal: Unlock from S3 & Opal Allocation/Ioctls  [Scott Bauer, 3 files, -0/+46]
This patch implements the necessary logic to unlock an Opal enabled device coming back from an S3. The patch also implements the SED/Opal allocation necessary to support the opal ioctls. Signed-off-by: Scott Bauer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-06  block: Add Sed-opal library  [Scott Bauer, 6 files, -0/+3074]
This patch implements the necessary logic to bring an Opal enabled drive out of a factory-enabled state into a working Opal state. This patch set also enables logic to save a password to be replayed during a resume from suspend. Signed-off-by: Scott Bauer <[email protected]> Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-06  Include: Uapi: Add user ABI for Sed/Opal  [Scott Bauer, 1 file, -0/+118]
Signed-off-by: Scott Bauer <[email protected]> Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-01  block: queue lock must be acquired when iterating over rls  [Tahsin Erdogan, 1 file, -0/+2]
blk_set_queue_dying() does not acquire the queue lock before it calls blk_queue_for_each_rl(). This allows a racing blkg_destroy() to remove blkg->q_node from the linked list and have blk_queue_for_each_rl() loop infinitely over the removed blkg->q_node list node. Signed-off-by: Tahsin Erdogan <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-01  block: Update comments that refer to __bio_map_user() and bio_map_user()  [Bart Van Assche, 1 file, -3/+3]
Since __bio_map_user() and bio_map_user() have been removed, update the comments that still refer to these functions. Signed-off-by: Bart Van Assche <[email protected]> References: commit ddad8dd0a162 ("block: use blk_rq_map_user_iov to implement blk_rq_map_user") Cc: Ming Lei <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  blk-mq: don't fail allocating driver tag for stopped hw queue  [Jens Axboe, 1 file, -3/+0]
We rely on blk_mq_get_driver_tag() not failing if 'wait' is true, but it currently fails in that case if the queue happens to be stopped at the time of the call. We don't need to check for stopped here, it's just assigning the tag. If the queue is stopped, we'll handle it when attempting to run the queue. This fixes a stall/crash on flush intensive workloads, where we proceed to process a flush that doesn't have a valid tag assigned. Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  nvme/pci: Don't mark IOD as aborted if abort wasn't sent  [Keith Busch, 1 file, -2/+1]
This patch sets the aborted flag only if an abort was sent, reducing excessive kernel message spamming for completed IO that wasn't actually aborted. Reported-by: Jens Axboe <[email protected]> Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: allow targets to use sysfs  [Javier González, 2 files, -0/+15]
In order to register through the sysfs interface, a driver needs to know its kobject. On a disk structure, this happens when the partition information is added (device_add_disk), which for lightnvm takes place after the target has been initialized. This means that on target initialization, the kobject has not been created yet. This patch adds a target function to let targets initialize their own kobject as a child of the disk kobject. Signed-off-by: Javier González <[email protected]> Added exit typedef and passed gendisk instead of void pointer for exit. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: free properly on target creation error  [Javier González, 1 file, -1/+1]
Fix a memory leak when target creation fails. More specifically, free the entire device structure given to the target (tgt_dev). Signed-off-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: Add CRC read error  [Javier González, 1 file, -0/+1]
Let the host differentiate between a read error and a CRC check error on the device side. Signed-off-by: Javier González <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: use end_io callback instead of instance  [Matias Bjørling, 6 files, -20/+13]
When the lightnvm core had the "gennvm" layer between the device and the target, there was a need for the core to be able to figure out which target it should send an end_io callback to. This led to a "double" end_io, first for the media manager instance, and then for the target instance. Now that core and gennvm are merged, there is no longer a need for this, and a single end_io callback will do. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: add ioctls for vector I/Os  [Matias Bjørling, 4 files, -0/+280]
Enable user-space to issue vector I/O commands through ioctls. To issue a vector I/O, the ppa list with addresses is also required and must be mapped for the controller to access. For each ioctl, the result and status bits are returned as well, such that user-space can retrieve the open-channel SSD completion bits. The implementation covers the traditional use-cases of bad block management, and vectored read/write/erase. Signed-off-by: Matias Bjørling <[email protected]> Metadata implementation, test, and fixes. Signed-off-by: Simon A.F. Lund <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: reduce number of nvm_id groups to one  [Matias Bjørling, 4 files, -58/+47]
The number of configuration groups has been limited to one in current code, even if there is support for up to four. With the introduction of the open-channel SSD 1.3 specification, only a single group is exposed onwards. Reflect this in the nvm_id structure. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: cleanup nvm transformation functions  [Matias Bjørling, 2 files, -89/+40]
Going from target specific ppa addresses to device was accomplished by first converting target to generic ppa addresses and generic to device addresses. The conversion was either open-coded or used the built-in nvm_trans_* and nvm_map_* functions for conversion. Simplify the interface and clean up the calls to provide clean functions that now either take a list of ppas or a nvm_rq, and are exposed through: void nvm_ppa_* - target to/from device with a list of PPAs, void nvm_rq_* - target to/from device with a nvm_rq. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: make nvm_map_* return void  [Matias Bjørling, 1 file, -32/+9]
The only check that was done was a debugging check. Remove it and replace the return value with void to reduce error checking. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: remove nvm_get_bb_tbl and nvm_set_bb_tbl  [Matias Bjørling, 2 files, -38/+4]
Since the merge of gennvm and core, there is no longer a need for the device specific bad block functions. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: remove nvm_submit_ppa* functions  [Matias Bjørling, 2 files, -113/+0]
The nvm_submit_ppa* functions are no longer needed after gennvm and core have been merged. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: collapse nvm_erase_ppa and nvm_erase_blk  [Matias Bjørling, 2 files, -32/+26]
After gennvm and core have been merged, there are no more callers to nvm_erase_ppa. Therefore collapse the device specific and target specific erase functions. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-31  lightnvm: merge gennvm with core  [Matias Bjørling, 8 files, -1748/+606]
For the first iteration of Open-Channel SSDs, it was anticipated that there could be various media managers on top of an open-channel SSD, so as to allow vendors to plug in their own host-side FTLs, without the media manager in between. Now that an Open-Channel SSD is exposed as a traditional block device, there is no longer a need for this. Therefore let's merge the gennvm code with core and simplify the stack. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq: fix debugfs compilation issues  [Omar Sandoval, 3 files, -6/+19]
This fixes a couple of problems: 1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus. 2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at all. Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option. Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree") Signed-off-by: Omar Sandoval <[email protected]> Augment Kconfig description. Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  block: cleanup remaining manual checks for PREFLUSH|FUA  [Jens Axboe, 2 files, -2/+2]
Use op_is_flush() where applicable. Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq-sched: add flush insertion into blk_mq_sched_insert_request()  [Jens Axboe, 8 files, -55/+89]
Instead of letting the caller check this and handle the details of inserting a flush request, put the logic in the scheduler insertion function. This fixes direct flush insertion outside of the usual make_request_fn calls, like from dm via blk_insert_cloned_request(). Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  block: add a op_is_flush helper  [Christoph Hellwig, 7 files, -28/+26]
This centralizes the checks for bios that need to go into the flush state machine. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
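A helper of this kind can be sketched as a single predicate over the request flags. The `DEMO_REQ_*` bit values and the name `demo_op_is_flush()` are assumptions made for illustration; the kernel defines its own flag values and its own helper.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's actual values differ. */
#define DEMO_REQ_FUA      (1u << 0)
#define DEMO_REQ_PREFLUSH (1u << 1)
#define DEMO_REQ_SYNC     (1u << 2)

/*
 * True if the operation carries FUA or PREFLUSH and therefore has to go
 * through the flush state machine.
 */
static inline bool demo_op_is_flush(unsigned int op)
{
    return (op & (DEMO_REQ_FUA | DEMO_REQ_PREFLUSH)) != 0;
}
```

Callers then test `demo_op_is_flush(flags)` instead of repeating the mask, so a change to the set of flush-related flags needs to be made in one place only.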
2017-01-27  blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()  [Jens Axboe, 4 files, -14/+23]
When we invoke dispatch_requests(), the scheduler empties everything into the passed in list. This isn't always a good thing, since it means that we remove items that we could have potentially merged with. Change the function to dispatch a single request at a time. If we do that, we can back off exactly at the point where the device can't consume more IO, and leave the rest with the scheduler for better merging and future dispatch decision making. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
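The difference between draining the scheduler and pulling one request at a time can be sketched with a toy model. Everything here is hypothetical (`demo_sched`, `demo_device`, and the functions below), and the model simplifies by checking device capacity before pulling a request; it is not the blk-mq code path.

```c
#include <stddef.h>

#define DEMO_QUEUE_MAX 16

/* Hypothetical scheduler queue holding request ids in FIFO order. */
struct demo_sched {
    int reqs[DEMO_QUEUE_MAX];
    size_t head;
    size_t count;
};

/* Hypothetical device that accepts a bounded number of requests. */
struct demo_device {
    size_t free_slots;
    size_t accepted;
};

static int demo_sched_add(struct demo_sched *s, int id)
{
    if (s->count == DEMO_QUEUE_MAX)
        return -1;
    s->reqs[(s->head + s->count) % DEMO_QUEUE_MAX] = id;
    s->count++;
    return 0;
}

/* Hand out exactly one request; returns 0 when the scheduler is empty. */
static int demo_dispatch_request(struct demo_sched *s, int *id)
{
    if (s->count == 0)
        return 0;
    *id = s->reqs[s->head];
    s->head = (s->head + 1) % DEMO_QUEUE_MAX;
    s->count--;
    return 1;
}

/*
 * Pull requests one at a time and stop as soon as the device is full.
 * Whatever was not pulled stays in the scheduler, where it remains
 * available for merging.
 */
static size_t demo_run_queue(struct demo_sched *s, struct demo_device *d)
{
    size_t issued = 0;
    int id;

    while (d->free_slots > 0 && demo_dispatch_request(s, &id)) {
        d->free_slots--;
        d->accepted++;
        issued++;
    }
    return issued;
}
```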
2017-01-27  blk-mq-sched: fix starvation for multiple hardware queues and shared tags  [Jens Axboe, 5 files, -7/+41]
If we have both multiple hardware queues and shared tag map between devices, we need to ensure that we propagate the hardware queue restart bit higher up. This is because we can get into a situation where we don't have any IO pending on a hardware queue, yet we fail getting a tag to start new IO. If that happens, it's not enough to mark the hardware queue as needing a restart, we need to bubble that up to the higher level queue as well. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
2017-01-27  blk-mq: release driver tag on a requeue event  [Jens Axboe, 1 file, -0/+16]
We don't want to hold on to this resource when we have a scheduler attached. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
2017-01-27  blk-mq: fix potential race in queue restart and driver tag allocation  [Jens Axboe, 1 file, -1/+9]
Once we mark the queue as needing a restart, re-check if we can get a driver tag. This fixes a theoretical issue where the needed IO completes _after_ blk_mq_get_driver_tag() fails, but before we manage to set the restart bit. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
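The "mark, then re-check" ordering can be sketched with C11 atomics. This is a simplified, hypothetical model: `demo_queue` and its functions are not blk-mq structures, and a plain counter stands in for the tag map.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical queue state: a pool of tags and a restart marker. */
struct demo_queue {
    atomic_int free_tags;
    atomic_bool need_restart;
};

/* Try to take one tag; returns false if none is available. */
static bool demo_get_tag(struct demo_queue *q)
{
    int n = atomic_load(&q->free_tags);

    while (n > 0) {
        /* On failure, n is reloaded with the current value and retried. */
        if (atomic_compare_exchange_weak(&q->free_tags, &n, n - 1))
            return true;
    }
    return false;
}

/* Completion side: return a tag to the pool. */
static void demo_put_tag(struct demo_queue *q)
{
    atomic_fetch_add(&q->free_tags, 1);
}

/*
 * Returns true if a tag was obtained. If the first attempt fails, the
 * restart marker is set and the allocation is tried once more, because a
 * completion may have returned a tag after the failed attempt but before
 * the marker became visible.
 */
static bool demo_get_tag_or_mark_restart(struct demo_queue *q)
{
    if (demo_get_tag(q))
        return true;

    atomic_store(&q->need_restart, true);
    return demo_get_tag(q);
}
```

Without the second attempt, a completion that lands in that window would see no restart marker and nothing would rerun the queue.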
2017-01-27  blk-mq: improve scheduler queue sync/async running  [Jens Axboe, 1 file, -2/+4]
We'll use the same criteria for whether we need to run the queue sync or async when we have a scheduler, as we do without one. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Tested-by: Hannes Reinecke <[email protected]>
2017-01-27  blk-mq: move hctx and ctx counters from sysfs to debugfs  [Omar Sandoval, 2 files, -64/+181]
These counters aren't as out-of-place in sysfs as the other stuff, but debugfs is a slightly better home for them. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfs  [Omar Sandoval, 2 files, -92/+132]
These statistics _might_ be useful to userspace, but it's better not to commit to an ABI for these yet. Also, the dispatched file in sysfs couldn't be cleared, so make it clearable like the others in debugfs. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq: add tags and sched_tags bitmaps to debugfs  [Omar Sandoval, 1 file, -0/+50]
These can be used to debug issues like tag leaks and stuck requests. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq: move tags and sched_tags info from sysfs to debugfs  [Omar Sandoval, 4 files, -45/+86]
These are very tied to the blk-mq tag implementation, so exposing them to sysfs isn't a great idea. Move the debugging information to debugfs and add basic entries for the number of tags and the number of reserved tags to sysfs. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq: export software queue pending map to debugfs  [Omar Sandoval, 1 file, -0/+21]
This is useful for debugging problems where we've gotten stuck with requests in the software queues. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  sbitmap: add helpers for dumping to a seq_file  [Omar Sandoval, 2 files, -0/+121]
This is useful debugging information that will be used in the blk-mq debugfs directory. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Changed 'weight' to 'busy'. Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq: add extra request information to debugfs  [Omar Sandoval, 1 file, -1/+3]
The request pointers by themselves aren't super useful. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfs  [Omar Sandoval, 2 files, -57/+106]
These lists are only useful for debugging; they definitely don't belong in sysfs. Putting them in debugfs also removes the limitation of a single page of output. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq: add hctx->{state,flags} to debugfs  [Omar Sandoval, 1 file, -0/+42]
hctx->state could come in handy for bugs where the hardware queue gets stuck in the stopped state, and hctx->flags is just useful to know. Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-27  blk-mq: create debugfs directory tree  [Omar Sandoval, 6 files, -0/+201]
In preparation for putting blk-mq debugging information in debugfs, create a directory tree mirroring the one in sysfs: # tree -d /sys/kernel/debug/block /sys/kernel/debug/block |-- nvme0n1 | `-- mq | |-- 0 | | `-- cpu0 | |-- 1 | | `-- cpu1 | |-- 2 | | `-- cpu2 | `-- 3 | `-- cpu3 `-- vda `-- mq `-- 0 |-- cpu0 |-- cpu1 |-- cpu2 `-- cpu3 Also add the scaffolding for the actual files that will go in here, either under the hardware queue or software queue directories. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-26  blk-mq-sched: check for successful allocation before assigning tag  [Jens Axboe, 1 file, -1/+2]
We don't trigger this from the normal IO path, since we always use blocking allocations from there. But Bart saw it testing multipath dm, since that is a heavy user of atomic request allocations in the map and clone path. Reported-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-26  blk-mq: don't lose flags passed in to blk_mq_alloc_request()  [Jens Axboe, 2 files, -4/+4]
If we come in from blk_mq_alloc_request() with NOWAIT set in flags, we must ensure that we don't later overwrite that in blk_mq_sched_get_request(). Initialize alloc_data->flags before passing it in. Reported-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-25  blk-mq: only apply active queue tag throttling for driver tags  [Jens Axboe, 2 files, -10/+15]
If we have a scheduler attached, we have two sets of tags. We don't want to apply our active queue throttling for the scheduler side of tags, that only applies to driver tags since that's the resource we need to dispatch an IO. Signed-off-by: Jens Axboe <[email protected]>
2017-01-23  cfq-iosched: Adjust one function call together with a variable assignment  [Markus Elfring, 1 file, -2/+4]
The script "checkpatch.pl" reported the following: ERROR: do not use assignment in if condition. Fix the affected source code accordingly. Signed-off-by: Markus Elfring <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
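The style rule behind this checkpatch error can be illustrated with a small, hypothetical lookup. The `demo_lookup()` table and `demo_name_len()` are invented for the example and have nothing to do with the cfq-iosched code; the flagged form is shown only in a comment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup that returns NULL for unknown keys. */
static const char *demo_lookup(int key)
{
    static const char *const names[] = { "cfq", "deadline" };

    if (key < 0 || (size_t)key >= sizeof(names) / sizeof(names[0]))
        return NULL;
    return names[key];
}

/*
 * Form that checkpatch.pl flags:
 *
 *     if (!(name = demo_lookup(key)))
 *         return -1;
 *
 * Preferred form: assign first, then test the result.
 */
static int demo_name_len(int key)
{
    const char *name;

    name = demo_lookup(key);
    if (!name)
        return -1;

    return (int)strlen(name);
}
```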