Age | Commit message | Author | Files | Lines
2015-12-22 | nvme: special case AEN requests | Christoph Hellwig | 1 | -35/+40
AEN requests are different from other requests in that they don't time out and can't easily be cancelled. Because of that we should not use the blk-mq infrastructure but just special case them in the completion path. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: switch abort to blk_execute_rq_nowait | Christoph Hellwig | 1 | -35/+26
And remove the now unused nvme_submit_cmd helper. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: switch delete SQ/CQ to blk_execute_rq_nowait | Christoph Hellwig | 1 | -34/+15
Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: factor out a few helpers from req_completion | Christoph Hellwig | 3 | -10/+20
We'll need them in other places later. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: fix admin queue depth | Christoph Hellwig | 1 | -1/+1
The number in tag_set->queue_depth includes the reserved tags. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | NVMe: Simplify metadata setup | Keith Busch | 1 | -25/+3
We no longer require the two-pass setup for block integrity. Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | NVMe: Remove device management handles on remove | Keith Busch | 3 | -4/+11
We don't want to allow new references to open on a device that is removed. This ties the lifetime of these handles to the physical device's presence rather than to the open reference count. Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | NVMe: Use unbounded work queue for all work | Keith Busch | 1 | -6/+6
Removes all usage of the global work queue so work can't be scheduled on two different work queues, and removes nvme's work queue singlethreadedness so controllers can be driven in parallel. Signed-off-by: Keith Busch <[email protected]> [hch: keep the dead controller removal on the system workqueue to avoid deadlocks] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | NVMe: Implement namespace list scanning | Keith Busch | 3 | -9/+79
The NVMe 1.1 specification provides an identify mode to return a list of active namespaces. This is a more efficient way to discover which namespace identifiers are active on a controller, providing potentially significant improvement in scan time for controllers with sparsely populated namespaces. Signed-off-by: Keith Busch <[email protected]> [hch: add quirk for the broken Qemu Identify implementation. To be relaxed later] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
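The list-based scan described above can be sketched in plain C. This is a minimal userspace illustration, not the driver's code: it assumes the Identify buffer has already been fetched and converted to CPU byte order, that it holds up to 1024 namespace IDs in ascending order, and that unused entries are zero. The names walk_ns_list_sketch, count_ns_sketch and NS_LIST_ENTRIES_SKETCH are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 4096-byte Identify buffer holds 1024 four-byte namespace IDs. */
#define NS_LIST_ENTRIES_SKETCH 1024

/* Hypothetical callback type, called once per active namespace ID. */
typedef void (*ns_visit_fn)(uint32_t nsid, void *ctx);

/* Example callback: counts how many active namespaces were reported. */
static void count_ns_sketch(uint32_t nsid, void *ctx)
{
	(void)nsid;
	(*(unsigned int *)ctx)++;
}

/*
 * Walk one page of an active namespace list.
 * Returns 0 when a zero entry ended the list (nothing more to fetch),
 * otherwise the last ID seen, which a caller could pass to the next
 * Identify command to fetch the following page.
 */
static uint32_t walk_ns_list_sketch(const uint32_t *list, ns_visit_fn visit,
				    void *ctx)
{
	uint32_t last = 0;

	assert(list != NULL && visit != NULL);
	for (size_t i = 0; i < NS_LIST_ENTRIES_SKETCH; i++) {
		if (list[i] == 0)
			return 0;	/* zero entry terminates the list */
		visit(list[i], ctx);
		last = list[i];
	}
	return last;	/* page was full, more IDs may follow */
}
```

With only namespaces 1, 5 and 9 active, the walk touches three entries rather than probing every identifier up to the controller's namespace count, which is where the scan-time saving would come from.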
2015-12-22 | nvme: switch abort_limit to an atomic_t | Christoph Hellwig | 3 | -6/+7
There is no lock to synchronize access to the abort_limit field of struct nvme_ctrl, so switch it to an atomic_t. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
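A lock-free counter of this kind can be sketched in userspace with C11 atomics. This is only an illustration of the idea, assuming the limit is a count of abort commands that may still be issued; the kernel's atomic_t API differs from <stdatomic.h>, and struct ctrl_sketch, take_abort_slot_sketch and release_abort_slot_sketch are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the controller structure. */
struct ctrl_sketch {
	atomic_int abort_limit;	/* abort commands that may still be issued */
};

/* Claim one abort slot without a lock; returns false if none are left. */
static bool take_abort_slot_sketch(struct ctrl_sketch *c)
{
	int cur;

	assert(c != NULL);
	cur = atomic_load(&c->abort_limit);
	while (cur > 0) {
		/* On failure the CAS reloads cur, so the loop re-checks it. */
		if (atomic_compare_exchange_weak(&c->abort_limit, &cur, cur - 1))
			return true;
	}
	return false;
}

/* Give the slot back once the abort command has completed. */
static void release_abort_slot_sketch(struct ctrl_sketch *c)
{
	assert(c != NULL);
	atomic_fetch_add(&c->abort_limit, 1);
}
```

The compare-and-swap loop keeps the counter from dropping below zero even when two timeout handlers race, which a plain integer without a lock cannot guarantee.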
2015-12-22 | nvme: remove dead controllers from a work item | Christoph Hellwig | 1 | -13/+9
Compared to the kthread this gives us multiple call prevention for free. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: merge probe_work and reset_work | Christoph Hellwig | 1 | -35/+39
If we're using two work queues we're always going to run into races where one item is tearing down what the other one is initializing. So instead merge the two work queues, and let the old probe_work also tear the controller down first if it was alive. Together with the better detection of the probe path using a flag this gives us a properly serialized reset/probe path that also doesn't accidentally trigger when two commands time out and the second one tries to reset the controller while the first reset is still in progress. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: do not restart the request timeout if we're resetting the controller | Keith Busch | 1 | -9/+16
Otherwise we're never going to complete a command when it is restarted just after we completed all other outstanding commands in nvme_clear_queue. The controller must be disabled prior to completing a presumed lost command, do this by directly shutting down the controller before queueing the reset work, and return EH_HANDLED from the timeout handler after we shut the controller down. Signed-off-by: Keith Busch <[email protected]> [hch: split and rebase] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: simplify resets | Christoph Hellwig | 1 | -26/+13
Don't delete the controller from dev_list before queuing a reset, instead just check for it being reset in the polling kthread. This allows us to remove the dev_list_lock in various places, and in addition we can simply rely on checking the queue_work return value to see if we could reset a controller. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: add NVME_SC_CANCELLED | Christoph Hellwig | 2 | -1/+11
To properly document how we are using a negative Linux error value to communicate request cancellations inside the driver. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: merge nvme_abort_req and nvme_timeout | Christoph Hellwig | 1 | -29/+18
We want to be able to return better error values from nvme_timeout, which is significantly easier if the two functions are merged. Also clean up and reduce the printk spew so that we only get one message per abort. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: don't take the I/O queue q_lock in nvme_timeout | Christoph Hellwig | 1 | -4/+2
There is nothing it protects, but it makes lockdep unhappy in many different ways. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: protect against simultaneous shutdown invocations | Keith Busch | 1 | -0/+5
Signed-off-by: Keith Busch <[email protected]> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: only add a controller to dev_list after it's been fully initialized | Christoph Hellwig | 1 | -21/+30
Without this we can easily get bad dereferences on nvmeq->d_db when the nvme kthread tries to poll the CQs for controllers that are in half initialized state. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | nvme: only ignore hardware errors in nvme_create_io_queues | Christoph Hellwig | 1 | -15/+20
Half initialized queues due to kernel error returns or timeout are still a good reason to give up on initializing a controller. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-22 | block: defer timeouts to a workqueue | Christoph Hellwig | 5 | -6/+24
Timer context is not very useful for drivers to perform any meaningful abort action from. So instead of calling the driver from this useless context defer it to a workqueue as soon as possible. Note that while a delayed_work item would seem the right thing here I didn't dare to use it due to the magic in blk_add_timer that pokes deep into timer internals. But maybe this encourages Tejun to add a sensible API for that to the workqueue API and we'll all be fine in the end :) Contains a major update from Keith Busch: "This patch removes synchronizing the timeout work so that the timer can start a freeze on its own queue. The timer enters the queue, so timer context can only start a freeze, but not wait for frozen." Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-09 | nvme: precedence bug in nvme_pr_clear() | Dan Carpenter | 1 | -1/+1
The "|" operator has higher precedence than "?:" so this didn't work as intended. I had previously fixed this bug, but we copied the older unfixed version when we moved the function between files. Fixes: 1673f1f08c88 ('nvme: move block_device_operations and ns/ctrl freeing to common code') Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
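The precedence problem can be reproduced with a small standalone example. The expression below is illustrative only and is not quoted from the driver; the function names and the particular bit values are assumptions chosen to show how "|" binding tighter than "?:" changes the result.

```c
#include <assert.h>
#include <stdint.h>

/*
 * How the compiler parses "1 | key ? 1 << 3 : 0": the OR becomes part of
 * the condition, which is then always true, so bit 0 is never set.
 */
static uint32_t flags_as_parsed_sketch(uint64_t key)
{
	return (1 | key) ? 1 << 3 : 0;
}

/* What was intended: always set bit 0, and set bit 3 only for a non-zero key. */
static uint32_t flags_as_intended_sketch(uint64_t key)
{
	return 1 | (key ? 1 << 3 : 0);
}

/* Small self-check spelling out the difference between the two forms. */
static void precedence_selfcheck_sketch(void)
{
	assert(flags_as_parsed_sketch(0) == 8);
	assert(flags_as_intended_sketch(0) == 1);
	assert(flags_as_intended_sketch(5) == 9);
}
```

Adding the parentheses around the conditional expression is the whole fix, which matches the one-line size of the change.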
2015-12-09 | blk-integrity: checking for NULL instead of IS_ERR | Dan Carpenter | 1 | -5/+4
We recently changed bio_integrity_alloc() to return ERR_PTRs instead of NULL but these calls were missed. Fixes: 06c1e3902aa7 ('blk-integrity: empty implementation when disabled') Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
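Why a NULL check misses an error pointer can be shown with a userspace imitation of the kernel's error-pointer convention. This is a sketch under stated assumptions: err_ptr_sketch, ptr_err_sketch and is_err_sketch mimic the idea behind ERR_PTR, PTR_ERR and IS_ERR but are not the kernel's definitions, and alloc_payload_sketch is a hypothetical allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in the top of the pointer range. */
static void *err_ptr_sketch(long err)
{
	assert(err < 0 && err >= -MAX_ERRNO_SKETCH);
	return (void *)(intptr_t)err;
}

static long ptr_err_sketch(const void *p)
{
	return (long)(intptr_t)p;
}

static int is_err_sketch(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Hypothetical allocator that reports failure as an error pointer, never NULL. */
static void *alloc_payload_sketch(int simulate_failure)
{
	void *p = simulate_failure ? NULL : malloc(16);

	return p ? p : err_ptr_sketch(-ENOMEM);
}

/* Caller: a "!p" test here would treat the error pointer as a valid buffer. */
static long attach_payload_sketch(int simulate_failure)
{
	void *p = alloc_payload_sketch(simulate_failure);

	if (is_err_sketch(p))
		return ptr_err_sketch(p);
	free(p);
	return 0;
}
```

An error pointer is non-NULL by construction, so callers left with the old NULL test would carry on and dereference it.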
2015-12-08 | nvme: fix another 32-bit build warning | Arnd Bergmann | 1 | -1/+1
The nvme_user_cmd function was recently moved around from one file to another, which made a warning reappear that I had fixed before at some point: drivers/nvme/host/core.c: In function 'nvme_user_cmd': drivers/nvme/host/core.c:424:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] This applies the same workaround that we have elsewhere in the driver with an extra type cast to uintptr_t. Signed-off-by: Arnd Bergmann <[email protected]> Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code") Link: https://lkml.org/lkml/2015/10/9/611 Signed-off-by: Jens Axboe <[email protected]>
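The uintptr_t workaround mentioned above can be illustrated outside the kernel. In this sketch, struct user_cmd_sketch and user_buf_sketch are hypothetical; the only assumption taken from the message is that a 64-bit integer field carries a user address that must be turned into a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command layout: the address is a fixed-width 64-bit field. */
struct user_cmd_sketch {
	uint64_t addr;
	uint32_t len;
};

/*
 * Casting the 64-bit value straight to a pointer triggers
 * -Wint-to-pointer-cast on 32-bit targets, where pointers are narrower.
 * Going through uintptr_t first converts to a pointer-sized integer,
 * which then casts to a pointer without a size mismatch.
 */
static void *user_buf_sketch(const struct user_cmd_sketch *cmd)
{
	assert(cmd != NULL);
	return (void *)(uintptr_t)cmd->addr;
}
```

On a 64-bit build the two casts are equivalent to one; the intermediate type only matters where the integer is wider than a pointer.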
2015-12-03 | NVMe: fix build with CONFIG_NVM enabled | Christoph Hellwig | 1 | -19/+16
Looks like I didn't test with CONFIG_NVM enabled, and neither did the build bot. Most of this is really weird crazy shit in the lightnvm support, though. Struct nvme_ns is a structure for the NVM I/O command set, and it has no business poking into it. Second this commit: commit 47b3115ae7b799be8b77b0f024215ad4f68d6460 Author: Wenwei Tao <[email protected]> Date: Fri Nov 20 13:47:55 2015 +0100 nvme: lightnvm: use admin queues for admin cmds Does even more crazy stuff. If a function gets a request_queue parameter passed it'd better use that and not look for another one. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-03 | blk-integrity: empty implementation when disabled | Keith Busch | 4 | -16/+28
This patch moves the blk_integrity_payload definition outside the CONFIG_BLK_DEV_INTEGRITY dependency and provides empty function implementations when the kernel configuration disables integrity extensions. This simplifies drivers that make use of these to map user data so they don't need to repeat the same configuration checks. Signed-off-by: Keith Busch <[email protected]> Updated by Jens to pass an error pointer return from bio_integrity_alloc(), otherwise if CONFIG_BLK_DEV_INTEGRITY isn't set, we return a weird ENOMEM from __nvme_submit_user_cmd() if a meta buffer is set. Signed-off-by: Jens Axboe <[email protected]>
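The general pattern of supplying empty implementations when a feature is compiled out can be sketched as follows. The macro CONFIG_INTEGRITY_SKETCH, struct payload_sketch and both functions are hypothetical stand-ins, and the choice of -EINVAL for the disabled case is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Defined unconditionally, so callers compile with or without the feature. */
struct payload_sketch {
	int tuples;
};

#ifdef CONFIG_INTEGRITY_SKETCH
/* Feature enabled: do the real work. */
static inline int integrity_attach_sketch(struct payload_sketch *p, int tuples)
{
	p->tuples = tuples;
	return 0;
}
#else
/* Feature disabled: an empty stub that reports a clear error. */
static inline int integrity_attach_sketch(struct payload_sketch *p, int tuples)
{
	(void)p;
	(void)tuples;
	return -EINVAL;
}
#endif

/* The caller needs no #ifdef of its own; it only checks the return value. */
static int submit_with_meta_sketch(int tuples)
{
	struct payload_sketch p = { 0 };

	assert(tuples >= 0);
	return integrity_attach_sketch(&p, tuples);
}
```

Keeping the conditional compilation in one header-style location is what lets drivers drop their own repeated configuration checks.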
2015-12-01 | nvme: refactor set_queue_count | Christoph Hellwig | 3 | -21/+30
Split out a helper that just issues the Set Features and interprets the result which can go to common code, and document why we are ignoring non-timeout error returns in the PCIe driver. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: move chardev and sysfs interface to common code | Christoph Hellwig | 4 | -200/+241
For this we need to add a proper controller init routine and a list of all controllers that is in addition to the list of PCIe controllers, which stays in pci.c. Note that we remove the sysfs device when the last reference to a controller is dropped now - the old code would have kept it around longer, which doesn't make much sense. This requires a new ->reset_ctrl operation to implement controller resets, and a new ->write_reg32 operation that is required to implement subsystem resets. We also now store cached copies of the NVMe compliance version and the flag if a controller is attached to a subsystem or not in the generic controller structure. Signed-off-by: Christoph Hellwig <[email protected]> [Fixes for pr merge] Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: move namespace scanning to common code | Christoph Hellwig | 3 | -196/+240
The namespace scanning code has been mostly generic already, we just need to store a pointer to the tagset in the nvme_ctrl structure, and add a method to check if a controller is I/O incapable. The latter will hopefully be replaced by a proper controller state machine soon. Signed-off-by: Christoph Hellwig <[email protected]> [Fixed pr conflicts] Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: move the call to nvme_init_identify earlier | Christoph Hellwig | 1 | -6/+4
We want to record the identify and CAP values even if no I/O queue is available. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: add a common helper to read Identify Controller data | Christoph Hellwig | 3 | -38/+71
And add the 64-bit register read operation for it. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: move nvme_{enable,disable,shutdown}_ctrl to common code | Christoph Hellwig | 3 | -109/+141
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: move remaining CC setup into nvme_enable_ctrl | Christoph Hellwig | 1 | -23/+21
Move the calculation of all the bits written into the CC register into nvme_enable_ctrl, so that they can be moved into the core NVMe driver in the future. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: add explicit quirk handling | Christoph Hellwig | 2 | -3/+18
Add an enum for all workarounds not in the spec and identify the affected controllers at probe time. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: move block_device_operations and ns/ctrl freeing to common code | Christoph Hellwig | 3 | -400/+439
This moves the block_device_operations over to common code mostly as-is. The only change is that the ns and ctrl refcounting got some small refactoring to have wrappers around the kref_put operations. A new free_ctrl operation is added to allow the PCI driver to free its resources on the final drop. Signed-off-by: Christoph Hellwig <[email protected]> [Moved the integrity and pr changes due to merge conflict] Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: use the block layer for userspace passthrough metadata | Keith Busch | 3 | -43/+83
Use the integrity API to pass through metadata from userspace. For PI enabled devices this means that we now validate the reftag, which seems like an unintentional omission in the old code. Thanks to Keith Busch for testing and fixes. Signed-off-by: Christoph Hellwig <[email protected]> [Skip metadata setup on admin commands] Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: split __nvme_submit_sync_cmd | Christoph Hellwig | 4 | -31/+68
Add a separate nvme_submit_user_cmd for commands that directly DMA to or from userspace. We'll add metadata support to that soon and the common version would become too messy. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: move nvme_setup_flush and nvme_setup_rw to common code | Christoph Hellwig | 2 | -49/+51
And mark them inline so that we don't slow down the I/O submission path by having to turn it into a forced out of line call. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: move nvme_error_status to common code | Christoph Hellwig | 2 | -12/+12
And mark it inline so that we don't slow down the completion path by having to turn it into a forced out of line call. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: factor out a nvme_unmap_data helper | Christoph Hellwig | 1 | -18/+25
This is the counter part to nvme_map_data. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: refactor nvme_queue_rq | Christoph Hellwig | 1 | -122/+97
This "backports" the structure I've used for the fabrics driver. It mostly started out as a cleanup so that I could actually understand the code, but I think it also qualifies as a micro-optimization due to the reduced time we hold q_lock and disable interrupts. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: simplify nvme_setup_prps calling convention | Christoph Hellwig | 1 | -12/+10
Pass back a true/false value instead of the length which needs a compare with the bytes in the request and drop the pointless gfp_t argument. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: split a new struct nvme_ctrl out of struct nvme_dev | Christoph Hellwig | 4 | -157/+193
The new struct nvme_ctrl will be used by the common NVMe code that sits on top of struct request_queue and the new nvme_ctrl_ops abstraction. It only contains the bare minimum required, which consists of values sampled during controller probe, the admin queue pointer and a second struct device pointer at the moment, but more will follow later. Only values that are not used in the I/O fast path should be moved to struct nvme_ctrl so that drivers can optimize their cache line usage easily. That's also the reason why we have two device pointers as the struct device is used for DMA mapping purposes. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: use vendor it from identify | Christoph Hellwig | 1 | -2/+11
Use the vendor ID from the identify data instead of the PCI device to make the SCSI translation layer independent from the PCI driver. The NVMe spec defines them as having the same value for current PCIe devices. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: split nvme_trans_device_id_page | Christoph Hellwig | 1 | -56/+79
Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: use offset instead of a struct for registers | Christoph Hellwig | 4 | -46/+49
This makes life easier for future non-PCI drivers where access to the registers might be more complicated. Note that Linux drivers are pretty evenly split between the two versions, and in fact the NVMe driver already uses offsets for the doorbells. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> [Fixed CMBSZ offset] Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
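Offset-based register access can be sketched in userspace as follows. A plain byte array stands in for the mapped register space, byte order is ignored, and the accessor names reg_read32_sketch and reg_write32_sketch are hypothetical. The offsets shown for CAP, VS, CC and CSTS follow the NVMe register map.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register offsets into the controller's register space. */
enum {
	REG_CAP_SKETCH  = 0x00,	/* Controller Capabilities (64-bit) */
	REG_VS_SKETCH   = 0x08,	/* Version */
	REG_CC_SKETCH   = 0x14,	/* Controller Configuration */
	REG_CSTS_SKETCH = 0x1c,	/* Controller Status */
};

/*
 * Accessors take a base and an offset. A transport that does not use
 * memory-mapped registers could supply a different pair of functions
 * without having to imitate a register struct layout.
 */
static uint32_t reg_read32_sketch(const uint8_t *base, uint32_t off)
{
	uint32_t val;

	assert(off % 4 == 0);
	memcpy(&val, base + off, sizeof(val));
	return val;
}

static void reg_write32_sketch(uint8_t *base, uint32_t off, uint32_t val)
{
	assert(off % 4 == 0);
	memcpy(base + off, &val, sizeof(val));
}
```

With a struct overlay the caller dereferences a field directly, which ties every user to memory-mapped access; with offsets the access method is hidden behind the two helpers.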
2015-12-01 | nvme: split command submission helpers out of pci.c | Christoph Hellwig | 4 | -155/+178
Create a new core.c and start by adding the command submission helpers to it, which are already abstracted away from the actual hardware queues by the block layer. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | nvme: move struct nvme_iod to pci.c | Christoph Hellwig | 2 | -17/+17
This structure is specific to the PCIe driver internals and should be moved to pci.c. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-12-01 | blk-mq: add a flags parameter to blk_mq_alloc_request | Christoph Hellwig | 11 | -42/+42
We already have the reserved flag, and a nowait flag awkwardly encoded as a gfp_t. Add a real flags argument to make the scheme more extensible and allow for a nicer calling convention. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-11-25 | Revert "blk-flush: Queue through IO scheduler when flush not required" | Jens Axboe | 1 | -1/+1
This reverts commit 1b2ff19e6a957b1ef0f365ad331b608af80e932e. Jan writes: -- Thanks for report! After some investigation I found out we allocate elevator specific data in __get_request() only for non-flush requests. And this is actually required since the flush machinery uses the space in struct request for something else. Doh. So my patch is just wrong and not easy to fix since at the time __get_request() is called we are not sure whether the flush machinery will be used in the end. Jens, please revert 1b2ff19e6a957b1ef0f365ad331b608af80e932e. Thanks! I'm somewhat surprised that you can reliably hit the race where flushing gets disabled for the device just while the request is in flight. But I guess during boot it makes some sense. -- So let's just revert it, we can fix the queue run manually after the fact. This race is rare enough that it didn't trigger in testing, it requires the specific disable-while-in-flight scenario to trigger.