path: root/include/linux/blk-mq.h
Age | Commit message | Author | Files | Lines
2017-04-28 | blk-mq: unify hctx delay_work and run_work | Jens Axboe | 1 | -2/+1
The only difference between ->run_work and ->delay_work is that the latter is used to defer running a queue. This is done by marking the queue stopped, and scheduling ->delay_work to run sometime in the future. While the queue is stopped, direct runs or runs through ->run_work will not run the queue. If we combine the handlers, then we need to handle two things:
1) If a delayed/stopped run is scheduled, then we should not run the queue before that has been completed.
2) If a queue is delayed/stopped, the handler needs to restart the queue. Normally a run of a queue with the stopped bit set would be a no-op.
Case 1 is handled by modifying a currently pending queue run to the deadline set by the caller of blk_mq_delay_queue(). Subsequent attempts to queue a queue run will find the work item already pending, and direct runs will see a stopped queue as before. Case 2 is handled by adding a new bit, BLK_MQ_S_START_ON_RUN, that tells the work handler that it should clear a stopped queue and run the handler. Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
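A minimal sketch of the combined work handler described above (reconstructed from the commit message, not the verbatim kernel code):

    static void blk_mq_run_work_fn(struct work_struct *work)
    {
        struct blk_mq_hw_ctx *hctx =
            container_of(work, struct blk_mq_hw_ctx, run_work.work);

        /* Case 2: a stopped queue only runs if BLK_MQ_S_START_ON_RUN was
         * set by blk_mq_delay_queue(); any other run of a stopped queue
         * stays a no-op. */
        if (test_bit(BLK_MQ_S_STOPPED, &hctx->state)) {
            if (!test_bit(BLK_MQ_S_START_ON_RUN, &hctx->state))
                return;
            clear_bit(BLK_MQ_S_START_ON_RUN, &hctx->state);
            clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
        }

        __blk_mq_run_hw_queue(hctx);
    }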
2017-04-28 | blk-mq: unify hctx delayed_run_work and run_work | Jens Axboe | 1 | -2/+1
They serve the exact same purpose. Get rid of the non-delayed work variant, and just run it without delay for the normal case. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-04-26 | blk-mq: Add blk_mq_ops.show_rq() | Bart Van Assche | 1 | -0/+8
This new callback function will be used in the next patch to show more information about SCSI requests. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Cc: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
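As a hedged sketch, a driver hook-up might look like this ("mydrv" names are hypothetical; only the .show_rq signature comes from the patch):

    #include <linux/blk-mq.h>
    #include <linux/seq_file.h>

    /* dump driver-private request state into the debugfs output */
    static void mydrv_show_rq(struct seq_file *m, struct request *rq)
    {
        struct mydrv_cmd *cmd = blk_mq_rq_to_pdu(rq);    /* hypothetical pdu */

        seq_printf(m, ", .retries=%d", cmd->retries);
    }

    static const struct blk_mq_ops mydrv_mq_ops = {
        .queue_rq = mydrv_queue_rq,    /* hypothetical */
        .show_rq  = mydrv_show_rq,
    };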
2017-04-20 | blk-mq: remove the error argument to blk_mq_complete_request | Christoph Hellwig | 1 | -1/+1
Now that all drivers that call blk_mq_complete_request() have a ->complete callback, we can remove the direct call to blk_mq_end_request, as well as the error argument to blk_mq_complete_request. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-04-14 | blk-mq: export helpers | Omar Sandoval | 1 | -0/+1
blk_mq_finish_request() is required for schedulers that define their own put_request(). blk_mq_run_hw_queue() is required for schedulers that hold back requests to be run later. Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-04-07 | Merge branch 'for-linus' into for-4.12/block | Jens Axboe | 1 | -0/+2
We've added a considerable amount of fixes for stalls and issues with the blk-mq scheduling in the 4.11 series since forking off the for-4.12/block branch. We need to do improvements on top of that for 4.12, so pull in the previous fixes to make our lives easier going forward. Signed-off-by: Jens Axboe <[email protected]>
2017-04-07 | blk-mq: Introduce blk_mq_delay_run_hw_queue() | Bart Van Assche | 1 | -0/+2
Introduce a function that runs a hardware queue unconditionally after a delay. Note: there is already a function that stops and restarts a hardware queue after a delay, namely blk_mq_delay_queue(). This function will be used in the next patch in this series. Signed-off-by: Bart Van Assche <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Long Li <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
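A sketch of the intended use in a driver's out-of-resources path (driver names hypothetical; the 4.11-era int return codes for ->queue_rq() are assumed):

    static int mydrv_queue_rq(struct blk_mq_hw_ctx *hctx,
                              const struct blk_mq_queue_data *bd)
    {
        if (!mydrv_resources_available(hctx)) {    /* hypothetical */
            /* unlike blk_mq_delay_queue(), this does not stop the
             * queue; it just reruns this hctx in 100 ms */
            blk_mq_delay_run_hw_queue(hctx, 100);
            return BLK_MQ_RQ_QUEUE_BUSY;
        }
        /* ... issue bd->rq to hardware ... */
        return BLK_MQ_RQ_QUEUE_OK;
    }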
2017-04-05 | blk-mq: Remove blk_mq_queue_data.list | Bart Van Assche | 1 | -1/+0
The block layer core sets blk_mq_queue_data.list but no block drivers read that member. Hence remove it and also the code that is used to set this member. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-29 | block: rename blk_mq_freeze_queue_start() | Ming Lei | 1 | -1/+1
As the .q_usage_counter is used by both the legacy and mq paths, we need to block new I/O if the queue becomes dead in blk_queue_enter(). So rename it so that we can use this function in both paths. Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-22 | blk-mq: remove BLK_MQ_F_DEFER_ISSUE | Christoph Hellwig | 1 | -1/+0
This flag was never used since it was introduced. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-03-02 | blk-mq: Provide freeze queue timeout | Keith Busch | 1 | -0/+2
A driver may wish to take corrective action if queued requests do not complete within a set time. Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
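A hedged usage sketch of the timeout variant added here (the recovery hook is hypothetical; the 0-on-timeout return convention is an assumption based on wait_event_timeout()):

    static void mydrv_freeze_with_timeout(struct request_queue *q)
    {
        blk_mq_freeze_queue_start(q);    /* renamed in the 2017-03-29 entry above */
        if (blk_mq_freeze_queue_wait_timeout(q, 10 * HZ) == 0)
            mydrv_recover(q);    /* hypothetical corrective action */
    }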
2017-03-02 | blk-mq: Export blk_mq_freeze_queue_wait | Keith Busch | 1 | -0/+1
Drivers can start a freeze, so this provides a way to wait until the queue is actually frozen. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-23 | blk-mq: use sbq wait queues instead of restart for driver tags | Omar Sandoval | 1 | -0/+2
Commit 50e1dab86aa2 ("blk-mq-sched: fix starvation for multiple hardware queues and shared tags") fixed one starvation issue for shared tags. However, we can still get into a situation where we fail to allocate a tag because all tags are allocated but we don't have any pending requests on any hardware queue. One solution for this would be to restart all queues that share a tag map, but that really sucks. Ideally, we could just block and wait for a tag, but that isn't always possible from blk_mq_dispatch_rq_list(). However, we can still use the struct sbitmap_queue wait queues with a custom callback instead of blocking. This has a few benefits:
1. It avoids iterating over all hardware queues when completing an I/O, which the current restart code has to do.
2. It benefits from the existing rolling wakeup code.
3. It avoids punting to another thread just to have it block.
Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-01-17 | blk-mq-sched: allow setting of default IO scheduler | Jens Axboe | 1 | -0/+1
Add Kconfig entries to manage what devices get assigned an MQ scheduler, and add a blk-mq flag for drivers to opt out of scheduling. The latter is useful for admin type queues that still allocate a blk-mq queue and tag set, but aren't used for normal IO. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Omar Sandoval <[email protected]>
2017-01-17 | blk-mq-sched: add framework for MQ capable IO schedulers | Jens Axboe | 1 | -1/+4
This adds a set of hooks that intercepts the blk-mq path of allocating/inserting/issuing/completing requests, allowing us to develop a scheduler within that framework. We reuse the existing elevator scheduler API on the registration side, but augment that with the scheduler flagging support for the blk-mq interface, and with a separate set of ops hooks for MQ devices. We split driver and scheduler tags, so we can run the scheduling independently of device queue depth. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Omar Sandoval <[email protected]>
2017-01-17 | blk-mq: un-export blk_mq_free_hctx_request() | Jens Axboe | 1 | -1/+0
It's only used in blk-mq, so kill it from the main exported header and kill the symbol export as well. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Omar Sandoval <[email protected]>
2017-01-11 | blk-mq: make mq_ops a const pointer | Jens Axboe | 1 | -1/+1
We never change it, make that clear. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Bart Van Assche <[email protected]>
2016-12-14 | Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi | Linus Torvalds | 1 | -0/+1
Pull SCSI updates from James Bottomley: "This update includes the usual round of major driver updates (ncr5380, lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas). There's also an assortment of minor fixes, mostly in error legs or other not very user visible stuff. The major change is the pci_alloc_irq_vectors replacement for the old pci_msix_.. calls; this effectively makes IRQ mapping generic for the drivers and allows blk_mq to use the information"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (256 commits)
  scsi: qla4xxx: switch to pci_alloc_irq_vectors
  scsi: hisi_sas: support deferred probe for v2 hw
  scsi: megaraid_sas: switch to pci_alloc_irq_vectors
  scsi: scsi_devinfo: remove synchronous ALUA for NETAPP devices
  scsi: be2iscsi: set errno on error path
  scsi: be2iscsi: set errno on error path
  scsi: hpsa: fallback to use legacy REPORT PHYS command
  scsi: scsi_dh_alua: Fix RCU annotations
  scsi: hpsa: use %phN for short hex dumps
  scsi: hisi_sas: fix free'ing in probe and remove
  scsi: isci: switch to pci_alloc_irq_vectors
  scsi: ipr: Fix runaway IRQs when falling back from MSI to LSI
  scsi: dpt_i2o: double free on error path
  scsi: cxlflash: Migrate scsi command pointer to AFU command
  scsi: cxlflash: Migrate IOARRIN specific routines to function pointers
  scsi: cxlflash: Cleanup queuecommand()
  scsi: cxlflash: Cleanup send_tmf()
  scsi: cxlflash: Remove AFU command lock
  scsi: cxlflash: Wait for active AFU commands to timeout upon tear down
  scsi: cxlflash: Remove private command pool
  ...
2016-12-09 | blk-mq: add blk_mq_start_stopped_hw_queue() | Jens Axboe | 1 | -0/+1
We have a variant for all hardware queues, but not one for a single hardware queue. Signed-off-by: Jens Axboe <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]>
2016-11-08 | blk-mq: export blk_mq_map_queues | Christoph Hellwig | 1 | -0/+1
This will allow SCSI to have a single blk_mq_ops structure that either lets the LLDD map the queues to PCIe MSI-X vectors or use the default. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2016-11-02 | blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request() | Bart Van Assche | 1 | -2/+3
Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls are followed by kicking the requeue list. Hence add an argument to these two functions that lets the caller kick the requeue list. This was proposed by Christoph Hellwig. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
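A sketch of the two calling styles this enables (the request and queue variables are illustrative):

    static void mydrv_requeue_examples(struct request_queue *q,
                                       struct request *rq,
                                       struct request *rq1,
                                       struct request *rq2)
    {
        /* single request: requeue and kick the list in one call */
        blk_mq_requeue_request(rq, true);

        /* batch: defer the kick until all requests are queued */
        blk_mq_requeue_request(rq1, false);
        blk_mq_requeue_request(rq2, false);
        blk_mq_kick_requeue_list(q);
    }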
2016-11-02 | blk-mq: Introduce blk_mq_quiesce_queue() | Bart Van Assche | 1 | -0/+3
blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations have finished. This function does *not* wait until all outstanding requests have finished (this means invocation of request.end_io()). The algorithm used by blk_mq_quiesce_queue() is as follows:
* Hold either an RCU read lock or an SRCU read lock around .queue_rq() calls. The former is used if .queue_rq() does not block and the latter if .queue_rq() may block.
* blk_mq_quiesce_queue() first calls blk_mq_stop_hw_queues() followed by synchronize_srcu() or synchronize_rcu(). The latter call waits for .queue_rq() invocations that started before blk_mq_quiesce_queue() was called.
* The blk_mq_hctx_stopped() calls that control whether or not .queue_rq() will be called are called with the (S)RCU read lock held. This is necessary to avoid race conditions against blk_mq_quiesce_queue().
Signed-off-by: Bart Van Assche <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
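A sketch of the dispatch-side locking described above (reconstructed from the commit message; the queue_rq_srcu member name and the dispatch helper are assumptions):

    static void mydrv_style_run_hw_queue(struct blk_mq_hw_ctx *hctx)
    {
        if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
            /* non-blocking ->queue_rq(): plain RCU suffices */
            rcu_read_lock();
            if (!blk_mq_hctx_stopped(hctx))
                dispatch_requests(hctx);    /* stand-in for the real loop */
            rcu_read_unlock();
        } else {
            int idx = srcu_read_lock(&hctx->queue_rq_srcu);    /* assumed name */

            if (!blk_mq_hctx_stopped(hctx))
                dispatch_requests(hctx);
            srcu_read_unlock(&hctx->queue_rq_srcu, idx);
        }
    }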
2016-11-02 | blk-mq: Remove blk_mq_cancel_requeue_work() | Bart Van Assche | 1 | -1/+0
Since blk_mq_requeue_work() no longer restarts stopped queues, canceling requeue work is no longer needed to prevent a stopped queue from being restarted. Hence remove this function. Signed-off-by: Bart Van Assche <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: Keith Busch <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-11-02 | blk-mq: Introduce blk_mq_queue_stopped() | Bart Van Assche | 1 | -0/+1
The function blk_queue_stopped() allows testing whether or not a traditional request queue has been stopped. Introduce a helper function that lets block drivers easily query whether or not one or more hardware contexts of a blk-mq queue have been stopped. Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-10-09 | Merge branch 'for-4.9/block-smp' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -7/+1
Pull blk-mq CPU hotplug update from Jens Axboe: "This is the conversion of blk-mq to the new hotplug state machine"
* 'for-4.9/block-smp' of git://git.kernel.dk/linux-block:
  blk-mq: fixup "Convert to new hotplug state machine"
  blk-mq: Convert to new hotplug state machine
  blk-mq/cpu-notif: Convert to new hotplug state machine
2016-10-09 | Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -8/+4
Pull blk-mq irq/cpu mapping updates from Jens Axboe: "This is the block-irq topic branch for 4.9-rc. It's mostly from Christoph, and it allows drivers to specify their own mappings, and more importantly, to share the blk-mq mappings with the IRQ affinity mappings. It's a good step towards making this work better out of the box"
* 'for-4.9/block-irq' of git://git.kernel.dk/linux-block:
  blk_mq: linux/blk-mq.h does not include all the headers it depends on
  blk-mq: kill unused blk_mq_create_mq_map()
  blk-mq: get rid of the cpumask in struct blk_mq_tags
  nvme: remove the post_scan callout
  nvme: switch to use pci_alloc_irq_vectors
  blk-mq: provide a default queue mapping for PCI device
  blk-mq: allow the driver to pass in a queue mapping
  blk-mq: remove ->map_queue
  blk-mq: only allocate a single mq_map per tag_set
  blk-mq: don't redistribute hardware queues on a CPU hotplug event
2016-09-22 | blk-mq: add flag for drivers wanting blocking ->queue_rq() | Jens Axboe | 1 | -0/+1
If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its ->queue_rq() handler. For that case, blk-mq ensures that we always call it from a safe context. Signed-off-by: Jens Axboe <[email protected]> Tested-by: Josef Bacik <[email protected]>
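A sketch of how a driver opts in at tag-set setup (other fields elided; "mydrv" names hypothetical):

    static int mydrv_init_tag_set(struct mydrv *drv)
    {
        drv->tag_set.ops = &mydrv_mq_ops;    /* hypothetical ops */
        drv->tag_set.nr_hw_queues = 1;
        drv->tag_set.queue_depth = 64;
        /* ->queue_rq() may sleep, e.g. for socket I/O */
        drv->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;

        return blk_mq_alloc_tag_set(&drv->tag_set);
    }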
2016-09-22 | blk-mq/cpu-notif: Convert to new hotplug state machine | Thomas Gleixner | 1 | -7/+1
Replace the block-mq notifier list management with the multi-instance facility in the cpu hotplug state machine. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-09-21 | blk-mq: register device instead of disk | Matias Bjørling | 1 | -2/+2
Enable devices without a gendisk instance to register themselves with blk-mq and expose the associated multi-queue sysfs entries. Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-09-17 | blk-mq: abstract tag allocation out into sbitmap library | Omar Sandoval | 1 | -7/+2
This is a generally useful data structure, so make it available to anyone else who might want to use it. It's also a nice cleanup separating the allocation logic from the rest of the tag handling logic. The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only selected by CONFIG_BLOCK for now. This should be a complete noop functionality-wise. Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
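For context, a hedged sketch of basic sbitmap_queue usage outside blk-mq (signatures as of this series, treated as an approximation):

    #include <linux/sbitmap.h>

    static int example_tag_pool(void)
    {
        struct sbitmap_queue sbq;
        int tag, ret;

        /* 128 tags, default shift, no round-robin allocation */
        ret = sbitmap_queue_init_node(&sbq, 128, -1, false,
                                      GFP_KERNEL, NUMA_NO_NODE);
        if (ret)
            return ret;

        tag = __sbitmap_queue_get(&sbq);    /* -1 when all tags are in use */
        if (tag >= 0)
            sbitmap_queue_clear(&sbq, tag, raw_smp_processor_id());

        sbitmap_queue_free(&sbq);
        return 0;
    }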
2016-09-15 | blk-mq: get rid of the cpumask in struct blk_mq_tags | Christoph Hellwig | 1 | -1/+0
Unused now that NVMe sets up irq affinity before calling into blk-mq. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-09-15 | blk-mq: allow the driver to pass in a queue mapping | Christoph Hellwig | 1 | -0/+3
This allows drivers to specify their own queue mapping by overriding the setup-time function that builds the mq_map. This can be used for example to build the map based on the MSI-X vector mapping provided by the core interrupt layer for PCI devices. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
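A sketch of the PCI case mentioned, built on the blk_mq_pci_map_queues() helper from the same series (the driver struct is hypothetical):

    #include <linux/blk-mq-pci.h>

    static int mydrv_map_queues(struct blk_mq_tag_set *set)
    {
        struct mydrv *drv = set->driver_data;    /* hypothetical */

        /* reuse the MSI-X affinity masks set up by the IRQ core */
        return blk_mq_pci_map_queues(set, drv->pdev);
    }

    static const struct blk_mq_ops mydrv_mq_ops = {
        .queue_rq   = mydrv_queue_rq,    /* hypothetical */
        .map_queues = mydrv_map_queues,
    };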
2016-09-15 | blk-mq: remove ->map_queue | Christoph Hellwig | 1 | -7/+0
All drivers use the default, so provide an inline version of it. If we ever need another queue mapping we can add an optional method back, although supporting that will also require major changes to the queue setup code. This provides better code generation, and better debuggability as well. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-09-15 | blk-mq: only allocate a single mq_map per tag_set | Christoph Hellwig | 1 | -0/+1
The mapping is identical for all queues in a tag_set, so stop wasting memory for building multiple. Note that for now I've kept the mq_map pointer in the request_queue, but we'll need to investigate if we can remove it without suffering too much from the additional pointer chasing. The same would apply to the mq_ops pointer as well. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-09-14 | blk-mq: introduce blk_mq_delay_kick_requeue_list() | Mike Snitzer | 1 | -0/+1
blk_mq_delay_kick_requeue_list() provides the ability to kick the q->requeue_list after a specified time. To do this the request_queue's 'requeue_work' member was changed to a delayed_work. blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued requests while it doesn't make sense to immediately requeue them (e.g. when all paths in a DM multipath have failed). Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
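A sketch of the DM multipath pattern described (the 5-second interval is illustrative; the two-argument blk_mq_requeue_request() form appears in the 2016-11-02 entry above):

    static void mydrv_requeue_later(struct request *rq)
    {
        struct request_queue *q = rq->q;

        /* requeue without kicking, then process requeues in 5 s,
         * e.g. while waiting for a path to become usable again */
        blk_mq_requeue_request(rq, false);
        blk_mq_delay_kick_requeue_list(q, 5000);
    }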
2016-09-14 | block: remove blk_mq_alloc_single_hw_queue() prototype | Linus Walleij | 1 | -1/+0
The blk_mq_alloc_single_hw_queue() is a prototype artifact that should have been removed with commit cdef54dd85ad66e77262ea57796a3e81683dd5d6 "blk-mq: remove alloc_hctx and free_hctx methods" where the last users of it were deleted. Fixes: cdef54dd85ad ("blk-mq: remove alloc_hctx and free_hctx methods") Signed-off-by: Linus Walleij <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-09-14 | block: add poll_considered statistic | Stephen Bates | 1 | -0/+1
In order to help determine the effectiveness of polling in a running system, it is useful to know the ratio of how often the poll function is called vs. how often the completion is checked. For this reason we add a poll_considered variable and add it to the sysfs entry for io_poll. Signed-off-by: Stephen Bates <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-29 | blk-mq: improve layout of blk_mq_hw_ctx | Jens Axboe | 1 | -4/+5
Various cache line optimizations:
- Move delay_work towards the end. It's huge, and we don't use it a lot (only SCSI).
- Move the atomic state into the same cacheline as the dispatch list and lock.
- Rearrange a few members to pack it better.
- Shrink the max-order for dispatch accounting from 10 to 7. This means that ->dispatched[] and ->run now take up their own cacheline.
This shrinks struct blk_mq_hw_ctx down to 8 cachelines. Signed-off-by: Jens Axboe <[email protected]>
2016-08-29 | blk-mq: turn hctx->run_work into a regular work struct | Jens Axboe | 1 | -1/+1
We don't need the larger delayed work struct, since we always run it immediately. Signed-off-by: Jens Axboe <[email protected]>
2016-07-08 | blk-mq: Introduce blk_mq_reinit_tagset | Sagi Grimberg | 1 | -0/+3
The new nvme-rdma driver will need to reinitialize all the tags as part of the error recovery procedure (realloc the tag memory region). Add a helper in blk-mq for it that can iterate over all requests in a tagset to make this easier. Signed-off-by: Sagi Grimberg <[email protected]> Tested-by: Ming Lin <[email protected]> Reviewed-by: Stephen Bates <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Steve Wise <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-07-05 | blk-mq: add blk_mq_alloc_request_hctx | Ming Lin | 1 | -0/+2
For some protocols like NVMe over Fabrics we need to be able to send initialization commands to a specific queue. Based on an earlier patch from Christoph Hellwig <[email protected]>. Signed-off-by: Ming Lin <[email protected]> [hch: disallow sleeping allocation, req_op fixes] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
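A hedged sketch of the fabrics-style use (the 2016-era rw/flags signature is assumed; the command setup is elided):

    /* send an initialization command on one specific hardware queue */
    static int mydrv_connect_queue(struct request_queue *q, unsigned int qid)
    {
        struct request *rq;

        rq = blk_mq_alloc_request_hctx(q, WRITE, BLK_MQ_REQ_NOWAIT, qid);
        if (IS_ERR(rq))
            return PTR_ERR(rq);
        /* ... fill in and issue the command, e.g. a fabrics connect ... */
        blk_mq_free_request(rq);
        return 0;
    }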
2016-04-12 | blk-mq: Make blk_mq_all_tag_busy_iter static | Sagi Grimberg | 1 | -2/+0
No caller outside the blk-mq code, so we can make it static. Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-04-12 | blk-mq: Export tagset iter function | Sagi Grimberg | 1 | -0/+2
It's useful to iterate over all the active tags in cases where we will need to fail all the queues' IO. Signed-off-by: Sagi Grimberg <[email protected]> [hch: carefully check for valid tagsets] Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
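A sketch of that failover pattern (the per-request cleanup is driver-specific and hypothetical):

    static void mydrv_fail_request(struct request *rq, void *data, bool reserved)
    {
        struct mydrv *drv = data;    /* hypothetical */

        mydrv_abort_rq(drv, rq);     /* hypothetical: terminate in-flight rq */
    }

    static void mydrv_fail_all_io(struct mydrv *drv)
    {
        blk_mq_tagset_busy_iter(&drv->tag_set, mydrv_fail_request, drv);
    }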
2016-03-20 | blk-mq: Use proper cpumask iterator | Thomas Gleixner | 1 | -14/+0
queue_for_each_ctx() iterates over per_cpu variables under the assumption that the possible cpu mask cannot have holes. That's wrong as all cpumasks can have holes. In case there are holes the iteration ends up accessing uninitialized memory and crashing as a result. Replace the macro by a proper for_each_possible_cpu() loop and drop the unused macro blk_ctx_sum() which references queue_for_each_ctx(). Reported-by: Xiong Zhou <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-02-09 | blk-mq: dynamic h/w context count | Keith Busch | 1 | -0/+2
The hardware's provided queue count may change at runtime with resource provisioning. This patch allows a block driver to alter the number of h/w queues available when its resource count changes. The main part is a new blk-mq API to request a new number of h/w queues for a given live tag set. The new API freezes all queues using that set, then adjusts the allocated count prior to remapping these to CPUs. The bulk of the rest just shifts where h/w contexts and all their artifacts are allocated and freed. The number of max h/w contexts is capped to the number of possible cpus since there is no use for more than that. As such, all pre-allocated memory for pointers needs to account for the max possible rather than the initial number of queues. A side effect of this is that blk-mq will proceed successfully as long as it can allocate at least one h/w context. Previously it would fail request queue initialization if less than the requested number was allocated. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: Jon Derrick <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
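The new API in sketch form (driver names hypothetical):

    /* called when the device re-provisions its queue resources */
    static void mydrv_set_queue_count(struct mydrv *drv, int new_count)
    {
        /* freezes all queues using the set, adjusts the allocated
         * count, then remaps hardware contexts to CPUs */
        blk_mq_update_nr_hw_queues(&drv->tag_set, new_count);
    }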
2015-12-01 | blk-mq: add a flags parameter to blk_mq_alloc_request | Christoph Hellwig | 1 | -1/+7
We already have the reserved flag, and a nowait flag awkwardly encoded as a gfp_t. Add a real flags argument to make the scheme more extensible and allow for a nicer calling convention. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
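The resulting calling convention, sketched (the 4.4-era rw argument is assumed):

    /* non-blocking allocation from the reserved tag pool; returns an
     * ERR_PTR() on failure rather than NULL */
    static struct request *mydrv_get_reserved_rq(struct request_queue *q)
    {
        return blk_mq_alloc_request(q, READ,
                                    BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED);
    }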
2015-11-07 | block: add block polling support | Jens Axboe | 1 | -0/+10
Add basic support for polling for specific IO to complete. This uses the cookie that blk-mq passes back, which enables the block layer to pass this cookie to the driver to spin for a specific request. This will be combined with request latency tracking, so we can make qualified decisions about when to poll and when not to. For now, for benchmark purposes, we add a sysfs file that controls whether polling is enabled or not. Signed-off-by: Jens Axboe <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Acked-by: Keith Busch <[email protected]>
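A hedged sketch of the consumer side (the 4.4-era submit_bio() signature and the blk_poll() entry point are assumptions based on this description):

    /* spin for completion of a single high-priority read; *done is
     * set by the bio's bi_end_io callback (not shown) */
    static void example_poll_read(struct request_queue *q, struct bio *bio,
                                  bool *done)
    {
        blk_qc_t cookie = submit_bio(READ, bio);

        while (!READ_ONCE(*done))
            blk_poll(q, cookie);
    }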
2015-10-21 | block: generic request_queue reference counting | Dan Williams | 1 | -1/+0
Allow pmem, and other synchronous/bio-based block drivers, to fallback on a per-cpu reference count managed by the core for tracking queue live/dead state. The existing per-cpu reference count for the blk_mq case is promoted to be used in all block i/o scenarios. This involves initializing it by default, waiting for it to drop to zero at exit, and holding a live reference over the invocation of q->make_request_fn() in generic_make_request(). The blk_mq code continues to take its own reference per blk_mq request and retains the ability to freeze the queue, but the check that the queue is frozen is moved to generic_make_request(). This fixes crash signatures like the following:
BUG: unable to handle kernel paging request at ffff880140000000
[..]
Call Trace:
 [<ffffffff8145e8bf>] ? copy_user_handle_tail+0x5f/0x70
 [<ffffffffa004e1e0>] pmem_do_bvec.isra.11+0x70/0xf0 [nd_pmem]
 [<ffffffffa004e331>] pmem_make_request+0xd1/0x200 [nd_pmem]
 [<ffffffff811c3162>] ? mempool_alloc+0x72/0x1a0
 [<ffffffff8141f8b6>] generic_make_request+0xd6/0x110
 [<ffffffff8141f966>] submit_bio+0x76/0x170
 [<ffffffff81286dff>] submit_bh_wbc+0x12f/0x160
 [<ffffffff81286e62>] submit_bh+0x12/0x20
 [<ffffffff813395bd>] jbd2_write_superblock+0x8d/0x170
 [<ffffffff8133974d>] jbd2_mark_journal_empty+0x5d/0x90
 [<ffffffff813399cb>] jbd2_journal_destroy+0x24b/0x270
 [<ffffffff810bc4ca>] ? put_pwq_unlocked+0x2a/0x30
 [<ffffffff810bc6f5>] ? destroy_workqueue+0x225/0x250
 [<ffffffff81303494>] ext4_put_super+0x64/0x360
 [<ffffffff8124ab1a>] generic_shutdown_super+0x6a/0xf0
Cc: Jens Axboe <[email protected]> Cc: Keith Busch <[email protected]> Cc: Ross Zwisler <[email protected]> Suggested-by: Christoph Hellwig <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: Ross Zwisler <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-10-01 | blk-mq: factor out a helper to iterate all tags for a request_queue | Christoph Hellwig | 1 | -2/+0
And replace blk_mq_tag_busy_iter with it - its driver use was replaced with a new helper a while ago, and internal to the block layer we only need the new version. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-10-01 | blk-mq: fix racy updates of rq->errors | Christoph Hellwig | 1 | -1/+1
blk_mq_complete_request may be a no-op if the request has already been completed by other means (e.g. a timeout or cancellation), but currently drivers have to set rq->errors before calling blk_mq_complete_request, which might leave us with the wrong error value. Add an error parameter to blk_mq_complete_request so that we can defer setting rq->errors until we know we won the race to complete the request. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>