path: root/block
2016-12-02  blk-stat: fix a typo  [Shaohua Li, 1 file, -1/+1]

Signed-off-by: Shaohua Li <[email protected]>
Fixes: cf43e6be865a ("block: add scalable completion tracking of requests")
Signed-off-by: Jens Axboe <[email protected]>

2016-12-01  block: factor out req_set_nomerge  [Ritesh Harjani, 1 file, -9/+10]

Factor out the common code for setting the REQ_NOMERGE flag, which is used in several places, into a helper, req_set_nomerge().

Signed-off-by: Ritesh Harjani <[email protected]>

Get rid of the inline.

Signed-off-by: Jens Axboe <[email protected]>

2016-12-01  block: add support for REQ_OP_WRITE_ZEROES  [Chaitanya Kulkarni, 7 files, -8/+105]

This adds a new block layer operation to zero out a range of LBAs. This makes it possible to implement zeroing for devices that don't use either discard with a predictable zero pattern or WRITE SAME of zeroes. The prominent example is NVMe with the Write Zeroes command, but in the future this should also help improve the way zeroing discards work. For this operation, a suitable entry is exported in sysfs that indicates the maximum number of bytes allowed in one write zeroes operation by the device.

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

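A hedged sketch of both sides of the new operation; the limit setter is part of this series, while the bio construction mirrors how other data-less ops (e.g. discard) are built, so treat the exact call sites as assumptions:

    /* Driver side: advertise Write Zeroes support (a device that cannot
     * zero ranges should leave this limit at 0). */
    blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);

    /* Caller side: a data-less bio describing the range to zero. */
    struct bio *bio = bio_alloc(GFP_KERNEL, 0);

    bio->bi_bdev = bdev;
    bio->bi_iter.bi_sector = sector;
    bio->bi_iter.bi_size = nr_sects << 9;
    bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0);
    submit_bio(bio);
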
2016-12-01  block: add async variant of blkdev_issue_zeroout  [Chaitanya Kulkarni, 1 file, -34/+81]

Similar to __blkdev_issue_discard, this variant allows submitting the final bio asynchronously and chaining multiple ranges into a single completion.

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

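A sketch of the intended usage, mirroring __blkdev_issue_discard; the trailing bool selects whether zeroing discards may be used, per the signature this change introduces (treat the exact prototype as an assumption):

    struct bio *bio = NULL;
    int ret;

    /* Queue up bios for the whole range; they are chained into *bio so
     * a single completion covers all of them. */
    ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, GFP_KERNEL,
                                 &bio, false);
    if (ret == 0 && bio) {
        ret = submit_bio_wait(bio);  /* or wire up an async completion */
        bio_put(bio);
    }
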
2016-12-01  block: Check partition alignment on zoned block devices  [Damien Le Moal, 1 file, -0/+65]

Both blkdev_report_zones and blkdev_reset_zones can operate on a partition of a zoned block device. However, the first and last zones reported for a partition make sense only if the partition start sector and size are aligned on the device zone size. The same applies for zone reset: resetting the first or the last zone of a partition straddling zones may impact neighboring partitions. Finally, if a partition start sector is not at the beginning of a sequential zone, it will be impossible to write to the first sectors of the partition on a host-managed device. Avoid all these problems and inconsistencies by ignoring partitions that are not zone aligned.

Note: Even with CONFIG_BLK_DEV_ZONED disabled, bdev_is_zoned() will report the correct disk zoning type (host-aware, host-managed or none), but bdev_zone_size() will always return 0 for zoned block devices (i.e. the zone size is unknown). So test this as a way to ensure that a zoned block device is being handled as such. As a result, for host-aware devices, zone-unaligned partitions will be accepted with CONFIG_BLK_DEV_ZONED disabled; that is, the disk will be treated as a regular block device (as it should be). If zoned block device support is enabled, only aligned partitions will be accepted.

Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

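Roughly the shape of the check this adds; the helper name, the power-of-two zone size, and the exact return conventions are illustrative:

    /* Accept a partition on a zoned device only if its start and size
     * are aligned on the zone size (all values in 512-byte sectors). */
    static bool part_zone_aligned(struct block_device *bdev,
                                  sector_t start, sector_t size)
    {
        sector_t zone_size = bdev_zone_size(bdev);

        if (!zone_size)  /* CONFIG_BLK_DEV_ZONED off: treat as regular */
            return true;

        return !(start & (zone_size - 1)) && !(size & (zone_size - 1));
    }
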
2016-11-29  blk-mq: Drop explicit timeout sync in hotplug  [Gabriel Krisman Bertazi, 1 file, -8/+1]

After commit 287922eb0b18 ("block: defer timeouts to a workqueue"), deleting the timeout work after freezing the queue shouldn't be necessary, since the synchronization is already enforced by the acquisition of a q_usage_counter reference in blk_mq_timeout_work.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-28  blk-wbt: allow wbt to be enabled always through sysfs  [Jens Axboe, 3 files, -7/+29]

Currently there's no way to enable wbt for a device if it's not enabled by default in the kernel config. Allow a write to the 'wbt_lat_usec' queue sysfs file to enable wbt. This is useful both for the kernel-config case and for devices that are CFQ-managed, where wbt was turned off by default.

Signed-off-by: Jens Axboe <[email protected]>

2016-11-28  blk-wbt: cleanup disable-by-default for CFQ  [Jens Axboe, 3 files, -10/+13]

Make it clear that we are disabling wbt for the specified queue, if it was enabled by default. This is in preparation for allowing users to re-enable wbt without having it disabled automatically again.

Signed-off-by: Jens Axboe <[email protected]>

2016-11-28  blk-wbt: allow reset of default latency through sysfs  [Jens Axboe, 3 files, -11/+34]

Allow a write of '-1' to reset the default latency target for a given device. This removes knowledge of the different default settings for rotational vs. non-rotational storage from user space.

Signed-off-by: Jens Axboe <[email protected]>

2016-11-22  block,blkcg: use __GFP_NOWARN for best-effort allocations in blkcg  [Tejun Heo, 2 files, -5/+7]

blkcg allocates some per-cgroup data structures with GFP_NOWAIT and, when that fails, falls back to operations which aren't specific to the cgroup. Occasional failures are expected under pressure and falling back to non-cgroup operation is the right thing to do. Unfortunately, I forgot to add __GFP_NOWARN to these allocations and these expected failures end up creating a lot of noise. Add __GFP_NOWARN.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Marc MERLIN <[email protected]>
Reported-by: Vlastimil Babka <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-22  block: bio: pass bvec table to bio_init()  [Ming Lei, 1 file, -2/+6]

Some drivers often use an external bvec table, so introduce this helper for that case. It is always safe to access bio->bi_io_vec this way. After converting to this usage, it becomes a bit easier to evaluate the remaining direct accesses to bio->bi_io_vec, which helps to prepare for the following multipage bvec support.

Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

Fixed up the new O_DIRECT cases.

Signed-off-by: Jens Axboe <[email protected]>

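A minimal sketch of the new calling convention with a caller-owned bvec table, the pattern the O_DIRECT fixups use; the table size here is arbitrary:

    struct bio bio;
    struct bio_vec inline_vecs[4];   /* caller-provided bvec table */

    /* bio_init() now records the external table and its capacity, so
     * accessing bio.bi_io_vec afterwards is always safe. */
    bio_init(&bio, inline_vecs, 4);
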
2016-11-21  block: apply blk_partition_remap to REQ_OP_ZONE_RESET  [Shaun Tancheff, 1 file, -1/+6]

If a ZBC device is partitioned and operations are performed on the partition, the zone information is rebased to the partition. However, a zone reset is not mapped from the partition to the device, as other operations are. This leaves the API (report zones / reset zone) unbalanced in this regard. Checking for the zone reset op code explicitly balances the API.

Signed-off-by: Shaun Tancheff <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-17  scsi: fc: move FC transport's bsg code to bsg-lib  [Johannes Thumshirn, 1 file, -2/+1]

Now that all conversions are done, move the FibreChannel bsg code over to the bsg library. This patch is derived from work done by Mike Christie in 2011 [1], but only the iscsi parts got merged back then.

[1] http://marc.info/?l=linux-scsi&m=131149780921009&w=2

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

2016-11-17  block: add bsg_job_put() and bsg_job_get()  [Johannes Thumshirn, 1 file, -3/+14]

Add bsg_job_put() and bsg_job_get() so we no longer need to export bsg_destroy_job().

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

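A hedged sketch of the intended use, e.g. from a request timeout handler; the kref-style semantics (get fails once the job is dying, the last put frees it) are inferred from the surrounding patches:

    if (!bsg_job_get(job))      /* job is already being torn down */
        return;

    /* ... safely inspect or abort the in-flight job ... */

    bsg_job_put(job);           /* the final put releases the job */
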
2016-11-17  scsi: fc: use bsg_softirq_done  [Johannes Thumshirn, 1 file, -1/+2]

bsg_softirq_done() and fc_bsg_softirq_done() are copies of each other, so ditch the fc-specific one.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

2016-11-17  scsi: fc: Use bsg_destroy_job  [Johannes Thumshirn, 1 file, -2/+5]

fc_destroy_bsgjob() and bsg_destroy_job() are now 1:1 copies, so use the latter. As bsg_destroy_job() comes from bsg-lib, we need to select it in Kconfig once CONFIG_SCSI_FC_ATTRS is active.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

2016-11-17  block: add reference counting for struct bsg_job  [Johannes Thumshirn, 1 file, -2/+5]

Add reference counting to 'struct bsg_job' so we can implement a request timeout handler for bsg_jobs, which is needed for Fibre Channel.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

2016-11-17  blk-mq: make the polling code adaptive  [Jens Axboe, 2 files, -11/+82]

The previous commit introduced the hybrid sleep/poll mode. Take that one step further, and use the completion latencies to automatically sleep for half the mean completion time. This is a good approximation.

This changes the 'io_poll_delay' sysfs file a bit to expose the various options. Depending on the value, the polling code will behave differently:

-1  Never enter hybrid sleep mode
 0  Use half of the completion mean for the sleep delay
>0  Use this specific value as the sleep delay

Signed-off-by: Jens Axboe <[email protected]>
Tested-by: Stephen Bates <[email protected]>
Reviewed-by: Stephen Bates <[email protected]>

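As a sketch, the delay selection described above might look like this; field and helper names are assumptions, not the actual blk-mq code:

    /* Pick the hybrid-sleep delay for a request about to be polled. */
    if (q->poll_nsec == -1)
        return 0;                      /* never enter hybrid sleep */
    else if (q->poll_nsec > 0)
        nsecs = q->poll_nsec;          /* fixed delay set via sysfs */
    else
        nsecs = mean_completion_nsecs(q, rq) / 2;  /* adaptive half-mean */
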
2016-11-17  blk-mq: implement hybrid poll mode for sync O_DIRECT  [Jens Axboe, 3 files, -0/+80]

This patch enables a hybrid polling mode. Instead of polling after IO submission, we can induce an artificial delay, and then poll after that. For example, if the IO is presumed to complete in 8 usecs from now, we can sleep for 4 usecs, wake up, and then do our polling. This still puts a sleep/wakeup cycle in the IO path, but instead of the wakeup happening after the IO has completed, it'll happen before. With this hybrid scheme, we can achieve big latency reductions while still using the same (or less) amount of CPU.

Signed-off-by: Jens Axboe <[email protected]>
Tested-by: Stephen Bates <[email protected]>
Reviewed-by: Stephen Bates <[email protected]>

2016-11-16  blk-wbt: fix old-style function declaration  [Arnd Bergmann, 1 file, -1/+1]

The newly added code causes a harmless warning in some configurations:

block/blk-wbt.c:250:1: error: ‘inline’ is not at beginning of declaration [-Werror=old-style-declaration]
 static bool inline stat_sample_valid(struct blk_rq_stat *stat)

This makes it use the expected format for the declaration.

Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-16  block: deal with stale req count of plug list  [Ming Lei, 2 files, -1/+11]

In both the legacy and mq paths, the req count of the plug list is computed before allocating the request, so the number can be stale when we fall back to a sleeping allocation, and the newly introduced wbt can sleep too. This patch deals with that case by checking whether the plug list has become empty, and fixes the KASAN report of 'BUG: KASAN: stack-out-of-bounds' introduced by Shaohua's patches for dispatching big requests.

Fixes: 600271d900002 ("blk-mq: immediately dispatch big size request")
Fixes: 50d24c34403c6 ("block: immediately dispatch big size request")
Cc: Shaohua Li <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

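The fix, roughly: re-check the plug list after the potentially sleeping allocation returns (a sketch; variable names taken from the surrounding submission code):

    /*
     * The request_count computed before allocation can be stale once
     * the allocation has slept; if the plug was flushed in the
     * meantime, start counting from zero again.
     */
    if (plug && list_empty(&plug->mq_list))
        request_count = 0;
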
2016-11-14  bsg: Add sparse annotations to bsg_request_fn()  [Bart Van Assche, 1 file, -0/+2]

Keep sparse from complaining about unbalanced lock actions.

Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-11  blk-wbt: use BLK_STAT_{READ,WRITE} instead of 0/1  [Jens Axboe, 1 file, -6/+6]

Since we have proper enums for the stats directions, use them.

Signed-off-by: Jens Axboe <[email protected]>

2016-11-11  blk-wbt: remove stat ops  [Jens Axboe, 3 files, -43/+8]

Again a leftover from when the throttling code was generic. Now that we just have the block user, get rid of the stat ops and indirections.

Signed-off-by: Jens Axboe <[email protected]>

2016-11-11  blk-wbt: store queue instead of bdi  [Jens Axboe, 2 files, -11/+13]

The bdi was a leftover from when the code was block layer agnostic. Now that we just support a block layer user, store the queue directly.

Signed-off-by: Jens Axboe <[email protected]>

2016-11-11  block: move poll code to blk-mq  [Jens Axboe, 2 files, -46/+54]

The poll code is blk-mq specific, so let's move it to blk-mq.c. This is a prep patch for improving the polling code.

Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

2016-11-11  blk-mq: blk_mq_try_issue_directly() should lookup hardware queue  [Jens Axboe, 1 file, -4/+4]

A previous commit changed this to pass in the hardware queue, but it was using the wrong hardware queue. Hence a request that was allocated on one hardware queue ended up being issued on another one, and that caused IO timeouts and oopses on some drivers. Since the request holds hardware queue private resources, like a tag, we can't just issue it on a different hardware queue.

Fixes: 2253efc850c4 ("blk-mq: Move more code into blk_mq_direct_issue_request()")
Signed-off-by: Jens Axboe <[email protected]>

2016-11-10  block: hook up writeback throttling  [Jens Axboe, 6 files, -4/+171]

Enable throttling of buffered writeback to make it a lot smoother, with far less impact on other system activity. Background writeback should be, by definition, background activity. The fact that we flush huge bundles of it at a time means that it potentially has a heavy impact on foreground workloads, which isn't ideal. We can't easily limit the sizes of the writes that we do, since that would impact file system layout in the presence of delayed allocation. So just throttle back buffered writeback, unless someone is waiting for it.

The algorithm for when to throttle takes its inspiration from the CoDel network scheduling algorithm. Like CoDel, blk-wb monitors the minimum latencies of requests over a window of time. In that window of time, if the minimum latency of any request exceeds a given target, then a scale count is incremented and the queue depth is shrunk. The next monitoring window is shrunk accordingly. Unlike CoDel, if we hit a window that exhibits good behavior, then we simply decrement the scale count and re-calculate the limits for that scale value. This prevents us from oscillating between a close-to-ideal value and max all the time, instead remaining in the windows where we get good behavior.

Unlike CoDel, blk-wb allows the scale count to go negative. This happens if we primarily have writes going on. Unlike positive scale counts, this doesn't change the size of the monitoring window. When the heavy writers finish, blk-wb quickly snaps back to its stable state of a zero scale count.

The patch registers a sysfs entry, 'wbt_lat_usec'. This sets the latency target to be met. It defaults to 2 msec for non-rotational storage, and 75 msec for rotational storage. Setting this value to '0' disables blk-wb. Generally, a user would not have to touch this setting.

We don't enable WBT on devices that are managed with CFQ and have a non-root block cgroup attached. If we have a proportional share setup on this particular disk, then the wbt throttling will interfere with that. We don't have a strong need for wbt in that case, since we will rely on CFQ doing that for us.

Signed-off-by: Jens Axboe <[email protected]>

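As a sketch, the per-window step decision described above could be shaped like this; every name here is hypothetical and the real blk-wbt code differs in detail:

    /* Evaluated once per monitoring window (all names hypothetical). */
    if (window_min_latency > target_latency) {
        scale_step++;                /* misbehaving: throttle harder */
        shrink_write_depth(scale_step);
        window_nsec >>= 1;           /* watch the next window more closely */
    } else if (window_looked_good) {
        scale_step--;                /* good: step back one notch instead
                                      * of snapping straight to full depth */
        recalc_limits(scale_step);
    }
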
2016-11-10  blk-wbt: add general throttling mechanism  [Jens Axboe, 3 files, -0/+901]

We can hook this up to the block layer, to help throttle buffered writes. wbt registers a few trace points that can be used to track what is happening in the system:

wbt_lat: 259:0: latency 2446318
wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1, wmean=518866, wmin=15522, wmax=5330353, wsamples=57
wbt_step: 259:0: step down: step=1, window=72727272, background=8, normal=16, max=32

This shows a sync issue event (wbt_lat) that exceeded its time. wbt_stat dumps the current read/write stats for that window, and wbt_step shows a step-down event where we now scale back writes. Each trace includes the device, 259:0 in this case.

Signed-off-by: Jens Axboe <[email protected]>

2016-11-10  block: add scalable completion tracking of requests  [Jens Axboe, 8 files, -3/+404]

For legacy block, we simply track them in the request queue. For blk-mq, we track them on a per-sw queue basis, which we can then sum up through the hardware queues and finally to a per-device state. The stats are tracked in, roughly, 0.1s interval windows.

Add sysfs files to display the stats. The feature is off by default, to avoid any extra overhead. In-kernel users of it can turn it on by setting QUEUE_FLAG_STATS in the queue flags. We currently don't turn it on if someone just reads any of the stats files; that is something we could add as well.

Signed-off-by: Jens Axboe <[email protected]>

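A sketch of how an in-kernel user might turn the tracking on; QUEUE_FLAG_STATS is from this commit, while the setter helper shown is an assumption for this era of the code:

    /* Opt this queue into completion-time tracking. */
    queue_flag_set_unlocked(QUEUE_FLAG_STATS, q);
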
2016-11-10  block: cfq_cpd_alloc() should use @gfp  [Tejun Heo, 1 file, -1/+1]

cfq_cpd_alloc(), which is the cpd_alloc_fn implementation for cfq, was incorrectly hard-coding GFP_KERNEL instead of using the mask specified through the @gfp parameter. This currently doesn't cause any actual issues because all current callers specify GFP_KERNEL. Fix it.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Fixes: e4a9bde9589f ("blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() methods")
Signed-off-by: Jens Axboe <[email protected]>

2016-11-08  block: set REQ_SYNC if we clear REQ_FUA|REQ_PREFLUSH  [Jens Axboe, 1 file, -0/+7]

If we insert a flush request, we clear REQ_PREFLUSH and/or REQ_FUA, depending on the flush settings. Since op_is_sync() factors those flags in when deciding whether this request is sync or not, we should set REQ_SYNC to avoid screwing up this accounting. This should be less fragile.

Reported-by: Logan Gunthorpe <[email protected]>
Fixes: b685d3d65ac ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous")
Signed-off-by: Jens Axboe <[email protected]>

2016-11-08  blk-mq: export blk_mq_map_queues  [Christoph Hellwig, 2 files, -1/+1]

This will allow SCSI to have a single blk_mq_ops structure that either lets the LLDD map the queues to PCIe MSI-X vectors or use the default.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

2016-11-06  blk-mq: Always schedule hctx->next_cpu  [Gabriel Krisman Bertazi, 1 file, -3/+1]

Commit 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU") attempts to avoid triggering the WARN_ON in __blk_mq_run_hw_queue when the expected CPU is dead. Problem is, in the last batch execution before round robin, blk_mq_hctx_next_cpu can schedule a dead CPU and also update next_cpu to the next alive CPU in the mask, which will trigger the WARN_ON despite the previous workaround.

The following patch fixes this scenario by always scheduling the value in hctx->next_cpu. This changes the moment when we round-robin the CPU running the hctx, but it really doesn't matter, since it still executes BLK_MQ_CPU_WORK_BATCH times in a row before switching to another CPU.

Fixes: 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU")
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-05  block: add code to track actual device queue depth  [Jens Axboe, 1 file, -0/+12]

For blk-mq, ->nr_requests does track queue depth, at least at init time. But for the older queue paths, it's simply a soft setting. On top of that, it's generally larger than the hardware setting on purpose, to allow backup of requests for merging. Fill a hole in struct request_queue with a 'queue_depth' member and add a helper that drivers can call to more closely inform the block layer of the real queue depth.

Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Jan Kara <[email protected]>

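A sketch of the driver side; blk_set_queue_depth() is the helper this change introduces, while the SCSI-flavored call site is an assumption:

    /* Tell the block layer the device's real queue depth, e.g. from a
     * SCSI LLD after (re)negotiating the number of tags. */
    blk_set_queue_depth(sdev->request_queue, sdev->queue_depth);
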
2016-11-03  blk-mq: immediately dispatch big size request  [Shaohua Li, 1 file, -1/+6]

This is the corresponding part for blk-mq. A disk with multiple hardware queues doesn't need this, as we hold at most one request there.

Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-03  block: immediately dispatch big size request  [Shaohua Li, 1 file, -1/+3]

Currently the block plug holds up to 16 non-mergeable requests. This makes sense if the request size is small, e.g. to reduce lock contention. But if the request size is big enough, we don't need to worry about lock contention; holding such a request makes no sense and lowers disk utilization. In practice, this improves throughput by 10% for my raid5 sequential write workload. The size (128k) is arbitrary right now, but it makes sure lock contention stays small. This could probably be more intelligent, e.g. checking the average size of held requests, but since this is mainly for sequential IO, that's probably not worthwhile.

V2: check the last request instead of the first request, so that as long as there is one big request we flush the plug.

Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

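Roughly the check described, sketched against the legacy plug path; the constant's name and the exact condition are assumptions based on the message:

    #define BLK_PLUG_FLUSH_SIZE (128 * 1024)  /* arbitrary, per the message */

    /* With the new request already on the plug list: */
    struct request *last = list_entry_rq(plug->list.prev);

    if (request_count >= BLK_MAX_REQUEST_COUNT ||
        blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)
        blk_flush_plug_list(plug, false);     /* dispatch immediately */
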
2016-11-03  block: drop q argument from bsg_validate_sgv4_hdr  [Johannes Thumshirn, 1 file, -2/+2]

bsg_validate_sgv4_hdr() doesn't care about the request_queue, so drop it from its arguments.

Signed-off-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-02  blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()  [Bart Van Assche, 2 files, -7/+8]

Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls are followed by kicking the requeue list. Hence add an argument to these two functions that allows the caller to kick the requeue list. This was proposed by Christoph Hellwig.

Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

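The resulting calling conventions, sketched from the description (boolean argument order as described here; verify against the headers):

    /* Requeue and kick the requeue list right away (the common case). */
    blk_mq_requeue_request(rq, true);

    /* Or batch several requeues and kick once at the end. */
    blk_mq_add_to_requeue_list(rq, false, false);  /* at_head, kick list */
    /* ... queue more requests ... */
    blk_mq_kick_requeue_list(q);
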
2016-11-02  blk-mq: Introduce blk_mq_quiesce_queue()  [Bart Van Assche, 2 files, -7/+65]

blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations have finished. This function does *not* wait until all outstanding requests have finished (this means invocation of request.end_io()). The algorithm used by blk_mq_quiesce_queue() is as follows:

* Hold either an RCU read lock or an SRCU read lock around .queue_rq() calls. The former is used if .queue_rq() does not block and the latter if .queue_rq() may block.
* blk_mq_quiesce_queue() first calls blk_mq_stop_hw_queues() followed by synchronize_srcu() or synchronize_rcu(). The latter call waits for .queue_rq() invocations that started before blk_mq_quiesce_queue() was called.
* The blk_mq_hctx_stopped() calls that control whether or not .queue_rq() will be called are called with the (S)RCU read lock held. This is necessary to avoid race conditions against blk_mq_quiesce_queue().

Signed-off-by: Bart Van Assche <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

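A hedged sketch of driver-side use; blk_mq_quiesce_queue() is from this patch, while pairing it with blk_mq_start_stopped_hw_queues() for the restart is an assumption about this era of the API:

    /* Wait for all in-flight .queue_rq() invocations to finish. */
    blk_mq_quiesce_queue(q);

    /* ... reconfigure shared state without racing against dispatch ... */

    /* Quiescing stops the hw queues, so restart them afterwards. */
    blk_mq_start_stopped_hw_queues(q, true);
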
2016-11-02  blk-mq: Remove blk_mq_cancel_requeue_work()  [Bart Van Assche, 1 file, -6/+0]

Since blk_mq_requeue_work() no longer restarts stopped queues, canceling requeue work is no longer needed to prevent a stopped queue from being restarted. Hence remove this function.

Signed-off-by: Bart Van Assche <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-02  blk-mq: Avoid that requeueing starts stopped queues  [Bart Van Assche, 1 file, -5/+1]

Since blk_mq_requeue_work() starts stopped queues, and since execution of this function can be scheduled after a queue has been stopped, it is not possible to stop queues without using an additional state variable to track whether or not the queue has been stopped. Hence modify blk_mq_requeue_work() such that it does not start stopped queues. My conclusion after a review of the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list() callers is as follows:

* In the dm driver, starting and stopping queues should only happen if __dm_suspend() or __dm_resume() is called, not when the requeue list is processed.
* In the SCSI core, queue stopping and starting should only be performed by the scsi_internal_device_block() and scsi_internal_device_unblock() functions, and not by any other function. Although the blk_mq_stop_hw_queue() call in scsi_queue_rq() may help to reduce CPU load if an LLD queue is full, figuring out whether or not a queue should be restarted when requeueing a command would require introducing additional locking in scsi_mq_requeue_cmd() to avoid a race with scsi_internal_device_block(). Avoid this complexity by removing the blk_mq_stop_hw_queue() call from scsi_queue_rq().
* In the NVMe core, only the functions that call blk_mq_start_stopped_hw_queues() explicitly should start stopped queues.
* A blk_mq_start_stopped_hw_queues() call must be added in the xen-blkfront driver in its blkif_recover() function.

Signed-off-by: Bart Van Assche <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Roger Pau Monné <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-02  blk-mq: Move more code into blk_mq_direct_issue_request()  [Bart Van Assche, 1 file, -8/+10]

Move the "hctx stopped" test and the insert request calls into blk_mq_direct_issue_request(). Rename that function into blk_mq_try_issue_directly() to reflect its new semantics. Pass the hctx pointer to that function instead of looking it up a second time. These changes avoid that code has to be duplicated in the next patch.

Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-02  blk-mq: Introduce blk_mq_queue_stopped()  [Bart Van Assche, 1 file, -0/+20]

The function blk_queue_stopped() allows testing whether or not a traditional request queue has been stopped. Introduce a helper function that allows block drivers to query easily whether or not one or more hardware contexts of a blk-mq queue have been stopped.

Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

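A small sketch of the intended driver-side check; pairing it with the restart helper is illustrative, not part of this patch:

    /* Only restart dispatch if some hardware context is actually stopped. */
    if (blk_mq_queue_stopped(q))
        blk_mq_start_stopped_hw_queues(q, true);
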
2016-11-02  blk-mq: Introduce blk_mq_hctx_stopped()  [Bart Van Assche, 2 files, -6/+11]

Multiple functions test the BLK_MQ_S_STOPPED bit, so introduce a helper function that performs this test.

Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

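The helper is small enough to show in full; this matches its description as a BLK_MQ_S_STOPPED test, though treat the exact body as an assumption:

    static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
    {
        return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
    }
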
2016-11-02  blk-mq: Do not invoke .queue_rq() for a stopped queue  [Bart Van Assche, 1 file, -3/+3]

The meaning of the BLK_MQ_S_STOPPED flag is "do not call .queue_rq()". Hence modify blk_mq_make_request() such that requests are queued instead of issued if a queue has been stopped.

Reported-by: Ming Lei <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-02  block: add bio_iov_iter_get_pages()  [Kent Overstreet, 1 file, -0/+49]

This is a helper that pins down a range from an iov_iter and adds it to a bio without requiring a separate memory allocation for the page array. It will be used for upcoming direct I/O implementations for block devices and iomap based file systems.

Signed-off-by: Kent Overstreet <[email protected]>
[hch: ported to the iov_iter interface, renamed and added comments. All blame should be directed to me and all fame should go to Kent after this!]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

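A sketch of the usage this enables in a simple block-device direct I/O path; DIO_INLINE_BIO_VECS and the surrounding setup are illustrative, and the three-argument bio_init() is the caller-provided-table variant that lands later in this log:

    struct bio bio;
    struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS];  /* hypothetical define */
    int ret;

    bio_init(&bio, inline_vecs, DIO_INLINE_BIO_VECS);
    bio.bi_iter.bi_sector = pos >> 9;

    /* Pin the iov_iter's user pages straight into the bio's bvec table;
     * no separate page array is allocated. */
    ret = bio_iov_iter_get_pages(&bio, iter);
    if (unlikely(ret))
        return ret;
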
2016-11-01  block,fs: use REQ_* flags directly  [Christoph Hellwig, 1 file, -2/+2]

Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-01  block: replace REQ_NOIDLE with REQ_IDLE  [Christoph Hellwig, 1 file, -3/+8]

Noidle should be the default for writes, as seen by all the compound definitions in fs.h using it. In fact, only direct I/O really should be using IDLE, so turn the whole flag around to get the defaults right, which will make our life much easier, especially once the WRITE_* defines go away. This assumes all the existing "raw" users of REQ_SYNC for writes want noidle behavior, which seems to be spot on from a quick audit.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

2016-11-01  cfq-iosched: use op_is_sync instead of opencoding it  [Christoph Hellwig, 1 file, -12/+4]

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>