2018-08-21  block/DAC960.c: make some arrays static const, shrinks object size  (Colin Ian King, 1 file, -18/+24)

Don't populate the arrays ReadCacheStatus, WriteCacheStatus and SenseErrors on the stack but instead make them static const. Makes the object code smaller by 47 bytes:

Before:
   text    data     bss     dec     hex filename
 160974   34628     832  196434   2ff52 drivers/block/DAC960.o

After:
   text    data     bss     dec     hex filename
 160671   34884     832  196387   2ff23 drivers/block/DAC960.o

(gcc version 8.2.0 x86_64)

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
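The technique is generic C: a lookup table built inside a function lands on the stack and is reconstructed at each call, while a static const table is emitted once into .rodata. A minimal sketch of the pattern, with an illustrative table rather than the driver's actual status arrays:

  /* Before: the table is (re)built on the stack at every call. */
  const char *status_name_before(unsigned int code)
  {
          const char *names[] = { "OK", "DEGRADED", "FAILED" };  /* illustrative values */
          return code < 3 ? names[code] : "UNKNOWN";
  }

  /* After: static const places the table in .rodata once. */
  const char *status_name_after(unsigned int code)
  {
          static const char * const names[] = { "OK", "DEGRADED", "FAILED" };
          return code < 3 ? names[code] : "UNKNOWN";
  }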
2018-08-21  blk-mq: sync the update of nr_hw_queues with blk_mq_queue_tag_busy_iter  (Jianchao Wang, 2 files, -1/+17)

For blk-mq, part_in_flight/rw will invoke blk_mq_in_flight/rw to account for the in-flight requests. It accesses queue_hw_ctx and nr_hw_queues without any protection, so when an update of nr_hw_queues and blk_mq_in_flight/rw run concurrently, a panic results.

Before nr_hw_queues is updated, the queue is frozen, so we can use q_usage_counter to avoid the race. percpu_ref_is_zero is used here so that we will not miss any in-flight request. The accesses to nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are under an RCU critical section, and __blk_mq_update_nr_hw_queues can use synchronize_rcu to ensure the zeroed q_usage_counter is globally visible.

Signed-off-by: Jianchao Wang <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
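Roughly, the scheme described above looks like the sketch below; this is a simplified illustration of the freeze/RCU ordering, not the literal upstream diff (helper usage and placement are approximations):

  /* Iterator side: skip a queue whose q_usage_counter is zero, i.e. one
   * that is frozen for an nr_hw_queues update. */
  rcu_read_lock();
  if (percpu_ref_is_zero(&q->q_usage_counter)) {
          rcu_read_unlock();
          return;
  }
  /* ... safely walk q->queue_hw_ctx[0 .. q->nr_hw_queues) ... */
  rcu_read_unlock();

  /* Update side: freeze first, then make the zeroed counter globally
   * visible before touching queue_hw_ctx / nr_hw_queues. */
  blk_mq_freeze_queue(q);
  synchronize_rcu();
  /* ... reallocate queue_hw_ctx and update nr_hw_queues ... */
  blk_mq_unfreeze_queue(q);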
2018-08-21  blk-mq: init hctx sched after updating ctx and hctx mapping  (Jianchao Wang, 5 files, -65/+98)

Currently, when nr_hw_queues is updated, the IO scheduler's init_hctx is invoked before the mapping between ctx and hctx has been adapted by blk_mq_map_swqueue. An IO scheduler's init_hctx (kyber) may depend on this mapping, get a wrong result and finally panic.

A simple way to fix this is to switch the IO scheduler to 'none' before updating nr_hw_queues, and then switch it back afterwards. blk_mq_sched_init_/exit_hctx are removed because nobody uses them any more.

Signed-off-by: Jianchao Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-17block: remove duplicate initializationChaitanya Kulkarni1-1/+0
This patch removes the duplicate initialization of q->queue_head in the blk_alloc_queue_node(). This removes the 2nd initialization so that we preserve the initialization order same as declaration present in struct request_queue. Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-08-16  tracing/blktrace: Fix to allow setting same value  (Steven Rostedt (VMware), 1 file, -0/+4)

Masami Hiramatsu reported:

  The current trace-enable attribute in sysfs returns an error if the user writes the same setting value as the current one, e.g.

   # cat /sys/block/sda/trace/enable
   0
   # echo 0 > /sys/block/sda/trace/enable
   bash: echo: write error: Invalid argument
   # echo 1 > /sys/block/sda/trace/enable
   # echo 1 > /sys/block/sda/trace/enable
   bash: echo: write error: Device or resource busy

  This is not the preferred behavior; the write should be ignored if the new setting is the same as the current one. This fixes the problem as below.

   # cat /sys/block/sda/trace/enable
   0
   # echo 0 > /sys/block/sda/trace/enable
   # echo 1 > /sys/block/sda/trace/enable
   # echo 1 > /sys/block/sda/trace/enable

Link: http://lkml.kernel.org/r/[email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: cd649b8bb830d ("blktrace: remove sysfs_blk_trace_enable_show/store()")
Reported-by: Masami Hiramatsu <[email protected]>
Tested-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
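The shape of the fix, per the description above (a sketch of the early return in the sysfs store handler, not necessarily the verbatim hunk):

  /* In the trace/enable store path: writing the value we already have
   * is treated as success instead of falling into setup/teardown,
   * which would return -EBUSY or -EINVAL. */
  if (attr == &dev_attr_enable) {
          if (!!value == !!q->blk_trace) {
                  ret = 0;
                  goto out_unlock_bdev;
          }
          if (value)
                  ret = blk_trace_setup_queue(q, bdev);
          else
                  ret = blk_trace_remove_queue(q);
          goto out_unlock_bdev;
  }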
2018-08-16  pktcdvd: fix setting of 'ret' error return for a few cases  (Jens Axboe, 1 file, -0/+1)

We initialize it to -ENOMEM, but then later overwrite it. After overwriting, we don't set it again for two later failure cases.

Reported-by: Jason Wood <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-16  block: change return type to bool  (Chengguang Xu, 1 file, -1/+1)

Because blk_do_io_stat() only judges whether the request contributes to IO statistics, its return type is better expressed as bool.

Signed-off-by: Chengguang Xu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-16  block, bfq: return nbytes and not zero from struct cftype .write() method  (Maciej S. Szmigiero, 1 file, -1/+2)

The value that a struct cftype .write() method returns is directly returned to userspace as the value returned by the write() syscall, so it should be the number of bytes actually written (or consumed) and not zero. Returning zero from the write() syscall makes programs like /bin/echo or bash spin.

Signed-off-by: Maciej S. Szmigiero <[email protected]>
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
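The contract is simply that a successful .write() handler returns the number of bytes it consumed. A hedged sketch of the pattern (the helper name is illustrative, not bfq's actual function):

  static ssize_t bfq_io_set_weight(struct kernfs_open_file *of, char *buf,
                                   size_t nbytes, loff_t off)
  {
          int ret = parse_and_set_weight(of, buf);    /* hypothetical helper */

          /* Returning 0 from write() makes callers such as /bin/echo
           * or bash retry the write forever. */
          return ret ?: nbytes;
  }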
2018-08-16  block, bfq: improve code of bfq_bfqq_charge_time  (Paolo Valente, 1 file, -9/+5)

bfq_bfqq_charge_time contains some lengthy and redundant code. This commit trims and condenses that code.

Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-16  block, bfq: reduce write overcharge  (Paolo Valente, 1 file, -14/+19)

When a sync request is dispatched, the queue that contains that request, and all the ancestor entities of that queue, are charged with the number of sectors of the request. In contrast, if the request is async, the queue and its ancestor entities are charged with the number of sectors of the request multiplied by an overcharge factor. This throttles the bandwidth for async I/O w.r.t. sync I/O, and it is done to counter the tendency of async writes to steal I/O throughput from reads.

On the other hand, the lower this factor, the more stable the I/O control, in the following respect: the lower this factor is, the less the bandwidth enjoyed by a group decreases
- when the group does writes, w.r.t. when it does reads;
- when other groups do reads, w.r.t. when they do writes.

The fixes "block, bfq: always update the budget of an entity when needed" and "block, bfq: readd missing reset of parent-entity service" improved I/O control in bfq to such an extent that it has been possible to revise this overcharge factor downwards. This commit introduces the resulting, new value.

Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-16  block, bfq: always update the budget of an entity when needed  (Paolo Valente, 1 file, -2/+6)

When the next child entity to serve changes for a given parent entity, the budget of that parent entity must be updated accordingly. Unfortunately, this update is not performed, by mistake, for the entities that happen to switch from having no child entity to serve, to having one child entity to serve.

Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-16  block, bfq: readd missing reset of parent-entity service  (Paolo Valente, 1 file, -0/+21)

The received-service counter needs to be equal to 0 when an entity is set in service. Unfortunately, commit "block, bfq: fix service being wrongly set to zero in case of preemption" mistakenly removed the resetting of this counter for the parent entities of the bfq_queue being set in service. This commit fixes this issue by resetting service for parent entities, directly on the expiration of the in-service bfq_queue.

Fixes: 9fae8dd59ff3 ("block, bfq: fix service being wrongly set to zero in case of preemption")
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-14  blk-wbt: fix IO hang in wbt_wait()  (Ming Lei, 1 file, -5/+1)

One wbt invariant is that if an IO is tracked via WBT_TRACKED, rqw->inflight must be updated to track that IO. But commit c1c80384c8f ("block: remove external dependency on wbt_flags") forgot to remove the early handling of !rwb_enabled(rwb) inside wbt_wait(), so the inflight counter may not be increased in wbt_wait() yet still be decreased in wbt_done() for this kind of IO. The counter can then go negative, and wbt_wait() may wait forever.

This patch fixes the report in the following link:

  https://marc.info/?l=linux-block&m=153221542021033&w=2

Fixes: c1c80384c8f ("block: remove external dependency on wbt_flags")
Cc: Josef Bacik <[email protected]>
Reported-by: Ming Lei <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-14  block: don't warn for flush on read-only device  (Jens Axboe, 1 file, -1/+3)

Don't warn for a flush issued to a read-only device. It's not strictly a writable command, as it doesn't change any on-media data by itself.

Reported-by: Stefan Agner <[email protected]>
Fixes: 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions")
Signed-off-by: Jens Axboe <[email protected]>
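A sketch of where the exception fits, following the commit text (the surrounding check is the read-only test added by 721c7fc701c7; the exact function layout may differ):

  /* Inside the read-only check: a flush bio with no data payload does
   * not modify on-media data, so don't warn about it or fail it. */
  if (part->policy && op_is_write(bio_op(bio))) {
          if (op_is_flush(bio->bi_opf) && !bio_sectors(bio))
                  return false;            /* allow the empty flush through */
          WARN_ONCE(1, "Trying to write to read-only block-device\n");
          return true;                      /* reject other writes */
  }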
2018-08-11  bcache: add the missing comments for smp_mb()/smp_wmb()  (Coly Li, 2 files, -2/+4)

Checkpatch.pl warns there are 2 locations of smp_mb() and smp_wmb() without code comment. This patch adds the missing code comments for these memory barrier calls.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: remove unnecessary space before ioctl function pointer arguments  (Coly Li, 1 file, -2/+2)

This is warned by checkpatch.pl; this patch removes the extra space.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: add missing SPDX header  (Coly Li, 3 files, -0/+3)

The SPDX header is missing from closure.c, super.c and util.c; this patch adds an SPDX header for GPL-2.0 to these files.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
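For reference, the header these files gained is a single first-line comment of the form:

  // SPDX-License-Identifier: GPL-2.0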
2018-08-11  bcache: move open brace at end of function definitions to next line  (Coly Li, 1 file, -3/+6)

Placing the open brace '{' at the end of a function definition is not the preferred style, and checkpatch.pl reports an error for it. This patch moves such braces to the start of the next line.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
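The change is purely mechanical; a before/after illustration with a made-up function name:

  /* checkpatch error: open brace '{' following function definitions should go on the next line */
  static void example_fn_before(struct cache_set *c) {
          /* ... */
  }

  /* preferred kernel style */
  static void example_fn_after(struct cache_set *c)
  {
          /* ... */
  }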
2018-08-11  bcache: add static const prefix to char * array declarations  (Coly Li, 1 file, -1/+1)

This patch declares a char * array with the const prefix in sysfs.c, as suggested by checkpatch.pl.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: fix code comments style  (Coly Li, 3 files, -13/+21)

This patch fixes 3 style issues warned by checkpatch.pl:
- Comment lines are not aligned
- Comments use "/*" on subsequent lines
- Comment lines use a trailing "*/"

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: do not check NULL pointer before calling kmem_cache_destroy  (Coly Li, 1 file, -2/+1)

kmem_cache_destroy() is safe for a NULL pointer as input, so the NULL pointer check is unnecessary. This patch removes the NULL pointer check to make the code simpler.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
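kmem_cache_destroy(), like kfree(), simply returns when given a NULL pointer, so the guard adds nothing. An illustrative before/after (the cache pointer name here is made up):

  /* before */
  if (example_cache)
          kmem_cache_destroy(example_cache);

  /* after */
  kmem_cache_destroy(example_cache);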
2018-08-11  bcache: prefer 'help' in Kconfig  (Coly Li, 1 file, -3/+3)

The current bcache Kconfig uses '---help---' as the header of the help information; 'help' is now preferred. This patch fixes the style by replacing '---help---' with 'help' in the bcache Kconfig file.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: fix typo 'succesfully' to 'successfully'  (Coly Li, 2 files, -2/+2)

This patch fixes the typo 'succesfully' to the correct 'successfully', as suggested by checkpatch.pl.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: replace '%pF' by '%pS' in seq_printf()  (Coly Li, 1 file, -2/+2)

'%pF' and '%pf' are deprecated vsprintf pointer extensions; this patch replaces them with '%pS', as suggested by checkpatch.pl.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: fix indent by replacing blanks by tabs  (Coly Li, 1 file, -2/+2)

bch_btree_insert_check_key() has unaligned indentation that uses blank (space) characters. This patch aligns the indentation and replaces the blanks with tabs.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: replace printk() by pr_*() routines  (Coly Li, 4 files, -15/+15)

There are still many places in bcache that use printk() to display kernel messages; these should be replaced by pr_*() routines like pr_err(), pr_info(), or pr_notice(). This patch replaces all printk() calls with a proper pr_*() routine in the bcache code.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
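The conversion is mechanical, and the pr_*() helpers also pick up the file's pr_fmt() prefix automatically. An illustrative before/after (message text made up):

  printk(KERN_ERR "error %i reading superblock\n", err);   /* before */
  pr_err("error %i reading superblock\n", err);             /* after  */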
2018-08-11  bcache: replace symbolic permissions by octal permission numbers  (Coly Li, 2 files, -5/+5)

Symbolic permission names are used in bcache; octal permission numbers are now encouraged for readability. This patch replaces all symbolic permissions with octal permission numbers.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
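An illustrative example of the substitution (the parameter shown is hypothetical, not one of bcache's actual attributes):

  /* before: symbolic permission macros */
  module_param(example_param, int, S_IRUSR | S_IWUSR);

  /* after: octal, which checkpatch considers more readable */
  module_param(example_param, int, 0600);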
2018-08-11  bcache: style fixes for lines over 80 characters  (Coly Li, 13 files, -28/+59)

This patch splits lines over 80 characters into multiple lines, to minimize warnings from checkpatch.pl. A few lines still exceed 80 characters, but they are clearer kept on a single line, so they are left unchanged.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: add identifier names to arguments of function definitions  (Coly Li, 14 files, -185/+215)

Many function declarations do not have identifier argument names, so scripts/checkpatch.pl complains with warnings like this:

  WARNING: function definition argument 'struct bcache_device *' should also have an identifier name
  #16735: FILE: writeback.h:120:
  +void bch_sectors_dirty_init(struct bcache_device *);

This patch adds identifier argument names to all bcache function definitions to fix such warnings.

Signed-off-by: Coly Li <[email protected]>
Reviewed: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
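The warning quoted above shows the pattern; the corresponding before/after for that prototype is:

  /* before: the argument has a type but no identifier name */
  void bch_sectors_dirty_init(struct bcache_device *);

  /* after */
  void bch_sectors_dirty_init(struct bcache_device *d);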
2018-08-11  bcache: style fix to add a blank line after declarations  (Coly Li, 17 files, -7/+57)

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  bcache: style fix to replace 'unsigned' by 'unsigned int'  (Coly Li, 23 files, -296/+309)

This patch fixes warnings reported by checkpatch.pl by replacing 'unsigned' with 'unsigned int'.

Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Shenghui Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-11  blkcg: Make blkg_root_lookup() work for queues in bypass mode  (Bart Van Assche, 2 files, -11/+6)

For legacy queues the only call of blkg_root_lookup() happens after bypass mode has been enabled. Since blkg_lookup() returns NULL for queues in bypass mode, modify blkg_root_lookup() such that it no longer depends on bypass mode. Rename the function into blk_queue_root_blkg() as suggested by Tejun.

Suggested-by: Tejun Heo <[email protected]>
Fixes: 6bad9b210a22 ("blkcg: Introduce blkg_root_lookup()")
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-10  bcache: fix error setting writeback_rate through sysfs interface  (Coly Li, 1 file, -3/+10)

Commit ea8c5356d390 ("bcache: set max writeback rate when I/O request is idle") changes struct bch_ratelimit member rate from uint32_t to atomic_long_t and uses atomic_long_set() in drivers/md/bcache/sysfs.c to set the new writeback rate, after the input is converted from the memory buffer to a long int by sysfs_strtoul_clamp().

The above change has a problem: there is an implicit return inside sysfs_strtoul_clamp(), so the following atomic_long_set() is never called. This error is detected by the 0day system with the following snipped smatch warnings:

  drivers/md/bcache/sysfs.c:271 __cached_dev_store() error: uninitialized symbol 'v'.
  270  sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
  271  atomic_long_set(&dc->writeback_rate.rate, v);

This patch fixes the above error by using strtoul_safe_clamp() to convert the input buffer into a long int type result.

Fixes: ea8c5356d390 ("bcache: set max writeback rate when I/O request is idle")
Cc: Kai Krakow <[email protected]>
Cc: Stefan Priebe <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
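The pitfall is that sysfs_strtoul_clamp() is a macro that returns from the enclosing store function when the attribute matches, so statements placed after it for the same attribute never execute. A sketch of the corrected shape as described above (variable handling approximated, not the exact hunk):

  if (attr == &sysfs_writeback_rate) {
          ssize_t ret;
          long int v;

          ret = strtoul_safe_clamp(buf, v, 1, INT_MAX);
          if (!ret) {
                  atomic_long_set(&dc->writeback_rate.rate, v);
                  ret = size;
          }
          return ret;
  }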
2018-08-09  null_blk: add lock drop/acquire annotation  (Jens Axboe, 1 file, -1/+3)

sparse complains:

  drivers/block/null_blk_main.c:816:24: sparse: context imbalance in 'null_insert_page' - unexpected unlock

Fix it by adding the necessary annotations to the function.

Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  Blk-throttle: reduce tail io latency when iops limit is enforced  (Liu Bo, 1 file, -6/+1)

When an application's iops has exceeded its cgroup's iops limit, surely it is throttled and the kernel will set a timer for dispatching, thus IO latency includes the delay. However, the dispatch delay, which is calculated from the limit and the elapsed jiffies, is suboptimal. As the dispatch delay is only calculated once the application's iops is (iops limit + 1), it doesn't need to wait any longer than the remaining time of the current slice.

The difference can be shown with the following fio job and cgroup iops setting:

  $ echo 4 > /mnt/config/nullb/disk1/mbps    # limit nullb's bandwidth to 4MB/s for testing.
  $ echo "253:1 riops=100 rbps=max" > /sys/fs/cgroup/unified/cg1/io.max
  $ cat r2.job
  [global]
  name=fio-rand-read
  filename=/dev/nullb1
  rw=randread
  bs=4k
  direct=1
  numjobs=1
  time_based=1
  runtime=60
  group_reporting=1

  [file1]
  size=4G
  ioengine=libaio
  iodepth=1
  rate_iops=50000
  norandommap=1
  thinktime=4ms

w/o patch:

  file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  fio-3.7-66-gedfc
  Starting 1 process
  read: IOPS=99, BW=400KiB/s (410kB/s)(23.4MiB/60001msec)
   slat (usec): min=10, max=336, avg=27.71, stdev=17.82
   clat (usec): min=2, max=28887, avg=5929.81, stdev=7374.29
    lat (usec): min=24, max=28901, avg=5958.73, stdev=7366.22
   clat percentiles (usec):
    |  1.00th=[    4],  5.00th=[    4], 10.00th=[    4], 20.00th=[    4],
    | 30.00th=[    4], 40.00th=[    4], 50.00th=[    6], 60.00th=[11731],
    | 70.00th=[11863], 80.00th=[11994], 90.00th=[12911], 95.00th=[22676],
    | 99.00th=[23725], 99.50th=[23987], 99.90th=[23987], 99.95th=[25035],
    | 99.99th=[28967]

w/ patch:

  file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  fio-3.7-66-gedfc
  Starting 1 process
  read: IOPS=100, BW=400KiB/s (410kB/s)(23.4MiB/60005msec)
   slat (usec): min=10, max=155, avg=23.24, stdev=16.79
   clat (usec): min=2, max=12393, avg=5961.58, stdev=5959.25
    lat (usec): min=23, max=12412, avg=5985.91, stdev=5951.92
   clat percentiles (usec):
    |  1.00th=[    3],  5.00th=[    3], 10.00th=[    4], 20.00th=[    4],
    | 30.00th=[    4], 40.00th=[    5], 50.00th=[   47], 60.00th=[11863],
    | 70.00th=[11994], 80.00th=[11994], 90.00th=[11994], 95.00th=[11994],
    | 99.00th=[11994], 99.50th=[11994], 99.90th=[12125], 99.95th=[12125],
    | 99.99th=[12387]

Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  block: paride: pd: mark expected switch fall-throughs  (Gustavo A. R. Silva, 1 file, -0/+2)

In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through.

Addresses-Coverity-ID: 1056543 ("Missing break in switch")
Addresses-Coverity-ID: 1056544 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  block: Ensure that a request queue is dissociated from the cgroup controller  (Bart Van Assche, 1 file, -0/+15)

Several block drivers call alloc_disk() followed by put_disk() if something fails before device_add_disk() is called, without calling blk_cleanup_queue(). Make sure that also for this scenario a request queue is dissociated from the cgroup controller. This patch avoids that loading the parport_pc, paride and pf drivers triggers the following kernel crash:

  BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride]
  Read of size 4 at addr 0000000000000008 by task modprobe/744
  Call Trace:
   dump_stack+0x9a/0xeb
   kasan_report+0x139/0x350
   pi_init+0x42e/0x580 [paride]
   pf_init+0x2bb/0x1000 [pf]
   do_one_initcall+0x8e/0x405
   do_init_module+0xd9/0x2f2
   load_module+0x3ab4/0x4700
   SYSC_finit_module+0x176/0x1a0
   do_syscall_64+0xee/0x2b0
   entry_SYSCALL_64_after_hwframe+0x42/0xb7

Reported-by: Alexandru Moise <[email protected]>
Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17
Signed-off-by: Bart Van Assche <[email protected]>
Tested-by: Alexandru Moise <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Alexandru Moise <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  block: Introduce blk_exit_queue()  (Bart Van Assche, 2 files, -24/+31)

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Omar Sandoval <[email protected]>
Cc: Alexandru Moise <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  blkcg: Introduce blkg_root_lookup()  (Bart Van Assche, 1 file, -0/+18)

This new function will be used in a later patch to verify whether a queue has been dissociated from the cgroup controller before being released.

Signed-off-by: Bart Van Assche <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Omar Sandoval <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: Alexandru Moise <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  block: Remove two superfluous #include directives  (Bart Van Assche, 1 file, -2/+0)

Commit 12f5b9314545 ("blk-mq: Remove generation seqeunce") removed the only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but did not remove the corresponding #include directives. Since these include directives are no longer needed, remove them.

Signed-off-by: Bart Van Assche <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Jianchao Wang <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  blk-mq: count the hctx as active before allocating tag  (Jianchao Wang, 2 files, -2/+9)

Currently, we count the hctx as active after the driver tag is allocated successfully. If a previously inactive hctx tries to get a tag for the first time, it may fail and need to wait. However, due to the stale tags->active_queues, the other shared-tags users are still able to occupy all driver tags while there is someone waiting for a tag. Consequently, even if the previously inactive hctx is woken up, it still may not be able to get a tag and can be starved.

To fix this, count the hctx as active before trying to allocate a driver tag; then, while it is waiting for the tag, the other shared-tag users will reserve budget for it.

Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jianchao Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
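Conceptually the fix moves the "mark this hctx active" step ahead of the tag allocation, so shared-tag fairness already accounts for the waiter. A rough sketch of the ordering, not the literal 4.19 code (names only approximate the mainline helpers):

  static bool example_get_driver_tag(struct blk_mq_hw_ctx *hctx,
                                     struct blk_mq_alloc_data *data,
                                     struct request *rq)
  {
          /* Count the hctx as active *before* allocating, so other
           * shared-tag users reserve budget for us if we must wait. */
          if (hctx->flags & BLK_MQ_F_TAG_SHARED)
                  blk_mq_tag_busy(hctx);

          rq->tag = blk_mq_get_tag(data);
          return rq->tag != -1;
  }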
2018-08-09  block: bvec_nr_vecs() returns value for wrong slab  (Greg Edwards, 1 file, -1/+1)

In commit ed996a52c868 ("block: simplify and cleanup bvec pool handling"), the value of the slab index is incremented by one in bvec_alloc() after the allocation is done, to indicate that an index value of 0 does not need to be later freed. bvec_nr_vecs() was not updated accordingly, and thus returns the wrong value. Decrement idx before performing the lookup.

Fixes: ed996a52c868 ("block: simplify and cleanup bvec pool handling")
Signed-off-by: Greg Edwards <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
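The one-line nature of the fix follows from the "+1" offset that bvec_alloc() stores; undoing it before the slab lookup is enough. A sketch consistent with the description above (slab table name as used in block/bio.c):

  unsigned int bvec_nr_vecs(unsigned short idx)
  {
          /* idx is stored as "slab index + 1", with 0 meaning "nothing
           * to free"; undo the offset before indexing the slab table. */
          return bvec_slabs[--idx].nr_vecs;
  }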
2018-08-09  Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/block  (Jens Axboe, 9 files, -10/+138)

Pull NVMe updates from Christoph:

  "This should be the last round of NVMe updates before the 4.19 merge window opens. It contains support for write protected (aka read-only) namespaces from Chaitanya, two ANA fixes from Hannes and a fabrics fix from Tal Shorer."

* 'nvme-4.19' of git://git.infradead.org/nvme:
  nvme-fabrics: fix ctrl_loss_tmo < 0 to reconnect forever
  nvmet: add ns write protect support
  nvme: set gendisk read only based on nsattr
  nvme.h: add support for ns write protect definitions
  nvme.h: fixup ANA group descriptor format
  nvme: fixup crash on failed discovery
2018-08-09  bcache: trivial - remove trailing backslash in macro BTREE_FLAG  (Shenghui Wang, 1 file, -1/+1)

Remove the trailing backslash in macro BTREE_FLAG in btree.h.

Signed-off-by: Shenghui Wang <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  bcache: make the pr_err statement used for ENOENT only in the sysfs attach section  (Shenghui Wang, 1 file, -2/+2)

The pr_err statement in the code for the sysfs attach section would run for various error codes, which may be confusing. E.g., run the command twice:

  echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > /sys/block/bcache0/bcache/attach
    [the backing dev got attached on the first run]
  echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > /sys/block/bcache0/bcache/attach

In dmesg, after the command has run twice, we get:

  bcache: bch_cached_dev_attach() Can't attach sda6: already attached
  bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 : cache set not found

The first statement in the message was right, but the second was confusing. bch_cached_dev_attach has various pr_ statements for various error codes, except ENOENT.

After the change, rerun the above command twice:

  echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > /sys/block/bcache0/bcache/attach
  echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > /sys/block/bcache0/bcache/attach

In dmesg we only get:

  bcache: bch_cached_dev_attach() Can't attach sda6: already attached

No confusing "cache set not found" message anymore.

And for a SET-UUID that does not exist:

  echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be898 > /sys/block/bcache0/bcache/attach

In dmesg we get:

  bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-a8df5e8be898 : cache set not found

Signed-off-by: Shenghui Wang <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  bcache: set max writeback rate when I/O request is idle  (Coly Li, 7 files, -45/+138)

Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle") allows the writeback rate to be faster if there is no I/O request on a bcache device. It works well if there is only one bcache device attached to the cache set. If there are many bcache devices attached to a cache set, it may introduce a performance regression because multiple faster writeback threads of the idle bcache devices will compete for the btree level locks with the bcache devices that have I/O requests coming in.

This patch fixes the above issue by only permitting fast writeback when all bcache devices attached to the cache set are idle. And if one of the bcache devices has new I/O requests coming in, all writeback throughput is minimized immediately and the PI controller __update_writeback_rate() decides the upcoming writeback rate for each bcache device.

Also, when all bcache devices are idle, limiting the writeback rate to a small number wastes throughput, especially when the backing devices are slower non-rotational devices (e.g. SATA SSD). This patch sets a max writeback rate for each backing device if the whole cache set is idle. A faster writeback rate in idle time means new I/Os may have more available space for dirty data, and people may then observe better write performance.

Please note bcache may change its cache mode at run time, and this patch still works if the cache mode is switched away from writeback mode while there is still dirty data on cache.

Fixes: b1092c9af9ed ("bcache: allow quick writeback when backing idle")
Cc: [email protected] #4.16+
Signed-off-by: Coly Li <[email protected]>
Tested-by: Kai Krakow <[email protected]>
Tested-by: Stefan Priebe <[email protected]>
Cc: Michael Lyle <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  bcache: add code comments for bset.c  (Coly Li, 1 file, -0/+63)

This patch adds code comments in bset.c to make some tricky code and design decisions more comprehensible. Most of the information in this patch comes from a discussion between Kent and me, in which he offered very informative details. If there is any mistake in the idea behind the code, it is no doubt my misrepresentation.

Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  bcache: fix mistaken comments in request.c  (Coly Li, 1 file, -1/+1)

This patch updates a code comment in bch_keylist_realloc() by fixing incorrect function names, to make the code more comprehensible.

Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  bcache: fix mistaken code comments in bcache.h  (Coly Li, 1 file, -3/+3)

This patch updates the code comment in struct cache with correct array names, to make the code more comprehensible.

Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2018-08-09  bcache: add a comment in super.c  (Coly Li, 1 file, -0/+1)

This patch adds a line of code comment in super.c:register_bdev(), to make the code more comprehensible.

Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>