Age | Commit message (Collapse) | Author | Files | Lines |
|
The bi_max_vecs and bi_vcnt fields are defined as unsigned short, so
don't allow passing larger values in.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
All bios with up to 4 bvecs use the inline bvecs in the bio itself, so
don't bother to define bvec_slabs entries for them. Also decruftify
the bvec_slabs definition and initialization while we're at it.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Avoid the pointless goto by trying the slab allocation first and falling
through to the mempool.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Clean up bvec_alloc a little by factoring out a helper for the gfp_t
manipulations.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
struct biovec_slab is only used inside of bio.c, so move it there.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
bvec_alloc always uses biovec_slabs, and thus always needs to use the
same number of inline vecs. Share a single definition for the data
and integrity bvecs.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
bio_init() clears bio instance, so the bvec index has to be set after
bio_init(), otherwise bio->bi_io_vec may be leaked.
Fixes: 3175199ab0ac ("block: split bio_kmalloc from bio_alloc_bioset")
Cc: Johannes Thumshirn <[email protected]>
Cc: Chaitanya Kulkarni <[email protected]>
Cc: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Make the read-only check in restart_array identical to the other two
read-only checks.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
->meta_bdev is optional and not set for most arrays. Add a
rdev_read_only helper that calls bdev_read_only for both devices
in a safe way.
Fixes: 6f0d9689b670 ("block: remove the NULL bdev check in bdev_read_only")
Reported-by: Guoqing Jiang <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 684da7628d93 ("block: remove unnecessary argument from
blk_execute_rq") changes the signature of blk_execute_rq(), but misses
to adjust its kernel-doc.
Hence, make htmldocs warns on ./block/blk-exec.c:78:
warning: Excess function parameter 'q' description in 'blk_execute_rq'
Drop removed argument from kernel-doc of blk_execute_rq() as well.
Signed-off-by: Lukas Bulwahn <[email protected]>
Acked-by: Guoqing Jiang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 52f019d43c22 ("block: add a hard-readonly flag to struct gendisk")
provides some kernel-doc for set_disk_ro(), but introduces a small typo.
Hence, make htmldocs warns on ./block/genhd.c:1441:
warning: Function parameter or member 'read_only' not described in 'set_disk_ro'
warning: Excess function parameter 'ready_only' description in 'set_disk_ro'
Remove that typo in the kernel-doc for set_disk_ro().
Signed-off-by: Lukas Bulwahn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Remove the obsolete 'MAX_KEY_LEN' macro.
Signed-off-by: Baolin Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The nvme-core sets the bdev to NULL when admin comamnd is issued from
IOCTL in the following path e.g. nvme list :-
block_ioctl()
blkdev_ioctl()
nvme_ioctl()
nvme_user_cmd()
nvme_submit_user_cmd()
The commit 309dca309fc3 ("block: store a block_device pointer in struct bio")
now uses bdev unconditionally in the macro bio_set_dev() and assumes
that bdev value is not NULL which results in the following crash in
since thats where bdev is actually accessed :-
void bio_associate_blkg_from_css(struct bio *bio,
struct cgroup_subsys_state *css)
{
if (bio->bi_blkg)
blkg_put(bio->bi_blkg);
if (css && css->parent) {
bio->bi_blkg = blkg_tryget_closest(bio, css);
} else {
--------------> blkg_get(bio->bi_bdev->bd_disk->queue->root_blkg);
bio->bi_blkg = bio->bi_bdev->bd_disk->queue->root_blkg;
}
}
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
[ 345.385947] BUG: kernel NULL pointer dereference, address: 0000000000000690
[ 345.387103] #PF: supervisor read access in kernel mode
[ 345.387894] #PF: error_code(0x0000) - not-present page
[ 345.388756] PGD 162a2b067 P4D 162a2b067 PUD 1633eb067 PMD 0
[ 345.389625] Oops: 0000 [#1] SMP NOPTI
[ 345.390206] CPU: 15 PID: 4100 Comm: nvme Tainted: G OE 5.11.0-rc5blk+ #141
[ 345.391377] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba52764
[ 345.393074] RIP: 0010:bio_associate_blkg_from_css.cold.47+0x58/0x21f
[ 345.396362] RSP: 0018:ffffc90000dbbce8 EFLAGS: 00010246
[ 345.397078] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[ 345.398114] RDX: 0000000000000000 RSI: ffff888813be91f0 RDI: ffff888813be91f8
[ 345.399039] RBP: ffffc90000dbbd30 R08: 0000000000000001 R09: 0000000000000001
[ 345.399950] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff888812d32800
[ 345.401031] R13: 0000000000000000 R14: ffff888113e51540 R15: ffff888113e51540
[ 345.401976] FS: 00007f3747f1d780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
[ 345.402997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 345.403737] CR2: 0000000000000690 CR3: 000000081a4bc000 CR4: 00000000003506e0
[ 345.404685] Call Trace:
[ 345.405031] bio_associate_blkg+0x71/0x1c0
[ 345.405649] nvme_submit_user_cmd+0x1aa/0x38e [nvme_core]
[ 345.406348] nvme_user_cmd.isra.73.cold.98+0x54/0x92 [nvme_core]
[ 345.407117] nvme_ioctl+0x226/0x260 [nvme_core]
[ 345.407707] blkdev_ioctl+0x1c8/0x2b0
[ 345.408183] block_ioctl+0x3f/0x50
[ 345.408627] __x64_sys_ioctl+0x84/0xc0
[ 345.409117] do_syscall_64+0x33/0x40
[ 345.409592] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 345.410233] RIP: 0033:0x7f3747632107
[ 345.413125] RSP: 002b:00007ffe461b6648 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ 345.414086] RAX: ffffffffffffffda RBX: 00000000007b7fd0 RCX: 00007f3747632107
[ 345.414998] RDX: 00007ffe461b6650 RSI: 00000000c0484e41 RDI: 0000000000000004
[ 345.415966] RBP: 0000000000000004 R08: 00000000007b7fe8 R09: 00000000007b9080
[ 345.416883] R10: 00007ffe461b62c0 R11: 0000000000000206 R12: 00000000007b7fd0
[ 345.417808] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
Add a NULL check before we set the bdev for bio.
This issue is found on block/for-next tree.
Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio")
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Current tree spews this on compile:
mm/swapfile.c:2290:17: warning: ‘map_swap_entry’ defined but not used [-Wunused-function]
2290 | static sector_t map_swap_entry(swp_entry_t entry, struct block_device **bdev)
| ^~~~~~~~~~~~~~
if !CONFIG_HIBERNATION, as we don't use the function unless we have that
config option set.
Fixes: 48d15436fde6 ("mm: remove get_swap_bio")
Signed-off-by: Jens Axboe <[email protected]>
|
|
Just reuse the block_device and sector from the swap_info structure,
just as used by the SWP_SYNCHRONOUS path. Also remove the checks for
NULL returns from bio_alloc as that can't happen for sleeping
allocations.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
bio_alloc never returns NULL when it can sleep.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
bio_alloc never returns NULL when it can sleep.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Refactor raid5_read_one_chunk so that all simple checks are done
before allocating the bio.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Song Liu <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
md_bio_alloc_sync is never called with a NULL mddev, and ->sync_set is
initialized in md_run, so it always must be initialized as well. Just
open code the remaining call to bio_alloc_bioset.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Song Liu <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use an on-stack bio and biovec for the single page synchronous I/O.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Song Liu <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
bio_alloc_mddev is never called with a NULL mddev, and ->bio_set is
initialized in md_run, so it always must be initialized as well. Just
open code the remaining call to bio_alloc_bioset.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Song Liu <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Open code drbd_req_make_private_bio in the two callers to prepare
for further changes. Also don't bother to initialize bi_next as the
bio code already does that that.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Given that drbd_md_io_bio_set is initialized during module initialization
and the module fails to load if the initialization fails there is no need
to fall back to plain bio_alloc.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Sleeping bio allocations do not fail, which means that injecting an error
into sleeping bio allocations is a little silly.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use the blkdev_issue_flush helper instead of duplicating it.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use blkdev_issue_flush instead of open coding it.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
There is no point in allocating memory for a synchronous flush.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
bio_kmalloc shares almost no logic with the bio_set based fast path
in bio_alloc_bioset. Split it into an entirely separate implementation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use bio_kmalloc instead of open coding it.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use bio_kmalloc instead of open coding it.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use bio_alloc instead of open coding it.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently whenever bfq queue has a request queued we add now -
last_completion_time to the think time statistics. This is however
misleading in case the process is able to submit several requests in
parallel because e.g. if the queue has request completed at time T0 and
then queues new requests at times T1, T2, then we will add T1-T0 and
T2-T0 to think time statistics which just doesn't make any sence (the
queue's think time is penalized by the queue being able to submit more
IO). So add to think time statistics only time intervals when the queue
had no IO pending.
Signed-off-by: Jan Kara <[email protected]>
Acked-by: Paolo Valente <[email protected]>
[axboe: fix whitespace on empty line]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Use local variable 'ttime' instead of dereferencing bfqq.
Signed-off-by: Jan Kara <[email protected]>
Acked-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
bfq_setup_cooperator() uses bfqd->in_serv_last_pos so detect whether it
makes sense to merge current bfq queue with the in-service queue.
However if the in-service queue is freshly scheduled and didn't dispatch
any requests yet, bfqd->in_serv_last_pos is stale and contains value
from the previously scheduled bfq queue which can thus result in a bogus
decision that the two queues should be merged. This bug can be observed
for example with the following fio jobfile:
[global]
direct=0
ioengine=sync
invalidate=1
size=1g
rw=read
[reader]
numjobs=4
directory=/mnt
where the 4 processes will end up in the one shared bfq queue although
they do IO to physically very distant files (for some reason I was able to
observe this only with slice_idle=1ms setting).
Fix the problem by invalidating bfqd->in_serv_last_pos when switching
in-service queue.
Fixes: 058fdecc6de7 ("block, bfq: fix in-service-queue check for queue merging")
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Acked-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When calling blkcg_schedule_throttle(), for the same queue,
redundant get/put operations can be removed.
Signed-off-by: Chunguang Xu <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
truncate_bdev_range is only used in always built-in block layer code,
so remove the export and the !CONFIG_BLOCK stub.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The first parameter rwb is not used for this function.
So just remove it.
Signed-off-by: Lei Chen <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
blkdev_fallocate() tries to detect whether a discard raced with an
overlapping write by calling invalidate_inode_pages2_range(). However
this check can give both false negatives (when writing using direct IO
or when writeback already writes out the written pagecache range) and
false positives (when write is not actually overlapping but ends in the
same page when blocksize < pagesize). This actually causes issues for
qemu which is getting confused by EBUSY errors.
Fix the problem by removing this conflicting write detection since it is
inherently racy and thus of little use anyway.
Reported-by: Maxim Levitsky <[email protected]>
CC: "Darrick J. Wong" <[email protected]>
Link: https://lore.kernel.org/qemu-devel/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Cloned bios are can be used to on the same device, in which case we need
to inherit the BIO_REMAPPED flag to avoid a double partition remap. When
the cloned bios are used on another device, bio_set_dev will clear the flag.
Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio")
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
It's only used in the same file, mark is appropriately static.
Fixes: 71217df39dc6 ("block, bfq: make waker-queue detection more robust")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In the presence of many parallel I/O flows, the detection of waker
bfq_queues suffers from false positives. This commits addresses this
issue by making the filtering of actual wakers more selective. In more
detail, a candidate waker must be found to meet waker requirements
three times before being promoted to actual waker.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
To prevent injection information from being lost on bfq_queue merging,
also the amount of service that a bfq_queue receives must be saved and
restored when the bfq_queue is merged and split, respectively.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
To prevent weight-raising information from being lost on bfq_queue merging,
also the amount of service that a bfq_queue receives must be saved and
restored when the bfq_queue is merged and split, respectively.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
A bfq_queue may happen to be deemed as soft real-time while it is
still enjoying interactive weight-raising. If this happens because of
a false positive, then the bfq_queue is likely to loose its soft
real-time status soon. Upon losing such a status, the bfq_queue must
get back its interactive weight-raising, if its interactive period is
not over yet. But this case is not handled. This commit corrects this
error.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Upon an I/O-dispatch attempt, BFQ may detect that it was better to
plug I/O dispatch, and to wait for a new request to arrive for the
currently in-service queue. But the arrival of a new request for an
empty bfq_queue, and thus the switch from idle to busy of the
bfq_queue, may cause the scenario to change, and make plugging no
longer needed for service guarantees, or more convenient for
throughput. In this case, keeping I/O-dispatch plugged would certainly
lower throughput.
To address this issue, this commit makes such a check, and stops
plugging I/O if it is better to stop plugging I/O.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Some BFQ mechanisms make their decisions on a bfq_queue basing also on
whether the bfq_queue is I/O bound. In this respect, the current logic
for evaluating whether a bfq_queue is I/O bound is rather rough. This
commits replaces this logic with a more effective one.
The new logic measures the percentage of time during which a bfq_queue
is active, and marks the bfq_queue as I/O bound if the latter if this
percentage is above a fixed threshold.
Tested-by: Jan Kara <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When an already remapped bio is resubmitted (e.g. by blk_queue_split),
bio_check_eod will compare the remapped bi_sector against the size
of the partition, leading to spurious I/O failures.
Skip the EOD check in this case.
Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio")
Reported-by: Jens Axboe <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The block layer spends quite a while in blkdev_direct_IO() to copy and
initialise bio's bvec. However, if we've already got a bvec in the input
iterator it might be reused in some cases, i.e. when new
ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
performance boost, and it also reduces memory footprint.
Suggested-by: Matthew Wilcox <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|