Age | Commit message (Collapse) | Author | Files | Lines |
|
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add a new API to allocate a gendisk including the request_queue for use
with blk-mq based drivers. This is to avoid boilerplate code in drivers.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Don't return the passed in request_queue but a normal error code, and
drop the elevator_init argument in favor of just calling elevator_init_mq
directly from dm-rq.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Factour out a helper to initialize a simple single hw queue tag_set from
blk_mq_init_sq_queue. This will allow to phase out blk_mq_init_sq_queue
in favor of a more symmetric and general API.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The queue_to_disk() helper can not be used after del_gendisk()
communicate @disk via the pgmap->owner.
Otherwise, queue_to_disk() returns NULL resulting in the splat below.
Kernel attempted to read user page (330) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000330
Faulting instruction address: 0xc000000000906344
Oops: Kernel access of bad area, sig: 11 [#1]
[..]
NIP [c000000000906344] pmem_pagemap_cleanup+0x24/0x40
LR [c0000000004701d4] memunmap_pages+0x1b4/0x4b0
Call Trace:
[c000000022cbb9c0] [c0000000009063c8] pmem_pagemap_kill+0x28/0x40 (unreliable)
[c000000022cbb9e0] [c0000000004701d4] memunmap_pages+0x1b4/0x4b0
[c000000022cbba90] [c0000000008b28a0] devm_action_release+0x30/0x50
[c000000022cbbab0] [c0000000008b39c8] release_nodes+0x2f8/0x3e0
[c000000022cbbb60] [c0000000008ac440] device_release_driver_internal+0x190/0x2b0
[c000000022cbbba0] [c0000000008a8450] unbind_store+0x130/0x170
Reported-by: Sachin Sant <[email protected]>
Fixes: 87eb73b2ca7c ("nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk")
Link: http://lore.kernel.org/r/[email protected]
Cc: Christoph Hellwig <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Link: https://lore.kernel.org/r/162310994435.1571616.334551212901820961.stgit@dwillia2-desk3.amr.corp.intel.com
[axboe: fold in compile warning fix]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
tried to fix a problem that a process could be sleeping in rq_qos_wait()
without anyone to wake it up. However the fix is not complete and the
following can still happen:
CPU1 (waiter1) CPU2 (waiter2) CPU3 (waker)
rq_qos_wait() rq_qos_wait()
acquire_inflight_cb() -> fails
acquire_inflight_cb() -> fails
completes IOs, inflight
decreased
prepare_to_wait_exclusive()
prepare_to_wait_exclusive()
has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
has_sleeper = !wq_has_single_sleeper() -> true
io_schedule() io_schedule()
Deadlock as now there's nobody to wakeup the two waiters. The logic
automatically blocking when there are already sleepers is really subtle
and the only way to make it work reliably is that we check whether there
are some waiters in the queue when adding ourselves there. That way, we
are guaranteed that at least the first process to enter the wait queue
will recheck the waiting condition before going to sleep and thus
guarantee forward progress.
Fixes: 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
After commit 07173c3ec276 ("block: enable multipage bvecs"), a bvec can
have multiple pages. But bio_will_gap() still assumes one page bvec while
checking for merging. If the pages in the bvec go across the
seg_boundary_mask, this check for merging can potentially succeed if only
the 1st page is tested, and can fail if all the pages are tested.
Later, when SCSI builds the SG list the same check for merging is done in
__blk_segment_map_sg_merge() with all the pages in the bvec tested. This
time the check may fail if the pages in bvec go across the
seg_boundary_mask (but tested okay in bio_will_gap() earlier, so those
BIOs were merged). If this check fails, we end up with a broken SG list
for drivers assuming the SG list not having offsets in intermediate pages.
This results in incorrect pages written to the disk.
Fix this by returning the multi-page bvec when testing gaps for merging.
Cc: Jens Axboe <[email protected]>
Cc: Johannes Thumshirn <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Jeffle Xu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 07173c3ec276 ("block: enable multipage bvecs")
Signed-off-by: Long Li <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Although the original intent was to use blk_update_request() in stacking
block drivers only, it is used much more widely today. Reflect this in the
documentation block above this function. See also:
* commit 32fab448e5e8 ("block: add request update interface").
* commit 2e60e02297cf ("block: clean up request completion API").
* commit ed6565e73424 ("block: handle partial completions for special
payload requests").
Cc: Christoph Hellwig <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Provided the device driver does not implement dispatch budget accounting
(which only SCSI does) the loop in __blk_mq_do_dispatch_sched() pulls
requests from the IO scheduler as long as it is willing to give out any.
That defeats scheduling heuristics inside the scheduler by creating
false impression that the device can take more IO when it in fact
cannot.
For example with BFQ IO scheduler on top of virtio-blk device setting
blkio cgroup weight has barely any impact on observed throughput of
async IO because __blk_mq_do_dispatch_sched() always sucks out all the
IO queued in BFQ. BFQ first submits IO from higher weight cgroups but
when that is all dispatched, it will give out IO of lower weight cgroups
as well. And then we have to wait for all this IO to be dispatched to
the disk (which means lot of it actually has to complete) before the
IO scheduler is queried again for dispatching more requests. This
completely destroys any service differentiation.
So grab request tag for a request pulled out of the IO scheduler already
in __blk_mq_do_dispatch_sched() and do not pull any more requests if we
cannot get it because we are unlikely to be able to dispatch it. That
way only single request is going to wait in the dispatch list for some
tag to free.
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The error handling on a nullb->disk allocation currently jumps to
out_cleanup_disk that calls blk_cleanup_disk with a null pointer causing
a null pointer dereference issue. Fix this by jumping to out_cleanup_tags
instead.
Addresses-Coverity: ("Dereference after null check")
Fixes: 132226b301b5 ("null_blk: convert to blk_alloc_disk/blk_cleanup_disk")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Just opencode the xa_load in the callers, as none of them actually
needs a reference to the bdev.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add a helper to find the dev_t for a disk + partno tuple.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The bd_part_count value only makes sense for whole devices, so move it
to struct gendisk and give it a more descriptive name.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Split __blkdev_put into one helper for the whole device, and one for
partitions as well as another shared helper for flushing the block
device inode mapping.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Keep in the callers and thus remove the for_part argument. This mirrors
what is done on the blkdev_get side and slightly simplifies
blkdev_get_part as well.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Replace the per-block device bd_mutex with a per-gendisk open_mutex,
thus simplifying locking wherever we deal with partitions.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Do the early unlocked syncing even earlier to move more code out of
the recursive path.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Split __blkdev_get into one helper for the whole device, and one for
opening partitions. This removes the (bounded) recursion when opening
a partition.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
blk_alloc_queue is just an internal helper now, unexport it and remove
it from the public header.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the null_blk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation. Note that the
blk-mq mode is left with its own allocations scheme, to be handled later.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the xpram driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the dcssblk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the ps3vram driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the n64cart driver to use the blk_alloc_disk helper to simplify
gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the simdisk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the nfblock driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the nvme-multipath driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the nvdimm-pmem driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the nvdimm-btt driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the nvdimm-blk driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the md driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the dm driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the bcache driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Acked-by: Coly Li <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the lightnvm driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the zram driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the rsxx driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the pktcdvd driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the drbd driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Convert the brd driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation. This also
allows to remove the request_queue pointer in struct request_queue,
and to simplify the initialization as blk_cleanup_disk can be called
on any disk returned from blk_alloc_disk.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add two new APIs to allocate and free a gendisk including the
request_queue for use with BIO based drivers. This is to avoid
boilerplate code in drivers.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add a flag to indicate that __device_add_disk did grab a queue reference
so that disk_release only drops it if we actually had it. This sort
out one of the major pitfals with partially initialized gendisk that
a lot of drivers did get wrong or still do.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Automatically set the GENHD_FL_EXT_DEVT flag for all disks allocated
without an explicit number of minors. This is what all new block
drivers should do, so make sure it is the default without boilerplate
code.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Keep this together with the first place that actually looks at
->minors and prepare for not passing a minors argument to
alloc_disk.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Untangle the mess around blk_alloc_devt by moving the check for
the used allocation scheme into the callers.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The tags used for an IO scheduler are currently per hctx.
As such, when q->nr_hw_queues grows, so does the request queue total IO
scheduler tag depth.
This may cause problems for SCSI MQ HBAs whose total driver depth is
fixed.
Ming and Yanhui report higher CPU usage and lower throughput in scenarios
where the fixed total driver tag depth is appreciably lower than the total
scheduler tag depth:
https://lore.kernel.org/linux-block/[email protected]/T/#mc0d6d4f95275a2743d1c8c3e4dc9ff6c9aa3a76b
In that scenario, since the scheduler tag is got first, much contention
is introduced since a driver tag may not be available after we have got
the sched tag.
Improve this scenario by introducing request queue-wide tags for when
a tagset-wide sbitmap is used. The static sched requests are still
allocated per hctx, as requests are initialised per hctx, as in
blk_mq_init_request(..., hctx_idx, ...) ->
set->ops->init_request(.., hctx_idx, ...).
For simplicity of resizing the request queue sbitmap when updating the
request queue depth, just init at the max possible size, so we don't need
to deal with the possibly with swapping out a new sbitmap for old if
we need to grow.
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The tag allocation code to alloc the sbitmap pairs is common for regular
bitmaps tags and shared sbitmap, so refactor into a common function.
Also remove superfluous "flags" argument from blk_mq_init_shared_sbitmap().
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Before we free request queue, clearing flush request reference in
tags->rqs[], so that potential UAF can be avoided.
Based on one patch written by David Jeffery.
Tested-by: John Garry <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: David Jeffery <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
refcount_inc_not_zero() in bt_tags_iter() still may read one freed
request.
Fix the issue by the following approach:
1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero in bt_tags_iter()
2) clearing stale request referred via ->rqs[tag] before freeing
request pool, the per-tags spinlock is held for clearing stale
->rq[tag]
So after we cleared stale requests, bt_tags_iter() won't observe
freed request any more, also the clearing will wait for pending
request reference.
The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and one recent David's patch.
Tested-by: John Garry <[email protected]>
Reviewed-by: David Jeffery <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(), and
this way will prevent the request from being re-used when ->fn is
running. The approach is same as what we do during handling timeout.
Fix request use-after-free(UAF) related with completion race or queue
releasing:
- If one rq is referred before rq->q is frozen, then queue won't be
frozen before the request is released during iteration.
- If one rq is referred after rq->q is frozen, refcount_inc_not_zero()
will return false, and we won't iterate over this request.
However, still one request UAF not covered: refcount_inc_not_zero() may
read one freed request, and it will be handled in next patch.
Tested-by: John Garry <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
For flush request, rq->end_io() may be called two times, one is from
timeout handling(blk_mq_check_expired()), another is from normal
completion(__blk_mq_end_request()).
Move blk_account_io_flush() after flush_rq->ref drops to zero, so
io accounting can be done just once for flush request.
Fixes: b68663186577 ("block: add iostat counters for flush requests")
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: John Garry <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|