2021-06-11  virtio-blk: use blk_mq_alloc_disk  (Christoph Hellwig)  [1 file, -19/+7]
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-11  blk-mq: add the blk_mq_alloc_disk APIs  (Christoph Hellwig)  [2 files, -0/+31]
Add a new API to allocate a gendisk including the request_queue for use with blk-mq based drivers. This is to avoid boilerplate code in drivers. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
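To make the shape of the new API concrete, here is a minimal sketch of how a blk-mq driver might use it; struct my_dev, my_dev_fops and the tag_set setup are illustrative assumptions, not code taken from any of the converted drivers:

	#include <linux/module.h>
	#include <linux/blkdev.h>
	#include <linux/blk-mq.h>

	struct my_dev {
		struct blk_mq_tag_set	tag_set;
		struct gendisk		*disk;
	};

	static const struct block_device_operations my_dev_fops = {
		.owner = THIS_MODULE,
	};

	static int my_dev_init_disk(struct my_dev *dev)
	{
		int err;

		/* dev->tag_set.ops, nr_hw_queues, queue_depth etc. are assumed
		 * to have been filled in by the caller */
		err = blk_mq_alloc_tag_set(&dev->tag_set);
		if (err)
			return err;

		/* one call allocates both the gendisk and its request_queue;
		 * the queue is reachable as dev->disk->queue */
		dev->disk = blk_mq_alloc_disk(&dev->tag_set, dev);
		if (IS_ERR(dev->disk)) {
			err = PTR_ERR(dev->disk);
			blk_mq_free_tag_set(&dev->tag_set);
			return err;
		}

		dev->disk->fops = &my_dev_fops;
		dev->disk->private_data = dev;
		/* set capacity and queue limits, then add_disk() as before */
		return 0;
	}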
2021-06-11  blk-mq: improve the blk_mq_init_allocated_queue interface  (Christoph Hellwig)  [6 files, -33/+21]
Don't return the passed in request_queue but a normal error code, and drop the elevator_init argument in favor of just calling elevator_init_mq directly from dm-rq. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-11  blk-mq: factor out a blk_mq_alloc_sq_tag_set helper  (Christoph Hellwig)  [2 files, -14/+21]
Factor out a helper to initialize a simple single hw queue tag_set from blk_mq_init_sq_queue. This will allow phasing out blk_mq_init_sq_queue in favor of a more symmetric and general API. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
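A hedged sketch of how the new helper pairs with blk_mq_alloc_disk() from the commits above; the ops, depth, flags and the dev/my_mq_ops names are illustrative assumptions:

	/* initialize a single hw queue tag_set with a queue depth of 16 */
	err = blk_mq_alloc_sq_tag_set(&dev->tag_set, &my_mq_ops, 16,
				      BLK_MQ_F_SHOULD_MERGE);
	if (err)
		return err;

	/* then allocate the gendisk plus request_queue from the tag_set,
	 * instead of calling blk_mq_init_sq_queue(), which only returned
	 * a request_queue */
	dev->disk = blk_mq_alloc_disk(&dev->tag_set, dev);
	if (IS_ERR(dev->disk)) {
		blk_mq_free_tag_set(&dev->tag_set);
		return PTR_ERR(dev->disk);
	}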
2021-06-09  libnvdimm/pmem: Fix blk_cleanup_disk() usage  (Dan Williams)  [1 file, -3/+3]
The queue_to_disk() helper cannot be used after del_gendisk(), so communicate @disk via pgmap->owner instead. Otherwise queue_to_disk() returns NULL, resulting in the splat below.

 Kernel attempted to read user page (330) - exploit attempt? (uid: 0)
 BUG: Kernel NULL pointer dereference on read at 0x00000330
 Faulting instruction address: 0xc000000000906344
 Oops: Kernel access of bad area, sig: 11 [#1]
 [..]
 NIP [c000000000906344] pmem_pagemap_cleanup+0x24/0x40
 LR [c0000000004701d4] memunmap_pages+0x1b4/0x4b0
 Call Trace:
  [c000000022cbb9c0] [c0000000009063c8] pmem_pagemap_kill+0x28/0x40 (unreliable)
  [c000000022cbb9e0] [c0000000004701d4] memunmap_pages+0x1b4/0x4b0
  [c000000022cbba90] [c0000000008b28a0] devm_action_release+0x30/0x50
  [c000000022cbbab0] [c0000000008b39c8] release_nodes+0x2f8/0x3e0
  [c000000022cbbb60] [c0000000008ac440] device_release_driver_internal+0x190/0x2b0
  [c000000022cbbba0] [c0000000008a8450] unbind_store+0x130/0x170

Reported-by: Sachin Sant <[email protected]> Fixes: 87eb73b2ca7c ("nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk") Link: http://lore.kernel.org/r/[email protected] Cc: Christoph Hellwig <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Dan Williams <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: Sachin Sant <[email protected]> Link: https://lore.kernel.org/r/162310994435.1571616.334551212901820961.stgit@dwillia2-desk3.amr.corp.intel.com [axboe: fold in compile warning fix] Signed-off-by: Jens Axboe <[email protected]>
2021-06-08  rq-qos: fix missed wake-ups in rq_qos_throttle try two  (Jan Kara)  [3 files, -5/+10]
Commit 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle") tried to fix a problem where a process could be sleeping in rq_qos_wait() without anyone to wake it up. However the fix is not complete and the following can still happen:

 CPU1 (waiter1)                   CPU2 (waiter2)                   CPU3 (waker)
 rq_qos_wait()                    rq_qos_wait()
   acquire_inflight_cb() -> fails
                                    acquire_inflight_cb() -> fails
                                                                     completes IOs, inflight decreased
   prepare_to_wait_exclusive()
                                    prepare_to_wait_exclusive()
   has_sleeper = !wq_has_single_sleeper()
     -> true as there are two sleepers
                                    has_sleeper = !wq_has_single_sleeper() -> true
   io_schedule()                    io_schedule()

Deadlock, as there is now nobody to wake up the two waiters. The logic of automatically blocking when there are already sleepers is really subtle, and the only way to make it work reliably is to check whether there are any waiters in the queue at the moment we add ourselves to it. That way we are guaranteed that at least the first process to enter the wait queue will recheck the waiting condition before going to sleep, which guarantees forward progress. Fixes: 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle") CC: [email protected] Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-08  block: return the correct bvec when checking for gaps  (Long Li)  [1 file, -8/+4]
After commit 07173c3ec276 ("block: enable multipage bvecs"), a bvec can have multiple pages. But bio_will_gap() still assumes a one-page bvec while checking for merging. If the pages in the bvec go across the seg_boundary_mask, this merge check can potentially succeed when only the 1st page is tested, and fail when all the pages are tested. Later, when SCSI builds the SG list, the same merge check is done in __blk_segment_map_sg_merge() with all the pages in the bvec tested. This time the check may fail if the pages in the bvec go across the seg_boundary_mask (but tested okay in bio_will_gap() earlier, so those BIOs were merged). If this check fails, we end up with a broken SG list for drivers that assume the SG list has no offsets in intermediate pages. This results in incorrect pages being written to the disk. Fix this by returning the multi-page bvec when testing gaps for merging. Cc: Jens Axboe <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Ming Lei <[email protected]> Cc: Tejun Heo <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Jeffle Xu <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: 07173c3ec276 ("block: enable multipage bvecs") Signed-off-by: Long Li <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-03  block: Update blk_update_request() documentation  (Bart Van Assche)  [1 file, -8/+4]
Although the original intent was to use blk_update_request() in stacking block drivers only, it is used much more widely today. Reflect this in the documentation block above this function. See also: * commit 32fab448e5e8 ("block: add request update interface"). * commit 2e60e02297cf ("block: clean up request completion API"). * commit ed6565e73424 ("block: handle partial completions for special payload requests"). Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Hannes Reinecke <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
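As a rough illustration of that wider use (a hedged sketch, not code from any particular driver): a driver completing a request piecewise calls blk_update_request() for the bytes finished so far and only ends the request once nothing remains; 'bytes' here is an assumed variable holding the amount completed in this step.

	if (!blk_update_request(req, BLK_STS_OK, bytes)) {
		/* false return: no bytes remain, the request is done */
		__blk_mq_end_request(req, BLK_STS_OK);
		return;
	}
	/* true return: partial completion, continue processing or requeue
	 * the remainder of the request */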
2021-06-03  block: Do not pull requests from the scheduler when we cannot dispatch them  (Jan Kara)  [3 files, -2/+14]
Provided the device driver does not implement dispatch budget accounting (which only SCSI does), the loop in __blk_mq_do_dispatch_sched() pulls requests from the IO scheduler as long as it is willing to give out any. That defeats scheduling heuristics inside the scheduler by creating the false impression that the device can take more IO when it in fact cannot. For example, with the BFQ IO scheduler on top of a virtio-blk device, setting the blkio cgroup weight has barely any impact on the observed throughput of async IO because __blk_mq_do_dispatch_sched() always sucks out all the IO queued in BFQ. BFQ first submits IO from higher weight cgroups, but when that is all dispatched, it will give out IO of lower weight cgroups as well. And then we have to wait for all this IO to be dispatched to the disk (which means a lot of it actually has to complete) before the IO scheduler is queried again for dispatching more requests. This completely destroys any service differentiation. So grab a request tag for a request pulled out of the IO scheduler already in __blk_mq_do_dispatch_sched() and do not pull any more requests if we cannot get it, because we are unlikely to be able to dispatch it. That way only a single request is going to wait in the dispatch list for some tag to free. Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-03  null_blk: Fix null pointer dereference on nullb->disk on blk_cleanup_disk call  (Colin Ian King)  [1 file, -1/+1]
The error handling on a nullb->disk allocation currently jumps to out_cleanup_disk, which calls blk_cleanup_disk with a null pointer, causing a null pointer dereference. Fix this by jumping to out_cleanup_tags instead. Addresses-Coverity: ("Dereference after null check") Fixes: 132226b301b5 ("null_blk: convert to blk_alloc_disk/blk_cleanup_disk") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: remove bdget_disk  (Christoph Hellwig)  [3 files, -44/+17]
Just opencode the xa_load in the callers, as none of them actually needs a reference to the bdev. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: factor out a part_devt helper  (Christoph Hellwig)  [3 files, -16/+20]
Add a helper to find the dev_t for a disk + partno tuple. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: move bd_part_count to struct gendisk  (Christoph Hellwig)  [4 files, -7/+5]
The bd_part_count value only makes sense for whole devices, so move it to struct gendisk and give it a more descriptive name. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: split __blkdev_put  (Christoph Hellwig)  [1 file, -26/+32]
Split __blkdev_put into one helper for the whole device, one for partitions, and a shared helper for flushing the block device inode mapping. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: move adjusting bd_part_count out of __blkdev_get  (Christoph Hellwig)  [1 file, -9/+7]
Keep the bd_part_count adjustment in the callers and thus remove the for_part argument. This mirrors what is done on the blkdev_get side and slightly simplifies blkdev_get_part as well. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: move bd_mutex to struct gendisk  (Christoph Hellwig)  [15 files, -76/+68]
Replace the per-block device bd_mutex with a per-gendisk open_mutex, thus simplifying locking wherever we deal with partitions. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Acked-by: Roger Pau MonnĂ© <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
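In practice that means code which used to serialize opens and partition handling on the per-bdev lock now takes the per-gendisk lock; a hedged sketch of the pattern (the surrounding context is illustrative, not taken from the patch):

	/* before: mutex_lock(&bdev->bd_mutex); */
	mutex_lock(&disk->open_mutex);
	/* ... walk disk->part_tbl, adjust open counts, rescan partitions ... */
	mutex_unlock(&disk->open_mutex);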
2021-06-01  block: move sync_blockdev from __blkdev_put to blkdev_put  (Christoph Hellwig)  [1 file, -10/+10]
Do the early unlocked syncing even earlier to move more code out of the recursive path. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: split __blkdev_get  (Christoph Hellwig)  [1 file, -61/+57]
Split __blkdev_get into one helper for the whole device, and one for opening partitions. This removes the (bounded) recursion when opening a partition. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: unexport blk_alloc_queue  (Christoph Hellwig)  [3 files, -2/+2]
blk_alloc_queue is just an internal helper now, unexport it and remove it from the public header. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  null_blk: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -19/+19]
Convert the null_blk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Note that the blk-mq mode is left with its own allocation scheme, to be handled later. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  xpram: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -17/+9]
Convert the xpram driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  dcssblk: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -18/+8]
Convert the dcssblk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  ps3vram: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -23/+8]
Convert the ps3vram driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  n64cart: convert to blk_alloc_disk  (Christoph Hellwig)  [1 file, -5/+1]
Convert the n64cart driver to use the blk_alloc_disk helper to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  simdisk: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -22/+7]
Convert the simdisk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  nfblock: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -15/+5]
Convert the nfblock driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  nvme-multipath: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [2 files, -33/+13]
Convert the nvme-multipath driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -10/+5]
Convert the nvdimm-pmem driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  nvdimm-btt: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [2 files, -19/+7]
Convert the nvdimm-btt driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  nvdimm-blk: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -20/+6]
Convert the nvdimm-blk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  md: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -16/+9]
Convert the md driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  dm: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -9/+7]
Convert the dm driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  bcache: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -11/+4]
Convert the bcache driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Coly Li <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  lightnvm: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -18/+5]
Convert the lightnvm driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  zram: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -15/+4]
Convert the zram driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  rsxx: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [2 files, -25/+15]
Convert the rsxx driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  pktcdvd: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -7/+4]
Convert the pktcdvd driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  drbd: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -15/+8]
Convert the drbd driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  brd: convert to blk_alloc_disk/blk_cleanup_disk  (Christoph Hellwig)  [1 file, -61/+33]
Convert the brd driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. This also allows removing the request_queue pointer from struct brd_device, and simplifies the error handling during initialization, as blk_cleanup_disk can be called on any disk returned from blk_alloc_disk. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: add blk_alloc_disk and blk_cleanup_disk APIs  (Christoph Hellwig)  [2 files, -0/+57]
Add two new APIs to allocate and free a gendisk including the request_queue for use with BIO based drivers. This is to avoid boilerplate code in drivers. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
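A minimal sketch of the intended usage in a BIO-based driver; struct my_dev, my_fops, the device name and nr_sectors are illustrative assumptions, not code from any converted driver:

	#include <linux/module.h>
	#include <linux/blkdev.h>
	#include <linux/genhd.h>

	struct my_dev {
		struct gendisk *disk;
	};

	static const struct block_device_operations my_fops = {
		.owner		= THIS_MODULE,
		/* .submit_bio	= my_submit_bio, for a BIO-based driver */
	};

	static int my_dev_create_disk(struct my_dev *dev, sector_t nr_sectors)
	{
		struct gendisk *disk;

		/* one allocation covers the gendisk and its request_queue */
		disk = blk_alloc_disk(NUMA_NO_NODE);
		if (!disk)
			return -ENOMEM;

		disk->fops = &my_fops;
		disk->private_data = dev;
		snprintf(disk->disk_name, sizeof(disk->disk_name), "mydev0");
		set_capacity(disk, nr_sectors);

		dev->disk = disk;
		add_disk(disk);
		return 0;
	}

	/* teardown: blk_cleanup_disk(dev->disk) undoes blk_alloc_disk() */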
2021-06-01  block: add a flag to make put_disk on partially initialized disks safer  (Christoph Hellwig)  [2 files, -2/+6]
Add a flag to indicate that __device_add_disk did grab a queue reference so that disk_release only drops it if we actually had it. This sorts out one of the major pitfalls with partially initialized gendisks that a lot of drivers got wrong or still do. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: automatically enable GENHD_FL_EXT_DEVT  (Christoph Hellwig)  [10 files, -13/+2]
Automatically set the GENHD_FL_EXT_DEVT flag for all disks allocated without an explicit number of minors. This is what all new block drivers should do, so make sure it is the default without boilerplate code. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: move the DISK_MAX_PARTS sanity check into __device_add_disk  (Christoph Hellwig)  [1 file, -7/+6]
Keep this together with the first place that actually looks at ->minors and prepare for not passing a minors argument to alloc_disk. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-01  block: refactor device number setup in __device_add_disk  (Christoph Hellwig)  [3 files, -66/+49]
Untangle the mess around blk_alloc_devt by moving the check for the used allocation scheme into the callers. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-05-24  blk-mq: Use request queue-wide tags for tagset-wide sbitmap  (John Garry)  [5 files, -21/+76]
The tags used for an IO scheduler are currently per hctx. As such, when q->nr_hw_queues grows, so does the request queue's total IO scheduler tag depth. This may cause problems for SCSI MQ HBAs whose total driver depth is fixed. Ming and Yanhui report higher CPU usage and lower throughput in scenarios where the fixed total driver tag depth is appreciably lower than the total scheduler tag depth: https://lore.kernel.org/linux-block/[email protected]/T/#mc0d6d4f95275a2743d1c8c3e4dc9ff6c9aa3a76b In that scenario, since the scheduler tag is acquired first, much contention is introduced since a driver tag may not be available after we have got the sched tag. Improve this scenario by introducing request queue-wide tags for when a tagset-wide sbitmap is used. The static sched requests are still allocated per hctx, as requests are initialised per hctx, as in blk_mq_init_request(..., hctx_idx, ...) -> set->ops->init_request(.., hctx_idx, ...). For simplicity of resizing the request queue sbitmap when updating the request queue depth, just init at the max possible size, so we don't need to deal with the possibility of swapping out a new sbitmap for the old one if we need to grow. Signed-off-by: John Garry <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-05-24  blk-mq: Some tag allocation code refactoring  (John Garry)  [3 files, -25/+40]
The tag allocation code to alloc the sbitmap pairs is common for regular bitmap tags and the shared sbitmap, so refactor it into a common function. Also remove the superfluous "flags" argument from blk_mq_init_shared_sbitmap(). Signed-off-by: John Garry <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-05-24  blk-mq: clearing flush request reference in tags->rqs[]  (Ming Lei)  [1 file, -1/+34]
Before we free the request queue, clear the flush request reference in tags->rqs[] so that a potential use-after-free (UAF) can be avoided. Based on a patch written by David Jeffery. Tested-by: John Garry <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: David Jeffery <[email protected]> Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-05-24  blk-mq: clear stale request in tags->rq[] before freeing one request pool  (Ming Lei)  [3 files, -7/+54]
refcount_inc_not_zero() in bt_tags_iter() may still read a freed request. Fix the issue with the following approach:

 1) hold a per-tags spinlock when reading ->rqs[tag] and calling refcount_inc_not_zero() in bt_tags_iter()

 2) clear stale requests referred to via ->rqs[tag] before freeing the request pool, holding the per-tags spinlock while clearing the stale entries

So after we have cleared the stale requests, bt_tags_iter() won't observe a freed request any more, and the clearing will also wait for any pending request reference to be dropped. The idea of clearing ->rqs[] is borrowed from John Garry's previous patch and a recent patch from David Jeffery. Tested-by: John Garry <[email protected]> Reviewed-by: David Jeffery <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-05-24  blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter  (Ming Lei)  [3 files, -16/+43]
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(); this prevents the request from being re-used while ->fn is running. The approach is the same as what we do when handling timeouts. This fixes request use-after-free (UAF) related to completion races or queue releasing:

 - If an rq is referred to before rq->q is frozen, then the queue won't be frozen before the request is released during iteration.

 - If an rq is referred to after rq->q is frozen, refcount_inc_not_zero() will return false, and we won't iterate over this request.

However, one request UAF is still not covered: refcount_inc_not_zero() may read a freed request, and that will be handled in the next patch. Tested-by: John Garry <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-05-24  block: avoid double io accounting for flush request  (Ming Lei)  [1 file, -2/+1]
For a flush request, rq->end_io() may be called twice: once from timeout handling (blk_mq_check_expired()) and once from normal completion (__blk_mq_end_request()). Move blk_account_io_flush() to after flush_rq->ref drops to zero, so IO accounting is done just once per flush request. Fixes: b68663186577 ("block: add iostat counters for flush requests") Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: John Garry <[email protected]> Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>