2022-03-21  dm: switch dm_io booleans over to proper flags  (Mike Snitzer, 2 files changed, -14/+38)
Add flags to dm_io and manage them using the same pattern used for bi_flags in struct bio. Signed-off-by: Mike Snitzer <[email protected]>
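The flag pattern described above can be sketched as follows; this is a minimal illustration assuming helper names like dm_io_flagged()/dm_io_set_flag() and example flag names, not a copy of the patch:

  /* illustrative only: bit flags replacing independent booleans in struct dm_io */
  enum {
      DM_IO_START_ACCT,   /* e.g. was a bool "start accounting" field */
      DM_IO_ACCOUNTED,    /* e.g. was a bool "already accounted" field */
  };

  struct dm_io {
      unsigned short flags;
      /* ... */
  };

  static inline bool dm_io_flagged(struct dm_io *io, unsigned int bit)
  {
      return io->flags & (1U << bit);
  }

  static inline void dm_io_set_flag(struct dm_io *io, unsigned int bit)
  {
      io->flags |= (1U << bit);
  }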
2022-03-11  dm: update email address in MAINTAINERS  (Mike Snitzer, 1 file changed, -1/+1)
Update my email address to kernel.org to allow distinction between my "upstream" and "Red" Hats. Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-03-10  dm: return void from __send_empty_flush  (Mike Snitzer, 1 file changed, -3/+2)
Signed-off-by: Mike Snitzer <[email protected]>
2022-03-10  dm: factor out dm_io_complete  (Mike Snitzer, 1 file changed, -72/+77)
Optimizes dm_io_dec_pending() slightly by avoiding local variables. Signed-off-by: Mike Snitzer <[email protected]>
2022-03-10  dm cache: use dm_submit_bio_remap  (Mike Snitzer, 1 file changed, -3/+4)
Signed-off-by: Mike Snitzer <[email protected]>
2022-03-10  dm: simplify dm_submit_bio_remap interface  (Mike Snitzer, 6 files changed, -11/+12)
Remove the from_wq argument from dm_submit_bio_remap(). This eliminates the need for dm_submit_bio_remap() callers to know whether they are calling from a workqueue or from the original dm_submit_bio(). Add map_task to the dm_io struct, record the map_task in alloc_io, and clear it after all target ->map() calls have completed. Update dm_submit_bio_remap to check whether 'current' matches io->map_task rather than rely on the passed 'from_wq' argument. This change greatly simplifies the chore of porting each DM target to dm_submit_bio_remap() because there is no longer a risk of programming error from not knowing every context in which a method that calls dm_submit_bio_remap() might be used. Signed-off-by: Mike Snitzer <[email protected]>
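The map_task mechanism described above amounts to recording which task is performing the ->map() calls; a rough sketch, with names taken from the commit message and the surrounding code heavily simplified:

  struct dm_io {
      struct task_struct *map_task;  /* task currently running ->map() for this io */
      /* ... */
  };

  /* alloc_io(): remember who is doing the mapping */
  io->map_task = current;

  /* after all target ->map() calls for this dm_io have completed */
  io->map_task = NULL;

  /* dm_submit_bio_remap(): infer the calling context instead of taking
   * a from_wq argument from the caller */
  bool in_map_context = (current == io->map_task);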
2022-03-10  dm thin: use dm_submit_bio_remap  (Mike Snitzer, 1 file changed, -2/+3)
Signed-off-by: Mike Snitzer <[email protected]>
2022-03-10  dm: add WARN_ON_ONCE to dm_submit_bio_remap  (Mike Snitzer, 1 file changed, -1/+3)
If a target uses dm_submit_bio_remap() it should set ti->accounts_remapped_io. Also, switch dm_start_io_acct() WARN_ON to WARN_ON_ONCE. Signed-off-by: Mike Snitzer <[email protected]>
2022-03-09  dm: support bio polling  (Ming Lei, 3 files changed, -3/+169)
Support bio polling (REQ_POLLED) with the following approach:

1) Only support IO polling on normal READ/WRITE; other, abnormal IOs still fall back to IRQ mode, so the target io (and DM's clone bio) sits exactly inside the dm_io.
2) Hold one refcount on io->io_count after submitting this dm bio with REQ_POLLED.
3) Support DM's native bio splitting: every dm_io instance associated with the current bio is added to a list whose head is stored in bio->bi_private; the original bi_private is restored before ending this bio.
4) Implement the .poll_bio() callback: call bio_poll() on the single target bio inside the dm_io, which is retrieved via bio->bi_bio_drv_data, and call dm_io_dec_pending() after the target io is done in .poll_bio().
5) Enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL (based on Jeffle's previous patch).

These changes are good for a 30-35% IOPS improvement for polled IO. For detailed test results please see (Jens, thanks for testing!): https://listman.redhat.com/archives/dm-devel/2022-March/049868.html or https://marc.info/?l=linux-block&m=164684246214700&w=2

Tested-by: Jens Axboe <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
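A rough sketch of steps 3) and 4) above; the list linkage field and the simplifications here are assumptions for illustration, not the exact patch:

  /* step 3: when a dm_io is submitted with REQ_POLLED, link it onto a list
   * whose head lives in bio->bi_private (the original bi_private is saved
   * elsewhere and restored before the bio is ended) */
  static void dm_queue_poll_io(struct bio *bio, struct dm_io *io)
  {
      io->next = bio->bi_private;   /* assumed linkage field */
      bio->bi_private = io;
  }

  /* step 4: ->poll_bio() walks that list and polls each dm_io's single
   * target clone, completing the dm_io once its clone is done
   * (partial-completion handling omitted for brevity) */
  static int dm_poll_bio(struct bio *bio, struct io_comp_batch *iob,
                         unsigned int flags)
  {
      struct dm_io *io, *next;

      for (io = bio->bi_private; io; io = next) {
          next = io->next;
          if (bio_poll(&io->tio.clone, iob, flags))
              dm_io_dec_pending(io, BLK_STS_OK);
      }
      return 1;
  }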
2022-03-09  block: add ->poll_bio to block_device_operations  (Ming Lei, 3 files changed, -5/+15)
Prepare for supporting IO polling for bio-based drivers. Add a ->poll_bio callback so that bio-based drivers can provide their own logic for polling a bio. Also fix the ->submit_bio_bio typo in the comment block above __submit_bio_noacct. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
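A sketch of the new callback and its dispatch point, reconstructed from the description above; the surrounding bio_poll() context (q, disk, cookie, iob, flags, ret) is assumed:

  struct block_device_operations {
      void (*submit_bio)(struct bio *bio);
      /* new: let bio-based drivers poll their own bios */
      int (*poll_bio)(struct bio *bio, struct io_comp_batch *iob,
                      unsigned int flags);
      /* ... */
  };

  /* inside bio_poll(): blk-mq queues keep using blk_mq_poll(), bio-based
   * drivers are asked via ->poll_bio() if they provide it */
  if (queue_is_mq(q))
      ret = blk_mq_poll(q, cookie, iob, flags);
  else if (disk->fops->poll_bio)
      ret = disk->fops->poll_bio(bio, iob, flags);
  else
      ret = 0;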
2022-03-02  dm mpath: use DMINFO instead of printk with KERN_INFO  (Mike Snitzer, 1 file changed, -2/+1)
Signed-off-by: Mike Snitzer <[email protected]>
2022-03-02  dm: stop using bdevname  (Christoph Hellwig, 5 files changed, -37/+27)
Just use the %pg format specifier instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
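For reference, the shape of such a conversion (the message text here is made up for illustration):

  char b[BDEVNAME_SIZE];

  /* before: caller-provided buffer filled by bdevname() */
  DMERR("device %s: I/O error", bdevname(bdev, b));

  /* after: printk formats the struct block_device * directly */
  DMERR("device %pg: I/O error", bdev);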
2022-03-02  dm-zoned: remove the ->name field in struct dmz_dev  (Christoph Hellwig, 3 files changed, -8/+6)
Just use the %pg format specifier to print the block device name directly. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-22  dm: remove unnecessary local variables in __bind  (Mike Snitzer, 1 file changed, -5/+2)
Also remove empty newline before 'out:' label at end of __bind. Signed-off-by: Mike Snitzer <[email protected]>
2022-02-22  dm: requeue IO if mapping table not yet available  (Mike Snitzer, 2 files changed, -9/+9)
Update both bio-based and request-based DM to requeue IO if the mapping table is not yet available. This race, where IO is submitted before the DM device is ready, is narrow yet possible on an initial table load given that the DM device's request_queue is created beforehand, so it is best to requeue IO to handle this unlikely case. Reported-by: Zhang Yi <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-22  dm io: remove stale comment block for dm_io()  (Barry Song, 1 file changed, -8/+0)
Commit 7eaceaccab5f ("block: remove per-queue plugging") dropped unplug_delay and blk_unplug(). Plus, in the current kernel there is no fundamental difference between sync_io() and async_io() except that sync_io() uses sync_io_complete() as the notify.fn and explicitly calls wait_for_completion_io() to sync. The comment isn't valid any more. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Barry Song <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-22  dm thin metadata: remove unused dm_thin_remove_block and __remove  (Zhiqiang Liu, 2 files changed, -29/+0)
Signed-off-by: Zhiqiang Liu <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-22  dm thin: use time_is_before_jiffies instead of open coding it  (Wang Qing, 1 file changed, -1/+1)
Use time_is_before_jiffies() to improve code readability. Signed-off-by: Wang Qing <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
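An illustrative before/after of this kind of cleanup, assuming a last_commit_jiffies timestamp and a COMMIT_PERIOD interval as in dm-thin (not the exact hunk):

  #include <linux/jiffies.h>

  /* before: open-coded jiffies comparison */
  if (jiffies < pool->last_commit_jiffies ||
      jiffies > pool->last_commit_jiffies + COMMIT_PERIOD)
      commit_due = true;

  /* after: the helper states the intent directly */
  if (time_is_before_jiffies(pool->last_commit_jiffies + COMMIT_PERIOD))
      commit_due = true;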
2022-02-22  dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS  (Aashish Sharma, 1 file changed, -1/+1)
Explicitly convert the unsigned int on the right-hand side of the conditional expression to int so it matches the left-hand operand and the return type, fixing the following compiler warning:

drivers/md/dm-crypt.c:2593:43: warning: signed and unsigned type in conditional expression [-Wsign-compare]

Fixes: c538f6ec9f56 ("dm crypt: add ability to use keys from the kernel key retention service") Signed-off-by: Aashish Sharma <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
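A generic illustration of this class of -Wsign-compare fix (deliberately not the actual dm-crypt function):

  #include <string.h>

  static int get_size(const char *s, int use_default)
  {
      /* before: "use_default ? -1 : strlen(s)" mixes int and size_t in the
       * two arms of ?:, triggering the warning */
      /* after: cast the unsigned arm so both arms match the int return type */
      return use_default ? -1 : (int)strlen(s);
  }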
2022-02-22  dm: fix use-after-free in dm_cleanup_zoned_dev()  (Kirill Tkhai, 1 file changed, -1/+1)
dm_cleanup_zoned_dev() uses queue, so it must be called before blk_cleanup_disk() starts its killing:

blk_cleanup_disk->blk_cleanup_queue()->kobject_put()->blk_release_queue()->
->...RCU...->blk_free_queue_rcu()->kmem_cache_free()

Otherwise, RCU callback may be executed first and dm_cleanup_zoned_dev() will touch free'd memory:

BUG: KASAN: use-after-free in dm_cleanup_zoned_dev+0x33/0xd0
Read of size 8 at addr ffff88805ac6e430 by task dmsetup/681

CPU: 4 PID: 681 Comm: dmsetup Not tainted 5.17.0-rc2+ #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x57/0x7d
 print_address_description.constprop.0+0x1f/0x150
 ? dm_cleanup_zoned_dev+0x33/0xd0
 kasan_report.cold+0x7f/0x11b
 ? dm_cleanup_zoned_dev+0x33/0xd0
 dm_cleanup_zoned_dev+0x33/0xd0
 __dm_destroy+0x26a/0x400
 ? dm_blk_ioctl+0x230/0x230
 ? up_write+0xd8/0x270
 dev_remove+0x156/0x1d0
 ctl_ioctl+0x269/0x530
 ? table_clear+0x140/0x140
 ? lock_release+0xb2/0x750
 ? remove_all+0x40/0x40
 ? rcu_read_lock_sched_held+0x12/0x70
 ? lock_downgrade+0x3c0/0x3c0
 ? rcu_read_lock_sched_held+0x12/0x70
 dm_ctl_ioctl+0xa/0x10
 __x64_sys_ioctl+0xb9/0xf0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fb6dfa95c27

Fixes: bb37d77239af ("dm: introduce zone append emulation") Cc: [email protected] Signed-off-by: Kirill Tkhai <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
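The ordering constraint can be sketched as follows; the surrounding cleanup function is heavily simplified and only illustrates the before/after call order:

  static void cleanup_mapped_device(struct mapped_device *md)
  {
      /* ... */
      if (md->disk) {
          del_gendisk(md->disk);
          /* fix: release zone resources while md->queue is still alive */
          dm_cleanup_zoned_dev(md);
          /* blk_cleanup_disk() may free the queue via an RCU callback */
          blk_cleanup_disk(md->disk);
      }
      /* before the fix, dm_cleanup_zoned_dev(md) ran after this point and
       * could race with the RCU-deferred free of the queue */
  }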
2022-02-22  dm ioctl: prevent potential spectre v1 gadget  (Jordy Zomer, 1 file changed, -0/+2)
It appears like cmd could be a Spectre v1 gadget as it's supplied by a user and used as an array index. Prevent the contents of kernel memory from being leaked to userspace via speculative execution by using array_index_nospec. Signed-off-by: Jordy Zomer <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
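The hardening pattern, sketched with a simplified lookup (the real dm-ioctl table and types are more involved):

  #include <linux/nospec.h>

  static ioctl_fn lookup_ioctl(unsigned int cmd)
  {
      if (cmd >= ARRAY_SIZE(_ioctls))
          return NULL;
      /* clamp the user-controlled index under speculation so a mispredicted
       * bounds check cannot be used to read out-of-bounds kernel memory */
      cmd = array_index_nospec(cmd, ARRAY_SIZE(_ioctls));
      return _ioctls[cmd].fn;
  }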
2022-02-22  dm: cleanup double word in comment  (Tom Rix, 1 file changed, -1/+1)
Remove the second 'a'. Signed-off-by: Tom Rix <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-22  dm ima: fix wrong length calculation for no_data string  (Thore Sommer, 1 file changed, -3/+3)
All entries measured by dm ima are prefixed with a version string (dm_version=N.N.N). When there is no data to measure, the entire buffer is overwritten with a string that contains the version string again, and the length of that string is then added to the length of the version string. The resulting length is wrong because it counts the version string twice. This caused entries like this:

dm_version=4.45.0;name=test,uuid=test;table_clear=no_data; \
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \
current_device_capacity=204808;

Signed-off-by: Thore Sommer <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-22  dm cache policy smq: make static read-only array table const  (Colin Ian King, 1 file changed, -1/+3)
The 'table' static array is read-only, so it makes sense to make it const. Also add the 'int' type to clean up a checkpatch warning. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
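The shape of the change (the array contents here are illustrative, not the smq policy's actual values):

  /* before */
  static unsigned table[] = { 1, 2, 4, 8 };

  /* after: const so the data can live in .rodata, and the full 'unsigned int'
   * type to silence checkpatch */
  static const unsigned int table[] = { 1, 2, 4, 8 };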
2022-02-21  dm delay: use dm_submit_bio_remap  (Mike Snitzer, 1 file changed, -2/+3)
Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm crypt: use dm_submit_bio_remap  (Mike Snitzer, 1 file changed, -5/+8)
Care was taken to support kcryptd_io_read being called from crypt_map or workqueue. Use of an intermediate CRYPT_MAP_READ_GFP gfp_t (defined as GFP_NOWAIT) should protect from maintenance burden if that flag were to change for some reason. Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: add dm_submit_bio_remap interface  (Mike Snitzer, 4 files changed, -22/+118)
Where possible, switch from early bio-based IO accounting (at the time DM clones each incoming bio) to late IO accounting just before each remapped bio is issued to the underlying device via submit_bio_noacct(). This allows more precise bio-based IO accounting for DM targets that use their own workqueues to perform additional processing of each bio, in conjunction with their DM_MAPIO_SUBMITTED return from their map function. When a target is updated to use dm_submit_bio_remap() it must also set ti->accounts_remapped_io to true.

Use xchg() in start_io_acct(), as suggested by Mikulas, to ensure each IO is only started once. The xchg race only happens if __send_duplicate_bios() sends multiple bios -- that case is reflected via tio->is_duplicate_bio. Given the niche nature of this race, it is best to avoid any xchg performance penalty for normal IO.

For IO that was never submitted with dm_submit_bio_remap(), but whose clone the target completes with bio_endio, accounting is started then ended and the pending_io counter decremented.

Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
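A sketch of how a target opts in to the new interface; the target names here are illustrative and dm_submit_bio_remap()'s parameter list is simplified relative to how it evolved over this series:

  /* in the target's ctr: tell DM core this target accounts remapped IO */
  static int example_ctr(struct dm_target *ti, unsigned int argc, char **argv)
  {
      ti->accounts_remapped_io = true;
      return 0;
  }

  /* later, e.g. from the target's own workqueue after ->map() returned
   * DM_MAPIO_SUBMITTED: start accounting (once) and submit the clone */
  static void example_issue(struct bio *clone)
  {
      dm_submit_bio_remap(clone, NULL);
  }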
2022-02-21  dm: flag clones created by __send_duplicate_bios  (Mike Snitzer, 2 files changed, -21/+30)
Formally disallow dm_accept_partial_bio() on clones created by __send_duplicate_bios() because their len_ptr points to a shared unsigned int. __send_duplicate_bios() is only used for flush bios and other "abnormal" bios (discards, writezeroes, etc). And dm_accept_partial_bio() already didn't support flush bios. Also refactor __send_changing_extent_only() to reflect it cannot fail. As such __send_changing_extent_only() can update the clone_info before __send_duplicate_bios() is called to fan-out __map_bio() calls. Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: reduce dm_io and dm_target_io struct sizes  (Mike Snitzer, 1 file changed, -3/+3)
Remove one 4 byte hole in dm_io struct. Remove two 4 byte holes in dm_target_io struct. Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: move duplicate code from callers of alloc_tio into alloc_tio  (Mike Snitzer, 1 file changed, -14/+13)
Suggested-by: Christoph Hellwig <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: record old_sector in dm_target_io before calling map function  (Mike Snitzer, 2 files changed, -3/+5)
Prep for being able to defer trace_block_bio_remap() until when the bio is remapped and submitted by the DM target. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: remove legacy code only needed before submit_bio recursion  (Mike Snitzer, 1 file changed, -9/+2)
Commit 8615cb65bd63 ("dm: remove useless loop in __split_and_process_bio") showcased that we no longer loop. Remove the bio_advance() in __split_and_process_bio() that was only needed when looping was possible. Similarly there is no need to advance the bio, using ci->sector cursor, in __send_duplicate_bios(). Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: remove unused mapped_device argument from free_tio  (Mike Snitzer, 1 file changed, -2/+2)
Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: remove impossible BUG_ON in __send_empty_flush  (Mike Snitzer, 1 file changed, -1/+0)
The flush_bio in question was just initialized to be empty, so there is no way bio_has_data() will return true. So remove stale BUG_ON(). Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: reduce code duplication in __map_bio  (Mike Snitzer, 1 file changed, -12/+6)
Error path code (for handling DM_MAPIO_REQUEUE and DM_MAPIO_KILL) is effectively identical. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: refactor dm_split_and_process_bio a bit  (Mike Snitzer, 1 file changed, -26/+28)
Remove needless branching and indentation. Leaves code to catch malformed op_is_zone_mgmt bios (they shouldn't have a payload). Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: fold __clone_and_map_data_bio into __split_and_process_bio  (Mike Snitzer, 1 file changed, -22/+8)
Fold __clone_and_map_data_bio into its only caller. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: rename split functions  (Mike Snitzer, 1 file changed, -11/+11)
Rename __split_and_process_bio to dm_split_and_process_bio. Rename __split_and_process_non_flush to __split_and_process_bio. Also fix a stale comment and whitespace. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: reorder members in mapped_device struct  (Mike Snitzer, 1 file changed, -19/+19)
Improves alignment and groups related members relative to cachelines. Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: eliminate copying of dm_io fields in dm_io_dec_pending  (Mike Snitzer, 1 file changed, -12/+7)
There is no need for dm_io_dec_pending() to copy dm_io fields anymore now that DM provides its own pending_io counters again. The race documented in commit d208b89401e0 ("dm: fix mempool NULL pointer race when completing IO") no longer exists now that block core's in_flight counters aren't used to signal all dm_io is complete. Also, rename {start,end}_io_acct to dm_{start,end}_io_acct. Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm stats: fix too short end duration_ns when using precise_timestamps  (Mike Snitzer, 3 files changed, -5/+34)
dm_stats_account_io()'s STAT_PRECISE_TIMESTAMPS support doesn't handle the fact that with commit b879f915bc48 ("dm: properly fix redundant bio-based IO accounting") io->start_time _may_ be in the past (meaning the start_io_acct() was deferred until later). Add a new dm_stats_recalc_precise_timestamps() helper that will set/clear a new 'precise_timestamps' flag in the dm_stats struct based on whether any configured stats enable STAT_PRECISE_TIMESTAMPS. And update DM core's alloc_io() to use dm_stats_record_start() to set stats_aux.duration_ns if stats->precise_timestamps is true. Also, remove unused 'last_sector' and 'last_rw' members from the dm_stats struct. Fixes: b879f915bc48 ("dm: properly fix redundant bio-based IO accounting") Cc: [email protected] Co-developed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-02-21  dm: fix double accounting of flush with data  (Mike Snitzer, 3 files changed, -17/+38)
DM handles a flush with data by first issuing an empty flush and then once it completes the REQ_PREFLUSH flag is removed and the payload is issued. The problem fixed by this commit is that both the empty flush bio and the data payload will account the full extent of the data payload. Fix this by factoring out dm_io_acct() and having it wrap all IO accounting to set the size of bio with REQ_PREFLUSH to 0, account the IO, and then restore the original size. Cc: [email protected] Signed-off-by: Mike Snitzer <[email protected]>
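The dm_io_acct() idea reads roughly like this; the helper name comes from the message above, but the body is a simplified reconstruction:

  static void dm_io_acct(bool end, struct bio *bio, unsigned long start_time)
  {
      /* a flush with data must not be accounted twice: account the bio that
       * still carries REQ_PREFLUSH as size 0, then restore the real size */
      bool is_flush_with_data = (bio->bi_opf & REQ_PREFLUSH) &&
                                bio->bi_iter.bi_size;
      unsigned int bi_size = bio->bi_iter.bi_size;

      if (is_flush_with_data)
          bio->bi_iter.bi_size = 0;

      if (!end)
          bio_start_io_acct_time(bio, start_time);
      else
          bio_end_io_acct(bio, start_time);

      if (is_flush_with_data)
          bio->bi_iter.bi_size = bi_size;
  }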
2022-02-21  dm: interlock pending dm_io and dm_wait_for_bios_completion  (Mike Snitzer, 2 files changed, -12/+25)
Commit d208b89401e0 ("dm: fix mempool NULL pointer race when completing IO") didn't go far enough. When bio_end_io_acct ends, the count of in-flight I/Os may reach zero and the DM device may be suspended. There is a possibility that the suspend races with dm_stats_account_io. Fix this by adding percpu "pending_io" counters to track outstanding dm_io. Move the kicking of the suspend queue to dm_io_dec_pending(). Also, rename md_in_flight_bios() to dm_in_flight_bios() and update it to iterate over all pending_io counters. Fixes: d208b89401e0 ("dm: fix mempool NULL pointer race when completing IO") Cc: [email protected] Co-developed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
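The percpu "pending_io" scheme can be sketched like this; the field and helper names follow the commit message, the rest is simplified:

  struct mapped_device {
      int __percpu *pending_io;   /* outstanding dm_io per CPU */
      wait_queue_head_t wait;
      /* ... */
  };

  /* alloc_io(): one more outstanding dm_io */
  this_cpu_inc(*md->pending_io);

  /* dm_io_dec_pending(), once the dm_io fully completes */
  this_cpu_dec(*md->pending_io);
  if (unlikely(wq_has_sleeper(&md->wait)))
      wake_up(&md->wait);   /* kick dm_wait_for_bios_completion()/suspend */

  static bool dm_in_flight_bios(struct mapped_device *md)
  {
      int cpu;
      long sum = 0;

      for_each_possible_cpu(cpu)
          sum += *per_cpu_ptr(md->pending_io, cpu);

      return sum != 0;
  }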
2022-02-16  block/bfq_wf2q: correct weight to ioprio  (Yahu Gao, 1 file changed, -1/+1)
The return value is ioprio * BFQ_WEIGHT_CONVERSION_COEFF or 0. What we want is ioprio or 0. Correct this by changing the calculation. Signed-off-by: Yahu Gao <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
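In other words, the conversion should be the inverse of bfq's ioprio-to-weight mapping rather than a rescaled version of it. A reconstruction of what that looks like, based only on the description above (the exact constants and clamping may differ from the actual patch):

  static unsigned short bfq_weight_to_ioprio(int weight)
  {
      /* before (per the message, this yields ioprio * COEFF, or 0):
       *   return max_t(int, 0,
       *                IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF - weight);
       */
      return max_t(int, 0,
                   IOPRIO_NR_LEVELS - weight / BFQ_WEIGHT_CONVERSION_COEFF);
  }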
2022-02-16  blk-mq: avoid extending delays of active hctx from blk_mq_delay_run_hw_queues  (David Jeffery, 1 file changed, -0/+8)
When blk_mq_delay_run_hw_queues sets an hctx to run in the future, it can reset the delay length for an already pending delayed work run_work. This creates a scenario where multiple hctx may have their queues set to run, but if one runs first and finds nothing to do, it can reset the delay of another hctx and stall that hctx's ability to run requests.

To avoid this I/O stall, when an hctx's run_work is already pending, leave it untouched to run at its current designated time rather than extending its delay. The work will still run, which keeps closed the race that blk_mq_delay_run_hw_queues is needed for, while also avoiding the I/O stall.

Signed-off-by: David Jeffery <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/20220131203337.GA17666@redhat Signed-off-by: Jens Axboe <[email protected]>
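The fix reduces to a "leave pending work alone" check; a simplified sketch of the loop in blk_mq_delay_run_hw_queues() (not the verbatim hunk):

  void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
  {
      struct blk_mq_hw_ctx *hctx;
      int i;

      queue_for_each_hw_ctx(q, hctx, i) {
          if (blk_mq_hctx_stopped(hctx))
              continue;
          /* if a delayed run is already queued, keep its current deadline
           * instead of pushing it further into the future */
          if (delayed_work_pending(&hctx->run_work))
              continue;
          blk_mq_delay_run_hw_queue(hctx, msecs);
      }
  }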
2022-02-16  virtio_blk: simplify refcounting  (Christoph Hellwig, 1 file changed, -52/+14)
Implement the ->free_disk method to free the virtio_blk structure only once the last gendisk reference goes away instead of keeping a local refcount. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-02-16  memstick/mspro_block: simplify refcounting  (Christoph Hellwig, 1 file changed, -42/+7)
Implement the ->free_disk method to free the msb_data structure only once the last gendisk reference goes away instead of keeping a local refcount. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-02-16  memstick/mspro_block: fix handling of read-only devices  (Christoph Hellwig, 1 file changed, -6/+4)
Use set_disk_ro to propagate the read-only state to the block layer instead of checking for it in ->open and leaking a reference in case of a read-only device. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-02-16  memstick/ms_block: simplify refcounting  (Christoph Hellwig, 2 files changed, -50/+15)
Implement the ->free_disk method to free the msb_data structure only once the last gendisk reference goes away instead of keeping a local refcount. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-02-16  block: add a ->free_disk method  (Christoph Hellwig, 2 files changed, -0/+7)
Add a method to notify the driver that the gendisk is about to be freed. This allows drivers to tie the lifetime of their private data to that of the gendisk and thus deal with device removal races without expensive synchronization and boilerplate code. A new flag is added so that ->free_disk is only called after a successful call to add_disk, which significantly simplifies the error handling path during probing. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
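A sketch of the pattern this enables for a driver; the driver structure and names here are illustrative:

  static void mydrv_free_disk(struct gendisk *disk)
  {
      struct mydrv_device *dev = disk->private_data;

      /* called exactly once, when the last gendisk reference goes away,
       * so the private data can simply share the gendisk's lifetime */
      kfree(dev);
  }

  static const struct block_device_operations mydrv_fops = {
      .owner     = THIS_MODULE,
      .free_disk = mydrv_free_disk,
  };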