path: root/drivers/md
Age | Commit message | Author | Files | Lines
2020-07-08 | dm zoned: fix unused but set variable warnings | Wei Yongjun | 1 | -4/+1
Fix unused but set variable warnings:

drivers/md/dm-zoned-reclaim.c:504:42: warning: variable nr_rnd set but not used [-Wunused-but-set-variable]
  504 | unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
      | ^~~~~~
drivers/md/dm-zoned-reclaim.c:504:24: warning: variable nr_unmap_rnd set but not used [-Wunused-but-set-variable]
  504 | unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
      | ^~~~~~~~~~~~

Fixes: f97809aec589 ("dm zoned: per-device reclaim")
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
2020-07-08 | dm writecache: reject asynchronous pmem devices | Michal Suchanek | 1 | -0/+6
DM writecache does not handle asynchronous pmem. Reject it when supplied as cache. Link: https://lore.kernel.org/linux-nvdimm/[email protected]/ Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver") Signed-off-by: Michal Suchanek <[email protected]> Acked-by: Mikulas Patocka <[email protected]> Cc: [email protected] # 5.3+ Signed-off-by: Mike Snitzer <[email protected]>
2020-07-08 | dm: use bio_uninit instead of bio_disassociate_blkg | Christoph Hellwig | 1 | -3/+2
bio_uninit is the proper API to clean up a BIO that has been allocated on stack or inside a structure that doesn't come from the BIO allocator. Switch dm to use that instead of bio_disassociate_blkg, which really is an implementation detail. Note that the bio_uninit calls are also moved to the two callers of __send_empty_flush, so that they better pair with the bio_init calls used to initialize them. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-07-08 | Merge tag 'v5.8-rc4' into for-5.9/drivers | Jens Axboe | 6 | -18/+55
Merge in 5.8-rc4 for-5.9/block to set up for-5.9/drivers, to provide a clean base and make life easier for the NVMe changes.

Signed-off-by: Jens Axboe <[email protected]>

* tag 'v5.8-rc4': (732 commits)
  Linux 5.8-rc4
  x86/ldt: use "pr_info_once()" instead of open-coding it badly
  MIPS: Do not use smp_processor_id() in preemptible code
  MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
  .gitignore: Do not track `defconfig` from `make savedefconfig`
  io_uring: fix regression with always ignoring signals in io_cqring_wait()
  x86/ldt: Disable 16-bit segments on Xen PV
  x86/entry/32: Fix #MC and #DB wiring on x86_32
  x86/entry/xen: Route #DB correctly on Xen PV
  x86/entry, selftests: Further improve user entry sanity checks
  x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
  i2c: mlxcpld: check correct size of maximum RECV_LEN packet
  i2c: add Kconfig help text for slave mode
  i2c: slave-eeprom: update documentation
  i2c: eg20t: Load module automatically if ID matches
  i2c: designware: platdrv: Set class based on DMI
  i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
  mm/page_alloc: fix documentation error
  vmalloc: fix the owner argument for the new __vmalloc_node_range callers
  mm/cma.c: use exact_nid true to fix possible per-numa cma leak
  ...
2020-07-07 | dm: do not use waitqueue for request-based DM | Ming Lei | 2 | -29/+39
Given request-based DM now uses blk-mq's blk_mq_queue_inflight() to determine if outstanding IO has completed (and DM has no control over the blk-mq state machine used to track outstanding IO) it is unsafe to wakeup waiter (dm_wait_for_completion) before blk-mq has cleared a request's state bits (e.g. MQ_RQ_IN_FLIGHT or MQ_RQ_COMPLETE). As such dm_wait_for_completion() could be left to wait indefinitely if no other requests complete. Fix this by eliminating request-based DM's use of waitqueue to wait for blk-mq requests to complete in dm_wait_for_completion. Signed-off-by: Ming Lei <[email protected]> Depends-on: 3c94d83cb3526 ("blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()") Cc: [email protected] Signed-off-by: Mike Snitzer <[email protected]>
2020-07-05 | Replace HTTP links with HTTPS ones: LVM | Alexander A. Klimov | 2 | -5/+5
Rationale:
Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If both the HTTP and HTTPS versions return 200 OK and serve the same content:
            Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
2020-07-01 | dm: remove unused variable | Jens Axboe | 1 | -1/+0
Since merging the commit identified in Fixes below, we trigger this compile time warning:

drivers/md/dm.c: In function ‘__map_bio’:
drivers/md/dm.c:1296:24: warning: unused variable ‘md’ [-Wunused-variable]
 1296 | struct mapped_device *md = io->md;
      | ^~

Remove the 'md' variable.

Fixes: 5a6c35f9af41 ("block: remove direct_make_request")
Signed-off-by: Jens Axboe <[email protected]>
2020-07-01 | block: remove the bd_queue field from struct block_device | Christoph Hellwig | 1 | -1/+1
Just use bd_disk->queue instead. Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-07-01 | block: remove direct_make_request | Christoph Hellwig | 1 | -4/+1
Now that submit_bio_noacct has a decent blk-mq fast path there is no more need for this bypass. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-07-01 | block: rename generic_make_request to submit_bio_noacct | Christoph Hellwig | 25 | -73/+72
generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-07-01 | block: move ->make_request_fn to struct block_device_operations | Christoph Hellwig | 5 | -24/+35
The make_request_fn is a little weird in that it sits directly in struct request_queue instead of an operation vector. Replace it with a block_device_operations method called submit_bio (which describes much better what it does). Also remove the request_queue argument to it, as the queue can be derived pretty trivially from the bio. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-07-01 | block: remove the request_queue argument from blk_queue_split | Christoph Hellwig | 2 | -2/+2
The queue can be trivially derived from the bio, so pass one less argument. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-07-01 | dm: stop using ->queuedata | Christoph Hellwig | 1 | -2/+1
Instead of setting up the queuedata as well just use one private data field. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-07-01 | bcache: stop setting ->queuedata | Christoph Hellwig | 1 | -1/+0
Nothing in bcache actually uses the ->queuedata field. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-06-29 | dm: use bio_uninit instead of bio_disassociate_blkg | Christoph Hellwig | 1 | -3/+2
bio_uninit is the proper API to clean up a BIO that has been allocated on stack or inside a structure that doesn't come from the BIO allocator. Switch dm to use that instead of bio_disassociate_blkg, which really is an implementation detail. Note that the bio_uninit calls are also moved to the two callers of __send_empty_flush, so that they better pair with the bio_init calls used to initialize them. Acked-by: Tejun Heo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-06-27 | Merge tag 'for-5.8/dm-fixes' of ↵ | Linus Torvalds | 6 | -18/+55
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- Quite a few DM zoned target fixes and a Zone append fix in DM core. Considering the amount of dm-zoned changes that went in during the 5.8 merge window these fixes are not that surprising.
- A few DM writecache target fixes.
- A fix to Documentation index to include DM ebs target docs.
- Small cleanup to use struct_size() in DM core's retrieve_deps().

* tag 'for-5.8/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm writecache: add cond_resched to loop in persistent_memory_claim()
  dm zoned: Fix reclaim zone selection
  dm zoned: Fix random zone reclaim selection
  dm: update original bio sector on Zone Append
  dm zoned: Fix metadata zone size check
  docs: device-mapper: add dm-ebs.rst to an index file
  dm ioctl: use struct_size() helper in retrieve_deps()
  dm writecache: skip writecache_wait when using pmem mode
  dm writecache: correct uncommitted_block when discarding uncommitted entry
  dm zoned: assign max_io_len correctly
  dm zoned: fix uninitialized pointer dereference
2020-06-24 | blk-mq: move failure injection out of blk_mq_complete_request | Christoph Hellwig | 1 | -1/+2
Move the call to blk_should_fake_timeout out of blk_mq_complete_request and into the drivers, skipping call sites that are obvious error handlers, and remove the now superfluous blk_mq_force_complete_rq helper. This ensures we don't keep injecting errors into completions that just terminate the Linux request after the hardware has been reset or the command has been aborted. Reviewed-by: Daniel Wagner <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-06-19 | dm writecache: add cond_resched to loop in persistent_memory_claim() | Mikulas Patocka | 1 | -0/+2
Add cond_resched() to a loop that fills in the mapper memory area because the loop can be executed many times. Fixes: 48debafe4f2fe ("dm: add writecache target") Cc: [email protected] Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-19 | dm zoned: Fix reclaim zone selection | Shin'ichiro Kawasaki | 1 | -2/+2
When dm zoned has multiple devices, random zones are never selected for reclaim if all reserved sequential write zones are in use and no sequential write required zones can be selected for reclaim. This can lead to deadlocks as selecting a cache zone allows reclaiming a sequential zone, ensuring forward progress. Fix this by always defaulting to selecting a random zone when no sequential write required zone can be selected. [Damien: fix commit message] Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-19 | dm zoned: Fix random zone reclaim selection | Damien Le Moal | 1 | -8/+27
Commit 2094045fe5b5 ("dm zoned: prefer full zones for reclaim") modified dmz_get_rnd_zone_for_reclaim() to add a search for the buffer zone with the heaviest weight as an optimal candidate for reclaim. This modification uses the zone pointer variable "last", which is set only once and never modified as zones are scanned, resulting in the search being ineffective. Furthermore, if the selected buffer zone at the end of the search loop is active or already locked for reclaim, dmz_get_rnd_zone_for_reclaim() returns NULL even if other random zones with a lesser weight can be reclaimed.

To fix the search and to guarantee that reclaim can make forward progress, fix the dmz_get_rnd_zone_for_reclaim() loop to correctly find the buffer zone with the heaviest weight using the variable maxw_z. Also make sure to fall back to finding the first random zone that can be reclaimed if this best candidate zone cannot be reclaimed.

While at it, also fix the device index check to consider only random zones, ignoring cache zones belonging to the cache device if one is used, as that device does not have a reclaim process.

Fixes: 2094045fe5b5 ("dm zoned: prefer full zones for reclaim")
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
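The selection logic described in this commit message can be sketched in plain C. This is only an illustration under stated assumptions: `struct zone`, `zone_reclaimable()` and `pick_zone_for_reclaim()` are hypothetical names, not the dm-zoned data structures or functions, and locking is ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone; not the kernel's struct dm_zone. */
struct zone {
    unsigned int weight; /* number of valid blocks in the zone */
    bool active;         /* zone is currently in use by I/O */
    bool locked;         /* zone is already locked for reclaim */
};

static bool zone_reclaimable(const struct zone *z)
{
    return !z->active && !z->locked;
}

/*
 * Return the heaviest zone if it can be reclaimed; otherwise fall back to
 * the first reclaimable zone. NULL is returned only when nothing can be
 * reclaimed at all.
 */
static struct zone *pick_zone_for_reclaim(struct zone *zones, size_t n)
{
    struct zone *maxw_z = NULL;
    size_t i;

    /* The running maximum is updated on every iteration of the scan. */
    for (i = 0; i < n; i++) {
        if (!maxw_z || zones[i].weight > maxw_z->weight)
            maxw_z = &zones[i];
    }
    if (maxw_z && zone_reclaimable(maxw_z))
        return maxw_z;

    /* Best candidate is busy: take the first zone that can be reclaimed. */
    for (i = 0; i < n; i++) {
        if (zone_reclaimable(&zones[i]))
            return &zones[i];
    }
    return NULL;
}
```

The fallback loop is what guarantees forward progress: a busy best candidate no longer makes the whole search fail.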
2020-06-19 | dm: update original bio sector on Zone Append | Johannes Thumshirn | 1 | -0/+13
Naohiro reported that issuing zone-append bios to a zoned block device underneath a dm-linear device does not work as expected. This is because we forgot to reverse-map the sector the device wrote to the original bio. For zone-append bios, get the offset in the zone of the written sector from the clone bio and add that to the original bio's sector position. Fixes: 0512a75b98f8 ("block: Introduce REQ_OP_ZONE_APPEND") Cc: [email protected] Reported-by: Naohiro Aota <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
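As a rough illustration of the reverse-mapping arithmetic, the sketch below computes the sector to report back in the original bio. It assumes the zone size is a power of two (so a mask yields the in-zone offset) and that the original bio's sector is the start of the target zone; `remap_zone_append_sector()` and the local `sector_t` typedef are hypothetical, not DM core code.

```c
#include <stdint.h>

typedef uint64_t sector_t; /* local typedef for this sketch only */

/*
 * orig_sector:  sector of the original bio on the DM device (zone start)
 * clone_sector: sector the underlying device actually wrote to
 * zone_sectors: zone size in sectors, assumed to be a power of two
 */
static sector_t remap_zone_append_sector(sector_t orig_sector,
                                         sector_t clone_sector,
                                         sector_t zone_sectors)
{
    sector_t mask = zone_sectors - 1;

    /* Offset of the written sector inside its zone, added to the original. */
    return orig_sector + (clone_sector & mask);
}
```

For example, with 524288-sector zones, a write that landed 24 sectors into a zone on the underlying device is reported 24 sectors past the original bio's zone start, regardless of how dm-linear offsets the two devices.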
2020-06-19 | dm zoned: Fix metadata zone size check | Shin'ichiro Kawasaki | 1 | -1/+2
When dm zoned has multiple devices, metadata is on the cache device, not in random zones of the zoned devices. Then the number of metadata zones shall be checked with the number of cache zones, not random zones. Fixes: 34f5affd04c4 ("dm zoned: separate random and cache zones") Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-17 | dm ioctl: use struct_size() helper in retrieve_deps() | Gustavo A. R. Silva | 1 | -1/+1
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct dm_target_deps { ... __u64 dev[0]; /* out */ }; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
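To make the size calculation concrete, here is a small userspace sketch of what such a helper computes for a structure with a trailing array. The `struct target_deps` layout and the `target_deps_size()` function are illustrative assumptions written for this example; the kernel's `struct_size()` is a macro and is not reproduced here. The sketch saturates to `SIZE_MAX` when the multiplication or addition would overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative structure with a trailing flexible array member. */
struct target_deps {
    uint32_t count;
    uint32_t padding;
    uint64_t dev[]; /* variable number of entries follows the header */
};

/*
 * Bytes needed for the header plus `count` array entries.
 * Returns SIZE_MAX if the computation would overflow size_t.
 */
static size_t target_deps_size(size_t count)
{
    size_t base = offsetof(struct target_deps, dev);

    if (count > (SIZE_MAX - base) / sizeof(uint64_t))
        return SIZE_MAX;
    return base + count * sizeof(uint64_t);
}
```

Centralizing the calculation in one checked helper is the point: an open-coded `sizeof(*p) + n * sizeof(p->dev[0])` silently wraps for large `n`.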
2020-06-17 | dm writecache: skip writecache_wait when using pmem mode | Huaisheng Ye | 1 | -2/+4
The array bio_in_progress is only used with ssd mode. So skip writecache_wait_for_ios in writecache_discard when in pmem mode. Signed-off-by: Huaisheng Ye <[email protected]> Acked-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-17 | dm writecache: correct uncommitted_block when discarding uncommitted entry | Huaisheng Ye | 1 | -0/+2
When uncommitted entry has been discarded, correct wc->uncommitted_block for getting the exact number. Fixes: 48debafe4f2fe ("dm: add writecache target") Cc: [email protected] Signed-off-by: Huaisheng Ye <[email protected]> Acked-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-17 | dm zoned: assign max_io_len correctly | Hou Tao | 1 | -1/+1
The unit of max_io_len is sector instead of byte (spotted through code review), so fix it. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: [email protected] Signed-off-by: Hou Tao <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
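The unit mix-up is easy to see with numbers. The sketch below assumes 512-byte sectors (a shift of 9) and uses a hypothetical helper name; it does not reproduce the dm-zoned code, only the conversion a sector-unit field expects.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* assumed 512-byte sectors */

/* Convert a length in bytes to a length in 512-byte sectors. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
    return bytes >> SECTOR_SHIFT;
}
```

A 256 MiB zone is 268435456 bytes but only 524288 sectors, so storing a byte count in a field interpreted as sectors overstates the limit by a factor of 512.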
2020-06-17 | dm zoned: fix uninitialized pointer dereference | Damien Le Moal | 1 | -3/+1
Make sure that the local variable rzone in dmz_do_reclaim() is always initialized before being used for printing debug messages. Fixes: f97809aec589 ("dm zoned: per-device reclaim") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-14 | bcache: pr_info() format clean up in bcache_device_init() | Coly Li | 1 | -2/+1
scripts/checkpatch.pl reports the following warning for patch ("bcache: check and adjust logical block size for backing devices"):

  WARNING: quoted string split across lines
  #146: FILE: drivers/md/bcache/super.c:896:
  + pr_info("%s: sb/logical block size (%u) greater than page size "
  + "(%lu) falling back to device logical block size (%u)",

There are two things to fix up:
- The kernel message print should be in a single line.
- pr_info() won't automatically add a new line since v5.8, so a '\n' should be added.

This patch just does the above cleanup in bcache_device_init().

Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2020-06-14 | bcache: use delayed kworker fo asynchronous devices registration | Coly Li | 1 | -6/+8
This patch changes the asynchronous registration kworker to a delayed kworker. There is a (very small) probability that queue_work() queues the async registration kworker on the same CPU, in which case the process writing to the sysfs interface to register the bcache device may not return immediately. queue_delayed_work() in this patch delays 10 jiffies before inserting the kworker into the run queue, which makes sure the registering process always returns to user space in time. Fixes: 9e23ccf8f0a22 ("bcache: asynchronous devices registration") Signed-off-by: Coly Li <[email protected]> Cc: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-06-14 | bcache: check and adjust logical block size for backing devices | Mauricio Faria de Oliveira | 1 | -3/+19
It's possible for a block driver to set logical block size to a value greater than page size incorrectly; e.g. bcache takes the value from the superblock, set by the user w/ make-bcache.

This causes a BUG/NULL pointer dereference in the path:

  __blkdev_get()
  -> set_init_blocksize() // set i_blkbits based on ...
    -> bdev_logical_block_size()
      -> queue_logical_block_size() // ... this value
  -> bdev_disk_changed()
    ...
    -> blkdev_readpage()
      -> block_read_full_page()
        -> create_page_buffers() // size = 1 << i_blkbits
          -> create_empty_buffers() // give size/take pointer
            -> alloc_page_buffers() // return NULL
            .. BUG!

Because alloc_page_buffers() is called with size > PAGE_SIZE, it initializes head = NULL, skips the loop, and returns head; then create_empty_buffers() gets (and uses) the NULL pointer.

This has been around longer than commit ad6bf88a6c19 ("block: fix an integer overflow in logical block size"); however, that commit increased the range of values that can trigger the issue.

Previously only 8k/16k/32k (on x86/4k page size) would do it, as greater values overflow unsigned short to zero, and queue_logical_block_size() would then use the default of 512.

Now the range with unsigned int is much larger, and users w/ the 512k value, which happened to be zero'ed previously and work fine, started to hit this issue -- as the zero is gone, and queue_logical_block_size() does return 512k (>PAGE_SIZE.)

Fix this by checking the bcache device's logical block size, and if it's greater than page size, fall back to the backing/cached device's logical block size.

This doesn't affect cache devices as those are still checked for block/page size in read_super(); only the backing/cached devices are not.

Apparently it's a regression from commit 2903381fce71 ("bcache: Take data offset from the bdev superblock."), moving the check into BCACHE_SB_VERSION_CDEV only.

Now that we have superblocks of backing devices out there with this larger value, we cannot refuse to load them (i.e., have a similar check in _BDEV.)

Ideally perhaps bcache should use all values from the backing device (physical/logical/io_min block size)? But for now just fix the problematic case.

Test-case:

  # IMG=/root/disk.img
  # dd if=/dev/zero of=$IMG bs=1 count=0 seek=1G
  # DEV=$(losetup --find --show $IMG)
  # make-bcache --bdev $DEV --block 8k
  < see dmesg >

Before:

  # uname -r
  5.7.0-rc7

  [ 55.944046] BUG: kernel NULL pointer dereference, address: 0000000000000000
  ...
  [ 55.949742] CPU: 3 PID: 610 Comm: bcache-register Not tainted 5.7.0-rc7 #4
  ...
  [ 55.952281] RIP: 0010:create_empty_buffers+0x1a/0x100
  ...
  [ 55.966434] Call Trace:
  [ 55.967021] create_page_buffers+0x48/0x50
  [ 55.967834] block_read_full_page+0x49/0x380
  [ 55.972181] do_read_cache_page+0x494/0x610
  [ 55.974780] read_part_sector+0x2d/0xaa
  [ 55.975558] read_lba+0x10e/0x1e0
  [ 55.977904] efi_partition+0x120/0x5a6
  [ 55.980227] blk_add_partitions+0x161/0x390
  [ 55.982177] bdev_disk_changed+0x61/0xd0
  [ 55.982961] __blkdev_get+0x350/0x490
  [ 55.983715] __device_add_disk+0x318/0x480
  [ 55.984539] bch_cached_dev_run+0xc5/0x270
  [ 55.986010] register_bcache.cold+0x122/0x179
  [ 55.987628] kernfs_fop_write+0xbc/0x1a0
  [ 55.988416] vfs_write+0xb1/0x1a0
  [ 55.989134] ksys_write+0x5a/0xd0
  [ 55.989825] do_syscall_64+0x43/0x140
  [ 55.990563] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 55.991519] RIP: 0033:0x7f7d60ba3154
  ...

After:

  # uname -r
  5.7.0.bcachelbspgsz

  [ 31.672460] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
  [ 31.675133] bcache: register_bdev() registered backing device loop0

  # grep ^ /sys/block/bcache0/queue/*_block_size
  /sys/block/bcache0/queue/logical_block_size:512
  /sys/block/bcache0/queue/physical_block_size:8192

Reported-by: Ryan Finnie <[email protected]>
Reported-by: Sebastian Marsching <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
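Two points from this commit message lend themselves to a small userspace sketch: the 16-bit truncation that used to hide large values, and the fallback check that the fix describes. Everything here is an assumption made for illustration: the 4096-byte page size, the function names, and the use of `uint16_t` to model the old `unsigned short` field. It is not the bcache code.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed 4k pages, as on x86 */

/* Model of the old storage: the value is truncated to 16 bits. */
static unsigned int stored_as_u16(unsigned int block_size)
{
    return (uint16_t)block_size;
}

/*
 * Model of the described fix: if the superblock's block size exceeds the
 * page size, use the backing device's logical block size instead.
 */
static unsigned int pick_logical_block_size(unsigned int sb_block_size,
                                            unsigned int backing_lbs)
{
    if (sb_block_size > SKETCH_PAGE_SIZE)
        return backing_lbs;
    return sb_block_size;
}
```

With 16-bit storage, 8k, 16k and 32k survive intact (and trigger the bug), while 512k truncates to zero and was silently replaced by the 512-byte default. With the check in place, an 8192-byte superblock value on a 512-byte backing device yields 512, matching the "After" log output above.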
2020-06-14 | bcache: fix potential deadlock problem in btree_gc_coalesce | Zhiqiang Liu | 1 | -2/+6
coccicheck reports:
  drivers/md//bcache/btree.c:1538:1-7: preceding lock on line 1417

In the btree_gc_coalesce() function, if the coalescing process fails, we go to the out_nocoalesce label directly without releasing new_nodes[i]->write_lock. This then causes a deadlock when trying to acquire new_nodes[i]->write_lock to free new_nodes[i] before returning.

The btree_gc_coalesce() flow is as follows:

  if alloc new_nodes[i] fails:
      goto out_nocoalesce;
  // obtain new_nodes[i]->write_lock
  mutex_lock(&new_nodes[i]->write_lock)
  // main coalescing process
  for (i = nodes - 1; i > 0; --i)
      [snipped]
      if coalescing process fails:
          // Here, directly goto out_nocoalesce
          // tag will cause a deadlock
          goto out_nocoalesce;
      [snipped]
  // release new_nodes[i]->write_lock
  mutex_unlock(&new_nodes[i]->write_lock)
  // coalescing succeeded, return
  return;
  out_nocoalesce:
  btree_node_free(new_nodes[i]) // free new_nodes[i]
      // obtain new_nodes[i]->write_lock
      mutex_lock(&new_nodes[i]->write_lock);
      // set flag for reuse
      clear_bit(BTREE_NODE_dirty, &new_nodes[i]->flags);
      // release new_nodes[i]->write_lock
      mutex_unlock(&new_nodes[i]->write_lock);

To fix the problem, we add a new label 'out_unlock_nocoalesce' that releases new_nodes[i]->write_lock before the out_nocoalesce label. If the coalescing process fails, we go to out_unlock_nocoalesce to release new_nodes[i]->write_lock before freeing new_nodes[i] under out_nocoalesce.

(Coly Li helps to clean up commit log format.)

Fixes: 2a285686c109816 ("bcache: btree locking rework")
Signed-off-by: Zhiqiang Liu <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
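The two-label error path can be sketched in userspace C. This is a simplified model under loud assumptions: `struct fake_lock` is a counter standing in for a mutex (so a double acquire is observable instead of hanging), there is a single node rather than an array, and the early-failure path frees the node only to keep the example short. None of these names come from bcache.

```c
#include <stdbool.h>

/* Counter standing in for a mutex so a double acquire can be observed. */
struct fake_lock {
    int held;     /* current nesting depth; a real mutex allows at most 1 */
    int max_held; /* highest depth ever seen */
};

static void fake_lock_acquire(struct fake_lock *l)
{
    l->held++;
    if (l->held > l->max_held)
        l->max_held = l->held;
}

static void fake_lock_release(struct fake_lock *l)
{
    l->held--;
}

struct node {
    struct fake_lock write_lock;
    bool dirty;
    bool freed;
};

/* Takes the node's lock itself, like the free path in the commit message. */
static void node_free(struct node *n)
{
    fake_lock_acquire(&n->write_lock);
    n->dirty = false;
    n->freed = true;
    fake_lock_release(&n->write_lock);
}

/* Returns 0 on success, -1 on either failure. */
static int coalesce(struct node *new_node, bool alloc_failed, bool coalesce_failed)
{
    if (alloc_failed)
        goto out_nocoalesce; /* lock not taken yet */

    fake_lock_acquire(&new_node->write_lock);
    if (coalesce_failed)
        goto out_unlock_nocoalesce; /* lock is held: release it first */
    fake_lock_release(&new_node->write_lock);
    return 0;

out_unlock_nocoalesce:
    fake_lock_release(&new_node->write_lock);
out_nocoalesce:
    node_free(new_node);
    return -1;
}
```

Because the unlock label falls through into the free label, each failure point jumps to the label matching the locks it holds, and `max_held` never exceeds 1 on any path.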
2020-06-14 | treewide: replace '---help---' in Kconfig files with 'help' | Masahiro Yamada | 2 | -42/+42
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines, I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---' (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the following command:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <[email protected]>
2020-06-05 | Merge tag 'for-5.8/dm-changes' of ↵ | Linus Torvalds | 25 | -636/+2650
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- The largest change for this cycle is the DM zoned target's metadata version 2 feature that adds support for pairing regular block devices with a zoned device to ease the performance impact associated with finite random zones of zoned device. The changes came in three batches: the first prepared for and then added the ability to pair a single regular block device, the second was a batch of fixes to improve zoned's reclaim heuristic, and the third removed the limitation of only adding a single additional regular block device to allow many devices. Testing has shown linear scaling as more devices are added.
- Add new emulated block size (ebs) target that emulates a smaller logical_block_size than a block device supports. The primary use-case is to emulate "512e" devices that have 512 byte logical_block_size and 4KB physical_block_size. This is useful to some legacy applications that otherwise wouldn't be able to be used on 4K devices because they depend on issuing IO in 512 byte granularity.
- Add discard interfaces to DM bufio. First consumer of the interface is the dm-ebs target that makes heavy use of dm-bufio.
- Fix DM crypt's block queue_limits stacking to not truncate logical_block_size.
- Add Documentation for DM integrity's status line.
- Switch DMDEBUG from a compile time config option to instead use dynamic debug via pr_debug.
- Fix DM multipath target's heuristic for how it manages "queue_if_no_path" state internally. DM multipath now avoids disabling "queue_if_no_path" unless it is actually needed (e.g. in response to configure timeout or explicit "fail_if_no_path" message). This fixes reports of spurious -EIO being reported back to userspace application during fault tolerance testing with an NVMe backend. Added various dynamic DMDEBUG messages to assist with debugging queue_if_no_path in the future.
- Add a new DM multipath "Historical Service Time" Path Selector.
- Fix DM multipath's dm_blk_ioctl() to switch paths on IO error.
- Improve DM writecache target performance by using explicit cache flushing for target's single-threaded usecase and a small cleanup to remove unnecessary test in persistent_memory_claim.
- Other small cleanups in DM core, dm-persistent-data, and DM integrity.

* tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
  dm crypt: avoid truncating the logical block size
  dm mpath: add DM device name to Failing/Reinstating path log messages
  dm mpath: enhance queue_if_no_path debugging
  dm mpath: restrict queue_if_no_path state machine
  dm mpath: simplify __must_push_back
  dm zoned: check superblock location
  dm zoned: prefer full zones for reclaim
  dm zoned: select reclaim zone based on device index
  dm zoned: allocate zone by device index
  dm zoned: support arbitrary number of devices
  dm zoned: move random and sequential zones into struct dmz_dev
  dm zoned: per-device reclaim
  dm zoned: add metadata pointer to struct dmz_dev
  dm zoned: add device pointer to struct dm_zone
  dm zoned: allocate temporary superblock for tertiary devices
  dm zoned: convert to xarray
  dm zoned: add a 'reserved' zone flag
  dm zoned: improve logging messages for reclaim
  dm zoned: avoid unnecessary device recalulation for secondary superblock
  dm zoned: add debugging message for reading superblocks
  ...
2020-06-05 | dm crypt: avoid truncating the logical block size | Eric Biggers | 1 | -1/+1
queue_limits::logical_block_size got changed from unsigned short to unsigned int, but it was forgotten to update crypt_io_hints() to use the new type. Fix it. Fixes: ad6bf88a6c19 ("block: fix an integer overflow in logical block size") Cc: [email protected] Signed-off-by: Eric Biggers <[email protected]> Reviewed-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm mpath: add DM device name to Failing/Reinstating path log messages | Mike Snitzer | 1 | -2/+6
When there are many DM multipath devices it really helps to have additional context for which DM device a failed or reinstated path is part of. Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm mpath: enhance queue_if_no_path debugging | Mike Snitzer | 1 | -7/+23
Add more DMDEBUG that shows arguments passed and caller, and another that shows state of related flags at end of queue_if_no_path(). Also add queue_if_no_path DMDEBUG to multipath_resume(). Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm mpath: restrict queue_if_no_path state machine | Mike Snitzer | 1 | -10/+28
Do not allow saving disabled queue_if_no_path if already saved as enabled; implies multiple suspends (which shouldn't ever happen). Log if this unlikely scenario is ever triggered. Also, only write MPATHF_SAVED_QUEUE_IF_NO_PATH during presuspend or if "fail_if_no_path" message. MPATHF_SAVED_QUEUE_IF_NO_PATH is no longer always modified, e.g.: even if queue_if_no_path()'s save_old_value argument wasn't set. This just implies a bit tighter control over the management of MPATHF_SAVED_QUEUE_IF_NO_PATH. Side-effect is multipath_resume() doesn't reset MPATHF_QUEUE_IF_NO_PATH unless MPATHF_SAVED_QUEUE_IF_NO_PATH was set (during presuspend); and at that time the MPATHF_SAVED_QUEUE_IF_NO_PATH bit gets cleared. So MPATHF_SAVED_QUEUE_IF_NO_PATH's use is much more narrow in scope. Last, but not least, do _not_ disable queue_if_no_path during noflush suspend. There is no need/benefit to saving off queue_if_no_path via MPATHF_SAVED_QUEUE_IF_NO_PATH and clearing MPATHF_QUEUE_IF_NO_PATH for noflush suspend -- by avoiding this needless queue_if_no_path flag churn there is less potential for MPATHF_QUEUE_IF_NO_PATH to get lost. Which avoids potential for IOs to be errored back up to userspace during DM multipath's handling of path failures. That said, this last change papers over a reported issue concerning request-based dm-multipath's interaction with blk-mq, relative to suspend and resume: multipath_endio is being called _before_ multipath_resume. This should never happen if DM suspend's blk_mq_quiesce_queue() + dm_wait_for_completion() is genuinely waiting for all inflight blk-mq requests to complete. Similarly: drivers/md/dm.c:__dm_resume() clearly calls dm_table_resume_targets() _before_ dm_start_queue()'s blk_mq_unquiesce_queue() is called. If the queue isn't even restarted until after multipath_resume(); the BIG question that still needs answering is: how can multipath_end_io beat multipath_resume in a race!? Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm mpath: simplify __must_push_back | Mike Snitzer | 1 | -23/+5
Remove micro-optimization that infers device is between presuspend and resume (was done purely to avoid call to dm_noflush_suspending, which isn't expensive anyway). Remove flags argument since they are no longer checked. And remove must_push_back_bio() since it was simply a call to __must_push_back(). Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: check superblock location | Hannes Reinecke | 1 | -1/+9
When specifying several devices the superblock location must be checked to ensure the devices are specified in the correct order. Signed-off-by: Hannes Reinecke <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: prefer full zones for reclaim | Hannes Reinecke | 1 | -1/+8
Prefer full zones when selecting the next zone for reclaim. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: select reclaim zone based on device index | Hannes Reinecke | 4 | -32/+27
per-device reclaim should select zones on that device only. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: allocate zone by device index | Hannes Reinecke | 3 | -8/+15
When allocating a zone, pass in an indicator on which device the zone should be allocated; this increases performance for a multi-device setup because reclaim will now allocate zones on the device for which reclaim is running. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: support arbitrary number of devices | Hannes Reinecke | 2 | -45/+74
Remove the hard-coded limit of two devices and support an unlimited number of additional zoned devices. Signed-off-by: Hannes Reinecke <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: move random and sequential zones into struct dmz_dev | Hannes Reinecke | 4 | -78/+119
Random and sequential zones should be part of the respective device structure to make arbitration between devices possible. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: per-device reclaim | Hannes Reinecke | 3 | -57/+88
Instead of having one reclaim workqueue for the entire set we should be allocating a reclaim workqueue per device; doing so will reduce contention and should boost performance for a multi-device setup. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: add metadata pointer to struct dmz_dev | Hannes Reinecke | 2 | -8/+13
Add a metadata pointer within struct dmz_dev and use it as argument for blkdev_report_zones() instead of the metadata itself. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: add device pointer to struct dm_zone | Hannes Reinecke | 4 | -39/+19
Add a pointer, to the containing device, within struct dm_zone and kill dmz_zone_to_dev(). Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: allocate temporary superblock for tertiary devices | Hannes Reinecke | 1 | -48/+61
Checking the tertiary superblock just consists of validating UUIDs, crcs, and the generation number; it doesn't have contents which would be required during the actual operation. So allocate a temporary superblock when checking tertiary devices to avoid having to store it together with the 'real' superblocks. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: convert to xarray | Hannes Reinecke | 1 | -32/+90
The zones array is getting really large, and large arrays tend to wreak havoc with the CPU caches. So convert it to xarray to become more cache friendly. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Colin Ian King <[email protected]> # fix leak in dmz_insert Signed-off-by: Mike Snitzer <[email protected]>
2020-06-05 | dm zoned: add a 'reserved' zone flag | Hannes Reinecke | 2 | -2/+4
Instead of counting the number of reserved zones in dmz_free_zone(), mark the zone as 'reserved' during allocation and simplify dmz_free_zone(). Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
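The idea of replacing a count comparison with a per-zone flag can be sketched as follows. This is a hypothetical model: the flag name and bit position, the `struct zone_pool` counters, and both helper functions are invented for illustration and do not mirror the dm-zoned metadata structures.

```c
#include <stdbool.h>

#define ZONE_FLAG_RESERVED (1u << 0) /* hypothetical flag bit */

struct rzone {
    unsigned int flags;
};

struct zone_pool {
    unsigned int nr_free;          /* ordinary free zones */
    unsigned int nr_reserved_free; /* free zones set aside for reclaim */
};

/* Mark a zone as belonging to the reserved set when it is set aside. */
static void zone_mark_reserved(struct rzone *z)
{
    z->flags |= ZONE_FLAG_RESERVED;
}

/* On free, the flag alone decides which pool the zone goes back to. */
static void zone_free(struct zone_pool *p, const struct rzone *z)
{
    if (z->flags & ZONE_FLAG_RESERVED)
        p->nr_reserved_free++;
    else
        p->nr_free++;
}
```

With the flag, the free path no longer has to compare the current reserved count against a target to guess where a zone belongs.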