blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2022-07-28	dm: Fix PR release handling for non All Registrants	Mike Christie	1	-14/+34
	This commit fixes a bug where we are leaving the reservation in place even though pr_release has run and returned success. If we have a Write Exclusive, Exclusive Access, or Write/Exclusive Registrants only reservation, the release must be sent down the path that is the reservation holder. The problem is multipath_prepare_ioctl most likely selected path N for the reservation, then later when we do the release multipath_prepare_ioctl will select a completely different path. The device will then return success becuase the nvme and scsi specs say to return success if there is no reservation or if the release is sent down from a path that is not the holder. We then think we have released the reservation. This commit has us loop over each path and send a release so we can make sure the release is executed on the correct path. It has been tested with windows failover clustering's validation test which checks this case, and it has been tested manually (the libiscsi PGR tests don't have a test case for this yet, but I will be adding one). Signed-off-by: Mike Christie <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-28	dm: Start pr_reserve from the same starting path	Mike Christie	1	-14/+32
	When an app does a pr_reserve it will go to whatever path we happen to be using at the time. This can result in errors when the app does a second pr_reserve call and expects success but gets a failure because the reserve is not done on the holder's path. This commit has us always start trying to do reserves from the first path in the first group. Windows failover clustering will produce the type of pattern above. With this commit, we will now pass its validation test for this case. Signed-off-by: Mike Christie <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-28	dm: Allow dm_call_pr to be used for path searches	Mike Christie	1	-12/+38
	The specs state that if you send a reserve down a path that is already the holder success must be returned and if it goes down a path that is not the holder reservation conflict must be returned. Windows failover clustering will send a second reservation and expects that a device returns success. The problem for multipathing is that for an All Registrants reservation, we can send the reserve down any path but for all other reservation types there is one path that is the holder. To handle this we could add PR state to dm but that can get nasty. Look at target_core_pr.c for an example of the type of things we'd have to track. It will also get more complicated because other initiators can change the state so we will have to add in async event/sense handling. This commit, and the 3 commits that follow, tries to keep dm simple and keep just doing passthrough. This commit modifies dm_call_pr to be able to find the first usable path that can execute our pr_op then return. When dm_pr_reserve is converted to dm_call_pr in the next commit for the normal case we will use the same path for every reserve. Signed-off-by: Mike Christie <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-28	dm: return early from dm_pr_call() if DM device is suspended	Mike Snitzer	1	-0/+5
	Otherwise PR ops may be issued while the broader DM device is being reconfigured, etc. Fixes: 9c72bad1f31a ("dm: call PR reserve/unreserve on each underlying device") Signed-off-by: Mike Snitzer <[email protected]>
2022-07-15	dm thin: fix use-after-free crash in dm_sm_register_threshold_callback	Luo Meng	2	-3/+8
	Fault inject on pool metadata device reports: BUG: KASAN: use-after-free in dm_pool_register_metadata_threshold+0x40/0x80 Read of size 8 at addr ffff8881b9d50068 by task dmsetup/950 CPU: 7 PID: 950 Comm: dmsetup Tainted: G W 5.19.0-rc6 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_address_description.constprop.0.cold+0xeb/0x3f4 kasan_report.cold+0xe6/0x147 dm_pool_register_metadata_threshold+0x40/0x80 pool_ctr+0xa0a/0x1150 dm_table_add_target+0x2c8/0x640 table_load+0x1fd/0x430 ctl_ioctl+0x2c4/0x5a0 dm_ctl_ioctl+0xa/0x10 __x64_sys_ioctl+0xb3/0xd0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This can be easily reproduced using: echo offline > /sys/block/sda/device/state dd if=/dev/zero of=/dev/mapper/thin bs=4k count=10 dmsetup load pool --table "0 20971520 thin-pool /dev/sda /dev/sdb 128 0 0" If a metadata commit fails, the transaction will be aborted and the metadata space maps will be destroyed. If a DM table reload then happens for this failed thin-pool, a use-after-free will occur in dm_sm_register_threshold_callback (called from dm_pool_register_metadata_threshold). Fix this by in dm_pool_register_metadata_threshold() by returning the -EINVAL error if the thin-pool is in fail mode. Also fail pool_ctr() with a new error message: "Error registering metadata threshold". Fixes: ac8c3f3df65e4 ("dm thin: generate event when metadata threshold passed") Cc: [email protected] Reported-by: Hulk Robot <[email protected]> Signed-off-by: Luo Meng <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-14	dm writecache: count number of blocks discarded, not number of discard bios	Mikulas Patocka	2	-2/+2
	Change dm-writecache, so that it counts the number of blocks discarded instead of the number of discard bios. Make it consistent with the read and write statistics counters that were changed to count the number of blocks instead of bios. Fixes: e3a35d03407c ("dm writecache: add event counters") Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-14	dm writecache: count number of blocks written, not number of write bios	Mikulas Patocka	2	-8/+14
	Change dm-writecache, so that it counts the number of blocks written instead of the number of write bios. Bios can be split and requeued using the dm_accept_partial_bio function, so counting bios caused inaccurate results. Fixes: e3a35d03407c ("dm writecache: add event counters") Reported-by: Yu Kuai <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-14	dm writecache: count number of blocks read, not number of read bios	Mikulas Patocka	2	-2/+3
	Change dm-writecache, so that it counts the number of blocks read instead of the number of read bios. Bios can be split and requeued using the dm_accept_partial_bio function, so counting bios caused inaccurate results. Fixes: e3a35d03407c ("dm writecache: add event counters") Reported-by: Yu Kuai <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-14	dm writecache: return void from functions	Mikulas Patocka	1	-13/+13
	The functions writecache_map_remap_origin and writecache_bio_copy_ssd only return a single value, thus they can be made to return void. This helps simplify the following IO accounting changes. Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-14	dm kcopyd: use __GFP_HIGHMEM when allocating pages	Mikulas Patocka	1	-1/+1
	dm-kcopyd doesn't access the allocated pages directly, it only passes them to dm-io which adds them to a bio list - thus, we can allocate the pages from high memory. This will reduce pressure on the low memory when there are a large number of kcopyd jobs in progress. Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-14	dm writecache: set a default MAX_WRITEBACK_JOBS	Mikulas Patocka	1	-1/+1
	dm-writecache has the capability to limit the number of writeback jobs in progress. However, this feature was off by default. As such there were some out-of-memory crashes observed when lowering the low watermark while the cache is full. This commit enables writeback limit by default. It is set to 256MiB or 1/16 of total system memory, whichever is smaller. Cc: [email protected] Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	Documentation: dm writecache: Render status list as list	Bagas Sanjaya	1	-0/+1
	The status list isn't rendered as list, but rather as normal paragraph, because there is missing blank line between "Status:" line and the list. Fix the issue by adding the blank line separator. Fixes: 48debafe4f2fea ("dm: add writecache target") Signed-off-by: Bagas Sanjaya <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	Documentation: dm writecache: add blank line before optional parameters	Mauro Carvalho Chehab	1	-0/+1
	Otherwise this warning occurs: Documentation/admin-guide/device-mapper/writecache.rst:23: WARNING: Unexpected indentation. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	dm snapshot: fix typo in snapshot_map() comment	Zhang Jiaming	1	-1/+1
	Signed-off-by: Zhang Jiaming <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	dm raid: remove redundant "the" in parse_raid_params() comment	Jiang Jian	1	-1/+1
	Signed-off-by: Jiang Jian <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	dm cache: fix typo in 2 comment blocks	Steven Lung	2	-2/+2
	Replace neccessarily with necessarily. Signed-off-by: Steven Lung <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	dm verity: fix checkpatch close brace error	JeongHyeon Lee	1	-4/+3
	Resolves: ERROR: else should follow close brace '}' Signed-off-by: JeongHyeon Lee <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	dm table: rename dm_target variable in dm_table_add_target()	Mike Snitzer	1	-28/+28
	Rename from "tgt" to "ti" so that all of dm-table.c code uses the same naming for dm_target variables. Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	dm table: audit all dm_table_get_target() callers	Mike Snitzer	6	-143/+97
	All callers of dm_table_get_target() are expected to do proper bounds checking on the index they pass. Move dm_table_get_target() to dm-core.h to make it extra clear that only DM core code should be using it. Switch it to be inlined while at it. Standardize all DM core callers to use the same for loop pattern and make associated variables as local as possible. Rename some variables (e.g. s/table/t/ and s/tgt/ti/) along the way. Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	dm table: remove dm_table_get_num_targets() wrapper	Mike Snitzer	6	-29/+23
	More efficient and readable to just access table->num_targets directly. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	dm: add two stage requeue mechanism	Ming Lei	3	-31/+130
	Commit 61b6e2e5321d ("dm: fix BLK_STS_DM_REQUEUE handling when dm_io represents split bio") reverted DM core's bio splitting back to using bio_split()+bio_chain() because it was found that otherwise DM's BLK_STS_DM_REQUEUE would trigger a live-lock waiting for bio completion that would never occur. Restore using bio_trim()+bio_inc_remaining(), like was done in commit 7dd76d1feec7 ("dm: improve bio splitting and associated IO accounting"), but this time with proper handling for the above scenario that is covered in more detail in the commit header for 61b6e2e5321d. Solve this issue by adding a two staged dm_io requeue mechanism that uses the new dm_bio_rewind() via dm_io_rewind(): 1) requeue the dm_io into the requeue_list added to struct mapped_device, and schedule it via new added requeue work. This workqueue just clones the dm_io->orig_bio (which DM saves and ensures its end sector isn't modified). dm_io_rewind() uses the sectors and sectors_offset members of the dm_io that are recorded relative to the end of orig_bio: dm_bio_rewind()+bio_trim() are then used to make that cloned bio reflect the subset of the original bio that is represented by the dm_io that is being requeued. 2) the 2nd stage requeue is same with original requeue, but io->orig_bio points to new cloned bio (which matches the requeued dm_io as described above). This allows DM core to shift the need for bio cloning from bio-split time (during IO submission) to the less likely BLK_STS_DM_REQUEUE handling (after IO completes with that error). Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-07-07	dm: add dm_bio_rewind() API to DM core	Ming Lei	3	-1/+146
	Commit 7759eb23fd98 ("block: remove bio_rewind_iter()") removed a similar API for the following reasons: ``` It is pointed that bio_rewind_iter() is one very bad API[1]: 1) bio size may not be restored after rewinding 2) it causes some bogus change, such as 5151842b9d8732 (block: reset bi_iter.bi_done after splitting bio) 3) rewinding really makes things complicated wrt. bio splitting 4) unnecessary updating of .bi_done in fast path [1] https://marc.info/?t=153549924200005&r=1&w=2 So this patch takes Kent's suggestion to restore one bio into its original state via saving bio iterator(struct bvec_iter) in bio_integrity_prep(), given now bio_rewind_iter() is only used by bio integrity code. ``` However, saving off a copy of the 32 bytes bio->bi_iter in case rewind needed isn't efficient because it bloats per-bio-data for what is an unlikely case. That suggestion also ignores the need to restore crypto and integrity info. Add dm_bio_rewind() API for a specific use-case that is much more narrow than the previous more generic rewind code that was reverted: 1) most bios have a fixed end sector since bio split is done from front of the bio, if driver just records how many sectors between current bio's start sector and the original bio's end sector, the original position can be restored. Keeping the original bio's end sector fixed is a _hard_ requirement for this interface! 2) if a bio's end sector won't change (usually bio_trim() isn't called, or in the case of DM it preserves original bio), user can restore the original position by storing sector offset from the current ->bi_iter.bi_sector to bio's end sector; together with saving bio size, only 8 bytes is needed to restore to original bio. 3) DM's requeue use case: when BLK_STS_DM_REQUEUE happens, DM core needs to restore to an "original bio" which represents the current dm_io to be requeued (which may be a subset of the original bio). By storing the sector offset from the original bio's end sector and dm_io's size, dm_bio_rewind() can restore such original bio. See commit 7dd76d1feec7 ("dm: improve bio splitting and associated IO accounting") for more details on how DM does this. Leveraging this, allows DM core to shift the need for bio cloning from bio-split time (during IO submission) to the less likely BLK_STS_DM_REQUEUE handling (after IO completes with that error). 4) Unlike the original rewind API, dm_bio_rewind() doesn't add .bi_done to bvec_iter and there is no effect on the fast path. Implement dm_bio_rewind() by factoring out clear helpers that it calls: dm_bio_integrity_rewind, dm_bio_crypt_rewind and dm_bio_rewind_iter. DM is able to ensure that dm_bio_rewind() is used safely but, given the constraint that the bio's end must never change, other hypothetical future callers may not take the same care. So make dm_bio_rewind() and all supporting code local to DM to avoid risk of hypothetical abuse. A "dm_" prefix was added to all functions to avoid any namespace collisions. Suggested-by: Jens Axboe <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-06-29	dm: improve BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling	Ming Lei	1	-25/+45
	If either BLK_STS_DM_REQUEUE or BLK_STS_AGAIN is returned for POLLED io, we requeue the original bio into deferred list and kick md->wq to re-submit it to block layer. Improve the handling in the following way: 1) Factor out dm_handle_requeue() for handling dm_io requeue. 2) Unify handling for BLK_STS_DM_REQUEUE and BLK_STS_AGAIN: clear REQ_POLLED for BLK_STS_DM_REQUEUE too, for the sake of simplicity, given BLK_STS_DM_REQUEUE is very unusual. 3) Queue md->wq explicitly in dm_handle_requeue(), so requeue handling becomes more robust. Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-06-29	dm: refactor dm_md_mempool allocation	Christoph Hellwig	4	-71/+44
	The current split between dm_table_alloc_md_mempools and dm_alloc_md_mempools is rather arbitrary, so merge the two into one easy to follow function. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-06-29	dm: unexport dm_get_reserved_rq_based_ios	Christoph Hellwig	1	-1/+0
	dm_get_reserved_rq_based_ios is only used in the core dm code, so remove the export. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2022-06-29	block: simplify disk_set_independent_access_ranges	Christoph Hellwig	3	-44/+18
	Lift setting disk->ia_ranges from disk_register_independent_access_ranges into disk_set_independent_access_ranges, and make the behavior the same for the registered vs non-registered queue cases. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-29	block: move ->ia_ranges from the request_queue to the gendisk	Christoph Hellwig	2	-15/+15
	Independent access ranges only matter for file system I/O and are only valid with a registered gendisk, so move them there. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-29	block: remove "select BLK_RQ_IO_DATA_LEN" from BLK_CGROUP_IOCOST dependency	Ying Sun	1	-1/+0
	The configuration item BLK_RQ_IO_DATA_LEN is not declared in the kernel. Select BLK_RQ_IO_DATA_LEN is meaningless which could be removed. Signed-off-by: Ying Sun <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	blk-mq: cleanup disk sysfs registration	Christoph Hellwig	4	-26/+26
	Pass a gendisk to the sysfs register/unregister functions and give them descriptive names. Also move the unregistration helper next to the one doing the registration. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	blk-mq: rename blk_mq_sysfs_{,un}register	Christoph Hellwig	3	-6/+6
	Add a _hctx postfix to better describe what the functions do, match the debugfs equivalents and release the old names for functions that should be using them. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	block: remove the extra gendisk reference in __blk_mq_register_dev	Christoph Hellwig	1	-3/+1
	kobject_add already grabs a reference to the parent, no need to have another one. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	block: use default groups to register the queue attributes	Christoph Hellwig	1	-6/+6
	Set up the default_groups for blk_queue_ktype instead of manually calling sysfs_create_group. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	block: remove a superflous queue kobject reference	Christoph Hellwig	1	-5/+1
	kobject_add already adds a reference to the parent that is dropped on deletion, so don't bother grabbing another one. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	block: simplify blktrace sysfs attribute creation	Christoph Hellwig	6	-32/+6
	Add the trace attributes to the default gendisk attributes, just like we already do for partitions. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	block: remove blk_cleanup_disk	Christoph Hellwig	47	-90/+74
	blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	block: simplify disk shutdown	Christoph Hellwig	34	-113/+105
	Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	block: stop setting the nomerges flags in blk_cleanup_queue	Christoph Hellwig	1	-3/+0
	These flags only apply to file system I/O, and all file system I/O is already drained by del_gendisk and thus can't be in progress when blk_cleanup_queue is called. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	block: remove QUEUE_FLAG_DEAD	Christoph Hellwig	3	-10/+3
	Disallow setting the blk-mq state on any queue that is already dying as setting the state even then is a bad idea, and remove the now unused QUEUE_FLAG_DEAD flag. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	mtip32xx: fix device removal	Christoph Hellwig	2	-114/+44
	Use the proper helper to mark a surpise removal, remove the gendisk as soon as possible when removing the device and implement the ->free_disk callback to ensure the private data is alive as long as the gendisk has references. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-28	mtip32xx: remove the device_status debugfs file	Christoph Hellwig	2	-144/+1
	This file is a huge mess that iterates over all devices and is in the way of fixing the device removal in this driver, so remove it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	blk-mq: blk_mq_tag_busy is no need to return a value	Liu Song	2	-17/+11
	Currently "blk_mq_tag_busy" return value has no effect, so adjust it. Some code implementations have also been adjusted to enhance readability. Signed-off-by: Liu Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	block: Always initialize bio IO priority on submit	Jan Kara	1	-0/+3
	Currently, IO priority set in task's IO context is not reflected in the bio->bi_ioprio for most IO (only io_uring and direct IO set it). This results in odd results where process is submitting some bios with one priority and other bios with a different (unset) priority and due to differing priorities bios cannot be merged. Make sure bio->bi_ioprio is always set on bio submission. Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	block: Initialize bio priority earlier	Jan Kara	1	-2/+2
	Bio's IO priority needs to be initialized before we try to merge the bio with other bios. Otherwise we could merge bios which would otherwise receive different IO priorities leading to possible QoS issues. Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	blk-ioprio: Convert from rqos policy to direct call	Jan Kara	4	-45/+23
	Convert blk-ioprio handling from a rqos policy to a direct call from blk_mq_submit_bio(). Firstly, blk-ioprio is not much of a rqos policy anyway, it just needs a hook in bio submission path to set the bio's IO priority. Secondly, the rqos .track hook gets actually called too late for blk-ioprio purposes and introducing a special rqos hook just for blk-ioprio looks even weirder. Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	blk-ioprio: Remove unneeded field	Jan Kara	1	-6/+3
	blkcg->ioprio_set field is not really useful except for avoiding possibly more expensive checks inside blkcg_ioprio_track(). The check for blkcg->prio_policy being equal to POLICY_NO_CHANGE does the same service so just remove the ioprio_set field and replace the check. Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	block: Fix handling of tasks without ioprio in ioprio_get(2)	Jan Kara	1	-7/+23
	ioprio_get(2) can be asked to return the best IO priority from several tasks (IOPRIO_WHO_PGRP, IOPRIO_WHO_USER). Currently the call treats tasks without set IO priority as having priority IOPRIO_CLASS_BE/IOPRIO_BE_NORM however this does not really reflect the IO priority the task will get (which depends on task's nice value). Fix the code to use the real IO priority task's IO will use. We have to modify code for ioprio_get(IOPRIO_WHO_PROCESS) to keep returning IOPRIO_CLASS_NONE priority for tasks without set IO priority as a special case to maintain userspace visible API. Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	block: Make ioprio_best() static	Jan Kara	2	-6/+1
	Nobody outside of block/ioprio.c uses it. Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	block: Generalize get_current_ioprio() for any task	Jan Kara	2	-16/+36
	get_current_ioprio() operates only on current task. We will need the same functionality for other tasks as well. Generalize get_current_ioprio() for that and also move the bulk out of the header file because it is large enough. Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	block: Return effective IO priority from get_current_ioprio()	Jan Kara	1	-2/+9
	get_current_ioprio() is used to initialize IO priority of various requests. As such it should be returning the effective IO priority of the task (i.e., reflecting the fact that unset IO priority should get set based on task's CPU priority) so that the conversion is concentrated in one place. Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-06-27	block: fix default IO priority handling again	Jan Kara	3	-3/+5
	Commit e70344c05995 ("block: fix default IO priority handling") introduced an inconsistency in get_current_ioprio() that tasks without IO context return IOPRIO_DEFAULT priority while tasks with freshly allocated IO context will return 0 (IOPRIO_CLASS_NONE/0) IO priority. Tasks without IO context used to be rare before 5a9d041ba2f6 ("block: move io_context creation into where it's needed") but after this commit they became common because now only BFQ IO scheduler setups task's IO context. Similar inconsistency is there for get_task_ioprio() so this inconsistency is now exposed to userspace and userspace will see different IO priority for tasks operating on devices with BFQ compared to devices without BFQ. Furthemore the changes done by commit e70344c05995 change the behavior when no IO priority is set for BFQ IO scheduler which is also documented in ioprio_set(2) manpage: "If no I/O scheduler has been set for a thread, then by default the I/O priority will follow the CPU nice value (setpriority(2)). In Linux kernels before version 2.6.24, once an I/O priority had been set using ioprio_set(), there was no way to reset the I/O scheduling behavior to the default. Since Linux 2.6.24, specifying ioprio as 0 can be used to reset to the default I/O scheduling behavior." So make sure we default to IOPRIO_CLASS_NONE as used to be the case before commit e70344c05995. Also cleanup alloc_io_context() to explicitely set this IO priority for the allocated IO context to avoid future surprises. Note that we tweak ioprio_best() to maintain ioprio_get(2) behavior and make this commit easily backportable. CC: [email protected] Fixes: e70344c05995 ("block: fix default IO priority handling") Reviewed-by: Damien Le Moal <[email protected]> Tested-by: Damien Le Moal <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>