aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-11-01block, bfq: remove dead code for updating 'rq_in_driver'Yu Kuai1-16/+0
Such code are not even compiled since they are inside marco "#if 0". Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block, bfq: cleanup bfq_activate_requeue_entity()Yu Kuai1-9/+5
Just make the code a litter cleaner by removing the unnecessary variable 'sd'. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block, bfq: factor out code to update 'active_entities'Yu Kuai1-29/+32
Current code is a bit ugly and hard to read. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block, bfq: remove set but not used variable in __bfq_entity_update_weight_prioYu Kuai1-15/+0
After the patch "block, bfq: cleanup bfq_weights_tree add/remove apis"), the local variable 'bfqd' is not used anymore, thus remove it. Signed-off-by: Yu Kuai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Fixes: afdba1461262 ("block, bfq: cleanup bfq_weights_tree add/remove apis") Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block: Replace struct rq_depth with unsigned int in struct iolatency_grpKemeng Shi1-15/+13
We only need a max queue depth for every iolatency to limit the inflight io number. Replace struct rq_depth with unsigned int to simplfy "struct iolatency_grp" and save memory. Signed-off-by: Kemeng Shi <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block: Correct comment for scale_cookie_changeKemeng Shi1-2/+4
Default queue depth of iolatency_grp is unlimited, so we scale down quickly(once by half) in scale_cookie_change. Remove the "subtract 1/16th" part which is not the truth and add the actual way we scale down. Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block: Remove redundant parent blkcg_gp check in check_scale_changeKemeng Shi1-3/+0
Function blkcg_iolatency_throttle will make sure blkg->parent is not NULL before calls check_scale_change. And function check_scale_change is only called in blkcg_iolatency_throttle. Signed-off-by: Kemeng Shi <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block: split elevator_switchChristoph Hellwig4-47/+40
Split an elevator_disable helper from elevator_switch for the case where we want to switch to no scheduler at all. This includes removing the pointless elevator_switch_mq helper and removing the switch to no schedule logic from blk_mq_init_sched. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block: don't check for required features in elevator_matchChristoph Hellwig1-35/+16
Checking for the required features in the callers simplifies the code quite a bit, so do that. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] [axboe: adjust for dropping patch 1, use __elevator_find()] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block: simplify the check for the current elevator in elv_iosched_showChristoph Hellwig1-1/+1
Just compare the pointers instead of using the string based elevator_match. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block: cleanup the variable naming in elv_iosched_storeChristoph Hellwig1-9/+8
Use eq for the elevator_queue as done elsewhere. This frees e to be used for the loop iterator instead of the odd __ prefix. In addition rename elv to cur to make it more clear it is the currently selected elevator. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block: exit elv_iosched_show early when I/O schedulers are not supportedChristoph Hellwig1-3/+2
If the tag_set has BLK_MQ_F_NO_SCHED flag set we will never show any scheduler, so exit early. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block: cleanup elevator_getChristoph Hellwig1-15/+10
Do the request_module and repeated lookup in the only caller that cares, pick a saner name that explains where are actually doing a lookup and use a sane calling conventions that passes the queue first. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block, bfq: cleanup __bfq_weights_tree_remove()Yu Kuai3-12/+2
It's the same with bfq_weights_tree_remove() now. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block, bfq: cleanup bfq_weights_tree add/remove apisYu Kuai3-29/+18
The 'bfq_data' and 'rb_root_cached' can both be accessed through 'bfq_queue', thus only pass 'bfq_queue' as parameter. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block, bfq: do not idle if only one group is activatedYu Kuai1-2/+2
Now that root group is counted into 'num_groups_with_pending_reqs', 'num_groups_with_pending_reqs > 0' is always true in bfq_asymmetric_scenario(). Thus change the condition to '> 1'. On the other hand, this change can enable concurrent sync io if only one group is activated. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block, bfq: refactor the counting of 'num_groups_with_pending_reqs'Yu Kuai3-66/+17
Currently, bfq can't handle sync io concurrently as long as they are not issued from root group. This is because 'bfqd->num_groups_with_pending_reqs > 0' is always true in bfq_asymmetric_scenario(). The way that bfqg is counted into 'num_groups_with_pending_reqs': Before this patch: 1) root group will never be counted. 2) Count if bfqg or it's child bfqgs have pending requests. 3) Don't count if bfqg and it's child bfqgs complete all the requests. After this patch: 1) root group is counted. 2) Count if bfqg have pending requests. 3) Don't count if bfqg complete all the requests. With this change, the occasion that only one group is activated can be detected, and next patch will support concurrent sync io in the occasion. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block, bfq: record how many queues have pending requestsYu Kuai3-2/+21
Prepare to refactor the counting of 'num_groups_with_pending_reqs'. Add a counter in bfq_group, update it while tracking if bfqq have pending requests and when bfq_bfqq_move() is called. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-01block, bfq: support to track if bfqq has pending requestsYu Kuai3-2/+25
If entity belongs to bfqq, then entity->in_groups_with_pending_reqs is not used currently. This patch use it to track if bfqq has pending requests through callers of weights_tree insertion and removal. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Paolo Valente <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-31blk-mq: remove redundant call to blk_freeze_queue_start in blk_mq_destroy_queueJinlong Chen1-1/+1
The calling relationship in blk_mq_destroy_queue() is as follows: blk_mq_destroy_queue() ... -> blk_queue_start_drain() -> blk_freeze_queue_start() <- called ... -> blk_freeze_queue() -> blk_freeze_queue_start() <- called again -> blk_mq_freeze_queue_wait() ... So there is a redundant call to blk_freeze_queue_start(). Replace blk_freeze_queue() with blk_mq_freeze_queue_wait() to avoid the redundant call. Signed-off-by: Jinlong Chen <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-31blk-mq: move queue_is_mq out of blk_mq_cancel_work_syncJinlong Chen2-8/+8
The only caller that needs queue_is_mq check is del_gendisk, so move the check into it. Signed-off-by: Jinlong Chen <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-31block: simplify blksize_bits() implementationDawei Li1-6/+1
Convert current looping-based implementation into bit operation, which can bring improvement for: 1) bitops is more efficient for its arch-level optimization. 2) Given that blksize_bits() is inline, _if_ @size is compile-time constant, it's possible that order_base_2() _may_ make output compile-time evaluated, depending on code context and compiler behavior. Signed-off-by: Dawei Li <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/TYCP286MB23238842958D7C083D6B67CECA349@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Jens Axboe <[email protected]>
2022-10-31blk-mq: avoid double ->queue_rq() because of early timeoutDavid Jeffery1-12/+44
David Jeffery found one double ->queue_rq() issue, so far it can be triggered in VM use case because of long vmexit latency or preempt latency of vCPU pthread or long page fault in vCPU pthread, then block IO req could be timed out before queuing the request to hardware but after calling blk_mq_start_request() during ->queue_rq(), then timeout handler may handle it by requeue, then double ->queue_rq() is caused, and kernel panic. So far, it is driver's responsibility to cover the race between timeout and completion, so it seems supposed to be solved in driver in theory, given driver has enough knowledge. But it is really one common problem, lots of driver could have similar issue, and could be hard to fix all affected drivers, even it isn't easy for driver to handle the race. So David suggests this patch by draining in-progress ->queue_rq() for solving this issue. Cc: Stefan Hajnoczi <[email protected]> Cc: Keith Busch <[email protected]> Cc: [email protected] Cc: Bart Van Assche <[email protected]> Signed-off-by: David Jeffery <[email protected]> Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-25block: Micro-optimize get_max_segment_size()Bart Van Assche1-4/+11
This patch removes a conditional jump from get_max_segment_size(). The x86-64 assembler code for this function without this patch is as follows: 206 return min_not_zero(mask - offset + 1, 0x0000000000000118 <+72>: not %rax 0x000000000000011b <+75>: and 0x8(%r10),%rax 0x000000000000011f <+79>: add $0x1,%rax 0x0000000000000123 <+83>: je 0x138 <bvec_split_segs+104> 0x0000000000000125 <+85>: cmp %rdx,%rax 0x0000000000000128 <+88>: mov %rdx,%r12 0x000000000000012b <+91>: cmovbe %rax,%r12 0x000000000000012f <+95>: test %rdx,%rdx 0x0000000000000132 <+98>: mov %eax,%edx 0x0000000000000134 <+100>: cmovne %r12d,%edx With this patch applied: 206 return min(mask - offset, (unsigned long)lim->max_segment_size - 1) + 1; 0x000000000000003f <+63>: mov 0x28(%rdi),%ebp 0x0000000000000042 <+66>: not %rax 0x0000000000000045 <+69>: and 0x8(%rdi),%rax 0x0000000000000049 <+73>: sub $0x1,%rbp 0x000000000000004d <+77>: cmp %rbp,%rax 0x0000000000000050 <+80>: cmova %rbp,%rax 0x0000000000000054 <+84>: add $0x1,%eax Reviewed-by: Ming Lei <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Keith Busch <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Guenter Roeck <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2022-10-25block: Constify most queue limits pointersBart Van Assche4-22/+26
Document which functions do not modify the queue limits. Reviewed-by: Ming Lei <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Keith Busch <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2022-10-25block: Remove request.write_hintBart Van Assche1-1/+0
Commit c75e707fe1aa ("block: remove the per-bio/request write hint") removed all code that uses the struct request write_hint member. Hence also remove 'write_hint' itself. Reviewed-by: Ming Lei <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2022-10-25block: remove bio_start_io_acct_timeChristoph Hellwig2-13/+0
bio_start_io_acct_time is not actually used anywhere, so remove it. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-25nvme-apple: remove an extra queue referenceChristoph Hellwig1-9/+0
Now that blk_mq_destroy_queue does not release the queue reference, there is no need for a second admin queue reference to be held by the apple_nvme structure. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Sven Peter <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-25nvme-pci: remove an extra queue referenceChristoph Hellwig1-6/+0
Now that blk_mq_destroy_queue does not release the queue reference, there is no need for a second admin queue reference to be held by the nvme_dev. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-25scsi: remove an extra queue referenceChristoph Hellwig2-2/+0
Now that blk_mq_destroy_queue does not release the queue reference, there is no need for a second queue reference to be held by the scsi_device. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-25blk-mq: move the call to blk_put_queue out of blk_mq_destroy_queueChristoph Hellwig7-5/+16
The fact that blk_mq_destroy_queue also drops a queue reference leads to various places having to grab an extra reference. Move the call to blk_put_queue into the callers to allow removing the extra references. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] [axboe: fix fabrics_q vs admin_q conflict in nvme core.c] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23block: fix up elevator_type refcountingJinlong Chen3-3/+10
The current reference management logic of io scheduler modules contains refcnt problems. For example, blk_mq_init_sched may fail before or after the calling of e->ops.init_sched. If it fails before the calling, it does nothing to the reference to the io scheduler module. But if it fails after the calling, it releases the reference by calling kobject_put(&eq->kobj). As the callers of blk_mq_init_sched can't know exactly where the failure happens, they can't handle the reference to the io scheduler module properly: releasing the reference on failure results in double-release if blk_mq_init_sched has released it, and not releasing the reference results in ghost reference if blk_mq_init_sched did not release it either. The same problem also exists in io schedulers' init_sched implementations. We can address the problem by adding releasing statements to the error handling procedures of blk_mq_init_sched and init_sched implementations. But that is counterintuitive and requires modifications to existing io schedulers. Instead, We make elevator_alloc get the io scheduler module references that will be released by elevator_release. And then, we match each elevator_get with an elevator_put. Therefore, each reference to an io scheduler module explicitly has its own getter and releaser, and we no longer need to worry about the refcnt problems. The bugs and the patch can be validated with tools here: https://github.com/nickyc975/linux_elv_refcnt_bug.git [hch: split out a few bits into separate patches, use a non-try module_get in elevator_alloc] Signed-off-by: Jinlong Chen <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23block: check for an unchanged elevator earlier in __elevator_changeJinlong Chen1-6/+3
No need to find the actual elevator_type struct for this comparism, the name is all that is needed. Signed-off-by: Jinlong Chen <[email protected]> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23block: sanitize the elevator name before passing it to __elevator_changeChristoph Hellwig1-8/+7
The stripped name should also be used for the none check. To do so strip it in the caller and pass in the sanitized name. Drop the pointless __ prefix in the function name while we're at it. Based on a patch from Jinlong Chen <[email protected]>. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23block: add proper helpers for elevator_type module refcount managementChristoph Hellwig3-16/+19
Make sure we have helpers for all relevant module refcount operations on the elevator_type in elevator.h, and use them. Move the call to the get helper in blk_mq_elv_switch_none a bit so that it is obvious with a less verbose comment. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23blk-wbt: don't enable throttling if default elevator is bfqYu Kuai3-4/+12
Commit b5dc5d4d1f4f ("block,bfq: Disable writeback throttling") tries to disable wbt for bfq, it's done by calling wbt_disable_default() in bfq_init_queue(). However, wbt is still enabled if default elevator is bfq: device_add_disk elevator_init_mq bfq_init_queue wbt_disable_default -> done nothing blk_register_queue wbt_enable_default -> wbt is enabled Fix the problem by adding a new flag ELEVATOR_FLAG_DISBALE_WBT, bfq will set the flag in bfq_init_queue, and following wbt_enable_default() won't enable wbt while the flag is set. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23elevator: add new field flags in struct elevator_queueYu Kuai2-5/+5
There are only one flag to indicate that elevator is registered currently, prepare to add a flag to disable wbt if default elevator is bfq. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23blk-wbt: don't show valid wbt_lat_usec in sysfs while wbt is disabledYu Kuai3-0/+16
Currently, if wbt is initialized and then disabled by wbt_disable_default(), sysfs will still show valid wbt_lat_usec, which will confuse users that wbt is still enabled. This patch shows wbt_lat_usec as zero if it's disabled. Signed-off-by: Yu Kuai <[email protected]> Reported-and-tested-by: Holger Hoffstätte <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23blk-wbt: make enable_state more accurateYu Kuai2-6/+13
Currently, if user disable wbt through sysfs, 'enable_state' will be 'WBT_STATE_ON_MANUAL', which will be confusing. Add a new state 'WBT_STATE_OFF_MANUAL' to cover that case. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23blk-wbt: remove unnecessary check in wbt_enable_default()Yu Kuai1-1/+1
If CONFIG_BLK_WBT_MQ is disabled, wbt_init() won't do anything. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23elevator: remove redundant code in elv_unregister_queue()Yu Kuai1-2/+0
"elevator_queue *e" is already declared and initialized in the beginning of elv_unregister_queue(). Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23blk-iocost: read 'ioc->params' inside 'ioc->lock' in ioc_timer_fn()Yu Kuai1-2/+4
'ioc->params' is updated in ioc_refresh_params(), which is proteced by 'ioc->lock', however, ioc_timer_fn() read params outside the lock. Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23blk-iocost: prevent configuration update concurrent with io throttlingYu Kuai1-2/+24
This won't cause any severe problem currently, however, this doesn't seems appropriate: 1) 'ioc->params' is read from multiple places without holding 'ioc->lock', unexpected value might be read if writing it concurrently. 2) If configuration is changed while io is throttling, the functionality might be affected. For example, if module params is updated and cost becomes smaller, waiting for timer that is caculated under old configuration is not appropriate. Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23blk-iocost: don't release 'ioc->lock' while updating paramsYu Kuai1-5/+2
ioc_qos_write() and ioc_cost_model_write() are the same: 1) hold lock to read 'ioc->params' to local variable; 2) update params to local variable without lock; 3) hold lock to write local variable to 'ioc->params'; In theroy, if user updates params concurrenty, the params might be lost: t1: update params a t2: update params b spin_lock_irq(&ioc->lock); memcpy(qos, ioc->params.qos, sizeof(qos)) spin_unlock_irq(&ioc->lock); qos[a] = xxx; spin_lock_irq(&ioc->lock); memcpy(qos, ioc->params.qos, sizeof(qos)) spin_unlock_irq(&ioc->lock); qos[b] = xxx; spin_lock_irq(&ioc->lock); memcpy(ioc->params.qos, qos, sizeof(qos)); ioc_refresh_params(ioc, true); spin_unlock_irq(&ioc->lock); spin_lock_irq(&ioc->lock); // updates of a will be lost memcpy(ioc->params.qos, qos, sizeof(qos)); ioc_refresh_params(ioc, true); spin_unlock_irq(&ioc->lock); Althrough this is not common case, the problem can by fixed easily by holding the lock through the read, update, write process. Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23blk-iocost: disable writeback throttlingYu Kuai1-0/+2
Commit b5dc5d4d1f4f ("block,bfq: Disable writeback throttling") disable wbt for bfq, because different write-throttling heuristics should not work together. For the same reason, wbt and iocost should not work together as well, unless admin really want to do that, dispite that performance is affected. Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-10-23Linux 6.1-rc2Linus Torvalds1-1/+1
2022-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds18-99/+180
Pull kvm fixes from Paolo Bonzini: "RISC-V: - Fix compilation without RISCV_ISA_ZICBOM - Fix kvm_riscv_vcpu_timer_pending() for Sstc ARM: - Fix a bug preventing restoring an ITS containing mappings for very large and very sparse device topology - Work around a relocation handling error when compiling the nVHE object with profile optimisation - Fix for stage-2 invalidation holding the VM MMU lock for too long by limiting the walk to the largest block mapping size - Enable stack protection and branch profiling for VHE - Two selftest fixes x86: - add compat implementation for KVM_X86_SET_MSR_FILTER ioctl selftests: - synchronize includes between include/uapi and tools/include/uapi" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: tools: include: sync include/api/linux/kvm.h KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter() kvm: Add support for arch compat vm ioctls RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc RISC-V: Fix compilation without RISCV_ISA_ZICBOM KVM: arm64: vgic: Fix exit condition in scan_its_table() KVM: arm64: nvhe: Fix build with profile optimization KVM: selftests: Fix number of pages for memory slot in memslot_modification_stress_test KVM: arm64: selftests: Fix multiple versions of GIC creation KVM: arm64: Enable stack protection and branch profiling for VHE KVM: arm64: Limit stage2_apply_range() batch size to largest block KVM: arm64: Work out supported block level at compile time
2022-10-23Revert "mfd: syscon: Remove repetition of the regmap_get_val_endian()"Jason A. Donenfeld1-0/+8
This reverts commit 72a95859728a7866522e6633818bebc1c2519b17. It broke reboots on big-endian MIPS and MIPS64 malta QEMU instances, which use the syscon driver. Little-endian is not effected, which means likely it's important to handle regmap_get_val_endian() in this function after all. Fixes: 72a95859728a ("mfd: syscon: Remove repetition of the regmap_get_val_endian()") Reviewed-by: Andy Shevchenko <[email protected]> Cc: Lee Jones <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-10-23kernel/utsname_sysctl.c: Fix hostname pollingLinus Torvalds2-0/+2
Commit bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") added a new entry to the uts_kern_table[] array, but didn't update the UTS_PROC_xyz enumerators of older entries, breaking anything that used them. Which is admittedly not many cases: it's really just the two uses of uts_proc_notify() in kernel/sys.c. But apparently journald-systemd actually uses this to detect hostname changes. Reported-by: Torsten Hilbrich <[email protected]> Fixes: bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://linux-regtracking.leemhuis.info/regzbot/regression/[email protected]/ Cc: Petr Vorel <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-10-23Merge tag 'perf_urgent_for_v6.1_rc2' of ↵Linus Torvalds5-46/+163
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Fix raw data handling when perf events are used in bpf - Rework how SIGTRAPs get delivered to events to address a bunch of problems with it. Add a selftest for that too * tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: bpf: Fix sample_flags for bpf_perf_event_output selftests/perf_events: Add a SIGTRAP stress test with disables perf: Fix missing SIGTRAPs