aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-12-07nvme: pass nr_maps explicitly to nvme_alloc_io_tag_setChristoph Hellwig6-7/+9
Don't look at ctrl->ops as only RDMA and TCP actually support multiple maps. Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers") Fixes: ceee1953f923 ("nvme-loop: use the tagset alloc/free helpers") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]>
2022-12-06nvme-pci: split out a nvme_pci_ctrl_is_dead helperChristoph Hellwig1-22/+25
Clean up nvme_dev_disable by splitting the logic to detect if a controller is dead into a separate helper. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2022-12-06nvme-pci: return early on ctrl state mismatch in nvme_reset_workChristoph Hellwig1-2/+1
The only way nvme_reset_work could be called when not in resetting state is if a reset and remove happen near the same time. This should not happen, but if it did we don't want the reset work to disable the controller because the remove is already doing that. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2022-12-06nvme-pci: rename nvme_disable_io_queuesChristoph Hellwig1-10/+10
This function really deletes the I/O queues, so rename it to match the functionality. Also move the main wrapper right next to the actual underlying implementation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2022-12-06nvme-pci: cleanup nvme_suspend_queueChristoph Hellwig1-10/+7
Remove the unused returne value, pass a dev + qid instead of the queue as that is better for the callers as well as the function itself, and remove the entirely pointless kerneldoc comment. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2022-12-06nvme-pci: remove nvme_pci_disableChristoph Hellwig1-13/+5
nvme_pci_disable has a single caller, fold it into that. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Eric Curtin <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2022-12-06nvme-pci: remove nvme_disable_admin_queueChristoph Hellwig1-9/+2
nvme_disable_admin_queue has only a single caller, and just calls two other funtions, so remove it to clean up the remove path a little more. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Eric Curtin <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]>
2022-12-06nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrlChristoph Hellwig8-45/+25
Many of the callers decide which one to use based on a bool argument and there is at least some code to be shared, so merge these two. Also move a comment specific to a single callsite to that callsite. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Hector Martin <[email protected]>
2022-12-06nvme: use nvme_wait_ready in nvme_shutdown_ctrlChristoph Hellwig1-26/+12
Refactor the code to wait for CSTS state changes so that it can be reused by nvme_shutdown_ctrl. This reduces the delay between each iteration that checks CSTS from 100ms in the shutdown code to the 1 to 2ms range done during enable, matching the changes from commit 3e98c2443f5c that were only applied to the enable/disable path. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Pankaj Raghav <[email protected]>
2022-12-06nvme-apple: fix controller shutdown in apple_nvme_disableChristoph Hellwig1-1/+2
nvme_shutdown_ctrl already shuts the controller down, there is no need to also call nvme_disable_ctrl for the shutdown case. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Eric Curtin <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Hector Martin <[email protected]>
2022-12-06nvme-fc: move common code into helperChaitanya Kulkarni1-8/+11
Add a helper to move the duplicate code for error message from nvme_fc_rcv_ls_req() to nvme_fc_rcv_ls_req_err_msg(). Signed-off-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-12-06nvme-fc: avoid null pointer dereferenceChaitanya Kulkarni1-1/+10
Before using dynamically allcoated variable lsop in the nvme_fc_rcv_ls_req(), add a check for NULL and error out early. Signed-off-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-12-06nvme: allow unprivileged passthrough of Identify ControllerJoel Granados1-0/+2
Add unprivileged passthrough of the I/O Command Set Independent and I/O Command Set Specific Identify Controller sub-command. This will allow access to attributes (e.g. MDTS and WZSL) that are needed to effectively form passthrough I/O to the /dev/ng* character devices. Signed-off-by: Joel Granados <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-12-06nvme-multipath: support io stats on the mpath deviceSagi Grimberg3-0/+42
Our mpath stack device is just a shim that selects a bottom namespace and submits the bio to it without any fancy splitting. This also means that we don't clone the bio or have any context to the bio beyond submission. However it really sucks that we don't see the mpath device io stats. Given that the mpath device can't do that without adding some context to it, we let the bottom device do it on its behalf (somewhat similar to the approach taken in nvme_trace_bio_complete). When the IO starts, we account the request for multipath IO stats using REQ_NVME_MPATH_IO_STATS nvme_request flag to avoid queue io stats disable in the middle of the request. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]>
2022-12-06nvme: introduce nvme_start_requestSagi Grimberg7-6/+11
In preparation for nvme-multipath IO stats accounting, we want the accounting to happen in a centralized place. The request completion is already centralized, but we need a common helper to request I/O start. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]>
2022-12-06nvme: use kstrtobool() instead of strtobool()Christophe JAILLET2-9/+11
strtobool() is the same as kstrtobool(). However, the latter is more used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>) Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-12-06nvme: don't call blk_mq_{,un}quiesce_tagset when ctrl->tagset is NULLChristoph Hellwig1-0/+4
The NVMe drivers support a mode where no tagset is allocated for the I/O queues and only the admin queue is usable. In that case ctrl->tagset is NULL and we must not call the block per-tagset quiesce helpers that dereference it. Fixes: 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset") Reported-by: Gerd Bayer <[email protected]> Reported-by: Chao Leng <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chao Leng <[email protected]>
2022-12-05blk-throttle: Use more suitable time_after check for update of slice_startKemeng Shi1-1/+1
There is no need to update tg->slice_start[rw] to start when they are equal already. So remove "eq" part of check before update slice_start. Acked-by: Tejun Heo <[email protected]> Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-05blk-throttle: remove repeat check of elapsed timeKemeng Shi1-2/+6
There is no need to check elapsed time from last upgrade for each node in hierarchy. Move this check before traversing as throtl_can_upgrade do to remove repeat check. Reported-by: kernel test robot <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-05blk-throttle: remove incorrect comment for tg_last_low_overflow_timeKemeng Shi1-1/+0
Function tg_last_low_overflow_time is called with intermediate node as following: throtl_hierarchy_can_downgrade throtl_tg_can_downgrade tg_last_low_overflow_time throtl_hierarchy_can_upgrade throtl_tg_can_upgrade tg_last_low_overflow_time throtl_hierarchy_can_downgrade/throtl_hierarchy_can_upgrade will traverse from leaf node to sub-root node and pass traversed intermediate node to tg_last_low_overflow_time. No such limit could be found from context and implementation of tg_last_low_overflow_time, so remove this limit in comment. Acked-by: Tejun Heo <[email protected]> Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-05blk-throttle: fix typo in comment of throtl_adjusted_limitKemeng Shi1-1/+1
lapsed time -> elapsed time Acked-by: Tejun Heo <[email protected]> Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-05blk-throttle: simpfy low limit reached check in throtl_tg_can_upgradeKemeng Shi1-13/+18
Commit c79892c557616 ("blk-throttle: add upgrade logic for LIMIT_LOW state") added upgrade logic for low limit and methioned that 1. "To determine if a cgroup exceeds its limitation, we check if the cgroup has pending request. Since cgroup is throttled according to the limit, pending request means the cgroup reaches the limit." 2. "If a cgroup has limit set for both read and write, we consider the combination of them for upgrade. The reason is read IO and write IO can interfere with each other. If we do the upgrade based in one direction IO, the other direction IO could be severly harmed." Besides, we also determine that cgroup reaches low limit if low limit is 0, see comment in throtl_tg_can_upgrade. Collect the information above, the desgin of upgrade check is as following: 1.The low limit is reached if limit is zero or io is already queued. 2.Cgroup will pass upgrade check if low limits of READ and WRITE are both reached. Simpfy the check code described above to removce repeat check and improve readability. There is no functional change. Detail equivalence proof is as following: All replaced conditions to return true are as following: condition 1 (!read_limit && !write_limit) condition 2 read_limit && sq->nr_queued[READ] && (!write_limit || sq->nr_queued[WRITE]) condition 3 write_limit && sq->nr_queued[WRITE] && (!read_limit || sq->nr_queued[READ]) Transferring condition 2 as following: (read_limit && sq->nr_queued[READ]) && (!write_limit || sq->nr_queued[WRITE]) is equivalent to (read_limit && sq->nr_queued[READ]) && (!write_limit || (write_limit && sq->nr_queued[WRITE])) is equivalent to condition 2.1 (read_limit && sq->nr_queued[READ] && !write_limit) || condition 2.2 (read_limit && sq->nr_queued[READ] && (write_limit && sq->nr_queued[WRITE])) Transferring condition 3 as following: write_limit && sq->nr_queued[WRITE] && (!read_limit || sq->nr_queued[READ]) is equivalent to (write_limit && sq->nr_queued[WRITE]) && (!read_limit || (read_limit && sq->nr_queued[READ])) is equivalent to condition 3.1 ((write_limit && sq->nr_queued[WRITE]) && !read_limit) || condition 3.2 ((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ])) Condition 3.2 is the same as condition 2.2, so all conditions we get to return are as following: (!read_limit && !write_limit) (1) (!read_limit && (write_limit && sq->nr_queued[WRITE])) (3.1) ((read_limit && sq->nr_queued[READ]) && !write_limit) (2.1) ((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ])) (2.2) As we can extract conditions "(a1 || a2) && (b1 || b2)" to: a1 && b1 a1 && b2 a2 && b1 ab && b2 Considering that: a1 = !read_limit a2 = read_limit && sq->nr_queued[READ] b1 = !write_limit b2 = write_limit && sq->nr_queued[WRITE] We can pack replaced conditions to (!read_limit || (read_limit && sq->nr_queued[READ])) && (!write_limit || (write_limit && sq->nr_queued[WRITE])) which is equivalent to (!read_limit || sq->nr_queued[READ]) && (!write_limit || sq->nr_queued[WRITE]) Reported-by: kernel test robot <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-05blk-throttle: correct calculation of wait time in tg_may_dispatchKemeng Shi1-25/+13
In C language, When executing "if (expression1 && expression2)" and expression1 return false, the expression2 may not be executed. For "tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) && tg_within_iops_limit(tg, bio, iops_limit, &iops_wait))", if bps is limited, tg_within_bps_limit will return false and tg_within_iops_limit will not be called. So even bps and iops are both limited, iops_wait will not be calculated and is always zero. So wait time of iops is always ignored. Fix this by always calling tg_within_bps_limit and tg_within_iops_limit to get wait time for both bps and iops. Observed that: 1. Wait time in tg_within_iops_limit/tg_within_bps_limit need always be stored as wait argument is always passed. 2. wait time is stored to zero if iops/bps is limited otherwise non-zero is stored. Simpfy tg_within_iops_limit/tg_within_bps_limit by removing wait argument and return wait time directly. Caller tg_may_dispatch checks if wait time is zero to find if iops/bps is limited. Acked-by: Tejun Heo <[email protected]> Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-05blk-throttle: ignore cgroup without io queued in blk_throtl_cancel_biosKemeng Shi1-1/+12
Ignore cgroup without io queued in blk_throtl_cancel_bios for two reasons: 1. Save cpu cycle for trying to dispatch cgroup which is no io queued. 2. Avoid non-consistent state that cgroup is inserted to service queue without THROTL_TG_PENDING set as tg_update_disptime will unconditional re-insert cgroup to service queue. If we are on the default hierarchy, IO dispatched from child in tg_dispatch_one_bio will trigger inserting cgroup to service queue without erase first and ruin the tree. Acked-by: Tejun Heo <[email protected]> Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-05blk-throttle: Fix that bps of child could exceed bps limited in parentKemeng Shi1-1/+1
Consider situation as following (on the default hierarchy): HDD | root (bps limit: 4k) | child (bps limit :8k) | fio bs=8k Rate of fio is supposed to be 4k, but result is 8k. Reason is as following: Size of single IO from fio is larger than bytes allowed in one throtl_slice in child, so IOs are always queued in child group first. When queued IOs in child are dispatched to parent group, BIO_BPS_THROTTLED is set and these IOs will not be limited by tg_within_bps_limit anymore. Fix this by only set BIO_BPS_THROTTLED when the bio traversed the entire tree. There patch has no influence on situation which is not on the default hierarchy as each group is a single root group without parent. Acked-by: Tejun Heo <[email protected]> Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-05blk-throttle: correct stale comment in throtl_pd_initKemeng Shi1-2/+3
On the default hierarchy (cgroup2), the throttle interface files don't exist in the root cgroup, so the ablity to limit the whole system by configuring root group is not existing anymore. In general, cgroup doesn't wanna be in the business of restricting resources at the system level, so correct the stale comment that we can limit whole system to we can only limit subtree. Acked-by: Tejun Heo <[email protected]> Signed-off-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-04Merge tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy into ↵Jens Axboe1-1/+3
for-6.2/block Pull floppy fix from Denis: "Floppy patch for 6.2 The patch from Yuan Can fixes a memory leak in floppy init code. Signed-off-by: Denis Efremov <[email protected]>" * tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy: floppy: Fix memory leak in do_floppy_init()
2022-12-04floppy: Fix memory leak in do_floppy_init()Yuan Can1-1/+3
A memory leak was reported when floppy_alloc_disk() failed in do_floppy_init(). unreferenced object 0xffff888115ed25a0 (size 8): comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s) hex dump (first 8 bytes): 00 ac 67 5b 81 88 ff ff ..g[.... backtrace: [<000000007f457abb>] __kmalloc_node+0x4c/0xc0 [<00000000a87bfa9e>] blk_mq_realloc_tag_set_tags.part.0+0x6f/0x180 [<000000006f02e8b1>] blk_mq_alloc_tag_set+0x573/0x1130 [<0000000066007fd7>] 0xffffffffc06b8b08 [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0 [<00000000e26d04ee>] do_init_module+0x1a4/0x680 [<000000001bb22407>] load_module+0x6249/0x7110 [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200 [<000000007bddca46>] do_syscall_64+0x35/0x80 [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 unreferenced object 0xffff88810fc30540 (size 32): comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000007f457abb>] __kmalloc_node+0x4c/0xc0 [<000000006b91eab4>] blk_mq_alloc_tag_set+0x393/0x1130 [<0000000066007fd7>] 0xffffffffc06b8b08 [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0 [<00000000e26d04ee>] do_init_module+0x1a4/0x680 [<000000001bb22407>] load_module+0x6249/0x7110 [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200 [<000000007bddca46>] do_syscall_64+0x35/0x80 [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 If the floppy_alloc_disk() failed, disks of current drive will not be set, thus the lastest allocated set->tag cannot be freed in the error handling path. A simple call graph shown as below: floppy_module_init() floppy_init() do_floppy_init() for (drive = 0; drive < N_DRIVE; drive++) blk_mq_alloc_tag_set() blk_mq_alloc_tag_set_tags() blk_mq_realloc_tag_set_tags() # set->tag allocated floppy_alloc_disk() blk_mq_alloc_disk() # error occurred, disks failed to allocated ->out_put_disk: for (drive = 0; drive < N_DRIVE; drive++) if (!disks[drive][0]) # the last disks is not set and loop break break; blk_mq_free_tag_set() # the latest allocated set->tag leaked Fix this problem by free the set->tag of current drive before jump to error handling path. Cc: [email protected] Fixes: 302cfee15029 ("floppy: use a separate gendisk for each media format") Signed-off-by: Yuan Can <[email protected]> [efremov: added stable list, changed title] Signed-off-by: Denis Efremov <[email protected]>
2022-12-03block: remove devnode callback from struct block_device_operationsGreg Kroah-Hartman2-12/+0
With the removal of the pktcdvd driver, there are no in-kernel users of the devnode callback in struct block_device_operations, so it can be safely removed. If it is needed for new block drivers in the future, it can be brought back. Cc: Jens Axboe <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-02Merge branch 'md-next' of ↵Jens Axboe2-62/+36
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.2/block Pull MD fixes from Song: "This contains code cleanup by Christoph." * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fold unbind_rdev_from_array into md_kick_rdev_from_array md: mark md_kick_rdev_from_array static md: remove lock_bdev / unlock_bdev
2022-12-02pktcdvd: remove driver.Greg Kroah-Hartman8-3419/+0
Way back in 2016 in commit 5a8b187c61e9 ("pktcdvd: mark as unmaintained and deprecated") this driver was marked as "will be removed soon". 5 years seems long enough to have it stick around after that, so finally remove the thing now. Reported-by: Christoph Hellwig <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Thomas Maier <[email protected]> Cc: Peter Osterlund <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-02md: fold unbind_rdev_from_array into md_kick_rdev_from_arrayChristoph Hellwig1-21/+16
unbind_rdev_from_array is only called from md_kick_rdev_from_array, so merge it into its only caller. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Song Liu <[email protected]>
2022-12-02md: mark md_kick_rdev_from_array staticChristoph Hellwig2-3/+1
md_kick_rdev_from_array is only used in md.c, so unexport it and mark the symbol static. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Song Liu <[email protected]>
2022-12-02md: remove lock_bdev / unlock_bdevChristoph Hellwig1-41/+22
These wrappers for blkdev_get / blkdev_put just horribly confuse the code with their odd naming. Remove them and improve the error unwinding in md_import_device with the now folded code. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Song Liu <[email protected]>
2022-12-01blk-cgroup: Fix some kernel-doc commentsYang Li1-1/+1
Make the description of @gendisk to @disk in blkcg_schedule_throttle() to clear the below warnings: block/blk-cgroup.c:1850: warning: Function parameter or member 'disk' not described in 'blkcg_schedule_throttle' block/blk-cgroup.c:1850: warning: Excess function parameter 'gendisk' description in 'blkcg_schedule_throttle' Fixes: de185b56e8a6 ("blk-cgroup: pass a gendisk to blkcg_schedule_throttle") Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3338 Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01null_blk: support read-only and offline zone conditionsShin'ichiro Kawasaki3-4/+121
In zoned mode, zones with write pointers can have conditions "read-only" or "offline". In read-only condition, zones can not be written. In offline condition, the zones can be neither written nor read. These conditions are intended for zones with media failures, then it is difficult to set those conditions to zones on real devices. To test handling of zones in the conditions, add a feature to null_blk to set up zones in read-only or offline condition. Add new configuration attributes "zone_readonly" and "zone_offline". Write a sector to the attribute files to specify the target zone to set the zone conditions. For example, following command lines do it: echo 0 > nullb1/zone_readonly echo 524288 > nullb1/zone_offline When the specified zones are already in read-only or offline condition, normal empty condition is restored to the zones. These condition changes can be done only after the null_blk device get powered, since status area of each zone is not yet allocated before power-on. Also improve zone condition checks to inhibit all commands for zones in offline conditions. In same manner, inhibit write and zone management commands for zones in read-only condition. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01drbd: add context parameter to expect() macroChristoph Böhmwalder6-42/+42
Originally-from: Andreas Gruenbacher <[email protected]> Signed-off-by: Christoph Böhmwalder <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01drbd: introduce drbd_ratelimit()Christoph Böhmwalder8-18/+26
Use call site specific ratelimit instead of one single static global. Also ratelimit ASSERTION messages generated by expect(). Originally-from: Lars Ellenberg <[email protected]> Signed-off-by: Christoph Böhmwalder <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01drbd: introduce dynamic debugChristoph Böhmwalder1-36/+97
Incorporate as many out-of-tree changes as possible without changing the genl API. Over the years, we restructured this several times, and also changed the log format. One breaking change is that DRBD 9 gained "implicit options", like a connection name. This cannot be replayed here without changing the API, so save it for later. Originally-from: Andreas Gruenbacher <[email protected]> Originally-from: Philipp Reisner <[email protected]> Originally-from: Lars Ellenberg <[email protected]> Signed-off-by: Christoph Böhmwalder <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01drbd: split polymorph printk to its own fileChristoph Böhmwalder2-67/+73
Signed-off-by: Christoph Böhmwalder <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01drbd: unify how failed assertions are loggedChristoph Böhmwalder1-3/+5
Unify how failed assertions from D_ASSERT() and expect() are logged. Originally-from: Andreas Gruenbacher <[email protected]> Signed-off-by: Christoph Böhmwalder <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01block: bdev & blktrace: use consistent function doc. notationRandy Dunlap2-4/+4
Use only one hyphen in kernel-doc notation between the function name and its short description. The is the documented kerenl-doc format. It also fixes the HTML presentation to be consistent with other functions. Signed-off-by: Randy Dunlap <[email protected]> Cc: Jens Axboe <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01blk-iocost: Correct comment in blk_iocost_initKemeng Shi1-1/+1
There is no iocg_pd_init function. The pd_alloc_fn function pointer of iocost policy is set with ioc_pd_init. Just correct it. Signed-off-by: Kemeng Shi <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01blk-iocost: Remove vrate member in struct ioc_nowKemeng Shi1-3/+3
If we trace vtime_base_rate instead of vtime_rate, there is nowhere which accesses now->vrate except function ioc_now using now->vrate locally. Just remove it. Signed-off-by: Kemeng Shi <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01blk-iocost: Trace vtime_base_rate instead of vtime_rateKemeng Shi2-3/+3
Since commit ac33e91e2daca ("blk-iocost: implement vtime loss compensation") rename original vtime_rate to vtime_base_rate and current vtime_rate is original vtime_rate with compensation. The current rate showed in tracepoint is mixed with vtime_rate and vtime_base_rate: 1) In function ioc_adjust_base_vrate, the first trace_iocost_ioc_vrate_adj shows vtime_rate, the second trace_iocost_ioc_vrate_adj shows vtime_base_rate. 2) In function iocg_activate shows vtime_rate by calling TRACE_IOCG_PATH(iocg_activate... 3) In function ioc_check_iocgs shows vtime_rate by calling TRACE_IOCG_PATH(iocg_idle... Trace vtime_base_rate instead of vtime_rate as: 1) Before commit ac33e91e2daca ("blk-iocost: implement vtime loss compensation"), the traced rate is without compensation, so still show rate without compensation. 2) The vtime_base_rate is more stable while vtime_rate heavily depends on excess budeget on current period which may change abruptly in next period. Signed-off-by: Kemeng Shi <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01blk-iocost: Reset vtime_base_rate in ioc_refresh_paramsKemeng Shi1-1/+3
Since commit ac33e91e2daca("blk-iocost: implement vtime loss compensation") split vtime_rate into vtime_rate and vtime_base_rate, we need reset both vtime_base_rate and vtime_rate when device parameters are refreshed. If vtime_base_rate is no reset here, vtime_rate will be overwritten with old vtime_base_rate soon in ioc_refresh_vrate. Signed-off-by: Kemeng Shi <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01blk-iocost: Fix typo in commentKemeng Shi1-1/+1
soley -> solely Signed-off-by: Kemeng Shi <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01block: Do not reread partition table on exclusively open deviceJan Kara3-8/+13
Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") we allow rereading of partition table although there are users of the block device. This has an undesirable consequence that e.g. if sda and sdb are assembled to a RAID1 device md0 with partitions, BLKRRPART ioctl on sda will rescan partition table and create sda1 device. This partition device under a raid device confuses some programs (such as libstorage-ng used for initial partitioning for distribution installation) leading to failures. Fix the problem refusing to rescan partitions if there is another user that has the block device exclusively open. Cc: [email protected] Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3 Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] [axboe: fold in followup fix] Signed-off-by: Jens Axboe <[email protected]>
2022-11-30virtio-blk: replace ida_simple[get|remove] with ida_[alloc_range|free]Pankaj Raghav1-4/+4
ida_simple[get|remove] are deprecated, and are just wrappers to ida_[alloc_range|free]. Replace ida_simple[get|remove] with their corresponding counterparts. No functional changes. Signed-off-by: Pankaj Raghav <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-30block: mark blk_put_queue as potentially blockingChristoph Hellwig1-4/+2
We can't just say that the last reference release may block, as any reference dropped could be the last one. So move the might_sleep() from blk_free_queue to blk_put_queue and update the documentation. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>