aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-06-07blk-ioc: fix recursive spin_lock/unlock_irq() in ioc_clear_queue()Yu Kuai1-2/+2
Recursive spin_lock/unlock_irq() is not safe, because spin_unlock_irq() will enable irq unconditionally: spin_lock_irq queue_lock -> disable irq spin_lock_irq ioc->lock spin_unlock_irq ioc->lock -> enable irq /* * AA dead lock will be triggered if current context is preempted by irq, * and irq try to hold queue_lock again. */ spin_unlock_irq queue_lock Fix this problem by using spin_lock/unlock() directly for 'ioc->lock'. Fixes: 5a0ac57c48aa ("blk-ioc: protect ioc_destroy_icq() by 'queue_lock'") Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-07nbd: Add the maximum limit of allocated index in nbd_dev_addZhong Jinghua1-1/+2
If the index allocated by idr_alloc greater than MINORMASK >> part_shift, the device number will overflow, resulting in failure to create a block device. Fix it by imiting the size of the max allocation. Signed-off-by: Zhong Jinghua <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-06blk-ioprio: Introduce promote-to-rt policyHou Tao2-21/+44
Since commit a78418e6a04c ("block: Always initialize bio IO priority on submit"), bio->bi_ioprio will never be IOPRIO_CLASS_NONE when calling blkcg_set_ioprio(), so there will be no way to promote the io-priority of one cgroup to IOPRIO_CLASS_RT, because bi_ioprio will always be greater than or equals to IOPRIO_CLASS_RT. It seems possible to call blkcg_set_ioprio() first then try to initialize bi_ioprio later in bio_set_ioprio(), but this doesn't work for bio in which bi_ioprio is already initialized (e.g., direct-io), so introduce a new promote-to-rt policy to promote the iopriority of bio to IOPRIO_CLASS_RT if the ioprio is not already RT. For none-to-rt policy, although it doesn't work now, but considering that its purpose was also to override the io-priority to RT and allowing for a smoother transition, just keep it and treat it as an alias of the promote-to-rt policy. Acked-by: Tejun Heo <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Hou Tao <[email protected]> Reviewed-by: Bagas Sanjaya <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_costLi Nan1-3/+4
adjust_inuse_and_calc_cost() use spin_lock_irq() and IRQ will be enabled when unlock. DEADLOCK might happen if we have held other locks and disabled IRQ before invoking it. Fix it by using spin_lock_irqsave() instead, which can keep IRQ state consistent with before when unlock. ================================ WARNING: inconsistent lock state 5.10.0-02758-g8e5f91fd772f #26 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 {IN-HARDIRQ-W} state was registered at: __lock_acquire+0x3d7/0x1070 lock_acquire+0x197/0x4a0 __raw_spin_lock_irqsave _raw_spin_lock_irqsave+0x3b/0x60 bfq_idle_slice_timer_body bfq_idle_slice_timer+0x53/0x1d0 __run_hrtimer+0x477/0xa70 __hrtimer_run_queues+0x1c6/0x2d0 hrtimer_interrupt+0x302/0x9e0 local_apic_timer_interrupt __sysvec_apic_timer_interrupt+0xfd/0x420 run_sysvec_on_irqstack_cond sysvec_apic_timer_interrupt+0x46/0xa0 asm_sysvec_apic_timer_interrupt+0x12/0x20 irq event stamp: 837522 hardirqs last enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore hardirqs last enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40 hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50 softirqs last enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&bfqd->lock); <Interrupt> lock(&bfqd->lock); *** DEADLOCK *** 3 locks held by kworker/2:3/388: #0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0 #1: ffff8881176bfdd8 ((work_completion)(&td->dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0 #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 stack backtrace: CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kthrotld blk_throtl_dispatch_work_fn Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 print_usage_bug valid_state mark_lock_irq.cold+0x32/0x3a mark_lock+0x693/0xbc0 mark_held_locks+0x9e/0xe0 __trace_hardirqs_on_caller lockdep_hardirqs_on_prepare.part.0+0x151/0x360 trace_hardirqs_on+0x5b/0x180 __raw_spin_unlock_irq _raw_spin_unlock_irq+0x24/0x40 spin_unlock_irq adjust_inuse_and_calc_cost+0x4fb/0x970 ioc_rqos_merge+0x277/0x740 __rq_qos_merge+0x62/0xb0 rq_qos_merge bio_attempt_back_merge+0x12c/0x4a0 blk_mq_sched_try_merge+0x1b6/0x4d0 bfq_bio_merge+0x24a/0x390 __blk_mq_sched_bio_merge+0xa6/0x460 blk_mq_sched_bio_merge blk_mq_submit_bio+0x2e7/0x1ee0 __submit_bio_noacct_mq+0x175/0x3b0 submit_bio_noacct+0x1fb/0x270 blk_throtl_dispatch_work_fn+0x1ef/0x2b0 process_one_work+0x83e/0x13f0 process_scheduled_works worker_thread+0x7e3/0xd80 kthread+0x353/0x470 ret_from_fork+0x1f/0x30 Fixes: b0853ab4a238 ("blk-iocost: revamp in-period donation snapbacks") Signed-off-by: Li Nan <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Yu Kuai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: mark early_lookup_bdev as __initChristoph Hellwig2-11/+10
early_lookup_bdev is now only used during the early boot code as it should, so mark it __init to not waste run time memory on it. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05mtd: block2mtd: don't call early_lookup_bdev after the system is runningChristoph Hellwig1-1/+11
early_lookup_bdev is supposed to only be called from the early boot code, but mdtblock_early_get_bdev is called as a general fallback when lookup_bdev fails, which is problematic because early_lookup_bdev bypasses all normal path based permission checking, and might cause problems with certain container environments renaming devices. Switch to only call early_lookup_bdev when block2mtd is built-in and the system state in not running yet. Note that this strictly speaking changes the kernel ABI as the PARTUUID= and PARTLABEL= style syntax is now not available during a running systems. They never were intended for that, but this breaks things we'll have to figure out a way to make them available again. But if avoidable in any way I'd rather avoid that. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Miquel Raynal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05mtd: block2mtd: factor the early block device open logic into a helperChristoph Hellwig1-23/+30
Simplify add_device a bit by splitting out the cumbersome early boot logic into a separate helper. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Miquel Raynal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05PM: hibernate: don't use early_lookup_bdev in resume_storeChristoph Hellwig1-1/+17
resume_store is a sysfs attribute written during normal kernel runtime, and it should not use the early_lookup_bdev API that bypasses all normal path based permission checking, and might cause problems with certain container environments renaming devices. Switch to lookup_bdev, which does a normal path lookup instead, and fall back to trying to parse a numeric dev_t just like early_lookup_bdev did. Note that this strictly speaking changes the kernel ABI as the PARTUUID= and PARTLABEL= style syntax is now not available during a running systems. They never were intended for that, but this breaks things we'll have to figure out a way to make them available again. But if avoidable in any way I'd rather avoid that. Fixes: 421a5fa1a6cf ("PM / hibernate: use name_to_dev_t to parse resume") Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05dm: only call early_lookup_bdev from early boot contextChristoph Hellwig1-2/+7
early_lookup_bdev is supposed to only be called from the early boot code, but dm_get_device calls it as a general fallback when lookup_bdev fails, which is problematic because early_lookup_bdev bypasses all normal path based permission checking, and might cause problems with certain container environments renaming devices. Switch to only call early_lookup_bdev when dm is built-in and the system state in not running yet. This means it is still available when tables are constructed by dm-init.c from the kernel command line, but not otherwise. Note that this strictly speaking changes the kernel ABI as the PARTUUID= and PARTLABEL= style syntax is now not available during a running systems. They never were intended for that, but this breaks things we'll have to figure out a way to make them available again. But if avoidable in any way I'd rather avoid that. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Mike Snitzer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05dm: remove dm_get_dev_tChristoph Hellwig2-19/+5
Open code dm_get_dev_t in the only remaining caller, and propagate the exact error code from lookup_bdev and early_lookup_bdev. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05dm: open code dm_get_dev_t in dm_init_initChristoph Hellwig1-1/+3
dm_init_init is called from early boot code, and thus lookup_bdev will never succeed. Just open code that call to early_lookup_bdev instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Mike Snitzer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05dm-snap: simplify the origin_dev == cow_dev check in snapshot_ctrChristoph Hellwig1-9/+5
Use the block_device acquired in dm_get_device for the check instead of doing an extra lookup. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Mike Snitzer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: move more code to early-lookup.cChristoph Hellwig3-93/+92
blk_lookup_devt is only used by code in early-lookup.c, so move it there. printk_all_partitions and it's helper bdevt_str are only used by the early init code in init/do_mounts.c, so they should go there as well. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: move the code to do early boot lookup of block devices to block/Christoph Hellwig4-222/+227
Create a new block/early-lookup.c to keep the early block device lookup code instead of having this code sit with the early mount code. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: clear root_wait on all invalid root= stringsChristoph Hellwig1-7/+11
Instead of only clearing root_wait in devt_from_partuuid when the UUID format was invalid, do that in parse_root_device for all strings that failed to parse. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: improve the name_to_dev_t interfaceChristoph Hellwig9-75/+74
name_to_dev_t has a very misleading name, that doesn't make clear it should only be used by the early init code, and also has a bad calling convention that doesn't allow returning different kinds of errors. Rename it to early_lookup_bdev to make the use case clear, and return an errno, where -EINVAL means the string could not be parsed, and -ENODEV means it the string was valid, but there was no device found for it. Also stub out the whole call for !CONFIG_BLOCK as all the non-block root cases are always covered in the caller. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: move the nfs/cifs/ram special cases out of name_to_dev_tChristoph Hellwig2-9/+12
The nfs/cifs/ram special case only needs to be parsed once, and only in the boot code. Move them out of name_to_dev_t and into prepare_namespace. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: factor the root_wait logic in prepare_namespace into a helperChristoph Hellwig1-10/+22
The root_wait logic is a bit obsfucated right now. Expand it and move it into a helper. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: handle ubi/mtd root mounting like all other root typesChristoph Hellwig2-9/+15
Assign a Root_Generic magic value for UBI/MTD root and handle the root mounting in mount_root like all other root types. Besides making the code more clear this also means that UBI/MTD root can be used together with an initrd (not that anyone should care). Also factor parsing of the root name into a helper now that it can be easily done and will get more complicated with subsequent patches. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: don't remove the /dev/ prefix from error messagesChristoph Hellwig1-11/+6
Remove the code that drops the /dev/ prefix from root_device_name, which is only used for error messages when mounting the root device fails. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: pass root_device_name explicitlyChristoph Hellwig3-25/+29
Instead of declaring root_device_name as a global variable pass it as an argument to the functions using it. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: refactor mount_rootChristoph Hellwig1-48/+56
Provide stubs for all the lower level mount helpers, and just switch on ROOT_DEV in the main function. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: rename mount_block_root to mount_root_genericChristoph Hellwig3-5/+5
mount_block_root is also used to mount non-block file systems, so give it a better name. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: remove pointless Root_* valuesChristoph Hellwig4-11/+4
Remove all unused defines, and just use the expanded versions for the SCSI disk majors. I've decided to keep Root_RAM0 even if it could be expanded as there is a lot of special casing for it in the init code. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05PM: hibernate: move finding the resume device out of software_resumeChristoph Hellwig1-41/+39
software_resume can be called either from an init call in the boot code, or from sysfs once the system has finished booting, and the two invocation methods this can't race with each other. For the latter case we did just parse the suspend device manually, while the former might not have one. Split software_resume so that the search only happens for the boot case, which also means the special lockdep nesting annotation can go away as the system transition mutex can be taken a little later and doesn't have the sysfs locking nest inside it. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05PM: hibernate: remove the global snapshot_test variableChristoph Hellwig3-14/+8
Passing call dependent variable in global variables is a huge antipattern. Fix it up. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05PM: hibernate: factor out a helper to find the resume deviceChristoph Hellwig1-35/+37
Split the logic to find the resume device out software_resume and into a separate helper to start unwindig the convoluted goto logic. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05driver core: return bool from driver_probe_doneChristoph Hellwig3-6/+4
bool is the most sensible return value for a yes/no return. Also add __init as this funtion is only called from the early boot code. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05ext4: wire up the ->mark_dead holder operation for log devicesChristoph Hellwig1-1/+10
Implement a set of holder_ops that shut down the file system when the block device used as log device is removed undeneath the file system. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05ext4: wire up sops->shutdownChristoph Hellwig1-0/+6
Wire up the shutdown method to shut down the file system when the underlying block device is marked dead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05ext4: split ext4_shutdownChristoph Hellwig2-9/+16
Split ext4_shutdown into a low-level helper that will be reused for implementing the shutdown super operation and a wrapper for the ioctl handling. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05xfs: wire up the ->mark_dead holder operation for log and RT devicesChristoph Hellwig1-1/+12
Implement a set of holder_ops that shut down the file system when the block device used as log or RT device is removed undeneath the file system. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05xfs: wire up sops->shutdownChristoph Hellwig3-1/+14
Wire up the shutdown method to shut down the file system when the underlying block device is marked dead. Add a new message to clearly distinguish this shutdown reason from other shutdowns. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05fs: add a method to shut down the file systemChristoph Hellwig2-2/+20
Add a new ->shutdown super operation that can be used to tell the file system to shut down, and call it from newly created holder ops when the block device under a file system shuts down. This only covers the main block device for "simple" file systems using get_tree_bdev / mount_bdev. File systems their own get_tree method or opening additional devices will need to set up their own blk_holder_ops. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: add a mark_dead holder operationChristoph Hellwig2-0/+25
Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead to notify the holder that the block device it is using has been marked dead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: introduce holder opsChristoph Hellwig34-56/+90
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: remove blk_drop_partitionsChristoph Hellwig1-12/+4
There is only a single caller left, so fold the loop into that. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: delete partitions later in del_gendiskChristoph Hellwig3-13/+32
Delay dropping the block_devices for partitions in del_gendisk until after the call to blk_mark_disk_dead, so that we can implementat notification of removed devices in blk_mark_disk_dead. This requires splitting a lower-level drop_partition helper out of delete_partition and using that from del_gendisk, while having a common loop for the whole device and partitions that calls remove_inode_hash, fsync_bdev and __invalidate_device before the call to blk_mark_disk_dead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: unhash the inode earlier in delete_partitionChristoph Hellwig1-6/+6
Move the call to remove_inode_hash to the beginning of delete_partition, as we want to prevent opening a block_device that is about to be removed ASAP. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: avoid repeated work in blk_mark_disk_deadChristoph Hellwig1-1/+3
Check if GD_DEAD is already set in blk_mark_disk_dead, and don't duplicate the work already done. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendiskChristoph Hellwig1-14/+12
blk_mark_disk_dead does very similar work a a section of del_gendisk: - set the GD_DEAD flag - set the capacity to zero - start a queue drain but del_gendisk also sets QUEUE_FLAG_DYING on the queue if it is owned by the disk, sets the capacity to zero before starting the drain, and both with sending a uevent and kernel message for this fake capacity change. Move the exact logic from the more heavily used del_gendisk into blk_mark_disk_dead and then call blk_mark_disk_dead from del_gendisk. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: turn bdev_lock into a mutexChristoph Hellwig1-14/+13
There is no reason for this lock to spin, and being able to sleep under it will come in handy soon. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: refactor bd_may_claimChristoph Hellwig1-18/+22
The long if/else chain obsfucates the actual logic. Tidy it up to be more structured. Also drop the whole argument, as it can be trivially derived from bdev using bdev_whole, and having the bdev_whole in the function makes it easier to follow. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: factor out a bd_end_claim helper from blkdev_putChristoph Hellwig1-30/+33
Move all the logic to release an exclusive claim into a helper. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05drbd: stop defining __KERNEL_SYSCALLS__Christoph Hellwig2-2/+0
__KERNEL_SYSCALLS__ hasn't been needed since Linux 2.6.19 so stop defining it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Christoph Böhmwalder <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-04ublk: add control command of UBLK_U_CMD_GET_FEATURESMing Lei2-0/+29
Add control command of UBLK_U_CMD_GET_FEATURES for returning driver's feature set or capability. This way can simplify userspace for maintaining compatibility because userspace doesn't need to send command to one device for querying driver feature set any more. Such as, with the queried feature set, userspace can choose to use: - UBLK_CMD_GET_DEV_INFO2 or UBLK_CMD_GET_DEV_INFO, - UBLK_U_CMD_* or UBLK_CMD_* Userspace code: https://github.com/ming1/ubdsrv/commits/features-cmd Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-01block: Replace all non-returning strlcpy with strscpyAzeem Shaikh3-3/+3
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-01blk-ioc: protect ioc_destroy_icq() by 'queue_lock'Yu Kuai1-17/+13
Currently, icq is tracked by both request_queue(icq->q_node) and task(icq->ioc_node), and ioc_clear_queue() from elevator exit is not safe because it can access the list without protection: ioc_clear_queue ioc_release_fn lock queue_lock list_splice /* move queue list to a local list */ unlock queue_lock /* * lock is released, the local list * can be accessed through task exit. */ lock ioc->lock while (!hlist_empty) icq = hlist_entry lock queue_lock ioc_destroy_icq delete icq->ioc_node while (!list_empty) icq = list_entry() list_del icq->q_node /* * This is not protected by any lock, * list_entry concurrent with list_del * is not safe. */ unlock queue_lock unlock ioc->lock Fix this problem by protecting list 'icq->q_node' by queue_lock from ioc_clear_queue(). Reported-and-tested-by: Pradeep Pragallapati <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-01block: mark bio_add_folio as __must_checkJohannes Thumshirn1-1/+2
Now that all callers of bio_add_folio() check the return value, mark it as __must_check. Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/381360a45ac3684120cfbe1e07685e9c36256e47.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <[email protected]>
2023-06-01fs: iomap: use bio_add_folio_nofail where possibleJohannes Thumshirn1-3/+3
When the iomap buffered-io code can't add a folio to a bio, it allocates a new bio and adds the folio to that one. This is done using bio_add_folio(), but doesn't check for errors. As adding a folio to a newly created bio can't fail, use the newly introduced bio_add_folio_nofail() function. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Link: https://lore.kernel.org/r/58fa893c24c67340a63323f09a179fefdca07f2a.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <[email protected]>