aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2023-06-16dm thin metadata: Fix ABBA deadlock by resetting dm_bufio_clientLi Lingfeng1-0/+2
As described in commit 8111964f1b85 ("dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata"), ABBA deadlocks will be triggered because shrinker_rwsem currently needs to held by dm_pool_abort_metadata() as a side-effect of thin-pool metadata operation failure. The following three problem scenarios have been noticed: 1) Described by commit 8111964f1b85 ("dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata") 2) shrinker_rwsem and throttle->lock P1(drop cache) P2(kworker) drop_caches_sysctl_handler drop_slab shrink_slab down_read(&shrinker_rwsem) - LOCK A do_shrink_slab super_cache_scan prune_icache_sb dispose_list evict ext4_evict_inode ext4_clear_inode ext4_discard_preallocations ext4_mb_load_buddy_gfp ext4_mb_init_cache ext4_wait_block_bitmap __ext4_error ext4_handle_error ext4_commit_super ... dm_submit_bio do_worker throttle_work_update down_write(&t->lock) -- LOCK B process_deferred_bios commit metadata_operation_failed dm_pool_abort_metadata dm_block_manager_create dm_bufio_client_create register_shrinker down_write(&shrinker_rwsem) -- LOCK A thin_map thin_bio_map thin_defer_bio_with_throttle throttle_lock down_read(&t->lock) - LOCK B 3) shrinker_rwsem and wait_on_buffer P1(drop cache) P2(kworker) drop_caches_sysctl_handler drop_slab shrink_slab down_read(&shrinker_rwsem) - LOCK A do_shrink_slab ... ext4_wait_block_bitmap __ext4_error ext4_handle_error jbd2_journal_abort jbd2_journal_update_sb_errno jbd2_write_superblock submit_bh // LOCK B // RELEASE B do_worker throttle_work_update down_write(&t->lock) - LOCK B process_deferred_bios process_bio commit metadata_operation_failed dm_pool_abort_metadata dm_block_manager_create dm_bufio_client_create register_shrinker register_shrinker_prepared down_write(&shrinker_rwsem) - LOCK A bio_endio wait_on_buffer __wait_on_buffer Fix these by resetting dm_bufio_client without holding shrinker_rwsem. Fixes: 8111964f1b85 ("dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata") Cc: [email protected] Signed-off-by: Li Lingfeng <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2023-06-16iov_iter: remove iov_iter_get_pages and iov_iter_get_pages_allocChristoph Hellwig1-6/+0
Now that the direct I/O helpers have switched to use iov_iter_extract_pages, these helpers are unused. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-16block: remove BIO_PAGE_REFFEDChristoph Hellwig2-3/+1
Now that all block direct I/O helpers use page pinning, this flag is unused. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-16Merge tag 'nvme-6.5-2023-06-16' of git://git.infradead.org/nvme into ↵Jens Axboe1-5/+5
for-6.5/block Pull NVMe updates from Keith: "nvme updates for Linux 6.5 - Various cleanups all around (Irvin, Chaitanya, Christophe) - Better struct packing (Christophe JAILLET) - Reduce controller error logs for optional commands (Keith) - Support for >=64KiB block sizes (Daniel Gomez) - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner)" * tag 'nvme-6.5-2023-06-16' of git://git.infradead.org/nvme: (27 commits) nvme: forward port sysfs delete fix nvme: skip optional id ctrl csi if it failed nvme-core: use nvme_ns_head_multipath instead of ns->head->disk nvmet-fcloop: Do not wait on completion when unregister fails nvme-fabrics: open code __nvmf_host_find() nvme-fabrics: error out to unlock the mutex nvme: Increase block size variable size to 32-bit nvme-fcloop: no need to return from void function nvmet-auth: remove unnecessary break after goto nvmet-auth: remove some dead code nvme-core: remove redundant check from nvme_init_ns_head nvme: move sysfs code to a dedicated sysfs.c file nvme-fabrics: prevent overriding of existing host nvme-fabrics: check hostid using uuid_equal nvme-fabrics: unify common code in admin and io queue connect nvmet: reorder fields in 'struct nvmefc_fcp_req' nvmet: reorder fields in 'struct nvme_dhchap_queue_context' nvmet: reorder fields in 'struct nvmf_ctrl_options' nvme: reorder fields in 'struct nvme_ctrl' nvmet: reorder fields in 'struct nvmet_sq' ...
2023-06-14blktrace: use inline function for blk_trace_remove() while blktrace is disabledYu Kuai1-1/+5
If config is disabled, call blk_trace_remove() directly will trigger build warning, hence use inline function instead, prepare to fix blktrace debugfs entries leakage. Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12nvmet: reorder fields in 'struct nvmefc_fcp_req'Christophe JAILLET1-5/+5
Group some variables based on their sizes to reduce holes. On x86_64, this shrinks the size of 'struct nvmefc_fcp_req' from 112 to 104 bytes. This structure is embedded in some other structures (nvme_fc_fcp_op which itself is embedded in nvme_fcp_op_w_sgl), so it helps reducing the size of these structures too. Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-06-12blk-mq: fix potential io hang by wrong 'wake_batch'Yu Kuai1-2/+1
In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating 'wake_batch' is not atomic: t1: t2: _blk_mq_tag_busy blk_mq_tag_busy inc active_queues // assume 1->2 inc active_queues // 2 -> 3 blk_mq_update_wake_batch // calculate based on 3 blk_mq_update_wake_batch /* calculate based on 2, while active_queues is actually 3. */ Fix this problem by protecting them wih 'tags->lock', this is not a hot path, so performance should not be concerned. And now that all writers are inside the lock, switch 'actives_queues' from atomic to unsigned int. Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened") Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12fs: remove the now unused FMODE_* flagsChristoph Hellwig1-7/+0
FMODE_NDELAY, FMODE_EXCL and FMODE_WRITE_IOCTL were only used for block internal purposed and are now entirely unused, so remove them. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig3-12/+29
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Jack Wang <[email protected]> [rnbd] Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12block: move a few internal definitions out of blkdev.hChristoph Hellwig1-27/+0
All these helpers are only used in core block code, so move them out of the public header. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12mtd: block: use a simple bool to track open for writeChristoph Hellwig1-1/+1
Instead of propagating the fmode_t, just use a bool to track if a mtd block device was opened for writing. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Richard Weinberger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12scsi: replace the fmode_t argument to ->sg_io_fn with a simple boolChristoph Hellwig1-1/+1
Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass a bool open_for_write to prepare for callers that won't have the fmode_t. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12fs: remove sb->s_modeChristoph Hellwig1-1/+0
There is no real need to store the open mode in the super_block now. It is only used by f2fs, which can easily recalculate it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12block: add a sb_open_mode helperChristoph Hellwig1-0/+7
Add a helper to return the open flags for blkdev_get_by* for passed in super block flags instead of open coding the logic in many places. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12block: use the holder as indication for exclusive opensChristoph Hellwig1-1/+1
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: David Sterba <[email protected]> [btrfs] Acked-by: Jack Wang <[email protected]> [rnbd] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12block: remove the unused mode argument to ->releaseChristoph Hellwig1-1/+1
The mode argument to the ->release block_device_operation is never used, so remove it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Jack Wang <[email protected]> [rnbd] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12block: pass a gendisk to ->openChristoph Hellwig1-1/+1
->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Jack Wang <[email protected]> [rnbd] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12block: pass a gendisk on bdev_check_media_changeChristoph Hellwig1-1/+1
bdev_check_media_change should only ever be called for the whole device. Pass a gendisk to make that explicit and rename the function to disk_check_media_change. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12cdrom: remove the unused mode argument to cdrom_releaseChristoph Hellwig1-1/+1
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Phillip Potter <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12cdrom: track if a cdrom_device_info was opened for dataChristoph Hellwig1-0/+1
Set a flag when a cdrom_device_info is opened for writing, instead of trying to figure out this at release time. This will allow to eventually remove the mode argument to the ->release block_device_operation as nothing but the CDROM drivers uses that argument. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Phillip Potter <[email protected]> Acked-by: Christian Brauner <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12cdrom: remove the unused cdrom_close_write release codeChristoph Hellwig1-1/+0
cdrom_close_write is empty, and the for_data flag it is keyed off is never set. Remove all this clutter. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Phillip Potter <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12cdrom: remove the unused mode argument to cdrom_ioctlChristoph Hellwig1-2/+2
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Phillip Potter <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-12cdrom: remove the unused bdev argument to cdrom_openChristoph Hellwig1-2/+1
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Phillip Potter <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-07pktcdvd: Get rid of custom printing macrosAndy Shevchenko1-1/+0
We may use traditional dev_*() macros instead of custom ones provided by the driver. Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: mark early_lookup_bdev as __initChristoph Hellwig1-1/+1
early_lookup_bdev is now only used during the early boot code as it should, so mark it __init to not waste run time memory on it. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05dm: remove dm_get_dev_tChristoph Hellwig1-2/+0
Open code dm_get_dev_t in the only remaining caller, and propagate the exact error code from lookup_bdev and early_lookup_bdev. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: move more code to early-lookup.cChristoph Hellwig1-1/+0
blk_lookup_devt is only used by code in early-lookup.c, so move it there. printk_all_partitions and it's helper bdevt_str are only used by the early init code in init/do_mounts.c, so they should go there as well. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: improve the name_to_dev_t interfaceChristoph Hellwig2-1/+5
name_to_dev_t has a very misleading name, that doesn't make clear it should only be used by the early init code, and also has a bad calling convention that doesn't allow returning different kinds of errors. Rename it to early_lookup_bdev to make the use case clear, and return an errno, where -EINVAL means the string could not be parsed, and -ENODEV means it the string was valid, but there was no device found for it. Also stub out the whole call for !CONFIG_BLOCK as all the non-block root cases are always covered in the caller. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: handle ubi/mtd root mounting like all other root typesChristoph Hellwig1-0/+1
Assign a Root_Generic magic value for UBI/MTD root and handle the root mounting in mount_root like all other root types. Besides making the code more clear this also means that UBI/MTD root can be used together with an initrd (not that anyone should care). Also factor parsing of the root name into a helper now that it can be easily done and will get more complicated with subsequent patches. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05init: remove pointless Root_* valuesChristoph Hellwig1-8/+0
Remove all unused defines, and just use the expanded versions for the SCSI disk majors. I've decided to keep Root_RAM0 even if it could be expanded as there is a lot of special casing for it in the init code. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05driver core: return bool from driver_probe_doneChristoph Hellwig1-1/+1
bool is the most sensible return value for a yes/no return. Also add __init as this funtion is only called from the early boot code. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05fs: add a method to shut down the file systemChristoph Hellwig1-0/+1
Add a new ->shutdown super operation that can be used to tell the file system to shut down, and call it from newly created holder ops when the block device under a file system shuts down. This only covers the main block device for "simple" file systems using get_tree_bdev / mount_bdev. File systems their own get_tree method or opening additional devices will need to set up their own blk_holder_ops. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: add a mark_dead holder operationChristoph Hellwig1-0/+1
Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead to notify the holder that the block device it is using has been marked dead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-05block: introduce holder opsChristoph Hellwig2-3/+10
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Dave Chinner <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-01block: mark bio_add_folio as __must_checkJohannes Thumshirn1-1/+2
Now that all callers of bio_add_folio() check the return value, mark it as __must_check. Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/381360a45ac3684120cfbe1e07685e9c36256e47.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <[email protected]>
2023-06-01block: add bio_add_folio_nofailJohannes Thumshirn1-0/+2
Just like for bio_add_pages() add a no-fail variant for bio_add_folio(). Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/924dff4077812804398ef84128fb920507fa4be1.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <[email protected]>
2023-06-01block: mark bio_add_page as __must_checkJohannes Thumshirn1-1/+2
Now that all users of bio_add_page check for the return value, mark bio_add_page as __must_check. Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/7ae4a902e08fe2e90c012ee07aeb35d4aae28373.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <[email protected]>
2023-05-31mm: Provide a function to get an additional pin on a pageDavid Howells1-0/+1
Provide a function to get an additional pin on a page that we already have a pin on. This will be used in fs/direct-io.c when dispatching multiple bios to a page we've extracted from a user-backed iter rather than redoing the extraction. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: David Hildenbrand <[email protected]> cc: Lorenzo Stoakes <[email protected]> cc: Andrew Morton <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Jan Kara <[email protected]> cc: Jeff Layton <[email protected]> cc: Jason Gunthorpe <[email protected]> cc: Logan Gunthorpe <[email protected]> cc: Hillf Danton <[email protected]> cc: Christian Brauner <[email protected]> cc: Linus Torvalds <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: David Hildenbrand <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-31mm: Don't pin ZERO_PAGE in pin_user_pages()David Howells1-2/+24
Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer to it from the page tables and make unpin_user_page*() correspondingly ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a zero page's refcount as we're only allowed ~2 million pins on it - something that userspace can conceivably trigger. Add a pair of functions to test whether a page or a folio is a ZERO_PAGE. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: David Hildenbrand <[email protected]> cc: Lorenzo Stoakes <[email protected]> cc: Andrew Morton <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Jan Kara <[email protected]> cc: Jeff Layton <[email protected]> cc: Jason Gunthorpe <[email protected]> cc: Logan Gunthorpe <[email protected]> cc: Hillf Danton <[email protected]> cc: Christian Brauner <[email protected]> cc: Linus Torvalds <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: David Hildenbrand <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-30block: constify struct part_type part_typeThomas Weißschuh1-1/+1
The struct is never modified so it can be const. Signed-off-by: Thomas Weißschuh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24block: Add BIO_PAGE_PINNED and associated infrastructureDavid Howells2-1/+3
Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned (FOLL_PIN) and that the pin will need removing. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: John Hubbard <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Jan Kara <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Logan Gunthorpe <[email protected]> cc: [email protected] Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logicChristoph Hellwig2-2/+2
Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted meaning is only set when a page reference has been acquired that needs to be released by bio_release_pages(). Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: John Hubbard <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Jan Kara <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Logan Gunthorpe <[email protected]> cc: [email protected] Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24block: Fix bio_flagged() so that gcc can better optimise itDavid Howells1-1/+1
Fix bio_flagged() so that multiple instances of it, such as: if (bio_flagged(bio, BIO_PAGE_REFFED) || bio_flagged(bio, BIO_PAGE_PINNED)) can be combined by the gcc optimiser into a single test in assembly (arguably, this is a compiler optimisation issue[1]). The missed optimisation stems from bio_flagged() comparing the result of the bitwise-AND to zero. This results in an out-of-line bio_release_page() being compiled to something like: <+0>: mov 0x14(%rdi),%eax <+3>: test $0x1,%al <+5>: jne 0xffffffff816dac53 <bio_release_pages+11> <+7>: test $0x2,%al <+9>: je 0xffffffff816dac5c <bio_release_pages+20> <+11>: movzbl %sil,%esi <+15>: jmp 0xffffffff816daba1 <__bio_release_pages> <+20>: jmp 0xffffffff81d0b800 <__x86_return_thunk> However, the test is superfluous as the return type is bool. Removing it results in: <+0>: testb $0x3,0x14(%rdi) <+4>: je 0xffffffff816e4af4 <bio_release_pages+15> <+6>: movzbl %sil,%esi <+10>: jmp 0xffffffff816dab7c <__bio_release_pages> <+15>: jmp 0xffffffff81d0b7c0 <__x86_return_thunk> instead. Also, the MOVZBL instruction looks unnecessary[2] - I think it's just 're-booling' the mark_dirty parameter. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: John Hubbard <[email protected]> cc: Jens Axboe <[email protected]> cc: [email protected] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2] Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6 Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24Merge branch 'for-6.5/splice' into for-6.5/blockJens Axboe3-19/+6
Merge splice bits as subsequent block cleanups and improvements for DIO depend on them. * for-6.5/splice: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-05-24iov_iter: Kill ITER_PIPEDavid Howells1-14/+0
The ITER_PIPE-type iterator was only used by generic_file_splice_read() and that has been replaced and removed. This leaves ITER_PIPE unused - so remove it too. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Remove generic_file_splice_read()David Howells1-2/+0
Remove generic_file_splice_read() as it has been replaced with calls to filemap_splice_read() and copy_splice_read(). With this, ITER_PIPE is no longer used. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Jens Axboe <[email protected]> cc: Steve French <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Make do_splice_to() generic and export itDavid Howells1-0/+3
Rename do_splice_to() to vfs_splice_read() and export it so that it can be used as a helper when calling down to a lower layer filesystem as it performs all the necessary checks[1]. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Miklos Szeredi <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: John Hubbard <[email protected]> cc: David Hildenbrand <[email protected]> cc: Matthew Wilcox <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/CAJfpeguGksS3sCigmRi9hJdUec8qtM9f+_9jC1rJhsXT+dV01w@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Rename direct_splice_read() to copy_splice_read()David Howells1-3/+3
Rename direct_splice_read() to copy_splice_read() to better reflect as to what it does. Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Steve French <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-23block/rq_qos: protect rq_qos apis with a new lockYu Kuai1-0/+1
commit 50e34d78815e ("block: disable the elevator int del_gendisk") move rq_qos_exit() from disk_release() to del_gendisk(), this will introduce some problems: 1) If rq_qos_add() is triggered by enabling iocost/iolatency through cgroupfs, then it can concurrent with del_gendisk(), it's not safe to write 'q->rq_qos' concurrently. 2) Activate cgroup policy that is relied on rq_qos will call rq_qos_add() and blkcg_activate_policy(), and if rq_qos_exit() is called in the middle, null-ptr-dereference will be triggered in blkcg_activate_policy(). 3) blkg_conf_open_bdev() can call blkdev_get_no_open() first to find the disk, then if rq_qos_exit() from del_gendisk() is done before rq_qos_add(), then memory will be leaked. This patch add a new disk level mutex 'rq_qos_mutex': 1) The lock will protect rq_qos_exit() directly. 2) For wbt that doesn't relied on blk-cgroup, rq_qos_add() can only be called from disk initialization for now because wbt can't be destructed until rq_qos_exit(), so it's safe not to protect wbt for now. Hoever, in case that rq_qos dynamically destruction is supported in the furture, this patch also protect rq_qos_add() from wbt_init() directly, this is enough because blk-sysfs already synchronize writers with disk removal. 3) For iocost and iolatency, in order to synchronize disk removal and cgroup configuration, the lock is held after blkdev_get_no_open() from blkg_conf_open_bdev(), and is released in blkg_conf_exit(). In order to fix the above memory leak, disk_live() is checked after holding the new lock. Fixes: 50e34d78815e ("block: disable the elevator int del_gendisk") Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-23block: remove redundant req_op in blk_rq_is_passthroughLi Nan1-1/+1
op &= REQ_OP_MASK in blk_op_is_passthrough() is exactly what req_op() do. Therefore, it is redundant to call req_op() for blk_op_is_passthrough(). Signed-off-by: Li Nan <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>