aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2023-05-31fs: buffer: use __bio_add_page to add single page to bioJohannes Thumshirn1-2/+1
The buffer_head submission code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Gou Hao <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Link: https://lore.kernel.org/r/84ff2dcbe81b258a73ad900adb5266e208b61a4d.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <[email protected]>
2023-05-31block: Use iov_iter_extract_pages() and page pinning in direct-io.cDavid Howells1-29/+43
Change the old block-based direct-I/O code to use iov_iter_extract_pages() to pin user pages or leave kernel pages unpinned rather than taking refs when submitting bios. This makes use of the preceding patches to not take pins on the zero page (thereby allowing insertion of zero pages in with pinned pages) and to get additional pins on pages, allowing an extracted page to be used in multiple bios without having to re-extract it. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: David Hildenbrand <[email protected]> cc: Lorenzo Stoakes <[email protected]> cc: Andrew Morton <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Jan Kara <[email protected]> cc: Jeff Layton <[email protected]> cc: Jason Gunthorpe <[email protected]> cc: Logan Gunthorpe <[email protected]> cc: Hillf Danton <[email protected]> cc: Christian Brauner <[email protected]> cc: Linus Torvalds <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logicChristoph Hellwig2-1/+2
Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted meaning is only set when a page reference has been acquired that needs to be released by bio_release_pages(). Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: John Hubbard <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Jan Kara <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Logan Gunthorpe <[email protected]> cc: [email protected] Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroingDavid Howells1-1/+1
ZERO_PAGE can't go away, no need to hold an extra reference. Signed-off-by: David Howells <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: [email protected] Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24Merge branch 'for-6.5/splice' into for-6.5/blockJens Axboe57-159/+521
Merge splice bits as subsequent block cleanups and improvements for DIO depend on them. * for-6.5/splice: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-05-24splice: kdoc for filemap_splice_read() and copy_splice_read()David Howells1-2/+19
Provide kerneldoc comments for filemap_splice_read() and copy_splice_read(). Signed-off-by: David Howells <[email protected]> cc: Christian Brauner <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Jens Axboe <[email protected]> cc: Steve French <[email protected]> cc: Al Viro <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Remove generic_file_splice_read()David Howells1-43/+0
Remove generic_file_splice_read() as it has been replaced with calls to filemap_splice_read() and copy_splice_read(). With this, ITER_PIPE is no longer used. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Jens Axboe <[email protected]> cc: Steve French <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Use filemap_splice_read() instead of generic_file_splice_read()David Howells36-38/+38
Replace pointers to generic_file_splice_read() with calls to filemap_splice_read(). Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24cifs: Use filemap_splice_read()David Howells3-23/+4
Make cifs use filemap_splice_read() rather than doing its own version of generic_file_splice_read(). Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Paulo Alcantara (SUSE) <[email protected]> cc: Jens Axboe <[email protected]> cc: Steve French <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24zonefs: Provide a splice-read wrapperDavid Howells1-1/+39
Provide a splice_read wrapper for zonefs. This does some checks before proceeding and locks the inode across the call to filemap_splice_read() and a size check in case of truncation. Splicing from direct I/O is handled by the caller. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Darrick J. Wong <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Acked-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24xfs: Provide a splice-read wrapperDavid Howells2-2/+30
Provide a splice_read wrapper for XFS. This does a stat count and a shutdown check before proceeding, then emits a new trace line and locks the inode across the call to filemap_splice_read() and adds to the stats afterwards. Splicing from direct I/O or DAX is handled by the caller. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Darrick J. Wong <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24orangefs: Provide a splice-read wrapperDavid Howells1-1/+21
Provide a splice_read wrapper for ocfs2. This increments the read stats and then locks the inode across the call to filemap_splice_read() and a revalidation of the mapping. Splicing from direct I/O is done by the caller. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Mike Marshall <[email protected]> cc: Martin Brandenburg <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24ocfs2: Provide a splice-read wrapperDavid Howells2-2/+42
Provide a splice_read wrapper for ocfs2. This emits trace lines and does an atime lock/update before calling filemap_splice_read(). Splicing from direct I/O is handled by the caller. A couple of new tracepoints are added for this purpose. Signed-off-by: David Howells <[email protected]> Reviewed-by: Joseph Qi <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Mark Fasheh <[email protected]> cc: Joel Becker <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24ntfs3: Provide a splice-read wrapperDavid Howells1-1/+30
Provide a splice_read wrapper for NTFS3 to perform various checks before allowing the operation to proceed. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Konstantin Komarov <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24nfs: Provide a splice-read wrapperDavid Howells3-2/+25
Provide a splice_read wrapper for NFS. This locks the inode around filemap_splice_read() and revalidates the mapping. Splicing from direct I/O is handled by the caller. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Trond Myklebust <[email protected]> cc: Anna Schumaker <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24f2fs: Provide a splice-read wrapperDavid Howells1-8/+35
Provide a splice_read wrapper for f2fs. This does some checks and tracing before calling filemap_splice_read() and will update the iostats afterwards. Direct I/O is handled by the caller. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Jaegeuk Kim <[email protected]> cc: Chao Yu <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24ext4: Provide a splice-read wrapperDavid Howells1-1/+12
Provide a splice_read wrapper for Ext4. This does the inode shutdown check before proceeding. Splicing from DAX files and O_DIRECT fds is handled by the caller. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Theodore Ts'o <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Andreas Dilger <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24ecryptfs: Provide a splice-read wrapperDavid Howells1-1/+26
Provide a splice_read wrapper for ecryptfs to update the access time on the lower file after the operation. Splicing from a direct I/O fd will update the access time when ->read_iter() is called. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Tyler Hicks <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24ceph: Provide a splice-read wrapperDavid Howells1-1/+64
Provide a splice_read wrapper for Ceph. This does the inode shutdown check before proceeding and jumps to copy_splice_read() if the file has inline data or is a synchronous file. We try and get FILE_RD and either FILE_CACHE and/or FILE_LAZYIO caps and hold them across filemap_splice_read(). If we fail to get FILE_CACHE or FILE_LAZYIO capabilities, we use copy_splice_read() instead. Signed-off-by: David Howells <[email protected]> Reviewed-by: Xiubo Li <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Ilya Dryomov <[email protected]> cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24afs: Provide a splice-read wrapperDavid Howells1-1/+19
Provide a splice_read wrapper for AFS to call afs_validate() before going into generic_file_splice_read() so that we're likely to have a callback promise from the server. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-249p: Add splice_read wrapperDavid Howells1-2/+24
Add a splice_read wrapper for 9p. We should use copy_splice_read() if 9PL_DIRECT is set and filemap_splice_read() otherwise. Note that this doesn't seem to be particularly related to O_DIRECT. Signed-off-by: David Howells <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Dominique Martinet <[email protected]> cc: Eric Van Hensbergen <[email protected]> cc: Latchesar Ionkov <[email protected]> cc: Christian Schoenebeck <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24tty, proc, kernfs, random: Use copy_splice_read()David Howells4-7/+7
Use copy_splice_read() for tty, procfs, kernfs and random files rather than going through generic_file_splice_read() as they just copy the file into the output buffer and don't splice pages. This avoids the need for them to have a ->read_folio() to satisfy filemap_splice_read(). Signed-off-by: David Howells <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: John Hubbard <[email protected]> cc: David Hildenbrand <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Miklos Szeredi <[email protected]> cc: Arnd Bergmann <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24coda: Implement splice-readDavid Howells1-1/+28
Implement splice-read for coda by passing the request down a layer rather than going through generic_file_splice_read() which is going to be changed to assume that ->read_folio() is present on buffered files. Signed-off-by: David Howells <[email protected]> Acked-by: Jan Harkes <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: John Hubbard <[email protected]> cc: David Hildenbrand <[email protected]> cc: Matthew Wilcox <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24overlayfs: Implement splice-readDavid Howells1-1/+22
Implement splice-read for overlayfs by passing the request down a layer rather than going through generic_file_splice_read() which is going to be changed to assume that ->read_folio() is present on buffered files. Signed-off-by: David Howells <[email protected]> Acked-by: Christian Brauner <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: John Hubbard <[email protected]> cc: David Hildenbrand <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Miklos Szeredi <[email protected]> cc: Amir Goldstein <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Make splice from a DAX file use copy_splice_read()David Howells1-3/+3
Make a read splice from a DAX file go directly to copy_splice_read() to do the reading as filemap_splice_read() is unlikely to find any pagecache to splice. I think this affects only erofs, Ext2, Ext4, fuse and XFS. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Theodore Ts'o <[email protected]> Reviewed-by: Gao Xiang <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Make splice from an O_DIRECT fd use copy_splice_read()David Howells1-0/+6
Make a read splice from a file descriptor that's open O_DIRECT use copy_splice_read() to do the reading as filemap_splice_read() is unlikely to find any pagecache to splice. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Check for zero count in vfs_splice_read()David Howells1-0/+2
Make vfs_splice_read() return immediately if the length is 0. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Make do_splice_to() generic and export itDavid Howells1-7/+20
Rename do_splice_to() to vfs_splice_read() and export it so that it can be used as a helper when calling down to a lower layer filesystem as it performs all the necessary checks[1]. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Miklos Szeredi <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: John Hubbard <[email protected]> cc: David Hildenbrand <[email protected]> cc: Matthew Wilcox <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/CAJfpeguGksS3sCigmRi9hJdUec8qtM9f+_9jC1rJhsXT+dV01w@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Clean up copy_splice_read() a bitDavid Howells1-12/+7
Do a couple of cleanups to copy_splice_read(): (1) Cast to struct page **, not void *. (2) Simplify the calculation of the number of pages to keep/reclaim in copy_splice_read(). Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24splice: Rename direct_splice_read() to copy_splice_read()David Howells3-9/+8
Rename direct_splice_read() to copy_splice_read() to better reflect as to what it does. Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Steve French <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-19fs: remove the special !CONFIG_BLOCK def_blk_fopsChristoph Hellwig3-28/+4
def_blk_fops always returns -ENODEV, which dosn't match the return value of a non-existing block device with CONFIG_BLOCK, which is -ENXIO. Just remove the extra implementation and fall back to the default no_open_fops that always returns -ENXIO. Fixes: 9361401eb761 ("[PATCH] BLOCK: Make it possible to disable the block layer [try #6]") Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-13Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds13-104/+269
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Some ext4 bug fixes (mostly to address Syzbot reports)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: bail out of ext4_xattr_ibody_get() fails for any reason ext4: add bounds checking in get_max_inline_xattr_value_size() ext4: add indication of ro vs r/w mounts in the mount message ext4: fix deadlock when converting an inline directory in nojournal mode ext4: improve error recovery code paths in __ext4_remount() ext4: improve error handling from ext4_dirhash() ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled ext4: check iomap type only if ext4_iomap_begin() does not fail ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum ext4: fix data races when using cached status extents ext4: avoid deadlock in fs reclaim with page writeback ext4: fix invalid free tracking in ext4_xattr_move_to_block() ext4: remove a BUG_ON in ext4_mb_release_group_pa() ext4: allow ext4_get_group_info() to fail ext4: fix lockdep warning when enabling MMP ext4: fix WARNING in mb_find_extent
2023-05-13ext4: bail out of ext4_xattr_ibody_get() fails for any reasonTheodore Ts'o1-1/+1
In ext4_update_inline_data(), if ext4_xattr_ibody_get() fails for any reason, it's best if we just fail as opposed to stumbling on, especially if the failure is EFSCORRUPTED. Cc: [email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: add bounds checking in get_max_inline_xattr_value_size()Theodore Ts'o1-1/+11
Normally the extended attributes in the inode body would have been checked when the inode is first opened, but if someone is writing to the block device while the file system is mounted, it's possible for the inode table to get corrupted. Add bounds checking to avoid reading beyond the end of allocated memory if this happens. Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?extid=1966db24521e5f6e23f7 Cc: [email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: add indication of ro vs r/w mounts in the mount messageTheodore Ts'o1-4/+6
Whether the file system is mounted read-only or read/write is more important than the quota mode, which we are already printing. Add the ro vs r/w indication since this can be helpful in debugging problems from the console log. Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: fix deadlock when converting an inline directory in nojournal modeTheodore Ts'o1-1/+2
In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock by calling ext4_handle_dirty_dirblock() when it already has taken the directory lock. There is a similar self-deadlock in ext4_incvert_inline_data_nolock() for data files which we'll fix at the same time. A simple reproducer demonstrating the problem: mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64 mount -t ext4 -o dirsync /dev/vdc /vdc cd /vdc mkdir file0 cd file0 touch file0 touch file1 attr -s BurnSpaceInEA -V abcde . touch supercalifragilisticexpialidocious Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe8a Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: improve error recovery code paths in __ext4_remount()Theodore Ts'o1-3/+10
If there are failures while changing the mount options in __ext4_remount(), we need to restore the old mount options. This commit fixes two problem. The first is there is a chance that we will free the old quota file names before a potential failure leading to a use-after-free. The second problem addressed in this commit is if there is a failed read/write to read-only transition, if the quota has already been suspended, we need to renable quota handling. Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: improve error handling from ext4_dirhash()Theodore Ts'o2-17/+42
The ext4_dirhash() will *almost* never fail, especially when the hash tree feature was first introduced. However, with the addition of support of encrypted, casefolded file names, that function can most certainly fail today. So make sure the callers of ext4_dirhash() properly check for failures, and reflect the errors back up to their callers. Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Reported-by: [email protected] Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?id=db56459ea4ac4a676ae4b4678f633e55da005a9b Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabledTheodore Ts'o1-1/+5
When a file system currently mounted read/only is remounted read/write, if we clear the SB_RDONLY flag too early, before the quota is initialized, and there is another process/thread constantly attempting to create a directory, it's possible to trigger the WARN_ON_ONCE(dquot_initialize_needed(inode)); in ext4_xattr_block_set(), with the following stack trace: WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680 RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141 Call Trace: ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458 ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44 security_inode_init_security+0x2df/0x3f0 security/security.c:1147 __ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324 ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992 vfs_mkdir+0x29d/0x450 fs/namei.c:4038 do_mkdirat+0x264/0x520 fs/namei.c:4061 __do_sys_mkdirat fs/namei.c:4076 [inline] __se_sys_mkdirat fs/namei.c:4074 [inline] __x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074 Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c34c Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: check iomap type only if ext4_iomap_begin() does not failBaokun Li1-1/+1
When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may fail for some reason (e.g. memory allocation failure, bare disk write), and later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4 iomap_begin() returns an error, it is normal that the type of iomap->type may not match the expectation. Therefore, we only determine if iomap->type is as expected when ext4_iomap_begin() is executed successfully. Cc: [email protected] Reported-by: [email protected] Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csumTudor Ambarus1-4/+2
When modifying the block device while it is mounted by the filesystem, syzbot reported the following: BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58 Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586 CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc96618275234 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:306 print_report+0x107/0x1f0 mm/kasan/report.c:417 kasan_report+0xcd/0x100 mm/kasan/report.c:517 crc16+0x206/0x280 lib/crc16.c:58 ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187 ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210 ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline] ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173 ext4_remove_blocks fs/ext4/extents.c:2527 [inline] ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline] ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958 ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416 ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342 ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622 notify_change+0xe50/0x1100 fs/attr.c:482 do_truncate+0x200/0x2f0 fs/open.c:65 handle_truncate fs/namei.c:3216 [inline] do_open fs/namei.c:3561 [inline] path_openat+0x272b/0x2dd0 fs/namei.c:3714 do_filp_open+0x264/0x4f0 fs/namei.c:3741 do_sys_openat2+0x124/0x4e0 fs/open.c:1310 do_sys_open fs/open.c:1326 [inline] __do_sys_creat fs/open.c:1402 [inline] __se_sys_creat fs/open.c:1396 [inline] __x64_sys_creat+0x11f/0x160 fs/open.c:1396 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f72f8a8c0c9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280 RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000 Replace le16_to_cpu(sbi->s_es->s_desc_size) with sbi->s_desc_size It reduces ext4's compiled text size, and makes the code more efficient (we remove an extra indirect reference and a potential byte swap on big endian systems), and there is no downside. It also avoids the potential KASAN / syzkaller failure, as a bonus. Reported-by: [email protected] Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?id=70d28d11ab14bd7938f3e088365252aa923cff42 Link: https://syzkaller.appspot.com/bug?id=b85721b38583ecc6b5e72ff524c67302abbc30f3 Link: https://lore.kernel.org/all/[email protected]/ Fixes: 717d50e4971b ("Ext4: Uninitialized Block Groups") Cc: [email protected] Signed-off-by: Tudor Ambarus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: fix data races when using cached status extentsJan Kara1-17/+13
When using cached extent stored in extent status tree in tree->cache_es another process holding ei->i_es_lock for reading can be racing with us setting new value of tree->cache_es. If the compiler would decide to refetch tree->cache_es at an unfortunate moment, it could result in a bogus in_range() check. Fix the possible race by using READ_ONCE() when using tree->cache_es only under ei->i_es_lock for reading. Cc: [email protected] Reported-by: [email protected] Link: https://lore.kernel.org/all/[email protected] Suggested-by: Dmitry Vyukov <[email protected]> Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: avoid deadlock in fs reclaim with page writebackJan Kara3-13/+40
Ext4 has a filesystem wide lock protecting ext4_writepages() calls to avoid races with switching of journalled data flag or inode format. This lock can however cause a deadlock like: CPU0 CPU1 ext4_writepages() percpu_down_read(sbi->s_writepages_rwsem); ext4_change_inode_journal_flag() percpu_down_write(sbi->s_writepages_rwsem); - blocks, all readers block from now on ext4_do_writepages() ext4_init_io_end() kmem_cache_zalloc(io_end_cachep, GFP_KERNEL) fs_reclaim frees dentry... dentry_unlink_inode() iput() - last ref => iput_final() - inode dirty => write_inode_now()... ext4_writepages() tries to acquire sbi->s_writepages_rwsem and blocks forever Make sure we cannot recurse into filesystem reclaim from writeback code to avoid the deadlock. Reported-by: [email protected] Link: https://lore.kernel.org/all/[email protected] Fixes: c8585c6fcaf2 ("ext4: fix races between changing inode journal mode and ext4_writepages") CC: [email protected] Signed-off-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: fix invalid free tracking in ext4_xattr_move_to_block()Theodore Ts'o1-2/+3
In ext4_xattr_move_to_block(), the value of the extended attribute which we need to move to an external block may be allocated by kvmalloc() if the value is stored in an external inode. So at the end of the function the code tried to check if this was the case by testing entry->e_value_inum. However, at this point, the pointer to the xattr entry is no longer valid, because it was removed from the original location where it had been stored. So we could end up calling kvfree() on a pointer which was not allocated by kvmalloc(); or we could also potentially leak memory by not freeing the buffer when it should be freed. Fix this by storing whether it should be freed in a separate variable. Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Link: https://syzkaller.appspot.com/bug?id=5c2aee8256e30b55ccf57312c16d88417adbd5e1 Link: https://syzkaller.appspot.com/bug?id=41a6b5d4917c0412eb3b3c3c604965bed7d7420b Reported-by: [email protected] Reported-by: [email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: remove a BUG_ON in ext4_mb_release_group_pa()Theodore Ts'o1-1/+5
If a malicious fuzzer overwrites the ext4 superblock while it is mounted such that the s_first_data_block is set to a very large number, the calculation of the block group can underflow, and trigger a BUG_ON check. Change this to be an ext4_warning so that we don't crash the kernel. Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220 Signed-off-by: Theodore Ts'o <[email protected]>
2023-05-13ext4: allow ext4_get_group_info() to failTheodore Ts'o5-29/+82
Previously, ext4_get_group_info() would treat an invalid group number as BUG(), since in theory it should never happen. However, if a malicious attaker (or fuzzer) modifies the superblock via the block device while it is the file system is mounted, it is possible for s_first_data_block to get set to a very large number. In that case, when calculating the block group of some block number (such as the starting block of a preallocation region), could result in an underflow and very large block group number. Then the BUG_ON check in ext4_get_group_info() would fire, resutling in a denial of service attack that can be triggered by root or someone with write access to the block device. For a quality of implementation perspective, it's best that even if the system administrator does something that they shouldn't, that it will not trigger a BUG. So instead of BUG'ing, ext4_get_group_info() will call ext4_error and return NULL. We also add fallback code in all of the callers of ext4_get_group_info() that it might NULL. Also, since ext4_get_group_info() was already borderline to be an inline function, un-inline it. The results in a next reduction of the compiled text size of ext4 by roughly 2k. Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220 Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Jan Kara <[email protected]>
2023-05-12Merge tag 'for-6.4-rc1-tag' of ↵Linus Torvalds11-33/+102
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull more btrfs fixes from David Sterba: - fix incorrect number of bitmap entries for space cache if loading is interrupted by some error - fix backref walking, this breaks a mode of LOGICAL_INO_V2 ioctl that is used in deduplication tools - zoned mode fixes: - properly finish zone reserved for relocation - correctly calculate super block zone end on ZNS - properly initialize new extent buffer for redirty - make mount option clear_cache work with block-group-tree, to rebuild free-space-tree instead of temporarily disabling it that would lead to a forced read-only mount - fix alignment check for offset when printing extent item * tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: make clear_cache mount option to rebuild FST without disabling it btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add btrfs: zoned: fix full zone super block reading on ZNS btrfs: zoned: zone finish data relocation BG with last IO btrfs: fix backref walking not returning all inode refs btrfs: fix space cache inconsistency after error loading it from disk btrfs: print-tree: parent bytenr must be aligned to sector size
2023-05-12Merge tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds4-2/+28
Pull cifs client fixes from Steve French: - fix for copy_file_range bug for very large files that are multiples of rsize - do not ignore "isolated transport" flag if set on share - set rasize default better - three fixes related to shutdown and freezing (fixes 4 xfstests, and closes deferred handles faster in some places that were missed) * tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: release leases for deferred close handles when freezing smb3: fix problem remounting a share after shutdown SMB3: force unmount was failing to close deferred close files smb3: improve parallel reads of large files do not reuse connection if share marked as isolated cifs: fix pcchunk length type in smb2_copychunk_range
2023-05-12Merge tag 'vfs/v6.4-rc1/pipe' of ↵Linus Torvalds1-2/+4
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fix from Christian Brauner: "During the pipe nonblock rework the check for both O_NONBLOCK and IOCB_NOWAIT was dropped. Both checks need to be performed to ensure that files without O_NONBLOCK but IOCB_NOWAIT don't block when writing to or reading from a pipe. This just contains the fix adding the check for IOCB_NOWAIT back in" * tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
2023-05-12pipe: check for IOCB_NOWAIT alongside O_NONBLOCKJens Axboe1-2/+4
Pipe reads or writes need to enable nonblocking attempts, if either O_NONBLOCK is set on the file, or IOCB_NOWAIT is set in the iocb being passed in. The latter isn't currently true, ensure we check for both before waiting on data or space. Fixes: afed6271f5b0 ("pipe: set FMODE_NOWAIT on pipes") Signed-off-by: Jens Axboe <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>