path: root/fs
Age Commit message Author Files Lines
2024-10-07xfs: don't ifdef around the exact minlen allocationsChristoph Hellwig3-14/+3
Exact minlen allocations only exist as an error injection tool for debug builds. Currently this is implemented using ifdefs, which means the code isn't even compiled for non-XFS_DEBUG builds. Enhance the compile test coverage by always building the code and use the compiler's dead code elimination to remove it from the generated binary instead. The only downside is that the alloc_minlen_only field is unconditionally added to struct xfs_alloc_args now, but by moving it around and packing it tightly this doesn't actually increase the size of the structure. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
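Illustrative note (not part of the commit): the technique described above can be sketched in plain userspace C. The names SKETCH_DEBUG, struct alloc_args and pick_len() are hypothetical stand-ins, not the XFS identifiers. The point is only that a branch guarded by a compile-time constant is always type-checked, yet an optimizing compiler can drop it when the constant is 0.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a build-time debug switch; 0 means "non-debug". */
#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 0
#endif

/* Hypothetical argument structure; the flag is present in every build. */
struct alloc_args {
	unsigned int minlen;
	unsigned int maxlen;
	bool alloc_minlen_only;	/* only honoured when SKETCH_DEBUG is 1 */
};

/* Return the length to allocate for the given arguments. */
static unsigned int pick_len(const struct alloc_args *args)
{
	/*
	 * No #ifdef here: the branch is always compiled, so it cannot
	 * silently rot. With SKETCH_DEBUG == 0 the condition is a constant
	 * false and the compiler is free to eliminate the branch.
	 */
	if (SKETCH_DEBUG && args->alloc_minlen_only)
		return args->minlen;
	return args->maxlen;
}
```

Building the same file with -DSKETCH_DEBUG=1 makes the flag take effect without touching the function body.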
2024-10-07xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocateChristoph Hellwig1-45/+28
Userdata and metadata allocations end up in the same allocation helpers. Remove the separate xfs_bmap_alloc_userdata function to make this more clear. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
2024-10-07xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addnameChristoph Hellwig1-5/+8
Just like xfs_attr3_leaf_split, xfs_attr_node_try_addname can return -ENOSPC both for an actual failure to allocate a disk block, but also to signal the caller to convert the format of the attr fork. Use magic 1 to ask for the conversion here as well. Note that unlike the similar issue in xfs_attr3_leaf_split, this one was only found by code review. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
2024-10-07xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_splitChristoph Hellwig2-3/+7
xfs_attr3_leaf_split propagates the need for an extra btree split as -ENOSPC to its only caller, but the same return value can also be returned from xfs_da_grow_inode when it fails to find free space. Distinguish the two cases by returning 1 for the extra split case instead of overloading -ENOSPC. This can be triggered relatively easily with the pending realtime group support and a file system with a lot of small zones that use metadata space on the main device. In this case roughly every 5th to 10th run of xfs/538 runs into the following assert: ASSERT(oldblk->magic == XFS_ATTR_LEAF_MAGIC); in xfs_attr3_leaf_split caused by an allocation failure. Note that the allocation failure is caused by another bug that will be fixed subsequently, but this commit at least sorts out the error handling. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
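Illustrative note (not part of the commit): a minimal userspace sketch of the return-value convention described above, where a positive 1 means "caller must do extra work" and negative errno values are reserved for real failures. The functions grow() and leaf_split() and their boolean parameters are hypothetical and only model the control flow, not the XFS code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block allocation that can run out of space. */
static int grow(bool space_available)
{
	return space_available ? 0 : -ENOSPC;
}

/*
 * Returns 0 on success, 1 if the caller needs to perform an extra split,
 * or a negative errno if allocation genuinely failed.
 */
static int leaf_split(bool space_available, bool entry_fits)
{
	int error = grow(space_available);

	if (error)
		return error;		/* real -ENOSPC, propagate unchanged */
	return entry_fits ? 0 : 1;	/* 1 is a request, not an error */
}
```

A caller can then test `error < 0` for failures and `error == 1` for the extra split, so the two cases can no longer be confused.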
2024-10-07xfs: return bool from xfs_attr3_leaf_addChristoph Hellwig3-27/+25
xfs_attr3_leaf_add only has two potential return values, indicating if the entry could be added or not. Replace the errno return with a bool so that ENOSPC from it can't easily be confused with a real ENOSPC. Remove the return value from the xfs_attr3_leaf_add_work helper entirely, as it always returns 0. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
2024-10-07xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addnameChristoph Hellwig1-102/+74
xfs_attr_leaf_try_add is only called by xfs_attr_leaf_addname, and merging the two will simplify a following error handling fix. To facilitate this move the remote block state save/restore helpers up in the file so that they don't need forward declarations now. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
2024-10-07xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()Uros Bizjak1-7/+4
Use !try_cmpxchg instead of cmpxchg (*ptr, old, new) != old in xlog_cil_insert_pcp_aggregate(). x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg. Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. Note that the value from *ptr should be read using READ_ONCE to prevent the compiler from merging, refetching or reordering the read. No functional change intended. Signed-off-by: Uros Bizjak <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Chandan Babu R <[email protected]> Cc: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
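Illustrative note (not part of the commit): the kernel's try_cmpxchg() is not available in userspace, so this sketch uses the C11 function atomic_compare_exchange_weak(), which has the same "update the expected value on failure" behaviour. The function drain_counter() is a hypothetical example and is not the body of xlog_cil_insert_pcp_aggregate().

```c
#include <stdatomic.h>

/* Atomically read the counter and reset it to zero; returns the old value. */
static int drain_counter(atomic_int *counter)
{
	/* Single initial read; the loop below never re-reads explicitly. */
	int old = atomic_load_explicit(counter, memory_order_relaxed);

	/*
	 * On failure the call stores the current value of *counter into
	 * 'old', so the next iteration retries with fresh data and no
	 * separate compare against the previous value is needed.
	 */
	while (!atomic_compare_exchange_weak(counter, &old, 0))
		;
	return old;
}
```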
2024-10-07xfs: scrub: convert comma to semicolonYan Zhen1-2/+2
Replace a comma between expression statements by a semicolon. Signed-off-by: Yan Zhen <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
2024-10-07xfs: Remove empty declartion in header fileZhang Zekun1-2/+0
The definition of xfs_attr_use_log_assist() has been removed since commit d9c61ccb3b09 ("xfs: move xfs_attr_use_log_assist out of xfs_log.c"). So, remove the empty declaration in header files. Signed-off-by: Zhang Zekun <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
2024-10-06cifs: Fix creating native symlinks pointing to current or parent directoryPali Rohár1-3/+14
Calling 'ln -s . symlink' or 'ln -s .. symlink' creates a symlink pointing to an object name which ends with the U+F029 Unicode codepoint. This is because the trailing dot in the object name is replaced by a non-ASCII Unicode codepoint. So the Linux SMB client is currently not able to create a native symlink pointing to the current or parent directory on a Windows SMB server that can be read either locally on the Windows server or by any other SMB client which does not implement the compatible reverse character replacement. Fix this problem in the cifsConvertToUTF16() function, which does that character replacement. The function comment already says that it does not need to handle the special cases '.' and '..', but after the introduction of native symlinks in reparse point form, this handling is needed. Note that this change depends on the previous change "cifs: Improve creating native symlinks pointing to directory". Signed-off-by: Pali Rohár <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-10-06cifs: Improve creating native symlinks pointing to directoryPali Rohár3-4/+164
The SMB protocol for native symlinks distinguishes between a symlink to a directory and a symlink to a file. These two symlink types cannot be exchanged, which means that a symlink of file type pointing to a directory cannot be resolved at all (and vice versa). Windows follows this rule for local filesystems (NTFS) and also for SMB. The Linux SMB client currently creates all native symlinks as file type, which means that Windows (and some other SMB clients) cannot resolve symlinks pointing to a directory that were created by the Linux SMB client. As a Linux system does not distinguish between directory and file symlinks, its API does not provide enough information for the Linux SMB client when creating native symlinks. Add a heuristic to the Linux SMB client for choosing the correct symlink type during symlink creation. Check if the symlink target location ends with a slash, or the last path component is dot or dot-dot, and check if the target location on the SMB share exists and is a directory. If at least one condition is true, create the new SMB symlink as directory type. Otherwise create it as a file type symlink. This change improves interoperability with Windows systems. Windows systems will be able to resolve more SMB symlinks created by the Linux SMB client which point to an existing directory. Signed-off-by: Pali Rohár <[email protected]> Signed-off-by: Steve French <[email protected]>
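Illustrative note (not part of the commit): the path-only part of the heuristic described above can be sketched as a small userspace function. The name target_looks_like_dir() is hypothetical, '/' is assumed as the separator, and the third condition (checking on the share whether the target exists and is a directory) needs a server round trip and is deliberately left out.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the symlink target string alone suggests a directory:
 * it ends with a slash, or its last path component is "." or "..".
 */
static bool target_looks_like_dir(const char *target)
{
	size_t len = strlen(target);
	const char *last;

	if (len == 0)
		return false;
	if (target[len - 1] == '/')
		return true;

	/* Last component starts after the final slash, or at the start. */
	last = strrchr(target, '/');
	last = last ? last + 1 : target;

	return strcmp(last, ".") == 0 || strcmp(last, "..") == 0;
}
```

A target such as "docs/report.txt" returns false here, so under this sketch it would still be created as a file type symlink unless the existence check says otherwise.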
2024-10-06bcachefs: Delete vestigal check_inode() checksKent Overstreet1-75/+5
BCH_INODE_i_size_dirty dates from before we had logged operations for truncate (as well as finsert) - it hasn't been needed since before bcachefs was mainlined. BCH_INODE_i_sectors_dirty hasn't been needed since we started always updating i_sectors transactionally - it's been unused for even longer. BCH_INODE_backptr_untrusted also hasn't been used since prior to mainlining; when unlinking a hardlink, we zero out the backpointer fields if they're for the dirent being removed. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-06bcachefs: btree_iter_peek_upto() now handles BTREE_ITER_all_snapshotsKent Overstreet1-3/+3
end_pos now compares against snapshot ID when required Signed-off-by: Kent Overstreet <[email protected]>
2024-10-06bcachefs: reattach_inode() now correctly handles interior snapshot nodesKent Overstreet2-20/+158
When we find an unreachable inode, we now reattach it in the oldest version that needs to be reattached (thus avoiding redundant work reattaching every single version), and we now fix up inode -> dirent backpointers in newer versions as needed - or white out the reattaching dirent in newer versions, if the newer version isn't supposed to be reattached. This results in the second verify fsck now passing cleanly after repairing on a user-provided filesystem image with thousands of different snapshots. Reported-by: Christopher Snowhill <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-06bcachefs: Split out check_unreachable_inodes() passKent Overstreet3-35/+67
With inode backpointers, we can write a very simple check_unreachable_inodes() pass that only looks for non-unlinked inodes that are missing backpointers, and reattaches them. This simplifies check_directory_structure() so that it's now only checking for directory structure loops. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-06bcachefs: Fix lockdep splat in bch2_accounting_readKent Overstreet1-8/+26
We can't take sb_lock while holding mark_lock, so split out replicas_entry_validate() and replicas_entry_sb_validate() - replicas_entry_validate() now uses the normal online device interface. 00039 ========= TEST set_option 00039 00039 WATCHDOG 30 00040 bcachefs (vdb): starting version 1.12: rebalance_work_acct_fix opts=errors=panic 00040 bcachefs (vdb): initializing new filesystem 00040 bcachefs (vdb): going read-write 00040 bcachefs (vdb): marking superblocks 00040 bcachefs (vdb): initializing freespace 00040 bcachefs (vdb): done initializing freespace 00040 bcachefs (vdb): reading snapshots table 00040 bcachefs (vdb): reading snapshots done 00040 bcachefs (vdb): done starting filesystem 00040 zstd 00041 bcachefs (vdb): shutting down 00041 bcachefs (vdb): going read-only 00041 bcachefs (vdb): finished waiting for writes to stop 00041 bcachefs (vdb): flushing journal and stopping allocators, journal seq 3 00041 bcachefs (vdb): flushing journal and stopping allocators complete, journal seq 11 00041 bcachefs (vdb): shutdown complete, journal seq 12 00041 bcachefs (vdb): marking filesystem clean 00041 bcachefs (vdb): shutdown complete 00041 Setting option on offline fs 00041 bch2_write_super(): fatal error : attempting to write superblock that wasn't version downgraded (1.12: (unknown version) > 1.10: disk_accounting_v3) 00041 fatal error - emergency read only 00041 bch2_write_super(): fatal error : attempting to write superblock that wasn't version downgraded (1.12: (unknown version) > 1.10: disk_accounting_v3) 00042 bcachefs (vdb): starting version 1.12: rebalance_work_acct_fix opts=errors=panic,compression=zstd 00042 bcachefs (vdb): recovering from clean shutdown, journal seq 12 00042 bcachefs (vdb): accounting_read... 
00042 00042 ====================================================== 00042 WARNING: possible circular locking dependency detected 00042 6.12.0-rc1-ktest-g805e938a8502 #6807 Not tainted 00042 ------------------------------------------------------ 00042 mount.bcachefs/665 is trying to acquire lock: 00045 ffffff80cc280908 (&c->sb_lock){+.+.}-{3:3}, at: bch2_replicas_entry_validate (fs/bcachefs/replicas.c:102) 00045 00045 but task is already holding lock: 00048 ffffff80cc284870 (&c->mark_lock){++++}-{0:0}, at: bch2_accounting_read (fs/bcachefs/disk_accounting.c:670 (discriminator 1)) 00048 00048 which lock already depends on the new lock. 00048 00048 00048 the existing dependency chain (in reverse order) is: 00048 00048 -> #1 (&c->mark_lock){++++}-{0:0}: 00049 percpu_down_write (kernel/locking/percpu-rwsem.c:232) 00052 bch2_sb_replicas_to_cpu_replicas (fs/bcachefs/replicas.c:583) 00055 bch2_sb_to_fs (fs/bcachefs/super-io.c:614) 00057 bch2_fs_open (fs/bcachefs/super.c:828 fs/bcachefs/super.c:2050) 00060 bch2_fs_get_tree (fs/bcachefs/fs.c:2067) 00062 vfs_get_tree (fs/super.c:1801) 00064 path_mount (fs/namespace.c:3507 fs/namespace.c:3834) 00066 __arm64_sys_mount (fs/namespace.c:3847 fs/namespace.c:4055 fs/namespace.c:4032 fs/namespace.c:4032) 00067 invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54) 00068 do_el0_svc (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2)) 00069 el0_svc (arch/arm64/include/asm/irqflags.h:82 arch/arm64/include/asm/irqflags.h:123 arch/arm64/include/asm/irqflags.h:136 arch/arm64/kernel/entry-common.c:165 arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713) 00069 ========= FAILED TIMEOUT set_option in 30s Signed-off-by: Kent Overstreet <[email protected]>
2024-10-05Merge tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefsLinus Torvalds20-209/+342
Pull bcachefs fixes from Kent Overstreet: "A lot of little fixes, bigger ones include: - bcachefs's __wait_on_freeing_inode() was broken in rc1 due to vfs changes, now fixed along with another lost wakeup - fragmentation LRU fixes; fsck now repairs successfully (this is the data structure copygc uses); along with some nice simplification. - Rework logged op error handling, so that if logged op replay errors (due to another filesystem error) we delete the logged op instead of going into an infinite loop - Various small filesystem connectivity repair fixes" * tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefs: bcachefs: Rework logged op error handling bcachefs: Add warn param to subvol_get_snapshot, peek_inode bcachefs: Kill snapshot arg to fsck_write_inode() bcachefs: Check for unlinked, non-empty dirs in check_inode() bcachefs: Check for unlinked inodes with dirents bcachefs: Check for directories with no backpointers bcachefs: Kill alloc_v4.fragmentation_lru bcachefs: minor lru fsck fixes bcachefs: Mark more errors AUTOFIX bcachefs: Make sure we print error that causes fsck to bail out bcachefs: bkey errors are only AUTOFIX during read bcachefs: Create lost+found in correct snapshot bcachefs: Fix reattach_inode() bcachefs: Add missing wakeup to bch2_inode_hash_remove() bcachefs: Fix trans_commit disk accounting revert bcachefs: Fix bch2_inode_is_open() check bcachefs: Fix return type of dirent_points_to_inode_nowarn() bcachefs: Fix bad shift in bch2_read_flag_list()
2024-10-05nfsd: fix possible badness in FREE_STATEIDOlga Kornievskaia1-0/+1
When multiple FREE_STATEIDs are sent for the same delegation stateid, it can lead to either a use-after-free or a refcount underflow. In nfsd4_free_stateid() under the client lock we find a delegation stateid, however the code drops the lock before calling nfs4_put_stid(), which allows another FREE_STATEID to find the stateid again. The first one will then proceed to free the stateid, which leads to either a use-after-free or decrementing an already zeroed counter. Fixes: 3f29cc82a84c ("nfsd: split sc_status out of sc_type") Signed-off-by: Olga Kornievskaia <[email protected]> Reviewed-by: Benjamin Coddington <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2024-10-05Merge tag 'ext4_for_linus-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds3-17/+23
Pull ext4 fixes from Ted Ts'o: "Fix some ext4 bugs and regressions relating to online resize and fast commits" * tag 'ext4_for_linus-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix off by one issue in alloc_flex_gd() ext4: mark fc as ineligible using an handle in ext4_xattr_set() ext4: use handle to mark fc as ineligible in __track_dentry_update()
2024-10-04bcachefs: Rework logged op error handlingKent Overstreet3-28/+53
Initially it was thought that we just wanted to ignore errors from logged op replay, but it turns out we do need to catch -EROFS, or we'll go into an infinite loop. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Add warn param to subvol_get_snapshot, peek_inodeKent Overstreet4-28/+43
These shouldn't always be fatal errors - logged op resume, in particular, and we want it as a parameter there. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Kill snapshot arg to fsck_write_inode()Kent Overstreet4-55/+51
It was initially believed that it would be better to be explicit about the snapshot we're updating when writing inodes in fsck; however, it turns out that passing around the snapshot separately is more error prone and we're usually updating the inode in the same snapshot we read it from. This is different from normal filesystem paths, where we do the update in the snapshot of the subvolume we're in. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Check for unlinked, non-empty dirs in check_inode()Kent Overstreet2-1/+19
We want to check for this early so it can be reattached if necessary in check_unreachable_inodes(); better than letting it be deleted and having the children reattached, losing their filenames. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Check for unlinked inodes with direntsKent Overstreet2-15/+41
Link count works differently in bcachefs - it's only nonzero for files with multiple hardlinks, which means we can also avoid checking it except for files that are known to have hardlinks. That means we need a few different checks instead; in particular, we don't want fsck to delete a file that has a dirent pointing to it. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Check for directories with no backpointersKent Overstreet2-8/+17
It's legal for regular files to have missing backpointers (due to hardlinks), and fsck should automatically add them, but for directories this is an error that should be flagged. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Kill alloc_v4.fragmentation_lruKent Overstreet7-22/+38
The fragmentation_lru field hasn't been needed since we reworked the LRU btrees to use the btree write buffer; previously it was used to resolve collisions, but the revised LRU btree uses the backpointer (the bucket) as part of the key. It should have been deleted at the time of the LRU rework; since it wasn't, that left places for bugs to hide, in check/repair. This fixes LRU fsck on a filesystem image helpfully provided by a user who disappeared before I could get his name for the reported-by. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: minor lru fsck fixesKent Overstreet1-12/+15
check_lru_key() wasn't using write buffer updates for deleting bad lru entries - dating from before the lru btree used the btree write buffer. And when possibly flushing the btree write buffer (to make sure we're seeing a real inconsistency), we need to be using the modern bch2_btree_write_buffer_maybe_flush(). Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Mark more errors AUTOFIXKent Overstreet1-12/+12
Errors are getting marked as AUTOFIX once they've been (re)-tested and audited. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Make sure we print error that causes fsck to bail outKent Overstreet1-3/+9
Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: bkey errors are only AUTOFIX during readKent Overstreet2-8/+12
Newly generated keys, in the transaction commit path or write path, should not be AUTOFIX; those indicate bugs that we need to fail fast for. Fixes: 5612daafb764 ("bcachefs: Fix fsck warnings from bkey validation") Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Create lost+found in correct snapshotKent Overstreet1-1/+7
Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Fix reattach_inode()Kent Overstreet1-6/+5
Ensure a copy of the lost+found inode exists in the snapshot that we're reattaching, so that we don't trigger warnings in lookup_inode_for_snapshot() later. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04bcachefs: Add missing wakeup to bch2_inode_hash_remove()Kent Overstreet1-12/+21
This fixes two different bugs: - Looser locking with the rhashtable means we need to recheck if the inode is still hashed after prepare_to_wait(), and add a corresponding wakeup after removing from the hash table. - da18ecbf0fb6 ("fs: add i_state helpers") changed the bit waitqueues used for inodes, and bcachefs wasn't updated and thus broke; this updates bcachefs to the new helper. Fixes: 112d21fd1a12 ("bcachefs: switch to rhashtable for vfs inodes hash") Signed-off-by: Kent Overstreet <[email protected]>
2024-10-04ext4: fix off by one issue in alloc_flex_gd()Baokun Li1-8/+10
Wesley reported an issue: ================================================================== EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks ------------[ cut here ]------------ kernel BUG at fs/ext4/resize.c:324! CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27 RIP: 0010:ext4_resize_fs+0x1212/0x12d0 Call Trace: __ext4_ioctl+0x4e0/0x1800 ext4_ioctl+0x12/0x20 __x64_sys_ioctl+0x99/0xd0 x64_sys_call+0x1206/0x20d0 do_syscall_64+0x72/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e ================================================================== While reviewing the patch, Honza found that when adjusting resize_bg in alloc_flex_gd(), it was possible for flex_gd->resize_bg to be bigger than flexbg_size. The reproduction of the problem requires the following: o_group = flexbg_size * 2 * n; o_size = (o_group + 1) * group_size; n_group: [o_group + flexbg_size, o_group + flexbg_size * 2) o_size = (n_group + 1) * group_size; Take n=0,flexbg_size=16 as an example: last:15 |o---------------|--------------n-| o_group:0 resize to n_group:30 The corresponding reproducer is: img=test.img rm -f $img truncate -s 600M $img mkfs.ext4 -F $img -b 1024 -G 16 8M dev=`losetup -f --show $img` mkdir -p /tmp/test mount $dev /tmp/test resize2fs $dev 248M Delete the problematic plus 1 to fix the issue, and add a WARN_ON_ONCE() to prevent the issue from happening again. 
[ Note: another reproducer which this commit fixes is: img=test.img rm -f $img truncate -s 25MiB $img mkfs.ext4 -b 4096 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $img truncate -s 3GiB $img dev=`losetup -f --show $img` mkdir -p /tmp/test mount $dev /tmp/test resize2fs $dev 3G umount $dev losetup -d $dev -- TYT ] Reported-by: Wesley Hershberger <[email protected]> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231 Reported-by: Stéphane Graber <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Tested-by: Alexander Mikhalitsyn <[email protected]> Tested-by: Eric Sandeen <[email protected]> Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()") Cc: [email protected] Signed-off-by: Baokun Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
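Illustrative note (not part of the commit): a worked example of how a stray "plus 1" can push a power-of-two size past its limit, using the numbers from the commit message above (flexbg_size 16, o_group 0, last group 15). This sketch assumes the size is rounded with an fls-style "find last set bit" helper. The helper and both function names are hypothetical, and this is a model of the arithmetic only, not the code in fs/ext4/resize.c.

```c
/* 1-based index of the most significant set bit; returns 0 for x == 0. */
static int fls_u32(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Rounding with the extra "+ 1": a span of 15 becomes 1 << fls(16) = 32. */
static unsigned int round_with_plus_one(unsigned int span)
{
	return 1u << fls_u32(span + 1);
}

/* Rounding without it: a span of 15 becomes 1 << fls(15) = 16. */
static unsigned int round_without_plus_one(unsigned int span)
{
	return 1u << fls_u32(span);
}
```

With a span of 15 groups (last group 15 minus o_group 0), the first variant yields 32, which exceeds a flexbg_size of 16, while the second stays at 16.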
2024-10-04ext4: mark fc as ineligible using an handle in ext4_xattr_set()Luis Henriques (SUSE)1-1/+2
Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result in a fast-commit being done before the filesystem is effectively marked as ineligible. This patch moves the call to this function so that a handle can be used. If a transaction fails to start, then there's no point in trying to mark the filesystem as ineligible, and an error will eventually be returned to user-space. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Luis Henriques (SUSE) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Cc: [email protected]
2024-10-04ext4: use handle to mark fc as ineligible in __track_dentry_update()Luis Henriques (SUSE)1-8/+11
Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result in a fast-commit being done before the filesystem is effectively marked as ineligible. This patch fixes the calls to this function in __track_dentry_update() by adding an extra parameter to the callback used in ext4_fc_track_template(). Suggested-by: Jan Kara <[email protected]> Signed-off-by: Luis Henriques (SUSE) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Cc: [email protected]
2024-10-04nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstpMike Snitzer1-3/+3
Otherwise nfsd_file_acquire, nfsd_file_insert_err, and nfsd_file_cons_err will hit a NULL pointer when they are enabled and LOCALIO used. Example trace output (note xid is 0x0 and LOCALIO flag set): nfsd_file_acquire: xid=0x0 inode=0000000069a1b2e7 may_flags=WRITE|LOCALIO ref=1 nf_flags=HASHED|GC nf_may=WRITE nf_file=0000000070123234 status=0 Fixes: c63f0e48febf ("nfsd: add nfsd_file_acquire_local()") Signed-off-by: Mike Snitzer <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2024-10-04Merge tag 'fsnotify_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds7-27/+22
Pull fsnotify fixes from Jan Kara: "Fixes for an inotify deadlock and a data race in fsnotify" * tag 'fsnotify_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Fix possible deadlock in fsnotify_destroy_mark fsnotify: Avoid data race between fsnotify_recalc_mask() and fsnotify_object_watched()
2024-10-04Merge tag 'fs_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds7-106/+224
Pull UDF fixes from Jan Kara: "A couple of UDF error handling fixes for issues spotted by syzbot" * tag 'fs_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: fix uninit-value use in udf_get_fileshortad udf: refactor inode_bmap() to handle error udf: refactor udf_next_aext() to handle error udf: refactor udf_current_aext() to handle error
2024-10-04Merge tag 'ceph-for-6.12-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds1-2/+5
Pull ceph fixes from Ilya Dryomov: "A fix from Patrick for a variety of CephFS lockup scenarios caused by a regression in cap handling which sneaked in through the netfs helper library in 5.18 (marked for stable) and an unrelated one-line cleanup" * tag 'ceph-for-6.12-rc2' of https://github.com/ceph/ceph-client: ceph: fix cap ref leak via netfs init_request ceph: use struct_size() helper in __ceph_pool_perm_get()
2024-10-04Merge tag 'for-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds6-82/+57
Pull btrfs fixes from David Sterba: - in incremental send, fix invalid clone operation for file that got its size decreased - fix __counted_by() annotation of send path cache entries, we do not store the terminating NUL - fix a longstanding bug in relocation (and quite hard to hit by chance), drop back reference cache that can get out of sync after transaction commit - wait for fixup worker kthread before finishing umount - add missing raid-stripe-tree extent for NOCOW files, zoned mode cannot have NOCOW files but RST is meant to be a standalone feature - handle transaction start error during relocation, avoid potential NULL pointer dereference of relocation control structure (reported by syzbot) - disable module-wide rate limiting of debug level messages - minor fix to tracepoint definition (reported by checkpatch.pl) * tag 'for-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: disable rate limiting when debug enabled btrfs: wait for fixup workers before stopping cleaner kthread during umount btrfs: fix a NULL pointer dereference when failed to start a new trasacntion btrfs: send: fix invalid clone operation for file that got its size decreased btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class btrfs: drop the backref cache during relocation if we commit btrfs: also add stripe entries for NOCOW writes btrfs: send: fix buffer overflow detection when copying path to cache entry
2024-10-04Merge tag 'v6.12-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds24-59/+167
Pull smb client fixes from Steve French: - statfs fix (e.g. when limited access to root directory of share) - special file handling fixes: fix packet validation to avoid buffer overflow for reparse points, fixes for symlink path parsing (one for reparse points, and one for SFU use case), and fix for cleanup after failed SET_REPARSE operation. - fix for SMB2.1 signing bug introduced by recent patch to NFS symlink path, and NFS reparse point validation - comment cleanup * tag 'v6.12-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Do not convert delimiter when parsing NFS-style symlinks cifs: Validate content of NFS reparse point buffer cifs: Fix buffer overflow when parsing NFS reparse points smb: client: Correct typos in multiple comments across various files smb: client: use actual path when queryfs cifs: Remove intermediate object of failed create reparse call Revert "smb: client: make SHA-512 TFM ephemeral" smb: Update comments about some reparse point tags cifs: Check for UTF-16 null codepoint in SFU symlink target location
2024-10-04Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-61/+34
Pull close_range() fix from Al Viro: "Fix the logic in descriptor table trimming" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: close_range(): fix the logics in descriptor table trimming
2024-10-03Merge tag 'pull-fixes.ufs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+1
Pull ufs fix from Al Viro: "Fix ufs_rename() braino introduced this cycle. The 'folio_release_kmap(dir_folio, new_dir)' in ufs_rename() part of folio conversion should've been getting a pointer to ufs directory entry within the page, rather than a pointer to directory struct inode..." * tag 'pull-fixes.ufs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs_rename(): fix bogus argument of folio_release_kmap()
2024-10-03nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORTMike Snitzer1-1/+1
The 'default n' that was in NFS_COMMON_LOCALIO_SUPPORT caused these extra defaults to be missed: default y if NFSD=y || NFS_FS=y default m if NFSD=m && NFS_FS=m Remove the 'default n' for NFS_COMMON_LOCALIO_SUPPORT so that the correct tristate is selected based on how NFSD and NFS_FS are configured. This fixes the reported case where NFS_FS=y but NFS_COMMON_LOCALIO_SUPPORT=m, it is now correctly set to =y. In addition, add extra 'depends on NFS_LOCALIO' to NFS_COMMON_LOCALIO_SUPPORT so that if NFS_LOCALIO isn't set then NFS_COMMON_LOCALIO_SUPPORT will not be either. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2024-10-03nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()Mike Snitzer5-8/+11
Add nfs_to_nfsd_file_put_local() interface to fix race with nfsd module unload. Similarly, use RCU around nfs_open_local_fh()'s error path call to nfs_to->nfsd_serv_put(). Holding RCU ensures that NFS will safely _call and return_ from its nfs_to calls into the NFSD functions nfsd_file_put_local() and nfsd_serv_put(). Otherwise, if RCU isn't used then there is a narrow window when NFS's reference for the nfsd_file and nfsd_serv are dropped and the NFSD module could be unloaded, which could result in a crash from the return instruction for either nfs_to->nfsd_file_put_local() or nfs_to->nfsd_serv_put(). Reported-by: NeilBrown <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2024-10-03NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()Yanjun Zhang3-2/+3
On an NFS client node, some files stored under a mountpoint of an NFS server were copied to another location on the same NFS server. Unexpectedly, nfs42_complete_copies() crashed with a NULL-pointer dereference and the following syslog:
[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701] ESR = 0x0000000096000007
[232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084] SET = 0, FnV = 0
[232066.589216] EA = 0, S1PTW = 0
[232066.589340] FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683] ISV = 0, ISS = 0x00000007
[232066.589842] CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690] kthread+0x110/0x114
[232066.602830] ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!
Analysing the vmcore, we know that the nfs4_copy_state entries on the destination nfs_server->ss_copies list were added through the field copies in handle_async_copy(), and we found a waiting copy process with the following stack:
PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp"
#0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
#1 [ffff8001116ef760] __schedule at ffff800008dd0650
#2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
#3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
#4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
#5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
#6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
#7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
#8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
#9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]
The NULL-pointer dereference happened because nfs42_complete_copies() walked nfs_server->ss_copies using the field ss_copies of nfs4_copy_state. As a result, the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states means that open(O_RDWR) or open(O_WRITE) states are reclaimed first by nfs4_reclaim_open_state(). When the destination state reclaim fails with NFS_STATE_RECOVERY_FAILED and the copies are not deleted from nfs_server->ss_copies, the source state may be passed to nfs42_complete_copies() earlier, which results in this crash. To solve this issue, we add a list_head nfs_server->ss_src_copies specifically for server-to-server copies. Fixes: 0e65a32c8a56 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang <[email protected]> Reviewed-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
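The "offset by 0x10" symptom described above is what one would expect when a list node is converted back to its containing object using the wrong member. The sketch below illustrates that arithmetic only; struct copy_state, its layout, and the helper names are hypothetical stand-ins, not the kernel's nfs4_copy_state definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, simplified object with two embedded list nodes. */
struct copy_state {
    struct list_head copies;    /* node the entry was actually linked with */
    struct list_head ss_copies; /* node the list walker assumed */
    int error;
};

/* Integer form of container_of(): address of a node minus the member offset. */
static uintptr_t container_addr(uintptr_t node_addr, size_t member_offset)
{
    return node_addr - member_offset;
}

/*
 * The entry was linked through `copies`, but the walker converts the node
 * back to an object as if it were `ss_copies`. Returns how many bytes the
 * computed object address falls short of the real one.
 */
static size_t misattributed_offset(const struct copy_state *real)
{
    uintptr_t node = (uintptr_t)&real->copies;
    uintptr_t wrong = container_addr(node, offsetof(struct copy_state, ss_copies));

    return (size_t)((uintptr_t)real - wrong);
}
```

With 8-byte pointers the two nodes are 0x10 bytes apart, so every field read through the miscomputed pointer comes from the wrong memory. Keeping a separate list head per member, as the fix does with ss_src_copies, avoids mixing the two node types on one list.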
2024-10-03SUNRPC: Fix integer overflow in decode_rc_list()Dan Carpenter1-0/+2
The math in "rc_list->rcl_nrefcalls * 2 * sizeof(uint32_t)" could have an integer overflow. Add bounds checking on rc_list->rcl_nrefcalls to fix that. Fixes: 4aece6a19cf7 ("nfs41: cb_sequence xdr implementation") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
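A minimal sketch of the kind of bounds check described above, assuming a hypothetical cap MAX_REFCALLS and hypothetical helper names (the actual limit used by the patch is not stated here). It contrasts a 32-bit multiplication that can wrap with a checked version.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound; the value used by the real fix may differ. */
#define MAX_REFCALLS 1024u

/* Unchecked size computation in 32-bit arithmetic: wraps for large counts. */
static uint32_t unchecked_bytes32(uint32_t nrefcalls)
{
    return nrefcalls * 2u * (uint32_t)sizeof(uint32_t);
}

/*
 * Checked version: reject counts above the cap before multiplying, so the
 * product cannot overflow even where size_t is only 32 bits wide.
 */
static bool refcalls_bytes(uint32_t nrefcalls, size_t *out)
{
    if (nrefcalls > MAX_REFCALLS)
        return false;
    *out = (size_t)nrefcalls * 2 * sizeof(uint32_t);
    return true;
}
```

For a count of 0x20000000 the unchecked 32-bit product wraps to zero, which would make a later length check pass for a buffer that is far too small; the checked version rejects that count up front.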
2024-10-03cifs: Do not convert delimiter when parsing NFS-style symlinksPali Rohár1-1/+0
NFS-style symlinks always store the target location in NFS/UNIX form, where backslash means the real UNIX backslash and not the SMB path separator. So do not mangle slash and backslash content of NFS-style symlink during readlink() syscall as it is already in the correct Linux form. This fixes interoperability of NFS-style symlinks with backslashes created by Linux NFS3 client through Windows NFS server and retrieved by Linux SMB client through Windows SMB server, where both Windows servers export the same directory. Fixes: d5ecebc4900d ("smb3: Allow query of symlinks stored as reparse points") Acked-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Pali Rohár <[email protected]> Signed-off-by: Steve French <[email protected]>
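The behaviour change can be sketched as follows, assuming a hypothetical helper fixup_symlink_target() and an nfs_style flag; neither is the name used in the cifs code. Targets in SMB form get their backslashes turned into slashes, while NFS-style targets are left untouched.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: convert SMB path separators to '/' in place, unless
 * the target came from an NFS-style symlink, whose backslashes are literal
 * characters of the UNIX path.
 */
static void fixup_symlink_target(char *target, bool nfs_style)
{
    if (nfs_style)
        return; /* already in Linux form, leave every byte as is */

    for (; *target != '\0'; target++) {
        if (*target == '\\')
            *target = '/';
    }
}
```

A target such as `a\b` stored by an NFS client therefore stays a single file name containing a backslash instead of being reinterpreted as the path `a/b`.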
2024-10-03cifs: Validate content of NFS reparse point bufferPali Rohár1-0/+23
Symlink target location stored in DataBuffer is encoded in UTF-16. So check that the symlink DataBuffer length is a non-zero, even number. And check that DataBuffer does not contain a UTF-16 null codepoint because Linux cannot process a symlink with a null byte. DataBuffer for char and block devices is 8 bytes long as it contains two 32-bit numbers (major and minor). Add a check for this. DataBuffer for sockets and fifos is zero-length. Add checks for this. Signed-off-by: Pali Rohár <[email protected]> Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
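The checks listed above can be sketched as a single validation routine. The enum special_kind and the function data_buffer_valid() are hypothetical names chosen for illustration and do not mirror the identifiers in the cifs source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of NFS reparse point types. */
enum special_kind { KIND_SYMLINK, KIND_CHR, KIND_BLK, KIND_FIFO, KIND_SOCK };

/* Return true when the DataBuffer length and content fit the given type. */
static bool data_buffer_valid(enum special_kind kind,
                              const uint8_t *buf, size_t len)
{
    switch (kind) {
    case KIND_SYMLINK:
        /* UTF-16 target: length must be non-zero and even. */
        if (len == 0 || len % 2 != 0)
            return false;
        /* Reject any UTF-16 null code unit (two zero bytes). */
        for (size_t i = 0; i + 1 < len; i += 2) {
            if (buf[i] == 0 && buf[i + 1] == 0)
                return false;
        }
        return true;
    case KIND_CHR:
    case KIND_BLK:
        /* Two 32-bit numbers: major and minor. */
        return len == 8;
    case KIND_FIFO:
    case KIND_SOCK:
        /* No payload expected. */
        return len == 0;
    }
    return false;
}
```

Rejecting malformed buffers before decoding keeps later parsing code from having to handle odd lengths or embedded terminators.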