aboutsummaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)AuthorFilesLines
2024-05-03xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extentChristoph Hellwig1-6/+0
Accessing if_bytes without the ilock is racy. Remove the initial if_bytes == 0 check in xfs_reflink_end_cow_extent and let ext_iext_lookup_extent fail for this case after we've taken the ilock. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-03xfs: upgrade the extent counters in xfs_reflink_end_cow_extent laterChristoph Hellwig1-8/+8
Defer the extent counter size upgrade until we know we're going to modify the extent mapping. This also defers dirtying the transaction and will allow us safely back out later in the function in later changes. Fixes: 4f86bb4b66c9 ("xfs: Conditionally upgrade existing inodes to use large extent counters") Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-03xfs: xfs_quota_unreserve_blkres can't failChristoph Hellwig8-37/+20
Unreserving quotas can't fail due to quota limits, and we'll notice a shut down file system a bit later in all the callers anyway. Return void and remove the error checking and propagation in the callers. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-03xfs: consolidate the xfs_quota_reserve_blkres definitionsChristoph Hellwig1-12/+6
xfs_trans_reserve_quota_nblks is already stubbed out if quota support is disabled, no need for an extra xfs_quota_reserve_blkres stub. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-03xfs: clean up buffer allocation in xlog_do_recovery_passChristoph Hellwig1-7/+6
Merge the initial xlog_alloc_buffer calls, and pass the variable designating the length that is initialized to 1 above instead of passing the open coded 1 directly. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-03xfs: fix log recovery buffer allocation for the legacy h_size fixupChristoph Hellwig1-6/+14
Commit a70f9fe52daa ("xfs: detect and handle invalid iclog size set by mkfs") added a fixup for incorrect h_size values used for the initial umount record in old xfsprogs versions. Later commit 0c771b99d6c9 ("xfs: clean up calculation of LR header blocks") cleaned up the log reover buffer calculation, but stoped using the fixed up h_size value to size the log recovery buffer, which can lead to an out of bounds access when the incorrect h_size does not come from the old mkfs tool, but a fuzzer. Fix this by open coding xlog_logrec_hblks and taking the fixed h_size into account for this calculation. Fixes: 0c771b99d6c9 ("xfs: clean up calculation of LR header blocks") Reported-by: Sam Sun <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-02set_blocksize(): switch to passing struct file *Al Viro1-1/+1
Signed-off-by: Al Viro <[email protected]>
2024-05-02xfs: widen flags argument to the xfs_iflags_* helpersDarrick J. Wong2-10/+8
xfs_inode.i_flags is an unsigned long, so make these helpers take that as the flags argument instead of unsigned short. This is needed for the next patch. While we're at it, remove the iflags variable from xfs_iget_cache_miss because we no longer need it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Andrey Albershteyn <[email protected]>
2024-05-02xfs: minor cleanups of xfs_attr3_rmt_blocksDarrick J. Wong1-8/+8
Clean up the type signature of this function since we don't have negative attr lengths or block counts. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Andrey Albershteyn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-05-02xfs: create a helper to compute the blockcount of a max sized remote valueDarrick J. Wong3-3/+9
Create a helper function to compute the number of fsblocks needed to store a maximally-sized extended attribute value. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Andrey Albershteyn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-05-02xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a functionDarrick J. Wong2-6/+17
Turn this into a properly typechecked function, and actually use the correct blocksize for extended attributes. The function cannot be static inline because xfsprogs userspace uses it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Andrey Albershteyn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-05-02xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.cDarrick J. Wong2-32/+31
In the next few patches we're going to refactor the attr remote code so that we can support headerless remote xattr values for storing merkle tree blocks. For now, let's change the code to use unsigned int to describe quantities of bytes and blocks that cannot be negative. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Andrey Albershteyn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-30xfs: do not allocate the entire delalloc extent in xfs_bmapi_writeChristoph Hellwig1-2/+3
While trying to convert the entire delalloc extent is a good decision for regular writeback as it leads to larger contigous on-disk extents, but for other callers of xfs_bmapi_write is is rather questionable as it forced them to loop creating new transactions just in case there is no large enough contiguous extent to cover the whole delalloc reservation. Change xfs_bmapi_write to only allocate the passed in range instead, whіle the writeback path through xfs_bmapi_convert_delalloc and xfs_bmapi_allocate still always converts the full extents. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-30xfs: fix xfs_bmap_add_extent_delay_real for partial conversionsChristoph Hellwig1-5/+10
xfs_bmap_add_extent_delay_real takes parts or all of a delalloc extent and converts them to a real extent. It is written to deal with any potential overlap of the to be converted range with the delalloc extent, but it turns out that currently only converting the entire extents, or a part starting at the beginning is actually exercised, as the only caller always tries to convert the entire delalloc extent, and either succeeds or at least progresses partially from the start. If it only converts a tiny part of a delalloc extent, the indirect block calculation for the new delalloc extent (da_new) might be equivalent to that of the existing delalloc extent (da_old). If this extent conversion now requires allocating an indirect block that gets accounted into da_new, leading to the assert that da_new must be smaller or equal to da_new unless we split the extent to trigger. Except for the assert that case is actually handled by just trying to allocate more space, as that already handled for the split case (which currently can't be reached at all), so just reusing it should be fine. Except that without dipping into the reserved block pool that would make it a bit too easy to trigger a fs shutdown due to ENOSPC. So in addition to adjusting the assert, also dip into the reserved block pool. Note that I could only reproduce the assert with a change to only convert the actually asked range instead of the full delalloc extent from xfs_bmapi_write. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-30xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocateChristoph Hellwig1-5/+0
Both callers of xfs_bmapi_allocate already initialize bma->prev, don't redo that in xfs_bmapi_allocate. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-30xfs: pass the actual offset and len to allocate to xfs_bmapi_allocateChristoph Hellwig1-14/+18
xfs_bmapi_allocate currently overwrites offset and len when converting delayed allocations, and duplicates the length cap done for non-delalloc allocations. Move all that logic into the callers to avoid duplication and to make the calling conventions more obvious. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-30xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_writeChristoph Hellwig1-6/+3
XFS_FILBLKS_MIN uses min_t and thus does the comparison using the correct xfs_filblks_t type. Use it in xfs_bmapi_write and slightly adjust the comment document th potential pitfall to take account of this Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-30xfs: lift a xfs_valid_startblock into xfs_bmapi_allocateChristoph Hellwig1-6/+5
xfs_bmapi_convert_delalloc has a xfs_valid_startblock check on the block allocated by xfs_bmapi_allocate. Lift it into xfs_bmapi_allocate as we should assert the same for xfs_bmapi_write. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-30xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocateChristoph Hellwig1-3/+0
tmp_logflags is initialized to 0 and then ORed into bma->logflags, which isn't actually doing anything. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-30xfs: fix error returns from xfs_bmapi_writeChristoph Hellwig10-74/+57
xfs_bmapi_write can return 0 without actually returning a mapping in mval in two different cases: 1) when there is absolutely no space available to do an allocation 2) when converting delalloc space, and the allocation is so small that it only covers parts of the delalloc extent before the range requested by the caller Callers at best can handle one of these cases, but in many cases can't cope with either one. Switch xfs_bmapi_write to always return a mapping or return an error code instead. For case 1) above ENOSPC is the obvious choice which is very much what the callers expect anyway. For case 2) there is no really good error code, so pick a funky one from the SysV streams portfolio. This fixes the reproducer here: https://lore.kernel.org/linux-xfs/CAEJPjCvT3Uag-pMTYuigEjWZHn1sGMZ0GCjVVCv29tNHK76Cgg@mail.gmail.com0/ which uses reserved blocks to create file systems that are gravely out of space and thus cause at least xfs_file_alloc_space to hang and trigger the lack of ENOSPC handling in xfs_dquot_disk_alloc. Note that this patch does not actually make any caller but xfs_alloc_file_space deal intelligently with case 2) above. Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: 刘通 <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-29xfs: convert delayed extents to unwritten when zeroing post eof blocksZhang Yi1-0/+29
Current clone operation could be non-atomic if the destination of a file is beyond EOF, user could get a file with corrupted (zeroed) data on crash. The problem is about preallocations. If you write some data into a file: [A...B) and XFS decides to preallocate some post-eof blocks, then it can create a delayed allocation reservation: [A.........D) The writeback path tries to convert delayed extents to real ones by allocating blocks. If there aren't enough contiguous free space, we can end up with two extents, the first real and the second still delalloc: [A....C)[C.D) After that, both the in-memory and the on-disk file sizes are still B. If we clone into the range [E...F) from another file: [A....C)[C.D) [E...F) then xfs_reflink_zero_posteof() calls iomap_zero_range() to zero out the range [B, E) beyond EOF and flush it. Since [C, D) is still a delalloc extent, its pagecache will be zeroed and both the in-memory and on-disk size will be updated to D after flushing but before cloning. This is wrong, because the user can see the size change and read the zeroes while the clone operation is ongoing. We need to keep the in-memory and on-disk size before the clone operation starts, so instead of writing zeroes through the page cache for delayed ranges beyond EOF, we convert these ranges to unwritten and invalidate any cached data over that range beyond EOF. Suggested-by: Dave Chinner <[email protected]> Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-29xfs: make xfs_bmapi_convert_delalloc() to allocate the target offsetZhang Yi2-42/+46
Since xfs_bmapi_convert_delalloc() only attempts to allocate the entire delalloc extent and require multiple invocations to allocate the target offset. So xfs_convert_blocks() add a loop to do this job and we call it in the write back path, but xfs_convert_blocks() isn't a common helper. Let's do it in xfs_bmapi_convert_delalloc() and drop xfs_convert_blocks(), preparing for the post EOF delalloc blocks converting in the buffered write begin path. Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-29xfs: make the seq argument to xfs_bmapi_convert_delalloc() optionalZhang Yi1-2/+4
Allow callers to pass a NULLL seq argument if they don't care about the fork sequence number. Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-29xfs: match lock mode in xfs_buffered_write_iomap_begin()Zhang Yi1-5/+5
Commit 1aa91d9c9933 ("xfs: Add async buffered write support") replace xfs_ilock(XFS_ILOCK_EXCL) with xfs_ilock_for_iomap() when locking the writing inode, and a new variable lockmode is used to indicate the lock mode. Although the lockmode should always be XFS_ILOCK_EXCL, it's still better to use this variable instead of useing XFS_ILOCK_EXCL directly when unlocking the inode. Fixes: 1aa91d9c9933 ("xfs: Add async buffered write support") Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-26xfs: refactor dir format helpersChristoph Hellwig6-150/+105
Add a new enum and a xfs_dir2_format helper that returns it to allow the code to switch on the format of a directory in a single operation and switch all helpers of xfs_dir2_isblock and xfs_dir2_isleaf to it. This also removes the explicit xfs_iread_extents call in a few of the call sites given that xfs_bmap_last_offset already takes care of it underneath. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-26xfs: factor out a xfs_dir_replace_args helperChristoph Hellwig3-41/+28
Add a helper to switch between the different directory formats for removing a directory entry. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-26xfs: factor out a xfs_dir_removename_args helperChristoph Hellwig3-42/+27
Add a helper to switch between the different directory formats for removing a directory entry. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-26xfs: factor out a xfs_dir_createname_args helperChristoph Hellwig3-43/+30
Add a helper to switch between the different directory formats for creating a directory entry and to handle the XFS_DA_OP_JUSTCHECK flag based on the passed in ino number field. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-26xfs: factor out a xfs_dir_lookup_args helperChristoph Hellwig3-60/+43
Add a helper to switch between the different directory formats for lookup and to handle the -EEXIST return for a successful lookup. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-24xfs: don't call xfs_file_open from xfs_dir_openChristoph Hellwig1-1/+3
Directories do not support direct I/O and thus no non-blocking direct I/O either. Open code the shutdown check and call to generic_file_open instead. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-04-24xfs: drop fop_flags for directoriesChristoph Hellwig1-2/+0
Directories have non of the capabilities, so drop the flags. Note that the current state is harmless as no one actually checks for the flags either. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-04-24xfs: fix overly long line in the file_operationsChristoph Hellwig1-4/+4
Re-wrap the newly added fop_flags fields to not go over 80 characters. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-04-24xfs: Remove unused function xrep_dir_self_parentJiapeng Chong1-21/+0
The function are defined in the dir_repair.c file, but not called elsewhere, so delete the unused function. fs/xfs/scrub/dir_repair.c:186:1: warning: unused function 'xrep_dir_self_parent'. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8867 Signed-off-by: Jiapeng Chong <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-23xfs: invalidate dentries for a file before moving it to the orphanageDarrick J. Wong2-29/+20
Invalidate the cached dentries that point to the file that we're moving to lost+found before we actually move it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: exchange-range for repairs is no longer dynamicDarrick J. Wong10-45/+25
The atomic file exchange-range functionality is now a permanent filesystem feature instead of a dynamic log-incompat feature. It cannot be turned on at runtime, so we no longer need the XCHK_FSGATES flags and whatnot that supported it. Remove the flag and the enable function, and move the xfs_has_exchange_range checks to the start of the repair functions. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: fix iunlock calls in xrep_adoption_trans_allocDarrick J. Wong1-1/+1
If the transaction allocation in xrep_adoption_trans_alloc fails, we should drop only the locks that we took. In this case this is ILOCK_EXCL of both the orphanage and the file being repaired. Dropping any IOLOCK here is incorrect. Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: drop the scrub file's iolock when transaction allocation failsDarrick J. Wong1-1/+3
If the transaction allocation in the !orphanage_available case of xrep_nlinks_repair_inode fails, we need to drop the IOLOCK of the file being scrubbed before exiting. Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: only iget the file once when doing vectored scrub-by-handleDarrick J. Wong1-0/+45
If a program wants us to perform a scrub on a file handle and the fd passed to ioctl() is not the file referenced in the handle, iget the file once and pass it into the scrub code. This amortizes the untrusted iget lookup over /all/ the scrubbers mentioned in the scrubv call. When running fstests in "rebuild all metadata after each test" mode, I observed a 10% reduction in runtime on account of avoiding repeated inobt lookups. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: use dontcache for grabbing inodes during scrubDarrick J. Wong3-11/+21
Back when I wrote commit a03297a0ca9f2, I had thought that we'd be doing users a favor by only marking inodes dontcache at the end of a scrub operation, and only if there's only one reference to that inode. This was more or less true back when I_DONTCACHE was an XFS iflag and the only thing it did was change the outcome of xfs_fs_drop_inode to 1. Note: If there are dentries pointing to the inode when scrub finishes, the inode will have positive i_count and stay around in cache until dentry reclaim. But now we have d_mark_dontcache, which cause the inode *and* the dentries attached to it all to be marked I_DONTCACHE, which means that we drop the dentries ASAP, which drops the inode ASAP. This is bad if scrub found problems with the inode, because now they can be scheduled for inactivation, which can cause inodegc to trip on it and shut down the filesystem. Even if the inode isn't bad, this is still suboptimal because phases 3-7 each initiate inode scans. Dropping the inode immediately during phase 3 is silly because phase 5 will reload it and drop it immediately, etc. It's fine to mark the inodes dontcache, but if there have been accesses to the file that set up dentries, we should keep them. I validated this by setting up ftrace to capture xfs_iget_recycle* tracepoints and ran xfs/285 for 30 seconds. With current djwong-wtf I saw ~30,000 recycle events. I then dropped the d_mark_dontcache calls and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per 30 seconds. Therefore, grab the inode with XFS_IGET_DONTCACHE, which only has the effect of setting I_DONTCACHE for cache misses. Remove the d_mark_dontcache call that can happen in xchk_irele. Fixes: a03297a0ca9f2 ("xfs: manage inode DONTCACHE status at irele time") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: introduce vectored scrub modeDarrick J. Wong5-1/+264
Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored mode. The caller specifies the principal metadata object that they want to scrub (allocation group, inode, etc.) once, followed by an array of scrub types they want called on that object. The kernel runs the scrub operations and writes the output flags and errno code to the corresponding array element. A new pseudo scrub type BARRIER is introduced to force the kernel to return to userspace if any corruptions have been found when scrubbing the previous scrub types in the array. This enables userspace to schedule, for example, the sequence: 1. data fork 2. barrier 3. directory If the data fork scrub is clean, then the kernel will perform the directory scrub. If not, the barrier in 2 will exit back to userspace. The alternative would have been an interface where userspace passes a pointer to an empty buffer, and the kernel formats that with xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome was. With that the kernel would have to communicate that the buffer needed to have been at least X size, even though for our cases XFS_SCRUB_TYPE_NR + 2 would always be enough. Compared to that, this design keeps all the dependency policy and ordering logic in userspace where it already resides instead of duplicating it in the kernel. The downside of that is that it needs the barrier logic. When running fstests in "rebuild all metadata after each test" mode, I observed a 10% reduction in runtime due to fewer transitions across the system call boundary. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: move xfs_ioc_scrub_metadata to scrub.cDarrick J. Wong3-27/+28
Move the scrub ioctl handler to scrub.c to keep the code together and to reduce unnecessary code when CONFIG_XFS_ONLINE_SCRUB=n. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: reduce the rate of cond_resched calls inside scrubDarrick J. Wong6-31/+74
We really don't want to call cond_resched every single time we go through a loop in scrub -- there may be billions of records, and probing into the scheduler itself has overhead. Reduce this overhead by only calling cond_resched 10x per second; and add a counter so that we only check jiffies once every 1000 records or so. Surprisingly, this reduces scrub-only fstests runtime by about 2%. I used the bmapinflate xfs_db command to produce a billion-extent file and this stupid gadget reduced the scrub runtime by about 4%. From a stupid microbenchmark of calling these things 1 billion times, I estimate that cond_resched costs about 5.5ns per call; jiffes costs about 0.3ns per read; and fatal_signal_pending costs about 0.4ns per call. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: fix corruptions in the directory treeDarrick J. Wong11-8/+927
Repair corruptions in the directory tree itself. Cycles are broken by removing an incoming parent->child link. Multiply-owned directories are fixed by pruning the extra parent -> child links Disconnected subtrees are reconnected to the lost and found. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: report directory tree corruption in the health informationDarrick J. Wong4-1/+6
Report directories that are the source of corruption in the directory tree. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: invalidate dirloop scrub path data when concurrent updates happenDarrick J. Wong3-1/+244
Add a dirent update hook so that we can detect directory tree updates that affect any of the paths found by this scrubber and force it to rescan. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: teach online scrub to find directory tree structure problemsDarrick J. Wong12-2/+1168
Create a new scrubber that detects corruptions within the directory tree structure itself. It can detect directories with multiple parents; loops within the directory tree; and directory loops not accessible from the root. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: inode repair should ensure there's an attr fork to store parent pointersDarrick J. Wong1-0/+41
The runtime parent pointer update code expects that any file being moved around the directory tree already has an attr fork. However, if we had to rebuild an inode core record, there's a chance that we zeroed forkoff as part of the inode to pass the iget verifiers. Therefore, if we performed any repairs on an inode core, ensure that the inode has a nonzero forkoff before unlocking the inode. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: repair link count of nondirectories after rebuilding parent pointersDarrick J. Wong1-0/+107
Since the parent pointer scrubber does not exhaustively search the filesystem for missing parent pointers, it doesn't have a good way to determine that there are pointers missing from an otherwise uncorrupt xattr structure. Instead, for nondirectories it employs a heuristic of comparing the file link count to the number of parent pointers found. However, we don't want this heuristic flagging a false corruption after a repair has actually scanned the entire filesystem to rebuild the parent pointers. Therefore, reset the file link count in this one case because we actually know the correct link count. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: adapt the orphanage code to handle parent pointersDarrick J. Wong3-0/+43
Adapt the orphanage's adoption code to update the child file's parent pointers as part of the reparenting process. Also ensure that the child has an attr fork to receive the parent pointer update, since the runtime code assumes one exists. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: actually rebuild the parent pointer xattrsDarrick J. Wong7-23/+701
Once we've assembled all the parent pointers for a file, we need to commit the new dataset atomically to that file. Parent pointer records are embedded in the xattr structure, which means that we must write a new extended attribute structure, again, atomically. Therefore, we must copy the non-parent-pointer attributes from the file being repaired into the temporary file's extended attributes and then call the atomic extent swap mechanism to exchange the blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>