aboutsummaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)AuthorFilesLines
2024-04-15xfs: pin inodes that would otherwise overflow link countDarrick J. Wong5-26/+36
The VFS inc_nlink function does not explicitly check for integer overflows in the i_nlink field. Instead, it checks the link count against s_max_links in the vfs_{link,create,rename} functions. XFS sets the maximum link count to 2.1 billion, so integer overflows should not be a problem. However. It's possible that online repair could find that a file has more than four billion links, particularly if the link count got corrupted while creating hardlinks to the file. The di_nlinkv2 field is not large enough to store a value larger than 2^32, so we ought to define a magic pin value of ~0U which means that the inode never gets deleted. This will prevent a UAF error if the repair finds this situation and users begin deleting links to the file. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: try to avoid allocating from sick inode clustersDarrick J. Wong1-0/+40
I noticed that xfs/413 and xfs/375 occasionally failed while fuzzing core.mode of an inode. The root cause of these problems is that the field we fuzzed (core.mode or core.magic, typically) causes the entire inode cluster buffer verification to fail, which affects several inodes at once. The repair process tries to create either a /lost+found or a temporary repair file, but regrettably it picks the same inode cluster that we just corrupted, with the result that repair triggers the demise of the filesystem. Try avoid this by making the inode allocation path detect when the perag health status indicates that someone has found bad inode cluster buffers, and try to read the inode cluster buffer. If the cluster buffer fails the verifiers, try another AG. This isn't foolproof and can result in premature ENOSPC, but that might be better than shutting down. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: check unused nlink fields in the ondisk inodeDarrick J. Wong2-0/+20
v2/v3 inodes use di_nlink and not di_onlink; and v1 inodes use di_onlink and not di_nlink. Whichever field is not in use, make sure its contents are zero, and teach xfs_scrub to fix that if it is. This clears a bunch of missing scrub failure errors in xfs/385 for core.onlink. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: repair AGI unlinked inode bucket listsDarrick J. Wong3-4/+1074
Teach the AGI repair code to rebuild the unlinked buckets and lists. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: hoist AGI repair context to a heap objectDarrick J. Wong1-42/+63
Save ~460 bytes of stack space by moving all the repair context to a heap object. We're going to add even more context data in the next patch, which is why we really need to do this now. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: check AGI unlinked inode bucketsDarrick J. Wong3-1/+42
Look for corruptions in the AGI unlinked bucket chains. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: online repair of symbolic linksDarrick J. Wong7-2/+587
If a symbolic link target looks bad, try to sift through the rubble to find as much of the target buffer that we can, and stage a new target (short or remote format as needed) in a temporary file and use the atomic extent swapping mechanism to commit the results. In the worst case, we replace the target with an overly long filename that cannot possibly resolve. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: pass the owner to xfs_symlink_write_targetDarrick J. Wong3-6/+6
Require callers of xfs_symlink_write_target to pass the owner number explicitly. This sets us up for online repair to be able to write a remote symlink target to sc->tempip with sc->ip's inumber in the block heaader. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: expose xfs_bmap_local_to_extents for online repairDarrick J. Wong4-7/+16
Allow online repair to call xfs_bmap_local_to_extents and add a void * argument at the end so that online repair can pass its own context. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: ensure dentry consistency when the orphanage adopts a fileDarrick J. Wong2-0/+133
When the orphanage adopts a file, that file becomes a child of the orphanage. The dentry cache may have entries for the orphanage directory and the name we've chosen, so (1) make sure we abort if the dcache has a positive entry because something's not right; and (2) invalidate and purge negative dentries if the adoption goes through. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: move files to orphanage instead of letting nlinks drop to zeroDarrick J. Wong6-18/+161
If we encounter an inode with a nonzero link count but zero observed links, move it to the orphanage. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: move orphan files to the orphanageDarrick J. Wong10-14/+831
When we're repairing a directory structure or fixing the dotdot entry of a subdirectory, it's possible that we won't ever find a parent for the subdirectory. When this is the case, move it to the orphanage, aka /lost+found. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: ask the dentry cache if it knows the parent of a directoryDarrick J. Wong5-1/+81
It's possible that the dentry cache can tell us the parent of a directory. Therefore, when repairing directory dot dot entries, query the dcache as a last resort before scanning the entire filesystem. A reviewer asks: "How high is the chance that we actually have a valid dcache entry for a file in a corrupted directory?" There's a decent chance of this actually working. Say you have a 1000-block directory foo, and block 980 gets corrupted. Let's further suppose that block 0 has a correct entry for ".." and "bar". If someone accesses /mnt/foo/bar, that will cause the dcache to create a dentry from /mnt to /mnt/foo whose d_parent points back to /mnt. If you then want to rebuild the directory, XFS can obtain the parent from the dcache without needing to wander into parent pointers or scan the filesystem to find /mnt's connection to foo. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: online repair of parent pointersDarrick J. Wong6-1/+238
Teach the online repair code to fix parent pointers for directories. For now, this means correcting the dotdot entry of an existing directory that is otherwise consistent. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: scan the filesystem to repair a directory dotdot entryDarrick J. Wong7-24/+528
Teach the online directory repair code to scan the filesystem so that we can set the dotdot entry when we're rebuilding a directory. This involves dropping ILOCK on the directory that we're repairing, which means that the VFS can sneak in and tell us to update dotdot at any time. Deal with these races by using a dirent hook to absorb dotdot updates, and be careful not to check the scan results until after we've retaken the ILOCK. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: online repair of directoriesDarrick J. Wong15-2/+1563
If a directory looks like it's in bad shape, try to sift through the rubble to find whatever directory entries we can, scan the directory tree for the parent (if needed), stage the new directory contents in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: inactivate directory data blocksDarrick J. Wong1-0/+51
Teach inode inactivation to delete all the incore buffers backing a directory. In normal runtime this should never happen because the VFS forbids rmdir on a non-empty directory. In the next patch, online directory repair stands up a new directory, exchanges it with the broken directory, and then drops the private temporary directory. If we cancel the repair just prior to exchanging the directory contents, the new directory will need to be torn down. Note: If we commit the repair, reaping will take care of all the ondisk space allocations and incore buffers for the old corrupt directory. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: update the unlinked list when repairing link countsDarrick J. Wong1-9/+33
When we're repairing the link counts of a file, we must ensure either that the file has zero link count and is on the unlinked list; or that it has nonzero link count and is not on the unlinked list. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: ensure unlinked list state is consistent with nlink during scrubDarrick J. Wong4-4/+67
Now that we have the means to tell if an inode is on an unlinked inode list or not, we can check that an inode with zero link count is on the unlinked list; and an inode that has nonzero link count is not on that list. Make repair clean things up too. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: create an xattr iteration function for scrubDarrick J. Wong5-78/+414
Create a streamlined function to walk a file's xattrs, without all the cursor management stuff in the regular listxattr. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: flag empty xattr leaf blocks for optimizationDarrick J. Wong2-0/+13
Empty xattr leaf blocks at offset zero are a waste of space but otherwise harmless. If we encounter one, flag it as an opportunity for optimization. If we encounter empty attr leaf blocks anywhere else in the attr fork, that's corruption. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: scrub should set preen if attr leaf has holesDarrick J. Wong4-0/+20
If an attr block indicates that it could use compaction, set the preen flag to have the attr fork rebuilt, since the attr fork rebuilder can take care of that for us. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: repair extended attributesDarrick J. Wong19-4/+1436
If the extended attributes look bad, try to sift through the rubble to find whatever keys/values we can, stage a new attribute structure in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: use atomic extent swapping to fix user file fork dataDarrick J. Wong5-1/+211
Build on the code that was recently added to the temporary repair file code so that we can atomically switch the contents of any file fork, even if the fork is in local format. The upcoming functions to repair xattrs, directories, and symlinks will need that capability. Repair can lock out access to these user files by holding IOLOCK_EXCL on these user files. Therefore, it is safe to drop the ILOCK of both the file being repaired and the tempfile being used for staging, and cancel the scrub transaction. We do this so that we can reuse the resource estimation and transaction allocation functions used by a regular file exchange operation. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: create a blob array data structureDarrick J. Wong3-0/+176
Create a simple 'blob array' data structure for storage of arbitrarily sized metadata objects that will be used to reconstruct metadata. For the intended usage (temporarily storing extended attribute names and values) we only have to support storing objects and retrieving them. Use the xfile abstraction to store the attribute information in memory that can be swapped out. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: enable discarding of folios backing an xfileDarrick J. Wong3-0/+14
Create a new xfile function to discard the page cache that's backing part of an xfile. The next patch wil use this to drop parts of an xfile that aren't needed anymore. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate explicit directory free block ownersDarrick J. Wong4-18/+23
Port the existing directory freespace block header checking function to accept an owner number instead of an xfs_inode, then update the callsites to use xfs_da_args.owner when possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate explicit directory block buffer ownersDarrick J. Wong7-14/+19
Port the existing directory block header checking function to accept an owner number instead of an xfs_inode, then update the callsites to use xfs_da_args.owner when possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate explicit directory data buffer ownersDarrick J. Wong9-31/+39
Port the existing directory data header checking function to accept an owner number instead of an xfs_inode, then update the callsites to use xfs_da_args.owner when possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate directory leaf buffer ownersDarrick J. Wong6-10/+82
Check the owner field of directory leaf blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate dabtree node buffer ownersDarrick J. Wong3-0/+119
Check the owner field of dabtree node blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate attr remote value buffer ownersDarrick J. Wong1-5/+4
Check the owner field of xattr remote value blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate attr leaf buffer ownersDarrick J. Wong8-19/+128
Create a leaf block header checking function to validate the owner field of xattr leaf blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: reduce indenting in xfs_attr_node_listDarrick J. Wong1-27/+29
Reduce the indentation here so that we can add some things in the next patch without going over the column limits. Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: use the xfs_da_args owner field to set new dir/attr block ownerDarrick J. Wong7-21/+21
When we're creating leaf, data, freespace, or dabtree blocks for directories and xattrs, use the explicit owner field (instead of the xfs_inode) to set the owner field. This will enable online repair to construct replacement data structures in a temporary file without having to change the owner fields prior to swapping the new and old structures. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: add an explicit owner field to xfs_da_argsDarrick J. Wong13-3/+28
Add an explicit owner field to xfs_da_args, which will make it easier for online fsck to set the owner field of the temporary directory and xattr structures that it builds to repair damaged metadata. Note: I hopefully found all the xfs_da_args definitions by looking for automatic stack variable declarations and xfs_da_args.dp assignments: git grep -E '(args.*dp =|struct xfs_da_args[[:space:]]*[a-z0-9][a-z0-9]*)' Note that callers of xfs_attr_{get,set,change} can set the owner to zero (or leave it unset) to have the default set to args->dp. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: online repair of realtime summariesDarrick J. Wong7-16/+239
Repair the realtime summary data by constructing a new rtsummary file in the scrub temporary file, then atomically swapping the contents. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: teach the tempfile to set up atomic file content exchangesDarrick J. Wong5-3/+225
Create some new routines to exchange the contents of a temporary file created to stage a repair with another ondisk file. This will be used by the realtime summary repair function to commit atomically the new rtsummary data, which will be staged in the tempfile. The rest of XFS coordinates access to the realtime metadata inodes solely through the ILOCK. For repair to hold its exclusive access to the realtime summary file, it has to allocate a single large transaction and roll it repeatedly throughout the repair while holding the ILOCK. In turn, this means that for now there's only a partial file mapping exchange implementation for the temporary file because we can only work within an existing transaction. For now, the only tempswap functions needed here are to estimate the resource requirements of the exchange, reserve more space/quota to an existing transaction, and kick off the actual exchange. The rest will be added in a later patch in preparation for repairing xattrs and directories. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: support preallocating and copying content into temporary filesDarrick J. Wong3-0/+251
Create the routines we need to preallocate space in a temporary ondisk file and then copy the contents of an xfile into the tempfile. The upcoming rtsummary repair feature will construct the contents of a realtime summary file in memory, after which it will want to copy all that into the ondisk temporary file before atomically committing the new rtsummary contents. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: add the ability to reap entire inode forksDarrick J. Wong3-0/+436
In preparation for supporting repair of indexed file-based metadata (such as realtime bitmaps, directories, and extended attribute data), add a function to reap the old blocks after a metadata repair finishes. IOWs, this is an elaborate bunmapi call that deals with crosslinked blocks by unmapping them without freeing them, and also scans for incore buffers to invalidate. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: refactor live buffer invalidation for repairsDarrick J. Wong2-22/+71
In an upcoming patch, we will need to be able to look for xfs_buf objects caching file-based metadata blocks without needing to walk the (possibly corrupt) structures to find all the buffers. Repair already has most of the code needed to scan the buffer cache, so hoist these utility functions. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: create temporary files and directories for online repairDarrick J. Wong9-3/+324
Teach the online repair code how to create temporary files or directories. These temporary files can be used to stage reconstructed information until we're ready to perform an atomic extent swap to commit the new metadata. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: hide private inodes from bulkstat and handle functionsDarrick J. Wong3-1/+12
We're about to start adding functionality that uses internal inodes that are private to XFS. What this means is that userspace should never be able to access any information about these files, and should not be able to open these files by handle. To prevent users from ever finding the file or mis-interactions with the security apparatus, set S_PRIVATE on the inode. Don't allow bulkstat, open-by-handle, or linking of S_PRIVATE files into the directory tree. This should keep private inodes actually private. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: enable logged file mapping exchange featureDarrick J. Wong1-1/+2
Add the XFS_SB_FEAT_INCOMPAT_EXCHRANGE feature to the set of features that we will permit when mounting a filesystem. This turns on support for the file range exchange feature. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: capture inode generation numbers in the ondisk exchmaps log itemDarrick J. Wong4-5/+55
Per some very late review comments, capture the generation numbers of both inodes involved in a file content exchange operation so that we don't accidentally target files with have been reallocated. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: support non-power-of-two rtextsize with exchange-rangeDarrick J. Wong1-7/+82
The generic exchange-range alignment checks use (fast) bitmasking operations to perform block alignment checks on the exchange parameters. Unfortunately, bitmasks require that the alignment size be a power of two. This isn't true for realtime devices with a non-power-of-two extent size, so we have to copy-pasta the generic checks using long division for this to work properly. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: make file range exchange support realtime filesDarrick J. Wong2-10/+69
Now that bmap items support the realtime device, we can add the necessary pieces to the file range exchange code to support exchanging mappings. All we really need to do here is adjust the blockcount upwards to the end of the rt extent and remove the inode checks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: condense symbolic links after a mapping exchange operationDarrick J. Wong4-45/+103
The previous commit added a new file mapping exchange flag that enables us to perform post-exchange processing on file2 once we're done exchanging the extent mappings. Now add this ability for symlinks. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online symlink repair feature can salvage the remote target in a temporary link and exchange the data fork mappings when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the exchange. After the exchange, we can try to condense the fixed symlink down to inline format if possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: condense directories after a mapping exchange operationDarrick J. Wong1-0/+43
The previous commit added a new file mapping exchange flag that enables us to perform post-swap processing on file2 once we're done exchanging extent mappings. Now add this ability for directories. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online directory repair feature can create salvaged dirents in a temporary directory and exchange the data fork mappings when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the exchange. After the exchange, we can try to condense the fixed directory down to inline format if possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: condense extended attributes after a mapping exchange operationDarrick J. Wong3-3/+58
Add a new file mapping exchange flag that enables us to perform post-exchange processing on file2 once we're done exchanging the extent mappings. If we were swapping mappings between extended attribute forks, we want to be able to convert file2's attr fork from block to inline format. (This implies that all fork contents are exchanged.) This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online xattr repair feature can create salvaged attrs in a temporary file and exchange the attr fork mappings when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the exchange. After the exchange, we can try to condense the fixed file's attr fork back down to inline format if possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>