aboutsummaryrefslogtreecommitdiff
path: root/fs/xfs/libxfs
AgeCommit message (Collapse)AuthorFilesLines
2024-04-23xfs: check the flags earlier in xfs_attr_matchChristoph Hellwig1-9/+10
Checking the flags match is much cheaper than a memcmp, so do it early on in xfs_attr_match, and also add a little helper to calculate the match mask right under the comment explaining the logic for it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2024-04-23xfs: rearrange xfs_attr_match parametersDarrick J. Wong1-11/+12
Rearrange the parameters to this function so that they match the order of attr listent: attr_flags -> name -> namelen -> value -> valuelen. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: enforce one namespace per attributeDarrick J. Wong3-2/+20
Create a standardized helper function to enforce one namespace bit per extended attribute, and refactor all the open-coded hweight logic. This function is not a static inline to avoid porting hassles in userspace. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: use helpers to extract xattr op from opflagsDarrick J. Wong1-0/+5
Create helper functions to extract the xattr op from the ondisk xattri log item and the incore attr intent item. These will get more use in the patches that follow. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: restructure xfs_attr_complete_op a bitDarrick J. Wong1-5/+4
Eliminate the local variable from this function so that we can streamline things a bit later when we add the PPTR_REPLACE op code. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: fix missing check for invalid attr flagsDarrick J. Wong1-0/+5
The xattr scrubber doesn't check for undefined flags in shortform attr entries. Therefore, define a mask XFS_ATTR_ONDISK_MASK that has all possible XFS_ATTR_* flags in it, and use that to check for unknown bits in xchk_xattr_actor. Refactor the check in the dabtree scanner function to use the new mask as well. The redundant checks need to be in place because the dabtree check examines the hash mappings and therefore needs to decode the attr leaf entries to compute the namehash. This happens before the walk of the xattr entries themselves. Fixes: ae0506eba78fd ("xfs: check used space of shortform xattr structures") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: attr fork iext must be loaded before calling xfs_attr_is_leafDarrick J. Wong1-0/+17
Christoph noticed that the xfs_attr_is_leaf in xfs_attr_get_ilocked can access the incore extent tree of the attr fork, but nothing in the xfs_attr_get path guarantees that the incore tree is actually loaded. Most of the time it is, but seeing as xfs_attr_is_leaf ignores the return value of xfs_iext_get_extent I guess we've been making choices based on random stack contents and nobody's complained? Reported-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: rearrange xfs_da_args a bit to use less spaceDarrick J. Wong1-9/+11
A few notes about struct xfs_da_args: The XFS_ATTR_* flags only go up as far as XFS_ATTR_INCOMPLETE, which means that attr_filter could be a u8 field. I've reduced the number of XFS_DA_OP_* flags down to the point where op_flags would also fit into a u8. filetype has 7 bytes of slack after it, which is wasteful. namelen will never be greater than MAXNAMELEN, which is 256. This field could be reduced to a short. Rearrange the fields in xfs_da_args to waste less space. This reduces the structure size from 136 bytes to 128. Later when we add extra fields to support parent pointer replacement, this will only bloat the structure to 144 bytes, instead of 168. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: make attr removal an explicit operationDarrick J. Wong2-10/+12
Parent pointers match attrs on name+value, unlike everything else which matches on only the name. Therefore, we cannot keep using the heuristic that !value means remove. Make this an explicit operation code. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: remove xfs_da_args.attr_flagsDarrick J. Wong3-5/+12
This field only ever contains XATTR_{CREATE,REPLACE}, and it only goes as deep as xfs_attr_set. Remove the field from the structure and replace it with an enum specifying exactly what kind of change we want to make to the xattr structure. Upsert is the name that we'll give to the flags==0 operation, because we're either updating an existing value or inserting it, and the caller doesn't care. Note: The "UPSERTR" name created here is to make userspace porting easier. It will be removed in the next patch. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: remove XFS_DA_OP_NOTIMEDarrick J. Wong2-7/+4
The only user of this flag sets it prior to an xfs_attr_get_ilocked call, which doesn't update anything. Get rid of the flag. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-23xfs: remove XFS_DA_OP_REMOVEDarrick J. Wong2-5/+2
Nobody checks this flag, so get rid of it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-22xfs: stop the steal (of data blocks for RT indirect blocks)Christoph Hellwig1-1/+6
When xfs_bmap_del_extent_delay has to split an indirect block it tries to steal blocks from the the part that gets unmapped to increase the indirect block reservation that now needs to cover for two extents instead of one. This works perfectly fine on the data device, where the data and indirect blocks come from the same pool. It has no chance of working when the inode sits on the RT device. To support re-enabling delalloc for inodes on the RT device, make this behavior conditional on not being for rt extents. Note that split of delalloc extents should only happen on writeback failure, as for other kinds of hole punching we first write back all data and thus convert the delalloc reservations covering the hole to a real allocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-22xfs: rework splitting of indirect block reservationsChristoph Hellwig1-22/+16
Move the check if we have enough indirect blocks and the stealing of the deleted extent blocks out of xfs_bmap_split_indlen and into the caller to prepare for handling delayed allocation of RT extents that can't easily be stolen. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-22xfs: support RT inodes in xfs_mod_delallocChristoph Hellwig1-6/+6
To prepare for re-enabling delalloc on RT devices, track the data blocks (which use the RT device when the inode sits on it) and the indirect blocks (which don't) separately to xfs_mod_delalloc, and add a new percpu counter to also track the RT delalloc blocks. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-22xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delayChristoph Hellwig1-10/+10
The code to account fdblocks and frextents in xfs_bmap_del_extent_delay is a bit weird in that it accounts frextents before the iext tree manipulations and fdblocks after it. Given that the iext tree manipulations cannot fail currently that's not really a problem, but still odd. Move the frextent manipulation to the end, and use a fdblocks variable to account of the unconditional indirect blocks and the data blocks only freed for !RT. This prepares for following updates in the area and already makes the code more readable. Also remove the !isrt assert given that this code clearly handles rt extents correctly, and we'll soon reinstate delalloc support for RT inodes. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-22xfs: reinstate RT support in xfs_bmapi_reserve_delallocChristoph Hellwig1-8/+14
Allocate data blocks for RT inodes using xfs_dec_frextents. While at it optimize the data device case by doing only a single xfs_dec_fdblocks call for the extent itself and the indirect blocks. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-22xfs: split xfs_mod_freecounterChristoph Hellwig5-35/+22
xfs_mod_freecounter has two entirely separate code paths for adding or subtracting from the free counters. Only the subtract case looks at the rsvd flag and can return an error. Split xfs_mod_freecounter into separate helpers for subtracting or adding the freecounter, and remove all the impossible to reach error handling for the addition case. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-22xfs: move RT inode locking out of __xfs_bunmapiChristoph Hellwig2-7/+11
__xfs_bunmapi is a bit of an odd place to lock the rtbitmap and rtsummary inodes given that it is very high level code. While this only looks ugly right now, it will become a problem when supporting delayed allocations for RT inodes as __xfs_bunmapi might end up deleting only delalloc extents and thus never unlock the rt inodes. Move the locking into xfs_bmap_del_extent_real just before the call to xfs_rtfree_blocks instead and use a new flag in the transaction to ensure that the locking happens only once. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-22xfs: free RT extents after updating the bmap btreeChristoph Hellwig1-17/+9
Currently xfs_bmap_del_extent_real frees RT extents before updating the bmap btree, while it frees regular blocks after performing the bmap btree update for convoluted historic reasons. Switch to free the RT blocks in the same place as the regular data blocks instead to simply the code and fix a very theoretical bug. A short history of this code researched by Dave Chiner below: The truncate for data device extents was originally a two-phase operation. First it removed the bmapbt record, but because this can free BMBT extents, it can use up all the free space tree reservation space. So the transaction gets rolled to commit the BMBT change and the xfs_bmap_finish() call that frees the data extent runs with a new transaction reservation that allows different free space btrees to be logged without overrun. However, on crash, this could lose the free space because there was nothing to tell recovery about the extents removed from the BMBT, hence EFIs were introduced. They tie the extent free operation to the bmapbt record removal commit for recovery of the second phase of the extent removal process. Then RT extents came along. RT extent freeing does not require a free space btree reservation because the free space metadata is static and transaction size is bound. Hence we don't need to care if the BMBT record removal modifies the per-ag free space trees and we don't need a two-phase extent remove transaction. The only thing we have to care about is not losing space on crash. Hence instead of recording the extent for freeing in the bmap list for xfs_bmap_finish() to process in a new transaction, it simply freed the rtextent directly. So the original code (from 1994) simply replaced the "free AG extent later" queueing with a direct free. This code was originally at the start of xfs_dmap_del_extent(), but the xfs_bmap_add_free() got moved to the end of the function via the "do_fx" flag (the current code logic) in 1997 (commit c4fac74eaa58 in the historic xfs-import tree) because there was a shutdown occurring because of a case where splitting the extent record failed because the BMBT split and the filesystem didn't have enough space for the split to be done. (FWIW, I'm not sure this can happen anymore.) The commit backed out the BMBT change on ENOSPC error, and in doing so I think this actually breaks RT free space tracking. However, it then returns an ENOSPC error, and we have a dirty transaction in the RT case so this will shut down the filesysetm when the transaction is cancelled. Hence the corrupted "bmbt now points at freed rt dev space" condition never make it to disk, but it's still the wrong way to handle the issue. IOWs, this proposed change fixes that "shutdown at ENOSPC on rt devices" situation that was introduced by the above commit back in 1997. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-22xfs: refactor realtime inode lockingChristoph Hellwig3-5/+76
Create helper functions to deal with locking realtime metadata inodes. This enables us to maintain correct locking order once we start adding the realtime rmap and refcount btree inodes. Signed-off-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-22xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitionsChristoph Hellwig1-2/+1
Commit bb7b1c9c5dd3 ("xfs: tag transactions that contain intent done items") switched the XFS_TRANS_ definitions to be bit based, and using comments above the definitions. As XFS_TRANS_LOWMODE was last and has a big fat comment it was missed. Switch it to the same style. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-15xfs: Increase XFS_DEFER_OPS_NR_INODES to 5Allison Henderson2-2/+12
Renames that generate parent pointer updates can join up to 5 inodes locked in sorted order. So we need to increase the number of defer ops inodes and relock them in the same way. Signed-off-by: Allison Henderson <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Catherine Hoang <[email protected]> [djwong: have one sorting function] Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: pin inodes that would otherwise overflow link countDarrick J. Wong1-0/+6
The VFS inc_nlink function does not explicitly check for integer overflows in the i_nlink field. Instead, it checks the link count against s_max_links in the vfs_{link,create,rename} functions. XFS sets the maximum link count to 2.1 billion, so integer overflows should not be a problem. However. It's possible that online repair could find that a file has more than four billion links, particularly if the link count got corrupted while creating hardlinks to the file. The di_nlinkv2 field is not large enough to store a value larger than 2^32, so we ought to define a magic pin value of ~0U which means that the inode never gets deleted. This will prevent a UAF error if the repair finds this situation and users begin deleting links to the file. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: try to avoid allocating from sick inode clustersDarrick J. Wong1-0/+40
I noticed that xfs/413 and xfs/375 occasionally failed while fuzzing core.mode of an inode. The root cause of these problems is that the field we fuzzed (core.mode or core.magic, typically) causes the entire inode cluster buffer verification to fail, which affects several inodes at once. The repair process tries to create either a /lost+found or a temporary repair file, but regrettably it picks the same inode cluster that we just corrupted, with the result that repair triggers the demise of the filesystem. Try avoid this by making the inode allocation path detect when the perag health status indicates that someone has found bad inode cluster buffers, and try to read the inode cluster buffer. If the cluster buffer fails the verifiers, try another AG. This isn't foolproof and can result in premature ENOSPC, but that might be better than shutting down. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: check unused nlink fields in the ondisk inodeDarrick J. Wong1-0/+8
v2/v3 inodes use di_nlink and not di_onlink; and v1 inodes use di_onlink and not di_nlink. Whichever field is not in use, make sure its contents are zero, and teach xfs_scrub to fix that if it is. This clears a bunch of missing scrub failure errors in xfs/385 for core.onlink. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: pass the owner to xfs_symlink_write_targetDarrick J. Wong2-4/+4
Require callers of xfs_symlink_write_target to pass the owner number explicitly. This sets us up for online repair to be able to write a remote symlink target to sc->tempip with sc->ip's inumber in the block heaader. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: expose xfs_bmap_local_to_extents for online repairDarrick J. Wong4-7/+16
Allow online repair to call xfs_bmap_local_to_extents and add a void * argument at the end so that online repair can pass its own context. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: repair extended attributesDarrick J. Wong3-1/+8
If the extended attributes look bad, try to sift through the rubble to find whatever keys/values we can, stage a new attribute structure in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: use atomic extent swapping to fix user file fork dataDarrick J. Wong2-1/+2
Build on the code that was recently added to the temporary repair file code so that we can atomically switch the contents of any file fork, even if the fork is in local format. The upcoming functions to repair xattrs, directories, and symlinks will need that capability. Repair can lock out access to these user files by holding IOLOCK_EXCL on these user files. Therefore, it is safe to drop the ILOCK of both the file being repaired and the tempfile being used for staging, and cancel the scrub transaction. We do this so that we can reuse the resource estimation and transaction allocation functions used by a regular file exchange operation. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate explicit directory free block ownersDarrick J. Wong3-17/+22
Port the existing directory freespace block header checking function to accept an owner number instead of an xfs_inode, then update the callsites to use xfs_da_args.owner when possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate explicit directory block buffer ownersDarrick J. Wong4-11/+16
Port the existing directory block header checking function to accept an owner number instead of an xfs_inode, then update the callsites to use xfs_da_args.owner when possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate explicit directory data buffer ownersDarrick J. Wong6-22/+29
Port the existing directory data header checking function to accept an owner number instead of an xfs_inode, then update the callsites to use xfs_da_args.owner when possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate directory leaf buffer ownersDarrick J. Wong5-9/+81
Check the owner field of directory leaf blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate dabtree node buffer ownersDarrick J. Wong2-0/+110
Check the owner field of dabtree node blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate attr remote value buffer ownersDarrick J. Wong1-5/+4
Check the owner field of xattr remote value blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: validate attr leaf buffer ownersDarrick J. Wong6-16/+100
Create a leaf block header checking function to validate the owner field of xattr leaf blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: use the xfs_da_args owner field to set new dir/attr block ownerDarrick J. Wong7-21/+21
When we're creating leaf, data, freespace, or dabtree blocks for directories and xattrs, use the explicit owner field (instead of the xfs_inode) to set the owner field. This will enable online repair to construct replacement data structures in a temporary file without having to change the owner fields prior to swapping the new and old structures. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: add an explicit owner field to xfs_da_argsDarrick J. Wong6-0/+15
Add an explicit owner field to xfs_da_args, which will make it easier for online fsck to set the owner field of the temporary directory and xattr structures that it builds to repair damaged metadata. Note: I hopefully found all the xfs_da_args definitions by looking for automatic stack variable declarations and xfs_da_args.dp assignments: git grep -E '(args.*dp =|struct xfs_da_args[[:space:]]*[a-z0-9][a-z0-9]*)' Note that callers of xfs_attr_{get,set,change} can set the owner to zero (or leave it unset) to have the default set to args->dp. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: enable logged file mapping exchange featureDarrick J. Wong1-1/+2
Add the XFS_SB_FEAT_INCOMPAT_EXCHRANGE feature to the set of features that we will permit when mounting a filesystem. This turns on support for the file range exchange feature. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: capture inode generation numbers in the ondisk exchmaps log itemDarrick J. Wong2-0/+4
Per some very late review comments, capture the generation numbers of both inodes involved in a file content exchange operation so that we don't accidentally target files with have been reallocated. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: make file range exchange support realtime filesDarrick J. Wong1-10/+60
Now that bmap items support the realtime device, we can add the necessary pieces to the file range exchange code to support exchanging mappings. All we really need to do here is adjust the blockcount upwards to the end of the rt extent and remove the inode checks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: condense symbolic links after a mapping exchange operationDarrick J. Wong3-1/+96
The previous commit added a new file mapping exchange flag that enables us to perform post-exchange processing on file2 once we're done exchanging the extent mappings. Now add this ability for symlinks. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online symlink repair feature can salvage the remote target in a temporary link and exchange the data fork mappings when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the exchange. After the exchange, we can try to condense the fixed symlink down to inline format if possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: condense directories after a mapping exchange operationDarrick J. Wong1-0/+43
The previous commit added a new file mapping exchange flag that enables us to perform post-swap processing on file2 once we're done exchanging extent mappings. Now add this ability for directories. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online directory repair feature can create salvaged dirents in a temporary directory and exchange the data fork mappings when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the exchange. After the exchange, we can try to condense the fixed directory down to inline format if possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: condense extended attributes after a mapping exchange operationDarrick J. Wong2-2/+56
Add a new file mapping exchange flag that enables us to perform post-exchange processing on file2 once we're done exchanging the extent mappings. If we were swapping mappings between extended attribute forks, we want to be able to convert file2's attr fork from block to inline format. (This implies that all fork contents are exchanged.) This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online xattr repair feature can create salvaged attrs in a temporary file and exchange the attr fork mappings when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the exchange. After the exchange, we can try to condense the fixed file's attr fork back down to inline format if possible. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: add error injection to test file mapping exchange recoveryDarrick J. Wong2-1/+6
Add an errortag so that we can test recovery of exchmaps log items. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: create deferred log items for file mapping exchangesDarrick J. Wong6-2/+1197
Now that we've created the skeleton of a log intent item to track and restart file mapping exchange operations, add the upper level logic to commit intent items and turn them into concrete work recorded in the log. This builds on the existing bmap update intent items that have been around for a while now. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: introduce a file mapping exchange log intent itemDarrick J. Wong2-3/+41
Introduce a new intent log item to handle exchanging mappings between the forks of two files. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: create a incompat flag for atomic file mapping exchangesDarrick J. Wong3-11/+18
Create a incompat flag so that we only attempt to process file mapping exchange log items if the filesystem supports it, and a geometry flag to advertise support if it's present. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-04-15xfs: introduce new file range exchange ioctlDarrick J. Wong1-0/+41
Introduce a new ioctl to handle exchanging ranges of bytes between files. The goal here is to perform the exchange atomically with respect to applications -- either they see the file contents before the exchange or they see that A-B is now B-A, even if the kernel crashes. My original goal with all this code was to make it so that online repair can build a replacement directory or xattr structure in a temporary file and commit the repair by atomically exchanging all the data blocks between the two files. However, I needed a way to test this mechanism thoroughly, so I've been evolving an ioctl interface since then. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>