path: root/fs/xfs/libxfs
Age | Commit message | Author | Files | Lines
2024-07-02 | xfs: don't bother calling xfs_refcount_finish_one_cleanup in xfs_refcount_finish_one | Darrick J. Wong | 2 | -20/+1
In xfs_refcount_finish_one we know the cursor is non-zero when calling xfs_refcount_finish_one_cleanup and we pass a 0 error variable. This means xfs_refcount_finish_one_cleanup is just doing a xfs_btree_del_cursor. Open code that and move xfs_refcount_finish_one_cleanup to fs/xfs/xfs_refcount_item.c. Inspired-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: clean up refcount log intent item tracepoint callsites | Darrick J. Wong | 2 | -10/+10
Pass the incore refcount intent structure to the tracepoints instead of open-coding the argument passing. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: pass btree cursors to refcount btree tracepoints | Darrick J. Wong | 1 | -27/+15
Prepare the rest of refcount btree tracepoints for use with realtime reflink by making them take the btree cursor object as a parameter. This will save us a lot of trouble later on. Remove the xfs_refcount_recover_extent tracepoint since it's already covered by other refcount tracepoints. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: create specialized classes for refcount tracepoints | Darrick J. Wong | 1 | -15/+9
The only user of the "ag" tracepoint event classes is the refcount btree, so rename them to make that obvious and make them take the btree cursor to simplify the arguments. This will save us a lot of trouble later on. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: give refcount btree cursor error tracepoints their own class | Darrick J. Wong | 1 | -28/+14
Convert all the refcount tracepoints to use the btree error tracepoint class. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: move xfs_rmap_update_defer_add to xfs_rmap_item.c | Darrick J. Wong | 2 | -7/+2
Move the code that adds the incore xfs_rmap_update_item deferred work data to a transaction to live with the RUI log item code. This means that the rmap code no longer has to know about the inner workings of the RUI log items. As a consequence, we can get rid of the _{get,put}_group helpers. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: simplify usage of the rcur local variable in xfs_rmap_finish_one | Christoph Hellwig | 1 | -4/+2
Only update rcur when we know the final *pcur value. Signed-off-by: Christoph Hellwig <[email protected]> [djwong: don't leave the caller with a dangling ref] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2024-07-02 | xfs: don't bother calling xfs_rmap_finish_one_cleanup in xfs_rmap_finish_one | Christoph Hellwig | 2 | -20/+1
In xfs_rmap_finish_one we know the cursor is non-zero when calling xfs_rmap_finish_one_cleanup and we pass a 0 error variable. This means xfs_rmap_finish_one_cleanup is just doing a xfs_btree_del_cursor. Open code that and move xfs_rmap_finish_one_cleanup to fs/xfs/xfs_rmap_item.c. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> [djwong: minor porting changes] Signed-off-by: Darrick J. Wong <[email protected]>
2024-07-02 | xfs: clean up rmap log intent item tracepoint callsites | Darrick J. Wong | 2 | -17/+15
Pass the incore rmap structure to the tracepoints instead of open-coding the argument passing. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: pass btree cursors to rmap btree tracepoints | Darrick J. Wong | 1 | -111/+73
Prepare the rmap btree tracepoints for use with realtime rmap btrees by making them take the btree cursor object as a parameter. This will save us a lot of trouble later on. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: give rmap btree cursor error tracepoints their own class | Darrick J. Wong | 1 | -22/+11
Create a new tracepoint class for btree-related errors, then convert all the rmap tracepoints to use it. Also fix the one tracepoint that was abusing the old class by making it a separate tracepoint. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: move xfs_extent_free_defer_add to xfs_extfree_item.c | Darrick J. Wong | 2 | -13/+2
Move the code that adds the incore xfs_extent_free_item deferred work data to a transaction to live with the EFI log item code. This means that the allocator code no longer has to know about the inner workings of the EFI log items. As a consequence, we can get rid of the _{get,put}_group helpers. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: remove xfs_defer_agfl_block | Christoph Hellwig | 1 | -46/+22
xfs_free_extent_later can handle the extra AGFL special casing with very little extra logic. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2024-07-02 | xfs: remove duplicate asserts in xfs_defer_extent_free | Christoph Hellwig | 1 | -13/+0
The bno/len verification is already done by the calls to xfs_verify_rtbext / xfs_verify_fsbext, and reporting a corruption error seems like better handling than tripping an assert anyway. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2024-07-02 | xfs: convert "skip_discard" to a proper flags bitset | Darrick J. Wong | 9 | -22/+31
Convert the boolean to skip discard on free into a proper flags field so that we can add more flags in the next patch. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
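A minimal sketch of the boolean-to-bitset change, in standalone C. The flag macros `FREE_EXTENT_SKIP_DISCARD` and `FREE_EXTENT_VALID_MASK`, `struct free_item`, and both functions are hypothetical stand-ins rather than the kernel's identifiers; the point is only that a flags word leaves room for the additional flags mentioned above, where a `bool` does not.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits: new behaviours get new bits instead of new bools. */
#define FREE_EXTENT_SKIP_DISCARD (1U << 0)
#define FREE_EXTENT_VALID_MASK   (FREE_EXTENT_SKIP_DISCARD)

/* Hypothetical deferred-free record carrying the flags word. */
struct free_item {
	unsigned long long bno;
	unsigned int       len;
	unsigned int       flags;
};

/*
 * Before: free_extent_later(bno, len, bool skip_discard).
 * After:  callers pass a flags word; unknown bits are rejected.
 */
static int free_extent_later_flags(struct free_item *item,
				   unsigned long long bno, unsigned int len,
				   unsigned int flags)
{
	if (flags & ~FREE_EXTENT_VALID_MASK)
		return -EINVAL;
	item->bno = bno;
	item->len = len;
	item->flags = flags;
	return 0;
}

/* Discard is the default; the flag opts out of it. */
static bool free_item_wants_discard(const struct free_item *item)
{
	return !(item->flags & FREE_EXTENT_SKIP_DISCARD);
}
```

Rejecting unknown bits is a choice made for this sketch, so that a caller passing a not-yet-supported flag fails loudly instead of being silently ignored.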
2024-07-02 | xfs: clean up extent free log intent item tracepoint callsites | Darrick J. Wong | 1 | -4/+3
Pass the incore EFI structure to the tracepoints instead of open-coding the argument passing. This cleans up the call sites a bit. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb | Darrick J. Wong | 2 | -5/+5
Currently, the XFS_SB_CRC_OFF macro uses the incore superblock struct (xfs_sb) to compute the address of sb_crc within the ondisk superblock struct (xfs_dsb). This is a landmine if we ever change the layout of the incore superblock (as we're about to do), so redefine the macro to use xfs_dsb to compute the layout of xfs_dsb. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
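The landmine is easy to reproduce with `offsetof()`. The two structs below are hypothetical, heavily trimmed stand-ins for an incore and an ondisk superblock (not the real `xfs_sb`/`xfs_dsb`); they show that an offset computed from one struct only describes the other while their layouts happen to match.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ondisk layout: fixed-width fields in on-disk order. */
struct demo_dsb {
	uint32_t magic;
	uint32_t blocksize;
	uint64_t dblocks;
	uint32_t crc;              /* checksum at a fixed ondisk offset */
	uint32_t pad;
};

/* Hypothetical incore copy that grew an incore-only field. */
struct demo_sb {
	uint32_t magic;
	uint32_t blocksize;
	uint64_t dblocks;
	uint64_t cached_geometry;  /* shifts every later field */
	uint32_t crc;
	uint32_t pad;
};

/* Landmine: the ondisk CRC offset computed from the incore struct. */
#define DEMO_SB_CRC_OFF_BAD   offsetof(struct demo_sb, crc)
/* Safe: the ondisk CRC offset computed from the ondisk struct. */
#define DEMO_SB_CRC_OFF_GOOD  offsetof(struct demo_dsb, crc)
```

With the extra incore field the two macros disagree (24 vs 16 here); without it they would agree, which is why such a bug can stay latent until the incore layout changes.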
2024-07-02 | xfs: move dirent update hooks to xfs_dir2.c | Darrick J. Wong | 2 | -0/+129
Move the directory entry update hook code to xfs_dir2 so that it is mostly consolidated with the higher level directory functions. Retain the exports so that online fsck can still send notifications through the hooks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: create libxfs helper to rename two directory entries | Darrick J. Wong | 2 | -0/+230
Create a new libxfs function to rename two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: create libxfs helper to exchange two directory entries | Darrick J. Wong | 2 | -0/+128
Create a new libxfs function to exchange two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: create libxfs helper to remove an existing inode/name from a directory | Darrick J. Wong | 2 | -0/+83
Create a new libxfs function to remove a (name, inode) entry from a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: hoist inode free function to libxfs | Darrick J. Wong | 2 | -0/+57
Create a libxfs helper function that marks an inode free on disk. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: create libxfs helper to link an existing inode into a directory | Darrick J. Wong | 2 | -4/+71
Create a new libxfs function to link an existing inode into a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: create libxfs helper to link a new inode into a directory | Darrick J. Wong | 2 | -0/+65
Create a new libxfs function to link a newly created inode into a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: separate the icreate logic around INIT_XATTRS | Darrick J. Wong | 2 | -10/+27
INIT_XATTRS is overloaded here -- it's set during the creat process when we think that we're immediately going to set some ACL xattrs to save time. However, it's also used by the parent pointers code to enable the attr fork in preparation to receive pptr xattrs. This results in xfs_has_parent() branches scattered around the codebase to turn on INIT_XATTRS. Linkable files are created far more commonly than unlinkable temporary files or directory tree roots, so we should centralize this logic in xfs_inode_init. For the three callers that don't want parent pointers (online repair tempfiles, unlinkable tempfiles, rootdir creation) we provide an UNLINKABLE flag to skip attr fork initialization. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
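The centralized rule can be modelled as a single predicate. The flag names, `struct demo_mount`, and `want_attr_fork()` are hypothetical and simplified from the description above; this is not the kernel's `xfs_inode_init` logic, only an illustration of deciding in one place instead of at every caller.

```c
#include <stdbool.h>

/* Hypothetical icreate flags modelled on the commit message. */
#define ICREATE_INIT_XATTRS (1U << 0) /* caller will set xattrs (e.g. ACLs) right away */
#define ICREATE_UNLINKABLE  (1U << 1) /* tempfile/rootdir: never gets a parent pointer */

/* Hypothetical mount state: only the feature bit this sketch needs. */
struct demo_mount {
	bool has_parent;  /* filesystem has the parent pointer feature */
};

/*
 * Decide once whether a new inode needs an attr fork, instead of
 * sprinkling has_parent() checks across every caller.
 */
static bool want_attr_fork(const struct demo_mount *mp, unsigned int flags)
{
	if (flags & ICREATE_INIT_XATTRS)
		return true;
	if (mp->has_parent && !(flags & ICREATE_UNLINKABLE))
		return true;
	return false;
}
```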
2024-07-02 | xfs: hoist xfs_{bump,drop}link to libxfs | Darrick J. Wong | 2 | -0/+55
Move xfs_bumplink and xfs_droplink to libxfs. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: hoist xfs_iunlink to libxfs | Darrick J. Wong | 2 | -0/+286
Move xfs_iunlink and xfs_iunlink_remove to libxfs. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: hoist new inode initialization functions to libxfs | Darrick J. Wong | 3 | -8/+224
Move all the code that initializes a new inode's attributes from the icreate_args structure and the parent directory into libxfs. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: split new inode creation into two pieces | Darrick J. Wong | 1 | -0/+15
There are two parts to initializing a newly allocated inode: setting up the incore structures, and initializing the new inode core based on the parent inode and the current user's environment. The initialization code is not specific to the kernel, so we would like to share that with userspace by hoisting it to libxfs. Therefore, split xfs_icreate into separate functions to prepare for the next few patches. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: implement atime updates in xfs_trans_ichgtime | Darrick J. Wong | 2 | -0/+3
Enable xfs_trans_ichgtime to change the inode access time so that we can use this function to set inode times when allocating inodes instead of open-coding it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
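A sketch of a flag-driven timestamp helper, assuming one bit per timestamp. The `ICHGTIME_*` bits, `struct demo_ts`, `struct demo_inode`, and `demo_ichgtime()` are hypothetical names for illustration; the commit itself only says that access-time updates are added so the helper can also stamp atime when allocating an inode.

```c
/* Hypothetical timestamp type: seconds plus nanoseconds. */
struct demo_ts {
	long long sec;
	long      nsec;
};

/* Hypothetical bits: one per timestamp the helper may update. */
#define ICHGTIME_MOD    (1U << 0)  /* data modification time (mtime) */
#define ICHGTIME_CHG    (1U << 1)  /* inode change time (ctime) */
#define ICHGTIME_CREATE (1U << 2)  /* creation time (crtime) */
#define ICHGTIME_ACCESS (1U << 3)  /* access time (atime), the newly handled bit */

struct demo_inode {
	struct demo_ts atime, mtime, ctime, crtime;
};

/* Set every timestamp selected by @flags to the same @now. */
static void demo_ichgtime(struct demo_inode *ip, unsigned int flags,
			  struct demo_ts now)
{
	if (flags & ICHGTIME_MOD)
		ip->mtime = now;
	if (flags & ICHGTIME_CHG)
		ip->ctime = now;
	if (flags & ICHGTIME_CREATE)
		ip->crtime = now;
	if (flags & ICHGTIME_ACCESS)
		ip->atime = now;
}
```

With all four bits set, inode allocation can stamp every timestamp with a single call instead of open-coding each assignment.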
2024-07-02 | xfs: pack icreate initialization parameters into a separate structure | Darrick J. Wong | 1 | -0/+22
Callers that want to create an inode currently pass all possible file attribute values for the new inode into xfs_init_new_inode as ten separate parameters. This causes two code maintenance issues: first, we have large multi-line call sites which programmers must read carefully to make sure they did not accidentally invert a value. Second, all three file id parameters must be passed separately to the quota functions; any discrepancy results in quota count errors. Clean this up by creating a new icreate_args structure to hold all this information, some helpers to initialize them properly, and make the callers pass this structure through to the creation function, whose name we shorten to xfs_icreate. This eliminates the issues, enables us to keep the inode init code in sync with userspace via libxfs, and is needed for future metadata directory tree management. (A subsequent cleanup will also fix the quota alloc calls.) Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
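A trimmed sketch of the parameter-object idea. `struct demo_icreate_args`, its fields, and the helpers are hypothetical and much smaller than the real `icreate_args`; they show how one bundle feeds both the creation path and the quota ids, so the three ids cannot drift apart between call sites.

```c
#include <stdint.h>

/* Hypothetical, trimmed parameter bundle for creating an inode. */
struct demo_icreate_args {
	uint32_t uid;
	uint32_t gid;
	uint32_t prid;   /* project id */
	uint16_t mode;
	uint32_t rdev;
	uint32_t flags;
};

/* Helper that fills the bundle for an ordinary file create; other fields zero. */
static void demo_icreate_args_init(struct demo_icreate_args *args,
				   uint32_t uid, uint32_t gid, uint32_t prid,
				   uint16_t mode)
{
	*args = (struct demo_icreate_args){
		.uid = uid, .gid = gid, .prid = prid, .mode = mode,
	};
}

/* Hypothetical quota ids: taken from the same bundle, never passed separately. */
struct demo_quota {
	uint32_t uid, gid, prid;
};

static struct demo_quota demo_quota_for(const struct demo_icreate_args *args)
{
	return (struct demo_quota){ args->uid, args->gid, args->prid };
}
```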
2024-07-02 | xfs: hoist project id get/set functions to libxfs | Darrick J. Wong | 2 | -0/+12
Move the project id get and set functions into libxfs. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: hoist inode flag conversion functions to libxfs | Darrick J. Wong | 3 | -0/+139
Hoist the inode flag conversion functions into libxfs so that we can keep them in sync. Do this by creating a new xfs_inode_util.c file in libxfs. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-02 | xfs: hoist extent size helpers to libxfs | Darrick J. Wong | 2 | -0/+45
Move the extent size helpers to xfs_bmap.c in libxfs since they're used there already. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-07-01 | xfs: Remove header files which are included more than once | Wenchao Hao | 1 | -1/+0
The following warnings are reported, so remove these duplicated header includes:
    ./fs/xfs/libxfs/xfs_trans_resv.c: xfs_da_format.h is included more than once.
    ./fs/xfs/scrub/quota_repair.c: xfs_format.h is included more than once.
    ./fs/xfs/xfs_handle.c: xfs_da_btree.h is included more than once.
    ./fs/xfs/xfs_qm_bhv.c: xfs_mount.h is included more than once.
    ./fs/xfs/xfs_trace.c: xfs_bmap.h is included more than once.
This is just a code cleanup; no logic is changed. Signed-off-by: Wenchao Hao <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-07-01 | xfs: don't walk off the end of a directory data block | lei lu | 2 | -5/+33
This adds sanity checks for xfs_dir2_data_unused and xfs_dir2_data_entry to make sure we don't stray beyond the valid memory region. Before this patch, the loop simply checks that the start offset of the dup and dep is within the range. So in a crafted image, if the last entry is xfs_dir2_data_unused, we can change dup->length to dup->length-1 and leave 1 byte of space. In the next traversal, this space will be considered as a dup or dep, and we may encounter an out-of-bounds read when accessing the fixed members. In the patch, we make sure that the remaining bytes are large enough to hold an unused entry before accessing xfs_dir2_data_unused, and that xfs_dir2_data_unused is XFS_DIR2_DATA_ALIGN-byte aligned. We also make sure that the remaining bytes are large enough to hold a dirent with a single-byte name before accessing xfs_dir2_data_entry. Signed-off-by: lei lu <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
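The checks described above can be sketched on a toy block format. Everything here is an assumption for illustration: the 8-byte alignment, the `0xffff` freetag, the entry layout (8-byte inumber, 1-byte name length, name, 2-byte tag), and host-endian fields instead of the real big-endian on-disk encoding. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ALIGN      8u       /* entries start on 8-byte boundaries */
#define DEMO_FREETAG    0xffffu  /* marks an unused (free) region */
#define DEMO_UNUSED_MIN 8u       /* freetag + length + ... + tag, aligned */

/* Bytes used by a data entry: 8-byte inumber + namelen + name + 2-byte tag, aligned. */
static size_t demo_entsize(size_t namelen)
{
	size_t raw = 8 + 1 + namelen + 2;

	return (raw + DEMO_ALIGN - 1) & ~(size_t)(DEMO_ALIGN - 1);
}

/* Walk a data block; return 0 if every entry fits, -1 if one would overrun. */
static int demo_walk_block(const uint8_t *buf, size_t size)
{
	size_t off = 0;

	while (off < size) {
		size_t left = size - off;
		uint16_t freetag, len;
		size_t namelen;

		/* Check 1: room for the smallest possible entry before any read. */
		if (left < DEMO_UNUSED_MIN)
			return -1;
		memcpy(&freetag, buf + off, sizeof(freetag));

		if (freetag == DEMO_FREETAG) {
			memcpy(&len, buf + off + 2, sizeof(len));
			/* Check 2: unused length is non-zero, aligned and in bounds. */
			if (len == 0 || len % DEMO_ALIGN != 0 || len > left)
				return -1;
			off += len;
			continue;
		}

		/* Check 3: room for a dirent with a one-byte name before reading it. */
		if (left < demo_entsize(1))
			return -1;
		namelen = buf[off + 8];  /* name length follows the 8-byte inumber */
		if (namelen == 0 || demo_entsize(namelen) > left)
			return -1;
		off += demo_entsize(namelen);
	}
	return 0;
}
```

In the assertions, the misaligned length of 15 corresponds to the crafted image described in the message (an unused length shortened by one byte), and the length of 8 leaves a tail that is big enough for an unused-entry header but too small for any dirent; both are now rejected instead of being read past the end.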
2024-07-01 | xfs: avoid redundant AGFL buffer invalidation | Gao Xiang | 2 | -29/+5
Currently AGFL blocks can be filled from the following three sources:
 - allocbt free blocks, as in xfs_allocbt_free_block();
 - rmapbt free blocks, as in xfs_rmapbt_free_block();
 - refilled from freespace btrees, as in xfs_alloc_fix_freelist().
Originally, allocbt free blocks would be marked as stale only when they were put back in the general free space pool, as Dave mentioned on IRC: "we don't stale AGF metadata btree blocks when they are returned to the AGFL .. but once they get put back in the general free space pool, we have to make sure the buffers are marked stale as the next user of those blocks might be user data...." However, after commit ca250b1b3d71 ("xfs: invalidate allocbt blocks moved to the free list") and commit edfd9dd54921 ("xfs: move buffer invalidation to xfs_btree_free_block"), even allocbt / bmapbt free blocks are invalidated immediately, since they may fail to pass V5 format validation on writeback even though writeback to free space would be safe. IOWs, IMHO there is currently no difference for free blocks between the AGFL freespace pool and the general free space pool. So let's avoid the extra, redundant AGFL buffer invalidation; otherwise we face unnecessary xfs_log_force() calls due to xfs_trans_binval() being called again on buffers that were already marked stale, as below:
    [ 333.507469] Call Trace:
    [ 333.507862]  xfs_buf_find+0x371/0x6a0       <- xfs_buf_lock
    [ 333.508451]  xfs_buf_get_map+0x3f/0x230
    [ 333.509062]  xfs_trans_get_buf_map+0x11a/0x280
    [ 333.509751]  xfs_free_agfl_block+0xa1/0xd0
    [ 333.510403]  xfs_agfl_free_finish_item+0x16e/0x1d0
    [ 333.511157]  xfs_defer_finish_noroll+0x1ef/0x5c0
    [ 333.511871]  xfs_defer_finish+0xc/0xa0
    [ 333.512471]  xfs_itruncate_extents_flags+0x18a/0x5e0
    [ 333.513253]  xfs_inactive_truncate+0xb8/0x130
    [ 333.513930]  xfs_inactive+0x223/0x270
xfs_log_force() will take tens of milliseconds with the AGF buffer locked. This becomes an unnecessarily long latency, especially on our PMEM devices with FSDAX enabled, while fsops like xfs_reflink_find_shared() running at the same time get stuck on the same AGF lock. Removing the double invalidation on the AGFL blocks does not make this issue go away, but this patch fixes it for our workloads in practice, and code analysis suggests it should work as well. Note that I'm not sure whether I also need to remove the other redundant invalidation in xfs_alloc_ag_vextent_small(), since it's unrelated to our workloads. fstests also pass with this patch. Signed-off-by: Gao Xiang <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-06-26 | xfs: fix direction in XFS_IOC_EXCHANGE_RANGE | Darrick J. Wong | 1 | -1/+1
The kernel reads userspace's buffer but does not write it back. Therefore this is really an _IOW ioctl. Change this before the final 6.10 release. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-06-26 | xfs: allow unlinked symlinks and dirs with zero size | Darrick J. Wong | 1 | -5/+18
For a very very long time, inode inactivation has set the inode size to zero before unmapping the extents associated with the data fork. Unfortunately, commit 3c6f46eacd876 changed the inode verifier to prohibit zero-length symlinks and directories. If an inode happens to get logged in this state and the system crashes before freeing the inode, log recovery will also fail on the broken inode. Therefore, allow zero-size symlinks and directories as long as the link count is zero; nobody will be able to open these files by handle so there isn't any risk of data exposure. Fixes: 3c6f46eacd876 ("xfs: sanity check directory inode di_size") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
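The relaxed rule reduces to a small predicate. `struct demo_dinode` and `demo_size_ok()` are hypothetical and trimmed to the three fields the message talks about; the real inode verifier checks far more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed view of the inode fields the check looks at. */
struct demo_dinode {
	bool     is_dir_or_symlink;
	uint64_t size;
	uint32_t nlink;
};

/*
 * Before the fix: any zero-size directory or symlink was rejected.
 * After: zero size is tolerated when the inode is already unlinked
 * (nlink == 0), the state inactivation can leave in the log.
 */
static bool demo_size_ok(const struct demo_dinode *dip)
{
	if (!dip->is_dir_or_symlink)
		return true;
	if (dip->size == 0 && dip->nlink != 0)
		return false;
	return true;
}
```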
2024-06-26 | xfs: restrict when we try to align cow fork delalloc to cowextsz hints | Darrick J. Wong | 1 | -4/+27
xfs/205 produces the following failure when always_cow is enabled:
    --- a/tests/xfs/205.out 2024-02-28 16:20:24.437887970 -0800
    +++ b/tests/xfs/205.out.bad 2024-06-03 21:13:40.584000000 -0700
    @@ -1,4 +1,5 @@
     QA output created by 205
     *** one file
    + !!! disk full (expected)
     *** one file, a few bytes at a time
     *** done
This is the result of overly aggressive attempts to align cow fork delalloc reservations to the CoW extent size hint. Looking at the trace data, we're trying to append a single fsblock to the "fred" file. Trying to create a speculative post-eof reservation fails because there's not enough space. We then set @prealloc_blocks to zero and try again, but the cowextsz alignment code triggers, which expands our request for a 1-fsblock reservation into a 39-block reservation. There's not enough space for that, so the whole write fails with ENOSPC even though there's sufficient space in the filesystem to allocate the single block that we need to land the write. There are two things wrong here -- first, we shouldn't be attempting speculative preallocations beyond what was requested when we're low on space. Second, if we've already computed a posteof preallocation, we shouldn't bother trying to align that to the cowextsize hint. Fix both of these problems by adding a flag that only enables the expansion of the delalloc reservation to the cowextsize if we're doing a non-extending write, and only if we're not doing an ENOSPC retry. This requires us to move the ENOSPC retry logic to xfs_bmapi_reserve_delalloc. I probably should have caught this six years ago when 6ca30729c206d was being reviewed, but oh well. Update the comments to reflect what the code does now. Fixes: 6ca30729c206d ("xfs: bmap code cleanup") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
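A toy model of the two rules above, in filesystem blocks. `struct demo_resv`, `demo_plan()` and `demo_reserve()` are hypothetical; post-EOF speculative preallocation is not modelled, and "enough space" is just a comparison against a free-block count. It only shows the hint-sized expansion being skipped for extending writes and dropped on the ENOSPC retry.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical delalloc reservation request, in fs blocks. */
struct demo_resv {
	uint64_t off;
	uint64_t len;
};

/*
 * Expand a request to the CoW extent size hint only when the write does
 * not extend the file and this is not already an ENOSPC retry.
 */
static struct demo_resv demo_plan(uint64_t off, uint64_t len, uint64_t isize_fsb,
				  uint64_t extsz, bool enospc_retry)
{
	struct demo_resv r = { off, len };
	bool extending = off + len > isize_fsb;

	if (extsz > 1 && !extending && !enospc_retry) {
		uint64_t start = off / extsz * extsz;                    /* round down */
		uint64_t end = (off + len + extsz - 1) / extsz * extsz;  /* round up */

		r.off = start;
		r.len = end - start;
	}
	return r;
}

/* Reserve against @free blocks, retrying once without expansion on ENOSPC. */
static int demo_reserve(uint64_t off, uint64_t len, uint64_t isize_fsb,
			uint64_t extsz, uint64_t free, struct demo_resv *out)
{
	bool retry = false;

	for (;;) {
		struct demo_resv r = demo_plan(off, len, isize_fsb, extsz, retry);

		if (r.len <= free) {
			*out = r;
			return 0;
		}
		if (retry)
			return -ENOSPC;
		retry = true;
	}
}
```

In the second assertion, a 1-block overwrite is first expanded to a 32-block, hint-aligned reservation; with only one free block that attempt fails, and the retry without expansion succeeds.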
2024-06-10 | xfs: make sure sb_fdblocks is non-negative | Wengang Wang | 1 | -3/+4
A user with a completely full filesystem experienced an unexpected shutdown when the filesystem tried to write the superblock during runtime. The kernel shows the following dmesg output:
    [ 8.176281] XFS (dm-4): Metadata corruption detected at xfs_sb_write_verify+0x60/0x120 [xfs], xfs_sb block 0x0
    [ 8.177417] XFS (dm-4): Unmount and run xfs_repair
    [ 8.178016] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
    [ 8.178703] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00 XFSB............
    [ 8.179487] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 8.180312] 00000020: cf 12 dc 89 ca 26 45 29 92 e6 e3 8d 3b b8 a2 c3 .....&E)....;...
    [ 8.181150] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80 ................
    [ 8.182003] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................
    [ 8.182004] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00 .....d..........
    [ 8.182004] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00 ..d.............
    [ 8.182005] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19 ................
    [ 8.182008] XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem
    [ 8.182010] XFS (dm-4): Please unmount the filesystem and rectify the problem(s)
When xfs_log_sb writes the superblock to disk, sb_fdblocks is fetched from m_fdblocks without any lock. Because m_fdblocks can go positive -> negative -> positive when the FS reaches fullness (see xfs_mod_fdblocks), there is a chance that sb_fdblocks is negative, and because sb_fdblocks is of type unsigned long long, it reads as a huge value. An sb_fdblocks value bigger than sb_dblocks is a problem during log recovery, and xfs_validate_sb_write() complains about it. Fix: as sb_fdblocks is recalculated during mount when lazysbcount is enabled, we just need to keep xfs_validate_sb_write() happy by making sure sb_fdblocks is not negative. This patch also takes care of other percpu counters in xfs_log_sb.
Signed-off-by: Wengang Wang <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
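The arithmetic behind the failure, as a standalone sketch. The helper names are hypothetical; the only facts taken from the message are that a lockless sum of the free-block counter can be briefly negative and that the superblock field is unsigned, so the fix amounts to never storing a negative snapshot.

```c
#include <stdint.h>

/*
 * A lockless sum of per-CPU deltas can be transiently negative while
 * the filesystem is nearly full. Converting it straight into an unsigned
 * 64-bit field turns -1 into 2^64 - 1.
 */
static uint64_t demo_store_unclamped(int64_t sum)
{
	return (uint64_t)sum;
}

/* The fix, in spirit: clamp a negative snapshot to zero before storing it. */
static uint64_t demo_store_clamped(int64_t sum)
{
	return sum < 0 ? 0 : (uint64_t)sum;
}

/* Hypothetical write-time check: free blocks cannot exceed total blocks. */
static int demo_sb_write_ok(uint64_t fdblocks, uint64_t dblocks)
{
	return fdblocks <= dblocks;
}
```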
2024-05-27 | xfs: Add cond_resched to block unmap range and reflink remap path | Ritesh Harjani (IBM) | 1 | -0/+1
An async dio write to a sparse file can generate a lot of extents, and when we unlink this file (using rm), the kernel can be busy unmapping and freeing those extents as part of transaction processing. Similarly, the xfs reflink remapping path can also iterate over a million extent entries in xfs_reflink_remap_blocks(). Since we can busy loop in these two functions, let's add cond_resched() to avoid softlockup messages like these:
    watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
    CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1
    Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
    NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
    LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
    Call Trace:
      xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
      xfs_alloc_compute_aligned+0x5c/0x144
      xfs_alloc_ag_vextent_size+0x238/0x8d4
      xfs_alloc_fix_freelist+0x540/0x694
      xfs_free_extent_fix_freelist+0x84/0xe0
      __xfs_free_extent+0x74/0x1ec
      xfs_extent_free_finish_item+0xcc/0x214
      xfs_defer_finish_one+0x194/0x388
      xfs_defer_finish_noroll+0x1b4/0x5c8
      xfs_defer_finish+0x2c/0xc4
      xfs_bunmapi_range+0xa4/0x100
      xfs_itruncate_extents_flags+0x1b8/0x2f4
      xfs_inactive_truncate+0xe0/0x124
      xfs_inactive+0x30c/0x3e0
      xfs_inodegc_worker+0x140/0x234
      process_scheduled_works+0x240/0x57c
      worker_thread+0x198/0x468
      kthread+0x138/0x140
      start_kernel_thread+0x14/0x18
    run fstests generic/175 at 2024-02-02 04:40:21
    [ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
    watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
    CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0
    NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
    LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
    Call Trace:
      0xc000000014107c00 (unreliable)
      __xfs_btree_updkeys+0x8c/0x2c0 [xfs]
      xfs_btree_update_keys+0x150/0x170 [xfs]
      xfs_btree_lshift+0x534/0x660 [xfs]
      xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
      xfs_btree_insrec+0x4e4/0x630 [xfs]
      xfs_btree_insert+0x104/0x2d0 [xfs]
      xfs_rmap_insert+0xc4/0x260 [xfs]
      xfs_rmap_map_shared+0x228/0x630 [xfs]
      xfs_rmap_finish_one+0x2d4/0x350 [xfs]
      xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
      xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
      __xfs_trans_commit+0x1f4/0x400 [xfs]
      xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
      xfs_reflink_remap_blocks+0x154/0x320 [xfs]
      xfs_file_remap_range+0x138/0x3a0 [xfs]
      do_clone_file_range+0x11c/0x2f0
      vfs_clone_file_range+0x60/0x1c0
      ioctl_file_clone+0x78/0x140
      sys_ioctl+0x934/0x1270
      system_call_exception+0x158/0x320
      system_call_vectored_common+0x15c/0x2ec
Cc: Ojaswin Mujoo <[email protected]> Signed-off-by: Ritesh Harjani (IBM) <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Tested-by: Disha Goel <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-27 | xfs: allow symlinks with short remote targets | Darrick J. Wong | 1 | -4/+24
An internal user complained about log recovery failing on a symlink ("Bad dinode after recovery") with the following (excerpted) format:
    core.magic = 0x494e
    core.mode = 0120777
    core.version = 3
    core.format = 2 (extents)
    core.nlinkv2 = 1
    core.nextents = 1
    core.size = 297
    core.nblocks = 1
    core.naextents = 0
    core.forkoff = 0
    core.aformat = 2 (extents)
    u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
    0:[0,12,1,0]
This is a symbolic link with a 297-byte target stored in a disk block, which is to say this is a symlink with a remote target. The forkoff is 0, which is to say that there's 512 - 176 == 336 bytes in the inode core to store the data fork. Eventually, testing of generic/388 failed with the same inode corruption message during inode recovery. In writing a debugging patch to call xfs_dinode_verify on dirty inode log items when we're committing transactions, I observed that xfs/298 can reproduce the problem quite quickly. xfs/298 creates a symbolic link, adds some extended attributes, then deletes them all. The test failure occurs when the final removexattr also deletes the attr fork because that does not convert the remote symlink back into a shortform symlink. That is how we trip this test. The only reason why xfs/298 only triggers with the debug patch added is that it deletes the symlink, so the final iflush shows the inode as free. I wrote a quick fstest to emulate the behavior of xfs/298, except that it leaves the symlinks on the filesystem after inducing the "corrupt" state. Kernels going back at least as far as 4.18 have written out symlink inodes in this manner and prior to 1eb70f54c445f they did not object to reading them back in. Because we've been writing out inodes this way for quite some time, the only way to fix this is to relax the check for symbolic links. Directories don't have this problem because di_size is bumped to blocksize during the sf->data conversion.
Fixes: 1eb70f54c445f ("xfs: validate inode fork size against fork format") Signed-off-by: "Darrick J. Wong" <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-27 | xfs: fix xfs_init_attr_trans not handling explicit operation codes | Darrick J. Wong | 2 | -23/+20
When we were converting the attr code to use an explicit operation code instead of keying off of attr->value being null, we forgot to change the code that initializes the transaction reservation. Split the function into two helpers that handle the !remove and remove cases, then fix both callsites to handle this correctly. Fixes: c27411d4c640 ("xfs: make attr removal an explicit operation") Signed-off-by: "Darrick J. Wong" <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-27 | xfs: Stop using __maybe_unused in xfs_alloc.c | John Garry | 1 | -4/+2
In both xfs_alloc_cur_finish() and xfs_alloc_ag_vextent_exact(), local variable @agf is tagged as __maybe_unused. Otherwise, an unused variable warning would be generated when building with W=1 and CONFIG_XFS_DEBUG unset. In both cases, the variable is unused as it is only referenced in an ASSERT() call, which is compiled out (in this config). It is generally poor programming style to use __maybe_unused for variables. The ASSERT() call is to verify that agbno of the end of the extent is within bounds for both functions. @agf is used as an intermediate variable to find the AG length. However, xfs_verify_agbext() already exists to verify a valid extent range. The arguments for calling xfs_verify_agbext() are already available, so use that instead. An advantage of using xfs_verify_agbext() is that it verifies that both the start and the end of the extent are within the bounds of the AG and catches overflows. Suggested-by: Dave Chinner <[email protected]> Signed-off-by: John Garry <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
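A simplified stand-in for the kind of range check described above, next to a comparison that only looks at the end of the extent. `demo_verify_agbext()` and `demo_end_only_check()` are hypothetical names; the real xfs_verify_agbext() works on the per-AG structure rather than a raw length and may check more than this sketch does.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_agblock_t;  /* block number within an AG */
typedef uint32_t demo_extlen_t;   /* extent length in blocks */

/* Start-and-end check: rejects wrap-around, empty extents, and out-of-range ends. */
static bool demo_verify_agbext(demo_agblock_t ag_len, demo_agblock_t agbno,
			       demo_extlen_t len)
{
	if (agbno + len <= agbno)         /* wrapped past 2^32, or len == 0 */
		return false;
	if (agbno >= ag_len)              /* first block outside the AG */
		return false;
	return agbno + len - 1 < ag_len;  /* last block inside the AG */
}

/* End-only comparison: a wrapped end looks small and slips through. */
static bool demo_end_only_check(demo_agblock_t ag_len, demo_agblock_t agbno,
				demo_extlen_t len)
{
	return agbno + len <= ag_len;
}
```

The last two assertions show the overflow case: an extent starting near 2^32 wraps around, so the end-only comparison accepts it while the start-and-end check rejects it.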
2024-05-03 | xfs: simplify iext overflow checking and upgrade | Christoph Hellwig | 4 | -44/+29
Currently the calls to xfs_iext_count_may_overflow and xfs_iext_count_upgrade are always paired. Merge them into a single function to simplify the callers and the actual check and upgrade logic itself. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
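A sketch of what folding the paired calls into one helper can look like. The limits, `struct demo_iext_inode`, and `demo_iext_check_and_upgrade()` are hypothetical, and the two extent-count maximums are placeholder values chosen for illustration rather than the real XFS limits.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder extent-count limits for "small" and "large" counters. */
#define DEMO_MAX_EXTENTS_SMALL ((uint64_t)INT32_MAX)
#define DEMO_MAX_EXTENTS_LARGE (((uint64_t)1 << 48) - 1)

struct demo_iext_inode {
	uint64_t nextents;      /* extents currently mapped by the fork */
	bool     large_counts;  /* inode already uses large extent counters */
};

/*
 * One call replaces the old "may this overflow?" + "upgrade" pair:
 * returns 0 if @nr_to_add more extents fit (upgrading the inode to large
 * counters first if the filesystem supports them), or -EFBIG otherwise.
 */
static int demo_iext_check_and_upgrade(struct demo_iext_inode *ip,
				       bool fs_has_large, uint64_t nr_to_add)
{
	uint64_t want = ip->nextents + nr_to_add;
	uint64_t limit = ip->large_counts ? DEMO_MAX_EXTENTS_LARGE
					  : DEMO_MAX_EXTENTS_SMALL;

	if (want <= limit)
		return 0;
	if (ip->large_counts || !fs_has_large)
		return -EFBIG;
	ip->large_counts = true;  /* the formerly separate "upgrade" step */
	return want <= DEMO_MAX_EXTENTS_LARGE ? 0 : -EFBIG;
}
```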
2024-05-03 | xfs: xfs_quota_unreserve_blkres can't fail | Christoph Hellwig | 2 | -12/+6
Unreserving quotas can't fail due to quota limits, and we'll notice a shut down file system a bit later in all the callers anyway. Return void and remove the error checking and propagation in the callers. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-05-02 | xfs: minor cleanups of xfs_attr3_rmt_blocks | Darrick J. Wong | 1 | -8/+8
Clean up the type signature of this function since we don't have negative attr lengths or block counts. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Andrey Albershteyn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2024-05-02 | xfs: create a helper to compute the blockcount of a max sized remote value | Darrick J. Wong | 2 | -1/+7
Create a helper function to compute the number of fsblocks needed to store a maximally-sized extended attribute value. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Andrey Albershteyn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
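The blockcount helper is a round-up division. The constants below are assumptions for this sketch: a 64 KiB maximum xattr value and a 56-byte per-block remote header on CRC-enabled filesystems; the function names are hypothetical.

```c
#include <stdbool.h>

#define DEMO_XATTR_SIZE_MAX 65536u  /* assumed maximum xattr value size, bytes */
#define DEMO_RMT_HDR_SIZE   56u     /* assumed per-block header on CRC filesystems */

/* Usable value bytes per remote block: the header eats into each block on v5. */
static unsigned int demo_rmt_buf_space(unsigned int blocksize, bool has_crc)
{
	return has_crc ? blocksize - DEMO_RMT_HDR_SIZE : blocksize;
}

/* Blocks needed to hold @valuelen bytes, rounding up. */
static unsigned int demo_rmt_blocks(unsigned int valuelen, unsigned int blocksize,
				    bool has_crc)
{
	unsigned int space = demo_rmt_buf_space(blocksize, has_crc);

	return (valuelen + space - 1) / space;
}

/* Blockcount for a maximally sized remote value. */
static unsigned int demo_max_rmt_blocks(unsigned int blocksize, bool has_crc)
{
	return demo_rmt_blocks(DEMO_XATTR_SIZE_MAX, blocksize, has_crc);
}
```

With these assumed constants, a 4096-byte block on a CRC filesystem holds 4040 value bytes, so a maximal value needs 17 blocks rather than the 16 that 65536 / 4096 would suggest.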
2024-05-02 | xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function | Darrick J. Wong | 2 | -6/+17
Turn this into a properly typechecked function, and actually use the correct blocksize for extended attributes. The function cannot be static inline because xfsprogs userspace uses it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Andrey Albershteyn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>