aboutsummaryrefslogtreecommitdiff
path: root/fs/xfs/libxfs
AgeCommit message (Collapse)AuthorFilesLines
2020-10-07xfs: periodically relog deferred intent itemsDarrick J. Wong1-0/+42
There's a subtle design flaw in the deferred log item code that can lead to pinning the log tail. Taking up the defer ops chain examples from the previous commit, we can get trapped in sequences like this: Caller hands us a transaction t0 with D0-D3 attached. The defer ops chain will look like the following if the transaction rolls succeed: t1: D0(t0), D1(t0), D2(t0), D3(t0) t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0) t3: d5(t1), D1(t0), D2(t0), D3(t0) ... t9: d9(t7), D3(t0) t10: D3(t0) t11: d10(t10), d11(t10) t12: d11(t10) In transaction 9, we finish d9 and try to roll to t10 while holding onto an intent item for D3 that we logged in t0. The previous commit changed the order in which we place new defer ops in the defer ops processing chain to reduce the maximum chain length. Now make xfs_defer_finish_noroll capable of relogging the entire chain periodically so that we can always move the log tail forward. Most chains will never get relogged, except for operations that generate very long chains (large extents containing many blocks with different sharing levels) or are on filesystems with small logs and a lot of ongoing metadata updates. Callers are now required to ensure that the transaction reservation is large enough to handle logging done items and new intent items for the maximum possible chain length. Most callers are careful to keep the chain lengths low, so the overhead should be minimal. The decision to relog an intent item is made based on whether the intent was logged in a previous checkpoint, since there's no point in relogging an intent into the same checkpoint. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-10-07xfs: change the order in which child and parent defer ops are finishedDarrick J. Wong1-1/+10
The defer ops code has been finishing items in the wrong order -- if a top level defer op creates items A and B, and finishing item A creates more defer ops A1 and A2, we'll put the new items on the end of the chain and process them in the order A B A1 A2. This is kind of weird, since it's convenient for programmers to be able to think of A and B as an ordered sequence where all the sub-tasks for A must finish before we move on to B, e.g. A A1 A2 D. Right now, our log intent items are not so complex that this matters, but this will become important for the atomic extent swapping patchset. In order to maintain correct reference counting of extents, we have to unmap and remap extents in that order, and we want to complete that work before moving on to the next range that the user wants to swap. This patch fixes defer ops to satsify that requirement. The primary symptom of the incorrect order was noticed in an early performance analysis of the atomic extent swap code. An astonishingly large number of deferred work items accumulated when userspace requested an atomic update of two very fragmented files. The cause of this was traced to the same ordering bug in the inner loop of xfs_defer_finish_noroll. If the ->finish_item method of a deferred operation queues new deferred operations, those new deferred ops are appended to the tail of the pending work list. To illustrate, say that a caller creates a transaction t0 with four deferred operations D0-D3. The first thing defer ops does is roll the transaction to t1, leaving us with: t1: D0(t0), D1(t0), D2(t0), D3(t0) Let's say that finishing each of D0-D3 will create two new deferred ops. After finish D0 and roll, we'll have the following chain: t2: D1(t0), D2(t0), D3(t0), d4(t1), d5(t1) d4 and d5 were logged to t1. Notice that while we're about to start work on D1, we haven't actually completed all the work implied by D0 being finished. So far we've been careful (or lucky) to structure the dfops callers such that D1 doesn't depend on d4 or d5 being finished, but this is a potential logic bomb. There's a second problem lurking. Let's see what happens as we finish D1-D3: t3: D2(t0), D3(t0), d4(t1), d5(t1), d6(t2), d7(t2) t4: D3(t0), d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3) t5: d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) Let's say that d4-d11 are simple work items that don't queue any other operations, which means that we can complete each d4 and roll to t6: t6: d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) t7: d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) ... t11: d10(t4), d11(t4) t12: d11(t4) <done> When we try to roll to transaction #12, we're holding defer op d11, which we logged way back in t4. This means that the tail of the log is pinned at t4. If the log is very small or there are a lot of other threads updating metadata, this means that we might have wrapped the log and cannot get roll to t11 because there isn't enough space left before we'd run into t4. Let's shift back to the original failure. I mentioned before that I discovered this flaw while developing the atomic file update code. In that scenario, we have a defer op (D0) that finds a range of file blocks to remap, creates a handful of new defer ops to do that, and then asks to be continued with however much work remains. So, D0 is the original swapext deferred op. The first thing defer ops does is rolls to t1: t1: D0(t0) We try to finish D0, logging d1 and d2 in the process, but can't get all the work done. We log a done item and a new intent item for the work that D0 still has to do, and roll to t2: t2: D0'(t1), d1(t1), d2(t1) We roll and try to finish D0', but still can't get all the work done, so we log a done item and a new intent item for it, requeue D0 a second time, and roll to t3: t3: D0''(t2), d1(t1), d2(t1), d3(t2), d4(t2) If it takes 48 more rolls to complete D0, then we'll finally dispense with D0 in t50: t50: D<fifty primes>(t49), d1(t1), ..., d102(t50) We then try to roll again to get a chain like this: t51: d1(t1), d2(t1), ..., d101(t50), d102(t50) ... t152: d102(t50) <done> Notice that in rolling to transaction #51, we're holding on to a log intent item for d1 that was logged in transaction #1. This means that the tail of the log is pinned at t1. If the log is very small or there are a lot of other threads updating metadata, this means that we might have wrapped the log and cannot roll to t51 because there isn't enough space left before we'd run into t1. This is of course problem #2 again. But notice the third problem with this scenario: we have 102 defer ops tied to this transaction! Each of these items are backed by pinned kernel memory, which means that we risk OOM if the chains get too long. Yikes. Problem #1 is a subtle logic bomb that could hit someone in the future; problem #2 applies (rarely) to the current upstream, and problem #3 applies to work under development. This is not how incremental deferred operations were supposed to work. The dfops design of logging in the same transaction an intent-done item and a new intent item for the work remaining was to make it so that we only have to juggle enough deferred work items to finish that one small piece of work. Deferred log item recovery will find that first unfinished work item and restart it, no matter how many other intent items might follow it in the log. Therefore, it's ok to put the new intents at the start of the dfops chain. For the first example, the chains look like this: t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0) t3: d5(t1), D1(t0), D2(t0), D3(t0) ... t9: d9(t7), D3(t0) t10: D3(t0) t11: d10(t10), d11(t10) t12: d11(t10) For the second example, the chains look like this: t1: D0(t0) t2: d1(t1), d2(t1), D0'(t1) t3: d2(t1), D0'(t1) t4: D0'(t1) t5: d1(t4), d2(t4), D0''(t4) ... t148: D0<50 primes>(t147) t149: d101(t148), d102(t148) t150: d102(t148) <done> This actually sucks more for pinning the log tail (we try to roll to t10 while holding an intent item that was logged in t1) but we've solved problem #1. We've also reduced the maximum chain length from: sum(all the new items) + nr_original_items to: max(new items that each original item creates) + nr_original_items This solves problem #3 by sharply reducing the number of defer ops that can be attached to a transaction at any given time. The change makes the problem of log tail pinning worse, but is improvement we need to solve problem #2. Actually solving #2, however, is left to the next patch. Note that a subsequent analysis of some hard-to-trigger reflink and COW livelocks on extremely fragmented filesystems (or systems running a lot of IO threads) showed the same symptoms -- uncomfortably large numbers of incore deferred work items and occasional stalls in the transaction grant code while waiting for log reservations. I think this patch and the next one will also solve these problems. As originally written, the code used list_splice_tail_init instead of list_splice_init, so change that, and leave a short comment explaining our actions. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-10-07xfs: fix an incore inode UAF in xfs_bui_recoverDarrick J. Wong2-7/+47
In xfs_bui_item_recover, there exists a use-after-free bug with regards to the inode that is involved in the bmap replay operation. If the mapping operation does not complete, we call xfs_bmap_unmap_extent to create a deferred op to finish the unmapping work, and we retain a pointer to the incore inode. Unfortunately, the very next thing we do is commit the transaction and drop the inode. If reclaim tears down the inode before we try to finish the defer ops, we dereference garbage and blow up. Therefore, create a way to join inodes to the defer ops freezer so that we can maintain the xfs_inode reference until we're done with the inode. Note: This imposes the requirement that there be enough memory to keep every incore inode in memory throughout recovery. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-10-07xfs: xfs_defer_capture should absorb remaining transaction reservationDarrick J. Wong2-0/+6
When xfs_defer_capture extracts the deferred ops and transaction state from a transaction, it should record the transaction reservation type from the old transaction so that when we continue the dfops chain, we still use the same reservation parameters. Doing this means that the log item recovery functions get to determine the transaction reservation instead of abusing tr_itruncate in yet another part of xfs. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-10-07xfs: xfs_defer_capture should absorb remaining block reservationsDarrick J. Wong2-0/+8
When xfs_defer_capture extracts the deferred ops and transaction state from a transaction, it should record the remaining block reservations so that when we continue the dfops chain, we can reserve the same number of blocks to use. We capture the reservations for both data and realtime volumes. This adds the requirement that every log intent item recovery function must be careful to reserve enough blocks to handle both itself and all defer ops that it can queue. On the other hand, this enables us to do away with the handwaving block estimation nonsense that was going on in xlog_finish_defer_ops. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-10-07xfs: proper replay of deferred ops queued during log recoveryDarrick J. Wong2-8/+100
When we replay unfinished intent items that have been recovered from the log, it's possible that the replay will cause the creation of more deferred work items. As outlined in commit 509955823cc9c ("xfs: log recovery should replay deferred ops in order"), later work items have an implicit ordering dependency on earlier work items. Therefore, recovery must replay the items (both recovered and created) in the same order that they would have been during normal operation. For log recovery, we enforce this ordering by using an empty transaction to collect deferred ops that get created in the process of recovering a log intent item to prevent them from being committed before the rest of the recovered intent items. After we finish committing all the recovered log items, we allocate a transaction with an enormous block reservation, splice our huge list of created deferred ops into that transaction, and commit it, thereby finishing all those ops. This is /really/ hokey -- it's the one place in XFS where we allow nested transactions; the splicing of the defer ops list is is inelegant and has to be done twice per recovery function; and the broken way we handle inode pointers and block reservations cause subtle use-after-free and allocator problems that will be fixed by this patch and the two patches after it. Therefore, replace the hokey empty transaction with a structure designed to capture each chain of deferred ops that are created as part of recovering a single unfinished log intent. Finally, refactor the loop that replays those chains to do so using one transaction per chain. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-10-07xfs: remove xfs_defer_resetDarrick J. Wong1-19/+5
Remove this one-line helper since the assert is trivially true in one call site and the rest obscures a bitmask operation. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-09-25xfs: avoid shared rmap operations for attr fork extentsDarrick J. Wong1-9/+18
During code review, I noticed that the rmap code uses the (slower) shared mappings rmap functions for any extent of a reflinked file, even if those extents are for the attr fork, which doesn't support sharing. We can speed up rmap a tiny bit by optimizing out this case. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Chandan Babu R <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-09-25xfs: code cleanup in xfs_attr_leaf_entsize_{remote,local}Kaixu Xia1-4/+4
Cleanup the typedef usage, the unnecessary parentheses, the unnecessary backslash and use the open-coded round_up call in xfs_attr_leaf_entsize_{remote,local}. Signed-off-by: Kaixu Xia <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-09-25xfs: remove the redundant crc feature check in xfs_attr3_rmt_verifyKaixu Xia1-2/+0
We already check whether the crc feature is enabled before calling xfs_attr3_rmt_verify(), so remove the redundant feature check in that function. Signed-off-by: Kaixu Xia <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-09-25xfs: fix some commentsKaixu Xia1-5/+5
Fix the comments to help people understand the code. Signed-off-by: Kaixu Xia <[email protected]> [darrick: fix the indenting problems too] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-09-25xfs: use the existing type definition for di_projidKaixu Xia1-1/+1
We have already defined the project ID type prid_t, so maybe should use it here. Signed-off-by: Kaixu Xia <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-09-23xfs: log new intent items created as part of finishing recovered intent itemsDarrick J. Wong2-2/+30
During a code inspection, I found a serious bug in the log intent item recovery code when an intent item cannot complete all the work and decides to requeue itself to get that done. When this happens, the item recovery creates a new incore deferred op representing the remaining work and attaches it to the transaction that it allocated. At the end of _item_recover, it moves the entire chain of deferred ops to the dummy parent_tp that xlog_recover_process_intents passed to it, but fail to log a new intent item for the remaining work before committing the transaction for the single unit of work. xlog_finish_defer_ops logs those new intent items once recovery has finished dealing with the intent items that it recovered, but this isn't sufficient. If the log is forced to disk after a recovered log item decides to requeue itself and the system goes down before we call xlog_finish_defer_ops, the second log recovery will never see the new intent item and therefore has no idea that there was more work to do. It will finish recovery leaving the filesystem in a corrupted state. The same logic applies to /any/ deferred ops added during intent item recovery, not just the one handling the remaining work. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-23xfs: don't free rt blocks when we're doing a REMAP bunmapi callDarrick J. Wong1-7/+12
When callers pass XFS_BMAPI_REMAP into xfs_bunmapi, they want the extent to be unmapped from the given file fork without the extent being freed. We do this for non-rt files, but we forgot to do this for realtime files. So far this isn't a big deal since nobody makes a bunmapi call to a rt file with the REMAP flag set, but don't leave a logic bomb. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: Convert xfs_attr_sf macros to inline functionsCarlos Maiolino3-23/+39
xfs_attr_sf_totsize() requires access to xfs_inode structure, so, once xfs_attr_shortform_addname() is its only user, move it to xfs_attr.c instead of playing with more #includes. Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-09-15xfs: Use variable-size array for nameval in xfs_attr_sf_entryCarlos Maiolino3-7/+6
nameval is a variable-size array, so, define it as it, and remove all the -1 magic number subtractions Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-09-15xfs: Remove typedef xfs_attr_shortform_tCarlos Maiolino2-10/+10
Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-09-15xfs: remove typedef xfs_attr_sf_entry_tCarlos Maiolino2-7/+8
Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-09-15xfs: enable big timestampsDarrick J. Wong1-1/+2
Enable the big timestamp feature. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Reviewed-by: Allison Collins <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: widen ondisk quota expiration timestamps to handle y2038+Darrick J. Wong3-4/+70
Enable the bigtime feature for quota timers. We decrease the accuracy of the timers to ~4s in exchange for being able to set timers up to the bigtime maximum. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Allison Collins <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: widen ondisk inode timestamps to deal with y2038+Darrick J. Wong8-10/+134
Redesign the ondisk inode timestamps to be a simple unsigned 64-bit counter of nanoseconds since 14 Dec 1901 (i.e. the minimum time in the 32-bit unix time epoch). This enables us to handle dates up to 2486, which solves the y2038 problem. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: redefine xfs_ictimestamp_tDarrick J. Wong1-2/+5
Redefine xfs_ictimestamp_t as a uint64_t typedef in preparation for the bigtime functionality. Preserve the legacy structure format so that we can let the compiler take care of the masking and shifting. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: redefine xfs_timestamp_tDarrick J. Wong3-19/+47
Redefine xfs_timestamp_t as a __be64 typedef in preparation for the bigtime functionality. Preserve the legacy structure format so that we can let the compiler take care of masking and shifting. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: move xfs_log_dinode_to_disk to the log recovery codeDarrick J. Wong2-54/+0
Move this function to xfs_inode_item_recover.c since there's only one caller of it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Allison Collins <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: refactor quota timestamp codingDarrick J. Wong2-0/+23
Refactor quota timestamp encoding and decoding into helper functions so that we can add extra behavior in the next patch. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Allison Collins <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: refactor default quota grace period setting codeDarrick J. Wong1-0/+13
Refactor the code that sets the default quota grace period into a helper function so that we can override the ondisk behavior later. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Allison Collins <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: refactor quota expiration timer modificationDarrick J. Wong1-0/+24
Define explicit limits on the range of quota grace period expiration timeouts and refactor the code that modifies the timeouts into helpers that clamp the values appropriately. Note that we'll refactor the default grace period timer separately. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Allison Collins <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: explicitly define inode timestamp rangeDarrick J. Wong1-0/+22
Formally define the inode timestamp ranges that existing filesystems support, and switch the vfs timetamp ranges to use it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Allison Collins <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2020-09-15xfs: enable new inode btree counters featureDarrick J. Wong1-1/+2
Enable the new inode btree counters feature. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-09-15xfs: support inode btree blockcounts in online repairDarrick J. Wong1-3/+13
Add the necessary bits to the online repair code to support logging the inode btree counters when rebuilding the btrees, and to support fixing the counters when rebuilding the AGI. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-09-15xfs: use the finobt block counts to speed up mount timesDarrick J. Wong1-1/+27
Now that we have reliable finobt block counts, use them to speed up the per-AG block reservation calculations at mount time. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-09-15xfs: store inode btree block counts in AGI headerDarrick J. Wong4-1/+44
Add a btree block usage counters for both inode btrees to the AGI header so that we don't have to walk the entire finobt at mount time to create the per-AG reservations. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-09-15xfs: simplify xfs_trans_getsbChristoph Hellwig1-2/+2
Remove the mp argument as this function is only called in transaction context, and open code xfs_getsb given that the function already accesses the buffer pointer in the mount point directly. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-09-15xfs: remove xlog_recover_iodoneChristoph Hellwig1-1/+0
The log recovery I/O completion handler does not substancially differ from the normal one except for the fact that it: a) never retries failed writes b) can have log items that aren't on the AIL c) never has inode/dquot log items attached and thus don't need to handle them Add conditionals for (a) and (b) to the ioend code, while (c) doesn't need special handling anyway. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-09-15xfs: move the buffer retry logic to xfs_buf.cChristoph Hellwig1-3/+3
Move the buffer retry state machine logic to xfs_buf.c and call it once from xfs_ioend instead of duplicating it three times for the three kinds of buffers. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-09-06xfs: remove kmem_realloc()Carlos Maiolino2-5/+5
Remove kmem_realloc() function and convert its users to use MM API directly (krealloc()) Signed-off-by: Carlos Maiolino <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-09-05Merge tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-1/+1
Pull xfs fix from Darrick Wong: "Fix a broken metadata verifier that would incorrectly validate attr fork extents of a realtime file against the realtime volume" * tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files
2020-09-03xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt filesDarrick J. Wong1-1/+1
The realtime flag only applies to the data fork, so don't use the realtime block number checks on the attr fork of a realtime file. Fixes: 30b0984d9117 ("xfs: refactor bmap record validation") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Eric Sandeen <[email protected]>
2020-09-02Merge tag 'xfs-5.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds3-6/+8
Pull xfs fixes from Darrick Wong: "Various small corruption fixes that have come in during the past month: - Avoid a log recovery failure for an insert range operation by rolling deferred ops incrementally instead of at the end. - Fix an off-by-one error when calculating log space reservations for anything involving an inode allocation or free. - Fix a broken shortform xattr verifier. - Ensure that the shortform xattr header padding is always initialized to zero" * tag 'xfs-5.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: initialize the shortform attr header padding entry xfs: fix boundary test in xfs_attr_shortform_verify xfs: fix off-by-one in inode alloc block reservation calculation xfs: finish dfops on every insert range shift iteration
2020-08-28Merge tag 'writeback_for_v5.9-rc3' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull writeback fixes from Jan Kara: "Fixes for writeback code occasionally skipping writeback of some inodes or livelocking sync(2)" * tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: writeback: Drop I_DIRTY_TIME_EXPIRE writeback: Fix sync livelock due to b_dirty_time processing writeback: Avoid skipping inode writeback writeback: Protect inode->i_io_list with inode->i_lock
2020-08-27xfs: initialize the shortform attr header padding entryDarrick J. Wong1-2/+2
Don't leak kernel memory contents into the shortform attr fork. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Eric Sandeen <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-08-26xfs: fix boundary test in xfs_attr_shortform_verifyEric Sandeen1-1/+3
The boundary test for the fixed-offset parts of xfs_attr_sf_entry in xfs_attr_shortform_verify is off by one, because the variable array at the end is defined as nameval[1] not nameval[]. Hence we need to subtract 1 from the calculation. This can be shown by: # touch file # setfattr -n root.a file and verifications will fail when it's written to disk. This only matters for a last attribute which has a single-byte name and no value, otherwise the combination of namelen & valuelen will push endp further out and this test won't fail. Fixes: 1e1bbd8e7ee06 ("xfs: create structure verifier function for shortform xattrs") Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2020-08-26xfs: fix off-by-one in inode alloc block reservation calculationBrian Foster2-3/+3
The inode chunk allocation transaction reserves inobt_maxlevels-1 blocks to accommodate a full split of the inode btree. A full split requires an allocation for every existing level and a new root block, which means inobt_maxlevels is the worst case block requirement for a transaction that inserts to the inobt. This can lead to a transaction block reservation overrun when tmpfile creation allocates an inode chunk and expands the inobt to its maximum depth. This problem has been observed in conjunction with overlayfs, which makes frequent use of tmpfiles internally. The existing reservation code goes back as far as the Linux git repo history (v2.6.12). It was likely never observed as a problem because the traditional file/directory creation transactions also include worst case block reservation for directory modifications, which most likely is able to make up for a single block deficiency in the inode allocation portion of the calculation. tmpfile support is relatively more recent (v3.15), less heavily used, and only includes the inode allocation block reservation as tmpfiles aren't linked into the directory tree on creation. Fix up the inode alloc block reservation macro and a couple of the block allocator minleft parameters that enforce an allocation to leave enough free blocks in the AG for a full inobt split. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-08-05xfs: delete duplicated words + other fixesRandy Dunlap1-1/+1
Delete repeated words in fs/xfs/. {we, that, the, a, to, fork} Change "it it" to "it is" in one location. Signed-off-by: Randy Dunlap <[email protected]> To: [email protected] Cc: Darrick J. Wong <[email protected]> Cc: [email protected] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2020-07-28xfs: Lift -ENOSPC handler from xfs_attr_leaf_addnameAllison Collins1-14/+11
Lift -ENOSPC handler from xfs_attr_leaf_addname. This will help to reorganize transitions between the attr forms later. Signed-off-by: Allison Collins <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]>
2020-07-28xfs: Simplify xfs_attr_node_addnameAllison Collins1-63/+59
Invert the rename logic in xfs_attr_node_addname to simplify the delayed attr logic later. Signed-off-by: Allison Collins <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]>
2020-07-28xfs: Simplify xfs_attr_leaf_addnameAllison Collins1-52/+55
Invert the rename logic in xfs_attr_leaf_addname to simplify the delayed attr logic later. Signed-off-by: Allison Collins <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]>
2020-07-28xfs: Add helper function xfs_attr_node_removename_rmtAllison Collins1-9/+19
This patch adds another new helper function xfs_attr_node_removename_rmt. This will also help modularize xfs_attr_node_removename when we add delay ready attributes later. Signed-off-by: Allison Collins <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Chandan Rajendra <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]>
2020-07-28xfs: Add helper function xfs_attr_node_removename_setupAllison Collins1-13/+33
This patch adds a new helper function xfs_attr_node_removename_setup. This will help modularize xfs_attr_node_removename when we add delay ready attributes later. Signed-off-by: Allison Collins <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Chandan Rajendra <[email protected]> [darrick: fix unused variable complaints by 0day robot] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]> Reported-by: kernel test robot <[email protected]>
2020-07-28xfs: Add remote block helper functionsAllison Collins1-20/+30
This patch adds two new helper functions xfs_attr_store_rmt_blk and xfs_attr_restore_rmt_blk. These two helpers assist to remove redundant code associated with storing and retrieving remote blocks during the attr set operations. Signed-off-by: Allison Collins <[email protected]> Reviewed-by: Chandan Rajendra <[email protected]> Reviewed-by: Amir Goldstein <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]>