|
Hoist the code that checks the attr name and value iovecs into separate
helpers so that we can add more callsites for the new parent pointer
attr intent items.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Move the name and length checks into the attr op switch statement so
that we can perform more specific checks of the value length. Over the
next few patches we're going to add new attr op flags with different
validation requirements.
While we're at it, remove the incorrect comment.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
We're about to start using tagged unions in the xattr log format, so
create a bunch of local variables in the recovery function so we only
have to decode the log item fields once.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Always set args->value to the recovered value buffer. This reduces the
amount of code in the switch statement, and hence the amount of thinking
that I have to do. We validated the recovered buffers, supposedly.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Strengthen the xattri log item recovery code by checking that we
actually have the required name and newname buffers for whatever
operation we're replaying.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Create helper functions to extract the xattr op from the ondisk xattri
log item and the incore attr intent item. These will get more use in
the patches that follow.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Eliminate the local variable from this function so that we can
streamline things a bit later when we add the PPTR_REPLACE op code.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
While reviewing flag checking in the attr scrub functions, we noticed
that the shortform attr scanner didn't catch entries that have the LOCAL
or INCOMPLETE bits set. Neither of these flags can ever be set on a
shortform attr, so we need to check this narrower set of valid flags.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
The xattr scrubber doesn't check for undefined flags in shortform attr
entries. Therefore, define a mask XFS_ATTR_ONDISK_MASK that has all
possible XFS_ATTR_* flags in it, and use that to check for unknown bits
in xchk_xattr_actor.
Refactor the check in the dabtree scanner function to use the new mask
as well. The redundant checks need to be in place because the dabtree
check examines the hash mappings and therefore needs to decode the attr
leaf entries to compute the namehash. This happens before the walk of
the xattr entries themselves.
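As a rough sketch (the XFS_ATTR_* values mirror the existing definitions in
xfs_da_format.h, but the standalone helper and program below are made up for
illustration), the mask is simply every flag the ondisk format currently
defines, and anything outside it is treated as corruption:

#include <stdbool.h>
#include <stdio.h>

#define XFS_ATTR_LOCAL		(1u << 0)
#define XFS_ATTR_ROOT		(1u << 1)
#define XFS_ATTR_SECURE		(1u << 2)
#define XFS_ATTR_INCOMPLETE	(1u << 7)
#define XFS_ATTR_ONDISK_MASK	(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | \
				 XFS_ATTR_SECURE | XFS_ATTR_INCOMPLETE)

/* Reject any attr entry flags outside the known ondisk set. */
static bool attr_flags_valid(unsigned int flags)
{
	return !(flags & ~XFS_ATTR_ONDISK_MASK);
}

int main(void)
{
	printf("%d\n", attr_flags_valid(XFS_ATTR_ROOT));	/* 1: known flag */
	printf("%d\n", attr_flags_valid(1u << 5));		/* 0: undefined bit */
	return 0;
}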
Fixes: ae0506eba78fd ("xfs: check used space of shortform xattr structures")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Check that the number of recovered log iovecs matches what the xattri
opcode expects.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Per reviewer request, use an OPSTATE flag (+ helpers) to decide if
logged xattrs are enabled, instead of querying the xfs_sb.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
The XFS_SB_FEAT_INCOMPAT_LOG_XATTRS feature bit protects a filesystem
from old kernels that do not know how to recover extended attribute log
intent items. Make this check mandatory instead of a debugging assert.
Fixes: fd920008784ea ("xfs: Set up infrastructure for log attribute replay")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Christoph noticed that the xfs_attr_is_leaf call in xfs_attr_get_ilocked
can access the incore extent tree of the attr fork, but nothing in the
xfs_attr_get path guarantees that the incore tree is actually loaded.
Most of the time it is, but seeing as xfs_attr_is_leaf ignores the
return value of xfs_iext_get_extent, I guess we've been making choices
based on random stack contents and nobody's complained?
Reported-by: Christoph Hellwig <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
A few notes about struct xfs_da_args:
The XFS_ATTR_* flags only go up as far as XFS_ATTR_INCOMPLETE, which
means that attr_filter could be a u8 field.
I've reduced the number of XFS_DA_OP_* flags down to the point where
op_flags would also fit into a u8.
filetype has 7 bytes of slack after it, which is wasteful.
namelen will never be greater than MAXNAMELEN, which is 256. This field
could be reduced to a short.
Rearrange the fields in xfs_da_args to waste less space. This reduces
the structure size from 136 bytes to 128. Later when we add extra
fields to support parent pointer replacement, this will only bloat the
structure to 144 bytes, instead of 168.
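To make the padding argument concrete, here is a small userspace sketch; the
field names echo the notes above, but this is not the real xfs_da_args
layout. Narrowing the small fields and grouping them removes the
compiler-inserted holes:

#include <stdio.h>
#include <stdint.h>

/* Wide fields and a lone trailing byte leave padding holes. */
struct args_before {
	const uint8_t	*name;
	uint32_t	namelen;	/* padding hole follows */
	void		*value;
	int		valuelen;
	unsigned int	attr_filter;	/* only needs 8 bits */
	unsigned int	op_flags;	/* only needs 8 bits */
	uint8_t		filetype;	/* tail padding follows */
};

/* Narrow the small fields and pack them together. */
struct args_after {
	const uint8_t	*name;
	void		*value;
	int		valuelen;
	uint16_t	namelen;	/* MAXNAMELEN (256) fits in 16 bits */
	uint8_t		attr_filter;
	uint8_t		op_flags;
	uint8_t		filetype;
};

int main(void)
{
	printf("before: %zu bytes\n", sizeof(struct args_before));
	printf("after:  %zu bytes\n", sizeof(struct args_after));
	return 0;
}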
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Parent pointers match attrs on name+value, unlike everything else which
matches on only the name. Therefore, we cannot keep using the heuristic
that !value means remove. Make this an explicit operation code.
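Sketched as an enum (the value names here are illustrative, not necessarily
the final ones), the operation set then looks something like:

enum xfs_attr_update {
	XFS_ATTRUPDATE_REMOVE,	/* remove the attr (matched by name+value for pptrs) */
	XFS_ATTRUPDATE_UPSERT,	/* set the value, creating the attr if needed */
	XFS_ATTRUPDATE_CREATE,	/* set the value, fail if the attr already exists */
	XFS_ATTRUPDATE_REPLACE,	/* set the value, fail if the attr does not exist */
};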
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
This field only ever contains XATTR_{CREATE,REPLACE}, and it only goes
as deep as xfs_attr_set. Remove the field from the structure and
replace it with an enum specifying exactly what kind of change we want
to make to the xattr structure. Upsert is the name that we'll give to
the flags==0 operation, because we're either updating an existing value
or inserting it, and the caller doesn't care.
Note: The "UPSERTR" name created here is to make userspace porting
easier. It will be removed in the next patch.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
The only user of this flag sets it prior to an xfs_attr_get_ilocked
call, which doesn't update anything. Get rid of the flag.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Nobody checks this flag, so get rid of it.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Commit aff3a9edb708 ("xfs: Use preallocation for inodes with extsz
hints") disabled delayed allocation for all inodes with extent size
hints due to a data exposure problem. We have since fixed that data
exposure problem by always creating unwritten extents for delalloc
conversions (to address yet more data exposure problems), but the
writeback path doesn't actually support extent size hints when
converting delalloc these days, which probably isn't a problem given
that people using the hints know what they get.
However, due to the way xfs_get_extsz_hint is implemented, it
always claims an extent size hint for RT inodes even if the RT
extent size is a single FSB. Because of that, the above commit
effectively disabled delalloc support for RT inodes.
Switch xfs_get_extsz_hint to return 0 for this case and work around
that in a few places to reinstate delalloc support for RT inodes on
file systems with an sb_rextsize of 1.
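The xfs_get_extsz_hint change amounts to something like the following
fragment, which is a sketch of the intent rather than the exact patch:

	/* Only treat the RT extent size as a hint if it is larger than 1 FSB. */
	if (XFS_IS_REALTIME_INODE(ip) && ip->i_mount->m_sb.sb_rextsize > 1)
		return ip->i_mount->m_sb.sb_rextsize;
	/* ...otherwise fall through to the "no hint" return of 0. */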
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
When xfs_bmap_del_extent_delay has to split an indirect block, it tries
to steal blocks from the part that gets unmapped to increase the
indirect block reservation, which now needs to cover two extents
instead of one.
This works perfectly fine on the data device, where the data and
indirect blocks come from the same pool. It has no chance of working
when the inode sits on the RT device. To support re-enabling delalloc
for inodes on the RT device, make this behavior conditional on not
being for rt extents.
Note that splitting delalloc extents should only happen on writeback
failure, as for other kinds of hole punching we first write back all
data and thus convert the delalloc reservations covering the hole to
a real allocation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Move the check if we have enough indirect blocks and the stealing of
the deleted extent blocks out of xfs_bmap_split_indlen and into the
caller to prepare for handling delayed allocation of RT extents that
can't easily be stolen.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Add a check for files on the RT subvolume and use m_frextents instead
of m_fdblocks to adjust the preallocation size.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
To prepare for re-enabling delalloc on RT devices, track the data blocks
(which use the RT device when the inode sits on it) and the indirect
blocks (which don't) separately to xfs_mod_delalloc, and add a new
percpu counter to also track the RT delalloc blocks.
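The interface change boils down to something like the prototype below (a
sketch; the counter plumbing sits behind it): callers pass the data and
indirect deltas separately so that the data portion for an RT inode can be
routed to the new RT delalloc counter.

/* was roughly: xfs_mod_delalloc(struct xfs_mount *mp, int64_t delta) */
void xfs_mod_delalloc(struct xfs_inode *ip, int64_t data_delta,
		int64_t ind_delta);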
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
The code to account fdblocks and frextents in xfs_bmap_del_extent_delay
is a bit weird in that it accounts frextents before the iext tree
manipulations and fdblocks after it. Given that the iext tree
manipulations cannot fail currently that's not really a problem, but
still odd. Move the frextent manipulation to the end, and use an
fdblocks variable to account for the indirect blocks (always freed) and
the data blocks (only freed for !RT). This prepares for the following
updates in the area and already makes the code more readable.
Also remove the !isrt assert given that this code clearly handles
rt extents correctly, and we'll soon reinstate delalloc support for
RT inodes.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Allocate data blocks for RT inodes using xfs_dec_frextents. While at
it, optimize the data device case by doing only a single xfs_dec_fdblocks
call for the extent itself and the indirect blocks.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
xfs_mod_freecounter has two entirely separate code paths for adding or
subtracting from the free counters. Only the subtract case looks at the
rsvd flag and can return an error.
Split xfs_mod_freecounter into separate helpers for subtracting or
adding the freecounter, and remove all the impossible to reach error
handling for the addition case.
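In sketch form (signatures approximate), the split separates the fallible
subtract path from the infallible add path:

/* Subtracting may dip into the reserve pool and can fail with -ENOSPC. */
int xfs_dec_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
		uint64_t delta, bool rsvd);

/* Adding back freed space cannot fail, so there is no error to propagate. */
void xfs_add_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
		uint64_t delta);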
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
And to make that more clear, rearrange the code a bit and add asserts
and a comment.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
__xfs_bunmapi is a bit of an odd place to lock the rtbitmap and rtsummary
inodes given that it is very high level code. While this only looks ugly
right now, it will become a problem when supporting delayed allocations
for RT inodes as __xfs_bunmapi might end up deleting only delalloc extents
and thus never unlock the rt inodes.
Move the locking into xfs_bmap_del_extent_real just before the call to
xfs_rtfree_blocks instead and use a new flag in the transaction to ensure
that the locking happens only once.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Currently xfs_bmap_del_extent_real frees RT extents before updating
the bmap btree, while it frees regular blocks after performing the bmap
btree update for convoluted historic reasons. Switch to freeing the RT
blocks in the same place as the regular data blocks instead, to simplify
the code and fix a very theoretical bug.
A short history of this code, researched by Dave Chinner, follows:
The truncate for data device extents was originally a two-phase
operation. First it removed the bmapbt record, but because this can
free BMBT extents, it can use up all the free space tree reservation
space. So the transaction gets rolled to commit the BMBT change and
the xfs_bmap_finish() call that frees the data extent runs with a
new transaction reservation that allows different free space btrees
to be logged without overrun.
However, on crash, this could lose the free space because there was
nothing to tell recovery about the extents removed from the BMBT,
hence EFIs were introduced. They tie the extent free operation to the
bmapbt record removal commit for recovery of the second phase of the
extent removal process.
Then RT extents came along. RT extent freeing does not require a
free space btree reservation because the free space metadata is
static and the transaction size is bounded. Hence we don't need to care if
the BMBT record removal modifies the per-ag free space trees and we
don't need a two-phase extent remove transaction. The only thing we
have to care about is not losing space on crash.
Hence instead of recording the extent for freeing in the bmap list
for xfs_bmap_finish() to process in a new transaction, it simply
freed the rtextent directly. So the original code (from 1994) simply
replaced the "free AG extent later" queueing with a direct free.
This code was originally at the start of xfs_bmap_del_extent(), but
the xfs_bmap_add_free() call got moved to the end of the function via the
"do_fx" flag (the current code logic) in 1997 (commit c4fac74eaa58
in the historic xfs-import tree) because a shutdown was occurring in a
case where splitting the extent record failed: the split required a
BMBT split, and the filesystem didn't have enough space for that split
to be done. (FWIW, I'm not sure this can happen anymore.)
The commit backed out the BMBT change on ENOSPC error, and in doing
so I think this actually breaks RT free space tracking. However, it
then returns an ENOSPC error, and we have a dirty transaction in the
RT case so this will shut down the filesystem when the transaction
is cancelled. Hence the corrupted "bmbt now points at freed rt dev
space" condition never makes it to disk, but it's still the wrong way
to handle the issue.
IOWs, this proposed change fixes that "shutdown at ENOSPC on rt
devices" situation that was introduced by the above commit back in
1997.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Create helper functions to deal with locking realtime metadata inodes.
This enables us to maintain correct locking order once we start adding
the realtime rmap and refcount btree inodes.
Signed-off-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Commit bb7b1c9c5dd3 ("xfs: tag transactions that contain intent done
items") switched the XFS_TRANS_ definitions to be bit based, and using
comments above the definitions. As XFS_TRANS_LOWMODE was last and has
a big fat comment it was missed. Switch it to the same style.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Add a few strategic IS_ENABLED statements to let the compiler eliminate
unused code when CONFIG_XFS_SUPPORT_V4 is disabled.
This saves multiple kilobytes of .text in my .config:
$ size xfs.o.*
text data bss dec hex filename
1363633 294836 592 1659061 1950b5 xfs.o.new
1371453 294868 592 1666913 196f61 xfs.o.old
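The guards are roughly of this shape (the surrounding context is
illustrative); because IS_ENABLED() evaluates to a compile-time constant,
the compiler can drop the v4-only code entirely while still type-checking
it, unlike an #ifdef:

	if (!IS_ENABLED(CONFIG_XFS_SUPPORT_V4))
		return -EOPNOTSUPP;	/* v4-only path below gets eliminated */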
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
The current structure of xfs_extent_busy_clear that locks the first busy
extent in each AG and unlocks when switching to a new AG makes sparse
unhappy as the lock critical section tracking can't cope with taking the
lock conditionally and inside a loop.
Rewrite xfs_extent_busy_clear so that it has an outer loop only advancing
when moving to a new AG, and an inner loop that consumes busy extents for
the given AG to make life easier for sparse and to also make this logic
more obvious to humans.
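The rewritten loop has roughly this shape (stand-in helper and field names,
not the actual code): the perag lock is taken once per AG in the outer loop
and every busy extent for that AG is consumed in the inner loop, which gives
sparse a balanced lock/unlock per iteration.

	while (!list_empty(&busy_list)) {
		busyp = first_busy_extent(&busy_list);	/* stand-in helper */
		pag = busyp->pag;			/* stand-in field */

		spin_lock(&pag->pagb_lock);
		do {
			clear_one_busy_extent(pag, busyp);	/* stand-in */
			busyp = next_busy_extent_in_same_ag(&busy_list, pag);
		} while (busyp);
		spin_unlock(&pag->pagb_lock);
	}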
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Move the handling of discarded entries into xfs_extent_busy_clear_one
to reuse the length check and tidy up the logic in the caller.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
The function is defined in the rmap_repair.c file, but not called
elsewhere, so delete the unused function.
fs/xfs/scrub/rmap_repair.c:436:1: warning: unused function 'is_rt_data_fork'.
Reported-by: Abaci Robot <[email protected]>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8425
Signed-off-by: Jiapeng Chong <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
The "mp" pointer is the same as "sc->mp" so this change doesn't affect
runtime at all. However, it's nicer to use the same name for both the
lock and the unlock.
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
s/somethign/something/
Signed-off-by: Thorsten Blum <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Sparse throws warnings about the interval tree functions that are
defined and then not used in the scrub bitmap code:
fs/xfs/scrub/bitmap.c:57:1: warning: unused function 'xbitmap64_tree_iter_next' [-Wunused-function]
INTERVAL_TREE_DEFINE(struct xbitmap64_node, bn_rbnode, uint64_t,
^
./include/linux/interval_tree_generic.h:151:33: note: expanded from macro 'INTERVAL_TREE_DEFINE'
ITSTATIC ITSTRUCT * \
^
<scratch space>:3:1: note: expanded from here
xbitmap64_tree_iter_next
^
fs/xfs/scrub/bitmap.c:331:1: warning: unused function 'xbitmap32_tree_iter_next' [-Wunused-function]
INTERVAL_TREE_DEFINE(struct xbitmap32_node, bn_rbnode, uint32_t,
^
./include/linux/interval_tree_generic.h:151:33: note: expanded from macro 'INTERVAL_TREE_DEFINE'
ITSTATIC ITSTRUCT * \
^
<scratch space>:59:1: note: expanded from here
xbitmap32_tree_iter_next
Fix these by marking the functions created by the interval tree
creation macro as __maybe_unused to suppress this warning.
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Scrub checks the superblock version number against the known good
feature bits that can be set in the version mask. It calculates
the version mask to compare like so:
vernum_mask = cpu_to_be16(~XFS_SB_VERSION_OKBITS |
XFS_SB_VERSION_NUMBITS |
XFS_SB_VERSION_ALIGNBIT |
XFS_SB_VERSION_DALIGNBIT |
XFS_SB_VERSION_SHAREDBIT |
XFS_SB_VERSION_LOGV2BIT |
XFS_SB_VERSION_SECTORBIT |
XFS_SB_VERSION_EXTFLGBIT |
XFS_SB_VERSION_DIRV2BIT);
This generates a sparse warning:
fs/xfs/scrub/agheader.c:168:23: warning: cast truncates bits from constant value (ffff3f8f becomes 3f8f)
This is because '~XFS_SB_VERSION_OKBITS' is considered a 32 bit
constant, even though its value always fits in 16 bits.
This is a kinda silly thing to do, because:
/*
* Supported feature bit list is just all bits in the versionnum field because
* we've used them all up and understand them all. Except, of course, for the
* shared superblock bit, which nobody knows what it does and so is unsupported.
*/
#define XFS_SB_VERSION_OKBITS \
((XFS_SB_VERSION_NUMBITS | XFS_SB_VERSION_ALLFBITS) & \
~XFS_SB_VERSION_SHAREDBIT)
#define XFS_SB_VERSION_NUMBITS 0x000f
#define XFS_SB_VERSION_ALLFBITS 0xfff0
#define XFS_SB_VERSION_SHAREDBIT 0x0200
XFS_SB_VERSION_OKBITS has a value of 0xfdff, and so
~XFS_SB_VERSION_OKBITS == XFS_SB_VERSION_SHAREDBIT. The calculated
mask already sets XFS_SB_VERSION_SHAREDBIT, so starting with
~XFS_SB_VERSION_OKBITS is completely redundant....
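A standalone arithmetic check of that claim, using the macro values quoted
above:

#include <stdio.h>
#include <stdint.h>

#define XFS_SB_VERSION_NUMBITS		0x000f
#define XFS_SB_VERSION_ALLFBITS		0xfff0
#define XFS_SB_VERSION_SHAREDBIT	0x0200
#define XFS_SB_VERSION_OKBITS \
	((XFS_SB_VERSION_NUMBITS | XFS_SB_VERSION_ALLFBITS) & \
	 ~XFS_SB_VERSION_SHAREDBIT)

int main(void)
{
	unsigned int inverted = ~XFS_SB_VERSION_OKBITS;	/* a 32 bit constant */
	uint16_t kept = (uint16_t)inverted;		/* what the be16 cast keeps */

	printf("~OKBITS   = %#x\n", inverted);		/* 0xffff0200 */
	printf("truncated = %#x\n", kept);		/* 0x200 == SHAREDBIT */
	return 0;
}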
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Sparse reports:
fs/xfs/xfs_log_cil.c:1127:1: warning: context imbalance in 'xlog_cil_push_work' - different lock contexts for basic block
fs/xfs/xfs_log_cil.c:1380:1: warning: context imbalance in 'xlog_cil_push_background' - wrong count at exit
fs/xfs/xfs_log_cil.c:1623:9: warning: context imbalance in 'xlog_cil_commit' - unexpected unlock
xlog_cil_push_background() has a locking annotation for an rw_sem.
Sparse does not track lock contexts for rw_sems, so the
annotation generates false warnings. Remove the annotation.
xlog_wait_on_iclog() drops the log->l_icloglock. The function has a
sparse annotation, but the prototype in xfs_log_priv.h does not.
Hence the warning from xlog_cil_push_work() which calls
xlog_wait_on_iclog(). Add the missing annotation.
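The missing piece is roughly an annotation on the prototype to match the
definition (the exact spelling of the lock expression is hedged here):

/* xfs_log_priv.h */
int	xlog_wait_on_iclog(struct xlog_in_core *iclog)
		__releases(iclog->ic_log->l_icloglock);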
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
After creation, drop the ILOCK on temporary files that have been created
to stage a repair.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Now that we've fixed the directory operations to hold the ILOCK until
they're finished with rmapbt updates for directory shape changes, we no
longer need to take this lock when scanning directories for rmapbt
records.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Modify xfs_rename to hold all inode locks across a rename operation.
We will need this later when we add parent pointers.
Signed-off-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Catherine Hoang <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Modify xfs_trans_alloc_dir to hold locks after return. Caller will be
responsible for manual unlock. We will need this later to hold locks
across parent pointer operations.
Signed-off-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Catherine Hoang <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Modify xfs_ialloc to hold locks after return. Caller will be
responsible for manual unlock. We will need this later to hold locks
across parent pointer operations.
Signed-off-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Catherine Hoang <[email protected]>
[djwong: hold the parent ilocked across transaction rolls too]
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
With parent pointers enabled, a rename operation can update up to 5
inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes
their dquots to be attached to the transaction chain, so we need
to increase XFS_QM_TRANS_MAXDQS. This patch also adds a helper
function xfs_dqlockn to lock an arbitrary number of dquots.
Signed-off-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Renames that generate parent pointer updates can join up to 5
inodes locked in sorted order. So we need to increase the
number of defer ops inodes and relock them in the same way.
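The deadlock-avoidance idea behind both the dquot and inode changes can be
shown with a small userspace sketch (the names and types here are purely
illustrative, not the kernel code): sort the objects by a stable key, then
take their locks in that order, so two tasks locking overlapping sets can
never lock in opposite orders.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct obj {
	unsigned long	id;		/* stable sort key, e.g. inode number */
	pthread_mutex_t	lock;
};

static int cmp_obj(const void *a, const void *b)
{
	const struct obj *oa = *(const struct obj * const *)a;
	const struct obj *ob = *(const struct obj * const *)b;

	return (oa->id > ob->id) - (oa->id < ob->id);
}

/* Lock an arbitrary number of objects in ascending key order. */
static void lock_all(struct obj **objs, int count)
{
	qsort(objs, count, sizeof(*objs), cmp_obj);
	for (int i = 0; i < count; i++)
		pthread_mutex_lock(&objs[i]->lock);
}

int main(void)
{
	struct obj a = { .id = 42, .lock = PTHREAD_MUTEX_INITIALIZER };
	struct obj b = { .id = 7,  .lock = PTHREAD_MUTEX_INITIALIZER };
	struct obj *set[] = { &a, &b };

	lock_all(set, 2);	/* always locks id 7 before id 42 */
	printf("locked both in sorted order\n");
	return 0;
}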
Signed-off-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Catherine Hoang <[email protected]>
[djwong: have one sorting function]
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
On a 10TB filesystem where the free space in each AG is heavily
fragmented, I noticed some very high runtimes on a FITRIM call for the
entire filesystem. xfs_scrub likes to report progress information on
each phase of the scrub, which means that a strace for the entire
filesystem:
ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839>
shows that scrub is uncommunicative for the entire duration. Reducing
the size of the FITRIM requests to a single AG at a time produces lower
times for each individual call, but even this isn't quite acceptable,
because the time between progress reports is still very high:
Strace for the first 4x 1TB AGs looks like:
ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033>
ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323>
ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226>
ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744>
I then had the idea to limit the length parameter of each call to a
smallish amount (~11GB) so that we could report progress relatively
quickly, but much to my surprise, each FITRIM call still took ~68
seconds!
Unfortunately, the by-length fstrim implementation handles this poorly
because it walks the entire free space by length index (cntbt), which is
a very inefficient way to walk a subset of the blocks of an AG.
Therefore, create a second implementation that will walk the bnobt and
perform the trims in block number order. This implementation avoids the
worst problems of the original code, though it lacks the desirable
attribute of freeing the biggest chunks first.
On the other hand, this second implementation makes it much easier to
constrain the system call latency and much easier to report
fstrim progress to anyone who's running xfs_scrub.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
When a file-based metadata structure is being scrubbed in
xchk_metadata_inode_subtype, we should create an entirely new scrub
context so that each scrubber doesn't trip over another's buffers.
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|