aboutsummaryrefslogtreecommitdiff
path: root/fs/xfs/libxfs
AgeCommit message (Collapse)AuthorFilesLines
2018-06-11xfs: update incore per-AG inode countDarrick J. Wong1-0/+2
For whatever reason we never actually update pagi_count (the in-core perag inode count) when we allocate or free inode chunks. Online scrub is going to use it, so we need to fix the accounting. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2018-06-08xfs: replace do_mod with native operationsDave Chinner1-14/+23
do_mod() is a hold-over from when we have different sizes for file offsets and and other internal values for 40 bit XFS filesystems. Hence depending on build flags variables passed to do_mod() could change size. We no longer support those small format filesystems and hence everything is of fixed size theses days, even on 32 bit platforms. As such, we can convert all the do_mod() callers to platform optimised modulus operations as defined by linux/math64.h. Individual conversions depend on the types of variables being used. Signed-Off-By: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-08xfs: don't call xfs_da_shrink_inode with NULL bpEric Sandeen1-3/+2
xfs_attr3_leaf_create may have errored out before instantiating a buffer, for example if the blkno is out of range. In that case there is no work to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops if we try. This also seems to fix a flaw where the original error from xfs_attr3_leaf_create gets overwritten in the cleanup case, and it removes a pointless assignment to bp which isn't used after this. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969 Reported-by: Xu, Wen <[email protected]> Tested-by: Xu, Wen <[email protected]> Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-08xfs: clean up MIN/MAXDave Chinner7-20/+20
Get rid of the MIN/MAX macros and just use the native min/max macros directly in the XFS code. Signed-Off-By: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-08xfs: move various type verifiers to common fileDave Chinner7-162/+192
New verification functions like xfs_verify_fsbno() and xfs_verify_agino() are spread across multiple files and different header files. They really don't fit cleanly into the places they've been put, and have wider scope than the current header includes. Move the type verifiers to a new file in libxfs (xfs-types.c) and the prototypes to xfs_types.h where they will be visible to all the code that uses the types. Signed-Off-By: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-06xfs: convert to SPDX license tagsDave Chinner68-899/+68
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-06xfs: validate btree records on retrievalDave Chinner4-9/+132
So we don't check the validity of records as we walk the btree. When there are corrupt records in the free space btree (e.g. zero startblock/length or beyond EOAG) we just blindly use it and things go bad from there. That leads to assert failures on debug kernels like this: XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450 .... Call Trace: xfs_alloc_fixup_trees+0x368/0x5c0 xfs_alloc_ag_vextent_near+0x79a/0xe20 xfs_alloc_ag_vextent+0x1d3/0x330 xfs_alloc_vextent+0x5e9/0x870 Or crashes like this: XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000 ..... BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 .... Call Trace: xfs_bmap_add_extent_hole_real+0x67d/0x930 xfs_bmapi_write+0x934/0xc90 xfs_da_grow_inode_int+0x27e/0x2f0 xfs_dir2_grow_inode+0x55/0x130 xfs_dir2_sf_to_block+0x94/0x5d0 xfs_dir2_sf_addname+0xd0/0x590 xfs_dir_createname+0x168/0x1a0 xfs_rename+0x658/0x9b0 By checking that free space records pulled from the trees are within the valid range, we catch many of these corruptions before they can do damage. This is a generic btree record checking deficiency. We need to validate the records we fetch from all the different btrees before we use them to catch corruptions like this. This patch results in a corrupt record emitting an error message and returning -EFSCORRUPTED, and the higher layers catch that and abort: XFS (loop0): Size Freespace BTree record corruption in AG 0 detected! XFS (loop0): start block 0x0 block count 0x0 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x42a/0x670 ..... Call Trace: dump_stack+0x85/0xcb xfs_trans_cancel+0x19f/0x1c0 xfs_create+0x42a/0x670 xfs_generic_create+0x1f6/0x2c0 vfs_create+0xf9/0x180 do_mknodat+0x1f9/0x210 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe ..... XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c. Return address = ffffffff81500868 XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-06xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode()Dave Chinner1-5/+0
In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if we are doing an untrusted lookup. This is done because we need failed filehandle lookups to report -ESTALE to the caller, and it does this by converting -EINVAL and -ENOENT errors to -ESTALE. The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for for xfs_iget(UNTRUSTED) callers to determine the difference between "inode does not exist" and "corruption detected during lookup". We realy need that distinction in places calling xfS_iget(UNTRUSTED), so move the filehandle error case handling all the way out to xfs_nfs_get_inode() where it is needed. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Carlos Maiolino <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-06xfs: verify COW extent size hint is valid in inode verifierDave Chinner1-0/+6
There are rules for vald extent size hints. We enforce them when applications set them, but fuzzers violate those rules and that screws us over. Validate COW extent size hint rules in the inode verifier to catch this. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Carlos Maiolino <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-06xfs: verify extent size hint is valid in inode verifierDave Chinner1-1/+8
There are rules for vald extent size hints. We enforce them when applications set them, but fuzzers violate those rules and that screws us over. This results in alignment assertion failures when setting up allocations such as this in direct IO: XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 .... Call Trace: xfs_bmap_btalloc+0x415/0x910 xfs_bmapi_write+0x71c/0x12e0 xfs_iomap_write_direct+0x2a9/0x420 xfs_file_iomap_begin+0x4dc/0xa70 iomap_apply+0x43/0x100 iomap_file_buffered_write+0x62/0x90 xfs_file_buffered_aio_write+0xba/0x300 __vfs_write+0xd5/0x150 vfs_write+0xb6/0x180 ksys_write+0x45/0xa0 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe And from xfs_db: core.extsize = 10380288 Which is not an integer multiple of the block size, and so violates Rule #7 for setting extent size hints. Validate extent size hint rules in the inode verifier to catch this. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Carlos Maiolino <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-06xfs: catch bad stripe alignment configurationsDave Chinner1-0/+16
When stripe alignments are invalid, data alignment algorithms in the allocator may not work correctly. Ensure we catch superblocks with invalid stripe alignment setups at mount time. These data alignment mismatches are now detected at mount time like this: XFS (loop0): SB stripe unit sanity check failed XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00 XFSB............ 0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2 .27...F=...3.... 000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0 ................ 0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2 ................ 00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00 ................ 000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00 ...`.4.......... 00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19 ................ XFS (loop0): SB validate failed with error -117. And the mount fails. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Carlos Maiolino <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-04xfs: use xfs_trans_getsb in xfs_sync_sb_bufEric Sandeen1-3/+5
Use xfs_trans_getsb rather than reaching right in for mp->m_sb_bp; I think this is more correct, and it facilitates building this libxfs code in userspace as well. Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-04xfs: explicitly pass buffer size to xfs_corruption_errorDarrick J. Wong5-6/+10
Explicitly pass the buffer length to xfs_corruption_error() instead of assuming XFS_CORRUPTION_DUMP_LEN so that we avoid dumping off the end of the buffer. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: don't assert when on-disk btree pointers are garbageDarrick J. Wong1-7/+16
Don't ASSERT when we encounter bad on-disk btree pointers in the debug check functions. Log the error to leave breadcrumbs and let the upper layers deal with it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: strengthen btree pointer checks before useDarrick J. Wong1-15/+35
Instead of ASSERTing on null btree pointers in xfs_btree_ptr_to_daddr, use the new block number verifiers to ensure that the btree pointer doesn't point to any sensitive areas (AG headers, past-EOFS) and return -EFSCORRUPTED if this is the case. Remove the ASSERT because on-disk corruptions shouldn't trigger ASSERTs. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: introduce xfs_btree_debug_check_ptrDarrick J. Wong1-47/+29
Make xfs_btree_check_ptr a non-debug function and introduce a new _debug version that only runs when #ifdef DEBUG. This will enable us to reuse the checking logic with other parts of the btree code. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: check directory bestfree information in the verifierDarrick J. Wong1-35/+68
Create a variant of xfs_dir2_data_freefind that is suitable for use in a verifier. Because _freefind is called by the verifier, we simply duplicate the _freefind function, convert the ASSERTs to return __this_address, and modify the verifier to call our new function. Once we've made it impossible for directory blocks with bad bestfree data to make it into the filesystem we can remove the DEBUG code from the regular _freefind function. Underlying argument: corruption of on-disk metadata should return -EFSCORRUPTED instead of blowing ASSERTs. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: don't return garbage buffers in xfs_da3_node_readDarrick J. Wong1-3/+5
If we're reading a node in a dir/attr btree and the buffer comes off the disk with a magic number we don't recognize, don't ASSERT and don't set a garbage buffer type (0 also triggers ASSERTs). Instead, report the corruption, release the buffer, and return -EFSCORRUPTED because that's what the dabtree is -- corrupt. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: don't ASSERT on short form btree root pointer of zeroDarrick J. Wong3-3/+0
Don't ASSERT if the short form btree root pointer is zero. Now that we use xfs_verify_agbno to check all short form btree pointers, we'll let that log the error and pass it to the upper layers. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: btree lookup shouldn't ASSERT on empty btree nodesDarrick J. Wong1-1/+6
If a btree lookup encounters an empty btree node or an empty btree leaf on a multi-level btree, that's evidence of a corrupt on-disk btree. Therefore, we should return -EFSCORRUPTED to the upper levels, not an ASSERT failure. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: xfs_alloc_get_rec should return EFSCORRUPTED for obvious bnobt corruptionDarrick J. Wong1-4/+8
Return -EFSCORRUPTED when the bnobt/cntbt return obviously corrupt values, rather than letting them bounce around in the internal code. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: remove redundant ASSERT on insufficient bestfree length in _leaf_addnameDarrick J. Wong1-1/+0
In xfs_dir2_leaf_addname we ASSERT if the length of the unused space described by bestfree[0] is less the amount of space we wish to consume. Immediately after it is a call to xfs_dir2_data_use_free where the offset parameter is offset of the unused space and the length parameter is the amount of space we wish to consume. Both values (and the unused space pointer) are passed into xfs_dir2_data_check_free, which also validates that the region of unused space is big enough to cover the space we wish to consume. This is effectively the same check that the ASSERT covers, and since a check failure results in a corruption message being logged we can remove the ASSERT. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-04xfs: don't assert when reporting on-disk corruption while loading btreeDarrick J. Wong1-1/+0
Don't bother ASSERTing when we're already going to log and return the corruption status. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-06-03xfs: verify AGI unlinked list contains valid blocksDave Chinner1-15/+8
The heads of tha AGI unlinked list are only scanned on debug kernels when the verifier runs. Change that to always scan the heads and validate that the inode numbers are valid. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-01xfs: fix error handling in xfs_refcount_insert()Dave Chinner1-0/+3
generic/475 fired an assert failure just after the filesystem was shut down: XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_refcount.c, line: 182 ..... Call Trace: xfs_refcount_insert+0x151/0x190 xfs_refcount_adjust_extents.constprop.11+0x9c/0x470 xfs_refcount_adjust.constprop.10+0xb0/0x270 xfs_refcount_finish_one+0x25a/0x420 xfs_trans_log_finish_refcount_update+0x2a/0x40 xfs_refcount_update_finish_item+0x35/0xa0 xfs_defer_finish+0x15e/0x4d0 xfs_reflink_remap_extent+0x1bc/0x610 xfs_reflink_remap_blocks+0x6e/0x280 xfs_reflink_remap_range+0x311/0x530 vfs_clone_file_range+0x119/0x200 .... If xfs_btree_insert() returns an error, the corruption check fires instead of passing the error back the caller. The corruption check should be after we've checked for an error, not before, thereby avoiding assert failures if the filesystem shuts down during a refcount btree record insert. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-06-01xfs: fix xfs_rtalloc_rec unitsDarrick J. Wong1-13/+13
All the realtime allocation functions deal with space on the rtdev in units of realtime extents. However, struct xfs_rtalloc_rec confusingly uses the word 'block' in the name, even though they're really extents. Fix the naming problem and fix all the unit handling problems in the two existing users. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]>
2018-06-01xfs: strengthen rtalloc query range checksDarrick J. Wong1-2/+5
Strengthen the rtalloc range query checks to make sure that the keys do not run off the end of the realtime device inappropriately. Note that the query range functions require units of rt extents, not blocks, despite the type name. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]>
2018-06-01xfs: xfs_rtbuf_get should check the bmapi_read resultsDarrick J. Wong1-0/+3
The xfs_rtbuf_get function should check the block mapping it gets back from bmapi_read. If there are no mappings or the mapping isn't a real extent, we should return -EFSCORRUPTED rather than trying to read a garbage value. We also require realtime bitmap blocks to be real, written allocations. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]>
2018-06-01xfs: xfs_rtword_t should be unsigned, not signedDarrick J. Wong1-1/+1
xfs_rtword_t is used for bit manipulations in the realtime bitmap file. Since we're performing bit shifts with this type, we don't want sign extension and we don't want to be left shifting negative quantities because that's undefined behavior. This also shuts up these UBSAN warnings: UBSAN: Undefined behaviour in fs/xfs/libxfs/xfs_rtbitmap.c:833:48 signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]>
2018-05-30xfs: repair superblocksDarrick J. Wong2-0/+25
If one of the backup superblocks is found to differ seriously from superblock 0, write out a fresh copy from the in-core sb. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-05-29xfs: fix inobt magic number checkDarrick J. Wong1-1/+1
In commit a6a781a58befcbd467c ("xfs: have buffer verifier functions report failing address") the bad magic number return was ported incorrectly. Fixes: a6a781a58befcbd467ce843af4eaca3906aa1f08 Reported-by: [email protected] Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Eric Sandeen <[email protected]>
2018-05-16xfs: implement online get/set fs labelEric Sandeen3-2/+36
The GET ioctl is trivial, just return the current label. The SET ioctl is more involved: It transactionally modifies the superblock to write a new filesystem label to the primary super. A new variant of xfs_sync_sb then writes the superblock buffer immediately to disk so that the change is visible from userspace. It then invalidates any page cache that userspace might have previously read on the block device so that i.e. blkid can see the change immediately, and updates all secondary superblocks as userspace relable does. Signed-off-by: Eric Sandeen <[email protected]> [darrick: use dchinner's new xfs_update_secondary_sbs function] Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-05-15xfs: factor the ag length extension code into libxfsDave Chinner2-1/+66
Growfs currently manually codes the extension of the last AG in a filesytem during the growfs process. Factor that out of the growfs code and move it into libxfs along with teh rest of the AG header modification code. Signed-Off-By: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-05-15xfs: move growfs core to libxfsDave Chinner4-0/+511
So it can be shared with userspace (e.g. mkfs) easily. Signed-Off-By: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-05-15xfs: implement the metadata repair ioctl flagDarrick J. Wong2-2/+11
Plumb in the pieces necessary to make the "scrub" subfunction of the scrub ioctl actually work. This means that we make the IFLAG_REPAIR flag to the scrub ioctl actually do something, and we add an errortag knob so that xfstests can force the kernel to rebuild a metadata structure even if there's nothing wrong with it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-05-15xfs: teach xfs_bmapi_remap to accept some bmapi flagsDarrick J. Wong1-2/+8
Teach xfs_bmapi_remap how to map in unwritten extent and to skip rmap updates. This enables us to rebuild real and unwritten extents from the rmapbt. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2018-05-15xfs: make xfs_bmapi_remapi work with attribute forksDarrick J. Wong2-12/+20
Add a new flags argument to xfs_bmapi_remapi so that we can pass BMAPI flags into the function. This enables us to pass in BMAPI_ATTRFORK so that we can remap things into the attribute fork. Eventually the online repair code will use this to rebuild attribute forks, so make it non-static. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2018-05-15xfs: hoist xfs_scrub_agfl_walk to libxfs as xfs_agfl_walkDarrick J. Wong2-0/+42
This function is basically a generic AGFL block iterator, so promote it to libxfs ahead of online repair wanting to use it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2018-05-15xfs: superblock scrub should use short-lived buffersDarrick J. Wong3-0/+26
Secondary superblocks are rarely used, so create a helper to read a given non-primary AG's superblock and ensure that it won't stick around hogging memory. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2018-05-15xfs: factor out nodiscard helpersBrian Foster3-31/+4
The changes to skip discards of speculative preallocation and unwritten extents introduced several new wrapper functions through the bunmapi -> extent free codepath to reduce churn in all of the associated callers. In several cases, these wrappers simply toggle a single flag to skip or not skip discards for the resulting blocks. The explicit _nodiscard() wrappers for such an isolated set of callers is a bit overkill. Kill off these wrappers and replace with the calls to the underlying functions in the contexts that need to control discard behavior. Retain the wrappers that preserve the original calling conventions to serve the original purpose of reducing code churn. This is a refactoring patch and does not change behavior. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-05-15xfs: add BMAPI_NORMAP flag to perform block remapping without updating rmapbtDarrick J. Wong2-12/+23
Add a new flag, XFS_BMAPI_NORMAP, which will perform file block remapping without updating the rmapbt. This will be used by the repair code to reconstruct bmbts from the rmapbt, in which case we don't want the rmapbt update. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-05-15xfs: add repair helpers for the reference count btreeDarrick J. Wong4-0/+41
Add a couple of functions to the refcount btree and generic btree code that will be used to repair the refcountbt. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-05-15xfs: add repair helpers for the reverse mapping btreeDarrick J. Wong2-0/+85
Add a couple of functions to the reverse mapping btree that will be used to repair the rmapbt. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-05-15xfs: expose various functions to repair codeDarrick J. Wong4-3/+11
Expose various helpers that the repair code will want to use. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-05-15xfs: add helpers to calculate btree sizeDarrick J. Wong8-3/+37
Add a bunch of helper functions that calculate the sizes of various btrees. These will be used to repair btrees and btree headers. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2018-05-10xfs: replace XFS_QMOPT_DQALLOC with a simple booleanDarrick J. Wong1-1/+0
DQALLOC is only ever used with xfs_qm_dqget*, and the only flag that the _dqget family of functions cares about is DQALLOC. Therefore, change it to a boolean 'can alloc?' flag for the dqget interfaces where that makes sense. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2018-05-10xfs: remove unnecessary xfs_qm_dqattach parameterDarrick J. Wong1-2/+2
The flags argument is always zero, get rid of it. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-05-10xfs: refactor XFS_QMOPT_DQNEXT out of existenceDarrick J. Wong1-1/+0
There's only one caller of DQNEXT and its semantics can be moved into a separate function, so create the function and get rid of the flag. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-05-10xfs: don't discard on free of unwritten extentsBrian Foster1-1/+2
Unwritten extents by definition have not been written to until they are converted to normal written extents. If unwritten extents are freed from a file, it is therefore guaranteed that the blocks have not been written to since allocation (note that zero range punches and reallocates blocks). To cut down on online discards generated from workloads that make use of preallocation, skip discards of extents if they are in the unwritten state when the extent is freed. Note that this optimization does not apply to log recovery, during which all freed extents are discarded if online discard is enabled. Also note that it may be possible for a filesystem crash to occur after write completion of an unwritten extent but before unwritten conversion such that the extent remains unwritten after log recovery. Since this pseudo-inconsistency may already be possible after a crash (consider writing to recently allocated blocks where the allocation transaction is lost after a crash), this change shouldn't introduce any fundamental limitations that don't already exist. In short, on storage stacks where discards are important, it's good practice to run an occasional fstrim even with online discard enabled in the filesystem, particularly after a crash. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-05-10xfs: add bmapi nodiscard flagBrian Foster4-12/+75
Freed extents are unconditionally discarded when online discard is enabled. Define XFS_BMAPI_NODISCARD to allow callers to bypass discards when unnecessary. For example, this will be useful for eofblocks trimming. This patch does not change behavior. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>