path: root/fs/btrfs
Age | Commit message | Author | Files | Lines
2018-05-28 | Btrfs: allow empty subvol= again | Omar Sandoval | 1 | -0/+3
I got a report that after upgrading to 4.16, someone's filesystems weren't mounting:

[ 23.845852] BTRFS info (device loop0): unrecognized mount option 'subvol='

Before 4.16, this mounted the default subvolume. It turns out that this empty "subvol=" is actually an application bug, but it was causing the application to fail, so it's an ABI break if you squint.

The generic parsing code we use for mount options (match_token()) doesn't match an empty string as "%s". Previously, setup_root_args() removed the "subvol=" string, but the mount path was cleaned up to not need that. Add a dummy Opt_subvol_empty to fix this.

The simple workaround is to use / or . for the value of 'subvol='.

Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
CC: [email protected] # 4.16+
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
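The matching problem can be illustrated with a small userspace sketch. It does not use the kernel's match_token(); the hypothetical classify_subvol_opt() and its SKETCH_OPT_* values stand in for the real match table. The point it shows is that a pattern requiring a non-empty value never matches a bare "subvol=", so that literal needs its own entry, which is the role the commit gives to Opt_subvol_empty.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token values standing in for the btrfs match table. */
enum sketch_opt {
	SKETCH_OPT_SUBVOL,       /* "subvol=<path>" with a non-empty path */
	SKETCH_OPT_SUBVOL_EMPTY, /* bare "subvol=" (the case 4.16 rejected) */
	SKETCH_OPT_ERR,          /* anything else: "unrecognized mount option" */
};

/* Classify one comma-separated mount option. */
static enum sketch_opt classify_subvol_opt(const char *opt)
{
	static const char prefix[] = "subvol=";
	size_t plen = sizeof(prefix) - 1;

	/* Exact literal match first: a "%s"-style pattern would miss this. */
	if (strcmp(opt, prefix) == 0)
		return SKETCH_OPT_SUBVOL_EMPTY;
	/* Prefix match with at least one character after '='. */
	if (strncmp(opt, prefix, plen) == 0)
		return SKETCH_OPT_SUBVOL;
	return SKETCH_OPT_ERR;
}
```

With the literal checked first, "subvol=" is recognized instead of producing the error above; what the mount code then does with it (mounting the default subvolume, as before 4.16) is not modeled in this sketch.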
2018-05-28 | btrfs: fix describe_relocation when printing unknown flags | Anand Jain | 1 | -1/+1
It looks like the original idea was to print, in hex, the flags that have no name mapping. So use the current buffer pointer bp instead of buf. Reaching the unknown flags should never happen; the code is there just in case.

Fixes: ebce0e01b930b ("btrfs: make block group flags in balance printks human-readable")
Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
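A userspace sketch of the bp-versus-buf pattern, assuming a fixed 128-byte buffer. The flag bits SK_DATA and SK_METADATA and the helpers advance() and describe_flags() are hypothetical stand-ins, not the kernel's BTRFS_BLOCK_GROUP_* names or code. Names are appended at the moving cursor bp; the leftover-bits case must also write at bp, because writing at buf would overwrite the names already printed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DESC_BUF 128

/* Hypothetical flag bits standing in for block group type flags. */
#define SK_DATA     0x1ULL
#define SK_METADATA 0x4ULL

/* Move the cursor past what snprintf produced, clamping on truncation. */
static void advance(char **bp, char *buf, int n)
{
	size_t used = (size_t)(*bp - buf);

	if (n < 0 || used + (size_t)n >= DESC_BUF)
		*bp = buf + DESC_BUF - 1; /* park on the terminating NUL */
	else
		*bp += n;
}

/* Print known flag names, then any unnamed leftover bits in hex. */
static void describe_flags(unsigned long long flags, char buf[DESC_BUF])
{
	char *bp = buf;

	buf[0] = '\0';
	if (flags & SK_DATA) {
		advance(&bp, buf, snprintf(bp, DESC_BUF - (size_t)(bp - buf), "data|"));
		flags &= ~SK_DATA;
	}
	if (flags & SK_METADATA) {
		advance(&bp, buf, snprintf(bp, DESC_BUF - (size_t)(bp - buf), "metadata|"));
		flags &= ~SK_METADATA;
	}
	/* The fixed bug: this must write at bp, not at buf. */
	if (flags)
		advance(&bp, buf, snprintf(bp, DESC_BUF - (size_t)(bp - buf), "0x%llx|", flags));
	if (bp > buf && bp[-1] == '|')
		bp[-1] = '\0'; /* drop the trailing separator */
}
```

For flags 0x105 this yields "data|metadata|0x100"; with the unknown bits written at buf instead, the output would collapse to just the hex value.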
2018-05-28 | btrfs: use kvzalloc for EXTENT_SAME temporary data | David Sterba | 1 | -7/+9
The dedupe range is 16 MiB; with 4 KiB pages and 8-byte pointers, each of the page arrays can be 32 KiB large. To avoid allocation failures due to fragmented memory, use the allocation with fallback to vmalloc. The arrays are allocated and freed only inside btrfs_extent_same and reused for all the ranges.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
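The 32 KiB figure follows from 16 MiB / 4 KiB = 4096 pages, times 8 bytes per pointer. A tiny helper (the name page_array_bytes() is hypothetical) makes the arithmetic explicit and shows how it scales with page size:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one array of page pointers covering chunk_bytes. */
static size_t page_array_bytes(size_t chunk_bytes, size_t page_size, size_t ptr_size)
{
	/* Round up so a partial trailing page still gets a slot. */
	size_t nr_pages = (chunk_bytes + page_size - 1) / page_size;

	return nr_pages * ptr_size;
}
```

With 4 KiB pages, 32 KiB is eight physically contiguous pages (an order-3 allocation), which is exactly the kind of request that can fail under fragmentation; a vmalloc fallback only needs virtually contiguous memory.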
2018-05-28 | Btrfs: reuse cmp workspace in EXTENT_SAME ioctl | Timofey Titovets | 1 | -39/+40
We support big dedupe requests by splitting the range into smaller parts and calling the dedupe logic on each of them. Instead of repeated allocation and deallocation, allocate once at the beginning and reuse the workspace in each iteration.

Signed-off-by: Timofey Titovets <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: dedupe_file_range ioctl: remove 16MiB restriction | Timofey Titovets | 1 | -6/+18
Currently btrfs_dedupe_file_range silently restricts the dedupe range to 16MiB to limit locking and working memory size, and this is documented in the manual page as implementation-specific.

Let's remove that restriction by iterating over the dedupe range in 16MiB steps. This is backward compatible and will not change anything for requests smaller than 16MiB.

Signed-off-by: Timofey Titovets <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
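The iteration described here, together with the single shared workspace from the previous commit, can be sketched in userspace as follows. sk_extent_same(), sk_extent_same_range() and struct sk_workspace are hypothetical stand-ins; the real code tracks both a source and a destination offset and does actual page comparison, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_DEDUPE_CHUNK (16ULL << 20) /* 16 MiB per step, as in the commit */

/* Hypothetical scratch space, allocated once per request. */
struct sk_workspace {
	unsigned int calls; /* chunks processed with this workspace */
};

/* Stand-in for the split-out helper that handles one bounded range. */
static int sk_extent_same_range(struct sk_workspace *ws, uint64_t off, uint64_t len)
{
	(void)off;
	ws->calls++;
	return len <= SK_DEDUPE_CHUNK ? 0 : -1;
}

/* Process [off, off + len) in 16 MiB steps, reusing one workspace. */
static int sk_extent_same(uint64_t off, uint64_t len, unsigned int *calls)
{
	struct sk_workspace *ws = calloc(1, sizeof(*ws)); /* allocate once */
	int ret = 0;

	if (!ws)
		return -1;
	while (len > 0 && ret == 0) {
		uint64_t step = len < SK_DEDUPE_CHUNK ? len : SK_DEDUPE_CHUNK;

		ret = sk_extent_same_range(ws, off, step);
		off += step;
		len -= step;
	}
	*calls = ws->calls;
	free(ws); /* free once, after all chunks */
	return ret;
}
```

A 40 MiB request becomes three helper calls (16 + 16 + 8 MiB), while anything up to 16 MiB is a single call, which is why the change is invisible to small requests.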
2018-05-28 | Btrfs: split btrfs_extent_same | Timofey Titovets | 1 | -28/+36
Split btrfs_extent_same() into two parts: the main EXTENT_SAME entry point and a helper that can be called repeatedly on a range. This will be used in the following patches.

Signed-off-by: Timofey Titovets <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: reserve space for O_TMPFILE orphan item deletion | Omar Sandoval | 1 | -1/+2
btrfs_link() calls btrfs_orphan_del() if it's linking an O_TMPFILE but it doesn't reserve space to do so. Even before the removal of the orphan_block_rsv it wasn't using it.

Fixes: ef3b9af50bfa ("Btrfs: implement inode_operations callback tmpfile")
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: renumber BTRFS_INODE_ runtime flags and switch to enums | Omar Sandoval | 1 | -9/+11
We got rid of BTRFS_INODE_HAS_ORPHAN_ITEM and BTRFS_INODE_ORPHAN_META_RESERVED, so we can renumber the flags to make them consecutive again.

Signed-off-by: Omar Sandoval <[email protected]>
[ switch them to enums so we don't have to do that again ]
Signed-off-by: David Sterba <[email protected]>
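Why an enum avoids future renumbering: each enumerator is one more than the previous, so deleting an entry automatically closes the gap. The sketch below uses hypothetical SK_INODE_* names and simple userspace bit helpers (sk_set_bit(), sk_test_bit()) in place of the kernel's set_bit()/test_bit() on runtime_flags.

```c
#include <assert.h>

/* Bit numbers for runtime inode flags; illustrative names only. */
enum {
	SK_INODE_DUMMY,           /* bit 0 */
	SK_INODE_IN_DEFRAG,       /* bit 1 */
	SK_INODE_NEEDS_FULL_SYNC, /* bit 2 */
	SK_INODE_COPY_EVERYTHING, /* bit 3 */
	SK_NR_INODE_FLAGS,        /* count, useful for sanity checks */
};

static void sk_set_bit(int nr, unsigned long *word)
{
	*word |= 1UL << nr;
}

static int sk_test_bit(int nr, const unsigned long *word)
{
	return (int)((*word >> nr) & 1UL);
}
```

Removing, say, SK_INODE_IN_DEFRAG would shift the later entries down by one with no manual edits, which is the maintenance win the bracketed note refers to.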
2018-05-28 | Btrfs: get rid of unused orphan infrastructure | Omar Sandoval | 6 | -99/+1
Now that we don't keep long-standing reservations for orphan items, root->orphan_block_rsv isn't used. We can get rid of it, along with:

- root->orphan_lock, which was used to protect root->orphan_block_rsv
- root->orphan_inodes, which was used as a refcount for root->orphan_block_rsv
- BTRFS_INODE_ORPHAN_META_RESERVED, which was used to track reservations in root->orphan_block_rsv
- btrfs_orphan_commit_root(), which was the last user of any of these and does nothing else

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: fix ENOSPC caused by orphan items reservations | Omar Sandoval | 1 | -120/+38
Currently, we keep space reserved for all inode orphan items until the inode is evicted (i.e., all references to it are dropped). We hit an issue where an application would keep a bunch of deleted files open (by design) and thus keep a large amount of space reserved, causing ENOSPC errors when other operations tried to reserve space. This long-standing reservation isn't absolutely necessary for a couple of reasons:

- We can almost always make the reservation we need or steal from the global reserve for the orphan item
- If we can't, it's not the end of the world if we drop the orphan item on the floor and let the next mount clean it up

So, get rid of the persistent reservation and just reserve space in btrfs_evict_inode().

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: refactor btrfs_evict_inode() reserve refill dance | Omar Sandoval | 1 | -71/+42
The truncate loop in btrfs_evict_inode() does two things at once:

- It refills the temporary block reserve, potentially stealing from the global reserve or committing
- It calls btrfs_truncate_inode_items()

The tangle of continues hides the fact that these two steps are actually separate. Split the first step out into a separate function both for clarity and so that we can reuse it in a later patch.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: don't return ino to ino cache if inode item removal fails | Omar Sandoval | 1 | -12/+13
In btrfs_evict_inode(), if btrfs_truncate_inode_items() fails, the inode item will still be in the tree but we still return the ino to the ino cache. That will blow up later when someone tries to allocate that ino, so don't return it to the cache.

Fixes: 581bb050941b ("Btrfs: Cache free inode numbers in memory")
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: delete dead code in btrfs_orphan_commit_root() | Omar Sandoval | 1 | -12/+0
btrfs_orphan_commit_root() tries to delete an orphan item for a subvolume in the tree root, but we don't actually insert that item in the first place. See commit 0a0d4415e338 ("Btrfs: delete dead code in btrfs_orphan_add()"). We can get rid of it.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: get rid of BTRFS_INODE_HAS_ORPHAN_ITEM | Omar Sandoval | 2 | -57/+20
Now that we don't add orphan items for truncate, there can't be races on adding or deleting an orphan item, so this bit is unnecessary.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: stop creating orphan items for truncate | Omar Sandoval | 2 | -114/+51
Currently, we insert an orphan item during a truncate so that if there's a crash, we don't leak extents past the on-disk i_size. However, since commit 7f4f6e0a3f6d ("Btrfs: only update disk_i_size as we remove extents"), we keep disk_i_size in sync with the extent items as we truncate, so orphan cleanup will never have any extents to remove. Don't bother with the superfluous orphan item.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: don't BUG_ON() in btrfs_truncate_inode_items() | Omar Sandoval | 1 | -1/+4
btrfs_free_extent() can fail because of ENOMEM. There's no reason to panic here; we can just abort the transaction.

Fixes: f4b9aa8d3b87 ("btrfs_truncate")
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: fix error handling in btrfs_truncate_inode_items() | Omar Sandoval | 1 | -27/+28
btrfs_truncate_inode_items() uses two variables for error handling, ret and err. These are not handled consistently, leading to a couple of bugs.

- Errors from btrfs_del_items() are handled but not propagated to the caller
- If btrfs_run_delayed_refs() fails and aborts the transaction, we continue running

Just use ret everywhere and simplify things a bit, fixing both of these issues.

Fixes: 79787eaab461 ("btrfs: replace many BUG_ONs with proper error handling")
Fixes: 1262133b8d6f ("Btrfs: account for crcs in delayed ref processing")
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | Btrfs: update stale comments referencing vmtruncate() | Omar Sandoval | 1 | -3/+2
Commit a41ad394a03b ("Btrfs: convert to the new truncate sequence") changed btrfs_setsize() to call truncate_setsize() instead of vmtruncate() but didn't update the comment above it. truncate_setsize() never fails (the IS_SWAPFILE() check happens elsewhere), so remove the comment.

Additionally, the comment above btrfs_page_mkwrite() references vmtruncate(), but truncate_setsize() does the size write and page locking now.

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: Remove stale comment about select_delayed_ref | Nikolay Borisov | 1 | -4/+0
select_delayed_ref really just gets the next delayed ref which has to be processed, either an add ref or a drop ref. We never go back for anything. So the comment is actually bogus; just remove it.

Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: sysfs: Add entry which shows if rmdir can work on subvolumes | Misono Tomohiro | 1 | -0/+39
Deletion of a subvolume by rmdir(2) has been allowed since commit cd2decf640b1 ("btrfs: Allow rmdir(2) to delete an empty subvolume"). It is a kind of new feature, and this commit adds a sysfs entry /sys/fs/btrfs/features/rmdir_subvol to indicate the availability of the feature so that a user program (e.g. fstests) can detect it.

Prior to this commit, all entries in /sys/fs/btrfs/features were features which depend on feature bits of the superblock (i.e. each feature affects the on-disk format) and were managed by the attribute_group "btrfs_feature_attr_group". For each fs, entries in /sys/fs/btrfs/UUID/features indicate which features are enabled (or can be changed online) for the fs.

However, the rmdir_subvol feature only depends on the kernel module. Therefore a new attribute_group "btrfs_static_feature_attr_group" is introduced and sysfs_merge_group() is used to share the /sys/fs/btrfs/features directory. Features in "btrfs_static_feature_attr_group" won't be listed in each /sys/fs/btrfs/UUID/features.

Signed-off-by: Tomohiro Misono <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
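A user program such as a test harness could detect the feature roughly as below. The helper name btrfs_static_feature_present() is hypothetical, and the sketch only checks that the sysfs file exists, which is the detection mechanism the commit message describes; it does not interpret the file's contents. /sys/fs/btrfs exists only while the btrfs module is loaded.

```c
#include <assert.h>
#include <unistd.h>

/* Return 1 if the given static feature file exists, 0 otherwise. */
static int btrfs_static_feature_present(const char *path)
{
	return access(path, F_OK) == 0;
}
```

Typical use: `if (btrfs_static_feature_present("/sys/fs/btrfs/features/rmdir_subvol"))` before relying on rmdir(2) to delete an empty subvolume. Because the entry lives in the static group, checking the per-filesystem /sys/fs/btrfs/UUID/features directory instead would not find it.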
2018-05-28 | btrfs: sysfs: Use enum/define value for feature array definitions | Tomohiro Misono | 2 | -7/+8
Use existing named values instead of the raw numbers.

Signed-off-by: Tomohiro Misono <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: add prefix "balance:" for log messages | Anand Jain | 1 | -14/+24
Kernel logs are very important for forensic investigation of issues in general, so make them easy to use. This patch adds a 'balance:' prefix so that the messages can be easily searched.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: unify naming of flags variables for SETFLAGS and XFLAGS | David Sterba | 1 | -53/+53
* The simple 'flags' refer to the btrfs inode
* ... that's in 'binode'
* the FS_*_FL variables are 'fsflags'
* the old copies of the variable are prefixed by 'old_'
* Struct inode flags contain 'i_flags'.

Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: add FS_IOC_FSSETXATTR ioctl | David Sterba | 1 | -0/+94
The new ioctl is an extension of FS_IOC_SETFLAGS that adds new flags and is extensible. Don't get fooled by the XATTR in the name: it does not have anything in common with the extended attributes, incidentally also abbreviated as XATTRs.

This patch allows setting the xflags portion of the fsxattr structure; other items have no meaning and non-zero values will result in EOPNOTSUPP.

Currently supported xflags:
- APPEND
- IMMUTABLE
- NOATIME
- NODUMP
- SYNC

The structure of btrfs_ioctl_fssetxattr copies btrfs_ioctl_setflags but is simpler on the flag setting side.

The original patch was written by Chandan Jay Sharma but was incomplete and no further revision has been sent.

Based-on-patches-by: Chandan Jay Sharma <[email protected]>
Signed-off-by: David Sterba <[email protected]>
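From userspace, the get/set pair is normally used as a read-modify-write on struct fsxattr from <linux/fs.h>. The helper sk_set_noatime() is hypothetical; it changes only fsx_xflags and hands the other fields back exactly as the kernel reported them, which avoids tripping the EOPNOTSUPP rule for unexpected non-zero fields (assuming btrfs reports them as zero).

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h> /* struct fsxattr, FS_IOC_FSGETXATTR, FS_XFLAG_* */

/* Add FS_XFLAG_NOATIME to an open file; returns 0 or -errno. */
static int sk_set_noatime(int fd)
{
	struct fsxattr fa;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fa) < 0)
		return -errno;
	fa.fsx_xflags |= FS_XFLAG_NOATIME; /* touch only the xflags word */
	if (ioctl(fd, FS_IOC_FSSETXATTR, &fa) < 0)
		return -errno;
	return 0;
}
```

The same pattern works for the other listed xflags; changing APPEND or IMMUTABLE generally also requires CAP_LINUX_IMMUTABLE.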
2018-05-28 | btrfs: add FS_IOC_FSGETXATTR ioctl | David Sterba | 1 | -0/+20
The new ioctl is an extension of FS_IOC_GETFLAGS that adds new flags and is extensible.

This patch allows returning the xflags portion of the fsxattr structure; other items have no meaning for btrfs or can be added later.

The original patch was written by Chandan Jay Sharma but was incomplete and no further revision has been sent. Several cleanups were necessary to avoid confusion with other ioctls, as we have another flavor of flags.

Based-on-patches-by: Chandan Jay Sharma <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: add helpers for FS_XFLAG_* conversion | David Sterba | 1 | -0/+32
Preparatory work for the FS_IOC_FSGETXATTR ioctl, basic conversions and checking helpers.

Signed-off-by: David Sterba <[email protected]>
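A table-driven sketch of such conversion and checking helpers. The SK_INODE_* bit values are assumptions for illustration, not copied from btrfs; the FS_XFLAG_* constants come from <linux/fs.h>, and the five supported flags are the ones listed for FS_IOC_FSSETXATTR above. Returning -EOPNOTSUPP for unsupported bits is this sketch's choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/fs.h> /* FS_XFLAG_* */

/* Illustrative btrfs inode flag bits; values assumed for this sketch. */
#define SK_INODE_SYNC      (1U << 5)
#define SK_INODE_IMMUTABLE (1U << 6)
#define SK_INODE_APPEND    (1U << 7)
#define SK_INODE_NODUMP    (1U << 8)
#define SK_INODE_NOATIME   (1U << 9)

/* One row per supported xflag. */
static const struct {
	uint32_t inode_flag;
	uint32_t xflag;
} sk_xflag_map[] = {
	{ SK_INODE_SYNC,      FS_XFLAG_SYNC },
	{ SK_INODE_IMMUTABLE, FS_XFLAG_IMMUTABLE },
	{ SK_INODE_APPEND,    FS_XFLAG_APPEND },
	{ SK_INODE_NODUMP,    FS_XFLAG_NODUMP },
	{ SK_INODE_NOATIME,   FS_XFLAG_NOATIME },
};

#define SK_NMAP (sizeof(sk_xflag_map) / sizeof(sk_xflag_map[0]))

/* btrfs inode flags -> FS_XFLAG_* (the GET direction). */
static uint32_t sk_inode_flags_to_xflags(uint32_t flags)
{
	uint32_t xflags = 0;

	for (size_t i = 0; i < SK_NMAP; i++)
		if (flags & sk_xflag_map[i].inode_flag)
			xflags |= sk_xflag_map[i].xflag;
	return xflags;
}

/* FS_XFLAG_* -> btrfs inode flags (the SET direction). */
static uint32_t sk_xflags_to_inode_flags(uint32_t xflags)
{
	uint32_t flags = 0;

	for (size_t i = 0; i < SK_NMAP; i++)
		if (xflags & sk_xflag_map[i].xflag)
			flags |= sk_xflag_map[i].inode_flag;
	return flags;
}

/* Reject any xflag bit outside the supported set. */
static int sk_check_xflags(uint32_t xflags)
{
	uint32_t supported = 0;

	for (size_t i = 0; i < SK_NMAP; i++)
		supported |= sk_xflag_map[i].xflag;
	return (xflags & ~supported) ? -EOPNOTSUPP : 0;
}
```

Keeping both directions driven by one table means a newly supported flag is added in a single place.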
2018-05-28 | btrfs: rename btrfs_flags_to_ioctl to reflect which flags it touches | David Sterba | 1 | -4/+5
Converts btrfs_inode::flags to the FS_*_FL flags.

Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: rename check_flags to reflect which flags it touches | David Sterba | 1 | -2/+3
The FS_*_FL flags cannot be easily identified by a prefix, but we still need to recognize them, so 'fsflags' should be closer to the naming scheme; then again, the 'fs' part sounds like it's a filesystem flag. I don't have a better idea for now.

Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: rename btrfs_mask_flags to reflect which flags it touches | David Sterba | 1 | -4/+5
The FS_*_FL flags cannot be easily identified by a variable name prefix, but we still need to recognize them, so 'fsflags' should be closer to the naming scheme; then again, the 'fs' part sounds like it's a filesystem flag. I don't have a better idea for now.

Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: rename btrfs_update_iflags to reflect which flags it touches | David Sterba | 3 | -5/+5
The btrfs inode flag flavour is now simply called 'inode flags' and the VFS inode flags are 'i_flags'.

Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: use common variable for fs_devices in btrfs_destroy_dev_replace_tgtdev | Anand Jain | 1 | -5/+7
Use a local btrfs_fs_devices variable to access the structure.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: drop uuid_mutex in btrfs_destroy_dev_replace_tgtdev | Anand Jain | 1 | -2/+0
Delete the uuid_mutex lock here as this thread accesses the btrfs_fs_devices::devices only (counters or called functions do a list traversal). And the device_list_mutex lock is already taken.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ update changelog ]
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: drop uuid_mutex in btrfs_dev_replace_finishing | Anand Jain | 1 | -3/+0
btrfs_dev_replace_finishing updates devices (source and target) which are within the btrfs_fs_devices::devices or within the cloned seed devices (btrfs_fs_devices::seed::devices), so we don't need the global uuid_mutex.

The device replace context is also locked by its own locks.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: replace uuid_mutex by device_list_mutex in btrfs_open_devices | Anand Jain | 1 | -2/+3
btrfs_open_devices() is using the uuid_mutex, but as btrfs_open_devices() is limited to opening all the devices under the given fsid, we don't need the uuid_mutex. Instead it should hold the device_list_mutex as it updates the members of the btrfs_fs_devices and btrfs_device and not the whole fs_devs list.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ update changelog ]
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: document uuid_mutex usage in read_chunk_tree | Anand Jain | 1 | -0/+4
read_chunk_tree() calls read_one_dev(), but for a seed device we have to search the fs_uuids list, so we need the uuid_mutex. Add a comment so that we can improve this part.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: use existing cur_devices, cleanup btrfs_rm_device | Anand Jain | 1 | -4/+9
Instead of de-referencing the device->fs_devices use cur_devices which points to the same fs_devices and does not change.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: reduce uuid_mutex critical section while scanning devices | Anand Jain | 1 | -7/+5
The generic block device lookup or cleanup does not need the uuid mutex, that's only for the device_list_add.

Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ update changelog ]
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: Unexport and rename btrfs_invalidate_inodes | Nikolay Borisov | 2 | -65/+65
This function is no longer used outside of inode.c so just make it static. At the same time give a more becoming name, since it's not really invalidating the inodes but just calling d_prune_alias. Last, but not least - move the function above the sole caller to avoid introducing yet-another-pointless forward declaration.

Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: replace waitqueue_active with cond_wake_up | David Sterba | 9 | -91/+40
Use the wrappers and reduce the amount of low-level details about the waitqueue management.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups | David Sterba | 1 | -2/+8
Currently the code assumes that there's an implied barrier by the sequence of code preceding the wakeup, namely the mutex unlock.

As Nikolay pointed out:

I think this is wrong (not your code) but the original assumption that the RELEASE semantics provided by mutex_unlock is sufficient. According to memory-barriers.txt:

Section 'LOCK ACQUISITION FUNCTIONS' states:

  (2) RELEASE operation implication:

      Memory operations issued before the RELEASE will be completed before the RELEASE operation has completed.

      Memory operations issued after the RELEASE *may* be completed before the RELEASE operation has completed.

(I've bolded the may portion)

The example given there:

As an example, consider the following:

    *A = a;
    *B = b;
    ACQUIRE
    *C = c;
    *D = d;
    RELEASE
    *E = e;
    *F = f;

The following sequence of events is acceptable:

    ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE

So if we assume that *C is modifying the flag which the waitqueue is checking, and *E is the actual wakeup, then those accesses can be re-ordered...

IMHO this code should be considered broken...
---

To be on the safe side, add the barriers. The synchronization logic around log using the mutexes and several other threads does not make it easy to reason for/against the barrier.

CC: Nikolay Borisov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: introduce conditional wakeup helpers | David Sterba | 1 | -0/+22
Add convenience wrappers for the waitqueue management that involves memory barriers to prevent deadlocks. The helpers will let us remove barriers and the necessary comments in several places.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
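The shape of such a helper can be modeled in userspace with C11 atomics. This is a single-threaded model of the control flow only, not a working wait queue: struct sk_waitqueue, sk_wake_up(), sk_cond_wake_up() and sk_cond_wake_up_nomb() are hypothetical. In the kernel, the barrier-carrying check is typically wq_has_sleeper(), which issues smp_mb() before waitqueue_active(); the fence below plays that role by ordering the caller's earlier condition store before the sleeper check.

```c
#include <assert.h>
#include <stdatomic.h>

/* Model of a wait queue: counts sleepers and issued wakeups only. */
struct sk_waitqueue {
	atomic_int sleepers;
	atomic_int wakeups;
};

static void sk_wake_up(struct sk_waitqueue *wq)
{
	atomic_fetch_add(&wq->wakeups, 1);
}

/* Full barrier, then wake only if someone is waiting. */
static void sk_cond_wake_up(struct sk_waitqueue *wq)
{
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&wq->sleepers, memory_order_relaxed) > 0)
		sk_wake_up(wq);
}

/* Variant for callers whose preceding code already implies the barrier. */
static void sk_cond_wake_up_nomb(struct sk_waitqueue *wq)
{
	if (atomic_load_explicit(&wq->sleepers, memory_order_relaxed) > 0)
		sk_wake_up(wq);
}
```

Wrapping the barrier and the check together is what lets call sites drop their hand-written smp_mb() and the comment justifying it.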
2018-05-28 | btrfs: qgroup: Finish rescan when hit the last leaf of extent tree | Qu Wenruo | 1 | -0/+19
Under the following case, qgroup rescan can double account cowed tree blocks:

In this case, the extent tree only has one tree block.

-
| transid=5 last committed=4
| btrfs_qgroup_rescan_worker()
| |- btrfs_start_transaction()
| |  transid = 5
| |- qgroup_rescan_leaf()
|    |- btrfs_search_slot_for_read() on extent tree
|       Get the only extent tree block from commit root (transid = 4).
|       Scan it, set qgroup_rescan_progress to the last
|       EXTENT/META_ITEM + 1
|       now qgroup_rescan_progress = A + 1.
|
| fs tree gets CoWed, new tree block is at A + 16K
| transid 5 gets committed
-
| transid=6 last committed=5
| btrfs_qgroup_rescan_worker()
| |- btrfs_start_transaction()
| |  transid = 5
| |- qgroup_rescan_leaf()
|    |- btrfs_search_slot_for_read() on extent tree
|       Get the only extent tree block from commit root (transid = 5).
|       scan it using qgroup_rescan_progress (A + 1).
|       found new tree block beyond A, and it's fs tree block,
|       account it to increase qgroup numbers.
-

In the above case, tree block A and tree block A + 16K get accounted twice, while qgroup rescan should stop when it has already reached the last leaf, rather than continue using its qgroup_rescan_progress.

Such a case can be hit just by looping btrfs/017; with some probability it triggers this double qgroup accounting problem.

Fix it by checking the path to determine if we should finish qgroup rescan, rather than relying on the next loop to exit.

Reported-by: Nikolay Borisov <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: qgroup: Search commit root for rescan to avoid missing extent | Qu Wenruo | 1 | -3/+6
When doing qgroup rescan using the following script (modified from the btrfs/017 test case), we can sometimes hit qgroup corruption.

------
umount $dev &> /dev/null
umount $mnt &> /dev/null

mkfs.btrfs -f -n 64k $dev
mount $dev $mnt

extent_size=8192

xfs_io -f -d -c "pwrite 0 $extent_size" $mnt/foo > /dev/null
btrfs subvolume snapshot $mnt $mnt/snap

xfs_io -f -c "reflink $mnt/foo" $mnt/foo-reflink > /dev/null
xfs_io -f -c "reflink $mnt/foo" $mnt/snap/foo-reflink > /dev/null
xfs_io -f -c "reflink $mnt/foo" $mnt/snap/foo-reflink2 > /dev/null
btrfs quota enable $mnt

# -W is the new option to only wait rescan while not starting new one
btrfs quota rescan -W $mnt
btrfs qgroup show -prce $mnt
umount $mnt

# Need to patch btrfs-progs to report qgroup mismatch as error
btrfs check $dev || _fail
------

On a fast machine, we can hit some corruption which missed accounting tree blocks:

------
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5           8.00KiB        0.00B         none         none ---     ---
0/257         8.00KiB        0.00B         none         none ---     ---
------

This is due to the fact that we're always searching the commit root for btrfs_find_all_roots() in qgroup_rescan_leaf(), but the leaf we get is from the current transaction, not the commit root. And if our tree blocks get modified in the current transaction, we won't find any owner in the commit root, thus causing the corruption.

Fix it by searching the commit root of the extent tree in qgroup_rescan_leaf().

Reported-by: Nikolay Borisov <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: take the last remnants of ->d_fsdata use out | Al Viro | 1 | -6/+0
[spotted while going through ->d_fsdata handling around d_splice_alias(); don't really care which tree that goes through]

The only thing even looking at ->d_fsdata in there (since 2012) had been kfree(dentry->d_fsdata) in btrfs_dentry_delete(). Which, incidentally, is all btrfs_dentry_delete() does.

Signed-off-by: Al Viro <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: Do super block verification before writing it to disk | Qu Wenruo | 1 | -0/+43
There are already 2 reports about strangely corrupted super blocks, where the csum still matches but extra garbage gets slipped into the super block.

The corruption would look like:

------
superblock: bytenr=65536, device=/dev/sdc1
---------------------------------------------------------
csum_type               41700 (INVALID)
csum                    0x3b252d3a [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
...
incompat_flags          0x5b22400000000169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA |
                          unknown flag: 0x5b22400000000000 )
...
------

Or

------
superblock: bytenr=65536, device=/dev/mapper/x
---------------------------------------------------------
csum_type               35355 (INVALID)
csum_size               32
csum                    0xf0dbeddd [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
...
incompat_flags          0x176d200000000169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA |
                          unknown flag: 0x176d200000000000 )
------

Obviously, csum_type and incompat_flags get some garbage, but the csum still matches, which means the kernel calculated the csum based on corrupted super block memory. And after manually fixing these values, the filesystem is completely healthy without any problem exposed by btrfs check.

Although the cause is still unknown, at least detect it and prevent further corruption.

Both reports have the same symptoms: there's a 4-byte overwrite at offset 192 of the superblock. The superblock structure is not allocated or freed and stays in memory for the whole filesystem lifetime, so it's not a use-after-free kind of error on someone else's leaked page.

As a vague hint at the probable cause, the reports mention other system freezes related to graphics card drivers.

Reported-by: Ken Swenson <[email protected]>
Reported-by: Ben Parsons <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ add brief analysis of the reports ]
Signed-off-by: David Sterba <[email protected]>
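A userspace sketch of a shared validation routine used on the write path, covering only the two fields corrupted in these reports. struct sk_super, sk_validate_super() and sk_write_super() are hypothetical; SK_INCOMPAT_SUPP is an assumed mask of known incompat bits (it covers the 0x169 set shown in the dumps), and checksum type 0 stands for CRC32C, the only type btrfs supported at the time. Returning -EUCLEAN is this sketch's choice of error.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Only the two fields seen corrupted; a real superblock has many more. */
struct sk_super {
	uint16_t csum_type;
	uint64_t incompat_flags;
};

#define SK_CSUM_TYPE_CRC32C 0       /* only checksum type in 2018 */
#define SK_INCOMPAT_SUPP    0x3ffULL /* assumed mask of known incompat bits */

/* Core checks, shareable between mount-time and write-time callers. */
static int sk_validate_super(const struct sk_super *sb)
{
	if (sb->csum_type != SK_CSUM_TYPE_CRC32C)
		return -EUCLEAN;
	if (sb->incompat_flags & ~SK_INCOMPAT_SUPP)
		return -EUCLEAN;
	return 0;
}

/* Write path: validate before checksumming, so garbage is never persisted. */
static int sk_write_super(const struct sk_super *sb, int *written)
{
	int ret = sk_validate_super(sb);

	if (ret)
		return ret;
	*written = 1; /* stand-in for computing the csum and submitting I/O */
	return 0;
}
```

The ordering is the point: because the checksum is computed from whatever is in memory, it cannot catch in-memory corruption, so the field checks have to run before it.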
2018-05-28 | btrfs: Refactor btrfs_check_super_valid | Qu Wenruo | 1 | -4/+25
Refactor btrfs_check_super_valid:

1) Rename it to btrfs_validate_mount_super()
   Now it's more obvious when the function should be called.

2) Extract core check routine into validate_super()
   Later write time check can reuse it, and if needed, we could also use validate_super() to check each super block.

3) Add more comments about btrfs_validate_mount_super()
   Mostly about what it doesn't check and when it should be called.

Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ rename to validate_super ]
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: Move btrfs_check_super_valid() to avoid forward declaration | Qu Wenruo | 1 | -150/+149
Move btrfs_check_super_valid() before its single caller to avoid forward declaration.

Though such code motion is not recommended as it pollutes git history, in this case the following patches would need to add new forward declarations for static functions that we want to avoid.

Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: Remove fs_info argument from populate_free_space_tree | Nikolay Borisov | 1 | -4/+3
This function always takes a transaction handle which contains a reference to the fs_info. Use that and remove the extra argument.

Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: Remove fs_info argument from add_to_free_space_tree | Nikolay Borisov | 3 | -5/+3
This function takes a transaction handle which already contains a reference to the fs_info. So use it and remove the extra function argument.

Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2018-05-28 | btrfs: Remove fs_info argument from remove_from_free_space_tree | Nikolay Borisov | 3 | -8/+4
This function already takes a transaction handle which holds a reference to the fs_info. Use that and remove the extra argument.

Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
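The pattern behind these three cleanups, sketched with hypothetical, heavily trimmed stand-ins (struct sk_fs_info, struct sk_trans_handle, sk_add_to_free_space_tree()). The function body is only a placeholder alignment check; the point is the signature, where fs_info is read from the handle instead of being passed separately, so callers cannot supply a mismatched pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for btrfs_fs_info and btrfs_trans_handle. */
struct sk_fs_info {
	uint32_t sectorsize;
};

struct sk_trans_handle {
	struct sk_fs_info *fs_info; /* every handle already carries this */
};

/* Old shape: sk_add_to_free_space_tree(trans, fs_info, start, size).
 * New shape: fs_info comes from the transaction handle. */
static int sk_add_to_free_space_tree(struct sk_trans_handle *trans,
				     uint64_t start, uint64_t size)
{
	struct sk_fs_info *fs_info = trans->fs_info;

	/* Placeholder work: require sector-aligned ranges. */
	if ((start | size) & (fs_info->sectorsize - 1))
		return -EINVAL;
	return 0; /* a real implementation would modify the tree here */
}
```

Callers shrink by one argument, and the single source of truth for fs_info is the transaction that the operation already runs under.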