aboutsummaryrefslogtreecommitdiff
path: root/fs/btrfs/extent-tree.c
AgeCommit message (Collapse)AuthorFilesLines
2017-06-29btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ↵Qu Wenruo1-5/+7
ranges [BUG] For the following case, btrfs can underflow qgroup reserved space at an error path: (Page size 4K, function name without "btrfs_" prefix) Task A | Task B ---------------------------------------------------------------------- Buffered_write [0, 2K) | |- check_data_free_space() | | |- qgroup_reserve_data() | | Range aligned to page | | range [0, 4K) <<< | | 4K bytes reserved <<< | |- copy pages to page cache | | Buffered_write [2K, 4K) | |- check_data_free_space() | | |- qgroup_reserved_data() | | Range alinged to page | | range [0, 4K) | | Already reserved by A <<< | | 0 bytes reserved <<< | |- delalloc_reserve_metadata() | | And it *FAILED* (Maybe EQUOTA) | |- free_reserved_data_space() |- qgroup_free_data() Range aligned to page range [0, 4K) Freeing 4K (Special thanks to Chandan for the detailed report and analyse) [CAUSE] Above Task B is freeing reserved data range [0, 4K) which is actually reserved by Task A. And at writeback time, page dirty by Task A will go through writeback routine, which will free 4K reserved data space at file extent insert time, causing the qgroup underflow. [FIX] For btrfs_qgroup_free_data(), add @reserved parameter to only free data ranges reserved by previous btrfs_qgroup_reserve_data(). So in above case, Task B will try to free 0 byte, so no underflow. Reported-by: Chandan Rajendra <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: Chandan Rajendra <[email protected]> Tested-by: Chandan Rajendra <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-29btrfs: qgroup: Introduce extent changeset for qgroup reserve functionsQu Wenruo1-10/+13
Introduce a new parameter, struct extent_changeset for btrfs_qgroup_reserved_data() and its callers. Such extent_changeset was used in btrfs_qgroup_reserve_data() to record which range it reserved in current reserve, so it can free it in error paths. The reason we need to export it to callers is, at buffered write error path, without knowing what exactly which range we reserved in current allocation, we can free space which is not reserved by us. This will lead to qgroup reserved space underflow. Reviewed-by: Chandan Rajendra <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-29btrfs: qgroup: Return actually freed bytes for qgroup release or free dataQu Wenruo1-1/+1
btrfs_qgroup_release/free_data() only returns 0 or a negative error number (ENOMEM is the only possible error). This is normally good enough, but sometimes we need the exact byte count it freed/released. Change it to return actually released/freed bytenr number instead of 0 for success. And slightly modify related extent_changeset structure, since in btrfs one no-hole data extent won't be larger than 128M, so "unsigned int" is large enough for the use case. Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-29Btrfs: rework delayed ref total_bytes_pinned accountingOmar Sandoval1-9/+32
The total_bytes_pinned counter is completely broken when accounting delayed refs: - If two drops for the same extent are merged, we will decrement total_bytes_pinned twice but only increment it once. - If an add is merged into a drop or vice versa, we will decrement the total_bytes_pinned counter but never increment it. - If multiple references to an extent are dropped, we will account it multiple times, potentially vastly over-estimating the number of bytes that will be freed by a commit and doing unnecessary work when we're close to ENOSPC. The last issue is relatively minor, but the first two make the total_bytes_pinned counter leak or underflow very often. These accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu to keep track of possibly pinned bytes"), but they were papered over by zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix race of using total_bytes_pinned"). We need to make sure that an extent is accounted as pinned exactly once if and only if we will drop references to it when when the transaction is committed. Ideally we would only add to total_bytes_pinned when the *last* reference is dropped, but this information isn't readily available for data extents. Again, this over-estimation can lead to extra commits when we're close to ENOSPC, but it's not as bad as before. The fix implemented here is to increment total_bytes_pinned when the total refmod count for an extent goes negative and decrement it if the refmod count goes back to non-negative or after we've run all of the delayed refs for that extent. Signed-off-by: Omar Sandoval <[email protected]> Tested-by: Holger Hoffstätte <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-29Btrfs: return old and new total ref mods when adding delayed refsOmar Sandoval1-24/+27
We need this to decide when to account pinned bytes. Signed-off-by: Omar Sandoval <[email protected]> Tested-by: Holger Hoffstätte <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-29Btrfs: always account pinned bytes when dropping a tree block refOmar Sandoval1-9/+8
Currently, we only increment total_bytes_pinned in btrfs_free_tree_block() when dropping the last reference on the block. However, when the delayed ref is run later, we will decrement total_bytes_pinned regardless of whether it was the last reference or not. This causes the counter to underflow when the reference we dropped was not the last reference. Fix it by incrementing the counter unconditionally, which is what btrfs_free_extent() does. This makes total_bytes_pinned an overestimate when references to shared extents are dropped, but in the worst case this will just make us try to commit the transaction to try to free up space and find we didn't free enough. Signed-off-by: Omar Sandoval <[email protected]> Tested-by: Holger Hoffstätte <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-29Btrfs: update total_bytes_pinned when pinning down extentsOmar Sandoval1-1/+2
The extents marked in pin_down_extent() will be unpinned later in unpin_extent_range(), which decrements total_bytes_pinned. pin_down_extent() must increment the counter to avoid underflowing it. Also adjust btrfs_free_tree_block() to avoid accounting for the same extent twice. Signed-off-by: Omar Sandoval <[email protected]> Tested-by: Holger Hoffstätte <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-29Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()Omar Sandoval1-1/+1
The value of flags is one of DATA/METADATA/SYSTEM, they must exist at when add_pinned_bytes is called. Signed-off-by: Omar Sandoval <[email protected]> Tested-by: Holger Hoffstätte <[email protected]> Reviewed-by: David Sterba <[email protected]> [ added changelog ] Signed-off-by: David Sterba <[email protected]>
2017-06-29Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64Omar Sandoval1-21/+20
There are a few places where we pass in a negative num_bytes, so make it signed for clarity. Also move it up in the file since later patches will need it there. Signed-off-by: Omar Sandoval <[email protected]> Tested-by: Holger Hoffstätte <[email protected]> Reviewed-by: Liu Bo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-19btrfs: Use btrfs_space_info_used instead of opencoding itNikolay Borisov1-6/+3
Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-19btrfs: Refactor update_space_infoNikolay Borisov1-40/+18
Following the factoring out of the creation code udpate_space_info can only be called for already-existing space_info structs. As such it cannot fail. Remove superfluous error handling and make the function return void. Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: Jeff Mahoney <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-19btrfs: Separate space_info create/updateNikolay Borisov1-66/+66
Currently the struct space_info creation code is intermixed in the udpate_space_info function. There are well-defined points at which the we actually want to create brand-new space_info structs (e.g. during mount of the filesystem as well as sometimes when adding/initialising new chunks). In such cases update_space_info is called with 0 as the bytes parameter. All of this makes for spaghetti code. Fix it by factoring out the creation code in a separate create_space_info structure. This also allows to simplify the internals. Also remove BUG_ON from do_alloc_chunk since the callers handle errors. Furthermore it will make the update_space_info function not fail, allowing us to remove error handling in callers. This will come in a follow up patch. Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: Jeff Mahoney <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-19Btrfs: skip commit transaction if we don't have enough pinned bytesLiu Bo1-1/+1
We commit transaction in order to reclaim space from pinned bytes because it could process delayed refs, and in may_commit_transaction(), we check first if pinned bytes are enough for the required space, we then check if that plus bytes reserved for delayed insert are enough for the required space. This changes the code to the above logic. Fixes: b150a4f10d87 ("Btrfs: use a percpu to keep track of possibly pinned bytes") Tested-by: Nikolay Borisov <[email protected]> Reported-by: Nikolay Borisov <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Signed-off-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-19btrfs: remove root usage from can_overcommitJeff Mahoney1-36/+46
can_overcommit using the root to determine the allocation profile is the only use of a root in the call graph below reserve_metadata_bytes. It turns out that we only need to know whether the allocation is for the chunk root or not -- and we can pass that around as a bool instead. This allows us to pull root usage out of the reservation path all the way up to reserve_metadata_bytes itself, which uses it only to compare against fs_info->chunk_root to set the bool. In turn, this eliminates a bunch of races where we use a particular root too early in the mount process. Signed-off-by: Jeff Mahoney <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-19btrfs: cleanup root usage by btrfs_get_alloc_profileJeff Mahoney1-7/+21
There are two places where we don't already know what kind of alloc profile we need before calling btrfs_get_alloc_profile, but we need access to a root everywhere we call it. This patch adds helpers for btrfs_{data,metadata,system}_alloc_profile() and relegates btrfs_system_alloc_profile to a static for use in those two cases. The next patch will eliminate one of those. Signed-off-by: Jeff Mahoney <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-19btrfs: Convert fs_info->free_chunk_space to atomic64_tNikolay Borisov1-3/+1
The ->free_chunk_space variable is used to track the unallocated space and access to it is protected by a spinlock, which is not used for anything else. Make the code a bit self-explanatory by switching the variable to an atomic64_t type and kill the spinlock. Signed-off-by: Nikolay Borisov <[email protected]> [ not a performance critical code, use of atomic type is ok ] Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-01btrfs: fix race with relocation recovery and fs_root setupJeff Mahoney1-3/+3
If we have to recover relocation during mount, we'll ultimately have to evict the orphan inode. That goes through the reservation dance, where priority_reclaim_metadata_space and flush_space expect fs_info->fs_root to be valid. That's the next thing to be set up during mount, so we crash, almost always in flush_space trying to join the transaction but priority_reclaim_metadata_space is possible as well. This call path has been problematic in the past WRT whether ->fs_root is valid yet. Commit 957780eb278 (Btrfs: introduce ticketed enospc infrastructure) added new users that are called in the direct path instead of the async path that had already been worked around. The thing is that we don't actually need the fs_root, specifically, for anything. We either use it to determine whether the root is the chunk_root for use in choosing an allocation profile or as a root to pass btrfs_join_transaction before immediately committing it. Anything that isn't the chunk root works in the former case and any root works in the latter. A simple fix is to use a root we know will always be there: the extent_root. Cc: <[email protected]> # v4.8+ Fixes: 957780eb278 (Btrfs: introduce ticketed enospc infrastructure) Signed-off-by: Jeff Mahoney <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-06-01btrfs: fix memory leak in update_space_info failure pathJeff Mahoney1-0/+1
If we fail to add the space_info kobject, we'll leak the memory for the percpu counter. Fixes: 6ab0a2029c (btrfs: publish allocation data in sysfs) Cc: <[email protected]> # v3.14+ Signed-off-by: Jeff Mahoney <[email protected]> Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-04-18btrfs: scrub: Introduce full stripe lock for RAID56Qu Wenruo1-0/+11
Unlike mirror based profiles, RAID5/6 recovery needs to read out the whole full stripe. And if we don't do proper protection, it can easily cause race condition. Introduce 2 new functions: lock_full_stripe() and unlock_full_stripe() for RAID5/6. Which store a rb_tree of mutexes for full stripes, so scrub callers can use them to lock a full stripe to avoid race. Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: Liu Bo <[email protected]> Reviewed-by: David Sterba <[email protected]> [ minor comment adjustments ] Signed-off-by: David Sterba <[email protected]>
2017-04-18btrfs: remove unused qgroup members from btrfs_trans_handleDavid Sterba1-1/+0
The members have been effectively unused since "Btrfs: rework qgroup accounting" (fcebe4562dec83b3), there's no substitute for assert_qgroups_uptodate so it's removed as well. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-04-18Btrfs: update comments in cache_save_setupLiu Bo1-1/+2
We also don't bother to flush free space cache while with free space tree. Cc: David Sterba <[email protected]> Signed-off-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-04-18btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_tElena Reshetova1-3/+3
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-04-18btrfs: convert btrfs_caching_control.count from atomic_t to refcount_tElena Reshetova1-6/+6
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-04-18btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_tElena Reshetova1-1/+1
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-03-03Merge branch 'WIP.sched-core-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull sched.h split-up from Ingo Molnar: "The point of these changes is to significantly reduce the <linux/sched.h> header footprint, to speed up the kernel build and to have a cleaner header structure. After these changes the new <linux/sched.h>'s typical preprocessed size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K lines), which is around 40% faster to build on typical configs. Not much changed from the last version (-v2) posted three weeks ago: I eliminated quirks, backmerged fixes plus I rebased it to an upstream SHA1 from yesterday that includes most changes queued up in -next plus all sched.h changes that were pending from Andrew. I've re-tested the series both on x86 and on cross-arch defconfigs, and did a bisectability test at a number of random points. I tried to test as many build configurations as possible, but some build breakage is probably still left - but it should be mostly limited to architectures that have no cross-compiler binaries available on kernel.org, and non-default configurations" * 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits) sched/headers: Clean up <linux/sched.h> sched/headers: Remove #ifdefs from <linux/sched.h> sched/headers: Remove the <linux/topology.h> include from <linux/sched.h> sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h> sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h> sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h> sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h> sched/headers: Remove <linux/sched.h> from <linux/sched/init.h> sched/core: Remove unused prefetch_stack() sched/headers: Remove <linux/rculist.h> from <linux/sched.h> sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h> sched/headers: Remove <linux/signal.h> from <linux/sched.h> sched/headers: Remove <linux/rwsem.h> from <linux/sched.h> sched/headers: Remove the runqueue_is_locked() prototype sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h> sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h> sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h> sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h> sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h> sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h> ...
2017-03-02sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API ↵Ingo Molnar1-0/+1
dependency Instead of including the full <linux/signal.h>, we are going to include the types-only <linux/signal_types.h> header in <linux/sched.h>, to further decouple the scheduler header from the signal headers. This means that various files which relied on the full <linux/signal.h> need to be updated to gain an explicit dependency on it. Update the code that relies on sched.h's inclusion of the <linux/signal.h> header. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-02-28Merge branch 'for-chris-4.11-part2' of ↵Chris Mason1-70/+65
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11
2017-02-28btrfs: Make btrfs_orphan_add take btrfs_inodeNikolay Borisov1-1/+1
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-28btrfs: all btrfs_delalloc_release_metadata take btrfs_inodeNikolay Borisov1-9/+9
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-28btrfs: Make btrfs_delalloc_reserve_metadata take btrfs_inodeNikolay Borisov1-35/+34
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-28btrfs: Make btrfs_orphan_release_metadata take btrfs_inodeNikolay Borisov1-5/+5
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-28btrfs: Make btrfs_orphan_reserve_metadata take btrfs_inodeNikolay Borisov1-5/+5
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-28btrfs: Make calc_csum_metadata_size take btrfs_inodeNikolay Borisov1-15/+12
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-28btrfs: Make drop_outstanding_extent take btrfs_inodeNikolay Borisov1-12/+11
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-28btrfs: make btrfs_alloc_data_chunk_ondemand take btrfs_inodeNikolay Borisov1-4/+4
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-28btrfs: make btrfs_is_free_space_inode take btrfs_inodeNikolay Borisov1-2/+2
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-24Btrfs: fix assertion failure when freeing block groups at close_ctree()Filipe Manana1-3/+6
At close_ctree() we free the block groups and then only after we wait for any running worker kthreads to finish and shutdown the workqueues. This behaviour is racy and it triggers an assertion failure when freeing block groups because while we are doing it we can have for example a block group caching kthread running, and in that case the block group's reference count can still be greater than 1 by the time we assert its reference count is 1, leading to an assertion failure: [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, file: fs/btrfs/extent-tree.c, line: 9799 [19041.200584] ------------[ cut here ]------------ [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418! [19041.202830] invalid opcode: 0000 [#1] PREEMPT SMP [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs] [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 4.9.0-rc7-btrfs-next-36+ #1 [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [19041.208082] task: ffff88015f028980 task.stack: ffffc9000ad34000 [19041.208082] RIP: 0010:[<ffffffffa03e319e>] [<ffffffffa03e319e>] assfail.constprop.41+0x1c/0x1e [btrfs] [19041.208082] RSP: 0018:ffffc9000ad37d60 EFLAGS: 00010286 [19041.208082] RAX: 0000000000000061 RBX: ffff88015ecb4000 RCX: 0000000000000001 [19041.208082] RDX: ffff88023f392fb8 RSI: ffffffff817ef7ba RDI: 00000000ffffffff [19041.208082] RBP: ffffc9000ad37d60 R08: 0000000000000001 R09: 0000000000000000 [19041.208082] R10: ffffc9000ad37cb0 R11: ffffffff82f2b66d R12: ffff88023431d170 [19041.208082] R13: ffff88015ecb40c0 R14: ffff88023431d000 R15: ffff88015ecb4100 [19041.208082] FS: 00007f44f3d42840(0000) GS:ffff88023f380000(0000) knlGS:0000000000000000 [19041.208082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [19041.208082] CR2: 00007f65d623b000 CR3: 00000002166f2000 CR4: 00000000000006e0 [19041.208082] Stack: [19041.208082] ffffc9000ad37d98 ffffffffa035989f ffff88015ecb4000 ffff88015ecb5630 [19041.208082] ffff88014f6be000 0000000000000000 00007ffcf0ba6a10 ffffc9000ad37df8 [19041.208082] ffffffffa0368cd4 ffff88014e9658e0 ffffc9000ad37e08 ffffffff811a634d [19041.208082] Call Trace: [19041.208082] [<ffffffffa035989f>] btrfs_free_block_groups+0x17f/0x392 [btrfs] [19041.208082] [<ffffffffa0368cd4>] close_ctree+0x1c5/0x2e1 [btrfs] [19041.208082] [<ffffffff811a634d>] ? evict_inodes+0x132/0x141 [19041.208082] [<ffffffffa034356d>] btrfs_put_super+0x15/0x17 [btrfs] [19041.208082] [<ffffffff8118fc32>] generic_shutdown_super+0x6a/0xeb [19041.208082] [<ffffffff8119004f>] kill_anon_super+0x12/0x1c [19041.208082] [<ffffffffa0343370>] btrfs_kill_super+0x16/0x21 [btrfs] [19041.208082] [<ffffffff8118fad1>] deactivate_locked_super+0x3b/0x68 [19041.208082] [<ffffffff8118fb34>] deactivate_super+0x36/0x39 [19041.208082] [<ffffffff811a9946>] cleanup_mnt+0x58/0x76 [19041.208082] [<ffffffff811a99a2>] __cleanup_mnt+0x12/0x14 [19041.208082] [<ffffffff81071573>] task_work_run+0x6f/0x95 [19041.208082] [<ffffffff81001897>] prepare_exit_to_usermode+0xa3/0xc1 [19041.208082] [<ffffffff81001a23>] syscall_return_slowpath+0x16e/0x1d2 [19041.208082] [<ffffffff814c607d>] entry_SYSCALL_64_fastpath+0xab/0xad [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9 [19041.208082] RIP [<ffffffffa03e319e>] assfail.constprop.41+0x1c/0x1e [btrfs] [19041.208082] RSP <ffffc9000ad37d60> [19041.279264] ---[ end trace 23330586f16f064d ]--- This started happening as of kernel 4.8, since commit f3bca8028bd9 ("Btrfs: add ASSERT for block group's memory leak") introduced these assertions. So fix this by freeing the block groups only after waiting for all worker kthreads to complete and shutdown the workqueues. Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: Liu Bo <[email protected]>
2017-02-17btrfs: free-space-cache, clean up unnecessary root argumentsJeff Mahoney1-4/+5
The free space cache APIs accept a root but always use the tree root. Also, btrfs_truncate_free_space_cache accepts a root AND an inode but the inode always points to the root anyway, so let's just pass the inode. Signed-off-by: Jeff Mahoney <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-17btrfs: convert btrfs_inc_block_group_ro to accept fs_infoJeff Mahoney1-3/+2
btrfs_inc_block_group_ro is either passed the extent root or the dev root, but it doesn't do anything with the dev tree. Let's convert to passing an fs_info and using the extent root. Signed-off-by: Jeff Mahoney <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-17btrfs: flush_space always takes fs_info->fs_rootJeff Mahoney1-10/+10
We don't need to pass a root to flush_space since it always uses the fs_root. Signed-off-by: Jeff Mahoney <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-17btrfs: pass fs_info to (more) routines that are only called with extent_rootJeff Mahoney1-50/+53
Outside of interactions with qgroups, the roots passed in extent-tree.c are usually passed to ensure that we don't do refcounts on log trees or to get the allocation profile for an allocation request. Otherwise, it operates on the extent root. This patch converts some more routines in extent-tree.c that are always called with the extent root to accept an fs_info instead. Signed-off-by: Jeff Mahoney <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-17btrfs: remove unused parameter from btrfs_prepare_extent_commitDavid Sterba1-2/+1
Added but never used. Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-17btrfs: remove unused parameter from btrfs_subvolume_release_metadataDavid Sterba1-2/+1
Unused since qgroup refactoring that split data and metadata accounting, the btrfs_qgroup_free helper. Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-17btrfs: remove unused parameter from clean_tree_blockDavid Sterba1-2/+2
Added but never needed. Reviewed-by: Liu Bo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-17Btrfs: use helper to get used bytes of space_infoLiu Bo1-22/+19
This uses a helper instead of open code around used byte of space_info everywhere. Signed-off-by: Liu Bo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-17Btrfs: try to avoid acquiring free space ctl's lockLiu Bo1-11/+13
We don't need to take the lock if the block group has not been cached. Signed-off-by: Liu Bo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-14Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_existLiu Bo1-10/+12
run_delalloc_nocow has used trans in two places where they don't actually need @trans. For btrfs_lookup_file_extent, we search for file extents without COWing anything, and for btrfs_cross_ref_exist, the only place where we need @trans is deferencing it in order to get running_transaction which we could easily get from the global fs_info. Signed-off-by: Liu Bo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-14Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_headLiu Bo1-3/+3
All we need is @delayed_refs, all callers have get it ahead of calling btrfs_find_delayed_ref_head since lock needs to be acquired firstly, there is no reason to deference it again inside the function. Signed-off-by: Liu Bo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-14btrfs: allow unlink to exceed subvolume quotaJeff Mahoney1-2/+2
Once a qgroup limit is exceeded, it's impossible to restore normal operation to the subvolume without modifying the limit or removing the subvolume. This is a surprising situation for many users used to the typical workflow with quotas on other file systems where it's possible to remove files until the used space is back under the limit. When we go to unlink a file and start the transaction, we'll hit the qgroup limit while trying to reserve space for the items we'll modify while removing the file. We discussed last month how best to handle this situation and agreed that there is no perfect solution. The best principle-of-least-surprise solution is to handle it similarly to how we already handle ENOSPC when unlinking, which is to allow the operation to succeed with the expectation that it will ultimately release space under most circumstances. This patch modifies the transaction start path to select whether to honor the qgroups limits. btrfs_start_transaction_fallback_global_rsv is the only caller that skips enforcement. The reservation and tracking still happens normally -- it just skips the enforcement step. Signed-off-by: Jeff Mahoney <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2017-02-14Btrfs: constify struct btrfs_{,disk_}key wherever possibleOmar Sandoval1-4/+5
In a lot of places, it's unclear when it's safe to reuse a struct btrfs_key after it has been passed to a helper function. Constify these arguments wherever possible to make it obvious. Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>