Age | Commit message (Collapse) | Author | Files | Lines |
|
We don't need to pass a root to flush_space since it always uses
the fs_root.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Outside of interactions with qgroups, the roots passed in extent-tree.c
are usually passed to ensure that we don't do refcounts on log trees or
to get the allocation profile for an allocation request. Otherwise, it
operates on the extent root. This patch converts some more routines in
extent-tree.c that are always called with the extent root to accept
an fs_info instead.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Added but never used.
Reviewed-by: Liu Bo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Unused since qgroup refactoring that split data and metadata accounting,
the btrfs_qgroup_free helper.
Reviewed-by: Liu Bo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Added but never needed.
Reviewed-by: Liu Bo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This uses a helper instead of open code around used byte of space_info
everywhere.
Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
We don't need to take the lock if the block group has not been cached.
Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
run_delalloc_nocow has used trans in two places where they don't
actually need @trans.
For btrfs_lookup_file_extent, we search for file extents without COWing
anything, and for btrfs_cross_ref_exist, the only place where we need
@trans is deferencing it in order to get running_transaction which we
could easily get from the global fs_info.
Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
All we need is @delayed_refs, all callers have get it ahead of calling
btrfs_find_delayed_ref_head since lock needs to be acquired firstly,
there is no reason to deference it again inside the function.
Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Once a qgroup limit is exceeded, it's impossible to restore normal
operation to the subvolume without modifying the limit or removing
the subvolume. This is a surprising situation for many users used
to the typical workflow with quotas on other file systems where it's
possible to remove files until the used space is back under the limit.
When we go to unlink a file and start the transaction, we'll hit
the qgroup limit while trying to reserve space for the items we'll
modify while removing the file. We discussed last month how best
to handle this situation and agreed that there is no perfect solution.
The best principle-of-least-surprise solution is to handle it similarly
to how we already handle ENOSPC when unlinking, which is to allow
the operation to succeed with the expectation that it will ultimately
release space under most circumstances.
This patch modifies the transaction start path to select whether to
honor the qgroups limits. btrfs_start_transaction_fallback_global_rsv
is the only caller that skips enforcement. The reservation and tracking
still happens normally -- it just skips the enforcement step.
Signed-off-by: Jeff Mahoney <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
In a lot of places, it's unclear when it's safe to reuse a struct
btrfs_key after it has been passed to a helper function. Constify these
arguments wherever possible to make it obvious.
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This goes as a separate patch because fixing that inside the patches
caused too many many conflicts.
Signed-off-by: David Sterba <[email protected]>
|
|
Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.
Signed-off-by: Nikolay Borisov <[email protected]>
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba <[email protected]>
|
|
The expression is open-coded in several places, this asks for a wrapper.
As we know the MAX_EXTENT fits to u32, we can use the appropirate
division helper. This cascades to the result type updates.
Compiler is clever enough to use shift instead of integer division, so
there's no change in the generated assembly.
Signed-off-by: David Sterba <[email protected]>
|
|
btrfs_add_delayed_data_ref is always called with a NULL extent_op,
so let's drop the argument.
Signed-off-by: Jeff Mahoney <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
If @block_group is not @used_bg, it'll try to get @used_bg's lock without
droping @block_group 's lock and lockdep has throwed a scary deadlock warning
about it.
Fix it by using down_read_nested.
Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
In __btrfs_run_delayed_refs, when we put back a delayed ref that's too
new, we have already dropped the lock on locked_ref when we set
->processing = 0.
This patch keeps the lock to cover that assignment.
Fixes: d7df2c796d7 (Btrfs: attach delayed ref updates to delayed ref heads)
Signed-off-by: Jeff Mahoney <[email protected]>
Reviewed-by: Liu Bo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
In __btrfs_run_delayed_refs, the error path when run_delayed_extent_op
fails sets locked_ref->processing = 0 but doesn't re-increment
delayed_refs->num_heads_ready. As a result, we end up triggering
the WARN_ON in btrfs_select_ref_head.
Fixes: d7df2c796d7 (Btrfs: attach delayed ref updates to delayed ref heads)
Reported-by: Jon Nelson <[email protected]>
Signed-off-by: Jeff Mahoney <[email protected]>
Reviewed-by: Liu Bo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
The helpers are trivial and we don't use them consistently.
Signed-off-by: David Sterba <[email protected]>
|
|
Now we only use the root parameter to print the root objectid in
a tracepoint. We can use the root parameter from the transaction
handle for that. It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer. Let's convert those to
just accept an fs_info pointer directly.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
With the exception of the one case where btrfs_wait_cache_io is called
without a block group, it's called with the same arguments. The root
argument is only used in the special case, so let's factor out the core
and simplify the call in the normal case to require a trans, block group,
and path.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
The extent-tree tracepoints all operate on the extent root, regardless of
which root is passed in. Let's just use the extent root objectid instead.
If it turns out that nobody is depending on the format of this tracepoint,
we can drop the root printing entirely.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable. This makes the code considerably
more readable.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
We track the node sizes per-root, but they never vary from the values
in the superblock. This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Even though a separate root is passed in, we're still operating on the
extent root. Let's use that for the trace point.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
There are many functions that are always called with the same root
argument. Rather than passing the same root every time, we can
pass an fs_info pointer instead and have the function get the root
pointer itself.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
There are 11 functions that accept a root parameter and immediately
overwrite it. We can pass those an fs_info pointer instead.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This issue was found when I tried to delete a heavily reflinked file,
when deleting such files, other transaction operation will not have a
chance to make progress, for example, start_transaction() will blocked
in wait_current_trans(root) for long time, sometimes it even triggers
soft lockups, and the time taken to delete such heavily reflinked file
is also very large, often hundreds of seconds. Using perf top, it reports
that:
PerfTop: 7416 irqs/sec kernel:99.8% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
---------------------------------------------------------------------------------------
84.37% [btrfs] [k] __btrfs_run_delayed_refs.constprop.80
11.02% [kernel] [k] delay_tsc
0.79% [kernel] [k] _raw_spin_unlock_irq
0.78% [kernel] [k] _raw_spin_unlock_irqrestore
0.45% [kernel] [k] do_raw_spin_lock
0.18% [kernel] [k] __slab_alloc
It seems __btrfs_run_delayed_refs() took most cpu time, after some debug
work, I found it's select_delayed_ref() causing this issue, for a delayed
head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but
select_delayed_ref() will firstly try to iterate node list to find
BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and
waste much time.
To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head,
then in select_delayed_ref(), if this list is not empty, we can directly use
nodes in this list. With this patch, it just took about 10~15 seconds to
delte the same file. Now using perf top, it reports that:
PerfTop: 2734 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
----------------------------------------------------------------------------------------
20.74% [kernel] [k] _raw_spin_unlock_irqrestore
16.33% [kernel] [k] __slab_alloc
5.41% [kernel] [k] lock_acquired
4.42% [kernel] [k] lock_acquire
4.05% [kernel] [k] lock_release
3.37% [kernel] [k] _raw_spin_unlock_irq
For normal files, this patch also gives help, at least we do not need to
iterate whole list to found BTRFS_ADD_DELAYED_REF nodes.
Signed-off-by: Wang Xiaoguang <[email protected]>
Reviewed-by: Liu Bo <[email protected]>
Tested-by: Holger Hoffstätte <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Move account_shared_subtree() to qgroup.c and rename it to
btrfs_qgroup_trace_subtree().
Do the same thing for account_leaf_items() and rename it to
btrfs_qgroup_trace_leaf_items().
Since all these functions are only for qgroup, move them to qgroup.c and
export them is more appropriate.
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-and-Tested-by: Goldwyn Rodrigues <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Rename btrfs_qgroup_insert_dirty_extent(_nolock) to
btrfs_qgroup_trace_extent(_nolock), according to the new
reserve/trace/account naming schema.
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-and-Tested-by: Goldwyn Rodrigues <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This fixes the WARN_ON on BTRFS_I(inode)->reserved_extents in
btrfs_destroy_inode and the WARN_ON on nonzero delalloc bytes on umount
with qgroups enabled.
I was able to reproduce this by setting up a small (~500kb) quota limit
and writing a file one byte at a time until I hit the limit. The warnings
would all hit on umount.
The root cause is that we would reserve a block-sized range in both
the reservation and the quota in btrfs_check_data_free_space, but if we
encountered a problem (like e.g. EDQUOT), we would only release the single
byte in the qgroup reservation. That caused an iotree state split, which
increased the number of outstanding extents, in turn disallowing releasing
the metadata reservation.
Signed-off-by: Jeff Mahoney <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
The only memset we do is to 0, so sink the parameter to the function and
simplify all calls. Rename the function to reflect the behaviour.
Signed-off-by: David Sterba <[email protected]>
|
|
During the time, the function has been shrunk to the point that it just
calls find_extent_buffer, just passing the parameters.
Signed-off-by: David Sterba <[email protected]>
|
|
Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely")
Signed-off-by: Wang Xiaoguang <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
btrfs_should_throttle_delayed_refs()
Signed-off-by: Wang Xiaoguang <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
btrfs_map_block supports different types of mappings, which to a large
extent resemble block layer operations. But they don't always do, and
currently btrfs dangerously overlays it's own flag over the block layer
flags. This is just asking for a conflict, so introduce a different
map flags enum inside of btrfs instead.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This issue was found when testing in-band dedupe enospc behaviour,
sometimes run_one_delayed_ref() may fail for enospc reason, then
__btrfs_run_delayed_refs()will return, but forget to add num_heads_read
back, which will trigger "WARN_ON(delayed_refs->num_heads_ready == 0)" in
btrfs_select_ref_head().
Signed-off-by: Wang Xiaoguang <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This reverts commit 5d8eb6fe517583f9c6d5b94faf2254a0207a45c9.
When we remove devices, we free the device structures. Delaying
btfs_remove_chunk() ends up hitting a use-after-free on them.
Signed-off-by: Chris Mason <[email protected]>
|
|
Really there's lots of things that can go wrong here, kill all the
BUG_ON()'s and replace the logic ones with ASSERT()'s and return EIO
instead.
Signed-off-by: Josef Bacik <[email protected]>
[ switched to btrfs_err, errors go to common label ]
Reviewed-by: Liu Bo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Fixes: 7cf5b97650f2 ("btrfs: qgroup: Cleanup old inaccurate facilities")
Signed-off-by: Goldwyn Rodrigues <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Code cleanup. count is already (unsgined long)-1. That is the reason
run_all was set. Do not reassign it (unsigned long)-1.
Signed-off-by: Goldwyn Rodrigues <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
The extent buffer 'next' needs to be free'd conditionally.
Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
For many printks, we want to know which file system issued the message.
This patch converts most pr_* calls to use the btrfs_* versions instead.
In some cases, this means adding plumbing to allow call sites access to
an fs_info pointer.
fs/btrfs/check-integrity.c is left alone for another day.
Signed-off-by: Jeff Mahoney <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This patch converts printk(KERN_* style messages to use the pr_* versions.
One side effect is that anything that was KERN_DEBUG is now automatically
a dynamic debug message.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
CodingStyle chapter 2:
"[...] never break user-visible strings such as printk messages,
because that breaks the ability to grep for them."
This patch unsplits user-visible strings.
Signed-off-by: Jeff Mahoney <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|