diff options
author | Filipe Manana <fdmanana@suse.com> | 2022-04-13 16:20:41 +0100 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2022-05-16 17:03:13 +0200 |
commit | 16b0c2581e3a81e88872ff054fca8d8f5a92ca5e (patch) | |
tree | c314a15662f086dae9fe2f56b3ff342c69e1a8ac /fs/btrfs/extent-tree.c | |
parent | 08dddb2951c96b53413cf1982e9358fa4c123183 (diff) |
btrfs: use a read/write lock for protecting the block groups tree
Currently we use a spin lock to protect the red black tree that we use to
track block groups. Most accesses to that tree are actually read only and
for large filesystems, with thousands of block groups, it actually has
a bad impact on performance, as concurrent read only searches on the tree
are serialized.
Read only searches on the tree are very frequent and done when:
1) Pinning and unpinning extents, as we need to lookup the respective
block group from the tree;
2) Freeing the last reference of a tree block, regardless if we pin the
underlying extent or add it back to free space cache/tree;
3) During NOCOW writes, both buffered IO and direct IO, we need to check
if the block group that contains an extent is read only or not and to
increment the number of NOCOW writers in the block group. For those
operations we need to search for the block group in the tree.
Similarly, after creating the ordered extent for the NOCOW write, we
need to decrement the number of NOCOW writers from the same block
group, which requires searching for it in the tree;
4) Decreasing the number of extent reservations in a block group;
5) When allocating extents and freeing reserved extents;
6) Adding and removing free space to the free space tree;
7) When releasing delalloc bytes during ordered extent completion;
8) When relocating a block group;
9) During fitrim, to iterate over the block groups;
10) etc;
Write accesses to the tree, to add or remove block groups, are much less
frequent as they happen only when allocating a new block group or when
deleting a block group.
We also use the same spin lock to protect the list of currently caching
block groups. Additions to this list are made when we need to cache a
block group, because we don't have a free space cache for it (or we have
but it's invalid), and removals from this list are done when caching of
the block group's free space finishes. These cases are also not very
common, but when they happen, they happen only once when the filesystem
is mounted.
So switch the lock that protects the tree of block groups from a spinning
lock to a read/write lock.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs/btrfs/extent-tree.c')
-rw-r--r-- | fs/btrfs/extent-tree.c | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index cd79a5f4c643..963160a0c393 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2497,7 +2497,7 @@ static u64 first_logical_byte(struct btrfs_fs_info *fs_info) struct rb_node *leftmost; u64 bytenr = 0; - spin_lock(&fs_info->block_group_cache_lock); + read_lock(&fs_info->block_group_cache_lock); /* Get the block group with the lowest logical start address. */ leftmost = rb_first_cached(&fs_info->block_group_cache_tree); if (leftmost) { @@ -2506,7 +2506,7 @@ static u64 first_logical_byte(struct btrfs_fs_info *fs_info) bg = rb_entry(leftmost, struct btrfs_block_group, cache_node); bytenr = bg->start; } - spin_unlock(&fs_info->block_group_cache_lock); + read_unlock(&fs_info->block_group_cache_lock); return bytenr; } |