aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-10-22bcachefs: New error message helpersKent Overstreet14-171/+171
Add two new helpers for printing error messages with __func__ and bch2_err_str(): - bch_err_fn - bch_err_msg Also kill the old error strings in the recovery path, which were causing us to incorrectly report memory allocation failures - they're not needed anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fiemap: Fix a lockdep splatKent Overstreet1-1/+4
As with the previous patch, we generally can't hold btree locks while copying to userspace, as that may incur a page fault and require mmap_lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: seqmutex; fix a lockdep splatKent Overstreet5-23/+96
We can't be holding btree_trans_lock while copying to user space, which might incur a page fault. To fix this, convert it to a seqmutex so we can unlock/relock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't call lock_graph_descend() with wait lock heldKent Overstreet1-6/+15
This fixes a deadlock: 01305 WARNING: possible circular locking dependency detected 01305 6.3.0-ktest-gf4de9bee61af #5305 Tainted: G W 01305 ------------------------------------------------------ 01305 cat/14658 is trying to acquire lock: 01305 ffffffc00982f460 (fs_reclaim){+.+.}-{0:0}, at: __kmem_cache_alloc_node+0x48/0x278 01305 01305 but task is already holding lock: 01305 ffffff8011aaf040 (&lock->wait_lock){+.+.}-{2:2}, at: bch2_check_for_deadlock+0x4b8/0xa58 01305 01305 which lock already depends on the new lock. 01305 01305 01305 the existing dependency chain (in reverse order) is: 01305 01305 -> #2 (&lock->wait_lock){+.+.}-{2:2}: 01305 _raw_spin_lock+0x54/0x70 01305 __six_lock_wakeup+0x40/0x1b0 01305 six_unlock_ip+0xe8/0x248 01305 bch2_btree_key_cache_scan+0x720/0x940 01305 shrink_slab.constprop.0+0x284/0x770 01305 shrink_node+0x390/0x828 01305 balance_pgdat+0x390/0x6d0 01305 kswapd+0x2e4/0x718 01305 kthread+0x184/0x1a8 01305 ret_from_fork+0x10/0x20 01305 01305 -> #1 (&c->lock#2){+.+.}-{3:3}: 01305 __mutex_lock+0x104/0x14a0 01305 mutex_lock_nested+0x30/0x40 01305 bch2_btree_key_cache_scan+0x5c/0x940 01305 shrink_slab.constprop.0+0x284/0x770 01305 shrink_node+0x390/0x828 01305 balance_pgdat+0x390/0x6d0 01305 kswapd+0x2e4/0x718 01305 kthread+0x184/0x1a8 01305 ret_from_fork+0x10/0x20 01305 01305 -> #0 (fs_reclaim){+.+.}-{0:0}: 01305 __lock_acquire+0x19d0/0x2930 01305 lock_acquire+0x1dc/0x458 01305 fs_reclaim_acquire+0x9c/0xe0 01305 __kmem_cache_alloc_node+0x48/0x278 01305 __kmalloc_node_track_caller+0x5c/0x278 01305 krealloc+0x94/0x180 01305 bch2_printbuf_make_room.part.0+0xac/0x118 01305 bch2_prt_printf+0x150/0x1e8 01305 bch2_btree_bkey_cached_common_to_text+0x170/0x298 01305 bch2_btree_trans_to_text+0x244/0x348 01305 print_cycle+0x7c/0xb0 01305 break_cycle+0x254/0x528 01305 bch2_check_for_deadlock+0x59c/0xa58 01305 bch2_btree_deadlock_read+0x174/0x200 01305 full_proxy_read+0x94/0xf0 01305 vfs_read+0x15c/0x3a8 01305 ksys_read+0xb8/0x148 01305 __arm64_sys_read+0x48/0x60 01305 invoke_syscall.constprop.0+0x64/0x138 01305 do_el0_svc+0x84/0x138 01305 el0_svc+0x34/0x80 01305 el0t_64_sync_handler+0xb0/0xb8 01305 el0t_64_sync+0x14c/0x150 01305 01305 other info that might help us debug this: 01305 01305 Chain exists of: 01305 fs_reclaim --> &c->lock#2 --> &lock->wait_lock 01305 01305 Possible unsafe locking scenario: 01305 01305 CPU0 CPU1 01305 ---- ---- 01305 lock(&lock->wait_lock); 01305 lock(&c->lock#2); 01305 lock(&lock->wait_lock); 01305 lock(fs_reclaim); 01305 01305 *** DEADLOCK *** Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_check_discard_freespace_key()Kent Overstreet1-12/+35
We weren't correctly checking the freespace btree - it's an extents btree, which means we need to iterate over each bucket in a freespace extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_unlock_noassert()Kent Overstreet3-1/+11
This fixes a spurious assert in the btree node read path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_btree_update_start()Kent Overstreet1-1/+1
The calculation for number of nodes to allocate in bch2_btree_update_start() was incorrect - this fixes a BUG_ON() on the small nodes test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_extent_ptr_desired_durability()Kent Overstreet3-8/+23
This adds a new helper for getting a pointer's durability irrespective of the device state, and uses it in the the data update path. This fixes a bug where we do a data update but request 0 replicas to be allocated, because the replica being rewritten is on a device marked as failed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: snapshot_to_text() includes snapshot treeKent Overstreet1-2/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix try_decrease_writepoints()Kent Overstreet1-17/+21
- We may need to drop btree locks before taking the writepoint_lock, as is done in other places. - We should be using open_bucket_free_unused(), so that we don't waste space. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Delete weird hacky transaction restart injectionKent Overstreet1-3/+0
since we currently don't have a good fault injection library, bch2_btree_insert_node() was randomly injecting faults based on local_clock(). At the very least this should have been a debug mode only thing, but this is a brittle method so let's just delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Write buffer flush needs BTREE_INSERT_NOCHECK_RWKent Overstreet1-0/+1
btree write buffer flush is only invoked from contexts that already hold a write ref, and checking if we're still RW could cause us to fail to completely flush the write buffer when shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New assertions when marking filesystem cleanKent Overstreet1-0/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: ec: Fix a lost wakeupKent Overstreet1-0/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fix NULL pointer dereference in try_alloc_bucketMikulas Patocka1-1/+2
On Mon, 29 May 2023, Mikulas Patocka wrote: > The oops happens in set_btree_iter_dontneed and it is caused by the fact > that iter->path is NULL. The code in try_alloc_bucket is buggy because it > sets "struct btree_iter iter = { NULL };" and then jumps to the "err" > label that tries to dereference values in "iter". Here I'm sending a patch for it. From: Mikulas Patocka <mpatocka@redhat.com> The function try_alloc_bucket sets the variable "iter" to NULL and then (on various error conditions) jumps to the label "err". On the "err" label, it calls "set_btree_iter_dontneed" that tries to dereference "iter->trans" and "iter->path". So, we get an oops on error condition. This patch fixes the crash by testing that iter.trans and iter.path is non-zero before calling set_btree_iter_dontneed. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix subvol deletion deadlockKent Overstreet1-15/+11
d_prune_aliases() may call bch2_evict_inode(), which needs c->vfs_inodes_list_lock. Fix this by always calling igrab() before putting the inodes onto our disposal list, and then calling d_prune_aliases() with c->vfs_inodes_lock dropped. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: don't spin in rebalance when background target is not usableBrian Foster2-1/+10
If a bcachefs filesystem is configured with a background device (disk group), rebalance will relocate data to this device in the background by checking extent keys for whether they currently reside in the specified target. For keys that do not, rebalance performs a read/write cycle to allow the write path to properly relocate data. If the background target is not usable (read-only, for example), however, the write path doesn't actually move data to another device. Instead, rebalance spins indefinitely reading and rewriting the same data over and over to the same device. If the background target is made available again, the rebalance picks this up, relocates the data, and eventually terminates. To avoid this spinning behavior, update the rebalance background target logic to not only check whether the extent is not in the target, but whether the target is actually usable as well. If not, then don't mark the key for rewrite. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: push rcu lock down into bch2_target_to_mask()Brian Foster2-5/+13
We have one caller that cycles the rcu lock solely for this call (via target_rw_devs()), and we'd like to add another. Simplify things by pushing the rcu lock down into bch2_target_to_mask(), similar to how bch2_dev_in_target() works. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: create internal disk_groups sysfs fileBrian Foster3-0/+42
We have bch2_sb_disk_groups_to_text() to dump disk group labels, but no good information on device group membership at runtime. Add bch2_disk_groups_to_text() and an associated 'disk_groups' sysfs file to print group and device relationships. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Clean up tests codeKent Overstreet1-59/+18
- delete redundant error messages - convert various code to bch2_trans_run Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve backpointers error messageKent Overstreet1-1/+1
the error message here dated from when backpointers could be stored in alloc keys; now, we should always print the full key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: More drop_locks_do() conversionsKent Overstreet4-59/+18
Using drop_locks_do() ensures that every unlock() is paired with a relock(), with proper error checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Delete warning from promote_alloc()Kent Overstreet1-3/+4
It's possible to see a -BCH_ERR_ENOSPC_disk_reservation here, and that's fine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_fsck_ask_yn()Kent Overstreet1-1/+2
- getline() output includes a newline, without stripping that we were just looping - Make the prompt clearer Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: replicas_deltas_realloc() uses allocate_dropping_locks()Kent Overstreet1-25/+56
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert acl.c to allocate_dropping_locks()Kent Overstreet1-7/+16
More work to avoid allocating memory with btree locks held. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: allocate_dropping_locks()Kent Overstreet3-19/+31
Add two new helpers for allocating memory with btree locks held: The idea is to first try the allocation with GFP_NOWAIT|__GFP_NOWARN, then if that fails - unlock, retry with GFP_KERNEL, and then call trans_relock(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use unlikely() in bch2_err_matches()Kent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix error handling in promote pathKent Overstreet1-2/+4
The promote path had a BUG_ON() for unknown error type, which we're now seeing: change it to a WARN_ON() - because we're curious what this is - and otherwise handle it in the normal error path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fs-io: Eliminate GFP_NOFS usageKent Overstreet1-3/+4
GFP_NOFS doesn't ever make sense. If we're allocatingc memory it should be GFP_NOWAIT if btree locks are held, GFP_KERNEL otherwise. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_kmalloc no longer allocates memory with btree locks heldKent Overstreet1-8/+21
When allocating memory, gfp flags should generally be - GFP_NOWAIT|__GFP_NOWARN if btree locks are held - GFP_NOFS if in the IO path or otherwise holding resources needed for IO submission - GFP_KERNEL otherwise Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: drop_locks_do()Kent Overstreet5-37/+15
Add a new helper for the common pattern of: - trans_unlock() - do something - trans_relock() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: GFP_NOIO -> GFP_NOFSKent Overstreet10-29/+29
GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure bch2_btree_node_get() calls relock() after unlock()Kent Overstreet1-12/+19
Fix a bug where bch2_btree_node_get() might call bch2_trans_unlock() (in fill) without calling bch2_trans_relock(); this is a bug when it's done in the core btree code. Also, twea bch2_btree_node_mem_alloc() to drop btree locks before doing a blocking memory allocation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Avoid __GFP_NOFAILKent Overstreet1-41/+50
We've been using __GFP_NOFAIL for allocating struct bch_folio, our private per-folio state. However, that struct is variable size - it holds state for each sector in the folio, and folios can be quite large now, which means it's possible for bch_folio to be larger than PAGE_SIZE now. __GFP_NOFAIL allocations are undesirable in normal circumstances, but particularly so at >= PAGE_SIZE, and warnings are emitted for that. So, this patch adds proper error paths and eliminates most uses of __GFP_NOFAIL. Also, do some more cleanup of gfp flags w.r.t. btree node locks: we can use GFP_KERNEL, but only if we're not holding btree locks, and if we are holding btree locks we should be using GFP_NOWAIT. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix corruption with writeable snapshotsKent Overstreet3-91/+99
When partially overwriting an extent in an older snapshot, the existing extent has to be split. If the existing extent was overwritten in a different (sibling) snapshot, we have to ensure that the split won't be visible in the sibling snapshot. data_update.c already has code for this, bch2_insert_snapshot_writeouts() - we just need to move it into btree_update_leaf.c and change bch2_trans_update_extent() to use it as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert -ENOENT to private error codesKent Overstreet18-43/+59
As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: trans_for_each_path_safe()Kent Overstreet3-5/+39
bch2_btree_trans_to_text() is used on btree_trans objects that are owned by different threads - when printing out deadlock cycles - so we need a safe version of trans_for_each_path(), else we race with seeing a btree_path that was just allocated and not fully initialized: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a quota read bugKent Overstreet1-1/+8
bch2_fs_quota_read() could see an inode that's been deleted (KEY_TYPE_inode_generation) - bch2_fs_quota_read_inode() needs to check for that instead of erroring. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix move_extent_fail counterKent Overstreet1-1/+1
fail counters need to be events, not numbers of sectors - or the calculations the tests use for determining if we've had too many slowpath events don't work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't reuse reflink btree keyspaceKent Overstreet2-17/+4
We've been seeing difficult to debug "missing indirect extent" bugs, that fsck doesn't seem to find. One possibility is that there was a missing indirect extent, but then a new indirect extent was created at the location of the previous indirect extent. This patch eliminates that possibility by always creating new indirect extents right after the last one, at the end of the reflink btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22mean and variance: Add a missing includeKent Overstreet1-0/+1
abs() is in math.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22mean and variance: More testsKent Overstreet3-17/+102
Add some more tests that test conventional and weighted mean simultaneously, and with a table of values that represents events that we'll be using this to look for so we can verify-by-eyeball that the output looks sane. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Disable percpu read lock mode in userspaceKent Overstreet1-0/+6
When running in userspace, we currently don't have a real percpu implementation available - at least in bcachefs-tools, which is where this code is currently used in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Use atomic_try_cmpxchg_acquire()Kent Overstreet1-11/+6
This switches to a newer cmpxchg variant which updates @old for us on failure, simplifying the cmpxchg loops a bit and supposedly generating better code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Fix an unitialized varKent Overstreet1-2/+1
In the conversion to atomic_t, six_lock_slowpath() ended up calling six_lock_wakeup() in the failure path with a state variable that was never initialized - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Delete redundant commentKent Overstreet1-11/+0
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Tiny bit more tidyingKent Overstreet1-34/+30
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Seq now only incremented on unlockKent Overstreet4-16/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Split out seq, use atomic_t instead of atomic64_tKent Overstreet2-71/+58
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>