aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/alloc_background.c
AgeCommit message (Collapse)AuthorFilesLines
2023-10-22bcachefs: Must check for errors from bch2_trans_cond_resched()Kent Overstreet1-2/+0
But we don't need to call it from outside the btree iterator code anymore, since it's called by bch2_trans_begin() and bch2_btree_path_traverse(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix allocator shutdown error messageKent Overstreet1-2/+2
We return 1 to indicate kthread_should_stop() returned true - we shouldn't be printing an error. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: btree_pathKent Overstreet1-27/+28
This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Always check for transaction restartsKent Overstreet1-1/+1
On transaction restart iterators won't be locked anymore - make sure we're always checking for errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add safe versions of varint encode/decodeKent Overstreet1-2/+2
This adds safe versions of bch2_varint_(encode|decode) that don't read or write past the end of the buffer, or varint being encoded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add open_buckets to sysfsKent Overstreet1-0/+19
This is to help debug a rare shutdown deadlock in the allocator code - the btree code is leaking open_buckets. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Check for errors from bch2_trans_update()Kent Overstreet1-4/+5
Upcoming refactoring is going to change bch2_trans_update() to start returning transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs; Check for allocator thread shutdownKent Overstreet1-0/+5
We were missing a kthread_should_stop() check in the loop in bch2_invalidate_buckets(), very occasionally leading to us getting stuck while shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix some refcounting bugsKent Overstreet1-1/+1
We really need debug mode assertions that ca->ref and ca->io_ref are used correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Clean up bch2_btree_and_journal_walk()Kent Overstreet1-8/+5
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Allocator refactoringKent Overstreet1-248/+134
This uses the kthread_wait_freezable() macro to simplify a lot of the allocator thread code, along with cleaning up bch2_invalidate_bucket2(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Handle errors in bch2_trans_mark_update()Kent Overstreet1-0/+2
It's not actually the case that iterators are always checked here - __bch2_trans_commit() checks for that after running triggers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Allocator thread doesn't need gc_lock anymoreKent Overstreet1-21/+5
Even with runtime gc (which currently isn't supported), runtime gc no longer clears/recalculates the main set of bucket marks - it allocates and calculates another set, updating the primary at the end. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: gc shouldn't care about owned_by_allocatorKent Overstreet1-3/+2
The owned_by_allocator field is a purely in memory thing, even if/when we bring back GC at runtime there's no need for it to be recalculating this field. This is prep work for pulling it out of struct bucket, and eventually getting rid of the bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_trans_mark_dev_sb()Kent Overstreet1-3/+3
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't wait for ALLOC_SCAN_BATCH buckets in allocatorKent Overstreet1-10/+9
It used to be necessary for the allocator thread to batch up invalidating buckets when possible - but since we added the btree key cache that hasn't been a concern, and now it's causing the allocator thread to livelock when the filesystem is nearly full. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't unconditially version_upgrade in initializeKent Overstreet1-23/+1
This is mkfs's job. Also, clean up the handling of feature bits some. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Require all btree iterators to be freedKent Overstreet1-2/+2
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use x-macros for more enumsKent Overstreet1-4/+4
This patch standardizes all the enums that have associated string tables (probably more enums should have string tables). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename BTREE_ID enums for consistency with other enumsKent Overstreet1-5/+5
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add allocator thread state to sysfsKent Overstreet1-9/+16
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rip out copygc pd controllerKent Overstreet1-35/+0
We have a separate mechanism for ratelimiting copygc now - the pd controller has only been causing problems. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix copygc thresholdKent Overstreet1-15/+4
Awhile back the meaning of is_available_bucket() and thus also bch_dev_usage->buckets_unavailable changed to include buckets that are owned by the allocator - this was so that the stat could be persisted like other allocation information, and wouldn't have to be regenerated by walking each bucket at mount time. This broke copygc, which needs to consider buckets that are reclaimable and haven't yet been grabbed by the allocator thread and moved onta freelist. This patch fixes that by adding dev_buckets_reclaimable() for copygc and the allocator thread, and cleans up some of the callers a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a cond_seched() to the allocator threadKent Overstreet1-0/+1
This is just a band-aid fix for now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Create allocator threads when allocating filesystemKent Overstreet1-1/+14
We're seeing failures to mount because of a failure to start the allocator threads, which currently happens fairly late in the mount process, after walking all metadata, and kthread_create() fails if something has tried to kill the mount process, which is probably not what we want. This patch avoids this issue by creating, but not starting, the allocator threads when we preallocate all of our other in memory data structures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add error message for some allocation failuresKent Overstreet1-1/+4
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Journal updates to dev usageKent Overstreet1-4/+0
This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Persist 64 bit io clocksKent Overstreet1-183/+42
Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd perodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestaps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: KEY_TYPE_alloc_v2Kent Overstreet1-80/+167
This introduces a new version of KEY_TYPE_alloc, which uses the new varint encoding introduced for inodes. This means we'll eventually be able to support much larger bucket sizes (for SMR devices), and the read/write time fields are expanded to 64 bits - which will be used in the next patch to get rid of the periodic rescaling of those fields. Also, for buckets that are members of erasure coded stripes, this adds persistent fields for the index of the stripe they're members of and the stripe redundancy. This is part of work to get rid of having to scan and read into memory the alloc and stripes btrees at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix an assertionKent Overstreet1-1/+2
If we're invalidating a bucket that has cached data in it, data_type won't be 0 - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Mark superblocks transactionallyKent Overstreet1-29/+17
More work towards getting rid of the in memory struct bucket: this path adds code for marking superblock and journal buckets via the btree, and uses it in the device add and journal resize paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bch2_invalidate_bucket()Kent Overstreet1-16/+14
This patch is working towards eventually getting rid of the in memory struct bucket, and relying only on the btree representation. Since bch2_invalidate_bucket() was only used for incrementing gens, not invalidating cached data, no other counters were being changed as a side effect - meaning it's safe for the allocator code to increment the bucket gen directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor dev usageKent Overstreet1-20/+22
This is to make it more amenable for serialization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_alloc_write() should be writing for all devicesKent Overstreet1-1/+1
Alloc info isn't stored on a particular device, it makes no sense to only be writing it out for rw members - this was causing fsck to not fix alloc info errors, oops. Also, make sure we write out alloc info in other repair paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't use BTREE_INSERT_USE_RESERVE so muchKent Overstreet1-7/+3
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't write bucket IO time lazilyKent Overstreet1-0/+48
With the btree key cache code, we don't need to update the alloc btree lazily - and this will mean we can remove the bch2_alloc_write() call in the shutdown path. Future work: we really need to expend the bucket IO clocks from 16 to 64 bits, so that we don't have to rescale them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Move journal reclaim to a kthreadKent Overstreet1-1/+1
This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for bad stripe pointersKent Overstreet1-2/+0
The allocator usually doesn't increment bucket gens right away on buckets that it's about to hand out (for reasons that need to be documented), instead deferring that to whatever extent update first references that bucket. But stripe pointers reference buckets without changing bucket sector counts, meaning we could end up with a pointer in a stripe with a gen newer than the bucket it points to. Fix this by adding a transactional trigger for KEY_TYPE_stripe that just writes out the keys in the alloc btree for the buckets it points to. Also - consolidate the code that checks pointer validity. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Start/stop io clock hands in read/write pathsKent Overstreet1-12/+0
This fixes a bug where the clock hands in the journal and superblock didn't match, because we were still incrementing the read clock hand while read-only. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improvements to writing alloc infoKent Overstreet1-39/+29
Now that we've got transactional alloc info updates (and have for awhile), we don't need to write it out on shutdown, and we don't need to write it out on startup except when GC found errors - this is a big improvement to mount/unmount performance. This patch also fixes a few bugs where we weren't writing out alloc info (on new filesystems, and new devices) and should have been. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Perf improvements for bch_alloc_read()Kent Overstreet1-4/+22
On large filesystems reading in the alloc info takes a significant amount of time. But we don't need to be calling into the fully general bch2_mark_key() path, just open code what we need in bch2_alloc_read_fn(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a cond_resched() to bch2_alloc_write()Kent Overstreet1-0/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't let copygc buckets be stolen by other threadsKent Overstreet1-1/+3
And assorted other copygc fixes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Delete unused argumentsKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make copygc thread globalKent Overstreet1-11/+9
Per device copygc threads don't move data to different devices and they make fragmentation works - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use x-macros for data typesKent Overstreet1-4/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a kthread_should_stop() check to allocator threadKent Overstreet1-0/+2
Turns out it's possible during shutdown for the allocator to get stuck spinning on bch2_invalidate_buckets() without hitting any of the other checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Increase size of btree node reserveKent Overstreet1-3/+6
Also tweak the allocator to be more aggressive about keeping it full. The recent changes to make updates to interior nodes transactional (and thus generate updates to the alloc btree) all put more stress on the btree node reserves. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use cached iterators for alloc btreeKent Overstreet1-37/+14
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Always increment bucket gen on bucket reuseKent Overstreet1-10/+28
Not doing so confuses copygc Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>