path: root/fs/bcachefs/alloc_foreground.c
Age | Commit message | Author | Files | Lines
2023-10-22 | bcachefs: EINTR -> BCH_ERR_transaction_restart | Kent Overstreet | 1 | -7/+10
Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Prevent a btree iter overflow in alloc path | Kent Overstreet | 1 | -0/+1
In bch2_bucket_alloc_trans(), we're iterating over buckets - but not directly with an iterator, since we're iterating over the freespace btree. This means that we need to clear iter->path->preserve, otherwise we'll end up retaining a btree_path for every alloc key we touched - which is not what we want here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Improved errcodes | Kent Overstreet | 1 | -19/+20
Instead of overloading standard error codes (EINTR/EAGAIN), and defining short lists of error codes in multiple places that potentially end up overlapping & conflicting, we're now going to have one master list of error codes. Error codes are defined with an x-macro: thus we also have bch2_err_str() now. Also, error codes have a class field. Now, instead of checking for errors with ==, code should use bch2_err_matches(), which returns true if the error is equal to or a sub-error of the error class. This means we can define unique errors for every source location where an error is generated, which will help improve our error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
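A minimal userspace sketch of the scheme this entry describes: one x-macro list generates the enum, the name table and a parent (class) table, and a matcher walks up the parent chain. All `DEMO_`/`demo_` names and the listed error names are hypothetical stand-ins, not the actual bcachefs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical master list: x(parent class, name); 0 means "no parent". */
#define DEMO_ERRCODES()						\
	x(0,				no_buckets_found)	\
	x(0,				transaction_restart)	\
	x(DEMO_ERR_transaction_restart,	restart_relock)		\
	x(DEMO_ERR_transaction_restart,	restart_mem_realloced)

enum demo_errcode {
	DEMO_ERR_START = 2048,	/* assumed base, kept clear of standard errno values */
#define x(cls, err) DEMO_ERR_##err,
	DEMO_ERRCODES()
#undef x
	DEMO_ERR_MAX
};

/* Name table generated from the same list. */
static const char * const demo_err_strs[DEMO_ERR_MAX - DEMO_ERR_START] = {
#define x(cls, err) [DEMO_ERR_##err - DEMO_ERR_START] = #err,
	DEMO_ERRCODES()
#undef x
};

/* Parent (class) table generated from the same list. */
static const unsigned short demo_err_parent[DEMO_ERR_MAX - DEMO_ERR_START] = {
#define x(cls, err) [DEMO_ERR_##err - DEMO_ERR_START] = cls,
	DEMO_ERRCODES()
#undef x
};

/* Returns the name of a private error code, given as positive or negative. */
static inline const char *demo_err_str(int err)
{
	err = abs(err);
	if (err <= DEMO_ERR_START || err >= DEMO_ERR_MAX)
		return "(not a demo error)";
	return demo_err_strs[err - DEMO_ERR_START];
}

/* True if err equals cls or is a (transitive) sub-error of cls. */
static inline bool demo_err_matches(int err, int cls)
{
	err = abs(err);
	cls = abs(cls);
	assert(err < DEMO_ERR_MAX && cls < DEMO_ERR_MAX);

	/* Walk up the parent chain until we reach cls or run out of parents. */
	while (err > DEMO_ERR_START && err != cls)
		err = demo_err_parent[err - DEMO_ERR_START];
	return cls != 0 && err == cls;
}
```

With this layout a caller tests `demo_err_matches(ret, DEMO_ERR_transaction_restart)` instead of comparing with `==`, so each restart reason can keep its own code while still matching the class.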
2023-10-22 | bcachefs: Improve bucket_alloc_fail tracepoint | Kent Overstreet | 1 | -3/+12
We should be printing the number of free buckets, not just the number of available buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Split out dev_buckets_free() | Kent Overstreet | 1 | -1/+1
Previously, dev_buckets_available() only counted buckets that are eligible to be allocated right now - i.e. buckets that don't have cached data, or need discard, or need gc gens, etc. But most users of this function want to know how many buckets are eligible to be allocated from without moving data around - copygc, allocator striping, which means we should be including cached data buckets etc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Printbuf rework | Kent Overstreet | 1 | -11/+11
This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Improve bch2_open_buckets_to_text() | Kent Overstreet | 1 | -3/+3
This patch updates bch2_open_buckets_to_text() to include the device and bucket the open_bucket owns. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Fold bucket_state in to BCH_DATA_TYPES() | Kent Overstreet | 1 | -16/+29
Previously, we were missing accounting for buckets in need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be counted when checking current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Kill allocator threads & freelists | Kent Overstreet | 1 | -93/+468
Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Run btree updates after write out of write_point | Kent Overstreet | 1 | -16/+51
In the write path, after the write to the block device(s) completes, we have to punt to process context to do the btree update. Instead of using the work item embedded in op->cl, this patch switches to a per write-point work item. This helps with two different issues:
- lock contention: btree updates to the same writepoint will (usually) be updating the same alloc keys
- context switch overhead: when we're bottlenecked on btree updates, having a thread (running out of a work item) checking the write point for completed ops is cheaper than queueing up a new work item and waking up a kworker.
In an arbitrary benchmark, 4k random writes with fio running inside a VM, this patch resulted in a 10% improvement in total iops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: x-macroize alloc_reserve enum | Kent Overstreet | 1 | -10/+17
This makes an array of strings available, like our other enums. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Kill verify_not_stale() | Kent Overstreet | 1 | -18/+0
This is ancient code that's more effectively checked in other places now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: New in-memory array for bucket gens | Kent Overstreet | 1 | -2/+2
The main in-memory bucket array is going away, but we'll still need to keep bucket generations in memory, at least for now - ptr_stale() needs to be an efficient operation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Put open_buckets in a hashtable | Kent Overstreet | 1 | -2/+28
This is so that the copygc code doesn't have to refer to bucket_mark.owned_by_allocator - assisting in getting rid of the in memory bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
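As a rough illustration of what such a hash table can look like, the sketch below chains open buckets by index and keys the lookup on (device, bucket), so a caller such as copygc can ask "is this bucket open?" without consulting a per-bucket in-memory flag. The struct layout, table size, hash function and every `demo_` name are assumptions made for the example, not the bcachefs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_OPEN_BUCKETS 64	/* hypothetical pool size, power of two */

struct demo_open_bucket {
	uint8_t		dev;
	uint64_t	bucket;
	uint8_t		hash_next;	/* next index in the chain, 0 ends it */
};

struct demo_fs {
	/* Index 0 is reserved as the "no entry" sentinel. */
	struct demo_open_bucket	open_buckets[DEMO_NR_OPEN_BUCKETS];
	uint8_t			hash[DEMO_NR_OPEN_BUCKETS];
};

/* Multiplicative hash on (dev, bucket); any reasonable mixer would do. */
static inline unsigned demo_ob_hash(unsigned dev, uint64_t bucket)
{
	uint64_t h = (bucket + dev) * 0x9E3779B97F4A7C15ULL;

	return (unsigned)((h >> 32) & (DEMO_NR_OPEN_BUCKETS - 1));
}

/* Link open bucket idx at the head of its hash chain. */
static inline void demo_ob_hash_add(struct demo_fs *c, uint8_t idx)
{
	struct demo_open_bucket *ob = &c->open_buckets[idx];
	uint8_t *slot = &c->hash[demo_ob_hash(ob->dev, ob->bucket)];

	assert(idx != 0);
	ob->hash_next = *slot;
	*slot = idx;
}

/* Unlink open bucket idx; it must currently be in the table. */
static inline void demo_ob_hash_remove(struct demo_fs *c, uint8_t idx)
{
	struct demo_open_bucket *ob = &c->open_buckets[idx];
	uint8_t *slot = &c->hash[demo_ob_hash(ob->dev, ob->bucket)];

	while (*slot != idx) {
		assert(*slot != 0);
		slot = &c->open_buckets[*slot].hash_next;
	}
	*slot = ob->hash_next;
	ob->hash_next = 0;
}

/* Lookup by (dev, bucket): no sector-to-bucket division needed. */
static inline bool demo_bucket_is_open(const struct demo_fs *c,
				       unsigned dev, uint64_t bucket)
{
	uint8_t idx;

	for (idx = c->hash[demo_ob_hash(dev, bucket)]; idx;
	     idx = c->open_buckets[idx].hash_next) {
		const struct demo_open_bucket *ob = &c->open_buckets[idx];

		if (ob->dev == dev && ob->bucket == bucket)
			return true;
	}
	return false;
}
```

Chaining by small index rather than by pointer keeps the table compact, which suits a fixed-size pool of open buckets.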
2023-10-22 | bcachefs: Refactor open_bucket code | Kent Overstreet | 1 | -35/+65
Prep work for adding a hash table of open buckets - instead of embedding a bch_extent_ptr, we need to refer to the bucket directly so that we're not calling sector_to_bucket() in the hash table lookup code, which has an expensive divide. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: bch2_alloc_sectors_append_ptrs() now takes cached flag | Kent Overstreet | 1 | -6/+8
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Rewrite bch2_bucket_alloc_new_fs() | Kent Overstreet | 1 | -14/+8
This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that doesn't need to use the in memory bucket array, part of a larger patch series to entirely get rid of the in memory bucket array, except for gc/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 | bcachefs: Make sure bch2_bucket_alloc_new_fs() obeys buckets_nouse | Kent Overstreet | 1 | -0/+1
This fixes the filesystem migrate tool. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
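The two entries above describe a bump allocator that also honours a no-use bitmap. A minimal sketch of that idea follows, assuming a per-device cursor and an optional bitmap; the struct, its field names and `demo_bucket_alloc_new_fs()` are hypothetical and only mirror the description, not the real function body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_dev {
	uint64_t	first_bucket;		/* buckets below this are reserved */
	uint64_t	nbuckets;
	uint64_t	new_fs_bucket_idx;	/* bump cursor, only moves forward */
	const uint8_t	*buckets_nouse;		/* optional bitmap, NULL if unused */
};

static inline bool demo_test_bit(const uint8_t *map, uint64_t nr)
{
	return (map[nr / 8] >> (nr % 8)) & 1;
}

/*
 * Hand out the next usable bucket, or -1 once the device is exhausted.
 * No per-bucket state is consulted apart from the no-use bitmap.
 */
static inline int64_t demo_bucket_alloc_new_fs(struct demo_dev *ca)
{
	assert(ca->first_bucket <= ca->nbuckets);

	if (ca->new_fs_bucket_idx < ca->first_bucket)
		ca->new_fs_bucket_idx = ca->first_bucket;

	while (ca->new_fs_bucket_idx < ca->nbuckets) {
		uint64_t b = ca->new_fs_bucket_idx++;

		if (!ca->buckets_nouse || !demo_test_bit(ca->buckets_nouse, b))
			return (int64_t)b;
	}
	return -1;
}
```

Because the cursor never moves backwards, a bucket is handed out at most once, which is all that is needed while a new filesystem is being laid out.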
2023-10-22 | bcachefs: Convert bucket_alloc_ret to negative error codes | Kent Overstreet | 1 | -19/+16
Start a new header, errcode.h, for bcachefs-private error codes - more error codes will be converted later. This patch just converts bucket_alloc_ret so that they can be mixed with standard error codes and passed as ERR_PTR errors - the ec.c code was doing this already, but incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
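To show why negative codes matter here, the following userspace sketch re-implements the kernel's ERR_PTR convention (a pointer in the top 4095 values of the address space encodes a negative error) and returns private allocator codes through it alongside standard errno values. The `demo_` helpers, the base value 2048 and the two error names are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095	/* same bound the kernel convention uses */

/* Hypothetical private codes, placed above the standard errno values. */
enum {
	DEMO_ERR_START = 2048,
	DEMO_ERR_open_buckets_empty,
	DEMO_ERR_freelist_empty,
};

/* Encode a negative error code in a pointer value. */
static inline void *demo_err_ptr(long err)
{
	assert(err < 0 && err >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Decode the error code from an error pointer. */
static inline long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if p encodes an error rather than pointing at an object. */
static inline bool demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_ob { int unused; };
static struct demo_ob demo_the_ob;

/* Returns an open bucket, or an error pointer carrying a private code. */
static inline struct demo_ob *demo_bucket_alloc(unsigned free_slots,
						unsigned free_buckets)
{
	if (!free_slots)
		return demo_err_ptr(-DEMO_ERR_open_buckets_empty);
	if (!free_buckets)
		return demo_err_ptr(-DEMO_ERR_freelist_empty);
	return &demo_the_ob;
}
```

A positive enum value could not travel through a pointer this way, which is the practical reason for keeping private codes negative and inside the 4095 bound.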
2023-10-22 | bcachefs: Allocator refactoring | Kent Overstreet | 1 | -45/+2
This uses the kthread_wait_freezable() macro to simplify a lot of the allocator thread code, along with cleaning up bch2_invalidate_bucket2(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: gc shouldn't care about owned_by_allocator | Kent Overstreet | 1 | -2/+1
The owned_by_allocator field is a purely in memory thing, even if/when we bring back GC at runtime there's no need for it to be recalculating this field. This is prep work for pulling it out of struct bucket, and eventually getting rid of the bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix an RCU splat | Kent Overstreet | 1 | -3/+6
Writepoints are never deallocated so the rcu_read_lock() isn't really needed, but we are doing lockless list traversal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix copygc threshold | Kent Overstreet | 1 | -1/+4
A while back the meaning of is_available_bucket() and thus also bch_dev_usage->buckets_unavailable changed to include buckets that are owned by the allocator - this was so that the stat could be persisted like other allocation information, and wouldn't have to be regenerated by walking each bucket at mount time. This broke copygc, which needs to consider buckets that are reclaimable and haven't yet been grabbed by the allocator thread and moved onto a freelist. This patch fixes that by adding dev_buckets_reclaimable() for copygc and the allocator thread, and cleans up some of the callers a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Refactor dev usage | Kent Overstreet | 1 | -10/+9
This is to make it more amenable for serialization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Rework allocating buckets for stripes | Kent Overstreet | 1 | -7/+14
Allocating buckets for existing stripes was busted, in part because the data structures were too contorted. This reworks new stripes so that we have an array of open buckets that matches blocks in the stripe, and it's sparse if we're reusing an existing stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Reserve some open buckets for btree allocations | Kent Overstreet | 1 | -1/+5
This reverts part of the change from "bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much" - it turns out we still should be reserving open buckets for btree node allocations, because otherwise data bucket allocations (especially with erasure coding enabled) can use up all our open buckets and we won't be able to do the metadata update that lets us release those open bucket references. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
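One way to picture the reservation described above: each allocation class must leave a certain number of open buckets untouched, and btree allocations are the only class allowed to dip into the last of them. The sketch below assumes a pool size and reserve fractions purely for illustration; the enum, the fractions and the `demo_` names are not taken from the source.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_OPEN_BUCKETS_COUNT 1024	/* hypothetical pool size */

enum demo_reserve {
	DEMO_RESERVE_BTREE,	/* metadata updates that release open buckets */
	DEMO_RESERVE_MOVINGGC,	/* copygc */
	DEMO_RESERVE_NONE,	/* ordinary data writes */
};

/* How many open buckets this class must leave free (assumed fractions). */
static inline unsigned demo_open_buckets_reserved(enum demo_reserve r)
{
	switch (r) {
	case DEMO_RESERVE_BTREE:
		return 0;
	case DEMO_RESERVE_MOVINGGC:
		return DEMO_OPEN_BUCKETS_COUNT / 4;
	default:
		return DEMO_OPEN_BUCKETS_COUNT / 2;
	}
}

/* An allocation may proceed only if it leaves its reserve intact. */
static inline bool demo_may_alloc_open_bucket(unsigned nr_free,
					      enum demo_reserve r)
{
	assert(nr_free <= DEMO_OPEN_BUCKETS_COUNT);
	return nr_free > demo_open_buckets_reserved(r);
}
```

Under these assumed numbers, data writes stall once half the pool is in use, while a btree node allocation can still obtain an open bucket and complete the update that frees the others.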
2023-10-22 | bcachefs: Use separate new stripes for copygc and non-copygc | Kent Overstreet | 1 | -1/+3
Allocations for copygc have to be kept separate from everything else, so that copygc doesn't get starved. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Change allocations for ec stripes to blocking | Kent Overstreet | 1 | -17/+25
We don't want writes to not get erasure coded just because the allocator temporarily wasn't keeping up. However, it's not guaranteed that these allocations will ever succeed, we can currently get stuck - especially if devices are different sizes - we still have work to do in this area. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much | Kent Overstreet | 1 | -13/+1
Previously, we were using BTREE_INSERT_USE_RESERVE in a lot of places where it no longer makes sense.
- We now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_USE_RESERVE just because we're holding open_buckets pinned anymore.
- We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress.
This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Don't write bucket IO time lazily | Kent Overstreet | 1 | -2/+0
With the btree key cache code, we don't need to update the alloc btree lazily - and this will mean we can remove the bch2_alloc_write() call in the shutdown path. Future work: we really need to expand the bucket IO clocks from 16 to 64 bits, so that we don't have to rescale them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Ensure we only allocate one EC bucket per writepoint | Kent Overstreet | 1 | -11/+15
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Don't let copygc buckets be stolen by other threads | Kent Overstreet | 1 | -16/+30
And assorted other copygc fixes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Delete unused arguments | Kent Overstreet | 1 | -3/+3
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Don't restrict copygc writes to the same device | Kent Overstreet | 1 | -45/+47
This no longer makes any sense, since copygc is now one thread per filesystem, not per device, with a single write point. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Make copygc thread global | Kent Overstreet | 1 | -2/+3
Per device copygc threads don't move data to different devices and they make fragmentation worse - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Use x-macros for data types | Kent Overstreet | 1 | -9/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Refactor stripe creation | Kent Overstreet | 1 | -88/+15
Prep work for the patch to update existing stripes with new data blocks. This moves allocating new stripes into ec.c, and also sets up the data structures so that we can handle only allocating some of the blocks in a stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Move stripe creation to workqueue | Kent Overstreet | 1 | -1/+1
This is mainly to solve a lock ordering issue, and also simplifies the code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Make open bucket reserves more conservative | Kent Overstreet | 1 | -2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Drop unused arg to bch2_open_buckets_stop_dev() | Kent Overstreet | 1 | -3/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix promoting to cache devices (durability = 0) | Kent Overstreet | 1 | -30/+48
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Add more time stats for being blocked on allocator | Kent Overstreet | 1 | -0/+21
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Add a mechanism for blocking the journal | Kent Overstreet | 1 | -1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix some reserve calculations | Kent Overstreet | 1 | -2/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Fix an allocator error path | Kent Overstreet | 1 | -5/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: correctly initialize bch_extent_ptr | Kent Overstreet | 1 | -0/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: refactor bch_fs_usage | Kent Overstreet | 1 | -1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: s/usage_lock/mark_lock | Kent Overstreet | 1 | -7/+7
better describes what it's for, and we're going to call a new lock usage_lock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Make bkey types globally unique | Kent Overstreet | 1 | -6/+5
this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 | bcachefs: Erasure coding | Kent Overstreet | 1 | -72/+281
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>