Age | Commit message (Collapse) | Author | Files | Lines |
|
bucket_gen() checks if we're lookup up a valid bucket and returns NULL
otherwise, but bucket_gen_get() was failing to check; other callers were
correct.
Also do a bit of cleanup on callers.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
bch2_folio_reservation_get_partial(), on partial success, will now
return a reservation that's aligned to the filesystem blocksize.
This is a partial fix for fstests generic/299 - fio verify is badly
behaved in the presence of short writes that aren't aligned to its
blocksize.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
A user with a 30 tb device is overflowing the INT_MAX limit on vmalloc
allocations...
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Add a line for capacity
Signed-off-by: Kent Overstreet <[email protected]>
|
|
gc_lock is now only for synchronization between check_alloc_info and
interior btree updates - nothing else
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Rewrite fsck/gc for the new accounting scheme.
This adds a second set of in-memory accounting counters for gc to use;
like with other parts of gc we run all trigger in TRIGGER_GC mode, then
compare what we calculated to existing in-memory accounting at the end.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
More dead code deletion.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Dead code.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
More deletion of dead code.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
With bch2_ioctl_fs_usage(), this is now dead code.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Deleting code for the old disk accounting scheme.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Main part of the disk accounting rewrite.
This is a wholesale rewrite of the existing disk space accounting, which
relies on percepu counters that are sharded by journal buffer, and
rolled up and added to each journal write.
With the new scheme, every set of counters is a distinct key in the
accounting btree; this fixes scaling limitations of the old scheme,
where counters took up space in each journal entry and required multiple
percpu counters.
Now, in memory accounting requires a single set of percpu counters - not
multiple for each in flight journal buffer - and in the future we'll
probably also have counters that don't use in memory percpu counters,
they're not strictly required.
An accounting update is now a normal btree update, using the btree write
buffer path. At transaction commit time, we apply accounting updates to
the in memory counters, which are percpu counters indexed in an
eytzinger tree by the accounting key.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Use try_cmpxchg() family of functions instead of
cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg
(and related move instruction in front of cmpxchg).
Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when
cmpxchg fails. There is no need to re-read the value in the loop.
No functional change intended.
Signed-off-by: Uros Bizjak <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Brian Foster <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Turn more asserts into proper recoverable error paths.
Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
|
|
The bucket_gens array and gc_buckets array known their own size; we
should be using those members, and returning an error.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
More elimination of bch2_dev_bkey_exists() usage.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Now explicitly allocate and free the buckets_nouse bitmap - this is
going to be used for online fsck.
To go RW when we haven't check allocations, we'll do a much slimmed down
version that just initializes the buckets_nouse bitmaps.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
by using bucket_m_to_alloc() more, we can get some nice code cleanup.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Currently, the reflink_p gc trigger does repair as well - turning a
reflink_p key into an error key if the reflink_v it points to doesn't
exist.
This won't work with online check/repair, because the repair path once
online will be subject to transaction restarts, but BTREE_TRIGGER_gc is
not idempotant - we can't run it multiple times if we get a transaction
restart.
So we need to split these paths; to do so this patch calls
check_fix_ptrs() by a new general path - a new trigger type,
BTREE_TRIGGER_check_repair.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
If we hit an inconsistency when updating allocation information, we
don't want to fail the update if it's for a deletion - only if it's for
a new key.
Rename check_bucket_ref() -> bucket_ref_update() so we can centralize
the logic to do this.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Some renaming for better consistency
bch2_member_exists -> bch2_member_alive
bch2_dev_exists -> bch2_member_exists
bch2_dev_exsits2 -> bch2_dev_exists
bch_dev_locked -> bch2_dev_locked
bch_dev_bkey_exists -> bch2_dev_bkey_exists
new helper - bch2_dev_safe
Signed-off-by: Kent Overstreet <[email protected]>
|
|
cut out a branch from doing it the obvious way
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.
These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Consolidate mark_superblock() and trans_mark_superblock(), like we did
with the other trigger paths.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
This adds a new watermark, higher priority than BCH_WATERMARK_reclaim,
for interior btree updates. We've seen a deadlock where journal replay
triggers a ton of btree node merges, and these use up all available open
buckets and then interior updates get stuck.
One cause of this is that we're currently lacking btree node merging on
write buffer btrees - that needs to be fixed as well.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
The disk space accounting rewrite is splitting out accounting for each
replicas set - those are moving to btree keys, instead of percpu
counters.
This breaks bch2_trans_fs_usage_apply() up, splitting out the part we
will still need.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We need bounds checking since new versions may introduce new data types.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
now that type signatures are unified, redundant
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
bch2_update_cached_sectors_list() is closer to how the new disk space
accounting works, called from trans_mark().
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Upcoming rebalance_work btree will require extent triggers to be
BTREE_TRIGGER_WANTS_OLD_AND_NEW - so to reduce potential confusion,
let's just make all triggers BTREE_TRIGGER_WANTS_OLD_AND_NEW.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We can't mark device superblocks or allocate journal on a device that
isn't online.
That means we may need to do this on every mount, because we may have
formatted a new filesystem and then done the first mount
(bch2_fs_initialize()) in degraded mode.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
fsck_err() may sleep - it takes a mutex and may allocate memory, so
bucket_lock() needs to be a sleepable lock.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
bucket_lock() previously open coded a spinlock, because we need to cram
a spinlock into a single byte.
But it turns out not all archs support xchg() on a single byte; since we
need struct bucket to be small, this means we have to play fun games
with casts and ifdefs for endianness.
This fixes building on 32 bit arm, and likely other architectures.
Signed-off-by: Kent Overstreet <[email protected]>
Cc: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
|
|
In general it's a good idea to avoid using bare unreachable() because it
introduces undefined behavior in compiled code. In this case it even
confuses GCC into emitting an empty unused
bch2_dev_buckets_reserved.part.0() function.
Use BUG() instead, which is nice and defined. While in theory it should
never trigger, if something were to go awry and the BCH_WATERMARK_NR
case were to actually hit, the failure mode is much more robust.
Fixes the following warnings:
vmlinux.o: warning: objtool: bch2_bucket_alloc_trans() falls through to next function bch2_reset_alloc_cursors()
vmlinux.o: warning: objtool: bch2_dev_buckets_reserved.part.0() is missing an ELF size annotation
Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Split out a new file for bch_sb_field_members - we'll likely want to
move more code here in the future.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
bit of reorg
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Add another watermark for journal reclaim - this is needed for the next
patches, that unify BCH_WATERMARK with JOURNAL_WATERMARK.
Signed-off-by: Kent Overstreet <[email protected]>
|