path: root/fs/bcachefs
Age  Commit message  Author  Files  Lines
2023-10-22bcachefs: Fix race leading to btree node write getting stuckKent Overstreet5-13/+19
Checking btree_node_may_write() isn't atomic with the other btree flags, dirty and need_write in particular. There was a rare race where we'd unblock a node from writing while __btree_node_flush() was setting need_write, and no thread would notice that the node was now both able to write and needed to be written. Fix this by adding btree node flags for will_make_reachable and write_blocked that can be checked in the cmpxchg loop in __bch2_btree_node_write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
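The idea of checking every condition inside one compare-and-exchange loop can be sketched in userspace C11. This is an illustration only, not the bcachefs code: struct toy_node, the NODE_* bit values, the extra NODE_WRITE_IN_FLIGHT bit and toy_node_try_start_write() are all hypothetical names; only the flag names dirty, need_write, write_blocked and will_make_reachable come from the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits, named after the flags in the commit message. */
enum {
	NODE_DIRTY               = 1u << 0,
	NODE_NEED_WRITE          = 1u << 1,
	NODE_WRITE_BLOCKED       = 1u << 2,
	NODE_WILL_MAKE_REACHABLE = 1u << 3,
	NODE_WRITE_IN_FLIGHT     = 1u << 4,	/* assumption, for the sketch only */
};

struct toy_node {
	_Atomic unsigned long flags;
};

/*
 * Try to claim the node for writing. Every condition is evaluated against
 * the same snapshot of the flags word, and the transition only takes effect
 * if the word has not changed since that snapshot was taken.
 */
static bool toy_node_try_start_write(struct toy_node *b)
{
	unsigned long old = atomic_load(&b->flags);
	unsigned long new;

	do {
		if (!(old & NODE_DIRTY) || !(old & NODE_NEED_WRITE))
			return false;
		if (old & (NODE_WRITE_BLOCKED |
			   NODE_WILL_MAKE_REACHABLE |
			   NODE_WRITE_IN_FLIGHT))
			return false;

		new = (old & ~(unsigned long)(NODE_DIRTY | NODE_NEED_WRITE)) |
		      NODE_WRITE_IN_FLIGHT;
		/* On failure, old is reloaded and the checks run again. */
	} while (!atomic_compare_exchange_weak(&b->flags, &old, new));

	return true;
}
```

Because the "may write" conditions live in the same word as dirty and need_write, a thread that clears write_blocked and a thread that sets need_write cannot both miss the combined state: whichever one runs the loop second sees both changes.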
2023-10-22bcachefs: Kill bch2_btree_node_write_cond()Kent Overstreet2-18/+16
bch2_btree_node_write_cond() was only used in one place - this inlines it into __btree_node_flush() and makes the cmpxchg loop actually correct. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve btree_node_write_if_need()Kent Overstreet4-24/+29
btree_node_write_if_need() kicks off a btree node write only if need_write is set; this makes the locking easier to reason about by moving the check into the cmpxchg loop in __bch2_btree_node_write(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix locking in btree_node_write_done()Kent Overstreet1-18/+7
There was a rare recursive locking bug, in __bch2_btree_node_write() nowrite path -> btree_node_write_done(), in the path that kicks off another write. This splits out an inner __btree_node_write_done() that expects to be run with the btree node lock held. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Start moving debug info from sysfs to debugfsKent Overstreet7-71/+206
In sysfs, files can output at most PAGE_SIZE bytes. This is a problem for debug info that needs to list an arbitrary number of items, and because of this limit some of our debug info has been terser and harder to read than we'd like. This patch moves info about journal pins and cached btree nodes to debugfs, and greatly expands and improves the output we return. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve struct journal layoutKent Overstreet1-9/+12
This cacheline aligns struct journal, and puts j->reservations and j->prereserved on their own cacheline - we may want to split them up in a separate patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use x-macros for btree node flagsKent Overstreet6-43/+41
This is for adding an array of strings for btree node flag names. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
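For readers unfamiliar with the x-macro pattern, a minimal sketch follows. The flag list, the TOY_ prefix and the array name are hypothetical and chosen for illustration; the actual set of btree node flags is not shown in this log.

```c
#include <stddef.h>

/* Hypothetical flag list; each entry is expanded by whatever x() is defined as. */
#define TOY_NODE_FLAGS()	\
	x(dirty)		\
	x(need_write)		\
	x(write_blocked)	\
	x(will_make_reachable)

/* First expansion: one enum constant per flag. */
enum toy_node_flag {
#define x(name)	TOY_NODE_##name,
	TOY_NODE_FLAGS()
#undef x
	TOY_NODE_FLAG_NR
};

/* Second expansion: a matching, NULL-terminated array of flag names. */
static const char * const toy_node_flag_names[] = {
#define x(name)	#name,
	TOY_NODE_FLAGS()
#undef x
	NULL
};
```

Since both the enum and the string array are generated from the same list, adding a flag in one place keeps the two in sync, so toy_node_flag_names[TOY_NODE_need_write] is always the string "need_write".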
2023-10-22bcachefs: Kill BCH_FS_HOLD_BTREE_WRITESKent Overstreet4-8/+2
This was just dead code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't spin in journal reclaimKent Overstreet1-1/+1
If we're not able to flush anything, we shouldn't keep looping. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix btree path sortingKent Overstreet2-1/+4
In btree_update_interior.c, we were changing a path's level directly - which affects path sort order - without re-sorting paths, leading to assertions when bch2_path_get() verified paths were sorted correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix journal_flush_done()Kent Overstreet1-1/+2
journal_flush_done() was overwriting did_work, thus occasionally returning false when it did do work, which led to occasional assertions in the shutdown sequence because we didn't completely flush the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
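The general shape of this kind of bug, a progress flag that is assigned rather than accumulated, can be sketched as follows. This is a guess at the pattern, not the actual journal_flush_done() code: the toy_ functions and their counters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flush step: retire at most one pending item, report progress. */
static bool toy_flush_one(int *pending)
{
	if (*pending <= 0)
		return false;
	(*pending)--;
	return true;
}

/* Buggy shape: the second assignment discards the result of the first. */
static bool toy_flush_done_buggy(int *pins, int *key_cache)
{
	bool did_work;

	did_work = toy_flush_one(pins);
	did_work = toy_flush_one(key_cache);
	return did_work;
}

/* Fixed shape: accumulate, so progress made in any step is reported. */
static bool toy_flush_done_fixed(int *pins, int *key_cache)
{
	bool did_work = false;

	did_work |= toy_flush_one(pins);
	did_work |= toy_flush_one(key_cache);
	return did_work;
}
```

With one pending pin and an empty key cache, the buggy variant flushes the pin but still returns false, so a caller that loops "until no work was done" stops too early.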
2023-10-22bcachefs: Heap allocate printbufsKent Overstreet28-620/+808
This patch changes printbufs to dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
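A userspace sketch of a growable print buffer shows why an explicit exit call becomes necessary. Everything here is a stand-in: struct toy_printbuf, toy_prt_printf() and toy_printbuf_exit() are hypothetical names modelled loosely on the description above, and the growth policy (doubling from 32 bytes) is an assumption, not the bcachefs implementation.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in; not the bcachefs struct printbuf. */
struct toy_printbuf {
	char	*buf;
	size_t	size;			/* bytes allocated */
	size_t	pos;			/* bytes used, excluding the NUL */
	int	allocation_failure;	/* set if a realloc() failed */
};

static void toy_prt_printf(struct toy_printbuf *out, const char *fmt, ...)
{
	va_list args;
	int len;

	/* First pass: measure the formatted length. */
	va_start(args, fmt);
	len = vsnprintf(NULL, 0, fmt, args);
	va_end(args);
	if (len < 0)
		return;

	/* Grow the buffer if the new text plus its NUL does not fit. */
	if (out->pos + (size_t) len + 1 > out->size) {
		size_t new_size = out->size ? out->size : 32;
		char *p;

		while (new_size < out->pos + (size_t) len + 1)
			new_size *= 2;

		p = realloc(out->buf, new_size);
		if (!p) {
			out->allocation_failure = 1;
			return;
		}
		out->buf = p;
		out->size = new_size;
	}

	/* Second pass: format into the (possibly reallocated) buffer. */
	va_start(args, fmt);
	vsnprintf(out->buf + out->pos, out->size - out->pos, fmt, args);
	va_end(args);
	out->pos += (size_t) len;
}

/* The buffer now lives on the heap, so every user must release it. */
static void toy_printbuf_exit(struct toy_printbuf *out)
{
	free(out->buf);
	out->buf = NULL;
	out->size = out->pos = 0;
}
```

A caller zero-initializes the struct, appends with toy_prt_printf() as often as needed, and calls toy_printbuf_exit() on every exit path. The trade is a small fixed-size struct on the stack in place of a large char array, at the cost of that extra cleanup step.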
2023-10-22bcachefs: Convert bch2_pd_controller_print_debug() to a printbufKent Overstreet2-33/+43
Fewer random on-stack char arrays. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve debug assertionKent Overstreet1-2/+7
We're hitting a strange bug with transaction paths not being sorted correctly - this dumps transaction paths in the order we thought was sorted, which will hopefully shed some light as to what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix bch2_journal_pins_to_text()Kent Overstreet1-0/+4
When key cache pins were put onto their own list, we neglected to update bch2_journal_pins_to_text() to print them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Always clear should_be_locked in bch2_trans_begin()Kent Overstreet1-1/+3
bch2_trans_begin() invalidates all iterators, until they're revalidated by calling peek() or traverse(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Run alloc triggers lastKent Overstreet1-0/+17
Triggers can generate additional btree updates - we need to run alloc triggers after all other triggers have run, because they generate updates for the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
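The ordering rule can be sketched as a two-pass loop over the pending updates. This is a simplification under stated assumptions: the toy_ types and the btree ids other than alloc are hypothetical, and the sketch only records the order in which triggers would run; it does not model triggers appending new updates to the list.

```c
#include <stddef.h>

/* Hypothetical btree ids; only "alloc" is named in the commit message. */
enum toy_btree_id { TOY_BTREE_extents, TOY_BTREE_inodes, TOY_BTREE_alloc };

struct toy_update {
	enum toy_btree_id	btree;
	int			trigger_order;	/* -1 until the trigger has run */
};

/* Run triggers in two passes; returns the number of triggers run. */
static int toy_run_triggers(struct toy_update *updates, size_t nr)
{
	int seq = 0;
	size_t i;

	/* Pass 1: every btree except alloc. */
	for (i = 0; i < nr; i++)
		if (updates[i].btree != TOY_BTREE_alloc)
			updates[i].trigger_order = seq++;

	/* Pass 2: alloc updates, including any queued during pass 1. */
	for (i = 0; i < nr; i++)
		if (updates[i].btree == TOY_BTREE_alloc)
			updates[i].trigger_order = seq++;

	return seq;
}
```

Deferring the alloc pass means that alloc btree updates generated by other triggers are already in the list by the time their own triggers run.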
2023-10-22bcachefs: Trigger code uses stashed copy of old keyKent Overstreet1-16/+16
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Consolidate trigger code a bitKent Overstreet3-147/+148
Upcoming patches are doing more work on the triggers code; this patch just moves code around. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_trans_mark_key() now takes a bkey_i *Kent Overstreet4-59/+71
We're now coming up with triggers that modify the update being done. A bkey_s_c is const - bkey_i is the correct type to be using here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix 32 bit buildKent Overstreet4-8/+8
vstruct_bytes() was returning a u64 - it should be a size_t, the correct type for the size of anything that fits in memory. Also replace a 64 bit divide with div_u64(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve some btree node read error messagesKent Overstreet1-2/+3
On btree node read error, it's helpful to see what we were trying to read - was it all zeroes? Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use unlikely() in err_on() macrosKent Overstreet1-4/+4
Should be obviously a good thing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
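As a sketch of what wrapping an error check in unlikely() looks like: the macro below uses the same definition as the kernel's unlikely(), which relies on the GCC/Clang builtin __builtin_expect(). The toy_err_on() macro is hypothetical and much simpler than the bcachefs err_on() macros, which are not shown in this log.

```c
#include <stdio.h>

/* Branch hint: tells the compiler the condition is expected to be false. */
#define toy_unlikely(x)	__builtin_expect(!!(x), 0)

/*
 * Hypothetical error-check macro: prints msg and evaluates to 1 if cond
 * holds, otherwise evaluates to 0 without touching stderr.
 */
#define toy_err_on(cond, msg)					\
	(toy_unlikely(cond) ? (fputs(msg, stderr), 1) : 0)
```

The hint does not change the result of the check; it only lets the compiler lay out the error path away from the hot path, which is why it is a safe default for conditions that signal errors.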
2023-10-22bcachefs: Improve reflink repair codeKent Overstreet1-14/+18
When a reflink pointer points to a missing indirect extent, we replace it with an error key. Instead of replacing the entire reflink pointer with an error key, this patch replaces only the missing range with an error key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Normal update/commit path now works before going RWKent Overstreet5-47/+61
This improves __bch2_trans_commit - early in the recovery process, when we're running btree_gc and before we want to go RW, it now uses bch2_journal_key_insert() to add the update to the list of updates for journal replay to do, instead of btree_gc having to use separate interfaces depending on whether we're running at bringup or, later, runtime. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Revert "Ensure journal doesn't get stuck in nochanges mode"Kent Overstreet5-10/+3
This patch was originally to work around the journal getting stuck in nochanges mode - but that was just a hack, we needed to fix the actual bug. It should be fixed now, so revert it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix for journal getting stuckKent Overstreet2-2/+2
The journal can get stuck if we need to get a journal reservation for something we have a pre-reservation for, but aren't able to reclaim space, or if the pin fifo is full - it's impractical to resize the pin fifo at runtime. Previously, we reserved 8 entries in the pin fifo for pre-reservations, but that seems small - we're seeing the journal occasionally get stuck. Let's reserve a quarter of it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Set BTREE_NODE_SEQ() correctly in merge pathKent Overstreet1-0/+4
BTREE_NODE_SEQ() is supposed to give us a time ordering of btree nodes on disk, so that we can tell which btree node is newer if we ever have to scan the entire device to find btree nodes. The btree node merge path wasn't setting it correctly on the new node - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Drop journal_write_compact()Kent Overstreet1-45/+0
Long ago it was possible to get a journal reservation and not use it, but that's no longer allowed, which means journal_write_compact() has very little work to do, and isn't really worth the code anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Btree key cache optimizationKent Overstreet2-2/+11
This helps with lock contention in the journalling code: instead of updating our journal pin on every write, only get a journal pin if we don't have one. This means we can avoid hammering on journal locks nearly so much, at the cost of carrying around a journal pin for an older entry than the one we actually need. To handle that, if needed we update our journal pin to the correct one when flushed by journal reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
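The trade-off described here can be sketched with a toy model that counts how often the journal's pin code is entered. All names are hypothetical (struct toy_cached_key, toy_journal_lock_count and the toy_ functions), and the flush logic is an assumption based only on the description above.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_cached_key {
	bool		pinned;
	uint64_t	pin_seq;	/* journal seq the pin currently holds */
	uint64_t	dirty_seq;	/* journal seq of the most recent update */
};

/* Counts simulated trips through the contended journal lock. */
static unsigned toy_journal_lock_count;

static void toy_journal_pin_set(struct toy_cached_key *k, uint64_t seq)
{
	toy_journal_lock_count++;
	k->pinned = true;
	k->pin_seq = seq;
}

/* Update path: only take a pin if we do not already hold one. */
static void toy_key_cache_update(struct toy_cached_key *k, uint64_t seq)
{
	k->dirty_seq = seq;
	if (!k->pinned)
		toy_journal_pin_set(k, seq);
}

/*
 * Reclaim path: if the pin is older than the entry we really need, move
 * it forward instead of writing. Returns true if the key was written out.
 */
static bool toy_key_cache_flush(struct toy_cached_key *k)
{
	if (k->pin_seq != k->dirty_seq) {
		toy_journal_pin_set(k, k->dirty_seq);
		return false;
	}

	k->pinned = false;
	return true;
}
```

Three updates to the same key then cost one pin operation instead of three. The stale pin keeps an older journal entry alive for longer, and journal reclaim corrects that lazily the first time it asks for the key to be flushed.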
2023-10-22bcachefs: Add tabstops to printbufsKent Overstreet3-20/+84
Now, when outputting to printbufs, we can set tabstops and left or right justify text to them - this is to be used by the userspace 'bcachefs fs usage' command. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix a use after freeKent Overstreet1-3/+1
In move_read_endio, we were checking if the next pending write has its read completed - but this can turn into a use after free (and we were accessing the list without a lock), so instead it's better to just unconditionally do the wakeup. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add .to_text() methods for all superblock sectionsKent Overstreet14-64/+588
This patch improves the superblock .to_text() methods and adds methods for all types that were missing them. It also improves printbufs by allowing them to specify what units we want to be printing in, and adds new wrapper methods for unifying our kernel and userspace environments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill bch_scnmemcpy()Kent Overstreet6-29/+13
bch_scnmemcpy was for printing length-limited strings that might not have a terminating null - turns out sprintf & pr_buf can do this with %.*s. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
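The %.*s conversion takes the maximum number of bytes to print as an int argument ahead of the string, so the source does not need a terminating NUL as long as the precision does not exceed its length. A minimal userspace sketch, where toy_print_limited() is a hypothetical wrapper name:

```c
#include <stdio.h>

/*
 * Copy at most len bytes of a possibly unterminated string into dst,
 * always NUL-terminating dst (provided dst_size > 0). Returns the
 * number of bytes snprintf() would have written, excluding the NUL.
 */
static int toy_print_limited(char *dst, size_t dst_size,
			     const char *src, int len)
{
	return snprintf(dst, dst_size, "%.*s", len, src);
}
```

Printing still stops early if a NUL byte appears before len bytes have been consumed, which matches what a length-limited string copy is expected to do.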
2023-10-22bcachefs: Don't issue discards when in nochanges modeKent Overstreet2-2/+4
When the nochanges option is selected, we're supposed to never issue writes. Unfortunately, it seems discards were missed when implementing this, leading to some painful filesystem corruption. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: opts.read_journal_onlyKent Overstreet2-0/+8
Add an option that tells recovery to only read the journal, to be used by the list_journal command. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Change __bch2_trans_commit() to run triggers then get RWKent Overstreet1-11/+11
This is prep work for the next patch, which is going to change __bch2_trans_commit() to use bch2_journal_key_insert() when very early in the recovery process, so that we have a unified interface for doing btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Delete some flag bits that are no longer usedKent Overstreet3-8/+0
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Store logical location of journal entriesKent Overstreet2-11/+24
When viewing what's in the journal, it's more useful to have the logical location - journal bucket and offset within that bucket - than just the offset on that device. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Check for errors from crypto_skcipher_encrypt()Kent Overstreet6-38/+85
Apparently it actually is possible for crypto_skcipher_encrypt() to return an error - not sure why that would be - but we need to replace our assertion with actual error handling. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix failure to allocate btree node in cacheKent Overstreet4-17/+23
The error code when we fail to allocate a node in the btree node cache doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to stash a flag in btree_trans so we know we have to take the cannibalize lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Change bch2_dev_lookup() to not use lookup_bdev()Kent Overstreet1-8/+2
bch2_dev_lookup() is used from the extended attribute set methods, for setting the target options, where we're already holding an inode lock - it turns out pathname lookups also take inode locks, so that was susceptible to deadlocks. Fortunately we already stash the device name in ca->name. This does change user-visible behaviour though: instead of specifying e.g. /dev/sda1, the user must now specify sda1. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Only allocate buckets_nouse when requestedKent Overstreet2-5/+12
It's only needed by the migrate tool - this patch adds an option to enable allocating it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Stale ptr cleanup is now done by gc_gensKent Overstreet1-45/+10
Before we had dedicated gc code for bucket->oldest_gen this was btree_gc's responsibility, but now that we have that we can rip it out, simplifying the already overcomplicated btree_gc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve journal_entry_btree_keys_to_text()Kent Overstreet2-3/+31
This improves the formatting of journal_entry_btree_keys_to_text() by putting each key on its own line. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix __btree_path_traverse_allKent Overstreet1-10/+10
The loop that traverses paths in traverse_all() needs to be a little bit tricky, because traversing a path can cause other paths to be added (or perhaps removed) at about the same position. The old logic was buggy; replace it with simpler logic. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix slow tracepointsKent Overstreet1-8/+8
Some of our tracepoints were calling snprintf("%pS") - which does symbol table lookups - in TP_fast_assign(), which turns out to be a really bad idea. This was done because perf trace wasn't correctly printing tracepoints that use %pS anymore - but it turns out trace-cmd does handle it correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Check for stale dirty pointer before readsKent Overstreet3-14/+54
Since we retry reads when we discover we read from a pointer that went stale, if a dirty pointer is erroneously stale it would cause us to loop retrying that read forever - unless we check before issuing the read, while the btree is still locked, when we know that a dirty pointer should never be stale. This patch adds that check, along with printing some helpful debug info. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill verify_not_stale()Kent Overstreet1-18/+0
This is ancient code that's more effectively checked in other places now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix __bch2_btree_node_lockKent Overstreet1-30/+31
__bch2_btree_node_lock() was implementing the wrong lock ordering for cached vs. non cached paths - this fixes it to match the btree path sort order as defined by __btree_path_cmp(), and also simplifies the code some. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>