aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)AuthorFilesLines
2023-10-22bcachefs: Btree key cacheKent Overstreet14-45/+787
This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Implement a new gc that only recalcs oldest genKent Overstreet4-0/+92
Full mark and sweep gc doesn't (yet?) work with the new btree key cache code, but it also blocks updates to interior btree nodes for the duration and isn't really necessary in practice; we aren't currently attempting to repair errors in allocation info at runtime. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Turn c->state_lock into an rwsemKent Overstreet7-57/+50
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an internal option for reading entire journalKent Overstreet4-22/+44
To be used the debug tool that dumps the contents of the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't deadlock when btree node reuse changes lock orderingKent Overstreet4-16/+62
Btree node lock ordering is based on the logical key. However, 'struct btree' may be reused for a different btree node under memory pressure. This patch uses the new six lock callback to check if a btree node is no longer the node we wanted to lock before blocking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a deadlockKent Overstreet4-32/+67
__bch2_btree_node_lock() was incorrectly using iter->pos as a proxy for btree node lock ordering, this caused an off by one error that was triggered by bch2_btree_node_get_sibling() getting the previous node. This refactors the code to compare against btree node keys directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor btree insert pathKent Overstreet1-52/+38
This splits out the journalling code from the btree update code; prep work for the btree key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Always give out journal pre-res if we already have oneKent Overstreet3-15/+30
This is better than skipping the journal pre-reservation if we already have one - we should still acount for the journal reservation we're going to have to get. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: More open bucketsKent Overstreet3-11/+17
We need a larger open bucket reserve now that the btree interior update path holds onto open bucket references; filesystems with many high through devices may need more open buckets now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't allocate memory under the btree cache lockKent Overstreet1-29/+58
The btree cache lock is needed for reclaiming from the btree node cache, and memory allocation can potentially spin and sleep (for 100 ms at a time), so.. don't do that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a linked list bugKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make open bucket reserves more conservativeKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: btree_update_nodes_written() requires alloc reserveKent Overstreet1-3/+6
Also, in the btree_update_start() path, if we already have a journal pre-reservation we don't want to take another - that's a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Check gfp_flags correctly in bch2_btree_cache_scan()Kent Overstreet1-1/+1
bch2_btree_node_mem_alloc() uses memalloc_nofs_save()/GFP_NOFS, but GFP_NOFS does include __GFP_IO - oops. We used to use GFP_NOIO, but as we're a filesystem now GFP_NOFS makes more sense now and is looser. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Call bch2_btree_iter_traverse() if necessary in commit pathKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_downgrade()Kent Overstreet3-24/+22
bch2_btree_iter_downgrade() was looping over all iterators in a transaction; bch2_trans_downgrade() should be doing that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve warning for copygc failing to move dataKent Overstreet3-3/+20
This will help narrow down which code is at fault when this happens. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Always increment bucket gen on bucket reuseKent Overstreet2-21/+47
Not doing so confuses copygc Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill old allocator startup codeKent Overstreet5-257/+1
It's not needed anymore since we can now write to buckets before updating the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve assorted error messagesKent Overstreet5-136/+127
This also consolidates the various checks in bch2_mark_pointer() and bch2_trans_mark_pointer(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a deadlock in bch2_btree_node_get_sibling()Kent Overstreet4-11/+29
There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(), which can leave a btree node locked that is just outside iter->pos, breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally we should get rid of this corner case, but for now fix it locally with verbose comments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add debug code to print btree transactionsKent Overstreet8-5/+92
Intented to help debug deadlocks, since we can't use lockdep to check btree node lock ordering. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Set filesystem features earlier in fs init pathKent Overstreet1-5/+9
Before we were setting features after allocating btree nodes, which meant we were using the old btree pointer format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an option to disable reflink supportKent Overstreet4-0/+13
Reflink might be buggy, so we're adding an option so users can help bisect what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fixes for going ROKent Overstreet5-33/+60
Now that interior btree updates are fully transactional, we don't need to write out alloc info in a loop. However, interior btree updates do put more things in the journal, so we still need a loop in the RO sequence. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't require alloc btree to be updated before buckets are usedKent Overstreet2-16/+42
This is to break a circular dependency in the shutdown path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fsck_error_lock requires GFP_NOFSKent Overstreet1-1/+1
this fixes a lockdep splat Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Interior btree updates are now fully transactionalKent Overstreet21-626/+412
We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Factor out bch2_fs_btree_interior_update_init()Kent Overstreet3-11/+24
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a mechanism for passing extra journal entries to ↵Kent Overstreet4-4/+25
bch2_trans_commit() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix reading of alloc info after unclean shutdownKent Overstreet6-51/+114
When updates to interior nodes started being journalled, that meant that after an unclean shutdown, until journal replay is done we can't walk the btree without overlaying the updates from the journal. The initial btree gc was changed to walk the btree overlaying keys from the journal - but bch2_alloc_read() and bch2_stripes_read() were missed. Major whoops... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fix memalloc_nofs_restore() usageKent Overstreet1-1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Better error messages on bucket sector count overflowsKent Overstreet2-17/+26
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Be more rigorous about marking the filesystem cleanKent Overstreet2-3/+13
Previously, there was at least one error path where we could mark the filesystem clean when we hadn't sucessfully written out alloc info. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Handle printing of null bkeysKent Overstreet1-7/+14
This fixes a null ptr deref. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add vmalloc fallback for decompress workspaceKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Print out d_type in dirent_to_text()Kent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fix stack corruptionYuxuan Shui4-10/+11
When a bkey_on_stack is passed to bch_read_indirect_extent, there is no guarantee that it will be big enough to hold the bkey. And bch_read_indirect_extent is not aware of bkey_on_stack to call realloc on it. This cause a stack corruption. This commit makes bch_read_indirect_extent aware of bkey_on_stack so it can call realloc when appropriate. Tested-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Wrap vmap() in memalloc_nofs_save()/restore()Kent Overstreet1-1/+5
vmalloc() and vmap() don't take GFP_NOFS - this should be pushed further up the IO path, but for now just doing the simple fix. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix another iterator counting bugKent Overstreet1-1/+2
We were marking the end of where we could insert incorrectly for indirect extents. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix setquotaKent Overstreet1-29/+30
We were returning -EINTR because we were failing to retry the btree transaction. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a workqueue deadlockKent Overstreet2-2/+28
writes running out of a workqueue (via dio path) could block and prevent other writes from calling bch2_write_index() and completing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Validate that we read the correct btree nodeKent Overstreet1-0/+11
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fixes for startup on very full filesystemsKent Overstreet2-3/+16
- Always pass BTREE_INSERT_USE_RESERVE when writing alloc btree keys - Don't strand buckest on the copygc freelist until after recovery is done and we're starting copygc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix initialization of bounce mempoolsKent Overstreet1-9/+9
When they were converted to kvpmalloc pools they weren't converted to pass the actual size of the allocation. Oops. Also, validate the real length in the zstd decompression path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Some compression improvementsKent Overstreet1-16/+37
In __bio_map_or_bounce(), the check for if the bio is physically contiguous is improved; it's now more readable and handles multi page but contiguous bios. Also when decompressing, we were doing a redundant memcpy in the case where we were able to use vmap to map a bio contigiously. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix two more deadlocksKent Overstreet2-53/+64
Deadlock on shutdown: btree_update_nodes_written() unblocks btree nodes from being written; after doing so, it has to check if they were marked as needing to be written and if so kick off those writes - if that doesn't happen, we'll never release journal pins and shutdown will get stuck when flushing the journal. There was an error path where this didn't happen, because in the error path we don't actually want those btree nodes write to happen; however, we still have to kick off the write path so the journal pins get released. The btree write path checks if we're in a journal error state and doesn't do the actual write if we are. Also - there was another deadlock because btree_update_nodes_written() was taking the btree update off of the unwritten_list too soon - before getting a journal reservation, which could fail and have to be retried. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix another deadlock in btree_update_nodes_written()Kent Overstreet1-3/+38
We also can't be blocking on btree node write locks while holding btree_interior_update_lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add some printks for error pathsKent Overstreet2-7/+20
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't issue writes that are more than 1 MBKent Overstreet1-1/+12
the bcachefs io path in io.c can't bounce writes larger than that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>