aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-10-22bcachefs: Refactor bch2_fpunch_at()Kent Overstreet1-14/+9
This cleans up the error hanlding and flow control a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_assert_pos_locked()Kent Overstreet7-31/+88
This adds a new assertion to be used by bch2_inode_update_after_write(), which updates the VFS inode based on the update to the btree inode we just did - we require that the btree inode still be locked when we do that update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: path->should_be_locked fixesKent Overstreet2-10/+17
- We should only be clearing should_be_locked in btree_path_set_pos() - it's the responsiblity of the btree_path code, not the btree_iter code. - bch2_path_put() needs to pay attention to path->should_be_locked, to ensure we don't drop locks we're supposed to be keeping. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Clean up error reporting in the startup pathKent Overstreet1-90/+87
It used to be that error reporting in the startup path was done by returning strings describing the error, but that turned out to be a rather silly idea - if there's something we can describe about the error, just print it right away. This converts a good chunk of code to returning error codes, as is more typical style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Return -ENOKEY/EINVAL when mount decryption failsChris Webb1-16/+33
bch2_fs_encryption_init() correctly passes back -ENOKEY from request_key() when no unlock key is found, or -EINVAL if superblock decryption fails because of an invalid key. However, these get absorbed into a generic NULL return from bch2_fs_alloc() and later returned to user space as -ENOMEM, leading to a misleading error from mount(1): mount(2) system call failed: Out of memory. Return explicit error pointers out of bch2_fs_alloc() and handle them in both callers, so the user instead sees mount(2) system call failed: Required key not available. when attempting to mount a filesystem which is still locked. Signed-off-by: Chris Webb <chris@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix upgrade path for reflink_p fixKent Overstreet1-4/+8
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Switch fsync to use bi_journal_seqKent Overstreet9-96/+65
Now that we're recording in each inode the journal sequence number of the most recent update, fsync becomes a lot simpler and we can delete all the plumbing for ei_journal_seq. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill bucket quantiles sysfs codeKent Overstreet1-90/+0
We're getting rid of code that uses the in memory bucket array - and we now have better mechanisms for viewing most of what the bucket quantiles code gave us (especially internal fragmentation). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill journal buf bloom filterKent Overstreet5-75/+0
This was used for recording which inodes have been modified by in flight journal writes, but was broken and has been superceded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add journal_seq to inode & alloc keysKent Overstreet14-186/+307
Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Update inode on every writeKent Overstreet1-36/+40
This is going to be a performance regression until we get the btree key cache re-enabled - but it's needed for fixing fsync. Upcoming patches will record the journal_seq an inode was updated at in the inode itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BTREE_UPDATE_NOJOURNALKent Overstreet2-2/+7
We're going to have btree updates that don't need to be journalled; add a flag for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix __remove_dirent()Kent Overstreet1-1/+30
__lookup_inode() doesn't work for what __remove_dirent() wants - it just wants the first inode at a given inode number, they all have the same hash info. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix check_inodes()Kent Overstreet1-3/+2
We were starting at the wrong btree position, and thus not actually checking any inodes - oops. Also, make check_key_has_snapshot() a mustfix fsck error, since later fsck code assumes that all keys have valid snapshot IDs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve error message in bch2_write_super()Kent Overstreet1-1/+2
It's helpful to know what the superblock write is for. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix trans_lock_write()Kent Overstreet1-1/+2
On failure to get a write lock (because we had a conflicting read lock), we need to make sure to upgrade the read lock to an intent lock - or we could end up spinning. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix upgrade_readers()Kent Overstreet3-3/+15
The bch2_btree_path_upgrade() call was failing and tripping an assert - path->level + 1 is in this case not necessarily exactly what we want, fix it by upgrading exactly the locks we want. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix faulty assertionKent Overstreet1-1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: BTREE_TRIGGER_INSERT now only means insertKent Overstreet3-25/+6
This allows triggers to distinguish between a key entering the btree - i.e. being called from the trans commit path - vs. being called on a key that already exists, i.e. by GC. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Convert bch2_mark_key() to take a btree_trans *Kent Overstreet10-111/+152
This helps to unify the interface between bch2_mark_key() and bch2_trans_mark_key() - and it also gives access to the journal reservation and journal seq in the mark_key path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Assorted ec fixesKent Overstreet6-22/+32
- The backpointer that ec_stripe_update_ptrs() uses now needs to include the snapshot ID, which means we have to change where we add the backpointer to after getting the snapshot ID for the new extents - ec_stripe_update_ptrs() needs to be calling bch2_trans_begin() - improve error message in bch2_mark_stripe() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix bch2_mark_update()Kent Overstreet1-0/+6
When the old or new key doesn't exist, we should still pass in a deleted key with the correct pos. This fixes a bug in the ec code, when bch2_mark_stripe() was looking up the wrong in-memory stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Ensure journal doesn't get stuck in nochanges modeKent Overstreet5-3/+10
This tweaks the journal code to always act as if there's space available in nochanges mode, when we're not going to be doing any writes. This helps in recovering filesystems that won't mount because they need journal replay and the journal has gotten stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve transaction restart handling in fsck codeKent Overstreet4-290/+291
The fsck code has been handling transaction restarts locally, to avoid calling fsck_err() multiple times (and asking the user/logging the error multiple times) on transaction restart. However, with our improving assertions about iterator validity, this isn't working anymore - the code wasn't entirely correct, in ways that are fine for now but are going to matter once we start wanting online fsck. This code converts much of the fsck code to handle transaction restarts in a more rigorously correct way - moving restart handling up to the top level of check_dirent, check_xattr and others - at the cost of logging errors multiple times on transaction restart. Fixing the issues with logging errors multiple times is probably going to require memoizing calls to fsck_err() - we'll leave that for future improvements. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix bch2_btree_iter_advance()Kent Overstreet1-1/+3
Was popping an assertion on !BTREE_ITER_ALL_SNAPSHOTS iters when getting to the end. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Move bch2_evict_subvolume_inodes() to fs.cKent Overstreet4-64/+61
This fixes building in userspace - code that's coupled to the kernel VFS interface should live in fs.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't do upgrades in nochanges modeKent Overstreet1-9/+11
nochanges mode is often used for getting data off of otherwise nonrecoverable filesystems, which is often because of errors hit during fsck. Don't force version upgrade & fsck in nochanges mode, so that it's more likely to mount. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Drop bch2_journal_meta() call when going RWKent Overstreet1-7/+0
Back when we relied on the journal sequence number blacklist machinery for consistency between btree and the journal, we needed to ensure a new journal entry was written before any btree writes were done. But, this had the side effect of consuming some space in the journal prior to doing journal replay - which could lead to a very wedged filesystem, since we don't yet have a way to grow the journal prior to going RW. Fortunately, the journal sequence number blacklist machinery isn't needed anymore, as btree node pointers now record the numer of sectors currently written to that node - that code should all be ripped out. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add BCH_SUBVOLUME_UNLINKEDKent Overstreet12-51/+223
Snapshot deletion needs to become a multi step process, where we unlink, then tear down the page cache, then delete the subvolume - the deleting flag is equivalent to an inode with i_nlink = 0. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve error messages in trans_mark_reflink_p()Kent Overstreet1-4/+7
We should always print out the key we were marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't run triggers in fix_reflink_p_key()Kent Overstreet1-1/+1
It seems some users have reflink pointers which span many indirect extents, from a short window in time when merging of reflink pointers was allowed. Now, we're seeing transaction path overflows in fix_reflink_p(), the code path to clear out the reflink_p fields now used for front/back pad - but, we don't actually need to be running triggers in that path, which is an easy partial fix. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: More general fix for transaction paths overflowKent Overstreet1-2/+3
for_each_btree_key() now calls bch2_trans_begin() as needed; that means, we can also call it when we're in danger of overflowing transaction paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix fsck path for refink pointersKent Overstreet1-76/+46
The way __bch2_mark_reflink_p returns errors was clashing with returning the number of sectors processed - we weren't returning FSCK_ERR_EXIT correctly. Fix this by only using the return code for errors, which actually ends up simplifying the overall logic. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Ensure we flush btree updates in evacuate pathKent Overstreet1-4/+4
This fixes a possible race where we fail to remove a device because of btree nodes still on it, that are being deleted by in flight btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_btree_node_rewrite() now returns transaction restartsKent Overstreet4-31/+37
We have been getting away from handling transaction restarts locally - convert bch2_btree_node_rewrite() to the newer style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix bch2_btree_iter_next_node()Kent Overstreet1-16/+34
We were modifying state, then return -EINTR, causing us to skip nodes - ouch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Must check for errors from bch2_trans_cond_resched()Kent Overstreet7-25/+15
But we don't need to call it from outside the btree iterator code anymore, since it's called by bch2_trans_begin() and bch2_btree_path_traverse(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix dev accounting after device addKent Overstreet1-0/+12
This is a hacky but effective fix to device usage stats for superblock and journal being wrong on a newly added device (following the comment that already told us how it needed to be done!) Reported-by: Chris Webb <chris@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix a transaction path overflowKent Overstreet1-0/+9
readdir() in a directory with many subvolumes could overflow transaction paths - this is a simple hack around the issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix error handling in bch2_trans_extent_mergingKent Overstreet1-3/+6
The back merging case wasn't returning errors correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Subvol dirents are now only visible in parent subvolKent Overstreet6-129/+232
This changes the on disk format for dirents that point to subvols so that they also record the subvolid of the parent subvol, so that we can filter them out in other subvolumes. This also updates the dirent code to do that filtering, and in particular tweaks the rename code - we need to ensure that there's only ever one dirent (counting multiplicities in different snapshots) that point to a subvolume. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix restart handling in for_each_btree_key()Kent Overstreet11-65/+83
Code that uses for_each_btree_key often wants transaction restarts to be handled locally and not returned. Originally, we wouldn't return transaction restarts if there was a single iterator in the transaction - the reasoning being if there weren't other iterators being invalidated, and the current iterator was being advanced/retraversed, there weren't any locks or iterators we were required to preserve. But with the btree_path conversion that approach doesn't work anymore - even when we're using for_each_btree_key() with a single iterator there will still be two paths in the transaction, since we now always preserve the path at the pos the iterator was initialized at - the reason being that on restart we often restart from the same place. And it turns out there's now a lot of for_each_btree_key() uses that _do not_ want transaction restarts handled locally, and should be returning them. This patch splits out for_each_btree_key_norestart() and for_each_btree_key_continue_norestart(), and converts existing users as appropriate. for_each_btree_key(), for_each_btree_key_continue(), and for_each_btree_node() now handle transaction restarts themselves by calling bch2_trans_begin() when necessary - and the old hack to not return transaction restarts when there's a single path in the transaction has been deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: cached data shouldn't prevent fs from mountingKent Overstreet1-0/+3
It's not an error if we don't have cached data - skip BCH_DATA_cached in bch2_have_enough_devs(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Delete dentry when deleting snapshotsKent Overstreet1-1/+8
This fixes a bug where subsequently doing creates with the same name fails. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix check_path() for snapshotsKent Overstreet1-19/+45
check_path() wasn't checking the snapshot ID when checking for directory structure loops - so, snapshots would cause us to detect a loop that wasn't actually a loop. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix for leaking of reflinked extentsKent Overstreet4-11/+58
When a reflink pointer points to only part of an indirect extent, and then that indirect extent is fragmented (e.g. by copygc), if the reflink pointer only points to one of the fragments we leak a reference. Fix this by storing front/back pad values in reflink pointers - when inserting reflink pointesr, we initialize them to cover the full range of the indirect extents we reference. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: New on disk format to fix reflink_p pointersKent Overstreet3-11/+73
We had a bug where reflink_p pointers weren't being initialized to 0, and when we started using the second word, things broke badly. This patch revs the on disk format version and adds cleanup code to zero out the second word of reflink_p pointers before we start using it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Handle transaction restarts in bch2_blacklist_entries_gc()Kent Overstreet3-6/+17
It shouldn't be necessary when we're only using a single iterator and not doing updates, but that's harder to debug at the moment. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_trans_exit() no longer returns errorsKent Overstreet15-29/+31
Now that peek_node()/next_node() are converted to return errors directly, we don't need bch2_trans_exit() to return errors - it's cleaner this way and wasn't used much anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: for_each_btree_node() now returns errors directlyKent Overstreet8-21/+39
This changes for_each_btree_node() to work like for_each_btree_key(), and to that end bch2_btree_iter_peek_node() and next_node() also return error ptrs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>