aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/inode.c
AgeCommit message (Collapse)AuthorFilesLines
2023-10-22bcachefs: Inode create no longer needs to probe key cacheKent Overstreet1-24/+4
Now that we have full key cache coherency, we can simplify bch2_inode_create(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: btree_id_cached()Kent Overstreet1-10/+5
Add a new helper that returns true if the given btree ID uses the btree key cache. This enables some new cleanups, since the helper can check the options for whether caching is enabled on a given btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Simplify bch2_inode_delete_keys()Kent Overstreet1-35/+22
Had a bug report that implies bch2_inode_delete_keys() returned -EINTR before it completed, so this patch simplifies it and makes the flow control a little more conventional. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix a null ptr deref in bch2_inode_delete_keys()Kent Overstreet1-1/+5
Similarly to bch2_btree_delete_range_trans(), bch2_inode_delete_keys() may sometimes split compressed extents, and needs to pass in a disk reservation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix debug build in userspaceKent Overstreet1-10/+0
This fixes some compiler warnings that only trigger in userspace - dead code, a maybe uninitialed variable, a maybe null ptr passed to printk. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix missing field initializationKent Overstreet1-0/+1
When unpacking v1 inodes, we were failing to initialize the journal_seq field, leading to a BUG_ON() when fsync tries to flush a garbage journal sequence number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_assert_pos_locked()Kent Overstreet1-3/+3
This adds a new assertion to be used by bch2_inode_update_after_write(), which updates the VFS inode based on the update to the btree inode we just did - we require that the btree inode still be locked when we do that update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add journal_seq to inode & alloc keysKent Overstreet1-108/+103
Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add BCH_SUBVOLUME_UNLINKEDKent Overstreet1-5/+1
Snapshot deletion needs to become a multi step process, where we unlink, then tear down the page cache, then delete the subvolume - the deleting flag is equivalent to an inode with i_nlink = 0. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Plumb through subvolume idKent Overstreet1-19/+90
To implement snapshots, we need every filesystem btree operation (every btree operation without a subvolume) to start by looking up the subvolume and getting the current snapshot ID, with bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode. This patch adds those bch2_subvolume_get_snapshot() calls, and also switches to passing around a subvol_inum instead of just an inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Subvolumes, snapshotsKent Overstreet1-2/+13
This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: btree_pathKent Overstreet1-31/+30
This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add flags field to bch2_inode_to_text()Kent Overstreet1-6/+17
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Always check for transaction restartsKent Overstreet1-1/+1
On transaction restart iterators won't be locked anymore - make sure we're always checking for errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add an option for whether inodes use the key cacheKent Overstreet1-7/+10
We probably don't ever want to flip this off in production, but it may be useful for certain kinds of testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add safe versions of varint encode/decodeKent Overstreet1-3/+3
This adds safe versions of bch2_varint_(encode|decode) that don't read or write past the end of the buffer, or varint being encoded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Kill bch2_btree_iter_peek_cached()Kent Overstreet1-10/+7
It's now been rolled into bch2_btree_iter_peek_slot() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Preallocate transaction memKent Overstreet1-1/+1
This helps avoid transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Check for errors from bch2_trans_update()Kent Overstreet1-5/+3
Upcoming refactoring is going to change bch2_trans_update() to start returning transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix pathalogical behaviour with inode sharding by cpu IDKent Overstreet1-3/+1
If the transactior restarts on a different CPU, it could end up needing to read in a different btree node, which makes another transaction restart more likely... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add an option to control sharding new inode numbersKent Overstreet1-7/+14
We're seeing a bug where inode creates end up spinning in bch2_inode_create - disabling sharding will simplify what we're testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: properly initialize used valuesDan Robertson1-1/+1
- Ensure the second key value in bch_hash_info is initialized to zero if the info type is of type BCH_STR_HASH_SIPHASH. - Initialize the possibly returned value in bch2_inode_create. Assuming bch2_btree_iter_peek returns bkey_s_c_null, the uninitialized value of ret could be returned to the user as an error pointer. - Fix compiler warning in initialization of bkey_s_c_stripe fs/bcachefs/buckets.c:1646:35: warning: suggest braces around initialization of subobject [-Wmissing-braces] struct bkey_s_c_stripe new_s = { NULL }; ^~~~ Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improved check_directory_structure()Kent Overstreet1-26/+5
Now that we have inode backpointers, we can simplify checking directory structure: instead of doing a DFS from the filesystem root and then checking if we found everything, we can iterate over every inode and see if we can go up until we get to the root. This patch also has a number of fixes and simplifications for the inode backpointer checks. Also, it turns out we don't actually need the BCH_INODE_BACKPTR_UNTRUSTED flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change inode allocation code for snapshotsKent Overstreet1-23/+55
For snapshots, when we allocate a new inode we want to allocate an inode number that isn't in use in any other subvolume. We won't be able to use ITER_SLOTS for this, inode allocation needs to change to use BTREE_ITER_ALL_SNAPSHOTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inode backpointersKent Overstreet1-13/+5
This patch adds two new inode fields, bi_dir and bi_dir_offset, that point back to the inode's dirent. Since we're only adding fields for a single backpointer, files that have been hardlinked won't necessarily have valid backpointers: we also add a new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has ever had multiple links to it. That's ok, because we only really need this functionality for directories, which can never have multiple hardlinks - when we add subvolumes, we'll need a way to enemurate and print subvolumes, and this will let us reconstruct a path to a subvolume root given a subvolume root inode. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Start using bpos.snapshot fieldKent Overstreet1-0/+1
This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve inode deletion codeKent Overstreet1-31/+14
It had some silly redundancies. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Require all btree iterators to be freedKent Overstreet1-0/+1
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename BTREE_ID enums for consistency with other enumsKent Overstreet1-9/+9
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't use inode btree key cache in fsck codeKent Overstreet1-4/+15
We had a cache coherency bug with the btree key cache in the fsck code - this fixes fsck to be consistent about not using it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a shift greater than type sizeKent Overstreet1-1/+1
Found by UBSAN Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_get_iter() no longer returns errorsKent Overstreet1-6/+0
Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_btree_delete_range_trans()Kent Overstreet1-10/+10
This helps reduce stack usage by avoiding multiple btree_trans on the stack. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't use bkey cache for inode update in fsckKent Overstreet1-4/+10
fsck doesn't know about the btree key cache, and non-cached iterators aren't cache coherent (yet?) Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inode delete doesn't need to flush key cache anymoreKent Overstreet1-9/+2
Inode create checks to make sure the slot doesn't exist in the btree key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix btree iterator leakKent Overstreet1-1/+3
this fixes an occasonial btree transaction iterators overflow. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New varintsKent Overstreet1-54/+133
Previous varint implementation used by the inode code was not nearly as fast as it could have been; partly because it was attempting to encode integers up to 96 bits (for timestamps) but this meant that encoding and decoding the length required a table lookup. Instead, we'll just encode timestamps greater than 64 bits as two separate varints; this will make decoding/encoding of inodes significantly faster overall. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improved inode create optimizationKent Overstreet1-92/+47
This shards new inodes into different btree nodes by using the processor ID for the high bits of the new inode number. Much faster than the previous inode create optimization - this also helps with sharding in the other btrees that index by inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inode create optimizationKent Overstreet1-44/+93
On workloads that do a lot of multithreaded creates all at once, lock contention on the inodes btree turns out to still be an issue. This patch adds a small buffer of inode numbers that are known to be free, so that we can avoid touching the btree on every create. Also, this changes inode creates to update via the btree key cache for the initial create. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use cached iterators for inode updatesKent Overstreet1-40/+64
This switches inode updates to use cached btree iterators - which should be a nice performance boost, since lock contention on the inodes btree can be a bottleneck on multithreaded workloads. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add mode to bch2_inode_to_textKent Overstreet1-0/+2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bkey_type_successorKent Overstreet1-16/+16
Previously, BTREE_ID_INODES was special - inodes were indexed by the inode field, which meant the offset field of struct bpos wasn't used, which led to special cases in e.g. the btree iterator code. Now, inodes in the inodes btree are indexed by the offset field. Also: prevously min_key was special for extents btrees, min_key for extents would equal max_key for the previous node. Now, min_key = bkey_successor() of the previous node, same as non extent btrees. This means we can completely get rid of btree_type_sucessor/predecessor. Also make some improvements to the metadata IO validate/compat code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make sure we're releasing btree iteratorsKent Overstreet1-36/+24
This wasn't originally required, but this is the model we're moving towards. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out btree_trigger_flagsKent Overstreet1-3/+3
The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_INSERT_ATOMICKent Overstreet1-1/+0
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert all bch2_trans_commit() users to BTREE_INSERT_ATOMICKent Overstreet1-1/+1
BTREE_INSERT_ATOMIC should really be the default mode, and there's not that much code that doesn't need it - so this is prep work for getting rid of the flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Initialize padding space after alloc bkeyJustin Husted1-4/+4
Packed bkeys are padded up to 64 bit alignment, but the alloc bkey type was not clearing the pad bytes after the last data byte. This left the key possibly containing some random garbage at the end. This problem was found using valgrind. This patch also changes a path with the inode bkey to clear in the same way. Signed-off-by: Justin Husted <sigstop@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add missing error checking in bch2_find_by_inum_trans()Kent Overstreet1-3/+8
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Factor out fs-common.cKent Overstreet1-14/+25
This refactoring makes the code easier to understand by separating the bcachefs btree transactional code from the linux VFS code - but more importantly, it's also to share code with the fuse port. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't use sha256 for siphash str hash keyKent Overstreet1-3/+4
With the refactoring that's coming to add fuse support, we want bch2_hash_info_init() to be cheaper so we don't have to rely on anything cached besides the inode in the btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>