aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/bcachefs_format.h
AgeCommit message (Collapse)AuthorFilesLines
2024-06-20bcachefs: Fix safe errors by defaultKent Overstreet1-2/+3
i.e. the start of automatic self healing: If errors=continue or fix_safe, we now automatically fix simple errors without user intervention. New error action option: fix_safe This replaces the existing errors=ro option, which gets a new slot, i.e. existing errors=ro users now get errors=fix_safe. This is currently only enabled for a limited set of errors - initially just disk accounting; errors we would never not want to fix, and we don't want to require user intervention (i.e. to make sure a bug report gets filed). Errors will still be counted in the superblock, so we (developers) will still know they've been occuring if a bug report gets filed (as bug reports typically include the errors superblock section). Eventually we'll be enabling this for a much wider set of errors, after we've done thorough error injection testing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Guard against overflowing LRU_TIME_BITSKent Overstreet1-0/+3
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need overflow checks and guards. Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix btree ID bitmasksKent Overstreet1-2/+3
these should be 64 bit bitmasks, not 32 bit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Split out sb-errors_format.hKent Overstreet1-11/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Split out journal_seq_blacklist_format.hKent Overstreet1-10/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Split out replicas_format.hKent Overstreet1-32/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Split out disk_groups_format.hKent Overstreet1-18/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: split out sb-downgrade_format.hKent Overstreet1-13/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: split out sb-members_format.hKent Overstreet1-101/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Fix shift overflow in btree_lost_data()Kent Overstreet1-0/+6
Reported-by: syzbot+29f65db1a5fe427b5c56@syzkaller.appspotmail.com Fixes: 55936afe1107 ("bcachefs: Flag btrees with missing data") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.hPetr Vorel1-1/+2
Move BCACHEFS_STATFS_MAGIC value to UAPI <linux/magic.h> under BCACHEFS_SUPER_MAGIC definition (use common approach for name) and reuse the definition in bcachefs_format.h BCACHEFS_STATFS_MAGIC. There are other bcachefs magic definitions: BCACHE_MAGIC, BCHFS_MAGIC, which use UUID_INIT() and are used only in libbcachefs. Therefore move only BCACHEFS_STATFS_MAGIC value, which can be used outside of libbcachefs for f_type field in struct statfs in statfs() or fstatfs(). Suggested-by: Su Yue <glass.su@suse.com> Signed-off-by: Petr Vorel <pvorel@suse.cz> Acked-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch_member.last_journal_bucketKent Overstreet1-0/+7
On recovery from clean shutdown we don't typically read the journal, but we still want to avoid overwriting existing entries in the journal for list_journal debugging. Thus, add some fields to the member info section so we can remember where we left off. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAXKent Overstreet1-0/+2
Define a constant for the max superblock size, to avoid a too-large shift. Reported-by: syzbot+a8b0fb419355c91dda7f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: Add a better limit for maximum number of bucketsKent Overstreet1-0/+6
The bucket_gens array is a single array allocation (one byte per bucket), and kernel allocations are still limited to INT_MAX. Check this limit to avoid failing the bucket_gens array allocation. Reported-by: syzbot+b29f436493184ea42e2b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17bcachefs: KEY_TYPE_error is allowed for reflinkKent Overstreet1-1/+2
KEY_TYPE_error is left behind when we have to delete all pointers in an extent in fsck; it allows errors to be correctly returned by reads later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: bch_member.btree_allocated_bitmapKent Overstreet1-2/+5
This adds a small (64 bit) per-device bitmap that tracks ranges that have btree nodes, for accelerating btree node scan if it is ever needed. - New helpers, bch2_dev_btree_bitmap_marked() and bch2_dev_bitmap_mark(), for checking and updating the bitmap - Interior btree update path updates the bitmaps when required - The check_allocations pass has a new fsck_err check, btree_bitmap_not_marked - New on disk format version, mi_btree_mitmap, which indicates the new bitmap is present - Upgrade table lists the required recovery pass and expected fsck error - Btree node scan uses the bitmap to skip ranges if we're on the new version Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Standardize helpers for printing enum strs with bounds checksKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09bcachefs: Don't scan for btree nodes when we can reconstructKent Overstreet1-0/+14
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Flag btrees with missing dataKent Overstreet1-0/+1
We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: omit alignment attribute on big endian struct bkeyThomas Bertschinger1-2/+35
This is needed for building Rust bindings on big endian architectures like s390x. Currently this is only done in userspace, but it might happen in-kernel in the future. When creating a Rust binding for struct bkey, the "packed" attribute is needed to get a type with the correct member offsets in the big endian case. However, rustc does not allow types to have both a "packed" and "align" attribute. Thus, in order to get a Rust type compatible with the C type, we must omit the "aligned" attribute in C. This does not affect the struct's size or member offsets, only its toplevel alignment, which should be an acceptable impact. The little endian version can have the "align" attribute because the "packed" attr is redundant, and rust-bindgen will omit the "packed" attr when an "align" attr is present and it can do so without changing a type's layout Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: BTREE_ID_subvolume_childrenKent Overstreet1-2/+5
Add a btree to record a parent -> child subvolume relationships, according to the filesystem heirarchy. The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode. This will be used for efficiently listing subvolumes, as well as recursive deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: bch_subvolume::fs_path_parentKent Overstreet1-1/+2
Record the filesystem path heirarchy for subvolumes in bch_subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10bcachefs: jset_entry_datetimeKent Overstreet1-1/+7
This gives us a way to record the date and time every journal entry was written - useful for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: logged_ops_format.hKent Overstreet1-27/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: reflink_format.hKent Overstreet1-47/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs; extents_format.hKent Overstreet1-279/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: ec_format.hKent Overstreet1-16/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: subvolume_format.hKent Overstreet1-32/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: snapshot_format.hKent Overstreet1-33/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: alloc_background_format.hKent Overstreet1-93/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: xattr_format.hKent Overstreet1-15/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: dirent_format.hKent Overstreet1-39/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: inode_format.hKent Overstreet1-164/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs; quota_format.hKent Overstreet1-42/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: sb-counters_format.hKent Overstreet1-95/+2
bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: comment bch_subvolumeKent Overstreet1-0/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bch_snapshot::btimeKent Overstreet1-0/+1
Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Upgrades now specify errors to fix, like downgradesKent Overstreet1-56/+26
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: bch_member->seqKent Overstreet1-2/+6
Add new fields for split brain detection: - bch_member->seq, which tracks the sequence number of the last superblock write that happened to each member device - bch_sb->write_time, which tracks the time of the last superblock write, to allow detection of when two members have diverged but had the same number of superblock writes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Check journal entries for invalid keys in trans commit pathKent Overstreet1-0/+13
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: fix userspace build errorsKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree write buffer now slurps keys from journalKent Overstreet1-1/+2
Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Improve btree write buffer tracepointsKent Overstreet1-1/+3
- add a tracepoint for write_buffer_flush_sync; this is expensive - fix the write_buffer_flush_slowpath tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Kill dev_usage->buckets_ecKent Overstreet1-2/+2
This counter is redundant; it's simply the sum of BCH_DATA_stripe and BCH_DATA_parity buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1Kent Overstreet1-3/+3
Prep work for introducing bch_replicas_entry_v2 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Kill memset() in bch2_btree_iter_init()Kent Overstreet1-0/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_sb_field_downgradeKent Overstreet1-1/+19
Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } that is - a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also from downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every jounal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_sb.recovery_passes_requiredKent Overstreet1-13/+20
Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: trace_move_extent_start_fail() now includes errcodeKent Overstreet1-1/+1
Renamed from trace_move_extent_alloc_mem_fail, because there are other reasons we colud fail (disk space allocation failure). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25bcachefs: bpos is misaligned on big endianKent Overstreet1-1/+5
bkey embeds a bpos that is misaligned on big endian; this is so that bch2_bkey_swab() works correctly without having to differentiate between packed and non-packed keys (a debatable design decision). This means it can't have the __aligned() tag on big endian. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>