aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)AuthorFilesLines
2024-11-12bcachefs: Fix assertion pop in bch2_ptr_swab()Kent Overstreet1-1/+4
This runs on extents that haven't yet been validated, so we don't want to assert that we have a valid entry type. Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-11-12bcachefs: Fix journal_entry_dev_usage_to_text() overrunKent Overstreet1-0/+3
If the jset_entry_dev_usage is malformed, and too small, our nr_entries calculation will be incorrect - just bail out. Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-11-11bcachefs: Allow for unknown key types in backpointers fsckKent Overstreet1-4/+6
We can't assume that btrees only contain keys of a given type - even if they only have a single key type listed in the allowed key types for that btree; this is a forwards compatibility issue. Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-11-11bcachefs: Fix assertion pop in topology repairKent Overstreet2-2/+3
Fixes: baefd3f849ed ("bcachefs: btree_cache.freeable list fixes") Signed-off-by: Kent Overstreet <[email protected]>
2024-11-08bcachefs: Fix hidden btree errors when reading rootsKent Overstreet2-0/+7
We silence btree errors in btree_node_scan, since it's probing and errors are expected: add a fake pass so that btree_node_scan is no longer recovery pass 0, and we don't think we're in btree node scan when reading btree roots. Signed-off-by: Kent Overstreet <[email protected]>
2024-11-08bcachefs: Fix validate_bset() repair pathKent Overstreet1-5/+1
When we truncate a bset (due to it extending past the end of the btree node), we can't skip the rest of the validation for e.g. the packed format (if it's the first bset in the node). Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-11-08bcachefs: Fix missing validation for bch_backpointer.levelKent Overstreet2-2/+11
This fixes an assertion pop where we try to navigate to the target of the backpointer, and the path level isn't what we expect. Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: Fix bch_member.btree_bitmap_shift validationKent Overstreet2-2/+8
Needs to match the assert later when we resize... Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: bch2_btree_write_buffer_flush_going_ro()Kent Overstreet3-3/+29
The write buffer needs to be specifically flushed when going RO: keys in the journal that haven't yet been moved to the write buffer don't have a journal pin yet. This fixes numerous syzbot bugs, all with symptoms of still doing writes after we've got RO. Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: Fix UAF in __promote_alloc() error pathKent Overstreet1-1/+2
If we error in data_update_init() after adding to the rhashtable of outstanding promotes, kfree_rcu() is required. Reported-by: Reed Riley <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: Change OPT_STR max to be 1 less than the size of choices arrayPiotr Zalewski1-2/+2
Change OPT_STR max value to be 1 less than the "ARRAY_SIZE" of "_choices" array. As a result, remove -1 from (opt->max-1) in bch2_opt_to_text. The "_choices" array is a null-terminated array, so computing the maximum using "ARRAY_SIZE" without subtracting 1 yields an incorrect result. Since bch2_opt_validate don't subtract 1, as bch2_opt_to_text does, values bigger than the actual maximum would pass through option validation. Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=bee87a0c3291c06aa8c6 Fixes: 63c4b2545382 ("bcachefs: Better superblock opt validation") Suggested-by: Kent Overstreet <[email protected]> Signed-off-by: Piotr Zalewski <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: btree_cache.freeable list fixesKent Overstreet3-48/+67
When allocating new btree nodes, we were leaving them on the freeable list - unlocked - allowing them to be reclaimed: ouch. Additionally, bch2_btree_node_free_never_used() -> bch2_btree_node_hash_remove was putting it on the freelist, while bch2_btree_node_free_never_used() was putting it back on the btree update reserve list - ouch. Originally, the code was written to always keep btree nodes on a list - live or freeable - and this worked when new nodes were kept locked. But now with the cycle detector, we can't keep nodes locked that aren't tracked by the cycle detector; and this is fine as long as they're not reachable. We also have better and more robust leak detection now, with memory allocation profiling, so the original justification no longer applies. Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: check the invalid parameter for perf testHongbo Li1-0/+5
The perf_test does not check the number of iterations and threads when it is zero. If nr_thread is 0, the perf test will keep waiting for wakekup. If iteration is 0, it will cause exception of division by zero. This can be reproduced by: echo "rand_insert 0 1" > /sys/fs/bcachefs/${uuid}/perf_test or echo "rand_insert 1 0" > /sys/fs/bcachefs/${uuid}/perf_test Fixes: 1c6fdbd8f246 ("bcachefs: Initial commit") Signed-off-by: Hongbo Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: add check NULL return of bio_kmalloc in journal_read_bucketPei Xiao2-0/+3
bio_kmalloc may return NULL, will cause NULL pointer dereference. Add check NULL return for bio_kmalloc in journal_read_bucket. Signed-off-by: Pei Xiao <[email protected]> Fixes: ac10a9611d87 ("bcachefs: Some fixes for building in userspace") Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: Ensure BCH_FS_may_go_rw is set before exiting recoveryKent Overstreet2-0/+13
If BCH_FS_may_go_rw is not yet set, it indicates to the transaction commit path that updates should be done via the list of journal replay keys. This must be set before multithreaded use commences. Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: Fix topology errors on split after mergeKent Overstreet1-3/+14
If a btree split picks a pivot that's being deleted by a btree node merge, we're going to have problems. Fix this by checking if the pivot is being deleted, the same as we check for deletions in journal replay keys. Found by single_devic.ktest small_nodes. Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: Ancient versions with bad bkey_formats are no longer supportedKent Overstreet1-4/+3
Syzbot found an assertion pop, by generating an ancient filesystem version with an invalid bkey_format (with fields that can overflow) as well as packed keys that aren't representable unpacked. This breaks key comparisons in all sorts of painful ways. Filesystems have been automatically rewriting nodes with such invalid formats for years; we can safely drop support for them. Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: Fix error handling in bch2_btree_node_prefetch()Kent Overstreet1-2/+5
Signed-off-by: Kent Overstreet <[email protected]>
2024-11-07bcachefs: Fix null ptr deref in bucket_gen_get()Kent Overstreet4-18/+17
bucket_gen() checks if we're lookup up a valid bucket and returns NULL otherwise, but bucket_gen_get() was failing to check; other callers were correct. Also do a bit of cleanup on callers. Signed-off-by: Kent Overstreet <[email protected]>
2024-11-01Merge tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefsLinus Torvalds15-38/+160
Pull bcachefs fixes from Kent Overstreet: "Various syzbot fixes, and the more notable ones: - Fix for pointers in an extent overflowing the max (16) on a filesystem with many devices: we were creating too many cached copies when moving data around. Now, we only create at most one cached copy if there's a promote target set. Caching will be a bit broken for reflinked data until 6.13: I have larger series queued up which significantly improves the plumbing for data options down into the extent (bch_extent_rebalance) to fix this. - Fix for deadlock on -ENOSPC on tiny filesystems Allocation from the partial open_bucket list wasn't correctly accounting partial open_buckets as free: this fixes the main cause of tests timing out in the automated tests" * tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefs: bcachefs: Fix NULL ptr dereference in btree_node_iter_and_journal_peek bcachefs: fix possible null-ptr-deref in __bch2_ec_stripe_head_get() bcachefs: Fix deadlock on -ENOSPC w.r.t. partial open buckets bcachefs: Don't filter partial list buckets in open_buckets_to_text() bcachefs: Don't keep tons of cached pointers around bcachefs: init freespace inited bits to 0 in bch2_fs_initialize bcachefs: Fix unhandled transaction restart in fallocate bcachefs: Fix UAF in bch2_reconstruct_alloc() bcachefs: fix null-ptr-deref in have_stripes() bcachefs: fix shift oob in alloc_lru_idx_fragmentation bcachefs: Fix invalid shift in validate_sb_layout()
2024-10-29bcachefs: Fix NULL ptr dereference in btree_node_iter_and_journal_peekPiotr Zalewski1-0/+13
Add NULL check for key returned from bch2_btree_and_journal_iter_peek in btree_node_iter_and_journal_peek to avoid NULL ptr dereference in bch2_bkey_buf_reassemble. When key returned from bch2_btree_and_journal_iter_peek is NULL it means that btree topology needs repair. Print topology error message with position at which node wasn't found, its parent node information and btree_id with level. Return error code returned by bch2_topology_error to ensure that topology error is handled properly by recovery. Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=005ef9aa519f30d97657 Fixes: 5222a4607cd8 ("bcachefs: BTREE_ITER_WITH_JOURNAL") Suggested-by: Alan Huang <[email protected]> Suggested-by: Kent Overstreet <[email protected]> Signed-off-by: Piotr Zalewski <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-29bcachefs: fix possible null-ptr-deref in __bch2_ec_stripe_head_get()Gaosheng Cui2-0/+5
The function ec_new_stripe_head_alloc() returns nullptr if kzalloc() fails. It is crucial to verify its return value before dereferencing it to avoid a potential nullptr dereference. Fixes: 035d72f72c91 ("bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices") Signed-off-by: Gaosheng Cui <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-29bcachefs: Fix deadlock on -ENOSPC w.r.t. partial open bucketsKent Overstreet2-1/+16
Open buckets on the partial list should not count as allocated when we're trying to allocate from the partial list. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-29bcachefs: Don't filter partial list buckets in open_buckets_to_text()Kent Overstreet1-2/+1
these are an important source of stranded buckets we need to be able to watch Signed-off-by: Kent Overstreet <[email protected]>
2024-10-29bcachefs: Don't keep tons of cached pointers aroundKent Overstreet5-28/+89
We had a bug report where the data update path was creating an extent that failed to validate because it had too many pointers; almost all of them were cached. To fix this, we have: - want_cached_ptr(), a new helper that checks if we even want a cached pointer (is on appropriate target, device is readable). - bch2_extent_set_ptr_cached() now only sets a pointer cached if we want it. - bch2_extent_normalize_by_opts() now ensures that we only have a single cached pointer that we want. While working on this, it was noticed that this doesn't work well with reflinked data and per-file options. Another patch series is coming that plumbs through additional io path options through bch_extent_rebalance, with improved option handling. Reported-by: Reed Riley <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-29bcachefs: init freespace inited bits to 0 in bch2_fs_initializePiotr Zalewski1-0/+9
Initialize freespace_initialized bits to 0 in member's flags and update member's cached version for each device in bch2_fs_initialize. It's possible for the bits to be set to 1 before fs is initialized and if call to bch2_trans_mark_dev_sbs (just before bch2_fs_freespace_init) fails bits remain to be 1 which can later indirectly trigger BUG condition in bch2_bucket_alloc_freelist during shutdown. Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=2b6a17991a6af64f9489 Fixes: bbe682c76789 ("bcachefs: Ensure devices are always correctly initialized") Suggested-by: Kent Overstreet <[email protected]> Signed-off-by: Piotr Zalewski <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-29bcachefs: Fix unhandled transaction restart in fallocateKent Overstreet1-4/+13
This used to not matter, but now we're being more strict. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-25bcachefs: Fix UAF in bch2_reconstruct_alloc()Kent Overstreet1-3/+2
write_super() -> sb_counters_from_cpu() may reallocate the superblock Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-10-25bcachefs: fix null-ptr-deref in have_stripes()Jeongjun Park1-0/+3
c->btree_roots_known[i].b can be NULL. In this case, a NULL pointer dereference occurs, so you need to add code to check the variable. Reported-by: [email protected] Fixes: 7773df19c35f ("bcachefs: metadata version bucket_stripe_sectors") Signed-off-by: Jeongjun Park <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-24bcachefs: fix shift oob in alloc_lru_idx_fragmentationJeongjun Park1-0/+3
The size of a.data_type is set abnormally large, causing shift-out-of-bounds. To fix this, we need to add validation on a.data_type in alloc_lru_idx_fragmentation(). Reported-by: [email protected] Fixes: 260af1562ec1 ("bcachefs: Kill alloc_v4.fragmentation_lru") Signed-off-by: Jeongjun Park <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-24bcachefs: Fix invalid shift in validate_sb_layout()Gianfranco Trad2-0/+6
Add check on layout->sb_max_size_bits against BCH_SB_LAYOUT_SIZE_BITS_MAX to prevent UBSAN shift-out-of-bounds in validate_sb_layout(). Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=089fad5a3a5e77825426 Fixes: 03ef80b469d5 ("bcachefs: Ignore unknown mount options") Tested-by: [email protected] Signed-off-by: Gianfranco Trad <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-24Merge tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefsLinus Torvalds41-205/+475
Pull bcachefs fixes from Kent Overstreet: "Lots of hotfixes: - transaction restart injection has been shaking out a few things - fix a data corruption in the buffered write path on -ENOSPC, found by xfstests generic/299 - Some small show_options fixes - Repair mismatches in inode hash type, seed: different snapshot versions of an inode must have the same hash/type seed, used for directory entries and xattrs. We were checking the hash seed, but not the type, and a user contributed a filesystem where the hash type on one inode had somehow been flipped; these fixes allow his filesystem to repair. Additionally, the hash type flip made some directory entries invisible, which were then recreated by userspace; so the hash check code now checks for duplicate non dangling dirents, and renames one of them if necessary. - Don't use wait_event_interruptible() in recovery: this fixes some filesystems failing to mount with -ERESTARTSYS - Workaround for kvmalloc not supporting > INT_MAX allocations, causing an -ENOMEM when allocating the sorted array of journal keys: this allows a 75 TB filesystem to mount - Make sure bch_inode_unpacked.bi_snapshot is set in the old inode compat path: this alllows Marcin's filesystem (in use since before 6.7) to repair and mount" * tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefs: (26 commits) bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path bcachefs: Mark more errors as AUTOFIX bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations bcachefs: Don't use wait_event_interruptible() in recovery bcachefs: Fix __bch2_fsck_err() warning bcachefs: fsck: Improve hash_check_key() bcachefs: bch2_hash_set_or_get_in_snapshot() bcachefs: Repair mismatches in inode hash seed, type bcachefs: Add hash seed, type to inode_to_text() bcachefs: INODE_STR_HASH() for bch_inode_unpacked bcachefs: Run in-kernel offline fsck without ratelimit errors bcachefs: skip mount option handle for empty string. bcachefs: fix incorrect show_options results bcachefs: Fix data corruption on -ENOSPC in buffered write path bcachefs: bch2_folio_reservation_get_partial() is now better behaved bcachefs: fix disk reservation accounting in bch2_folio_reservation_get() bcachefS: ec: fix data type on stripe deletion bcachefs: Don't use commit_do() unnecessarily bcachefs: handle restarts in bch2_bucket_io_time_reset() bcachefs: fix restart handling in __bch2_resume_logged_op_finsert() ...
2024-10-20bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode pathKent Overstreet1-0/+2
This fixes a fsck bug on a very old filesystem (pre mainline merge). Fixes: 72350ee0ea22 ("bcachefs: Kill snapshot arg to fsck_write_inode()") Reported-by: Marcin MirosÅ‚aw <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-20bcachefs: Mark more errors as AUTOFIXKent Overstreet1-2/+2
Reported-by: Marcin MirosÅ‚aw <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-20bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocationsKent Overstreet1-1/+14
kvmalloc() doesn't support allocations > INT_MAX, but vmalloc() does - the limit should be lifted, but we can work around this for now. A user with a 75 TB filesystem reported the following journal replay error: https://github.com/koverstreet/bcachefs/issues/769 In journal replay we have to sort and dedup all the keys from the journal, which means we need a large contiguous allocation. Given that the user has 128GB of ram, the 2GB limit on allocation size has become far too small. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-20bcachefs: Don't use wait_event_interruptible() in recoveryKent Overstreet3-6/+8
Fix a bug where mount was failing with -ERESTARTSYS: https://github.com/koverstreet/bcachefs/issues/741 We only want the interruptible wait when called from fsync. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-20bcachefs: Fix __bch2_fsck_err() warningKent Overstreet1-1/+4
We only warn about having a btree_trans that wasn't passed in if we'll be prompting. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: fsck: Improve hash_check_key()Kent Overstreet3-50/+194
hash_check_key() checks and repairs the hash table btrees: dirents and xattrs are open addressing hash tables. We recently had a corruption reported where the hash type on an inode somehow got flipped, which made the existing dirents invisible and allowed new ones to be created with the same name. Now, hash_check_key() can repair duplicates: it will delete one of them, if it has an xattr or dangling dirent, but if it has two valid dirents one of them gets renamed. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: bch2_hash_set_or_get_in_snapshot()Kent Overstreet1-14/+36
Add a variant of bch2_hash_set_in_snapshot() that returns the existing key on -EEXIST. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: Repair mismatches in inode hash seed, typeKent Overstreet1-10/+39
Different versions of the same inode (same inode number, different snapshot ID) must have the same hash seed and type - lookups require this, since they see keys from different snapshots simultaneously. To repair we only need to make the inodes consistent, hash_check_key() will do the rest. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: Add hash seed, type to inode_to_text()Kent Overstreet3-2/+8
This helped with discovering some filesystem corruption fsck has having trouble with: the str_hash type had gotten flipped on one snapshot's version of an inode. All versions of a given inode number have the same hash seed and hash type, since lookups will be done with a single hash/seed and type and see dirents/xattrs from multiple snapshots. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: INODE_STR_HASH() for bch_inode_unpackedKent Overstreet4-15/+13
Trivial cleanup - add a normal BITMASK() helper for bch_inode_unpacked. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: Run in-kernel offline fsck without ratelimit errorsKent Overstreet1-0/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: skip mount option handle for empty string.Hongbo Li1-0/+3
The options parse in get_tree will split the options buffer, it will get the empty string for last one by strsep(). After commit ea0eeb89b1d5 ("bcachefs: reject unknown mount options") is merged, unknown mount options is not allowed (here is empty string), and this causes this errors. This can be reproduced just by the following steps: bcachefs format /dev/loop mount -t bcachefs -o metadata_target=loop1 /dev/loop1 /mnt/bcachefs/ Fixes: ea0eeb89b1d5 ("bcachefs: reject unknown mount options") Signed-off-by: Hongbo Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: fix incorrect show_options resultsHongbo Li1-1/+1
When call show_options in bcachefs, the options buffer is appeneded to the seq variable. In fact, it requires an additional comma to be appended first. This will affect the remount process when reading existing mount options. Fixes: 9305cf91d05e ("bcachefs: bch2_opts_to_text()") Signed-off-by: Hongbo Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: Fix data corruption on -ENOSPC in buffered write pathKent Overstreet1-0/+6
Found by generic/299: When we have to truncate a write due to -ENOSPC, we may have to read in the folio we're writing to if we're now no longer doing a complete write to a !uptodate folio. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: bch2_folio_reservation_get_partial() is now better behavedKent Overstreet3-32/+57
bch2_folio_reservation_get_partial(), on partial success, will now return a reservation that's aligned to the filesystem blocksize. This is a partial fix for fstests generic/299 - fio verify is badly behaved in the presence of short writes that aren't aligned to its blocksize. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: fix disk reservation accounting in bch2_folio_reservation_get()Kent Overstreet1-1/+1
bch2_disk_reservation_put() zeroes out the reservation - oops. This fixes a disk reservation leak when getting a quota reservation returned an error. Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefS: ec: fix data type on stripe deletionKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <[email protected]>
2024-10-18bcachefs: Don't use commit_do() unnecessarilyKent Overstreet21-39/+41
Using commit_do() to call alloc_sectors_start_trans() breaks when we're randomly injecting transaction restarts - the restart in the commit causes us to leak the lock that alloc_sectorS_start_trans() takes. Signed-off-by: Kent Overstreet <[email protected]>