aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)AuthorFilesLines
2024-06-25bcachefs: Fix kmalloc bug in __snapshot_t_mutPei Li1-0/+3
When allocating too huge a snapshot table, we should fail gracefully in __snapshot_t_mut() instead of fail in kmalloc(). Reported-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=770e99b65e26fa023ab1 Tested-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-25bcachefs: Discard, invalidate workers are now per deviceKent Overstreet5-133/+161
There's no reason for discards to be single threaded across all devices; this will improve performance on multi device setups. Additionally, making them per-device simplifies the refcounting on bch_dev->io_ref; we now hold it for the duration that the discard path is running, which fixes a race between the discard path and device removal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-25bcachefs: Fix shift-out-of-bounds in bch2_blacklist_entries_gcPei Li1-1/+1
This series fix the shift-out-of-bounds issue in bch2_blacklist_entries_gc(). Instead of passing 0 to eytzinger0_first() when iterating the entries, we explicitly check 0 and initialize i to be 0. syzbot has tested the proposed patch and the reproducer did not trigger any issue: Reported-and-tested-by: syzbot+835d255ad6bc7f29ee12@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=835d255ad6bc7f29ee12 Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-25bcachefs: slab-use-after-free Read in bch2_sb_errors_from_cpuPei Li1-4/+10
Acquire fsck_error_counts_lock before accessing the critical section protected by this lock. syzbot has tested the proposed patch and the reproducer did not trigger any issue. Reported-by: syzbot+a2bc0e838efd7663f4d9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a2bc0e838efd7663f4d9 Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: Add missing bch2_journal_do_writes() callKent Overstreet1-0/+7
This fixes a rare deadlock when we're doing an emergency shutdown due to failure to do a journal write. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: Fix null ptr deref in journal_pins_to_text()Kent Overstreet1-0/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: Add missing recalc_capacity() callKent Overstreet1-0/+1
This fixes filesystem size not changing on device removal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: Fix btree_trans list orderingKent Overstreet2-11/+34
The debug code relies on btree_trans_list being ordered so that it can resume on subsequent calls or lock restarts. However, it was using trans->locknig_wait.task.pid, which is incorrect since btree_trans objects are cached and reused - typically by different tasks. Fix this by switching to pointer order, and also sort them lazily when required - speeding up the btree_trans_get() fastpath. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: Fix race between trans_put() and btree_transactions_read()Kent Overstreet2-16/+13
debug.c was using closure_get() on a different thread's closure where the we don't know if the object being refcounted is alive. We keep btree_trans objects on a list so they can be printed by debug code, and because it is cost prohibitive to touch the btree_trans list every time we allocate and free btree_trans objects, cached objects are also on this list. However, we do not want the debug code to see cached but not in use btree_trans objects - critically because the btree_paths array will have been freed (if it was reallocated). closure_get() is also incorrect to use when that get may race with it hitting zero, i.e. we must already have a ref on the object or know the ref can't currently hit 0 for other reasons (as used in the cycle detector). to fix this, use the previously introduced closure_get_not_zero(), closure_return_sync(), and closure_init_stack_release(); the debug code now can only take a ref on a trans object if it's alive and in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: Make btree_deadlock_to_text() clearerKent Overstreet1-23/+29
btree_deadlock_to_text() searches the list of btree transactions to find a deadlock - when it finds one it's done; it's not like other *_read() functions that's printing each object. Factor out btree_deadlock_to_text() to make this clearer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: fix seqmutex_relock()Kent Overstreet2-13/+6
We were grabbing the sequence number before unlock incremented it - fix this by moving the increment to seqmutex_lock() (so the seqmutex_relock() failure path skips the mutex_trylock()), and returning the sequence number from unlock(), to make the API simpler and safer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-22bcachefs: Fix freeing of error pointersKent Overstreet1-3/+6
This fixes incorrect/missign checking of strndup_user() returns. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-21bcachefs: Move the ei_flags setting to after initializationYouling Tang1-5/+3
`inode->ei_flags` setting and cleaning should be done after initialization, otherwise the operation is invalid. Fixes: 9ca4853b98af ("bcachefs: Fix quota support for snapshots") Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-21bcachefs: Fix a UAF after write_super()Kent Overstreet1-2/+2
write_super() may reallocate the superblock buffer - but bch_sb_field_ext was referencing it; don't use it after the write_super call. Reported-by: syzbot+8992fc10a192067b8d8a@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-21bcachefs: Use bch2_print_string_as_lines for long errKent Overstreet1-5/+8
printk strings get truncated to 1024 bytes; if we have a long error message (journal debug info) we need to use a helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-21bcachefs: Fix I_NEW warning in race path in bch2_inode_insert()Kent Overstreet1-2/+10
discard_new_inode() is the correct interface for tearing down an indoe that was fully created but not made visible to other threads, but it expects I_NEW to be set, which we don't use. Reported-by: https://github.com/koverstreet/bcachefs/issues/690 Fixes: bcachefs: Fix race path in bch2_inode_insert() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-21bcachefs: Replace bare EEXIST with private error codesKent Overstreet5-8/+12
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-21bcachefs: Fix missing alloc_data_type_set()Kent Overstreet1-1/+3
Incorrect bucket state transition in the discard path; when incrementing a bucket's generation number that had already been discarded, we were forgetting to check if it should be need_gc_gens, not free. This was caught by the .invalid checks in the transaction commit path, causing us to go emergency read only. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-20bcachefs: fix alignment of VMA for memory mapped files on THPYouling Tang1-0/+1
With CONFIG_READ_ONLY_THP_FOR_FS, the Linux kernel supports using THPs for read-only mmapped files, such as shared libraries. However, the kernel makes no attempt to actually align those mappings on 2MB boundaries, which makes it impossible to use those THPs most of the time. This issue applies to general file mapping THP as well as existing setups using CONFIG_READ_ONLY_THP_FOR_FS. This is easily fixed by using thp_get_unmapped_area for the unmapped_area function in bcachefs, which is what ext2, ext4, fuse, xfs and btrfs all use. Similar to commit b0c582233a85 ("btrfs: fix alignment of VMA for memory mapped files on THP"). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-20bcachefs: Fix safe errors by defaultKent Overstreet5-289/+308
i.e. the start of automatic self healing: If errors=continue or fix_safe, we now automatically fix simple errors without user intervention. New error action option: fix_safe This replaces the existing errors=ro option, which gets a new slot, i.e. existing errors=ro users now get errors=fix_safe. This is currently only enabled for a limited set of errors - initially just disk accounting; errors we would never not want to fix, and we don't want to require user intervention (i.e. to make sure a bug report gets filed). Errors will still be counted in the superblock, so we (developers) will still know they've been occuring if a bug report gets filed (as bug reports typically include the errors superblock section). Eventually we'll be enabling this for a much wider set of errors, after we've done thorough error injection testing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix bch2_trans_put()Kent Overstreet1-3/+8
reference: https://github.com/koverstreet/bcachefs/issues/692 trans->ref is the reference used by the cycle detector, which walks btree_trans objects of other threads to walk the graph of held locks and issue wakeups when an abort is required. We have to wait for the ref to go to 1 before freeing trans->paths or clearing trans->locking_wait.task. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: set_worker_desc() for delete_dead_snapshotsKent Overstreet1-0/+2
this is long running - help users see what's going on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix bch2_sb_downgrade_update()Kent Overstreet1-1/+1
Missing enum conversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Handle cached data LRU wraparoundKent Overstreet1-5/+41
We only have 48 bits for the LRU time field, which is insufficient to prevent wraparound. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Guard against overflowing LRU_TIME_BITSKent Overstreet6-12/+32
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need overflow checks and guards. Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: delete_dead_snapshots() doesn't need to go RWKent Overstreet1-7/+0
We've been moving away from going RW lazily; if we want to go RW we do that in set_may_go_rw(), and if we didn't go RW we don't need to delete dead snapshots. Reported-by: syzbot+4366624c0b5aac4906cf@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix early init error path in journal codeKent Overstreet1-0/+3
We shouln't be running the journal shutdown sequence if we never fully initialized the journal. Reported-by: syzbot+ffd2270f0bca3322ee00@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Check for invalid btree IDsKent Overstreet2-2/+12
We can only handle btree IDs up to 62, since the btree id (plus the type for interior btree nodes) has to fit ito a 64 bit bitmask - check for invalid ones to avoid invalid shifts later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix btree ID bitmasksKent Overstreet2-10/+11
these should be 64 bit bitmasks, not 32 bit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix shift overflow in read_one_super()Kent Overstreet1-3/+4
Reported-by: syzbot+9f74cb4006b83e2a3df1@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix a locking bug in the do_discard_fast() pathKent Overstreet1-1/+1
We can't discard a bucket while it's still open; this needs the bucket_is_open_safe() version, which takes the open_buckets lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix array-index-out-of-boundsKent Overstreet3-3/+8
We use 0 size arrays as markers, but ubsan doesn't know that - cast them to a pointer to fix the splat. Also, make sure this code gets tested a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix initialization order for srcu barrierKent Overstreet1-1/+1
btree_iter_init() needs to happen before key_cache_init(), to initialize btree_trans_barrier Reported-by: syzbot+3cca837c2183f8f6fcaf@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-11bcachefs: Fix rcu_read_lock() leak in drop_extra_replicasKent Overstreet1-2/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Add missing bch_inode_info.ei_flags initKent Overstreet1-0/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Add missing synchronize_srcu_expedited() call when shutting downKent Overstreet1-1/+3
We use the polling interface to srcu for tracking pending frees; when shutting down we don't need to wait for an srcu barrier to free them, but SRCU still gets confused if we shutdown with an outstanding grace period. Reported-by: syzbot+6a038377f0a594d7d44e@syzkaller.appspotmail.com Reported-by: syzbot+0ece6edfd05ed20e32d9@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket()Kent Overstreet8-47/+135
Turn more asserts into proper recoverable error paths. Reported-by: syzbot+246b47da27f8e7e7d6fb@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Replace bucket_valid() asserts in bucket lookup with proper checksKent Overstreet4-2/+10
The bucket_gens array and gc_buckets array known their own size; we should be using those members, and returning an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Fix snapshot_create_lock lock orderingKent Overstreet1-12/+5
====================================================== WARNING: possible circular locking dependency detected 6.10.0-rc2-ktest-00018-gebd1d148b278 #144 Not tainted ------------------------------------------------------ fio/1345 is trying to acquire lock: ffff88813e200ab8 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_truncate+0x76/0xf0 but task is already holding lock: ffff888105a1fa38 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: do_truncate+0x7b/0xc0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}: down_write+0x3d/0xd0 bch2_write_iter+0x1c0/0x10f0 vfs_write+0x24a/0x560 __x64_sys_pwrite64+0x77/0xb0 x64_sys_call+0x17e5/0x1ab0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #1 (sb_writers#10){.+.+}-{0:0}: mnt_want_write+0x4a/0x1d0 filename_create+0x69/0x1a0 user_path_create+0x38/0x50 bch2_fs_file_ioctl+0x315/0xbf0 __x64_sys_ioctl+0x297/0xaf0 x64_sys_call+0x10cb/0x1ab0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (&c->snapshot_create_lock){++++}-{3:3}: __lock_acquire+0x1445/0x25b0 lock_acquire+0xbd/0x2b0 down_read+0x40/0x180 bch2_truncate+0x76/0xf0 bchfs_truncate+0x240/0x3f0 bch2_setattr+0x7b/0xb0 notify_change+0x322/0x4b0 do_truncate+0x8b/0xc0 do_ftruncate+0x110/0x270 __x64_sys_ftruncate+0x43/0x80 x64_sys_call+0x1373/0x1ab0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 other info that might help us debug this: Chain exists of: &c->snapshot_create_lock --> sb_writers#10 --> &sb->s_type->i_mutex_key#13 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#13); lock(sb_writers#10); lock(&sb->s_type->i_mutex_key#13); rlock(&c->snapshot_create_lock); *** DEADLOCK *** Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Fix refcount leak in check_fix_ptrs()Kent Overstreet1-116/+133
fsck_err() does a goto fsck_err on error; factor out check_fix_ptr() so that our error label can drop our device ref. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Leave a buffer in the btree key cache to avoid lock thrashingKent Overstreet1-0/+8
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Fix reporting of freed objects from key cache shrinkerKent Overstreet1-8/+5
We count objects as freed when we move them to the srcu-pending lists because we're doing the equivalent of a kfree_srcu(); the only difference is managing the pending list ourself means we can allocate from the pending list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: set sb->s_shrinker->seeks = 0Kent Overstreet1-0/+1
inodes and dentries are still present in the btree node cache, in much more compact form Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: increase key cache shrinker batch sizeKent Overstreet1-1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Enable automatic shrinking for rhashtablesKent Overstreet4-14/+18
Since the key cache shrinker walks the rhashtable, a mostly empty rhashtable leads to really nasty reclaim performance issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: fix the display format for show-superHongbo Li1-3/+3
There are three keys displayed in non-uniform format. Let's fix them. [Before] ``` Label: testbcachefs Version: 1.9: (unknown version) Version upgrade complete: 0.0: (unknown version) ``` [After] ``` Label: testbcachefs Version: 1.9: (unknown version) Version upgrade complete: 0.0: (unknown version) ``` Fixes: 7423330e30ab ("bcachefs: prt_printf() now respects \r\n\t") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: fix stack frame size in fsck.cKent Overstreet1-0/+3
fsck.c always runs top of the stack so we're not too concerned here; noinline_for_stack is sufficient Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Delete incorrect BTREE_ID_NR assertionKent Overstreet1-6/+1
for forwards compat we now explicitly allow mounting and using filesystems with unknown btrees, and we have to walk them for fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Fix incorrect error handling found_btree_node_is_readable()Kent Overstreet1-4/+5
error handling here is slightly odd, which is why we were accidently calling evict() on an error pointer Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Split out btree_write_submit_wqKent Overstreet3-8/+13
Split the workqueues for btree read completions and btree write submissions; we don't want concurrency control on btree read completions, but we do want concurrency control on write submissions, else blocking in submit_bio() will cause a ton of kworkers to be allocated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>