aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-01smb: client: fix UAF in smb2_reconnect_server()Paulo Alcantara1-49/+34
The UAF bug is due to smb2_reconnect_server() accessing a session that is already being teared down by another thread that is executing __cifs_put_smb_ses(). This can happen when (a) the client has connection to the server but no session or (b) another thread ends up setting @ses->ses_status again to something different than SES_EXITING. To fix this, we need to make sure to unconditionally set @ses->ses_status to SES_EXITING and prevent any other threads from setting a new status while we're still tearing it down. The following can be reproduced by adding some delay to right after the ipc is freed in __cifs_put_smb_ses() - which will give smb2_reconnect_server() worker a chance to run and then accessing @ses->ipc: kinit ... mount.cifs //srv/share /mnt/1 -o sec=krb5,nohandlecache,echo_interval=10 [disconnect srv] ls /mnt/1 &>/dev/null sleep 30 kdestroy [reconnect srv] sleep 10 umount /mnt/1 ... CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed CIFS: VFS: \\srv Send error in SessSetup = -126 CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed CIFS: VFS: \\srv Send error in SessSetup = -126 general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc2 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 Workqueue: cifsiod smb2_reconnect_server [cifs] RIP: 0010:__list_del_entry_valid_or_report+0x33/0xf0 Code: 4f 08 48 85 d2 74 42 48 85 c9 74 59 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 61 48 b8 22 01 00 00 00 00 74 69 <48> 8b 01 48 39 f8 75 7b 48 8b 72 08 48 39 c6 0f 85 88 00 00 00 b8 RSP: 0018:ffffc900001bfd70 EFLAGS: 00010a83 RAX: dead000000000122 RBX: ffff88810da53838 RCX: 6b6b6b6b6b6b6b6b RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc02f6878 RDI: ffff88810da53800 RBP: ffff88810da53800 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88810c064000 R13: 0000000000000001 R14: ffff88810c064000 R15: ffff8881039cc000 FS: 0000000000000000(0000) GS:ffff888157c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe3728b1000 CR3: 000000010caa4000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? die_addr+0x36/0x90 ? exc_general_protection+0x1c1/0x3f0 ? asm_exc_general_protection+0x26/0x30 ? __list_del_entry_valid_or_report+0x33/0xf0 __cifs_put_smb_ses+0x1ae/0x500 [cifs] smb2_reconnect_server+0x4ed/0x710 [cifs] process_one_work+0x205/0x6b0 worker_thread+0x191/0x360 ? __pfx_worker_thread+0x10/0x10 kthread+0xe2/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-01io_uring: disable io-wq execution of multishot NOWAIT requestsJens Axboe1-4/+9
Do the same check for direct io-wq execution for multishot requests that commit 2a975d426c82 did for the inline execution, and disable multishot mode (and revert to single shot) if the file type doesn't support NOWAIT, and isn't opened in O_NONBLOCK mode. For multishot to work properly, it's a requirement that nonblocking read attempts can be done. Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-04-01io_uring/rw: don't allow multishot reads without NOWAIT supportJens Axboe1-1/+8
Supporting multishot reads requires support for NOWAIT, as the alternative would be always having io-wq execute the work item whenever the poll readiness triggered. Any fast file type will have NOWAIT support (eg it understands both O_NONBLOCK and IOCB_NOWAIT). If the given file type does not, then simply resort to single shot execution. Cc: [email protected] Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT") Signed-off-by: Jens Axboe <[email protected]>
2024-04-01OSS: dmasound/paula: Mark driver struct with __refdata to prevent section ↵Uwe Kleine-König1-1/+7
mismatch As described in the added code comment, a reference to .exit.text is ok for drivers registered via module_platform_driver_probe(). Make this explicit to prevent the following section mismatch warning WARNING: modpost: sound/oss/dmasound/dmasound_paula: section mismatch in reference: amiga_audio_driver+0x8 (section: .data) -> amiga_audio_remove (section: .exit.text) that triggers on an allmodconfig W=1 build. Signed-off-by: Uwe Kleine-König <[email protected]> Message-ID: <c216a129aa88f3af5c56fe6612a472f7a882f048.1711748999.git.u.kleine-koenig@pengutronix.de> Signed-off-by: Takashi Iwai <[email protected]>
2024-04-01timers: Fix text inconsistencies and spellingRandy Dunlap1-11/+11
Fix some text for consistency: s/lvl/level/ in a comment and use correct/full function names in comments. Correct spelling errors as reported by codespell. Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-01tick/sched: Fix struct tick_sched doc warningsRandy Dunlap1-1/+1
Fix kernel-doc warnings in struct tick_sched: tick-sched.h:103: warning: Function parameter or struct member 'idle_sleeptime_seq' not described in 'tick_sched' tick-sched.h:104: warning: Excess struct member 'nohz_mode' description in 'tick_sched' Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-01tick/sched: Fix various kernel-doc warningsRandy Dunlap1-3/+15
Fix a slew of kernel-doc warnings in tick-sched.c: tick-sched.c:650: warning: Function parameter or struct member 'now' not described in 'tick_nohz_update_jiffies' tick-sched.c:741: warning: No description found for return value of 'get_cpu_idle_time_us' tick-sched.c:767: warning: No description found for return value of 'get_cpu_iowait_time_us' tick-sched.c:1210: warning: No description found for return value of 'tick_nohz_idle_got_tick' tick-sched.c:1228: warning: No description found for return value of 'tick_nohz_get_next_hrtimer' tick-sched.c:1243: warning: No description found for return value of 'tick_nohz_get_sleep_length' tick-sched.c:1282: warning: Function parameter or struct member 'cpu' not described in 'tick_nohz_get_idle_calls_cpu' tick-sched.c:1282: warning: No description found for return value of 'tick_nohz_get_idle_calls_cpu' tick-sched.c:1294: warning: No description found for return value of 'tick_nohz_get_idle_calls' tick-sched.c:1577: warning: Function parameter or struct member 'hrtimer' not described in 'tick_setup_sched_timer' tick-sched.c:1577: warning: Excess function parameter 'mode' description in 'tick_setup_sched_timer' Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-01timers: Fix kernel-doc format and add Return valuesRandy Dunlap1-2/+10
Fix kernel-doc format and warnings: timer.h:26: warning: Cannot understand * @TIMER_DEFERRABLE: A deferrable timer will work normally when the on line 26 - I thought it was a doc line timer.h:146: warning: No description found for return value of 'timer_pending' timer.h:180: warning: No description found for return value of 'del_timer_sync' timer.h:193: warning: No description found for return value of 'del_timer' Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-01time/timekeeping: Fix kernel-doc warnings and typosRandy Dunlap1-7/+42
Fix punctuation, spellos, and kernel-doc warnings: timekeeping.h:79: warning: No description found for return value of 'ktime_get_real' timekeeping.h:95: warning: No description found for return value of 'ktime_get_boottime' timekeeping.h:108: warning: No description found for return value of 'ktime_get_clocktai' timekeeping.h:149: warning: Function parameter or struct member 'mono' not described in 'ktime_mono_to_real' timekeeping.h:149: warning: No description found for return value of 'ktime_mono_to_real' timekeeping.h:255: warning: Function parameter or struct member 'cs_id' not described in 'system_time_snapshot' Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-01time/timecounter: Fix inline documentationRandy Dunlap1-2/+9
Fix kernel-doc warnings, text punctuation, and a kernel-doc marker (change '%' to '&' to indicate a struct): timecounter.h:72: warning: No description found for return value of 'cyclecounter_cyc2ns' timecounter.h:85: warning: Function parameter or member 'tc' not described in 'timecounter_adjtime' timecounter.h:111: warning: No description found for return value of 'timecounter_read' timecounter.h:128: warning: No description found for return value of 'timecounter_cyc2time' Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-01KVM: arm64: Rationalise KVM banner outputMarc Zyngier1-8/+5
We are not very consistent when it comes to displaying which mode we're in (VHE, {n,h}VHE, protected or not). For example, booting in protected mode with hVHE results in: [ 0.969545] kvm [1]: Protected nVHE mode initialized successfully which is mildly amusing considering that the machine is VHE only. We already cleaned this up a bit with commit 1f3ca7023fe6 ("KVM: arm64: print Hyp mode"), but that's still unsatisfactory. Unify the three strings into one and use a mess of conditional statements to sort it out (yes, it's a slow day). Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-04-01arm64: Fix early handling of FEAT_E2H0 not being implementedMarc Zyngier1-13/+16
Commit 3944382fa6f2 introduced checks for the FEAT_E2H0 not being implemented. However, the check is absolutely wrong and makes a point it testing a bit that is guaranteed to be zero. On top of that, the detection happens way too late, after the init_el2_state has done its job. This went undetected because the HW this was tested on has E2H being RAO/WI, and not RES1. However, the bug shows up when run as a nested guest, where HCR_EL2.E2H is not necessarily set to 1. As a result, booting the kernel in hVHE mode fails with timer accesses being cought in a trap loop (which was fun to debug). Fix the check for ID_AA64MMFR4_EL1.E2H0, and set the HCR_EL2.E2H bit early so that it can be checked by the rest of the init sequence. With this, hVHE works again in a NV environment that doesn't have FEAT_E2H0. Fixes: 3944382fa6f2 ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative") Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-04-01KVM: arm64: Ensure target address is granule-aligned for range TLBIWill Deacon1-4/+7
When zapping a table entry in stage2_try_break_pte(), we issue range TLB invalidation for the region that was mapped by the table. However, we neglect to align the base address down to the granule size and so if we ended up reaching the table entry via a misaligned address then we will accidentally skip invalidation for some prefix of the affected address range. Align 'ctx->addr' down to the granule size when performing TLB invalidation for an unmapped table in stage2_try_break_pte(). Cc: Raghavendra Rao Ananta <[email protected]> Cc: Gavin Shan <[email protected]> Cc: Shaoqin Huang <[email protected]> Cc: Quentin Perret <[email protected]> Fixes: defc8cc7abf0 ("KVM: arm64: Invalidate the table entries upon a range") Signed-off-by: Will Deacon <[email protected]> Reviewed-by: Shaoqin Huang <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-04-01KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range()Will Deacon2-2/+4
Commit c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2") updated the __tlbi_level() macro to take the target level as an argument, with TLBI_TTL_UNKNOWN (rather than 0) indicating that the caller cannot provide level information. Unfortunately, the two implementations of __kvm_tlb_flush_vmid_range() were not updated and so now ask for an level 0 invalidation if FEAT_LPA2 is implemented. Fix the problem by passing TLBI_TTL_UNKNOWN instead of 0 as the level argument to __flush_s2_tlb_range_op() in __kvm_tlb_flush_vmid_range(). Cc: Catalin Marinas <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Marc Zyngier <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Fixes: c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2") Signed-off-by: Will Deacon <[email protected]> Reviewed-by: Shaoqin Huang <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-04-01KVM: arm64: Don't pass a TLBI level hint when zapping table entriesWill Deacon1-5/+7
The TLBI level hints are for leaf entries only, so take care not to pass them incorrectly after clearing a table entry. Cc: Gavin Shan <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Quentin Perret <[email protected]> Fixes: 82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2") Fixes: 6d9d2115c480 ("KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table") Signed-off-by: Will Deacon <[email protected]> Reviewed-by: Shaoqin Huang <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-04-01KVM: arm64: Don't defer TLB invalidation when zapping table entriesWill Deacon1-1/+3
Commit 7657ea920c54 ("KVM: arm64: Use TLBI range-based instructions for unmap") introduced deferred TLB invalidation for the stage-2 page-table so that range-based invalidation can be used for the accumulated addresses. This works fine if the structure of the page-tables remains unchanged, but if entire tables are zapped and subsequently freed then we transiently leave the hardware page-table walker with a reference to freed memory thanks to the translation walk caches. For example, stage2_unmap_walker() will free page-table pages: if (childp) mm_ops->put_page(childp); and issue the TLB invalidation later in kvm_pgtable_stage2_unmap(): if (stage2_unmap_defer_tlb_flush(pgt)) /* Perform the deferred TLB invalidations */ kvm_tlb_flush_vmid_range(pgt->mmu, addr, size); For now, take the conservative approach and invalidate the TLB eagerly when we clear a table entry. Note, however, that the existing level hint passed to __kvm_tlb_flush_vmid_ipa() is incorrect and will be fixed in a subsequent patch. Cc: Raghavendra Rao Ananta <[email protected]> Cc: Shaoqin Huang <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Will Deacon <[email protected]> Reviewed-by: Shaoqin Huang <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2024-04-01xfs: allow cross-linking special files without project quotaAndrey Albershteyn1-2/+13
There's an issue that if special files is created before quota project is enabled, then it's not possible to link this file. This works fine for normal files. This happens because xfs_quota skips special files (no ioctls to set necessary flags). The check for having the same project ID for source and destination then fails as source file doesn't have any ID. mkfs.xfs -f /dev/sda mount -o prjquota /dev/sda /mnt/test mkdir /mnt/test/foo mkfifo /mnt/test/foo/fifo1 xfs_quota -xc "project -sp /mnt/test/foo 9" /mnt/test > Setting up project 9 (path /mnt/test/foo)... > xfs_quota: skipping special file /mnt/test/foo/fifo1 > Processed 1 (/etc/projects and cmdline) paths for project 9 with recursion depth infinite (-1). ln /mnt/test/foo/fifo1 /mnt/test/foo/fifo1_link > ln: failed to create hard link '/mnt/test/testdir/fifo1_link' => '/mnt/test/testdir/fifo1': Invalid cross-device link mkfifo /mnt/test/foo/fifo2 ln /mnt/test/foo/fifo2 /mnt/test/foo/fifo2_link Fix this by allowing linking of special files to the project quota if special files doesn't have any ID set (ID = 0). Signed-off-by: Andrey Albershteyn <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-04-01bcachefs: On emergency shutdown, print out current journal sequence numberKent Overstreet1-1/+3
Signed-off-by: Kent Overstreet <[email protected]>
2024-04-01bcachefs: Fix overlapping extent repairKent Overstreet1-4/+6
overlapping extent repair was colliding with extent past end of inode checks - don't update "extent ends at" until we know we have an extent. Signed-off-by: Kent Overstreet <[email protected]>
2024-04-01bcachefs: Fix remove_dirent()Kent Overstreet1-3/+4
We were missing an iter_traverse(). Signed-off-by: Kent Overstreet <[email protected]>
2024-04-01bcachefs: Logged op errors should be ignoredKent Overstreet1-4/+3
If something is wrong with a logged op, we just want to delete it - there's nothing to repair. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Improve -o norecovery; opts.recovery_pass_limitKent Overstreet6-14/+26
This adds opts.recovery_pass_limit, and redoes -o norecovery to make use of it; this fixes some issues with -o norecovery so it can be safely used for data recovery. Norecovery means "don't do journal replay"; it's an important data recovery tool when we're getting stuck in journal replay. When using it this way we need to make sure we don't free journal keys after startup, so we continue to overlay them: thus it needs to imply retain_recovery_info, as well as nochanges. recovery_pass_limit is an explicit option for telling recovery to exit after a specific recovery pass; this is a much cleaner way of implementing -o norecovery, as well as being a useful debug feature in its own right. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: bch2_run_explicit_recovery_pass_persistent()Kent Overstreet2-0/+19
Flag that we need to run a recovery pass and run it - persistenly, so if we crash it'll still get run. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Ensure bch_sb_field_ext always existsKent Overstreet2-17/+16
This makes bch_sb_field_ext more consistent with the rest of -o nochanges - we don't want to be varying other codepaths based on -o nochanges, since it's used for testing in dry run mode; also fixes some potential null ptr derefs. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Flush journal immediately after replay if we did early repairKent Overstreet1-0/+20
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Resume logged ops after fsckKent Overstreet1-1/+1
Finishing logged ops requires the filesystem to be in a reasonably consistent state - and other fsck passes don't require it to have completed, so just run it last. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Add error messages to logged ops fnsKent Overstreet1-0/+2
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Split out recovery_passes.cKent Overstreet16-286/+313
We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: fix backpointer for missing alloc key msgKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Fix bch2_btree_increase_depth()Kent Overstreet1-2/+5
When we haven't yet allocated any btree nodes for a given btree, we first need to call the regular split path to allocate one. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Kill bch2_bkey_ptr_data_type()Kent Overstreet5-66/+67
Remove some duplication, and inconsistency between check_fix_ptrs and the main ptr marking paths Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Fix use after free in check_root_trans()Kent Overstreet1-7/+11
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Fix repair path for missing indirect extentsKent Overstreet1-2/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Fix use after free in bch2_check_fix_ptrs()Kent Overstreet1-6/+6
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Fix btree node keys accounting in topology repair pathKent Overstreet4-4/+14
When dropping keys now outside a now because we're changing the node min/max, we need to redo the node's accounting as well. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Check btree ptr min_key in .invalidKent Overstreet2-3/+9
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: add REQ_SYNC and REQ_IDLE in write diozhuxiaohui1-2/+2
when writing file with direct_IO on bcachefs, then performance is much lower than other fs due to write back throttle in block layer: wbt_wait+1 __rq_qos_throttle+32 blk_mq_submit_bio+394 submit_bio_noacct_nocheck+649 bch2_submit_wbio_replicas+538 __bch2_write+2539 bch2_direct_write+1663 bch2_write_iter+318 aio_write+355 io_submit_one+1224 __x64_sys_io_submit+169 do_syscall_64+134 entry_SYSCALL_64_after_hwframe+110 add set REQ_SYNC and REQ_IDLE in bio->bi_opf as standard dirct-io Signed-off-by: zhuxiaohui <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Improved topology repair checksKent Overstreet8-165/+137
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(), and replace them with an improved version, bch2_btree_node_check_topology(). This checks that children of an interior node correctly span the full range of the parent node with no overlaps. Also, ensure that topology repairs at runtime are always a fatal error; in particular, this adds a check in btree_iter_down() - if we don't find a key while walking down the btree that's indicative of a topology error and should be flagged as such, not a null ptr deref. Some checks in btree_update_interior.c remaining BUG_ONS(), because we already checked the node for topology errors when starting the update, and the assertions indicate that we _just_ corrupted the btree node - i.e. the problem can't be that existing on disk corruption, they indicate an actual algorithmic bug. In the future, we'll be annotating the fsck errors list with which recovery pass corrects them; the open coded "run explicit recovery pass or fatal error" in bch2_btree_node_check_topology() will in the future be done for every fsck_err() call. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Be careful about btree node splits during journal replayKent Overstreet3-3/+26
Don't pick a pivot that's going to be deleted. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: btree_and_journal_iter now respects trans->journal_replay_not_finishedKent Overstreet1-5/+8
btree_and_journal_iter is now safe to use at runtime, not just during recovery before journal keys have been freed. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: fix trans->mem realloc in __bch2_trans_kmallocHongbo Li1-3/+30
The old code doesn't consider the mem alloced from mempool when call krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX" is inaccurate when trans->mem was allocated by krealloc function. Instead, we use used_mempool stuff to record the situation, and realloc or free the trans->mem in elegant way. Also, after krealloc failed in __bch2_trans_kmalloc, the old data should be copied to the new buffer when alloc from mempool_alloc. Fixes: 31403dca5bb1 ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS") Signed-off-by: Hongbo Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Don't do extent merging before journal replay is finishedKent Overstreet1-0/+6
We don't normally do extent updates this early in recovery, but some of the repair paths have to and when we do, we don't want to do anything that requires the snapshots table. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Add checks for invalid snapshot IDsKent Overstreet5-39/+40
Previously, we assumed that keys were consistent with the snapshots btree - but that's not correct as fsck may not have been run or may not be complete. This adds checks and error handling when using the in-memory snapshots table (that mirrors the snapshots btree). Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Move snapshot table size to struct snapshot_tableKent Overstreet4-13/+21
We need to add bounds checking for snapshot table accesses - it turns out there are cases where we do need to use the snapshots table before fsck checks have completed (and indeed, fsck may not have been run). Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Add an assertion for trying to evict btree rootKent Overstreet1-0/+2
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: fix mount error pathKent Overstreet1-0/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: fix misplaced newline in __bch2_inode_unpacked_to_text()Thomas Bertschinger1-1/+1
before: u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0: mode=40700 flags= (15300000) journal_seq=4 bi_size=0 bi_sectors=0 bi_version=0bi_atime=227064388944 ... after: u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0: mode=40700 flags= (15300000) journal_seq=4 bi_size=0 bi_sectors=0 bi_version=0 bi_atime=227064388944 ... Signed-off-by: Thomas Bertschinger <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Fix journal pins in btree write bufferKent Overstreet1-0/+14
btree write buffer flush has two phases - in natural key order, which is more efficient but may fail - then in journal order The journal order flush was assuming that keys were still correctly ordered by journal sequence number - but due to coalescing by the previous phase, we need an additional sort. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-31bcachefs: Fix assert in bch2_backpointer_invalid()Kent Overstreet1-0/+5
Backpointers that point to invalid devices are caught by fsck, not .key_invalid; so .key_invalid needs to check for them instead of hitting asserts. Signed-off-by: Kent Overstreet <[email protected]>
2024-04-01ata: pata_macio: drop driver owner assignmentKrzysztof Kozlowski1-3/+0
PCI core in pci_register_driver() already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]> Signed-off-by: Damien Le Moal <[email protected]>