Age | Commit message | Author | Files | Lines
2024-03-13bcachefs: kill kvpmalloc()Kent Overstreet14-115/+49
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13mempool: kvmalloc poolKent Overstreet2-0/+26
Add mempool_init_kvmalloc_pool() and mempool_create_kvmalloc_pool(), which wrap kvmalloc() instead of kmalloc(); kvmalloc() is kmalloc() with a vmalloc() fallback. This is part of a bcachefs cleanup - dropping an internal kvpmalloc() helper (which predates kvmalloc()) along with mempool helpers; this replaces the bcachefs-private kvpmalloc_pool. Signed-off-by: Kent Overstreet <[email protected]> Cc: [email protected]
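The idea behind a kvmalloc-backed pool can be illustrated in userspace. The following is a minimal sketch, not the kernel implementation: `small_alloc()` and `large_alloc()` are hypothetical stand-ins for kmalloc() and vmalloc(), and `struct kv_pool` with its helpers is invented for illustration. The pool keeps a reserve of preallocated elements and only dips into it when the fallback allocator also fails.

```c
#include <stdlib.h>
#include <stddef.h>

#define SMALL_LIMIT 4096	/* hypothetical cutoff for the fast allocator */

/* Stand-in for kmalloc(): refuses requests above SMALL_LIMIT. */
static void *small_alloc(size_t size)
{
	return size <= SMALL_LIMIT ? malloc(size) : NULL;
}

/* Stand-in for vmalloc(): accepts large sizes. */
static void *large_alloc(size_t size)
{
	return malloc(size);
}

/* kvmalloc()-style allocation: try the fast path, then fall back. */
static void *kv_alloc(size_t size)
{
	void *p = small_alloc(size);

	return p ? p : large_alloc(size);
}

struct kv_pool {
	size_t elem_size;
	size_t min_nr;		/* capacity of the reserve */
	size_t nr_reserved;	/* elements currently held in reserve */
	void **reserve;
};

static void kv_pool_exit(struct kv_pool *pool)
{
	while (pool->nr_reserved)
		free(pool->reserve[--pool->nr_reserved]);
	free(pool->reserve);
	pool->reserve = NULL;
}

/* Preallocate min_nr elements so later allocations can always make progress. */
static int kv_pool_init(struct kv_pool *pool, size_t min_nr, size_t elem_size)
{
	pool->elem_size = elem_size;
	pool->min_nr = min_nr;
	pool->nr_reserved = 0;
	pool->reserve = calloc(min_nr, sizeof(void *));
	if (!pool->reserve)
		return -1;

	while (pool->nr_reserved < min_nr) {
		void *p = kv_alloc(elem_size);

		if (!p) {
			kv_pool_exit(pool);
			return -1;
		}
		pool->reserve[pool->nr_reserved++] = p;
	}
	return 0;
}

/* Allocate normally; take from the reserve only if that fails. */
static void *kv_pool_alloc(struct kv_pool *pool)
{
	void *p = kv_alloc(pool->elem_size);

	if (!p && pool->nr_reserved)
		p = pool->reserve[--pool->nr_reserved];
	return p;
}

/* Refill the reserve first; release the element only when it is full. */
static void kv_pool_free(struct kv_pool *pool, void *p)
{
	if (pool->nr_reserved < pool->min_nr)
		pool->reserve[pool->nr_reserved++] = p;
	else
		free(p);
}
```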
2024-03-10bcachefs: bch2_lookup() gives better error message on inode not foundKent Overstreet1-9/+64
When a dirent points to a missing inode, we really should print out the dirent. This requires quite a bit of refactoring, but there are some other benefits: we now do the entire lookup (dirent and inode) in a single btree transaction, and copy to the VFS inode with btree locks still held, like the create path. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: bch2_inode_insert()Kent Overstreet1-62/+76
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARNKent Overstreet2-6/+15
Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:

  PF_MEMALLOC_NORECLAIM -> GFP_NOWAIT
  PF_MEMALLOC_NOWARN -> __GFP_NOWARN

Cc: Vlastimil Babka <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10mm: introduce memalloc_flags_{save,restore}Kent Overstreet1-17/+26
Our proliferation of memalloc_*_{save,restore} APIs is getting a bit silly, this adds a generic version and converts the existing save/restore functions to wrappers. Signed-off-by: Kent Overstreet <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: [email protected] Acked-by: Vlastimil Babka <[email protected]>
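A generic save/restore pair of this kind can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: `task_flags` stands in for the per-task flags word, the `TF_*` bits and all function names are hypothetical. The key property is that save returns only the bits it newly set, so nested sections restore correctly.

```c
/* Hypothetical per-task flag bits, standing in for PF_MEMALLOC_*. */
#define TF_NORECLAIM	(1u << 0)
#define TF_NOWARN	(1u << 1)
#define TF_NOFS		(1u << 2)

/* Stand-in for the per-task flags word; one copy per thread. */
static _Thread_local unsigned task_flags;

/* Set the requested flags and return only the bits that were newly set. */
static unsigned flags_save(unsigned flags)
{
	unsigned newly_set = ~task_flags & flags;

	task_flags |= flags;
	return newly_set;
}

/* Clear exactly the bits that the matching flags_save() call set. */
static void flags_restore(unsigned flags)
{
	task_flags &= ~flags;
}

/* A specific save/restore API becomes a thin wrapper over the generic pair. */
static unsigned nofs_save(void)
{
	return flags_save(TF_NOFS);
}

static void nofs_restore(unsigned flags)
{
	flags_restore(flags);
}
```

Because an inner save returns 0 when the flag is already set, the inner restore is a no-op and the flag stays in effect until the outermost restore.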
2024-03-10bcachefs: factor out check_inode_backpointer()Kent Overstreet1-9/+29
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Factor out check_subvol_dirent()Kent Overstreet2-48/+58
Going to be adding more code here for checking subvol structure. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Kill some -EINVALsKent Overstreet2-5/+5
Repurposing standard error codes in bcachefs code is banned in new code, and we need to get rid of the remaining ones - private error codes give us much better error messages. Signed-off-by: Kent Overstreet <[email protected]>
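To illustrate why private error codes give better messages, here is a minimal sketch. The enum values, names, and helper functions are hypothetical and do not reproduce the bcachefs definitions: each private code carries a specific name for logging and maps back to a standard errno class when it has to be reported to callers that only understand errno values.

```c
#include <errno.h>

/* Hypothetical private codes, numbered above the standard errno range. */
enum fs_err {
	FS_ERR_START = 2048,
	FS_ERR_dirent_name_too_long,
	FS_ERR_inode_bad_mode,
};

/* Specific, greppable name for log messages. */
static const char *fs_err_str(int err)
{
	switch (err) {
	case FS_ERR_dirent_name_too_long:	return "dirent_name_too_long";
	case FS_ERR_inode_bad_mode:		return "inode_bad_mode";
	default:				return "(standard errno)";
	}
}

/* Map a private code to the standard errno it is a refinement of. */
static int fs_err_class(int err)
{
	switch (err) {
	case FS_ERR_dirent_name_too_long:	return ENAMETOOLONG;
	case FS_ERR_inode_bad_mode:		return EINVAL;
	default:				return err;	/* already standard */
	}
}
```

With a bare EINVAL, every failing check looks the same in a log; with a private code, the message names the exact check that failed while userspace still sees the standard class.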
2024-03-10bcachefs: bump max_active on btree_interior_update_workerKent Overstreet1-1/+1
WQ_UNBOUND with max_active 1 means ordered workqueue, but we don't actually need or want ordered semantics - and probably want a higher concurrency limit anyways. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: move fsck_write_inode() to inode.cKent Overstreet3-40/+44
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Initialize super_block->s_uuidKent Overstreet1-0/+1
Need to fix this oversight for the new FS_IOC_(GET|SET)UUID ioctls. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Switch to uuid_to_fsid()Kent Overstreet1-5/+1
switch the statfs code from something horrible and open coded to the more standard uuid_to_fsid() Signed-off-by: Kent Overstreet <[email protected]>
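For readers unfamiliar with the helper, the sketch below shows one way to fold a 16-byte UUID into a 64-bit fsid. It assumes the fold is an XOR of the two little-endian 64-bit halves, split into two 32-bit words; `struct fsid`, `load_le64()`, and `fold_uuid_to_fsid()` are hypothetical userspace names, not the kernel's.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's fsid type. */
struct fsid {
	uint32_t val[2];
};

/* Read 8 bytes as a little-endian 64-bit value, independent of host order. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Fold 16 bytes of UUID into 64 bits by XORing the two halves. */
static struct fsid fold_uuid_to_fsid(const uint8_t uuid[16])
{
	uint64_t v = load_le64(uuid) ^ load_le64(uuid + 8);

	return (struct fsid){ .val = { (uint32_t)v, (uint32_t)(v >> 32) } };
}
```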
2024-03-10bcachefs: Subvolumes may now be renamedKent Overstreet2-26/+55
Files within a subvolume cannot be renamed into another subvolume, but subvolumes themselves were intended to be. This implements subvolume renaming - we need to ensure that there's only a single dirent that points to a subvolume key (not multiple versions in different snapshots), and we need to ensure that dirent.d_parent_subvol and inode.bi_parent_subvol are updated. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: btree node prefetching in check_topologyKent Overstreet4-3/+42
btree_and_journal_iter is old code that we want to get rid of, but we're not ready to yet. lack of btree node prefetching is, it turns out, a real performance issue for fsck on spinning rust, so - add it. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: btree_and_journal_iter.transKent Overstreet4-17/+21
we now always have a btree_trans when using a btree_and_journal_iter; prep work for adding prefetching to btree_and_journal_iter Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: better journal pipeliningKent Overstreet4-59/+98
Recently a severe performance regression was discovered, which bisected to

  a6548c8b5eb5 bcachefs: Avoid flushing the journal in the discard path

It turns out the old behaviour, which issued excessive journal flushes, worked around a performance issue where queueing delays would cause the journal to not be able to write quickly enough and stall. The journal flushes masked the issue because they periodically flushed the device write cache, reducing write latency for non flushes.

This patch reworks the journalling code to allow more than one (non-flush) write to be in flight at a time. With this patch, doing 4k random writes and an iodepth of 128, we are now able to hit 560k iops to a Samsung 970 EVO Plus - previously, we were stuck in the ~200k range.

Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: closure per journal bufKent Overstreet3-23/+41
Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: bio per journal bufKent Overstreet3-29/+34
Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: jset_entry_datetimeKent Overstreet4-17/+67
This gives us a way to record the date and time every journal entry was written - useful for debugging. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: improve journal entry read fsck error messagesKent Overstreet1-41/+55
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: convert journal replay ptrs to darrayKent Overstreet3-58/+36
Eliminates some error paths - no longer have a hardcoded BCH_REPLICAS_MAX limit. Signed-off-by: Kent Overstreet <[email protected]>
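The benefit of a growable array over a fixed-size one can be sketched as follows. This is a generic illustration, not the bcachefs darray implementation; `struct ptr_array` and its helpers are hypothetical. With a fixed array every push needs a "too many entries" error path; with a growable one the only failure mode left is allocation failure.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical growable array of device indices. */
struct ptr_array {
	unsigned *data;
	size_t nr;	/* elements in use */
	size_t size;	/* elements allocated */
};

/* Append one element, doubling the allocation when it is full. */
static int ptr_array_push(struct ptr_array *a, unsigned v)
{
	if (a->nr == a->size) {
		size_t new_size = a->size ? a->size * 2 : 4;
		unsigned *d = realloc(a->data, new_size * sizeof(*d));

		if (!d)
			return -1;	/* the only remaining error path */
		a->data = d;
		a->size = new_size;
	}
	a->data[a->nr++] = v;
	return 0;
}

static void ptr_array_exit(struct ptr_array *a)
{
	free(a->data);
	a->data = NULL;
	a->nr = a->size = 0;
}
```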
2024-03-10bcachefs: Cleanup bch2_dirent_lookup_trans()Kent Overstreet3-26/+14
Drop an unnecessary bch2_subvolume_get_snapshot() call, and drop the __ from the name - this is a normal interface. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: bch2_hash_set_snapshot() -> bch2_hash_set_in_snapshot()Kent Overstreet3-18/+12
Minor renaming for clarity, bit of refactoring. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Workqueues should be WQ_HIGHPRIKent Overstreet1-4/+4
Most bcachefs workqueues are used for completions, and should be WQ_HIGHPRI - this helps reduce queuing delays, we want to complete quickly once we can no longer signal backpressure by blocking. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Improve bch2_dirent_to_text()Kent Overstreet1-9/+11
For DT_SUBVOL, we now print both parent and child subvol IDs. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: fixup for building in userspaceKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Avoid taking journal lock unnecessarilyKent Overstreet2-53/+55
Previously, any time we failed to get a journal reservation we'd retry, with the journal lock held; but this isn't necessary given wait_event()/wake_up() ordering. This avoids performance cliffs when the journal starts to get backed up and lock contention shoots up. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Journal writes should be REQ_SYNC|REQ_METAKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Avoid setting j->write_work unnecessarilyKent Overstreet1-13/+11
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Split out journal workqueueKent Overstreet3-16/+19
We don't want journal write completions to be blocked behind btree transactions - io_complete_wq is used for btree updates after data and metadata writes. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Kill unnecessary wakeups in journal reclaimKent Overstreet1-11/+9
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: skip invisible entries in empty subvolume checkingGuoyu Ou3-5/+9
When we are checking whether a subvolume is empty in the specified snapshot, entries that do not belong to this subvolume should be skipped. This fixes the following case:

  $ bcachefs subvolume create ./sub
  $ cd sub
  $ bcachefs subvolume create ./sub2
  $ bcachefs subvolume snapshot . ./snap
  $ ls -a snap
  . ..
  $ rmdir snap
  rmdir: failed to remove 'snap': Directory not empty

As Kent suggested, we pass 0 in may_delete_deleted_inode() to ignore subvols in the subvol we are checking, because inode.bi_subvol is only set on subvolume roots, and we can't go through every inode in the subvolume and change bi_subvol when taking a snapshot. It makes the check less strict, but that's ok, the rest of fsck will still catch it.

Signed-off-by: Guoyu Ou <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: fix split brain messageKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Set path->uptodate when no node at levelKent Overstreet1-2/+2
We were failing to set path->uptodate when reaching the end of a btree node iterator, causing the new prefetch code for backpointers gc to go into an infinite loop. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Correctly validate k->u64s in btree node read pathKent Overstreet3-6/+27
validate_bset_keys() never properly validated k->u64s; it checked if it was 0, but not if it was smaller than keys for the given packed format; this fixes that small oversight. This patch was backported, so it's adding quite a few error enums so that they don't get renumbered and we don't have confusing gaps. Signed-off-by: Kent Overstreet <[email protected]>
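The difference between the old and new check can be sketched like this. The structures and the constant are simplified, hypothetical stand-ins for the on-disk types: a key's size field must not only be nonzero, it must be at least as large as the key header for the format it claims to use.

```c
#include <stdbool.h>
#include <stdint.h>

#define UNPACKED_KEY_U64S 5	/* hypothetical size of an unpacked key header */

/* Hypothetical per-node format: size of a packed key header, in u64s. */
struct key_format {
	uint8_t key_u64s;
};

/* Hypothetical key header: total size in u64s, and whether it is packed. */
struct key {
	uint8_t u64s;
	bool packed;
};

/* Old check: only rejects a zero size. */
static bool key_u64s_valid_old(const struct key *k)
{
	return k->u64s != 0;
}

/* New check: the key must be at least as large as its own header. */
static bool key_u64s_valid(const struct key *k, const struct key_format *f)
{
	uint8_t min_u64s = k->packed ? f->key_u64s : UNPACKED_KEY_U64S;

	return k->u64s != 0 && k->u64s >= min_u64s;
}
```

A packed key claiming 2 u64s in a format whose header takes 3 passes the old check but is rejected by the new one.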
2024-03-10bcachefs: Fix degraded mode fsckKent Overstreet1-18/+18
We don't know where the superblock and journal live on offline devices; that means if a device is offline fsck can't check those buckets. Previously, fsck would incorrectly clear bucket data types for those buckets on offline devices; now we just use the previous state. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: Fix journal replay with unreadable btree rootsKent Overstreet4-6/+70
When a btree root is unreadable, we still might be able to get some data back by replaying what's in the journal. Previously though, we got confused when journal replay would attempt to replay a key for a level that didn't exist. This adds bch2_btree_increase_depth(), so that journal replay can handle this. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: fix check_inode_deleted_list()Kent Overstreet1-6/+3
check_inode_deleted_list() returns true if the inode is on the deleted list; check_inode() was checking the return code incorrectly. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: no_splitbrain_check optionKent Overstreet2-8/+22
This adds an option to disable kicking out devices when splitbrain is detected - it seems there are some issues with splitbrain detection and we're kicking out devices erroneously. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-10bcachefs: extent_entry_next_safe()Kent Overstreet1-3/+8
We need to be able to iterate over extent ptrs that may be corrupted in order to print them - this fixes a bug where we'd pop an assert in bch2_bkey_durability_safe(). Signed-off-by: Kent Overstreet <[email protected]>
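A "safe" iterator over variable-size entries can be sketched as follows. The entry layout here is hypothetical (a type byte that determines the entry size), not the real extent entry format. The point is that an unknown type or a size that runs past the buffer ends the walk instead of tripping an assertion, so a corrupt entry can still be visited and printed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the first byte of each entry is its type. */
static size_t entry_size(uint8_t type)
{
	switch (type) {
	case 1:		return 8;
	case 2:		return 16;
	default:	return 0;	/* unknown or corrupt type */
	}
}

/*
 * Advance to the next entry. If the current entry has an unknown type or
 * would extend past the end of the buffer, return end to stop the walk.
 */
static const uint8_t *entry_next_safe(const uint8_t *entry, const uint8_t *end)
{
	size_t size = entry_size(entry[0]);

	if (!size || size > (size_t)(end - entry))
		return end;
	return entry + size;
}

/* Count the entries visited, including a trailing corrupt one. */
static size_t count_entries(const uint8_t *buf, size_t len)
{
	const uint8_t *end = buf + len;
	size_t n = 0;

	for (const uint8_t *e = buf; e < end; e = entry_next_safe(e, end))
		n++;
	return n;
}
```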
2024-03-10bcachefs: journal_seq_blacklist_add() now handles entries being added out of orderKent Overstreet2-46/+22
bch2_journal_seq_blacklist_add() was bugged when the new entry overlapped with multiple existing entries, and it also assumed new entries are being added in increasing order. This is true on any sane filesystem, but when trying to recover from very badly mangled filesystems we might end up with the journal sequence number rewinding vs. what the blacklist list knows about - easiest to just handle that here. Signed-off-by: Kent Overstreet <[email protected]>
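Handling both cases, a new range that overlaps several existing ones and a new range that sorts before existing ones, amounts to an interval insert with merging. The sketch below is a generic illustration with hypothetical names and a fixed-capacity table, not the bcachefs code: it absorbs every overlapping or touching range into the new one, then inserts the result at its sorted position.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RANGES 16	/* hypothetical capacity */

struct seq_range {
	uint64_t start, end;	/* half-open: [start, end) */
};

/* Ranges are kept sorted by start and never overlap or touch. */
struct blacklist {
	struct seq_range r[MAX_RANGES];
	size_t nr;
};

static int blacklist_add(struct blacklist *bl, uint64_t start, uint64_t end)
{
	size_t i = 0;

	/* Absorb every existing range that overlaps or touches the new one. */
	while (i < bl->nr) {
		struct seq_range *e = &bl->r[i];

		if (e->end < start || e->start > end) {
			i++;
			continue;
		}
		if (e->start < start)
			start = e->start;
		if (e->end > end)
			end = e->end;
		/* Remove e while preserving sort order. */
		memmove(e, e + 1, (bl->nr - i - 1) * sizeof(*e));
		bl->nr--;
	}

	if (bl->nr == MAX_RANGES)
		return -1;

	/* Insert at the sorted position; works for out-of-order additions. */
	for (i = bl->nr; i > 0 && bl->r[i - 1].start > start; i--)
		bl->r[i] = bl->r[i - 1];
	bl->r[i] = (struct seq_range){ start, end };
	bl->nr++;
	return 0;
}
```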
2024-03-10bcachefs: Fix null-ptr-deref in bch2_fs_alloc()Li Zetao1-3/+3
There is a null-ptr-deref issue reported by KASAN:

  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  Call Trace:
    <TASK>
    bch2_fs_alloc+0x1092/0x2170 [bcachefs]
    bch2_fs_open+0x683/0xe10 [bcachefs]
    ...

When initializing the name of bch_fs, memory must be allocated dynamically to fit the length of the name. However, if the name allocation fails, the subsequent string copy dereferences a NULL pointer. Fix this issue by checking whether the name allocation succeeded.

Fixes: 401ec4db6308 ("bcachefs: Printbuf rework") Signed-off-by: Li Zetao <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
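The shape of the fix, checking a dynamically allocated name before copying into it, looks roughly like this in a userspace sketch. `struct fs` and `fs_alloc()` are hypothetical and far simpler than the real bch2_fs_alloc():

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified filesystem object. */
struct fs {
	char *name;
};

static struct fs *fs_alloc(const char *name)
{
	struct fs *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;

	c->name = malloc(strlen(name) + 1);
	if (!c->name) {
		/* Without this check, the strcpy() below would write to NULL. */
		free(c);
		return NULL;
	}
	strcpy(c->name, name);
	return c;
}

static void fs_free(struct fs *c)
{
	if (c) {
		free(c->name);
		free(c);
	}
}
```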
2024-02-25Linux 6.8-rc6Linus Torvalds1-1/+1
2024-02-25Merge tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefsLinus Torvalds7-22/+25
Pull bcachefs fixes from Kent Overstreet:

 "Some more mostly boring fixes, but some not

  User reported ones:

   - the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty performance bug; user reported an untar initially taking two seconds and then ~2 minutes

   - kill a __GFP_NOFAIL in the buffered read path; this was a leftover from the trickier fix to kill __GFP_NOFAIL in readahead, where we can't return errors (and have to silently truncate the read ourselves).

     bcachefs can't use GFP_NOFAIL for folio state unlike iomap based filesystems because our folio state is just barely too big, 2MB hugepages cause us to exceed the 2 page threshold for GFP_NOFAIL.

     additionally, the flags argument was just buggy, we weren't supplying GFP_KERNEL previously (!)"

* tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: fix bch2_save_backtrace()
  bcachefs: Fix check_snapshot() memcpy
  bcachefs: Fix bch2_journal_flush_device_pins()
  bcachefs: fix iov_iter count underflow on sub-block dio read
  bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
  bcachefs: Kill __GFP_NOFAIL in buffered read path
  bcachefs: fix backpointer_to_text() when dev does not exist
2024-02-25bcachefs: fix bch2_save_backtrace()Kent Overstreet1-1/+1
Missed a call in the previous fix. Signed-off-by: Kent Overstreet <[email protected]>
2024-02-25Merge tag 'docs-6.8-fixes3' of git://git.lwn.net/linuxLinus Torvalds2-6/+10
Pull two documentation build fixes from Jonathan Corbet:

 - The XFS online fsck documentation uses incredibly deeply nested subsection and list nesting; that broke the PDF docs build. Tweak a parameter to tell LaTeX to allow the deeper nesting.

 - Fix a 6.8 PDF-build regression

* tag 'docs-6.8-fixes3' of git://git.lwn.net/linux:
  docs: translations: use attribute to store current language
  docs: Instruct LaTeX to cope with deeper nesting
2024-02-25Merge tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds12-26/+75
Pull USB fixes from Greg KH:

 "Here are some small USB fixes for 6.8-rc6 to resolve some reported problems. These include:

   - regression fixes with typec tpcm code as reported by many
   - cdnsp and cdns3 driver fixes
   - usb role setting code bugfixes
   - build fix for uhci driver
   - ncm gadget driver bugfix
   - MAINTAINERS entry update

  All of these have been in linux-next all week with no reported issues and there is at least one fix in here that is in Thorsten's regression list that is being tracked"

* tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tpcm: Fix issues with power being removed during reset
  MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers
  usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
  Revert "usb: typec: tcpm: reset counter when enter into unattached state after try role"
  usb: gadget: omap_udc: fix USB gadget regression on Palm TE
  usb: dwc3: gadget: Don't disconnect if not started
  usb: cdns3: fix memory double free when handle zero packet
  usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
  usb: roles: don't get/set_role() when usb_role_switch is unregistered
  usb: roles: fix NULL pointer issue when put module's reference
  usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers
  usb: cdnsp: blocked some cdns3 specific code
  usb: uhci-grlib: Explicitly include linux/platform_device.h
2024-02-25Merge tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds3-34/+38
Pull tty/serial driver fixes from Greg KH:

 "Here are three small serial/tty driver fixes for 6.8-rc6 that resolve the following reported errors:

   - riscv hvc console driver fix that was reported by many
   - amba-pl011 serial driver fix for RS485 mode
   - stm32 serial driver fix for RS485 mode

  All of these have been in linux-next all week with no reported problems"

* tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: amba-pl011: Fix DMA transmission in RS485 mode
  serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485 is enabled
  tty: hvc: Don't enable the RISC-V SBI console by default
2024-02-25Merge tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds13-46/+112
Pull x86 fixes from Borislav Petkov:

 - Make sure clearing CPU buffers using VERW happens at the latest possible point in the return-to-userspace path, otherwise memory accesses after the VERW execution could cause data to land in CPU buffers again

* tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM/VMX: Move VERW closer to VMentry for MDS mitigation
  KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
  x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
  x86/entry_32: Add VERW just before userspace transition
  x86/entry_64: Add VERW just before userspace transition
  x86/bugs: Add asm helpers for executing VERW