aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-03-15xfs: fix dev_t usage in xmbuf tracepointsDarrick J. Wong2-4/+9
Fix some inconsistencies in the xmbuf tracepoints -- they should be reporting the major/minor of the filesystem that they're associated with, so that we have some clue on whose behalf the xmbuf was created. Fix the xmbuf_free tracepoint to report the same. Don't call the trace function until the xmbuf is fully initialized. Fixes: 5076a6040ca1 ("xfs: support in-memory buffer cache target") Signed-off-by: "Darrick J. Wong" <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Chandan Babu R <[email protected]>
2024-03-14Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of ↵Linus Torvalds24-387/+457
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ...
2024-03-15btrfs: zoned: use zone aware sb location for scrubJohannes Thumshirn1-1/+11
At the moment scrub_supers() doesn't grab the super block's location via the zoned device aware btrfs_sb_log_location() but via btrfs_sb_offset(). This leads to checksum errors on 'scrub' as we're not accessing the correct location of the super block. So use btrfs_sb_log_location() for getting the super blocks location on scrub. Reported-by: WA AM <[email protected]> Link: http://lore.kernel.org/linux-btrfs/CANU2Z0EvUzfYxczLgGUiREoMndE9WdQnbaawV5Fv5gNXptPUKw@mail.gmail.com CC: [email protected] # 5.15+ Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Naohiro Aota <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-14Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds6-80/+44
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
2024-03-14ksmbd: Fix spelling mistake "connction" -> "connection"Colin Ian King1-1/+1
There is a spelling mistake in a ksmbd_debug debug message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Acked-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-03-14ksmbd: fix possible null-deref in smb_lazy_parent_lease_break_closeMarios Makassikis1-1/+1
rcu_dereference can return NULL, so make sure we check against that. Signed-off-by: Marios Makassikis <[email protected]> Acked-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-03-14cifs: remove redundant variable assignmentBharath SM1-2/+0
This removes an unnecessary variable assignment. The assigned value will be overwritten by cifs_fattr_to_inode before it is accessed, making the line redundant. Signed-off-by: Bharath SM <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-03-14cifs: fixes for get_inode_infoMeetakshi Setiya2-12/+13
Fix potential memory leaks, add error checking, remove unnecessary initialisation of status_file_deleted and do not use cifs_iget() to get inode in reparse_info_to_fattr since fattrs may not be fully set. Fixes: ffceb7640cbf ("smb: client: do not defer close open handles to deleted files") Reported-by: Paulo Alcantara <[email protected]> Signed-off-by: Meetakshi Setiya <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-03-14cifs: open_cached_dir(): add FILE_READ_EA to desired accessEugene Korenevsky1-1/+2
Since smb2_query_eas() reads EA and uses cached directory, open_cached_dir() should request FILE_READ_EA access. Otherwise listxattr() and getxattr() will fail with EACCES (0xc0000022 STATUS_ACCESS_DENIED SMB status). Link: https://bugzilla.kernel.org/show_bug.cgi?id=218543 Cc: [email protected] Signed-off-by: Eugene Korenevsky <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-03-14cifs: reduce warning log level for server not advertising interfacesShyam Prasad N1-2/+2
Several users have reported this log getting dumped too regularly to kernel log. The likely root cause has been identified, and it suggests that this situation is expected for some configurations (for example SMB2.1). Since the function returns appropriately even for such cases, it is fairly harmless to make this a debug log. When needed, the verbosity can be increased to capture this log. Cc: [email protected] Reported-by: Jan Čermák <[email protected]> Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-03-14cifs: make sure server interfaces are requested only for SMB3+Shyam Prasad N4-3/+13
Some code paths for querying server interfaces make a false assumption that it will only get called for SMB3+. Since this function now can get called from a generic code paths, the correct thing to do is to have specific handler for this functionality per SMB dialect, and call this handler. This change adds such a handler and implements this handler only for SMB 3.0 and 3.1.1. Cc: [email protected] Cc: Jan Čermák <[email protected]> Reported-by: Paulo Alcantara <[email protected]> Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-03-14cifs: defer close file handles having RH leaseBharath SM1-4/+15
Previously we only deferred closing file handles with RHW lease. To enhance performance benefits from deferred closes, we now include handles with RH leases as well. Signed-off-by: Bharath SM <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-03-14nilfs2: prevent kernel bug at submit_bh_wbc()Ryusuke Konishi1-1/+1
Fix a bug where nilfs_get_block() returns a successful status when searching and inserting the specified block both fail inconsistently. If this inconsistent behavior is not due to a previously fixed bug, then an unexpected race is occurring, so return a temporary error -EAGAIN instead. This prevents callers such as __block_write_begin_int() from requesting a read into a buffer that is not mapped, which would cause the BUG_ON check for the BH_Mapped flag in submit_bh_wbc() to fail. Link: https://lkml.kernel.org/r/[email protected] Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl") Signed-off-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-14nilfs2: fix failure to detect DAT corruption in btree and direct mappingsRyusuke Konishi2-4/+14
Patch series "nilfs2: fix kernel bug at submit_bh_wbc()". This resolves a kernel BUG reported by syzbot. Since there are two flaws involved, I've made each one a separate patch. The first patch alone resolves the syzbot-reported bug, but I think both fixes should be sent to stable, so I've tagged them as such. This patch (of 2): Syzbot has reported a kernel bug in submit_bh_wbc() when writing file data to a nilfs2 file system whose metadata is corrupted. There are two flaws involved in this issue. The first flaw is that when nilfs_get_block() locates a data block using btree or direct mapping, if the disk address translation routine nilfs_dat_translate() fails with internal code -ENOENT due to DAT metadata corruption, it can be passed back to nilfs_get_block(). This causes nilfs_get_block() to misidentify an existing block as non-existent, causing both data block lookup and insertion to fail inconsistently. The second flaw is that nilfs_get_block() returns a successful status in this inconsistent state. This causes the caller __block_write_begin_int() or others to request a read even though the buffer is not mapped, resulting in a BUG_ON check for the BH_Mapped flag in submit_bh_wbc() failing. This fixes the first issue by changing the return value to code -EINVAL when a conversion using DAT fails with code -ENOENT, avoiding the conflicting condition that leads to the kernel bug described above. Here, code -EINVAL indicates that metadata corruption was detected during the block lookup, which will be properly handled as a file system error and converted to -EIO when passing through the nilfs2 bmap layer. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: c3a7abf06ce7 ("nilfs2: support contiguous lookup of blocks") Signed-off-by: Ryusuke Konishi <[email protected]> Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=cfed5b56649bddf80d6e Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-14ocfs2: enable ocfs2_listxattr for special filesSu Yue1-0/+1
For special files in S_IFBLK/S_IFCHR/S_IFIFO type, we already have ocfs2_setattr and ocfs2_getattr enabled. It's confusing for user space if it can use setattr/getattr to control one attribute appointed but can not list attributes using listxattr for above type files: $ mknod /mnt/b b 0 0 $ setfattr -h -n trusted.name -v 0xbabe /mnt/b $ getfattr -n trusted.name /mnt/b getfattr: Removing leading '/' from absolute path names trusted.name=0sur4= $ getfattr -m trusted /mnt/b $ Fix it by enabling ocfs2_listxattr for ocfs2_special_file_iops. After the commit, fstests/generic/062 will pass. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Su Yue <[email protected]> Acked-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-14ocfs2: remove SLAB_MEM_SPREAD flag usageChengming Zhou2-5/+4
The SLAB_MEM_SPREAD flag is already a no-op as of 6.8-rc1, remove its usage so we can delete it from slab. No functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chengming Zhou <[email protected]> Acked-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-14f2fs: fix to avoid use-after-free issue in f2fs_filemap_faultChao Yu1-1/+2
syzbot reports a f2fs bug as below: BUG: KASAN: slab-use-after-free in f2fs_filemap_fault+0xd1/0x2c0 fs/f2fs/file.c:49 Read of size 8 at addr ffff88807bb22680 by task syz-executor184/5058 CPU: 0 PID: 5058 Comm: syz-executor184 Not tainted 6.7.0-syzkaller-09928-g052d534373b7 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0x163/0x540 mm/kasan/report.c:488 kasan_report+0x142/0x170 mm/kasan/report.c:601 f2fs_filemap_fault+0xd1/0x2c0 fs/f2fs/file.c:49 __do_fault+0x131/0x450 mm/memory.c:4376 do_shared_fault mm/memory.c:4798 [inline] do_fault mm/memory.c:4872 [inline] do_pte_missing mm/memory.c:3745 [inline] handle_pte_fault mm/memory.c:5144 [inline] __handle_mm_fault+0x23b7/0x72b0 mm/memory.c:5285 handle_mm_fault+0x27e/0x770 mm/memory.c:5450 do_user_addr_fault arch/x86/mm/fault.c:1364 [inline] handle_page_fault arch/x86/mm/fault.c:1507 [inline] exc_page_fault+0x456/0x870 arch/x86/mm/fault.c:1563 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570 The root cause is: in f2fs_filemap_fault(), vmf->vma may be not alive after filemap_fault(), so it may cause use-after-free issue when accessing vmf->vma->vm_flags in trace_f2fs_filemap_fault(). So it needs to keep vm_flags in separated temporary variable for tracepoint use. Fixes: 87f3afd366f7 ("f2fs: add tracepoint for f2fs_vm_page_mkwrite()") Reported-and-tested-by: [email protected] Closes: https://lore.kernel.org/lkml/[email protected] Cc: Ed Tsai <[email protected]> Suggested-by: Hillf Danton <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-14f2fs: truncate page cache before clearing flags when aborting atomic writeSunmin Jeong1-1/+3
In f2fs_do_write_data_page, FI_ATOMIC_FILE flag selects the target inode between the original inode and COW inode. When aborting atomic write and writeback occur simultaneously, invalid data can be written to original inode if the FI_ATOMIC_FILE flag is cleared meanwhile. To prevent the problem, let's truncate all pages before clearing the flag Atomic write thread Writeback thread f2fs_abort_atomic_write clear_inode_flag(inode, FI_ATOMIC_FILE) __writeback_single_inode do_writepages f2fs_do_write_data_page - use dn of original inode truncate_inode_pages_final Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: [email protected] #v5.19+ Reviewed-by: Sungjong Seo <[email protected]> Reviewed-by: Yeongjin Gil <[email protected]> Signed-off-by: Sunmin Jeong <[email protected]> Reviewed-by: Daeho Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-14f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flagSunmin Jeong1-0/+1
In f2fs_update_inode, i_size of the atomic file isn't updated until FI_ATOMIC_COMMITTED flag is set. When committing atomic write right after the writeback of the inode, i_size of the raw inode will not be updated. It can cause the atomicity corruption due to a mismatch between old file size and new data. To prevent the problem, let's mark inode dirty for FI_ATOMIC_COMMITTED Atomic write thread Writeback thread __writeback_single_inode write_inode f2fs_update_inode - skip i_size update f2fs_ioc_commit_atomic_write f2fs_commit_atomic_write set_inode_flag(inode, FI_ATOMIC_COMMITTED) f2fs_do_sync_file f2fs_fsync_node_pages - skip f2fs_update_inode since the inode is clean Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: [email protected] #v5.19+ Reviewed-by: Sungjong Seo <[email protected]> Reviewed-by: Yeongjin Gil <[email protected]> Signed-off-by: Sunmin Jeong <[email protected]> Reviewed-by: Daeho Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-03-14afs: Fix occasional rmdir-then-VNOVNODE with generic/011David Howells1-7/+9
Sometimes generic/011 causes kafs to follow up an FS.RemoveDir RPC call by spending around a second sending a slew of FS.FetchStatus RPC calls to the directory just deleted that then abort with VNOVNODE, indicating deletion of the target directory. This seems to stem from userspace attempting to stat the directory or something in it: afs_select_fileserver+0x46d/0xaa2 afs_wait_for_operation+0x12/0x17e afs_fetch_status+0x56/0x75 afs_validate+0xfb/0x240 afs_permission+0xef/0x1b0 inode_permission+0x90/0x139 link_path_walk.part.0.constprop.0+0x6f/0x2f0 path_lookupat+0x4c/0xfa filename_lookup+0x63/0xd7 vfs_statx+0x62/0x13f vfs_fstatat+0x72/0x8a The issue appears to be that afs_dir_remove_subdir() marks the callback promise as being cancelled by setting the expiry time to AFS_NO_CB_PROMISE - which then confuses afs_validate() which sends the FetchStatus to try and get a new one before it checks for the AFS_VNODE_DELETED flag which indicates that we know the directory got deleted. Fix this by: (1) Make afs_check_validity() return true if AFS_VNODE_DELETED is set, and then tweak the return from afs_validate() if the DELETED flag is set. (2) Move the AFS_VNODE_DELETED check in afs_validate() up above the expiration check to immediately after we've grabbed the validate_lock. Fixes: 453924de6212 ("afs: Overhaul invalidation handling to better support RO volumes") Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Marc Dionne <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-03-14afs: Don't cache preferred addressDavid Howells1-17/+4
In the AFS fileserver rotation algorithm, don't cache the preferred address for the server as that will override the explicit preference if a non-preferred address responds first. Fixes: 495f2ae9e355 ("afs: Fix fileserver rotation") Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Marc Dionne <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-03-14afs: Revert "afs: Hide silly-rename files from userspace"David Howells1-10/+0
This reverts commit 57e9d49c54528c49b8bffe6d99d782ea051ea534. This undoes the hiding of .__afsXXXX silly-rename files. The problem with hiding them is that rm can't then manually delete them. This also reverts commit 5f7a07646655fb4108da527565dcdc80124b14c4 ("afs: Fix endless loop in directory parsing") as that's a bugfix for the above. Fixes: 57e9d49c5452 ("afs: Hide silly-rename files from userspace") Reported-by: Markus Suvanto <[email protected]> Link: https://lists.infradead.org/pipermail/linux-afs/2024-February/008102.html Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jeffrey E Altman <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-03-13bcachefs: time_stats: shrink time_stat_buffer for better alignmentDarrick J. Wong1-1/+1
Shrink this percpu object by one array element so that the object size becomes exactly 512 bytes. This will lead to more efficient memory use, hopefully. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: time_stats: split stats-with-quantiles into a separate structureDarrick J. Wong7-15/+41
Currently, struct time_stats has the optional ability to quantize the information that it collects. This is /probably/ useful for callers who want to see quantized information, but it more than doubles the size of the structure from 224 bytes to 464. For users who don't care about that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per counter, split the two into separate pieces. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a dietDarrick J. Wong6-67/+84
The only caller of this code (time_stats) always knows the weights and whether or not any information has been collected. Pass this information into the mean and variance code so that it doesn't have to store that information. This reduces the structure size from 24 to 16 bytes, which shrinks each time_stats counter to 192 bytes from 208. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: time_stats: add larger unitsDarrick J. Wong1-0/+3
Filesystems can stay mounted for a very long time, so add some larger units. Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: pull out time_stats.[ch]Kent Overstreet11-279/+326
prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: reconstruct_alloc cleanupKent Overstreet7-95/+113
Now that we've got the errors_silent mechanism, we don't have to check if the reconstruct_alloc option is set all over the place. Also - users no longer have to explicitly select fsck and fix_errors. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: fix bch_folio_sector paddingKent Overstreet1-6/+3
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: Fix btree key cache coherency during replayKent Overstreet2-4/+6
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: Always flush write buffer in delete_dead_inodes()Kent Overstreet1-5/+10
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: Fix order of gc_done passesKent Overstreet1-4/+4
gc_stripes_done() and gc_reflink_done() may do alloc btree updates (i.e. when deleting an indirect extent) - we need bucket gens to be fixed by then. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: fix deletion of indirect extents in btree_gcKent Overstreet1-2/+2
we need to run the normal extent update path on deletion - bch2_bkey_make_mut() is incorrect when key type is changing. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: Prefer struct_size over open coded arithmeticErick Archer2-3/+2
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "op" variable is a pointer to "struct promote_op" and this structure ends in a flexible array: struct promote_op { [...] struct bio_vec bi_inline_vecs[]; }; and the "t" variable is a pointer to "struct journal_seq_blacklist_table" and this structure also ends in a flexible array: struct journal_seq_blacklist_table { [...] struct journal_seq_blacklist_table_entry { u64 start; u64 end; bool dirty; } entries[]; }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the argument "size + size * count" in the kzalloc() functions. This way, the code is more readable and safer. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Signed-off-by: Erick Archer <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: Kill unused flags argument to btree_split()Kent Overstreet1-11/+8
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: Check for writing superblocks with nonsense member seq fieldsKent Overstreet1-0/+8
We're seeing some unmountable filesystems due to split brain detection going awry; it seems we somehow wrote out superblocks where we updated the superblock seq without updating any member seq fields. A given device's superblock should always have the main seq equal to it's member seq field, so this is easy to check for. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: fix bch2_journal_buf_to_text()Kent Overstreet1-5/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: copy_(to|from)_user_errcode()Kent Overstreet2-6/+16
we've got some helpers that return errors sanely, move them to a more common location for use in fs-ioctl.c Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: Split out bkey_types.hKent Overstreet2-201/+214
We're going to need bkey_types.h in bcachefs_ioctl.h in a future patch. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: fix lost journal buf wakeup due to improved pipeliningBrian Foster1-1/+1
The journal_write_done() handler was reworked into a loop in commit 746a33c96b7a ("bcachefs: better journal pipelining"). As part of this, the journal buffer wake was factored into a post-loop branch that executes if at least one journal buffer has completed. The journal buffer processing loop iterates on the journal buffer pointer, however. This means that w refers to the last buffer processed by the loop, which may or may not be done. This also means that if multiple buffers are processed by the loop, only the last is awoken. This lost wakeup behavior has lead to stalling problems in various CI and fstests, such as generic/703. Lift the wake into the loop so each done buffer sees a wake call as it is processed. Signed-off-by: Brian Foster <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: intercept mountoption value for bool typeHongbo Li2-1/+2
For mount option with bool type, the value must be 0 or 1 (See bch2_opt_parse). But this seems does not well intercepted cause for other value(like 2...), it returns the unexpect return code with error message printed. Signed-off-by: Hongbo Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: avoid returning private error code in bch2_xattr_bcachefs_setHongbo Li1-2/+3
Avoid the private error code return to caller. The error code should be transformed into genernal error code. Signed-off-by: Hongbo Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: Buffered write path now can avoid the inode lockKent Overstreet2-41/+111
Non append, non extending buffered writes can now avoid taking the inode lock. To ensure atomicity of writes w.r.t. other writes, we lock every folio that we'll be writing to, and if this fails we fall back to taking the inode lock. Extensive comments are provided as to corner cases. Link: https://lore.kernel.org/linux-fsdevel/[email protected]/ Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13fs: file_remove_privs_flags()Kent Overstreet1-3/+4
Rename and export __file_remove_privs(); for a buffered write path that doesn't take the inode lock we need to be able to check if the operation needs to do work first. Signed-off-by: Kent Overstreet <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]>
2024-03-13bcachefs: Fix bch2_journal_noflush_seq()Kent Overstreet2-5/+6
Improved journal pipelining broke journal_noflush_seq(); it implicitly assumed only the oldest outstanding journal buf could be in flight, but that's no longer true. Make this more straightforward by just setting buf->must_flush whenever we know a journal buf is going to be flush. Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: fix the error code when mounting with incorrect options.Hongbo Li3-4/+9
When mount with incorrect options such as: "mount -t bcachefs -o errors=back /dev/loop1 /mnt/bcachefs/". It rebacks the error "mount: /mnt/bcachefs: permission denied." cause bch2_parse_mount_opts returns -1 and bch2_mount throws it up. This is unreasonable. The real error message should be like this: "mount: /mnt/bcachefs: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error." Adding three private error codes for mounting error. Here are: - BCH_ERR_mount_option as the parent class for option error. - BCH_ERR_option_name represents the invalid option name. - BCH_ERR_option_value represents the invalid option value. Signed-off-by: Hongbo Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: split out ignore_blacklisted, ignore_not_dirtyKent Overstreet5-21/+33
prep work for replaying the journal backwards Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: improve move_gap()Kent Overstreet3-8/+9
Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: journal_keys now uses darray helpersKent Overstreet2-61/+25
nice bit of code cleanup Signed-off-by: Kent Overstreet <[email protected]>
2024-03-13bcachefs: Rename journal_keys.d -> journal_keys.dataKent Overstreet3-42/+42
This will let us use some darray helpers in the next patch. Signed-off-by: Kent Overstreet <[email protected]>