aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-03-27Merge tag 'erofs-for-6.9-rc2-fixes' of ↵Linus Torvalds2-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Add a new reviewer Sandeep Dhavale to build a healthier community - Drop experimental warning for FSDAX * tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: MAINTAINERS: erofs: add myself as reviewer erofs: drop experimental warning for FSDAX
2024-03-27Merge tag '9p-fixes-for-6.9-rc1' of ↵Linus Torvalds2-11/+11
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p fixes from Eric Van Hensbergen: "Two of these fix syzbot reported issues, and the other fixes a unused variable in some configurations" * tag '9p-fixes-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: fix uninitialized values during inode evict fs/9p: remove redundant pointer v9ses fs/9p: fix uaf in in v9fs_stat2inode_dotl
2024-03-27Merge tag 'for-6.9-rc1-tag' of ↵Linus Torvalds6-22/+63
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix race when reading extent buffer and 'uptodate' status is missed by one thread (introduced in 6.5) - do additional validation of devices using major:minor numbers - zoned mode fixes: - use zone-aware super block access during scrub - fix use-after-free during device replace (found by KASAN) - also delete zones that are 100% unusable to reclaim space - extent unpinning fixes: - fix extent map leak after error handling - print correct range in error message - error code and message updates * tag 'for-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix race in read_extent_buffer_pages() btrfs: return accurate error code on open failure in open_fs_devices() btrfs: zoned: don't skip block groups with 100% zone unusable btrfs: use btrfs_warn() to log message at btrfs_add_extent_mapping() btrfs: fix message not properly printing interval when adding extent map btrfs: fix warning messages not printing interval at unpin_extent_range() btrfs: fix extent map leak in unexpected scenario at unpin_extent_cache() btrfs: validate device maj:min during open btrfs: zoned: fix use-after-free in do_zone_finish() btrfs: zoned: use zone aware sb location for scrub
2024-03-27Merge tag 'mm-hotfixes-stable-2024-03-27-11-25' of ↵Linus Torvalds25-40/+177
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Various hotfixes. About half are cc:stable and the remainder address post-6.8 issues or aren't considered suitable for backporting. zswap figures prominently in the post-6.8 issues - folloup against the large amount of changes we have just made to that code. Apart from that, all over the map" * tag 'mm-hotfixes-stable-2024-03-27-11-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) crash: use macro to add crashk_res into iomem early for specific arch mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices selftests/mm: fix ARM related issue with fork after pthread_create hexagon: vmlinux.lds.S: handle attributes section userfaultfd: fix deadlock warning when locking src and dst VMAs tmpfs: fix race on handling dquot rbtree selftests/mm: sigbus-wp test requires UFFD_FEATURE_WP_HUGETLBFS_SHMEM mm: zswap: fix writeback shinker GFP_NOIO/GFP_NOFS recursion ARM: prctl: reject PR_SET_MDWE on pre-ARMv6 prctl: generalize PR_SET_MDWE support check to be per-arch MAINTAINERS: remove incorrect M: tag for [email protected] mm: zswap: fix kernel BUG in sg_init_one selftests: mm: restore settings from only parent process tools/Makefile: remove cgroup target mm: cachestat: fix two shmem bugs mm: increase folio batch size mm,page_owner: fix recursion mailmap: update entry for Leonard Crestez init: open /initrd.image with O_LARGEFILE selftests/mm: Fix build with _FORTIFY_SOURCE ...
2024-03-27Merge tag 'probes-fixes-v6.9-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixlet from Masami Hiramatsu: - tracing/probes: initialize a 'val' local variable with zero. This variable is read by FETCH_OP_ST_EDATA in a loop, and is initialized by FETCH_OP_ARG in the same loop. Since this initialization is not obvious, smatch warns about it. Explicitly initializing 'val' with zero fixes this warning. * tag 'probes-fixes-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: probes: Fix to zero initialize a local variable
2024-03-27Merge tag 'execve-v6.9-rc2' of ↵Linus Torvalds7-55/+61
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve fixes from Kees Cook: - Fix selftests to conform to the TAP output format (Muhammad Usama Anjum) - Fix NOMMU linux_binprm::exec pointer in auxv (Max Filippov) - Replace deprecated strncpy usage (Justin Stitt) - Replace another /bin/sh instance in selftests * tag 'execve-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt: replace deprecated strncpy exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack() selftests/exec: Convert remaining /bin/sh to /bin/bash selftests/exec: execveat: Improve debug reporting selftests/exec: recursion-depth: conform test to TAP format output selftests/exec: load_address: conform test to TAP format output selftests/exec: binfmt_script: Add the overall result line according to TAP
2024-03-27Fix build errors due to new UIO_MEM_DMA_COHERENT messLinus Torvalds3-4/+4
Commit 576882ef5e7f ("uio: introduce UIO_MEM_DMA_COHERENT type") introduced a new use-case for 'struct uio_mem' where the 'mem' field now contains a kernel virtual address when 'memtype' is set to UIO_MEM_DMA_COHERENT. That in turn causes build errors, because 'mem' is of type 'phys_addr_t', and a virtual address is a pointer type. When the code just blindly uses cast to mix the two, it caused problems when phys_addr_t isn't the same size as a pointer - notably on 32-bit architectures with PHYS_ADDR_T_64BIT. The proper thing to do would probably be to use a union member, and not have any casts, and make the 'mem' member be a union of 'mem.physaddr' and 'mem.vaddr', based on 'memtype'. This is not that proper thing. This is just fixing the ugly casts to be even uglier, but at least not cause build errors on 32-bit platforms with 64-bit physical addresses. Reported-by: Guenter Roeck <[email protected]> Fixes: 576882ef5e7f ("uio: introduce UIO_MEM_DMA_COHERENT type") Fixes: 7722151e4651 ("uio_pruss: UIO_MEM_DMA_COHERENT conversion") Fixes: 019947805a8d ("uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion") Cc: Greg Kroah-Hartman <[email protected]> Cc: Chris Leech <[email protected]> Cc: Nilesh Javali <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-03-27Fix memory leak in posix_clock_open()Linus Torvalds1-7/+9
If the clk ops.open() function returns an error, we don't release the pccontext we allocated for this clock. Re-organize the code slightly to make it all more obvious. Reported-by: Rohit Keshri <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Fixes: 60c6946675fc ("posix-clock: introduce posix_clock_context concept") Cc: Jakub Kicinski <[email protected]> Cc: David S. Miller <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-03-26crash: use macro to add crashk_res into iomem early for specific archBaoquan He2-0/+9
There are regression reports[1][2] that crashkernel region on x86_64 can't be added into iomem tree sometime. This causes the later failure of kdump loading. This happened after commit 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") was merged. Even though, these reported issues are proved to be related to other component, they are just exposed after above commmit applied, I still would like to keep crashk_res and crashk_low_res being added into iomem early as before because the early adding has been always there on x86_64 and working very well. For safety of kdump, Let's change it back. Here, add a macro HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY to limit that only ARCH defining the macro can have the early adding crashk_res/_low_res into iomem. Then define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY on x86 to enable it. Note: In reserve_crashkernel_low(), there's a remnant of crashk_low_res handling which was mistakenly added back in commit 85fcde402db1 ("kexec: split crashkernel reservation code out from crash_core.c"). [1] [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data https://lore.kernel.org/all/[email protected]/T/#u [2] Question about Address Range Validation in Crash Kernel Allocation https://lore.kernel.org/all/[email protected]/T/#u Link: https://lkml.kernel.org/r/ZgDYemRQ2jxjLkq+@MiWiFi-R3L-srv Fixes: 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") Signed-off-by: Baoquan He <[email protected]> Cc: Dave Young <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Li Huafei <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devicesJohannes Weiner1-4/+19
Zhongkun He reports data corruption when combining zswap with zram. The issue is the exclusive loads we're doing in zswap. They assume that all reads are going into the swapcache, which can assume authoritative ownership of the data and so the zswap copy can go. However, zram files are marked SWP_SYNCHRONOUS_IO, and faults will try to bypass the swapcache. This results in an optimistic read of the swap data into a page that will be dismissed if the fault fails due to races. In this case, zswap mustn't drop its authoritative copy. Link: https://lore.kernel.org/all/CACSyD1N+dUvsu8=zV9P691B9bVq33erwOXNTmEaUbi9DrDeJzw@mail.gmail.com/ Fixes: b9c91c43412f ("mm: zswap: support exclusive loads") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Zhongkun He <[email protected]> Tested-by: Zhongkun He <[email protected]> Acked-by: Yosry Ahmed <[email protected]> Acked-by: Barry Song <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Acked-by: Chris Li <[email protected]> Cc: <[email protected]> [6.5+] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26selftests/mm: fix ARM related issue with fork after pthread_createEdward Liaw3-0/+15
Following issue was observed while running the uffd-unit-tests selftest on ARM devices. On x86_64 no issues were detected: pthread_create followed by fork caused deadlock in certain cases wherein fork required some work to be completed by the created thread. Used synchronization to ensure that created thread's start function has started before invoking fork. [[email protected]: refactored to use atomic_bool] Link: https://lkml.kernel.org/r/[email protected] Fixes: 760aee0b71e3 ("selftests/mm: add tests for RO pinning vs fork()") Signed-off-by: Lokesh Gidra <[email protected]> Signed-off-by: Edward Liaw <[email protected]> Cc: Peter Xu <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26hexagon: vmlinux.lds.S: handle attributes sectionNathan Chancellor1-0/+1
After the linked LLVM change, the build fails with CONFIG_LD_ORPHAN_WARN_LEVEL="error", which happens with allmodconfig: ld.lld: error: vmlinux.a(init/main.o):(.hexagon.attributes) is being placed in '.hexagon.attributes' Handle the attributes section in a similar manner as arm and riscv by adding it after the primary ELF_DETAILS grouping in vmlinux.lds.S, which fixes the error. Link: https://lkml.kernel.org/r/20240319-hexagon-handle-attributes-section-vmlinux-lds-s-v1-1-59855dab8872@kernel.org Fixes: 113616ec5b64 ("hexagon: select ARCH_WANT_LD_ORPHAN_WARN") Link: https://github.com/llvm/llvm-project/commit/31f4b329c8234fab9afa59494d7f8bdaeaefeaad Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Brian Cain <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26userfaultfd: fix deadlock warning when locking src and dst VMAsLokesh Gidra1-1/+2
Use down_read_nested() to avoid the warning. Link: https://lkml.kernel.org/r/[email protected] Fixes: 867a43a34ff8 ("userfaultfd: use per-vma locks in userfaultfd operations") Reported-by: [email protected] Signed-off-by: Lokesh Gidra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Brian Geffon <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Jann Horn <[email protected]> [Bug #2] Cc: Kalesh Singh <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Nicolas Geoffray <[email protected]> Cc: Peter Xu <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26tmpfs: fix race on handling dquot rbtreeCarlos Maiolino1-3/+7
A syzkaller reproducer found a race while attempting to remove dquot information from the rb tree. Fetching the rb_tree root node must also be protected by the dqopt->dqio_sem, otherwise, giving the right timing, shmem_release_dquot() will trigger a warning because it couldn't find a node in the tree, when the real reason was the root node changing before the search starts: Thread 1 Thread 2 - shmem_release_dquot() - shmem_{acquire,release}_dquot() - fetch ROOT - Fetch ROOT - acquire dqio_sem - wait dqio_sem - do something, triger a tree rebalance - release dqio_sem - acquire dqio_sem - start searching for the node, but from the wrong location, missing the node, and triggering a warning. Link: https://lkml.kernel.org/r/[email protected] Fixes: eafc474e2029 ("shmem: prepare shmem quota infrastructure") Signed-off-by: Carlos Maiolino <[email protected]> Reported-by: Ubisectech Sirius <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26selftests/mm: sigbus-wp test requires UFFD_FEATURE_WP_HUGETLBFS_SHMEMEdward Liaw1-1/+2
The sigbus-wp test requires the UFFD_FEATURE_WP_HUGETLBFS_SHMEM flag for shmem and hugetlb targets. Otherwise it is not backwards compatible with kernels <5.19 and fails with EINVAL. Link: https://lkml.kernel.org/r/[email protected] Fixes: 73c1ea939b65 ("selftests/mm: move uffd sig/events tests into uffd unit tests") Signed-off-by: Edward Liaw <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Peter Xu <[email protected] Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: zswap: fix writeback shinker GFP_NOIO/GFP_NOFS recursionJohannes Weiner1-0/+8
Kent forwards this bug report of zswap re-entering the block layer from an IO request allocation and locking up: [10264.128242] sysrq: Show Blocked State [10264.128268] task:kworker/20:0H state:D stack:0 pid:143 tgid:143 ppid:2 flags:0x00004000 [10264.128271] Workqueue: bcachefs_io btree_write_submit [bcachefs] [10264.128295] Call Trace: [10264.128295] <TASK> [10264.128297] __schedule+0x3e6/0x1520 [10264.128303] schedule+0x32/0xd0 [10264.128304] schedule_timeout+0x98/0x160 [10264.128308] io_schedule_timeout+0x50/0x80 [10264.128309] wait_for_completion_io_timeout+0x7f/0x180 [10264.128310] submit_bio_wait+0x78/0xb0 [10264.128313] swap_writepage_bdev_sync+0xf6/0x150 [10264.128317] zswap_writeback_entry+0xf2/0x180 [10264.128319] shrink_memcg_cb+0xe7/0x2f0 [10264.128322] __list_lru_walk_one+0xb9/0x1d0 [10264.128325] list_lru_walk_one+0x5d/0x90 [10264.128326] zswap_shrinker_scan+0xc4/0x130 [10264.128327] do_shrink_slab+0x13f/0x360 [10264.128328] shrink_slab+0x28e/0x3c0 [10264.128329] shrink_one+0x123/0x1b0 [10264.128331] shrink_node+0x97e/0xbc0 [10264.128332] do_try_to_free_pages+0xe7/0x5b0 [10264.128333] try_to_free_pages+0xe1/0x200 [10264.128334] __alloc_pages_slowpath.constprop.0+0x343/0xde0 [10264.128337] __alloc_pages+0x32d/0x350 [10264.128338] allocate_slab+0x400/0x460 [10264.128339] ___slab_alloc+0x40d/0xa40 [10264.128345] kmem_cache_alloc+0x2e7/0x330 [10264.128348] mempool_alloc+0x86/0x1b0 [10264.128349] bio_alloc_bioset+0x200/0x4f0 [10264.128352] bio_alloc_clone+0x23/0x60 [10264.128354] alloc_io+0x26/0xf0 [dm_mod 7e9e6b44df4927f93fb3e4b5c782767396f58382] [10264.128361] dm_submit_bio+0xb8/0x580 [dm_mod 7e9e6b44df4927f93fb3e4b5c782767396f58382] [10264.128366] __submit_bio+0xb0/0x170 [10264.128367] submit_bio_noacct_nocheck+0x159/0x370 [10264.128368] bch2_submit_wbio_replicas+0x21c/0x3a0 [bcachefs 85f1b9a7a824f272eff794653a06dde1a94439f2] [10264.128391] btree_write_submit+0x1cf/0x220 [bcachefs 85f1b9a7a824f272eff794653a06dde1a94439f2] [10264.128406] process_one_work+0x178/0x350 [10264.128408] worker_thread+0x30f/0x450 [10264.128409] kthread+0xe5/0x120 The zswap shrinker resumes the swap_writepage()s that were intercepted by the zswap store. This will enter the block layer, and may even enter the filesystem depending on the swap backing file. Make it respect GFP_NOIO and GFP_NOFS. Link: https://lore.kernel.org/linux-mm/rc4pk2r42oyvjo4dc62z6sovquyllq56i5cdgcaqbd7wy3hfzr@n4nbxido3fme/ Link: https://lkml.kernel.org/r/[email protected] Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Kent Overstreet <[email protected]> Acked-by: Yosry Ahmed <[email protected]> Reported-by: Jérôme Poulin <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Cc: [email protected] [v6.8] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26ARM: prctl: reject PR_SET_MDWE on pre-ARMv6Zev Weiss1-0/+14
On v5 and lower CPUs we can't provide MDWE protection, so ensure we fail any attempt to enable it via prctl(PR_SET_MDWE). Previously such an attempt would misleadingly succeed, leading to any subsequent mmap(PROT_READ|PROT_WRITE) or execve() failing unconditionally (the latter somewhat violently via force_fatal_sig(SIGSEGV) due to READ_IMPLIES_EXEC). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zev Weiss <[email protected]> Cc: <[email protected]> [6.3+] Cc: Borislav Petkov <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Florent Revest <[email protected]> Cc: Helge Deller <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Kees Cook <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ondrej Mosnacek <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Russell King (Oracle) <[email protected]> Cc: Sam James <[email protected]> Cc: Stefan Roesch <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26prctl: generalize PR_SET_MDWE support check to be per-archZev Weiss3-2/+27
Patch series "ARM: prctl: Reject PR_SET_MDWE where not supported". I noticed after a recent kernel update that my ARM926 system started segfaulting on any execve() after calling prctl(PR_SET_MDWE). After some investigation it appears that ARMv5 is incapable of providing the appropriate protections for MDWE, since any readable memory is also implicitly executable. The prctl_set_mdwe() function already had some special-case logic added disabling it on PARISC (commit 793838138c15, "prctl: Disable prctl(PR_SET_MDWE) on parisc"); this patch series (1) generalizes that check to use an arch_*() function, and (2) adds a corresponding override for ARM to disable MDWE on pre-ARMv6 CPUs. With the series applied, prctl(PR_SET_MDWE) is rejected on ARMv5 and subsequent execve() calls (as well as mmap(PROT_READ|PROT_WRITE)) can succeed instead of unconditionally failing; on ARMv6 the prctl works as it did previously. [0] https://lore.kernel.org/all/2023112456-linked-nape-bf19@gregkh/ This patch (of 2): There exist systems other than PARISC where MDWE may not be feasible to support; rather than cluttering up the generic code with additional arch-specific logic let's add a generic function for checking MDWE support and allow each arch to override it as needed. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zev Weiss <[email protected]> Acked-by: Helge Deller <[email protected]> [parisc] Cc: Borislav Petkov <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Florent Revest <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Kees Cook <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ondrej Mosnacek <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Russell King (Oracle) <[email protected]> Cc: Sam James <[email protected]> Cc: Stefan Roesch <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yin Fengwei <[email protected]> Cc: <[email protected]> [6.3+] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26MAINTAINERS: remove incorrect M: tag for [email protected]Kuan-Wei Chiu1-1/+0
The [email protected] mailing list should only be listed under the L: (List) tag in the MAINTAINERS file. However, it was incorrectly listed under both L: and M: (Maintainers) tags, which is not accurate. Remove the M: tag for [email protected] in the MAINTAINERS file to reflect the correct categorization. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kuan-Wei Chiu <[email protected]> Cc: Ching-Chun (Jim) Huang <[email protected]> Cc: Matthew Sakai <[email protected]> Cc: Michael Sclafani <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: zswap: fix kernel BUG in sg_init_oneBarry Song1-2/+12
sg_init_one() relies on linearly mapped low memory for the safe utilization of virt_to_page(). Otherwise, we trigger a kernel BUG, kernel BUG at include/linux/scatterlist.h:187! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 2997 Comm: syz-executor198 Not tainted 6.8.0-syzkaller #0 Hardware name: ARM-Versatile Express PC is at sg_set_buf include/linux/scatterlist.h:187 [inline] PC is at sg_init_one+0x9c/0xa8 lib/scatterlist.c:143 LR is at sg_init_table+0x2c/0x40 lib/scatterlist.c:128 Backtrace: [<807e16ac>] (sg_init_one) from [<804c1824>] (zswap_decompress+0xbc/0x208 mm/zswap.c:1089) r7:83471c80 r6:def6d08c r5:844847d0 r4:ff7e7ef4 [<804c1768>] (zswap_decompress) from [<804c4468>] (zswap_load+0x15c/0x198 mm/zswap.c:1637) r9:8446eb80 r8:8446eb80 r7:8446eb84 r6:def6d08c r5:00000001 r4:844847d0 [<804c430c>] (zswap_load) from [<804b9644>] (swap_read_folio+0xa8/0x498 mm/page_io.c:518) r9:844ac800 r8:835e6c00 r7:00000000 r6:df955d4c r5:00000001 r4:def6d08c [<804b959c>] (swap_read_folio) from [<804bb064>] (swap_cluster_readahead+0x1c4/0x34c mm/swap_state.c:684) r10:00000000 r9:00000007 r8:df955d4b r7:00000000 r6:00000000 r5:00100cca r4:00000001 [<804baea0>] (swap_cluster_readahead) from [<804bb3b8>] (swapin_readahead+0x68/0x4a8 mm/swap_state.c:904) r10:df955eb8 r9:00000000 r8:00100cca r7:84476480 r6:00000001 r5:00000000 r4:00000001 [<804bb350>] (swapin_readahead) from [<8047cde0>] (do_swap_page+0x200/0xcc4 mm/memory.c:4046) r10:00000040 r9:00000000 r8:844ac800 r7:84476480 r6:00000001 r5:00000000 r4:df955eb8 [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (handle_pte_fault mm/memory.c:5301 [inline]) [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (__handle_mm_fault mm/memory.c:5439 [inline]) [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (handle_mm_fault+0x3d8/0x12b8 mm/memory.c:5604) r10:00000040 r9:842b3900 r8:7eb0d000 r7:84476480 r6:7eb0d000 r5:835e6c00 r4:00000254 [<8047e2ec>] (handle_mm_fault) from [<80215d28>] (do_page_fault+0x148/0x3a8 arch/arm/mm/fault.c:326) r10:00000007 r9:842b3900 r8:7eb0d000 r7:00000207 r6:00000254 r5:7eb0d9b4 r4:df955fb0 [<80215be0>] (do_page_fault) from [<80216170>] (do_DataAbort+0x38/0xa8 arch/arm/mm/fault.c:558) r10:7eb0da7c r9:00000000 r8:80215be0 r7:df955fb0 r6:7eb0d9b4 r5:00000207 r4:8261d0e0 [<80216138>] (do_DataAbort) from [<80200e3c>] (__dabt_usr+0x5c/0x60 arch/arm/kernel/entry-armv.S:427) Exception stack(0xdf955fb0 to 0xdf955ff8) 5fa0: 00000000 00000000 22d5f800 0008d158 5fc0: 00000000 7eb0d9a4 00000000 00000109 00000000 00000000 7eb0da7c 7eb0da3c 5fe0: 00000000 7eb0d9a0 00000001 00066bd4 00000010 ffffffff r8:824a9044 r7:835e6c00 r6:ffffffff r5:00000010 r4:00066bd4 Code: 1a000004 e1822003 e8860094 e89da8f0 (e7f001f2) ---[ end trace 0000000000000000 ]--- ---------------- Code disassembly (best guess): 0: 1a000004 bne 0x18 4: e1822003 orr r2, r2, r3 8: e8860094 stm r6, {r2, r4, r7} c: e89da8f0 ldm sp, {r4, r5, r6, r7, fp, sp, pc} * 10: e7f001f2 udf #18 <-- trapping instruction Consequently, we have two choices: either employ kmap_to_page() alongside sg_set_page(), or resort to copying high memory contents to a temporary buffer residing in low memory. However, considering the introduction of the WARN_ON_ONCE in commit ef6e06b2ef870 ("highmem: fix kmap_to_page() for kmap_local_page() addresses"), which specifically addresses high memory concerns, it appears that memcpy remains the sole viable option. Link: https://lkml.kernel.org/r/[email protected] Fixes: 270700dd06ca ("mm/zswap: remove the memcpy if acomp is not sleepable") Signed-off-by: Barry Song <[email protected]> Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Tested-by: [email protected] Acked-by: Yosry Ahmed <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Chris Li <[email protected]> Cc: Ira Weiny <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26selftests: mm: restore settings from only parent processMuhammad Usama Anjum1-1/+5
The atexit() is called from parent process as well as forked processes. Hence the child restores the settings at exit while the parent is still executing. Fix this by checking pid of atexit() calling process and only restore THP number from parent process. Link: https://lkml.kernel.org/r/[email protected] Fixes: c23ea61726d5 ("selftests/mm: protection_keys: save/restore nr_hugepages settings") Signed-off-by: Muhammad Usama Anjum <[email protected]> Tested-by: Joey Gouly <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26tools/Makefile: remove cgroup targetCong Liu1-7/+6
The tools/cgroup directory no longer contains a Makefile. This patch updates the top-level tools/Makefile to remove references to building and installing cgroup components. This change reflects the current structure of the tools directory and fixes the build failure when building tools in the top-level directory. linux/tools$ make cgroup DESCEND cgroup make[1]: *** No targets specified and no makefile found. Stop. make: *** [Makefile:73: cgroup] Error 2 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Cong Liu <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Reviewed-by: Dmitry Rokosov <[email protected]> Cc: Cong Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: cachestat: fix two shmem bugsJohannes Weiner1-0/+16
When cachestat on shmem races with swapping and invalidation, there are two possible bugs: 1) A swapin error can have resulted in a poisoned swap entry in the shmem inode's xarray. Calling get_shadow_from_swap_cache() on it will result in an out-of-bounds access to swapper_spaces[]. Validate the entry with non_swap_entry() before going further. 2) When we find a valid swap entry in the shmem's inode, the shadow entry in the swapcache might not exist yet: swap IO is still in progress and we're before __remove_mapping; swapin, invalidation, or swapoff have removed the shadow from swapcache after we saw the shmem swap entry. This will send a NULL to workingset_test_recent(). The latter purely operates on pointer bits, so it won't crash - node 0, memcg ID 0, eviction timestamp 0, etc. are all valid inputs - but it's a bogus test. In theory that could result in a false "recently evicted" count. Such a false positive wouldn't be the end of the world. But for code clarity and (future) robustness, be explicit about this case. Bail on get_shadow_from_swap_cache() returning NULL. Link: https://lkml.kernel.org/r/[email protected] Fixes: cf264e1329fb ("cachestat: implement cachestat syscall") Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Chengming Zhou <[email protected]> [Bug #1] Reported-by: Jann Horn <[email protected]> [Bug #2] Reviewed-by: Chengming Zhou <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Cc: <[email protected]> [v6.5+] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: increase folio batch sizeMatthew Wilcox (Oracle)1-2/+2
On a 104 thread, 2 socket Skylake system, Intel report a 4.7% performance reduction with will-it-scale page_fault2. This was due to reducing the size of the batch from 32 to 15. Increasing the folio batch size from 15 to 31 gives a performance increase of 12.5% relative to the original, or 17.2% relative to the reduced performance commit. The penalty of this commit is an additional 128 bytes of stack usage. Six folio_batches are also allocated from percpu memory in cpu_fbatches so that will be an additional 768 bytes of percpu memory (per CPU). Tim Chen originally submitted a patch like this in 2020: https://lore.kernel.org/linux-mm/d1cc9f12a8ad6c2a52cb600d93b06b064f2bbc57.1593205965.git.tim.c.chen@linux.intel.com/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 99fbb6bfc16f ("mm: make folios_put() the basis of release_pages()") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Yujie Liu <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm,page_owner: fix recursionOscar Salvador1-10/+23
Prior to 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") the only place where page_owner could potentially go into recursion due to its need of allocating more memory was in save_stack(), which ends up calling into stackdepot code with the possibility of allocating memory. We made sure to guard against that by signaling that the current task was already in page_owner code, so in case a recursion attempt was made, we could catch that and return dummy_handle. After above commit, a new place in page_owner code was introduced where we could allocate memory, meaning we could go into recursion would we take that path. Make sure to signal that we are in page_owner in that codepath as well. Move the guard code into two helpers {un}set_current_in_page_owner() and use them prior to calling in the two functions that might allocate memory. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") Reviewed-by: Vlastimil Babka <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mailmap: update entry for Leonard CrestezLeonard Crestez1-1/+2
Put my personal email first because NXP employment ended some time ago. Also add my old intel email address. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Leonard Crestez <[email protected]> Cc: Florian Fainelli <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26init: open /initrd.image with O_LARGEFILEJohn Sperbeck1-1/+1
If initrd data is larger than 2Gb, we'll eventually fail to write to the /initrd.image file when we hit that limit, unless O_LARGEFILE is set. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Sperbeck <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26selftests/mm: Fix build with _FORTIFY_SOURCEVitaly Chikunov3-3/+3
Add missing flags argument to open(2) call with O_CREAT. Some tests fail to compile if _FORTIFY_SOURCE is defined (to any valid value) (together with -O), resulting in similar error messages such as: In file included from /usr/include/fcntl.h:342, from gup_test.c:1: In function 'open', inlined from 'main' at gup_test.c:206:10: /usr/include/bits/fcntl2.h:50:11: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments 50 | __open_missing_mode (); | ^~~~~~~~~~~~~~~~~~~~~~ _FORTIFY_SOURCE is enabled by default in some distributions, so the tests are not built by default and are skipped. open(2) man-page warns about missing flags argument: "if it is not supplied, some arbitrary bytes from the stack will be applied as the file mode." Link: https://lkml.kernel.org/r/[email protected] Fixes: aeb85ed4f41a ("tools/testing/selftests/vm/gup_benchmark.c: allow user specified file") Fixes: fbe37501b252 ("mm: huge_memory: debugfs for file-backed THP split") Fixes: c942f5bd17b3 ("selftests: soft-dirty: add test for mprotect") Signed-off-by: Vitaly Chikunov <[email protected]> Reviewed-by: Zi Yan <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Xu <[email protected]> Cc: Yang Shi <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Nadav Amit <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm/memory: fix missing pte marker for !page on pte zapsPeter Xu1-1/+3
Commit 0cf18e839f64 of large folio zap work broke uffd-wp. Now mm's uffd unit test "wp-unpopulated" will trigger this WARN_ON_ONCE(). The WARN_ON_ONCE() asserts that an VMA cannot be registered with userfaultfd-wp if it contains a !normal page, but it's actually possible. One example is an anonymous vma, register with uffd-wp, read anything will install a zero page. Then when zap on it, this should trigger. What's more, removing that WARN_ON_ONCE may not be enough either, because we should also not rely on "whether it's a normal page" to decide whether pte marker is needed. For example, one can register wr-protect over some DAX regions to track writes when UFFD_FEATURE_WP_ASYNC enabled, in which case it can have page==NULL for a devmap but we may want to keep the marker around. Link: https://lkml.kernel.org/r/[email protected] Fixes: 0cf18e839f64 ("mm/memory: handle !page case in zap_present_pte() separately") Signed-off-by: Peter Xu <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26Merge tag 'printk-for-6.9-rc2' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Prevent scheduling in an atomic context when printk() takes over the console flushing duty * tag 'printk-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Update @console_may_schedule in console_trylock_spinning()
2024-03-26Merge tag 'pwm/for-6.9-rc2-fixes' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux Pull pwm fix from Uwe Kleine-König: "This contains a single fix for a regression introduced in v5.18-rc1 which made the img pwm driver fail to bind" * tag 'pwm/for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux: pwm: img: fix pwm clock lookup
2024-03-26btrfs: fix race in read_extent_buffer_pages()Tavian Barnes1-0/+13
There are reports from tree-checker that detects corrupted nodes, without any obvious pattern so possibly an overwrite in memory. After some debugging it turns out there's a race when reading an extent buffer the uptodate status can be missed. To prevent concurrent reads for the same extent buffer, read_extent_buffer_pages() performs these checks: /* (1) */ if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags)) return 0; /* (2) */ if (test_and_set_bit(EXTENT_BUFFER_READING, &eb->bflags)) goto done; At this point, it seems safe to start the actual read operation. Once that completes, end_bbio_meta_read() does /* (3) */ set_extent_buffer_uptodate(eb); /* (4) */ clear_bit(EXTENT_BUFFER_READING, &eb->bflags); Normally, this is enough to ensure only one read happens, and all other callers wait for it to finish before returning. Unfortunately, there is a racey interleaving: Thread A | Thread B | Thread C ---------+----------+--------- (1) | | | (1) | (2) | | (3) | | (4) | | | (2) | | | (1) When this happens, thread B kicks of an unnecessary read. Worse, thread C will see UPTODATE set and return immediately, while the read from thread B is still in progress. This race could result in tree-checker errors like this as the extent buffer is concurrently modified: BTRFS critical (device dm-0): corrupted node, root=256 block=8550954455682405139 owner mismatch, have 11858205567642294356 expect [256, 18446744073709551360] Fix it by testing UPTODATE again after setting the READING bit, and if it's been set, skip the unnecessary read. Fixes: d7172f52e993 ("btrfs: use per-buffer locking for extent_buffer reading") Link: https://lore.kernel.org/linux-btrfs/CAHk-=whNdMaN9ntZ47XRKP6DBes2E5w7fi-0U3H2+PS18p+Pzw@mail.gmail.com/ Link: https://lore.kernel.org/linux-btrfs/f51a6d5d7432455a6a858d51b49ecac183e0bbc9.1706312914.git.wqu@suse.com/ Link: https://lore.kernel.org/linux-btrfs/[email protected]/ CC: [email protected] # 6.5+ Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Tavian Barnes <[email protected]> Reviewed-by: David Sterba <[email protected]> [ minor update of changelog ] Signed-off-by: David Sterba <[email protected]>
2024-03-26btrfs: return accurate error code on open failure in open_fs_devices()Anand Jain1-5/+12
When attempting to exclusive open a device which has no exclusive open permission, such as a physical device associated with the flakey dm device, the open operation will fail, resulting in a mount failure. In this particular scenario, we erroneously return -EINVAL instead of the correct error code provided by the bdev_open_by_path() function, which is -EBUSY. Fix this, by returning error code from the bdev_open_by_path() function. With this correction, the mount error message will align with that of ext4 and xfs. Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Anand Jain <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-26btrfs: zoned: don't skip block groups with 100% zone unusableJohannes Thumshirn1-1/+2
Commit f4a9f219411f ("btrfs: do not delete unused block group if it may be used soon") changed the behaviour of deleting unused block-groups on zoned filesystems. Starting with this commit, we're using btrfs_space_info_used() to calculate the number of used bytes in a space_info. But btrfs_space_info_used() also accounts btrfs_space_info::bytes_zone_unusable as used bytes. So if a block group is 100% zone_unusable it is skipped from the deletion step. In order not to skip fully zone_unusable block-groups, also check if the block-group has bytes left that can be used on a zoned filesystem. Fixes: f4a9f219411f ("btrfs: do not delete unused block group if it may be used soon") CC: [email protected] # 6.1+ Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-26btrfs: use btrfs_warn() to log message at btrfs_add_extent_mapping()Filipe Manana1-5/+5
At btrfs_add_extent_mapping(), if we failed to merge the extent map, which is unexpected and theoretically should never happen, we use WARN_ONCE() to log a message which is not great because we don't get information about which filesystem it relates to in case we have multiple btrfs filesystems mounted. So change this to use btrfs_warn() and surround the error check with WARN_ON() so we always get a useful stack trace and the condition is flagged as "unlikely" since it's not expected to ever happen. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-26btrfs: fix message not properly printing interval when adding extent mapFilipe Manana1-2/+2
At btrfs_add_extent_mapping(), if we are unable to merge the existing extent map, we print a warning message that suggests interval ranges in the form "[X, Y)", where the first element is the inclusive start offset of a range and the second element is the exclusive end offset. However we end up printing the length of the ranges instead of the exclusive end offsets. So fix this by printing the range end offsets. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-26btrfs: fix warning messages not printing interval at unpin_extent_range()Filipe Manana1-2/+2
At unpin_extent_range() we print warning messages that are supposed to print an interval in the form "[X, Y)", with the first element being an inclusive start offset and the second element being the exclusive end offset of a range. However we end up printing the range's length instead of the range's exclusive end offset, so fix that to avoid having confusing and non-sense messages in case we hit one of these unexpected scenarios. Fixes: 00deaf04df35 ("btrfs: log messages at unpin_extent_range() during unexpected cases") Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-26btrfs: fix extent map leak in unexpected scenario at unpin_extent_cache()Filipe Manana1-1/+1
At unpin_extent_cache() if we happen to find an extent map with an unexpected start offset, we jump to the 'out' label and never release the reference we added to the extent map through the call to lookup_extent_mapping(), therefore resulting in a leak. So fix this by moving the free_extent_map() under the 'out' label. Fixes: c03c89f821e5 ("btrfs: handle errors returned from unpin_extent_cache()") Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-26btrfs: validate device maj:min during openAnand Jain1-0/+10
Boris managed to create a device capable of changing its maj:min without altering its device path. Only multi-devices can be scanned. A device that gets scanned and remains in the btrfs kernel cache might end up with an incorrect maj:min. Despite the temp-fsid feature patch did not introduce this bug, it could lead to issues if the above multi-device is converted to a single device with a stale maj:min. Subsequently, attempting to mount the same device with the correct maj:min might mistake it for another device with the same fsid, potentially resulting in wrongly auto-enabling the temp-fsid feature. To address this, this patch validates the device's maj:min at the time of device open and updates it if it has changed since the last scan. CC: [email protected] # 6.7+ Fixes: a5b8a5f9f835 ("btrfs: support cloned-device mount capability") Reported-by: Boris Burkov <[email protected]> Co-developed-by: Boris Burkov <[email protected]> Reviewed-by: Boris Burkov <[email protected]># Signed-off-by: Anand Jain <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-26btrfs: zoned: fix use-after-free in do_zone_finish()Johannes Thumshirn1-7/+7
Shinichiro reported the following use-after-free triggered by the device replace operation in fstests btrfs/070. BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0 ================================================================== BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs] Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007 CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G W 6.8.0-rc5-kts #1 Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020 Call Trace: <TASK> dump_stack_lvl+0x5b/0x90 print_report+0xcf/0x670 ? __virt_addr_valid+0x200/0x3e0 kasan_report+0xd8/0x110 ? do_zone_finish+0x91a/0xb90 [btrfs] ? do_zone_finish+0x91a/0xb90 [btrfs] do_zone_finish+0x91a/0xb90 [btrfs] btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs] ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs] ? btrfs_put_root+0x2d/0x220 [btrfs] ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs] cleaner_kthread+0x21e/0x380 [btrfs] ? __pfx_cleaner_kthread+0x10/0x10 [btrfs] kthread+0x2e3/0x3c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Allocated by task 3493983: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0xaa/0xb0 btrfs_alloc_device+0xb3/0x4e0 [btrfs] device_list_add.constprop.0+0x993/0x1630 [btrfs] btrfs_scan_one_device+0x219/0x3d0 [btrfs] btrfs_control_ioctl+0x26e/0x310 [btrfs] __x64_sys_ioctl+0x134/0x1b0 do_syscall_64+0x99/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Freed by task 3494056: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3f/0x60 poison_slab_object+0x102/0x170 __kasan_slab_free+0x32/0x70 kfree+0x11b/0x320 btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs] btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs] btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs] btrfs_ioctl+0xb27/0x57d0 [btrfs] __x64_sys_ioctl+0x134/0x1b0 do_syscall_64+0x99/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0x76 The buggy address belongs to the object at ffff8881543c8000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 96 bytes inside of freed 1024-byte region [ffff8881543c8000, ffff8881543c8400) The buggy address belongs to the physical page: page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8 head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb This UAF happens because we're accessing stale zone information of a already removed btrfs_device in do_zone_finish(). The sequence of events is as follows: btrfs_dev_replace_start btrfs_scrub_dev btrfs_dev_replace_finishing btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced btrfs_rm_dev_replace_free_srcdev btrfs_free_device <-- device freed cleaner_kthread btrfs_delete_unused_bgs btrfs_zone_finish do_zone_finish <-- refers the freed device The reason for this is that we're using a cached pointer to the chunk_map from the block group, but on device replace this cached pointer can contain stale device entries. The staleness comes from the fact, that btrfs_block_group::physical_map is not a pointer to a btrfs_chunk_map but a memory copy of it. Also take the fs_info::dev_replace::rwsem to prevent btrfs_dev_replace_update_device_in_mapping_tree() from changing the device underneath us again. Note: btrfs_dev_replace_update_device_in_mapping_tree() is holding fs_info::mapping_tree_lock, but as this is a spinning read/write lock we cannot take it as the call to blkdev_zone_mgmt() requires a memory allocation which may not sleep. But btrfs_dev_replace_update_device_in_mapping_tree() is always called with the fs_info::dev_replace::rwsem held in write mode. Many thanks to Shinichiro for analyzing the bug. Reported-by: Shinichiro Kawasaki <[email protected]> CC: [email protected] # 6.8 Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Johannes Thumshirn <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-03-25Merge tag 'gfs2-v6.8-fix' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: - Fix boundary check in punch_hole * tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix invalid metadata access in punch_hole
2024-03-25Merge tag 'v6.9-p2' of ↵Linus Torvalds9-5/+114
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a regression that broke iwd as well as a divide by zero in iaa" * tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: iaa - Fix nr_cpus < nr_iaa case Revert "crypto: pkcs7 - remove sha1 support"
2024-03-25fs/9p: fix uninitialized values during inode evictEric Van Hensbergen1-6/+10
If an iget fails due to not being able to retrieve information from the server then the inode structure is only partially initialized. When the inode gets evicted, references to uninitialized structures (like fscache cookies) were being made. This patch checks for a bad_inode before doing anything other than clearing the inode from the cache. Since the inode is bad, it shouldn't have any state associated with it that needs to be written back (and there really isn't a way to complete those anyways). Reported-by: [email protected] Signed-off-by: Eric Van Hensbergen <[email protected]>
2024-03-25tracing: probes: Fix to zero initialize a local variableMasami Hiramatsu (Google)1-1/+1
Fix to initialize 'val' local variable with zero. Dan reported that Smatch static code checker reports an error that a local 'val' variable needs to be initialized. Actually, the 'val' is expected to be initialized by FETCH_OP_ARG in the same loop, but it is not obvious. So initialize it with zero. Link: https://lore.kernel.org/all/171092223833.237219.17304490075697026697.stgit@devnote2/ Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Fixes: 25f00e40ce79 ("tracing/probes: Support $argN in return probe (kprobe and fprobe)") Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2024-03-25pwm: img: fix pwm clock lookupZoltan HERPAI1-2/+2
22e8e19 has introduced a regression in the imgchip->pwm_clk lookup, whereas the clock name has also been renamed to "imgchip". This causes the driver failing to load: [ 0.546905] img-pwm 18101300.pwm: failed to get imgchip clock [ 0.553418] img-pwm: probe of 18101300.pwm failed with error -2 Fix this lookup by reverting the clock name back to "pwm". Signed-off-by: Zoltan HERPAI <[email protected]> Link: https://lore.kernel.org/r/[email protected] Fixes: 22e8e19a46f7 ("pwm: img: Rename variable pointing to driver private data") Signed-off-by: Uwe Kleine-König <[email protected]>
2024-03-25MAINTAINERS: erofs: add myself as reviewerSandeep Dhavale1-0/+1
I have been contributing to erofs for sometime and I would like to help with code reviews as well. Signed-off-by: Sandeep Dhavale <[email protected]> Acked-by: Chao Yu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2024-03-25erofs: drop experimental warning for FSDAXGao Xiang1-1/+0
As EXT4/XFS filesystems, FSDAX functionality is considered to be stable. Let's drop this warning. Reviewed-by: Jingbo Xu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-25fs/9p: remove redundant pointer v9sesColin Ian King1-4/+0
Pointer v9ses is being assigned the value from the return of inlined function v9fs_inode2v9ses (which just returns inode->i_sb->s_fs_info). The pointer is not used after the assignment, so the variable is redundant and can be removed. Cleans up clang scan warnings such as: fs/9p/vfs_inode_dotl.c:300:28: warning: variable 'v9ses' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Dominique Martinet <[email protected]> Signed-off-by: Eric Van Hensbergen <[email protected]>
2024-03-25fs/9p: fix uaf in in v9fs_stat2inode_dotlLizhi Xu1-1/+1
The incorrect logical order of accessing the st object code in v9fs_fid_iget_dotl is causing this uaf. Fixes: 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths") Reported-and-tested-by: [email protected] Signed-off-by: Lizhi Xu <[email protected]> Tested-by: Breno Leitao <[email protected]> Reviewed-by: Dominique Martinet <[email protected]> Signed-off-by: Eric Van Hensbergen <[email protected]>
2024-03-24Linux 6.9-rc1Linus Torvalds1-2/+2