aboutsummaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)AuthorFilesLines
2024-07-04mm/ksm: don't waste time searching stable tree for fast changing pageChengming Zhou1-15/+17
The code flow in cmp_and_merge_page() is suboptimal for handling the ksm page and non-ksm page at the same time. For example: - ksm page 1. Mostly just return if this ksm page is not migrated and this rmap_item has been on the rmap hlist. Or we have to fix this rmap_item mapping. 2. But we absolutely don't need to checksum for this ksm page, since it can't change. - non-ksm page 1. First don't need to waste time searching stable tree if fast changing. 2. Should try to merge with zero page before search the stable tree. 3. Then search stable tree to find mergeable ksm page. This patch optimizes the code flow so the handling differences between ksm page and non-ksm page become clearer and more efficient too. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chengming Zhou <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Stefan Roesch <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-04mm/ksm: refactor out try_to_merge_with_zero_page()Chengming Zhou2-31/+40
Patch series "mm/ksm: cmp_and_merge_page() optimizations and cleanup", v2. This series mainly optimizes cmp_and_merge_page() to have more efficient separate code flow for ksm page and non-ksm anon page. - ksm page: don't need to calculate the checksum obviously. - anon page: don't need to search stable tree if changing fast and try to merge with zero page before searching ksm page on stable tree. Please see the patch-2 for details. Patch-3 is cleanup also a little optimization for the chain()/chain_prune interfaces, which made the stable_tree_search()/stable_tree_insert() over complex. I have done simple testing using "hackbench -g 1 -l 300000" (maybe I need to use a better workload) on my machine, have seen a little CPU usage decrease of ksmd and some improvements of cmp_and_merge_page() latency: We can see the latency of cmp_and_merge_page() when handling non-ksm anon pages has been improved. This patch (of 3): In preparation for later changes, refactor out a new function called try_to_merge_with_zero_page(), which tries to merge with zero page. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chengming Zhou <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Stefan Roesch <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-04hugetlb: force allocating surplus hugepages on mempolicy allowed nodesAristeu Rozanski1-19/+28
When trying to allocate a hugepage with no reserved ones free, it may be allowed in case a number of overcommit hugepages was configured (using /proc/sys/vm/nr_overcommit_hugepages) and that number wasn't reached. This allows for a behavior of having extra hugepages allocated dynamically, if there're resources for it. Some sysadmins even prefer not reserving any hugepages and setting a big number of overcommit hugepages. But while attempting to allocate overcommit hugepages in a multi node system (either NUMA or mempolicy/cpuset) said allocations might randomly fail even when there're resources available for the allocation. This happens due to allowed_mems_nr() only accounting for the number of free hugepages in the nodes the current process belongs to and the surplus hugepage allocation is done so it can be allocated in any node. In case one or more of the requested surplus hugepages are allocated in a different node, the whole allocation will fail due allowed_mems_nr() returning a lower value. So allocate surplus hugepages in one of the nodes the current process belongs to. Easy way to reproduce this issue is to use a 2+ NUMA nodes system: # echo 0 >/proc/sys/vm/nr_hugepages # echo 1 >/proc/sys/vm/nr_overcommit_hugepages # numactl -m0 ./tools/testing/selftests/mm/map_hugetlb 2 Repeating the execution of map_hugetlb test application will eventually fail when the hugepage ends up allocated in a different node. [[email protected]: v2] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aristeu Rozanski <[email protected]> Cc: Muchun Song <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Vishal Moola <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-04mm/damon/paddr: initialize nr_succeeded in __damon_pa_migrate_folio_list()SeongJae Park1-1/+1
The variable is supposed to be set via later migrate_pages() call. However, the function does not do that when CONFIG_MIGRATION is unset. Initialize the variable to zero. Link: https://lkml.kernel.org/r/[email protected] Fixes: 5311c0a2eee3 ("mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion") Signed-off-by: SeongJae Park <[email protected]> Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Cc: Honggyu Kim <[email protected]> Cc: Hyeongtak Ji <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-04mm: refactor folio_undo_large_rmappable()Kefeng Wang5-27/+22
Folios of order <= 1 are not in deferred list, the check of order is added into folio_undo_large_rmappable() from commit 8897277acfef ("mm: support order-1 folios in the page cache"), but there is a repeated check for small folio (order 0) during each call of the folio_undo_large_rmappable(), so only keep folio_order() check inside the function. In addition, move all the checks into header file to save a function call for non-large-rmappable or empty deferred_list folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Vishal Moola (Oracle) <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Lance Yang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-04mm, slab: move allocation tagging code in the alloc path into a hookSuren Baghdasaryan1-36/+50
Move allocation tagging specific code in the allocation path into alloc_tagging_slab_alloc_hook, similar to how freeing path uses alloc_tagging_slab_free_hook. No functional changes, just code cleanup. Suggested-by: Vlastimil Babka <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2024-07-03mm/hugetlb_vmemmap: fix race with speculative PFN walkersYu Zhao2-46/+23
While investigating HVO for THPs [1], it turns out that speculative PFN walkers like compaction can race with vmemmap modifications, e.g., CPU 1 (vmemmap modifier) CPU 2 (speculative PFN walker) ------------------------------- ------------------------------ Allocates an LRU folio page1 Sees page1 Frees page1 Allocates a hugeTLB folio page2 (page1 being a tail of page2) Updates vmemmap mapping page1 get_page_unless_zero(page1) Even though page1->_refcount is zero after HVO, get_page_unless_zero() can still try to modify this read-only field, resulting in a crash. An independent report [2] confirmed this race. There are two discussed approaches to fix this race: 1. Make RO vmemmap RW so that get_page_unless_zero() can fail without triggering a PF. 2. Use RCU to make sure get_page_unless_zero() either sees zero page->_refcount through the old vmemmap or non-zero page->_refcount through the new one. The second approach is preferred here because: 1. It can prevent illegal modifications to struct page[] that has been HVO'ed; 2. It can be generalized, in a way similar to ZERO_PAGE(), to fix similar races in other places, e.g., arch_remove_memory() on x86 [3], which frees vmemmap mapping offlined struct page[]. While adding synchronize_rcu(), the goal is to be surgical, rather than optimized. Specifically, calls to synchronize_rcu() on the error handling paths can be coalesced, but it is not done for the sake of Simplicity: noticeably, this fix removes ~50% more lines than it adds. According to the hugetlb_optimize_vmemmap section in Documentation/admin-guide/sysctl/vm.rst, enabling HVO makes allocating or freeing hugeTLB pages "~2x slower than before". Having synchronize_rcu() on top makes those operations even worse, and this also affects the user interface /proc/sys/vm/nr_overcommit_hugepages. This is *very* hard to trigger: 1. Most hugeTLB use cases I know of are static, i.e., reserved at boot time, because allocating at runtime is not reliable at all. 2. On top of that, someone has to be very unlucky to get tripped over above, because the race window is so small -- I wasn't able to trigger it with a stress testing that does nothing but that (with THPs though). [1] https://lore.kernel.org/[email protected]/ [2] https://lore.kernel.org/[email protected]/ [3] https://lore.kernel.org/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Acked-by: Muchun Song <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Frank van der Linden <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Peter Xu <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03cachestat: do not flush stats in recency checkNhat Pham2-4/+15
syzbot detects that cachestat() is flushing stats, which can sleep, in its RCU read section (see [1]). This is done in the workingset_test_recent() step (which checks if the folio's eviction is recent). Move the stat flushing step to before the RCU read section of cachestat, and skip stat flushing during the recency check. [1]: https://lore.kernel.org/cgroups/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: b00684722262 ("mm: workingset: move the stats flush into workingset_test_recent()") Signed-off-by: Nhat Pham <[email protected]> Reported-by: [email protected] Closes: https://lore.kernel.org/cgroups/[email protected]/ Debugged-by: Johannes Weiner <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Shakeel Butt <[email protected]> Cc: Al Viro <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kairui Song <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: <[email protected]> [6.8+] Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm/shmem: disable PMD-sized page cache if neededGavin Shan1-2/+13
For shmem files, it's possible that PMD-sized page cache can't be supported by xarray. For example, 512MB page cache on ARM64 when the base page size is 64KB can't be supported by xarray. It leads to errors as the following messages indicate when this sort of xarray entry is split. WARNING: CPU: 34 PID: 7578 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128 Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 \ nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject \ nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 \ ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm fuse xfs \ libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_net \ net_failover virtio_console virtio_blk failover dimlib virtio_mmio CPU: 34 PID: 7578 Comm: test Kdump: loaded Tainted: G W 6.10.0-rc5-gavin+ #9 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024 pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : xas_split_alloc+0xf8/0x128 lr : split_huge_page_to_list_to_order+0x1c4/0x720 sp : ffff8000882af5f0 x29: ffff8000882af5f0 x28: ffff8000882af650 x27: ffff8000882af768 x26: 0000000000000cc0 x25: 000000000000000d x24: ffff00010625b858 x23: ffff8000882af650 x22: ffffffdfc0900000 x21: 0000000000000000 x20: 0000000000000000 x19: ffffffdfc0900000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000018000000000 x15: 52f8004000000000 x14: 0000e00000000000 x13: 0000000000002000 x12: 0000000000000020 x11: 52f8000000000000 x10: 52f8e1c0ffff6000 x9 : ffffbeb9619a681c x8 : 0000000000000003 x7 : 0000000000000000 x6 : ffff00010b02ddb0 x5 : ffffbeb96395e378 x4 : 0000000000000000 x3 : 0000000000000cc0 x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000 Call trace: xas_split_alloc+0xf8/0x128 split_huge_page_to_list_to_order+0x1c4/0x720 truncate_inode_partial_folio+0xdc/0x160 shmem_undo_range+0x2bc/0x6a8 shmem_fallocate+0x134/0x430 vfs_fallocate+0x124/0x2e8 ksys_fallocate+0x4c/0xa0 __arm64_sys_fallocate+0x24/0x38 invoke_syscall.constprop.0+0x7c/0xd8 do_el0_svc+0xb4/0xd0 el0_svc+0x44/0x1d8 el0t_64_sync_handler+0x134/0x150 el0t_64_sync+0x17c/0x180 Fix it by disabling PMD-sized page cache when HPAGE_PMD_ORDER is larger than MAX_PAGECACHE_ORDER. As Matthew Wilcox pointed, the page cache in a shmem file isn't represented by a multi-index entry and doesn't have this limitation when the xarry entry is split until commit 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache"). Link: https://lkml.kernel.org/r/[email protected] Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Signed-off-by: Gavin Shan <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Don Dutile <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: William Kucharski <[email protected]> Cc: Zhenyu Zhang <[email protected]> Cc: <[email protected]> [5.17+] Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm/filemap: skip to create PMD-sized page cache if neededGavin Shan1-1/+1
On ARM64, HPAGE_PMD_ORDER is 13 when the base page size is 64KB. The PMD-sized page cache can't be supported by xarray as the following error messages indicate. ------------[ cut here ]------------ WARNING: CPU: 35 PID: 7484 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128 Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib \ nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct \ nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 \ ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm \ fuse xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 \ sha1_ce virtio_net net_failover virtio_console virtio_blk failover \ dimlib virtio_mmio CPU: 35 PID: 7484 Comm: test Kdump: loaded Tainted: G W 6.10.0-rc5-gavin+ #9 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024 pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : xas_split_alloc+0xf8/0x128 lr : split_huge_page_to_list_to_order+0x1c4/0x720 sp : ffff800087a4f6c0 x29: ffff800087a4f6c0 x28: ffff800087a4f720 x27: 000000001fffffff x26: 0000000000000c40 x25: 000000000000000d x24: ffff00010625b858 x23: ffff800087a4f720 x22: ffffffdfc0780000 x21: 0000000000000000 x20: 0000000000000000 x19: ffffffdfc0780000 x18: 000000001ff40000 x17: 00000000ffffffff x16: 0000018000000000 x15: 51ec004000000000 x14: 0000e00000000000 x13: 0000000000002000 x12: 0000000000000020 x11: 51ec000000000000 x10: 51ece1c0ffff8000 x9 : ffffbeb961a44d28 x8 : 0000000000000003 x7 : ffffffdfc0456420 x6 : ffff0000e1aa6eb8 x5 : 20bf08b4fe778fca x4 : ffffffdfc0456420 x3 : 0000000000000c40 x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000 Call trace: xas_split_alloc+0xf8/0x128 split_huge_page_to_list_to_order+0x1c4/0x720 truncate_inode_partial_folio+0xdc/0x160 truncate_inode_pages_range+0x1b4/0x4a8 truncate_pagecache_range+0x84/0xa0 xfs_flush_unmap_range+0x70/0x90 [xfs] xfs_file_fallocate+0xfc/0x4d8 [xfs] vfs_fallocate+0x124/0x2e8 ksys_fallocate+0x4c/0xa0 __arm64_sys_fallocate+0x24/0x38 invoke_syscall.constprop.0+0x7c/0xd8 do_el0_svc+0xb4/0xd0 el0_svc+0x44/0x1d8 el0t_64_sync_handler+0x134/0x150 el0t_64_sync+0x17c/0x180 Fix it by skipping to allocate PMD-sized page cache when its size is larger than MAX_PAGECACHE_ORDER. For this specific case, we will fall to regular path where the readahead window is determined by BDI's sysfs file (read_ahead_kb). Link: https://lkml.kernel.org/r/[email protected] Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings") Signed-off-by: Gavin Shan <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Don Dutile <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: William Kucharski <[email protected]> Cc: Zhenyu Zhang <[email protected]> Cc: <[email protected]> [5.18+] Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm/readahead: limit page cache size in page_cache_ra_order()Gavin Shan1-4/+4
In page_cache_ra_order(), the maximal order of the page cache to be allocated shouldn't be larger than MAX_PAGECACHE_ORDER. Otherwise, it's possible the large page cache can't be supported by xarray when the corresponding xarray entry is split. For example, HPAGE_PMD_ORDER is 13 on ARM64 when the base page size is 64KB. The PMD-sized page cache can't be supported by xarray. Link: https://lkml.kernel.org/r/[email protected] Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Gavin Shan <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Don Dutile <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: William Kucharski <[email protected]> Cc: Zhenyu Zhang <[email protected]> Cc: <[email protected]> [5.18+] Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm/damon/core: merge regions aggressively when max_nr_regions is unmetSeongJae Park1-3/+20
DAMON keeps the number of regions under max_nr_regions by skipping regions split operations when doing so can make the number higher than the limit. It works well for preventing violation of the limit. But, if somehow the violation happens, it cannot recovery well depending on the situation. In detail, if the real number of regions having different access pattern is higher than the limit, the mechanism cannot reduce the number below the limit. In such a case, the system could suffer from high monitoring overhead of DAMON. The violation can actually happen. For an example, the user could reduce max_nr_regions while DAMON is running, to be lower than the current number of regions. Fix the problem by repeating the merge operations with increasing aggressiveness in kdamond_merge_regions() for the case, until the limit is met. [[email protected]: increase regions merge aggressiveness while respecting min_nr_regions] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: ensure max threshold attempt for max_nr_regions violation] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: b9a6ac4e4ede ("mm/damon: adaptively adjust regions") Signed-off-by: SeongJae Park <[email protected]> Cc: <[email protected]> [5.15+] Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: vmalloc: check if a hash-index is in cpu_possible_maskUladzislau Rezki (Sony)1-1/+9
The problem is that there are systems where cpu_possible_mask has gaps between set CPUs, for example SPARC. In this scenario addr_to_vb_xa() hash function can return an index which accesses to not-possible and not setup CPU area using per_cpu() macro. This results in an oops on SPARC. A per-cpu vmap_block_queue is also used as hash table, incorrectly assuming the cpu_possible_mask has no gaps. Fix it by adjusting an index to a next possible CPU. Link: https://lkml.kernel.org/r/[email protected] Fixes: 062eacf57ad9 ("mm: vmalloc: remove a global vmap_blocks xarray") Reported-by: Nick Bowler <[email protected]> Closes: https://lore.kernel.org/linux-kernel/ZntjIE6msJbF8zTa@MiWiFi-R3L-srv/T/ Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Baoquan He <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hailong.Liu <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: page_ref: remove folio_try_get_rcu()Yang Shi2-6/+6
The below bug was reported on a non-SMP kernel: [ 275.267158][ T4335] ------------[ cut here ]------------ [ 275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275! [ 275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI [ 275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1 [ 275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.272813][ T4335] RSP: 0018:ffffc90005dcf650 EFLAGS: 00010202 [ 275.273346][ T4335] RAX: 0000000000000246 RBX: ffffea00066e0000 RCX: 0000000000000000 [ 275.274032][ T4335] RDX: fffff94000cdc007 RSI: 0000000000000004 RDI: ffffea00066e0034 [ 275.274719][ T4335] RBP: ffffea00066e0000 R08: 0000000000000000 R09: fffff94000cdc006 [ 275.275404][ T4335] R10: ffffea00066e0037 R11: 0000000000000000 R12: 0000000000000136 [ 275.276106][ T4335] R13: ffffea00066e0034 R14: dffffc0000000000 R15: ffffea00066e0008 [ 275.276790][ T4335] FS: 00007fa2f9b61740(0000) GS:ffffffff89d0d000(0000) knlGS:0000000000000000 [ 275.277570][ T4335] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 275.278143][ T4335] CR2: 00007fa2f6c00000 CR3: 0000000134b04000 CR4: 00000000000406f0 [ 275.278833][ T4335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 275.279521][ T4335] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 275.280201][ T4335] Call Trace: [ 275.280499][ T4335] <TASK> [ 275.280751][ T4335] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447) [ 275.281087][ T4335] ? do_trap (arch/x86/kernel/traps.c:112 arch/x86/kernel/traps.c:153) [ 275.281463][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.281884][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.282300][ T4335] ? do_error_trap (arch/x86/kernel/traps.c:174) [ 275.282711][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.283129][ T4335] ? handle_invalid_op (arch/x86/kernel/traps.c:212) [ 275.283561][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.283990][ T4335] ? exc_invalid_op (arch/x86/kernel/traps.c:264) [ 275.284415][ T4335] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568) [ 275.284859][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.285278][ T4335] try_grab_folio (mm/gup.c:148) [ 275.285684][ T4335] __get_user_pages (mm/gup.c:1297 (discriminator 1)) [ 275.286111][ T4335] ? __pfx___get_user_pages (mm/gup.c:1188) [ 275.286579][ T4335] ? __pfx_validate_chain (kernel/locking/lockdep.c:3825) [ 275.287034][ T4335] ? mark_lock (kernel/locking/lockdep.c:4656 (discriminator 1)) [ 275.287416][ T4335] __gup_longterm_locked (mm/gup.c:1509 mm/gup.c:2209) [ 275.288192][ T4335] ? __pfx___gup_longterm_locked (mm/gup.c:2204) [ 275.288697][ T4335] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722) [ 275.289135][ T4335] ? __pfx___might_resched (kernel/sched/core.c:10106) [ 275.289595][ T4335] pin_user_pages_remote (mm/gup.c:3350) [ 275.290041][ T4335] ? __pfx_pin_user_pages_remote (mm/gup.c:3350) [ 275.290545][ T4335] ? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1)) [ 275.290961][ T4335] ? mm_access (kernel/fork.c:1573) [ 275.291353][ T4335] process_vm_rw_single_vec+0x142/0x360 [ 275.291900][ T4335] ? __pfx_process_vm_rw_single_vec+0x10/0x10 [ 275.292471][ T4335] ? mm_access (kernel/fork.c:1573) [ 275.292859][ T4335] process_vm_rw_core+0x272/0x4e0 [ 275.293384][ T4335] ? hlock_class (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228) [ 275.293780][ T4335] ? __pfx_process_vm_rw_core+0x10/0x10 [ 275.294350][ T4335] process_vm_rw (mm/process_vm_access.c:284) [ 275.294748][ T4335] ? __pfx_process_vm_rw (mm/process_vm_access.c:259) [ 275.295197][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1)) [ 275.295634][ T4335] __x64_sys_process_vm_readv (mm/process_vm_access.c:291) [ 275.296139][ T4335] ? syscall_enter_from_user_mode (kernel/entry/common.c:94 kernel/entry/common.c:112) [ 275.296642][ T4335] do_syscall_64 (arch/x86/entry/common.c:51 (discriminator 1) arch/x86/entry/common.c:82 (discriminator 1)) [ 275.297032][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1)) [ 275.297470][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359) [ 275.297988][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97) [ 275.298389][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359) [ 275.298906][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97) [ 275.299304][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97) [ 275.299703][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97) [ 275.300115][ T4335] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) This BUG is the VM_BUG_ON(!in_atomic() && !irqs_disabled()) assertion in folio_ref_try_add_rcu() for non-SMP kernel. The process_vm_readv() calls GUP to pin the THP. An optimization for pinning THP instroduced by commit 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") calls try_grab_folio() to pin the THP, but try_grab_folio() is supposed to be called in atomic context for non-SMP kernel, for example, irq disabled or preemption disabled, due to the optimization introduced by commit e286781d5f2e ("mm: speculative page references"). The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") is not actually the root cause although it was bisected to. It just makes the problem exposed more likely. The follow up discussion suggested the optimization for non-SMP kernel may be out-dated and not worth it anymore [1]. So removing the optimization to silence the BUG. However calling try_grab_folio() in GUP slow path actually is unnecessary, so the following patch will clean this up. [1] https://lore.kernel.org/linux-mm/821cf1d6-92b9-4ac4-bacc-d8f2364ac14f@paulmck-laptop/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") Signed-off-by: Yang Shi <[email protected]> Reported-by: kernel test robot <[email protected]> Tested-by: Oliver Sang <[email protected]> Acked-by: Peter Xu <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Vivek Kasireddy <[email protected]> Cc: <[email protected]> [6.6+] Signed-off-by: Andrew Morton <[email protected]>
2024-07-03readahead: simplify gotos in page_cache_sync_ra()Jan Kara1-17/+9
Unify all conditions for initial readahead to simplify goto logic in page_cache_sync_ra(). No functional changes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Tested-by: Zhang Peng <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03readahead: fold try_context_readahead() into its single callerJan Kara1-62/+22
try_context_readahead() has a single caller page_cache_sync_ra(). Fold the function there to make ra state modifications more obvious. No functional changes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Tested-by: Zhang Peng <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03readahead: disentangle async and sync readaheadJan Kara1-85/+77
Both async and sync readahead are handled by ondemand_readahead() function. However there isn't actually much in common. Just move async related parts into page_cache_ra_async() and sync related parts to page_cache_ra_sync(). No functional changes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Tested-by: Zhang Peng <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03readahead: drop dead code in ondemand_readahead()Jan Kara1-19/+2
ondemand_readahead() scales up the readahead window if the current read would hit the readahead mark placed by itself. However the condition is mostly dead code because: a) In case of async readahead we always increase ra->start so ra->start == index is never true. b) In case of sync readahead we either go through try_context_readahead() in which case ra->async_size == 1 < ra->size or we go through initial_readahead where ra->async_size == ra->size iff ra->size == max_pages. So the only practical effect is reducing async_size for large initial reads. Make the code more obvious. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Tested-by: Zhang Peng <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03readahead: drop dead code in page_cache_ra_order()Jan Kara1-5/+0
page_cache_ra_order() scales folio order down so that is fully fits within readahead window. Thus the code handling the case where we walked past the readahead window is a dead code. Remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Tested-by: Zhang Peng <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03readahead: drop pointless index from force_page_cache_ra()Jan Kara1-4/+1
Current index to readahead is tracked in readahead_control and properly updated by page_cache_ra_unbounded() (read_pages() in fact). So there's no need to track the index separately in force_page_cache_ra(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Tested-by: Zhang Peng <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03readahead: properly shorten readahead when falling back to do_page_cache_ra()Jan Kara1-2/+3
When we succeed in creating some folios in page_cache_ra_order() but then need to fallback to single page folios, we don't shorten the amount to read passed to do_page_cache_ra() by the amount we've already read. This then results in reading more and also in placing another readahead mark in the middle of the readahead window which confuses readahead code. Fix the problem by properly reducing number of pages to read. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Tested-by: Zhang Peng <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03filemap: fix page_cache_next_miss() when no hole foundJan Kara1-3/+3
page_cache_next_miss() should return value outside of the specified range when no hole is found. However currently it will return the last index *in* the specified range confusing ondemand_readahead() to think there's a hole in the searched range and upsetting readahead logic. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Tested-by: Zhang Peng <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03readahead: make sure sync readahead reads needed pageJan Kara1-1/+1
Patch series "mm: Fix various readahead quirks". When we were internally testing performance of recent kernels, we have noticed quite variable performance of readahead arising from various quirks in readahead code. So I went on a cleaning spree there. This is a batch of patches resulting out of that. A quick testing in my test VM with the following fio job file: [global] direct=0 ioengine=sync invalidate=1 blocksize=4k size=10g readwrite=read [reader] numjobs=1 shows that this patch series improves the throughput from variable one in 310-340 MB/s range to rather stable one at 350 MB/s. As a side effect these cleanups also address the issue noticed by Bruz Zhang [1]. [1] https://lore.kernel.org/all/[email protected]/ Zhang Peng reported: : I test this batch of patch with fio, it indeed has a huge sppedup : in sequential read when block size is 4KiB. The result as follow, : for async read, iodepth is set to 128, and other settings : are self-evident. : : casename                upstream   withFix speedup : ----------------        --------   -------- ------- : randread-4k-sync        48991      47 : seqread-4k-sync         1162758    14229 : seqread-1024k-sync      1460208    1452522 : randread-4k-libaio      47467      4730 : randread-4k-posixaio    49190      49512 : seqread-4k-libaio       1085932    1234635 : seqread-1024k-libaio    1423341    1402214 -1 : seqread-4k-posixaio     1165084    1369613 1 : seqread-1024k-posixaio  1435422    1408808 -1.8 This patch (of 10): page_cache_sync_ra() is called when a folio we want to read is not in the page cache. It is expected that it creates the folio (and perhaps the following folios as well) and submits reads for them unless some error happens. However if index == ra->start + ra->size, ondemand_readahead() will treat the call as another async readahead hit. Thus ra->start will be advanced and we create pages and queue reads from ra->start + ra->size further. Consequentially the page at 'index' is not created and filemap_get_pages() has to always go through filemap_create_folio() path. This behavior has particularly unfortunate consequences when we have two IO threads sequentially reading from a shared file (as is the case when NFS serves sequential reads). In that case what can happen is: suppose ra->size == ra->async_size == 128, ra->start = 512 T1 T2 reads 128 pages at index 512 - hits async readahead mark filemap_readahead() ondemand_readahead() if (index == expected ...) ra->start = 512 + 128 = 640 ra->size = 128 ra->async_size = 128 page_cache_ra_order() blocks in ra_alloc_folio() reads 128 pages at index 640 - no page found page_cache_sync_readahead() ondemand_readahead() if (index == expected ...) ra->start = 640 + 128 = 768 ra->size = 128 ra->async_size = 128 page_cache_ra_order() submits reads from 768 - still no page found at index 640 filemap_create_folio() - goes on to index 641 page_cache_sync_readahead() ondemand_readahead() - founds ra is confused, trims is to small size finds pages were already inserted And as a result read performance suffers. Fix the problem by triggering async readahead case in ondemand_readahead() only if we are calling the function because we hit the readahead marker. In any other case we need to read the folio at 'index' and thus we cannot really use the current ra state. Note that the above situation could be viewed as a special case of file->f_ra state corruption. In fact two thread reading using the shared file can also seemingly corrupt file->f_ra in interesting ways due to concurrent access. I never saw that in practice and the fix is going to be much more complex so for now at least fix this practical problem while we ponder about the theoretically correct solution. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Tested-by: Zhang Peng <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm/migrate: move NUMA hinting fault folio isolation + checks under PTLDavid Hildenbrand3-51/+49
Currently we always take a folio reference even if migration will not even be tried or isolation failed, requiring us to grab+drop an additional reference. Further, we end up calling folio_likely_mapped_shared() while the folio might have already been unmapped, because after we dropped the PTL, that can easily happen. We want to stop touching mapcounts and friends from such context, and only call folio_likely_mapped_shared() while the folio is still mapped: mapcount information is pretty much stale and unreliable otherwise. So let's move checks into numamigrate_isolate_folio(), rename that function to migrate_misplaced_folio_prepare(), and call that function from callsites where we call migrate_misplaced_folio(), but still with the PTL held. We can now stop taking temporary folio references, and really only take a reference if folio isolation succeeded. Doing the folio_likely_mapped_shared() + folio isolation under PT lock is now similar to how we handle MADV_PAGEOUT. While at it, combine the folio_is_file_lru() checks. [[email protected]: fix list_del() corruption] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Reviewed-by: Zi Yan <[email protected]> Tested-by: Donet Tom <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm/migrate: make migrate_misplaced_folio() return 0 on successDavid Hildenbrand3-6/+5
Patch series "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". Let's just return 0 on success, which is less confusing. ... especially because we got it wrong in the migrate.h stub where we have "return -EAGAIN; /* can't migrate now */" instead of "return 0;". Likely this wrong return value doesn't currently matter, but it certainly adds confusion. We'll add migrate_misplaced_folio_prepare() next, where we want to use the same "return 0 on success" approach, so let's just clean this up. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Zi Yan <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Cc: Donet Tom <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: mmap_lock: replace get_memcg_path_buf() with on-stack bufferTetsuo Handa1-155/+20
Commit 2b5067a8143e ("mm: mmap_lock: add tracepoints around lock acquisition") introduced TRACE_MMAP_LOCK_EVENT() macro using preempt_disable() in order to let get_mm_memcg_path() return a percpu buffer exclusively used by normal, softirq, irq and NMI contexts respectively. Commit 832b50725373 ("mm: mmap_lock: use local locks instead of disabling preemption") replaced preempt_disable() with local_lock(&memcg_paths.lock) based on an argument that preempt_disable() has to be avoided because get_mm_memcg_path() might sleep if PREEMPT_RT=y. But syzbot started reporting inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. and inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. messages, for local_lock() does not disable IRQ. We could replace local_lock() with local_lock_irqsave() in order to suppress these messages. But this patch instead replaces percpu buffers with on-stack buffer, for the size of each buffer returned by get_memcg_path_buf() is only 256 bytes which is tolerable for allocating from current thread's kernel stack memory. Link: https://lkml.kernel.org/r/[email protected] Reported-by: syzbot <[email protected]> Closes: https://syzkaller.appspot.com/bug?extid=40905bca570ae6784745 Fixes: 832b50725373 ("mm: mmap_lock: use local locks instead of disabling preemption") Signed-off-by: Tetsuo Handa <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: Nicolas Saenz Julienne <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: do not pass NULL pointers as 0Ilya Leoshkevich2-9/+10
sparse complains about passing NULL pointers as 0. Fix all instances. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: add missing __user tagsIlya Leoshkevich3-6/+6
sparse complains that __user pointers are being passed to functions that expect non-__user ones. In all cases, these functions are in fact working with user pointers, only the tag is missing. Add it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: accept ranges starting with 0 on s390Ilya Leoshkevich1-1/+4
On s390 the virtual address 0 is valid (current CPU's lowcore is mapped there), therefore KMSAN should not complain about it. Disable the respective check on s390. There doesn't seem to be a Kconfig option to describe this situation, so explicitly check for s390. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: kfence: disable KMSAN when checking the canaryIlya Leoshkevich1-2/+9
KMSAN warns about check_canary() accessing the canary. The reason is that, even though set_canary() is properly instrumented and sets shadow, slub explicitly poisons the canary's address range afterwards. Unpoisoning the canary is not the right thing to do: only check_canary() is supposed to ever touch it. Instead, disable KMSAN checks around canary read accesses. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Tested-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: slub: disable KMSAN when checking the padding bytesIlya Leoshkevich1-4/+12
Even though the KMSAN warnings generated by memchr_inv() are suppressed by metadata_access_enable(), its return value may still be poisoned. The reason is that the last iteration of memchr_inv() returns `*start != value ? start : NULL`, where *start is poisoned. Because of this, somewhat counterintuitively, the shadow value computed by visitSelectInst() is equal to `(uintptr_t)start`. One possibility to fix this, since the intention behind guarding memchr_inv() behind metadata_access_enable() is to touch poisoned metadata without triggering KMSAN, is to unpoison its return value. However, this approach is too fragile. So simply disable the KMSAN checks in the respective functions. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: slub: let KMSAN access metadataIlya Leoshkevich1-0/+2
Building the kernel with CONFIG_SLUB_DEBUG and CONFIG_KMSAN causes KMSAN to complain about touching redzones in kfree(). Fix by extending the existing KASAN-related metadata_access_enable() and metadata_access_disable() functions to KMSAN. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: expose KMSAN_WARN_ON()Ilya Leoshkevich1-23/+1
KMSAN_WARN_ON() is required for implementing s390-specific KMSAN functions, but right now it's available only to the KMSAN internal functions. Expose it to subsystems through <linux/kmsan.h>. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: do not round up pg_data_t sizeIlya Leoshkevich1-1/+1
x86's alloc_node_data() rounds up node data size to PAGE_SIZE. It's not explained why it's needed, but it's most likely for performance reasons, since the padding bytes are not used anywhere. Some other architectures do it as well, e.g., mips rounds it up to the cache line size. kmsan_init_shadow() initializes metadata for each node data and assumes the x86 rounding, which does not match other architectures. This may cause the range end to overshoot the end of available memory, in turn causing virt_to_page_or_null() in kmsan_init_alloc_meta_for_range() to return NULL, which leads to kernel panic shortly after. Since the padding bytes are not used, drop the rounding. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: use ALIGN_DOWN() in kmsan_get_metadata()Ilya Leoshkevich1-5/+3
Improve the readability by replacing the custom aligning logic with ALIGN_DOWN(). Unlike other places where a similar sequence is used, there is no size parameter that needs to be adjusted, so the standard macro fits. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: support SLAB_POISONIlya Leoshkevich2-5/+12
Avoid false KMSAN negatives with SLUB_DEBUG by allowing kmsan_slab_free() to poison the freed memory, and by preventing init_object() from unpoisoning new allocations by using __memset(). There are two alternatives to this approach. First, init_object() can be marked with __no_sanitize_memory. This annotation should be used with great care, because it drops all instrumentation from the function, and any shadow writes will be lost. Even though this is not a concern with the current init_object() implementation, this may change in the future. Second, kmsan_poison_memory() calls may be added after memset() calls. The downside is that init_object() is called from free_debug_processing(), in which case poisoning will erase the distinction between simply uninitialized memory and UAF. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: allow disabling KMSAN checks for the current taskIlya Leoshkevich3-7/+19
Like for KASAN, it's useful to temporarily disable KMSAN checks around, e.g., redzone accesses. Introduce kmsan_disable_current() and kmsan_enable_current(), which are similar to their KASAN counterparts. Make them reentrant in order to handle memory allocations in interrupt context. Repurpose the allow_reporting field for this. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: export panic_on_kmsanIlya Leoshkevich1-0/+1
When building the kmsan test as a module, modpost fails with the following error message: ERROR: modpost: "panic_on_kmsan" [mm/kmsan/kmsan_test.ko] undefined! Export panic_on_kmsan in order to improve the KMSAN usability for modules. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: expose kmsan_get_metadata()Ilya Leoshkevich2-1/+1
Each s390 CPU has lowcore pages associated with it. Each CPU sees its own lowcore at virtual address 0 through a hardware mechanism called prefixing. Additionally, all lowcores are mapped to non-0 virtual addresses stored in the lowcore_ptr[] array. When lowcore is accessed through virtual address 0, one needs to resolve metadata for lowcore_ptr[raw_smp_processor_id()]. Expose kmsan_get_metadata() to make it possible to do this from the arch code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: remove an x86-specific #include from kmsan.hIlya Leoshkevich1-4/+4
Replace the x86-specific asm/pgtable_64_types.h #include with the linux/pgtable.h one, which all architectures have. While at it, sort the headers alphabetically for the sake of consistency with other KMSAN code. Link: https://lkml.kernel.org/r/[email protected] Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core") Signed-off-by: Ilya Leoshkevich <[email protected]> Suggested-by: Heiko Carstens <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: remove a useless assignment from kmsan_vmap_pages_range_noflush()Ilya Leoshkevich1-1/+0
The value assigned to prot is immediately overwritten on the next line with PAGE_KERNEL. The right hand side of the assignment has no side-effects. Link: https://lkml.kernel.org/r/[email protected] Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Ilya Leoshkevich <[email protected]> Suggested-by: Alexander Gordeev <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: fix kmsan_copy_to_user() on arches with overlapping address spacesIlya Leoshkevich1-1/+2
Comparing pointers with TASK_SIZE does not make sense when kernel and userspace overlap. Assume that we are handling user memory access in this case. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reported-by: Alexander Gordeev <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: fix is_bad_asm_addr() on arches with overlapping address spacesIlya Leoshkevich1-1/+2
Comparing pointers with TASK_SIZE does not make sense when kernel and userspace overlap. Skip the comparison when this is the case. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: increase the maximum store size to 4096Ilya Leoshkevich1-4/+3
The inline assembly block in s390's chsc() stores that much. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: disable KMSAN when DEFERRED_STRUCT_PAGE_INIT is enabledIlya Leoshkevich1-0/+1
KMSAN relies on memblock returning all available pages to it (see kmsan_memblock_free_pages()). It partitions these pages into 3 categories: pages available to the buddy allocator, shadow pages and origin pages. This partitioning is static. If new pages appear after kmsan_init_runtime(), it is considered an error. DEFERRED_STRUCT_PAGE_INIT causes this, so mark it as incompatible with KMSAN. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03kmsan: make the tests compatible with kmsan.panic=1Ilya Leoshkevich1-0/+5
It's useful to have both tests and kmsan.panic=1 during development, but right now the warnings, that the tests cause, lead to kernel panics. Temporarily set kmsan.panic=0 for the duration of the KMSAN testing. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: memory: rename pages_per_huge_page to nr_pagesKefeng Wang1-12/+12
Since the callers are converted to use nr_pages naming, use it inside too. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: memory: improve copy_user_large_folio()Kefeng Wang2-17/+12
Use nr_pages instead of pages_per_huge_page and move the address alignment from copy_user_large_folio() into the callers since it is only needed when we don't know which address will be accessed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: memory: use folio in struct copy_subpage_argKefeng Wang1-6/+6
Directly use folio in struct copy_subpage_arg. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03mm: memory: convert clear_huge_page() to folio_zero_user()Kefeng Wang3-22/+19
Patch series "mm: improve clear and copy user folio", v2. Some folio conversions. An improvement is to move address alignment into the caller as it is only needed if we don't know which address will be accessed when clearing/copying user folios. This patch (of 4): Replace clear_huge_page() with folio_zero_user(), and take a folio instead of a page. Directly get number of pages by folio_nr_pages() to remove pages_per_huge_page argument, furthermore, move the address alignment from folio_zero_user() to the callers since the alignment is only needed when we don't know which address will be accessed. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>