|
This replaces all the existing READ_ONCE() based page table accesses with
the respective pxdp_get() helpers. Although these helpers may fall back to
READ_ONCE() by default, they give individual platforms an opportunity to
override the access when required. This change is a step towards replacing
all page table entry accesses with the respective pxdp_get() helpers.
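As a rough illustration of the conversion (a sketch only; the accessors are
the generic helpers from <linux/pgtable.h>, the exact call sites are in the
patch):
  /* before: open-coded lockless read */
  pmd_t pmd = READ_ONCE(*pmdp);
  /* after: arch-overridable accessor, defaulting to READ_ONCE() */
  pmd_t pmd = pmdp_get(pmdp);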
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Ryan Roberts <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The macro RANDOM_ORVALUE was used to make sure the pgtable entry would be
populated with !none data in the clear tests.
RANDOM_ORVALUE tried to cover almost all the bits in a pgtable entry, even
though there was never any discussion of whether all those bits are valid.
Both s390 and ppc64 already have their own masks to avoid touching some
bits, and now it would be x86_64's turn.
The issue is that Mikhail Gavrilov recently reported that this can trigger
a warning from the pte set check newly added in commit 8430557fc5 for the
writable vs. userfaultfd-wp bits: the check itself is valid, but the random
pte is not. We could choose to mask more bits out.
However, the need to set up such random bits is questionable, as the
entries are already guaranteed to be !none:
- For the pte level, the pgtable entry is installed with a value from
  pfn_pte(), where the pfn points to a valid page. Hence the pte is
  already !none once populated with pfn_pte().
- For levels above pte, the pgtable entry always contains a directory
  entry, which is also !none.
All of these cases are good enough to test a pxx_clear() helper. Instead
of extending the bitmask, drop the "set random bits" trick completely, and
add some warning guards to make sure the entries are !none before clear().
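A minimal sketch of the resulting pattern (illustrative only; the exact
guard placement is in the patch):
  WARN_ON(pte_none(ptep_get(ptep)));   /* entry must already be !none */
  pte_clear(mm, vaddr, ptep);
  WARN_ON(!pte_none(ptep_get(ptep)));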
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8430557fc584 ("mm/page_table_check: support userfault wr-protect entries")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Mikhail Gavrilov <[email protected]>
Link: https://lore.kernel.org/r/CABXGCsMB9A8-X+Np_Q+fWLURYL_0t3Y-MdoNabDM-Lzk58-DGA@mail.gmail.com
Tested-by: Mikhail Gavrilov <[email protected]>
Reviewed-by: Pasha Tatashin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Gavin Shan <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
An invalidated pmd should still cause pmd_leaf() to return true. Let's
test for that to ensure all arches remain consistent.
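Roughly, the added check looks like this (a sketch, assuming a THP-capable
configuration):
  pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
  /* invalidating the pmd must not clear its leaf property */
  WARN_ON(!pmd_leaf(pmd_mkinvalid(pmd)));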
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Catalin Marinas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Memory allocation profiling", v6.
Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels, overhead low enough to be deployed in production.
Example output:
root@moria-kvm:~# sort -rn /proc/allocinfo
127664128 31168 mm/page_ext.c:270 func:alloc_page_ext
56373248 4737 mm/slub.c:2259 func:alloc_slab_page
14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded
14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash
13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio
9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node
4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
3940352 962 mm/memory.c:4214 func:alloc_anon_folio
2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
...
Usage:
kconfig options:
- CONFIG_MEM_ALLOC_PROFILING
- CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
- CONFIG_MEM_ALLOC_PROFILING_DEBUG
adds warnings for allocations that weren't accounted because of a
missing annotation
sysctl:
/proc/sys/vm/mem_profiling
Runtime info:
/proc/allocinfo
Notes:
[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
(6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y
(7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y
Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:
kmalloc pgalloc
(1 baseline) 6.764s 16.902s
(2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%)
(3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%)
(4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%)
(5 memcg) 13.388s (+97.94%) 48.460s (+186.71%)
(6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%)
(7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%)
Memory overhead:
Kernel size:
text data bss dec diff
(1) 26515311 18890222 17018880 62424413
(2) 26524728 19423818 16740352 62688898 264485
(3) 26524724 19423818 16740352 62688894 264481
(4) 26524728 19423818 16740352 62688898 264485
(5) 26541782 18964374 16957440 62463596 39183
Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags: 192 kB
PageExts: 262144 kB (256MB)
SlabExts: 9876 kB (9.6MB)
PcpuExts: 512 kB (0.5MB)
Total overhead is 0.2% of total memory.
Benchmarks:
Hackbench tests run 100 times:
hackbench -s 512 -l 200 -g 15 -f 25 -P
baseline disabled profiling enabled profiling
avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023)
stdev 0.0137 0.0188 0.0077
hackbench -l 10000
baseline disabled profiling enabled profiling
avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859)
stdev 0.0933 0.0286 0.0489
stress-ng tests:
stress-ng --class memory --seq 4 -t 60
stress-ng --class cpu --seq 4 -t 60
Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/
[2] https://lore.kernel.org/all/[email protected]/
This patch (of 37):
The next patch drops vmalloc.h from a system header in order to fix a
circular dependency; this adds it to all the files that were pulling it in
implicitly.
[[email protected]: fix arch/alpha/lib/memcpy.c]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix arch/x86/mm/numa_32.c]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: a few places were depending on sizes.h]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix mm/kasan/hw_tags.c]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix arc build]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Pasha Tatashin <[email protected]>
Tested-by: Kees Cook <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: "Björn Roy Baron" <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Architectures like powerpc add debug checks to ensure that only devmap PUD
pte entries are found. These debug checks are only done with
CONFIG_DEBUG_VM. This patch marks the ptes used in the PUD advanced test
as devmap pte entries so that we don't hit the debug checks on
architectures like ppc64, as shown below.
WARNING: CPU: 2 PID: 1 at arch/powerpc/mm/book3s64/radix_pgtable.c:1382 radix__pud_hugepage_update+0x38/0x138
....
NIP [c0000000000a7004] radix__pud_hugepage_update+0x38/0x138
LR [c0000000000a77a8] radix__pudp_huge_get_and_clear+0x28/0x60
Call Trace:
[c000000004a2f950] [c000000004a2f9a0] 0xc000000004a2f9a0 (unreliable)
[c000000004a2f980] [000d34c100000000] 0xd34c100000000
[c000000004a2f9a0] [c00000000206ba98] pud_advanced_tests+0x118/0x334
[c000000004a2fa40] [c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48
[c000000004a2fc10] [c00000000000fd28] do_one_initcall+0x60/0x388
Also
kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:202!
....
NIP [c000000000096510] pudp_huge_get_and_clear_full+0x98/0x174
LR [c00000000206bb34] pud_advanced_tests+0x1b4/0x334
Call Trace:
[c000000004a2f950] [000d34c100000000] 0xd34c100000000 (unreliable)
[c000000004a2f9a0] [c00000000206bb34] pud_advanced_tests+0x1b4/0x334
[c000000004a2fa40] [c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48
[c000000004a2fc10] [c00000000000fd28] do_one_initcall+0x60/0x388
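The fix amounts to marking the entry used by pud_advanced_tests() as
devmap, roughly (a sketch, assuming CONFIG_ARCH_HAS_PTE_DEVMAP; variable
names are illustrative):
  pud = pfn_pud(pfn, prot);
  pud = pud_mkdevmap(pud);   /* satisfy the powerpc CONFIG_DEBUG_VM devmap checks */
  set_pud_at(mm, vaddr, pudp, pud);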
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 27af67f35631 ("powerpc/book3s64/mm: enable transparent pud hugepage")
Signed-off-by: Aneesh Kumar K.V (IBM) <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muhammad Muzammil <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Muhammad Muzammil <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 shadow stack support from Dave Hansen:
"This is the long awaited x86 shadow stack support, part of Intel's
Control-flow Enforcement Technology (CET).
CET consists of two related security features: shadow stacks and
indirect branch tracking. This series implements just the shadow stack
part of this feature, and just for userspace.
The main use case for shadow stack is providing protection against
return oriented programming attacks. It works by maintaining a
secondary (shadow) stack using a special memory type that has
protections against modification. When executing a CALL instruction,
the processor pushes the return address to both the normal stack and
to the special permission shadow stack. Upon RET, the processor pops
the shadow stack copy and compares it to the normal stack copy.
For more information, refer to the links below for the earlier
versions of this patch set"
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
x86/shstk: Change order of __user in type
x86/ibt: Convert IBT selftest to asm
x86/shstk: Don't retry vm_munmap() on -EINTR
x86/kbuild: Fix Documentation/ reference
x86/shstk: Move arch detail comment out of core mm
x86/shstk: Add ARCH_SHSTK_STATUS
x86/shstk: Add ARCH_SHSTK_UNLOCK
x86: Add PTRACE interface for shadow stack
selftests/x86: Add shadow stack test
x86/cpufeatures: Enable CET CR4 bit for shadow stack
x86/shstk: Wire in shadow stack interface
x86: Expose thread features in /proc/$PID/status
x86/shstk: Support WRSS for userspace
x86/shstk: Introduce map_shadow_stack syscall
x86/shstk: Check that signal frame is shadow stack mem
x86/shstk: Check that SSP is aligned on sigreturn
x86/shstk: Handle signals for shadow stack
x86/shstk: Introduce routines modifying shstk
x86/shstk: Handle thread shadow stack
x86/shstk: Add user-mode shadow stack support
...
|
|
We will use this in a later patch to do a TLB flush when clearing pud
entries on powerpc. This is similar to commit 93a98695f2f9 ("mm: change
pmdp_huge_get_and_clear_full take vm_area_struct as arg").
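The signature change is along these lines (a sketch; the mm is still
reachable via vma->vm_mm):
  /* before */
  pud_t pudp_huge_get_and_clear_full(struct mm_struct *mm,
                                     unsigned long addr, pud_t *pudp, int full);
  /* after: pass the VMA so architectures can flush the TLB as needed */
  pud_t pudp_huge_get_and_clear_full(struct vm_area_struct *vma,
                                     unsigned long addr, pud_t *pudp, int full);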
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Add support for DAX vmemmap optimization for ppc64", v6.
This patch series implements changes required to support DAX vmemmap
optimization for ppc64. The vmemmap optimization is only enabled with
radix MMU translation and 1GB PUD mapping with 64K page size.
The patch series also splits the hugetlb vmemmap optimization as a
separate Kconfig variable so that architectures can enable DAX vmemmap
optimization without enabling hugetlb vmemmap optimization. This should
enable architectures like arm64 to enable DAX vmemmap optimization while
they can't enable hugetlb vmemmap optimization. More details of the same
are in patch "mm/vmemmap optimization: Split hugetlb and devdax vmemmap
optimization".
With 64K page size for 16384 pages added (1G) we save 14 pages
With 4K page size for 262144 pages added (1G) we save 4094 pages
With 4K page size for 512 pages added (2M) we save 6 pages
This patch (of 13):
Architectures like powerpc would like to enable transparent huge page pud
support only with radix translation. To support that add
has_transparent_pud_hugepage() helper that architectures can override.
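One plausible shape for the generic fallback, which architectures like
powerpc can then override (a sketch only):
  #ifndef has_transparent_pud_hugepage
  #define has_transparent_pud_hugepage() IS_BUILTIN(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
  #endif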
[[email protected]: use the new has_transparent_pud_hugepage()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which requires
some core mm changes to function properly.
One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable or shadow stack mappings.
But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.
So this can be done in a three step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.
Previous work renamed pte_mkwrite() to pte_mkwrite_novma() and converted
callers that don't have a VMA to use pte_mkwrite_novma(). So now change
pte_mkwrite() to take a VMA and change the remaining callers to pass one.
Apply the same changes to pmd_mkwrite().
No functional change.
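The end result looks roughly like this (a sketch of the generic wrapper and
a converted caller):
  static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
  {
          return pte_mkwrite_novma(pte);
  }
  /* a converted caller now passes its VMA */
  entry = pte_mkwrite(pte_mkdirty(entry), vma);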
Suggested-by: David Hildenbrand <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-4-rick.p.edgecombe%40intel.com
|
|
Failures here would be surprising: pte_advanced_tests() and
pte_clear_tests() and __page_table_check_pte_clear_range() each issue a
warning if pte_offset_map() or pte_offset_map_lock() fails.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Zack Rusin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
MAX_ORDER is currently defined as the number of orders the page allocator
supports: a user can ask the buddy allocator for a page order between 0 and
MAX_ORDER-1. This definition is counter-intuitive and has led to a number
of bugs all over the kernel.
Change the definition of MAX_ORDER to be inclusive: the range of orders a
user can ask the buddy allocator for is now 0..MAX_ORDER.
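In practice this flips the usual loop bound (a sketch; do_something() is a
placeholder):
  /* before: MAX_ORDER was exclusive */
  for (order = 0; order < MAX_ORDER; order++)
          do_something(order);
  /* after: MAX_ORDER is itself a valid order */
  for (order = 0; order <= MAX_ORDER; order++)
          do_something(order);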
[[email protected]: fix min() warning]
Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[[email protected]: fix another min_t warning]
[[email protected]: fixups per Zi Yan]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix underlining in docs]
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Michael Ellerman <[email protected]> [powerpc]
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Update instances of alloc_pages(..., 0), __get_free_pages(..., 0) and
__free_pages(..., 0) to use alloc_page(), __get_free_page() and
__free_page() respectively in core code.
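The conversion is mechanical, e.g. (sketch):
  /* order-0 allocations spelled explicitly ... */
  page = alloc_pages(GFP_KERNEL, 0);
  addr = __get_free_pages(GFP_KERNEL, 0);
  __free_pages(page, 0);
  /* ... become the single-page helpers */
  page = alloc_page(GFP_KERNEL);
  addr = __get_free_page(GFP_KERNEL);
  __free_page(page);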
Link: https://lkml.kernel.org/r/50c48ca4789f1da2a65795f2346f5ae3eff7d665.1678710232.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 16785bd77431 ("mm: merge pte_mkhuge() call into
arch_make_huge_pte()"), arch_make_huge_pte() should be used directly in the
generic memory subsystem as the platform-provided page table helper,
instead of pte_mkhuge(). Change hugetlb_basic_tests() to call
arch_make_huge_pte() directly, and update the relevant documentation entry
as required.
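The change in hugetlb_basic_tests() is roughly (a sketch; the shift and
flags values shown are placeholders):
  /* before */
  pte = pte_mkhuge(mk_pte(page, prot));
  /* after: let the architecture construct the huge PTE */
  pte = arch_make_huge_pte(mk_pte(page, prot), shift, flags);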
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reported-by: Christophe Leroy <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Cc: Jonathan Corbet <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
__HAVE_ARCH_PTE_SWP_EXCLUSIVE is now supported by all architectures that
support swp PTEs, so let's drop it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
architectures with swap PTEs".
This is the follow-up on [1]:
[PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
anonymous pages
After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent
enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
remaining architectures that support swap PTEs.
This makes sure that exclusive anonymous pages will stay exclusive, even
after they were swapped out -- for example, making GUP R/W FOLL_GET of
anonymous pages reliable. Details can be found in [1].
This primarily fixes remaining known O_DIRECT memory corruptions that can
happen on concurrent swapout, whereby we can lose DMA reads to a page
(modifying the user page by writing to it).
To verify, there are two test cases (requiring swap space, obviously):
(1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries
triggering a race condition.
(2) My vmsplice() test case [3] that tries to detect if the exclusive
marker was lost during swapout, not relying on a race condition.
For example, on 32bit x86 (with and without PAE), my test case fails
without these patches:
$ ./test_swp_exclusive
FAIL: page was replaced during COW
But succeeds with these patches:
$ ./test_swp_exclusive
PASS: page was not replaced during COW
Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even
the ones where swap support might be in a questionable state? This is the
first step towards removing "readable_exclusive" migration entries, and
instead using pte_swp_exclusive() also with (readable) migration entries
(as suggested by Peter). The only missing piece for that is
supporting pmd_swp_exclusive() on relevant architectures with THP
migration support.
As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,
we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch.
I tried cross-compiling all relevant setups and tested on x86 and sparc64
so far.
CCing arch maintainers only on this cover letter and on the respective
patch(es).
[1] https://lkml.kernel.org/r/[email protected]
[2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c
[3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c
This patch (of 26):
We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures.
Let's extend our sanity checks, especially testing that our PTE bit does
not affect:
* is_swap_pte() -> pte_present() and pte_none()
* the swap entry + type
* pte_swp_soft_dirty()
In particular, pfn_pte() is dodgy when the swap PTE layout differs heavily
from that of ordinary PTEs. Let's properly construct a swap PTE from a swap
type+offset.
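Constructing the swap PTE directly looks like (sketch):
  swp_entry_t entry = swp_entry(type, offset);  /* any valid swap type/offset */
  pte_t pte = swp_entry_to_pte(entry);
  WARN_ON(!is_swap_pte(pte));
  WARN_ON(pte_present(pte));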
[[email protected]: fix build]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: H. Peter Anvin (Intel) <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Xuerui Wang <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The page table debug tests need a physical address to validate low-level
page table manipulation with. The memory at this address is not actually
touched; it is just encoded in the page table entries at various levels
during the tests.
Since the memory is not used, the code just picks the physical address of
the start_kernel symbol. This value is then truncated to get a properly
aligned address that is to be used for various tests. Because of the
truncation, the address might not actually exist, or might not describe a
complete huge page. That's not a problem for most tests, but the
arch-specific code may check for attribute validity and consistency. The
x86 version of {pud,pmd}_set_huge actually validates the MTRRs for the
PMD/PUD range. This may fail with an address derived from start_kernel,
depending on where the kernel was loaded and what the physical memory
layout of the system is. This then leads to false negatives for the
{pud,pmd}_set_huge tests.
Avoid this by finding a properly aligned memory range that exists and is
usable. If such a range is not found, skip the tests that needed it.
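One way to find such a range is to walk the memblock regions for a
sufficiently aligned chunk, roughly (a sketch; the helper name is
hypothetical):
  static phys_addr_t __init find_aligned_phys(void)
  {
          phys_addr_t start, end;
          u64 i;

          for_each_mem_range(i, &start, &end) {
                  phys_addr_t aligned = ALIGN(start, PUD_SIZE);

                  if (aligned + PUD_SIZE <= end)
                          return aligned;   /* usable, properly aligned range */
          }
          return 0;                         /* not found: skip dependent tests */
  }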
[[email protected]: v3]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 399145f9eb6c ("mm/debug: add tests validating architecture page table helpers")
Signed-off-by: Frank van der Linden <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
NUMA hinting no longer uses savedwrite, let's rip it out.
... and while at it, drop __pte_write() and __pmd_write() on ppc64.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Directly use VM_ACCESS_FLAGS instead of VMFLAGS.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: "Pan, Xinhui" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
so it will be consistent with the mm directory in the code and with
Documentation/admin-guide/mm, and won't be confused with virtual machines.
Signed-off-by: Mike Rapoport <[email protected]>
Suggested-by: Matthew Wilcox <[email protected]>
Tested-by: Ira Weiny <[email protected]>
Acked-by: Jonathan Corbet <[email protected]>
Acked-by: Wu XiangCheng <[email protected]>
|
|
Let's test that __HAVE_ARCH_PTE_SWP_EXCLUSIVE works as expected.
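The checks added are essentially (sketch):
  pte = swp_entry_to_pte(entry);
  WARN_ON(pte_swp_exclusive(pte));
  pte = pte_swp_mkexclusive(pte);
  WARN_ON(!pte_swp_exclusive(pte));
  pte = pte_swp_clear_exclusive(pte);
  WARN_ON(pte_swp_exclusive(pte));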
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Liang Zhang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Oded Gabbay <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pedro Demarchi Gomes <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: protection_map[] cleanups".
This patch (of 2):
Although protection_map[] contains the platform-defined page protection map
for a given vm_flags combination, vm_get_page_prot() is the right interface
to use. This will also reduce the dependency on protection_map[], which is
going to be dropped completely later on.
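The substitution is of this form (sketch):
  /* before: index the platform map directly */
  pgprot_t prot = protection_map[VM_READ | VM_WRITE | VM_EXEC];
  /* after: go through the official interface */
  pgprot_t prot = vm_get_page_prot(VM_READ | VM_WRITE | VM_EXEC);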
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "page table check fixes and cleanups", v5.
This patch (of 4):
The pte entry that is used in pte_advanced_tests() is never removed from
the page table at the end of the test.
The issue is detected by page_table_check; to reproduce it, compile the
kernel with the following configs:
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_PAGE_TABLE_CHECK=y
CONFIG_PAGE_TABLE_CHECK_ENFORCED=y
During the boot the following BUG is printed:
debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture page table helpers
------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:162!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-11413-g2c271fe77d52 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
...
The entry should be properly removed from the page table before the page
is released to the free list.
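The fix boils down to clearing the entry at the end of the test, roughly
(sketch):
  /* at the end of pte_advanced_tests(), before the page goes back to buddy */
  ptep_get_and_clear(mm, vaddr, ptep);
  WARN_ON(!pte_none(ptep_get(ptep)));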
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Pasha Tatashin <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Tested-by: Zi Yan <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Wei Xu <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]> [5.9+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We have the ptep_get_and_clear() and ptep_get_and_clear_full() helpers to
clear a PTE from user page tables, but there is no variant for a simple
clear of a present PTE from user page tables that does not go through the
low-level pte_clear(), which can be either native or para-virtualised.
Add a new ptep_clear() that can be used in common code to clear PTEs from
page tables. We will need this call later in order to add a hook for page
table check.
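One plausible shape for the generic helper (a sketch; the page table check
hook comes in a later patch):
  static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
                                pte_t *ptep)
  {
          pte_clear(mm, addr, ptep);
  }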
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Pasha Tatashin <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Wei Xu <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 4dd845b5a3e5 ("mm/swapops: rework swap entry manipulation code")
changed the migration entry related helpers. Update the documentation kept
in sync with debug_vm_pgtable() to reflect those changes.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The __Pxxx/__Sxxx macros are only meant for protection_map[]
initialization. All other usage in Linux should go through the
protection_map[] array, because many architectures re-initialize
protection_map[], e.g. x86 mem_encrypt, m68k motorola, mips, arm and sparc.
Using __P000 directly is not rigorous.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Guo Ren <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Gavin Shan <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In the page table entry modifying tests, set_xxx_at() is used to populate
the page table entries. On ARM64, the PG_arch_1 (PG_dcache_clean) flag is
set on the target page when execution permission is given. This logic has
existed since commit 4f04d8f00545 ("arm64: MMU definitions"). The page
flag is kept when the page is freed back to the buddy's free area list.
However, it then triggers a page checking failure when the page is pulled
from the buddy's free area list, as the following warning messages
indicate.
BUG: Bad page state in process memhog pfn:08000
page:0000000015c0a628 refcount:0 mapcount:0 \
mapping:0000000000000000 index:0x1 pfn:0x8000
flags: 0x7ffff8000000800(arch_1|node=0|zone=0|lastcpupid=0xfffff)
raw: 07ffff8000000800 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
This fixes the issue by clearing PG_arch_1 through flush_dcache_page()
after set_xxx_at() is called. For architectures other than ARM64, the
unexpected overhead of cache flushing is acceptable.
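The fix pattern is simply (sketch):
  set_pte_at(mm, vaddr, ptep, pte);
  flush_dcache_page(page);   /* clear PG_arch_1 that set_xxx_at() may have set on arm64 */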
Link: https://lkml.kernel.org/r/[email protected]
Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The variables used by the old implementation aren't needed now that we have
switched to "struct pgtable_debug_args". Let's remove them and the related
code in debug_vm_pgtable().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses struct pgtable_debug_args in the PGD/P4D modifying tests. No
allocated huge page is used in these tests. Besides, the unused variables
@saved_p4dp and @saved_pudp are dropped.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses struct pgtable_debug_args in PUD modifying tests. The allocated
huge page is used when set_pud_at() is used. The corresponding tests are
skipped if the huge page doesn't exist. Besides, the following unused
variables in debug_vm_pgtable() are dropped: @prot, @paddr, @pud_aligned.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses struct pgtable_debug_args in PMD modifying tests. The allocated
huge page is used when set_pmd_at() is used. The corresponding tests are
skipped if the huge page doesn't exist. Besides, the unused variable
@pmd_aligned in debug_vm_pgtable() is dropped.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses struct pgtable_debug_args in the PTE modifying tests. The
allocated page is used, as set_pte_at() is used there. The tests are
skipped if the allocated page doesn't exist. It's notable that args->ptep
needs to be mapped before the tests. The reason why we don't map args->ptep
at the beginning is that, when CONFIG_HIGHPTE is enabled, a PTE entry is
only mapped and accessible in atomic context. So we avoid doing that so
that atomic context is only entered when needed.
Besides, the unused variables @pte_aligned and @ptep in debug_vm_pgtable()
are dropped.
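The mapping is therefore confined to the PTE tests themselves, roughly
(sketch):
  args->ptep = pte_offset_map(args->pmdp, args->vaddr);  /* atomic with CONFIG_HIGHPTE */
  /* ... run the PTE modifying tests ... */
  pte_unmap(args->ptep);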
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses struct pgtable_debug_args in the migration and thp test
functions. It's notable that the pre-allocated page is used in
swap_migration_tests() as set_pte_at() is used there.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses struct pgtable_debug_args in the soft_dirty and swap test
functions.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses struct pgtable_debug_args in protnone and devmap test functions.
After that, the unused variable @protnone in debug_vm_pgtable() is
dropped.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses struct pgtable_debug_args in the leaf and savewrite test
functions.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This uses struct pgtable_debug_args in the basic test functions. The
unused variables @pgd_aligned and @p4d_aligned in debug_vm_pgtable() are
dropped.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/debug_vm_pgtable: Enhancements", v6.
There are a couple of issues with current implementations and this series
tries to resolve the issues:
(a) All the needed information is scattered across variables that are
passed to the various test functions. The code is organized in a pretty
relaxed fashion.
(b) The page isn't allocated from the buddy allocator during the page table
entry modifying tests. The page can be invalid, conflicting with the
implementations of set_xxx_at() on ARM64: the target page is accessed so
that the iCache can be flushed when execution permission is given. Besides,
the target page can be unmapped, and accessing it then causes a kernel
crash.
"struct pgtable_debug_args" is introduced to address issue (a). For issue
(b), the used page is allocated from the buddy allocator in the page table
entry modifying tests. The corresponding tests will be skipped if we fail
to allocate the (huge) page. For the other test cases, the original page
around the kernel symbol (@start_kernel) is still used.
The patches are organized as below. PATCH[2-10] could be combined to one
patch, but it will make the review harder:
PATCH[1] introduces "struct pgtable_debug_args" as place holder of all
needed information. With it, the old and new implementation
can coexist.
PATCH[2-10] uses "struct pgtable_debug_args" in various test functions.
PATCH[11] removes the unused code for old implementation.
PATCH[12] fixes the issue of corrupted page flag for ARM64
This patch (of 6):
In debug_vm_pgtable(), many local variables are introduced to track the
needed information, and they are passed to the functions for the various
test cases. It would be better to introduce a struct as a placeholder for
this information. With it, all the test functions need is the struct. In
this way, the code is simplified and easier to maintain.
Besides, set_xxx_at() can access the data on the corresponding pages in the
page table modifying tests. So the pages accessed in the tests should have
been allocated from the buddy allocator. Otherwise, we're accessing pages
that aren't owned by us, which causes issues like page flag corruption or a
kernel crash on accessing an unmapped page when CONFIG_DEBUG_PAGEALLOC is
enabled.
This introduces "struct pgtable_debug_args". The struct is initialized
and destroyed, but the information in the struct isn't used yet. It will
be used in subsequent patches.
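For orientation, the struct is roughly of this shape (illustrative only;
the field names below are assumptions, see the patch for the real layout):
  struct pgtable_debug_args {
          struct mm_struct        *mm;
          struct vm_area_struct   *vma;

          pgd_t                   *pgdp;
          p4d_t                   *p4dp;
          pud_t                   *pudp;
          pmd_t                   *pmdp;
          pte_t                   *ptep;

          unsigned long           vaddr;
          pgprot_t                page_prot;
          /* pages/pfns backing the entries, allocated from buddy when possible */
          /* ... */
  };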
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Anshuman Khandual <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Chunyu Hu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Both migration and device private pages use special swap entries that are
manipulated by a range of inline functions. The arguments to these are
somewhat inconsistent, so rework them to remove the flag type arguments and
to make the arguments similar for both read and write entry creation.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alistair Popple <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove the redundant pfn_{pmd/pte}() in {pmd/pte}_advanced_tests() and
adjust pfn_pud() in pud_advanced_tests() to make it similar to the other
two functions.
In addition, the branch condition should be CONFIG_TRANSPARENT_HUGEPAGE
instead of CONFIG_ARCH_HAS_PTE_DEVMAP.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shixin Liu <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The functions {pmd/pud}_set_huge and {pmd/pud}_clear_huge are not
dependent on THP. Hence move {pmd/pud}_huge_tests out of
CONFIG_TRANSPARENT_HUGEPAGE.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shixin Liu <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
On certain platforms, THP support cannot be validated just via the build
option CONFIG_TRANSPARENT_HUGEPAGE. Instead has_transparent_hugepage()
also needs to be called to verify THP runtime support. Otherwise the debug
test will just run into unusable THP helpers, as in the case of a 4K hash
config on the powerpc platform [1].
This just moves all pfn_pmd() and pfn_pud() calls after the THP runtime
validation with has_transparent_hugepage(), which prevents the mentioned
problem.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=213069
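The resulting pattern in the THP tests is roughly (sketch):
  if (!has_transparent_hugepage())
          return;
  /* only touch pfn_pmd()/pfn_pud() after the runtime check */
  pmd = pfn_pmd(pfn, prot);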
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 787d563b8642 ("mm/debug_vm_pgtable: fix kernel crash by checking for THP support")
Signed-off-by: Anshuman Khandual <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In pmd/pud_advanced_tests(), the vaddr is aligned up to the next pmd/pud
entry, and so it does not match the given pmdp/pudp and (aligned down)
pfn any more.
For s390, this results in memory corruption, because the IDTE
instruction used e.g. in xxx_get_and_clear() will take the vaddr for
some calculations, in combination with the given pmdp. It will then end
up with a wrong table origin, ending on ...ff8, and some of those
wrongly set low-order bits will also select a wrong pagetable level for
the index addition. IDTE could therefore invalidate (or 0x20) something
outside of the page tables, depending on the wrongly picked index, which
in turn depends on the random vaddr.
As a result, we sometimes see "BUG task_struct (Not tainted): Padding
overwritten" on s390, where one 0x5a padding value got overwritten with
0x7a.
Fix this by aligning down, similar to how the pmd/pud_aligned pfns are
calculated.
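The fix is essentially (sketch):
  /* in pmd_advanced_tests(); the PUD variant uses HPAGE_PUD_MASK */
  vaddr &= HPAGE_PMD_MASK;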
Link: https://lkml.kernel.org/r/[email protected]
Fixes: a5c3b9ffb0f40 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Gerald Schaefer <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: <[email protected]> [5.9+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.
This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.
This also adds a prot argument to the arch query. This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).
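The per-call query takes roughly this form (illustrative only; the helper
name and its prot argument follow the scheme described above):
  static bool try_pmd_mapping(unsigned long size, pgprot_t prot)
  {
          /* ask the architecture per call instead of via a boot-time init */
          return arch_vmap_pmd_supported(prot) && size >= PMD_SIZE;
  }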
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Ding Tianhong <[email protected]>
Acked-by: Catalin Marinas <[email protected]> [arm64]
Cc: Will Deacon <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Russell King <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently the basic tests just validate various page table transformations
after starting with vm_get_page_prot(VM_READ|VM_WRITE|VM_EXEC) protection.
Instead, scan over the entire protection_map[] for better coverage. This
also makes sure that all these basic page table transformation checks hold
true irrespective of the starting protection value for the page table
entry. There is also a slight change in the debug print format for the
basic tests to capture the protection value being tested with. The
modified output looks something like:
[pte_basic_tests ]: Validating PTE basic ()
[pte_basic_tests ]: Validating PTE basic (read)
[pte_basic_tests ]: Validating PTE basic (write)
[pte_basic_tests ]: Validating PTE basic (read|write)
[pte_basic_tests ]: Validating PTE basic (exec)
[pte_basic_tests ]: Validating PTE basic (read|exec)
[pte_basic_tests ]: Validating PTE basic (write|exec)
[pte_basic_tests ]: Validating PTE basic (read|write|exec)
[pte_basic_tests ]: Validating PTE basic (shared)
[pte_basic_tests ]: Validating PTE basic (read|shared)
[pte_basic_tests ]: Validating PTE basic (write|shared)
[pte_basic_tests ]: Validating PTE basic (read|write|shared)
[pte_basic_tests ]: Validating PTE basic (exec|shared)
[pte_basic_tests ]: Validating PTE basic (read|exec|shared)
[pte_basic_tests ]: Validating PTE basic (write|exec|shared)
[pte_basic_tests ]: Validating PTE basic (read|write|exec|shared)
This also adds a missing 'struct mm_struct *' argument to
pud_basic_tests(). This was never exposed before, as PUD-based THP is only
available on the x86 platform, where the mm_pmd_folded(mm) call gets
macro-replaced without requiring the mm_struct, i.e.
__is_defined(__PAGETABLE_PMD_FOLDED).
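The scan over protection_map[] is roughly (a sketch; the argument lists are
illustrative, idx indexes the map and drives the debug print):
  for (idx = 0; idx < ARRAY_SIZE(protection_map); idx++) {
          pte_basic_tests(pfn, idx);
          pmd_basic_tests(pfn, idx);
          pud_basic_tests(mm, pfn, idx);
  }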
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Tested-by: Gerald Schaefer <[email protected]> [s390]
Reviewed-by: Steven Price <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/debug_vm_pgtable: Some minor updates", v3.
This series contains some cleanups and new test suggestions from Catalin
from an earlier discussion.
https://lore.kernel.org/linux-mm/20201123142237.GF17833@gaia/
This patch (of 2):
This adds validation tests for dirtiness after write protect conversion
for each page table level. There are two new separate test types involved
here.
The first test ensures that a given page table entry does not become dirty
after pxx_wrprotect(). This is important for platforms like arm64, which
transfer the hardware dirty bit (!PTE_RDONLY) to the software dirty bit
while making the entry write-protected, and then drop the hardware bit.
This test ensures that no fresh page table entry can be created with the
hardware dirty bit set. The second test ensures that a given page table
entry always preserves its dirty information across pxx_wrprotect().
This adds two previously missing PUD level basic tests and while here
fixes pxx_wrprotect() related typos in the documentation file.
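The two checks are along these lines (sketch):
  pte = pfn_pte(pfn, prot);
  WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte))));    /* must not become dirty */
  WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));   /* must stay dirty */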
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Christophe Leroy <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Steven Price <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With highmem, pte_alloc_map() keeps the level-4 page table mapped using
kmap_atomic(). Avoid doing new memory allocations while the page table is
mapped like this.
[ 9.409233] BUG: sleeping function called from invalid context at mm/page_alloc.c:4822
[ 9.410557] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper
[ 9.411932] no locks held by swapper/1.
[ 9.412595] CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-rc3-00323-gc50eb1ed654b5 #2
[ 9.413824] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 9.415207] Call Trace:
[ 9.415651] ? ___might_sleep.cold+0xa7/0xcc
[ 9.416367] ? __alloc_pages_nodemask+0x14c/0x5b0
[ 9.417055] ? swap_migration_tests+0x50/0x293
[ 9.417704] ? debug_vm_pgtable+0x4bc/0x708
[ 9.418287] ? swap_migration_tests+0x293/0x293
[ 9.418911] ? do_one_initcall+0x82/0x3cb
[ 9.419465] ? parse_args+0x1bd/0x280
[ 9.419983] ? rcu_read_lock_sched_held+0x36/0x60
[ 9.420673] ? trace_initcall_level+0x1f/0xf3
[ 9.421279] ? trace_initcall_level+0xbd/0xf3
[ 9.421881] ? do_basic_setup+0x9d/0xdd
[ 9.422410] ? do_basic_setup+0xc3/0xdd
[ 9.422938] ? kernel_init_freeable+0x72/0xa3
[ 9.423539] ? rest_init+0x134/0x134
[ 9.424055] ? kernel_init+0x5/0x12c
[ 9.424574] ? ret_from_fork+0x19/0x30
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
pte_clear_tests() operates on an existing pte entry. Make sure that it is
not a none pte entry.
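In other words, populate the entry first and only then exercise the clear
(sketch):
  set_pte_at(mm, vaddr, ptep, pfn_pte(pfn, prot));
  WARN_ON(pte_none(ptep_get(ptep)));      /* must be !none before the clear test */
  pte_clear(mm, vaddr, ptep);
  WARN_ON(!pte_none(ptep_get(ptep)));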
[[email protected]: avoid kernel crash with riscv]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The test seems to be missing quite a lot of details w.r.t. allocating the
correct pgtable_t page (huge_pte_alloc()), holding the right lock
(huge_pte_lock()), etc. The vma used is also not a hugetlb VMA.
ppc64 does have runtime checks within CONFIG_DEBUG_VM for most of these.
Hence disable the test on ppc64.
[[email protected]: drop hugetlb_advanced_tests()]
Link: https://lore.kernel.org/lkml/[email protected]/#t
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|