blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2021-07-11	mm/rmap: try_to_migrate() skip zone_device !device_private	Hugh Dickins	1	-3/+3
	I know nothing about zone_device pages and !device_private pages; but if try_to_migrate_one() will do nothing for them, then it's better that try_to_migrate() filter them first, than trawl through all their vmas. Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Andrew Morton <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Yang Shi <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-11	mm/rmap: fix new bug: premature return from page_mlock_one()	Hugh Dickins	1	-6/+5
	In the unlikely race case that page_mlock_one() finds VM_LOCKED has been cleared by the time it got page table lock, page_vma_mapped_walk_done() must be called before returning, either explicitly, or by a final call to page_vma_mapped_walk() - otherwise the page table remains locked. Fixes: cd62734ca60d ("mm/rmap: split try_to_munlock from try_to_unmap") Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Andrew Morton <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Yang Shi <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-11	mm/rmap: fix old bug: munlocking THP missed other mlocks	Hugh Dickins	1	-5/+8
	The kernel recovers in due course from missing Mlocked pages: but there was no point in calling page_mlock() (formerly known as try_to_munlock()) on a THP, because nothing got done even when it was found to be mapped in another VM_LOCKED vma. It's true that we need to be careful: Mlocked accounting of pte-mapped THPs is too difficult (so consistently avoided); but Mlocked accounting of only-pmd-mapped THPs is supposed to work, even when multiple mappings are mlocked and munlocked or munmapped. Refine the tests. There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so page_mlock_one() does not even have to worry about that complication. (I said the kernel recovers: but would page reclaim be likely to split THP before rediscovering that it's VM_LOCKED? I've not followed that up) Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages") Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Andrew Morton <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-11	mm/rmap: fix comments left over from recent changes	Hugh Dickins	2	-7/+2
	Parallel developments in mm/rmap.c have left behind some out-of-date comments: try_to_migrate_one() also accepts TTU_SYNC (already commented in try_to_migrate() itself), and try_to_migrate() returns nothing at all. TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it in mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so delete the "recently referenced" comment from try_to_unmap_one() (once upon a time the comment was near the removed codeblock, but they drifted apart). Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Andrew Morton <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Yang Shi <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-10	mm/page_alloc: Revert pahole zero-sized workaround	Mel Gorman	1	-11/+0
	Commit dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock") folded in a workaround patch for pahole that was unable to deal with zero-sized percpu structures. A superior workaround is achieved with commit a0b8200d06ad ("kbuild: skip per-CPU BTF generation for pahole v1.18-v1.21"). This patch reverts the dummy field and the pahole version check. Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock") Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-10	Merge branch 'for-5.14-fixes' of ↵	Linus Torvalds	3	-8/+35
	git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu fix from Dennis Zhou: "This is just a single change to fix percpu depopulation. The code relied on depopulation code written specifically for the free path and relied on vmalloc to do the tlb flush lazily. As we're modifying the backing pages during the lifetime of a chunk, we need to also flush the tlb accordingly. Guenter Roeck reported this issue in [1] on mips. I believe we just happen to be lucky given the much larger chunk sizes on x86 and consequently less churning of this memory" Link: https://lore.kernel.org/lkml/[email protected]/ [1] * 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: percpu: flush tlb in pcpu_reclaim_populated()
2021-07-08	mm/mremap: allow arch runtime override	Aneesh Kumar K.V	1	-1/+14
	Patch series "Speedup mremap on ppc64", v8. This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires the platform to support updating higher-level page tables without updating page table entries. This also needs to invalidate the Page Walk Cache on architecture supporting the same. This patch (of 3): Architectures like ppc64 support faster mremap only with radix translation. Hence allow a runtime check w.r.t support for fast mremap. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08	mm/mremap: hold the rmap lock in write mode when moving page table entries.	Aneesh Kumar K.V	1	-2/+2
	To avoid a race between rmap walk and mremap, mremap does take_rmap_locks(). The lock was taken to ensure that rmap walk don't miss a page table entry due to PTE moves via move_pagetables(). The kernel does further optimization of this lock such that if we are going to find the newly added vma after the old vma, the rmap lock is not taken. This is because rmap walk would find the vmas in the same order and if we don't find the page table attached to older vma we would find it with the new vma which we would iterate later. As explained in commit eb66ae030829 ("mremap: properly flush TLB before releasing the page") mremap is special in that it doesn't take ownership of the page. The optimized version for PUD/PMD aligned mremap also doesn't hold the ptl lock. This can result in stale TLB entries as show below. This patch updates the rmap locking requirement in mremap to handle the race condition explained below with optimized mremap:: Optmized PMD move CPU 1 CPU 2 CPU 3 mremap(old_addr, new_addr) page_shrinker/try_to_unmap_one mmap_write_lock_killable() addr = old_addr lock(pte_ptl) lock(pmd_ptl) pmd = old_pmd pmd_clear(old_pmd) flush_tlb_range(old_addr) new_pmd = pmd *new_addr = 10; and fills TLB with new addr and old pfn unlock(pmd_ptl) ptep_clear_flush() old pfn is free. Stale TLB entry Optimized PUD move also suffers from a similar race. Both the above race condition can be fixed if we force mremap path to take rmap lock. Link: https://lkml.kernel.org/r/[email protected] Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions") Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions") Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com Signed-off-by: Aneesh Kumar K.V <[email protected]> Acked-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08	mm/mremap: use pmd/pud_poplulate to update page table entries	Aneesh Kumar K.V	1	-4/+3
	pmd/pud_populate is the right interface to be used to set the respective page table entries. Some architectures like ppc64 do assume that set_pmd/pud_at can only be used to set a hugepage PTE. Since we are not setting up a hugepage PTE here, use the pmd/pud_populate interface. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08	mm/mremap: don't enable optimized PUD move if page table levels is 2	Aneesh Kumar K.V	1	-1/+1
	With two level page table don't enable move_normal_pud. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08	mm/mremap: convert huge PUD move to separate helper	Aneesh Kumar K.V	1	-7/+73
	With TRANSPARENT_HUGEPAGE_PUD enabled the kernel can find huge PUD entries. Add a helper to move huge PUD entries on mremap(). This will be used by a later patch to optimize mremap of PUD_SIZE aligned level 4 PTE mapped address This also make sure we support mremap on huge PUD entries even with CONFIG_HAVE_MOVE_PUD disabled. [[email protected]: fix build failure with clang-10] Link: https://lore.kernel.org/lkml/YMuOSnJsL9qkxweY@archlinux-ax161 Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08	mm: add setup_initial_init_mm() helper	Kefeng Wang	1	-0/+9
	Patch series "init_mm: cleanup ARCH's text/data/brk setup code", v3. Add setup_initial_init_mm() helper, then use it to cleanup the text, data and brk setup code. This patch (of 15): Add setup_initial_init_mm() helper to setup kernel text, data and brk. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nick Hu <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King (Oracle) <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08	PM: hibernate: disable when there are active secretmem users	Mike Rapoport	1	-0/+15
	It is unsafe to allow saving of secretmem areas to the hibernation snapshot as they would be visible after the resume and this essentially will defeat the purpose of secret memory mappings. Prevent hibernation whenever there are active secret memory users. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: James Bottomley <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Elena Reshetova <[email protected]> Cc: Hagen Paul Pfeifer <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tycho Andersen <[email protected]> Cc: Will Deacon <[email protected]> Cc: kernel test robot <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08	mm: introduce memfd_secret system call to create "secret" memory areas	Mike Rapoport	5	-1/+258
	Introduce "memfd_secret" system call with the ability to create memory areas visible only in the context of the owning process and not mapped not only to other processes but in the kernel page tables as well. The secretmem feature is off by default and the user must explicitly enable it at the boot time. Once secretmem is enabled, the user will be able to create a file descriptor using the memfd_secret() system call. The memory areas created by mmap() calls from this file descriptor will be unmapped from the kernel direct map and they will be only mapped in the page table of the processes that have access to the file descriptor. Secretmem is designed to provide the following protections: * Enhanced protection (in conjunction with all the other in-kernel attack prevention systems) against ROP attacks. Seceretmem makes "simple" ROP insufficient to perform exfiltration, which increases the required complexity of the attack. Along with other protections like the kernel stack size limit and address space layout randomization which make finding gadgets is really hard, absence of any in-kernel primitive for accessing secret memory means the one gadget ROP attack can't work. Since the only way to access secret memory is to reconstruct the missing mapping entry, the attacker has to recover the physical page and insert a PTE pointing to it in the kernel and then retrieve the contents. That takes at least three gadgets which is a level of difficulty beyond most standard attacks. * Prevent cross-process secret userspace memory exposures. Once the secret memory is allocated, the user can't accidentally pass it into the kernel to be transmitted somewhere. The secreremem pages cannot be accessed via the direct map and they are disallowed in GUP. * Harden against exploited kernel flaws. In order to access secretmem, a kernel-side attack would need to either walk the page tables and create new ones, or spawn a new privileged uiserspace process to perform secrets exfiltration using ptrace. The file descriptor based memory has several advantages over the "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). File descriptor approach allows explicit and controlled sharing of the memory areas, it allows to seal the operations. Besides, file descriptor based memory paves the way for VMMs to remove the secret memory range from the userspace hipervisor process, for instance QEMU. Andy Lutomirski says: "Getting fd-backed memory into a guest will take some possibly major work in the kernel, but getting vma-backed memory into a guest without mapping it in the host user address space seems much, much worse." memfd_secret() is made a dedicated system call rather than an extension to memfd_create() because it's purpose is to allow the user to create more secure memory mappings rather than to simply allow file based access to the memory. Nowadays a new system call cost is negligible while it is way simpler for userspace to deal with a clear-cut system calls than with a multiplexer or an overloaded syscall. Moreover, the initial implementation of memfd_secret() is completely distinct from memfd_create() so there is no much sense in overloading memfd_create() to begin with. If there will be a need for code sharing between these implementation it can be easily achieved without a need to adjust user visible APIs. The secret memory remains accessible in the process context using uaccess primitives, but it is not exposed to the kernel otherwise; secret memory areas are removed from the direct map and functions in the follow_page()/get_user_page() family will refuse to return a page that belongs to the secret memory area. Once there will be a use case that will require exposing secretmem to the kernel it will be an opt-in request in the system call flags so that user would have to decide what data can be exposed to the kernel. Removing of the pages from the direct map may cause its fragmentation on architectures that use large pages to map the physical memory which affects the system performance. However, the original Kconfig text for CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736 ("x86: add gbpages switches")) and the recent report [1] showed that "... although 1G mappings are a good default choice, there is no compelling evidence that it must be the only choice". Hence, it is sufficient to have secretmem disabled by default with the ability of a system administrator to enable it at boot time. Pages in the secretmem regions are unevictable and unmovable to avoid accidental exposure of the sensitive data via swap or during page migration. Since the secretmem mappings are locked in memory they cannot exceed RLIMIT_MEMLOCK. Since these mappings are already locked independently from mlock(), an attempt to mlock()/munlock() secretmem range would fail and mlockall()/munlockall() will ignore secretmem mappings. However, unlike mlock()ed memory, secretmem currently behaves more like long-term GUP: secretmem mappings are unmovable mappings directly consumed by user space. With default limits, there is no excessive use of secretmem and it poses no real problem in combination with ZONE_MOVABLE/CMA, but in the future this should be addressed to allow balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA. A page that was a part of the secret memory area is cleared when it is freed to ensure the data is not exposed to the next user of that page. The following example demonstrates creation of a secret mapping (error handling is omitted): fd = memfd_secret(0); ftruncate(fd, MAP_SIZE); ptr = mmap(NULL, MAP_SIZE, PROT_READ \| PROT_WRITE, MAP_SHARED, fd, 0); [1] https://lore.kernel.org/linux-mm/[email protected]/ [[email protected]: suppress Kconfig whine] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Hagen Paul Pfeifer <[email protected]> Acked-by: James Bottomley <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Elena Reshetova <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tycho Andersen <[email protected]> Cc: Will Deacon <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: kernel test robot <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08	mmap: make mlock_future_check() global	Mike Rapoport	2	-3/+5
	Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20. This is an implementation of "secret" mappings backed by a file descriptor. The file descriptor backing secret memory mappings is created using a dedicated memfd_secret system call The desired protection mode for the memory is configured using flags parameter of the system call. The mmap() of the file descriptor created with memfd_secret() will create a "secret" memory mapping. The pages in that mapping will be marked as not present in the direct map and will be present only in the page table of the owning mm. Although normally Linux userspace mappings are protected from other users, such secret mappings are useful for environments where a hostile tenant is trying to trick the kernel into giving them access to other tenants mappings. It's designed to provide the following protections: * Enhanced protection (in conjunction with all the other in-kernel attack prevention systems) against ROP attacks. Seceretmem makes "simple" ROP insufficient to perform exfiltration, which increases the required complexity of the attack. Along with other protections like the kernel stack size limit and address space layout randomization which make finding gadgets is really hard, absence of any in-kernel primitive for accessing secret memory means the one gadget ROP attack can't work. Since the only way to access secret memory is to reconstruct the missing mapping entry, the attacker has to recover the physical page and insert a PTE pointing to it in the kernel and then retrieve the contents. That takes at least three gadgets which is a level of difficulty beyond most standard attacks. * Prevent cross-process secret userspace memory exposures. Once the secret memory is allocated, the user can't accidentally pass it into the kernel to be transmitted somewhere. The secreremem pages cannot be accessed via the direct map and they are disallowed in GUP. * Harden against exploited kernel flaws. In order to access secretmem, a kernel-side attack would need to either walk the page tables and create new ones, or spawn a new privileged uiserspace process to perform secrets exfiltration using ptrace. In the future the secret mappings may be used as a mean to protect guest memory in a virtual machine host. For demonstration of secret memory usage we've created a userspace library https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git that does two things: the first is act as a preloader for openssl to redirect all the OPENSSL_malloc calls to secret memory meaning any secret keys get automatically protected this way and the other thing it does is expose the API to the user who needs it. We anticipate that a lot of the use cases would be like the openssl one: many toolkits that deal with secret keys already have special handling for the memory to try to give them greater protection, so this would simply be pluggable into the toolkits without any need for user application modification. Hiding secret memory mappings behind an anonymous file allows usage of the page cache for tracking pages allocated for the "secret" mappings as well as using address_space_operations for e.g. page migration callbacks. The anonymous file may be also used implicitly, like hugetlb files, to implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm ABIs in the future. Removing of the pages from the direct map may cause its fragmentation on architectures that use large pages to map the physical memory which affects the system performance. However, the original Kconfig text for CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736 ("x86: add gbpages switches")) and the recent report [1] showed that "... although 1G mappings are a good default choice, there is no compelling evidence that it must be the only choice". Hence, it is sufficient to have secretmem disabled by default with the ability of a system administrator to enable it at boot time. In addition, there is also a long term goal to improve management of the direct map. [1] https://lore.kernel.org/linux-mm/[email protected]/ This patch (of 7): It will be used by the upcoming secret memory implementation. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: James Bottomley <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Elena Reshetova <[email protected]> Cc: Hagen Paul Pfeifer <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Bottomley <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tycho Andersen <[email protected]> Cc: Will Deacon <[email protected]> Cc: kernel test robot <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08	mm/slub: use stackdepot to save stack trace in objects	Oliver Glitta	1	-30/+49
	Many stack traces are similar so there are many similar arrays. Stackdepot saves each unique stack only once. Replace field addrs in struct track with depot_stack_handle_t handle. Use stackdepot to save stack trace. The benefits are smaller memory overhead and possibility to aggregate per-cache statistics in the future using the stackdepot handle instead of matching stacks manually. [[email protected]: rename save_stack_trace()] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix lockdep splat] Link: https://lkml.kernel.org/r/[email protected]: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oliver Glitta <[email protected]> Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-04	Merge branch 'core-rcu-2021.07.04' of ↵	Linus Torvalds	5	-3/+22
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Bitmap parsing support for "all" as an alias for all bits - Documentation updates - Miscellaneous fixes, including some that overlap into mm and lockdep - kvfree_rcu() updates - mem_dump_obj() updates, with acks from one of the slab-allocator maintainers - RCU NOCB CPU updates, including limited deoffloading - SRCU updates - Tasks-RCU updates - Torture-test updates * 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits) tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states rcu: Add missing __releases() annotation rcu: Remove obsolete rcu_read_unlock() deadlock commentary rcu: Improve comments describing RCU read-side critical sections rcu: Create an unrcu_pointer() to remove __rcu from a pointer srcu: Early test SRCU polling start rcu: Fix various typos in comments rcu/nocb: Unify timers rcu/nocb: Prepare for fine-grained deferred wakeup rcu/nocb: Only cancel nocb timer if not polling rcu/nocb: Delete bypass_timer upon nocb_gp wakeup rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup rcu/nocb: Allow de-offloading rdp leader rcu/nocb: Directly call __wake_nocb_gp() from bypass timer rcu: Don't penalize priority boosting when there is nothing to boost rcu: Point to documentation of ordering guarantees rcu: Make rcu_gp_cleanup() be noinline for tracing rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP ...
2021-07-04	Merge tag 'memblock-v5.14-rc1' of ↵	Linus Torvalds	1	-12/+14
	git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: "Fix arm crashes caused by holes in the memory map. The coordination between freeing of unused memory map, pfn_valid() and core mm assumptions about validity of the memory map in various ranges was not designed for complex layouts of the physical memory with a lot of holes all over the place. Kefen Wang reported crashes in move_freepages() on a system with the following memory layout [1]: node 0: [mem 0x0000000080a00000-0x00000000855fffff] node 0: [mem 0x0000000086a00000-0x0000000087dfffff] node 0: [mem 0x000000008bd00000-0x000000008c4fffff] node 0: [mem 0x000000008e300000-0x000000008ecfffff] node 0: [mem 0x0000000090d00000-0x00000000bfffffff] node 0: [mem 0x00000000cc000000-0x00000000dc9fffff] node 0: [mem 0x00000000de700000-0x00000000de9fffff] node 0: [mem 0x00000000e0800000-0x00000000e0bfffff] node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff] node 0: [mem 0x00000000fda00000-0x00000000ffffefff] These crashes can be mitigated by enabling CONFIG_HOLES_IN_ZONE on ARM and essentially turning pfn_valid_within() to pfn_valid() instead of having it hardwired to 1 on that architecture, but this would require to keep CONFIG_HOLES_IN_ZONE solely for this purpose. A cleaner approach is to update ARM's implementation of pfn_valid() to take into accounting rounding of the freed memory map to pageblock boundaries and make sure it returns true for PFNs that have memory map entries even if there is no physical memory backing those PFNs" Link: https://lore.kernel.org/lkml/[email protected] [1] * tag 'memblock-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: arm: extend pfn_valid to take into account freed memory map alignment memblock: ensure there is no overflow in memblock_overlaps_region() memblock: align freed memory map on pageblock boundaries with SPARSEMEM memblock: free_unused_memmap: use pageblock units instead of MAX_ORDER
2021-07-04	percpu: flush tlb in pcpu_reclaim_populated()	Dennis Zhou	3	-8/+35
	Prior to "percpu: implement partial chunk depopulation", pcpu_depopulate_chunk() was called only on the destruction path. This meant the virtual address range was on its way back to vmalloc which will handle flushing the tlbs for us. However, with pcpu_reclaim_populated(), we are now calling pcpu_depopulate_chunk() during the active lifecycle of a chunk. Therefore, we need to flush the tlb as well otherwise we can end up accessing the wrong page through an invalid tlb mapping as reported in [1]. [1] https://lore.kernel.org/lkml/[email protected]/ Fixes: f183324133ea ("percpu: implement partial chunk depopulation") Reported-and-tested-by: Guenter Roeck <[email protected]> Signed-off-by: Dennis Zhou <[email protected]>
2021-07-03	Merge branch 'work.iov_iter' of ↵	Linus Torvalds	1	-21/+15
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter cleanups and fixes. There are followups, but this is what had sat in -next this cycle. IMO the macro forest in there became much thinner and easier to follow..." * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller clean up copy_mc_pipe_to_iter() pipe_zero(): we don't need no stinkin' kmap_atomic()... iov_iter: clean csum_and_copy_...() primitives up a bit copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases iterate_xarray(): only of the first iteration we might get offset != 0 pull handling of ->iov_offset into iterate_{iovec,bvec,xarray} iov_iter: make iterator callbacks use base and len instead of iovec iov_iter: make the amount already copied available to iterator callbacks iov_iter: get rid of separate bvec and xarray callbacks iov_iter: teach iterate_{bvec,xarray}() about possible short copies iterate_bvec(): expand bvec.h macro forest, massage a bit iov_iter: unify iterate_iovec and iterate_kvec iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec iterate_and_advance(): get rid of magic in case when n is 0 csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter() iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant [xarray] iov_iter_npages(): just use DIV_ROUND_UP() iov_iter_npages(): don't bother with iterate_all_kinds() ...
2021-07-02	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	45	-1411/+2919
	Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <[email protected]>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...
2021-07-01	Merge branch 'for-5.14' of ↵	Linus Torvalds	6	-186/+338
	git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu updates from Dennis Zhou: - percpu chunk depopulation - depopulate backing pages for chunks with empty pages when we exceed a global threshold without those pages. This lets us reclaim a portion of memory that would previously be lost until the full chunk would be freed (possibly never). - memcg accounting cleanup - previously separate chunks were managed for normal allocations and __GFP_ACCOUNT allocations. These are now consolidated which cleans up the code quite a bit. - a few misc clean ups for clang warnings * 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: percpu: optimize locking in pcpu_balance_workfn() percpu: initialize best_upa variable percpu: rework memcg accounting mm, memcg: introduce mem_cgroup_kmem_disabled() mm, memcg: mark cgroup_memory_nosocket, nokmem and noswap as __ro_after_init percpu: make symbol 'pcpu_free_slot' static percpu: implement partial chunk depopulation percpu: use pcpu_free_slot instead of pcpu_nr_slots - 1 percpu: factor out pcpu_check_block_hint() percpu: split __pcpu_balance_workfn() percpu: fix a comment about the chunks ordering
2021-07-01	mm: device exclusive memory access	Alistair Popple	5	-7/+328
	Some devices require exclusive write access to shared virtual memory (SVM) ranges to perform atomic operations on that memory. This requires CPU page tables to be updated to deny access whilst atomic operations are occurring. In order to do this introduce a new swap entry type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for exclusive access by a device all page table mappings for the particular range are replaced with device exclusive swap entries. This causes any CPU access to the page to result in a fault. Faults are resovled by replacing the faulting entry with the original mapping. This results in MMU notifiers being called which a driver uses to update access permissions such as revoking atomic access. After notifiers have been called the device will no longer have exclusive access to the region. Walking of the page tables to find the target pages is handled by get_user_pages() rather than a direct page table walk. A direct page table walk similar to what migrate_vma_collect()/unmap() does could also have been utilised. However this resulted in more code similar in functionality to what get_user_pages() provides as page faulting is required to make the PTEs present and to break COW. [[email protected]: fix signedness bug in make_device_exclusive_range()] Link: https://lkml.kernel.org/r/YNIz5NVnZ5GiZ3u1@mwanda Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/memory.c: allow different return codes for copy_nonpresent_pte()	Alistair Popple	1	-11/+17
	Currently if copy_nonpresent_pte() returns a non-zero value it is assumed to be a swap entry which requires further processing outside the loop in copy_pte_range() after dropping locks. This prevents other values being returned to signal conditions such as failure which a subsequent change requires. Instead make copy_nonpresent_pte() return an error code if further processing is required and read the value for the swap entry in the main loop under the ptl. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm: rename migrate_pgmap_owner	Alistair Popple	1	-5/+5
	MMU notifier ranges have a migrate_pgmap_owner field which is used by drivers to store a pointer. This is subsequently used by the driver callback to filter MMU_NOTIFY_MIGRATE events. Other notifier event types can also benefit from this filtering, so rename the 'migrate_pgmap_owner' field to 'owner' and create a new notifier initialisation function to initialise this field. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Suggested-by: Peter Xu <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/rmap: split migration into its own function	Alistair Popple	3	-104/+288
	Migration is currently implemented as a mode of operation for try_to_unmap_one() generally specified by passing the TTU_MIGRATION flag or in the case of splitting a huge anonymous page TTU_SPLIT_FREEZE. However it does not have much in common with the rest of the unmap functionality of try_to_unmap_one() and thus splitting it into a separate function reduces the complexity of try_to_unmap_one() making it more readable. Several simplifications can also be made in try_to_migrate_one() based on the following observations: - All users of TTU_MIGRATION also set TTU_IGNORE_MLOCK. - No users of TTU_MIGRATION ever set TTU_IGNORE_HWPOISON. - No users of TTU_MIGRATION ever set TTU_BATCH_FLUSH. TTU_SPLIT_FREEZE is a special case of migration used when splitting an anonymous page. This is most easily dealt with by calling the correct function from unmap_page() in mm/huge_memory.c - either try_to_migrate() for PageAnon or try_to_unmap(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Ralph Campbell <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Peter Xu <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/rmap: split try_to_munlock from try_to_unmap	Alistair Popple	2	-23/+55
	The behaviour of try_to_unmap_one() is difficult to follow because it performs different operations based on a fairly large set of flags used in different combinations. TTU_MUNLOCK is one such flag. However it is exclusively used by try_to_munlock() which specifies no other flags. Therefore rather than overload try_to_unmap_one() with unrelated behaviour split this out into it's own function and remove the flag. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Ralph Campbell <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Peter Xu <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/swapops: rework swap entry manipulation code	Alistair Popple	8	-36/+70
	Both migration and device private pages use special swap entries that are manipluated by a range of inline functions. The arguments to these are somewhat inconsistent so rework them to remove flag type arguments and to make the arguments similar for both read and write entry creation. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Ralph Campbell <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Peter Xu <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm: remove special swap entry functions	Alistair Popple	6	-18/+17
	Patch series "Add support for SVM atomics in Nouveau", v11. Introduction ============ Some devices have features such as atomic PTE bits that can be used to implement atomic access to system memory. To support atomic operations to a shared virtual memory page such a device needs access to that page which is exclusive of the CPU. This series introduces a mechanism to temporarily unmap pages granting exclusive access to a device. These changes are required to support OpenCL atomic operations in Nouveau to shared virtual memory (SVM) regions allocated with the CL_MEM_SVM_ATOMICS clSVMAlloc flag. A more complete description of the OpenCL SVM feature is available at https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/ OpenCL_API.html#_shared_virtual_memory . Implementation ============== Exclusive device access is implemented by adding a new swap entry type (SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main difference is that on fault the original entry is immediately restored by the fault handler instead of waiting. Restoring the entry triggers calls to MMU notifers which allows a device driver to revoke the atomic access permission from the GPU prior to the CPU finalising the entry. Patches ======= Patches 1 & 2 refactor existing migration and device private entry functions. Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated functionality into separate functions - try_to_migrate_one() and try_to_munlock_one(). Patch 5 renames some existing code but does not introduce functionality. Patch 6 is a small clean-up to swap entry handling in copy_pte_range(). Patch 7 contains the bulk of the implementation for device exclusive memory. Patch 8 contains some additions to the HMM selftests to ensure everything works as expected. Patch 9 is a cleanup for the Nouveau SVM implementation. Patch 10 contains the implementation of atomic access for the Nouveau driver. Testing ======= This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program which checks that GPU atomic accesses to system memory are atomic. Without this series the test fails as there is no way of write-protecting the page mapping which results in the device clobbering CPU writes. For reference the test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/ Further testing has been performed by adding support for testing exclusive access to the hmm-tests kselftests. This patch (of 10): Remove multiple similar inline functions for dealing with different types of special swap entries. Both migration and device private swap entries use the swap offset to store a pfn. Instead of multiple inline functions to obtain a struct page for each swap entry type use a common function pfn_swap_entry_to_page(). Also open-code the various entry_to_pfn() functions as this results is shorter code that is easier to understand. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Ralph Campbell <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Peter Xu <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	kfence: unconditionally use unbound work queue	Marco Elver	1	-2/+2
	Unconditionally use unbound work queue, and not just if wq_power_efficient is true. Because if the system is idle, KFENCE may wait, and by being run on the unbound work queue, we permit the scheduler to make better scheduling decisions and not require pinning KFENCE to the same CPU upon waking up. Link: https://lkml.kernel.org/r/[email protected] Fixes: 36f0b35d0894 ("kfence: use power-efficient work queue to run delayed work") Signed-off-by: Marco Elver <[email protected]> Reported-by: Hillf Danton <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/page_alloc: move prototype for find_suitable_fallback	Mel Gorman	1	-2/+1
	make W=1 generates the following warning in mmap_lock.c for allnoconfig mm/page_alloc.c:2670:5: warning: no previous prototype for `find_suitable_fallback' [-Wmissing-prototypes] int find_suitable_fallback(struct free_area *area, unsigned int order, ^~~~~~~~~~~~~~~~~~~~~~ find_suitable_fallback is only shared outside of page_alloc.c for CONFIG_COMPACTION but to suppress the warning, move the protype outside of CONFIG_COMPACTION. It is not worth the effort at this time to find a clever way of allowing compaction.c to share the code or avoid the use entirely as the function is called on relatively slow paths. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations	Mel Gorman	1	-27/+32
	make W=1 generates the following warning in mmap_lock.c for allnoconfig mm/mmap_lock.c:213:6: warning: no previous prototype for `__mmap_lock_do_trace_start_locking' [-Wmissing-prototypes] void __mmap_lock_do_trace_start_locking(struct mm_struct mm, bool write) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/mmap_lock.c:219:6: warning: no previous prototype for `__mmap_lock_do_trace_acquire_returned' [-Wmissing-prototypes] void __mmap_lock_do_trace_acquire_returned(struct mm_struct mm, bool write, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/mmap_lock.c:226:6: warning: no previous prototype for `__mmap_lock_do_trace_released' [-Wmissing-prototypes] void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write) On !CONFIG_TRACING configurations, the code is dead so put it behind an #ifdef. [[email protected]: fix warning when CONFIG_TRACING is not defined] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Bixuan Cui <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/z3fold: add kerneldoc fields for z3fold_pool	Mel Gorman	1	-0/+2
	make W=1 generates the following warning for z3fold_pool mm/z3fold.c:171: warning: Function parameter or member 'zpool' not described in 'z3fold_pool' mm/z3fold.c:171: warning: Function parameter or member 'zpool_ops' not described in 'z3fold_pool' Commit 9a001fc19ccc ("z3fold: the 3-fold allocator for compressed pages") simply did not document the fields at the time. Add rudimentary documentation. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/zbud: add kerneldoc fields for zbud_pool	Mel Gorman	1	-0/+2
	make W=1 generates the following warning for zbud_pool mm/zbud.c:105: warning: Function parameter or member 'zpool' not described in 'zbud_pool' mm/zbud.c:105: warning: Function parameter or member 'zpool_ops' not described in 'zbud_pool' Commit 479305fd7172 ("zpool: remove zpool_evict()") removed the zpool_evict helper and added the associated zpool and operations structure in struct zbud_pool but did not add documentation for the fields. Add rudimentary documentation. Link: https://lkml.kernel.org/r/[email protected] Fixes: 479305fd7172 ("zpool: remove zpool_evict()") Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/memory_hotplug: fix kerneldoc comment for __remove_memory	Mel Gorman	1	-1/+1
	make W=1 generates the following warning for __remove_memory mm/memory_hotplug.c:2044: warning: expecting prototype for remove_memory(). Prototype was for __remove_memory() instead Commit eca499ab3749 ("mm/hotplug: make remove_memory() interface usable") introduced the kerneldoc comment and function but the kerneldoc name and function name did not match. Link: https://lkml.kernel.org/r/[email protected] Fixes: eca499ab3749 ("mm/hotplug: make remove_memory() interface usable") Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/memory_hotplug: fix kerneldoc comment for __try_online_node	Mel Gorman	1	-2/+2
	make W=1 generates the following warning for try_online_node mm/memory_hotplug.c:1087: warning: expecting prototype for try_online_node(). Prototype was for __try_online_node() instead Commit b9ff036082cd ("mm/memory_hotplug.c: make add_memory_resource use __try_online_node") renamed the function but did not update the associated kerneldoc. The function is static and somewhat specialised in nature so it's not clear it warrants being a kerneldoc by moving the comment to try_online_node. Hence, leave the comment of the internal helper in place but leave it out of kerneldoc and correct the function name in the comment. Link: https://lkml.kernel.org/r/[email protected] Fixes: Commit b9ff036082cd ("mm/memory_hotplug.c: make add_memory_resource use __try_online_node") Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/memcontrol.c: fix kerneldoc comment for mem_cgroup_calculate_protection	Mel Gorman	1	-1/+1
	make W=1 generates the following warning for mem_cgroup_calculate_protection mm/memcontrol.c:6468: warning: expecting prototype for mem_cgroup_protected(). Prototype was for mem_cgroup_calculate_protection() instead Commit 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks") changed the function definition but not the associated kerneldoc comment. Link: https://lkml.kernel.org/r/[email protected] Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks") Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Chris Down <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/mapping_dirty_helpers: remove double Note in kerneldoc	Mel Gorman	1	-1/+1
	make W=1 generates the following warning for mm/mapping_dirty_helpers.c mm/mapping_dirty_helpers.c:325: warning: duplicate section name 'Note' The helper function is very specific to one driver -- vmwgfx. While the two notes are separate, all of it needs to be taken into account when using the helper so make it one note. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/page_alloc: make should_fail_alloc_page() static	Mel Gorman	1	-1/+1
	make W=1 generates the following warning for mm/page_alloc.c mm/page_alloc.c:3651:15: warning: no previous prototype for `should_fail_alloc_page' [-Wmissing-prototypes] noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order) ^~~~~~~~~~~~~~~~~~~~~~ This function is deliberately split out for BPF to allow errors to be injected. The function is not used anywhere else so it is local to the file. Make it static which should still allow error injection to be used similar to how block/blk-core.c:should_fail_bio() works. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/vmalloc: include header for prototype of set_iounmap_nonlazy	Mel Gorman	1	-0/+3
	make W=1 generates the following warning for mm/vmalloc.c mm/vmalloc.c:1599:6: warning: no previous prototype for `set_iounmap_nonlazy' [-Wmissing-prototypes] void set_iounmap_nonlazy(void) ^~~~~~~~~~~~~~~~~~~ This is an arch-generic function only used by x86. On other arches, it's dead code. Include the header with the definition and make it x86-64 specific. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/vmscan: remove kerneldoc-like comment from isolate_lru_pages	Mel Gorman	1	-1/+1
	Patch series "Clean W=1 build warnings for mm/". This is a janitorial only. During development of a tool to catch build warnings early to avoid tripping the Intel lkp-robot, I noticed that mm/ is not clean for W=1. This is generally harmless but there is no harm in cleaning it up. It disrupts git blame a little but on relatively obvious lines that are unlikely to be git blame targets. This patch (of 13): make W=1 generates the following warning for vmscan.c mm/vmscan.c:1814: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst It is not a kerneldoc comment and isolate_lru_pages() is a static function. While the detailed comment is nice, it does not need to be exposed via kernel-doc. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Yang Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dan Streetman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm: fix spelling mistakes	Zhen Lei	4	-5/+5
	Fix some spelling mistakes in comments: each having differents usage ==> each has a different usage statments ==> statements adresses ==> addresses aggresive ==> aggressive datas ==> data posion ==> poison higer ==> higher precisly ==> precisely wont ==> won't We moves tha ==> We move the endianess ==> endianness Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Reviewed-by: Souptick Joarder <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm: fix typos and grammar error in comments	Hyeonggon Yoo	1	-1/+1
	We moves tha -> We move that in mm/swap.c statments -> statements in include/linux/mm.h Link: https://lkml.kernel.org/r/20210509063444.GA24745@hyeyoo Signed-off-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/zsmalloc.c: improve readability for async_free_zspage()	Miaohe Lin	1	-1/+1
	The class is extracted from pool->size_class[class_idx] again before calling __free_zspage(). It looks like class will change after we fetch the class lock. But this is misleading as class will stay unchanged. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01	mm/zsmalloc.c: remove confusing code in obj_free()	Miaohe Lin	1	-1/+0
	Patch series "Cleanup for zsmalloc". This series contains cleanups to remove confusing code in obj_free(), combine two atomic ops and improve readability for async_free_zspage(). More details can be found in the respective changelogs. This patch (of 2): OBJ_ALLOCATED_TAG is only set for handle to indicate allocated object. It's irrelevant with obj. So remove this misleading code to improve readability. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30	mm/zswap.c: fix two bugs in zswap_writeback_entry()	Miaohe Lin	1	-10/+7
	In the ZSWAP_SWAPCACHE_FAIL and ZSWAP_SWAPCACHE_EXIST case, we forgot to call zpool_unmap_handle() when zpool can't sleep. And we might sleep in zswap_get_swap_cache_page() while zpool can't sleep. To fix all of these, zpool_unmap_handle() should be done before zswap_get_swap_cache_page() when zpool can't sleep. Link: https://lkml.kernel.org/r/[email protected] Fixes: fc6697a89f56 ("mm/zswap: add the flag can_sleep_mapped") Signed-off-by: Miaohe Lin <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Tian Tao <[email protected]> Cc: Vitaly Wool <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30	mm/zswap.c: avoid unnecessary copy-in at map time	Miaohe Lin	1	-1/+1
	The buf mapped via zpool_map_handle() is only used to store compressed page buffer and there is no information to extract from it. So we could use ZPOOL_MM_WO instead to avoid unnecessary copy-in at map time. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Tian Tao <[email protected]> Cc: Vitaly Wool <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30	mm/zswap.c: remove unused function zswap_debugfs_exit()	Miaohe Lin	1	-7/+0
	Patch series "Cleanup and fixup for zswap". This series contains cleanups to remove unused function and avoid unnecessary copy-in at map time. Also this fixes two bugs in the function zswap_writeback_entry(). More details can be found in the respective changelogs. This patch (of 3): zswap_debugfs_exit() is unused, remove it. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Tian Tao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30	mm,memory_hotplug: drop unneeded locking	Oscar Salvador	1	-15/+1
	Currently, memory-hotplug code takes zone's span_writelock and pgdat's resize_lock when resizing the node/zone's spanned pages via {move_pfn_range_to_zone(),remove_pfn_range_from_zone()} and when resizing node and zone's present pages via adjust_present_page_count(). These locks are also taken during the initialization of the system at boot time, where it protects parallel struct page initialization, but they should not really be needed in memory-hotplug where all operations are a) synchronized on device level and b) serialized by the mem_hotplug_lock lock. [[email protected]: remove now-unused locals] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Pavel Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30	mm/memory_hotplug: rate limit page migration warnings	Liam Mark	1	-5/+11
	When offlining memory the system can attempt to migrate a lot of pages, if there are problems with migration this can flood the logs. Printing all the data hogs the CPU and cause some RT threads to run for a long time, which may have some bad consequences. Rate limit the page migration warnings in order to avoid this. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam Mark <[email protected]> Signed-off-by: Georgi Djakov <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>