path: root/include/linux/pgtable.h
Age  Commit message  (Author, files changed, lines +added/-removed)
2022-07-17  mm/mmap: define DECLARE_VM_GET_PAGE_PROT  (Anshuman Khandual, 1 file, +28/-0)
This just converts the generic vm_get_page_prot() implementation into a new macro, i.e. DECLARE_VM_GET_PAGE_PROT, which can later be used across platforms when enabling them with ARCH_HAS_VM_GET_PAGE_PROT. This does not create any functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Suggested-by: Christoph Hellwig <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
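A sketch of the shape the macro takes (reconstructed from the description above, not a verbatim quote of the patch): it expands to the old generic definition, so a platform selecting ARCH_HAS_VM_GET_PAGE_PROT can instantiate it in one line.

    #define DECLARE_VM_GET_PAGE_PROT                                   \
    pgprot_t vm_get_page_prot(unsigned long vm_flags)                  \
    {                                                                  \
            /* Look up protection bits for the VM_* access flags. */   \
            return protection_map[vm_flags &                           \
                    (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];       \
    }                                                                  \
    EXPORT_SYMBOL(vm_get_page_prot);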
2022-05-13  mm: functions may simplify the use of return values  (Li kunyu, 1 file, +3/-9)
p4d_clear_huge() can be given a void return type, since its return value is never used; this in turn saves a few steps in vunmap_p4d_range(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Li kunyu <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
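A minimal sketch of the conversion, assuming the pre-change stub returned a constant:

    /* Before: the return value was always 0 and every caller ignored it. */
    static inline int p4d_clear_huge(p4d_t *p4d)
    {
            return 0;
    }

    /* After: a void return lets vunmap_p4d_range() drop the dead check. */
    static inline void p4d_clear_huge(p4d_t *p4d)
    {
    }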
2022-05-13  mm: remove __HAVE_ARCH_PTEP_CLEAR in pgtable.h  (Tong Tiangen, 1 file, +0/-2)
Currently no architecture defines __HAVE_ARCH_PTEP_CLEAR; the generic ptep_clear() is the only definition for all architectures, so drop the "#ifndef __HAVE_ARCH_PTEP_CLEAR" guard. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tong Tiangen <[email protected]> Suggested-by: Anshuman Khandual <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-05-13  mm: page_table_check: add hooks to public helpers  (Tong Tiangen, 1 file, +15/-8)
Move ptep_clear() to include/linux/pgtable.h and add page table check related hooks to some helpers, in preparation for supporting the page table check feature on new architectures. Also optimize the implementation of ptep_clear(): the hooked helpers now call page table check stubs, and the interface control belongs in those stubs, so there is no rationale for doing an IS_ENABLED() check here. Architectures that do not enable CONFIG_PAGE_TABLE_CHECK will call fallback page table check stubs[1] when using their page table helpers[2] in include/linux/pgtable.h. [1] page table check stubs defined in include/linux/page_table_check.h [2] ptep_clear() ptep_get_and_clear() pmdp_huge_get_and_clear() pudp_huge_get_and_clear() Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tong Tiangen <[email protected]> Acked-by: Pasha Tatashin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
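A sketch of the resulting helper shape (hedged; the exact hook placement is in the patch itself): the helper calls the check hook unconditionally, and the stub header turns it into a no-op when CONFIG_PAGE_TABLE_CHECK is off.

    static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
                                  pte_t *ptep)
    {
            pte_t pte = ptep_get(ptep);

            pte_clear(mm, addr, ptep);
            /* No-op stub from include/linux/page_table_check.h unless
             * CONFIG_PAGE_TABLE_CHECK is enabled. */
            page_table_check_pte_clear(mm, addr, pte);
    }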
2022-05-13  mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio()  (Matthew Wilcox (Oracle), 1 file, +1/-1)
shmem_swapin_page() only brings in order-0 pages, which are folios by definition. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-05-13  mm: avoid unnecessary flush on change_huge_pmd()  (Nadav Amit, 1 file, +20/-0)
Calls to change_protection_range() on THP can trigger, at least on x86, two TLB flushes for one page: one immediately, when pmdp_invalidate() is called by change_huge_pmd(), and then another one later (that can be batched) when change_protection_range() finishes. The first TLB flush is only necessary to prevent the dirty bit (and with a lesser importance the access bit) from changing while the PTE is modified. However, this is not necessary as the x86 CPUs set the dirty-bit atomically with an additional check that the PTE is (still) present. One caveat is Intel's Knights Landing that has a bug and does not do so. Leverage this behavior to eliminate the unnecessary TLB flush in change_huge_pmd(). Introduce a new arch specific pmdp_invalidate_ad() that only invalidates the access and dirty bit from further changes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nadav Amit <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Nick Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
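The generic fallback keeps the old behaviour, so only architectures with the atomic A/D guarantee need to override it; a sketch of the fallback semantics (written here as an inline for brevity):

    #ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
    /* Fallback: behave exactly like pmdp_invalidate(), i.e. flush.
     * x86 overrides this with a variant that can skip the TLB flush. */
    static inline pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
                                           unsigned long address, pmd_t *pmdp)
    {
            return pmdp_invalidate(vma, address, pmdp);
    }
    #endif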
2022-05-09  mm/swap: remember PG_anon_exclusive via a swp pte bit  (David Hildenbrand, 1 file, +29/-0)
Patch series "mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages", v2. This series fixes memory corruptions when a GUP R/W reference (FOLL_WRITE | FOLL_GET) was taken on an anonymous page and COW logic fails to detect exclusivity of the page to then replacing the anonymous page by a copy in the page table: The GUP reference lost synchronicity with the pages mapped into the page tables. This series focuses on x86, arm64, s390x and ppc64/book3s -- other architectures are fairly easy to support by implementing __HAVE_ARCH_PTE_SWP_EXCLUSIVE. This primarily fixes the O_DIRECT memory corruptions that can happen on concurrent swapout, whereby we lose DMA reads to a page (modifying the user page by writing to it). O_DIRECT currently uses FOLL_GET for short-term (!FOLL_LONGTERM) DMA from/to a user page. In the long run, we want to convert it to properly use FOLL_PIN, and John is working on it, but that might take a while and might not be easy to backport. In the meantime, let's restore what used to work before we started modifying our COW logic: make R/W FOLL_GET references reliable as long as there is no fork() after GUP involved. This is just the natural follow-up of part 2, that will also further reduce "wrong COW" on the swapin path, for example, when we cannot remove a page from the swapcache due to concurrent writeback, or if we have two threads faulting on the same swapped-out page. Fixing O_DIRECT is just a nice side-product This issue, including other related COW issues, has been summarized in [3] under 2): " 2. Intra Process Memory Corruptions due to Wrong COW (FOLL_GET) It was discovered that we can create a memory corruption by reading a file via O_DIRECT to a part (e.g., first 512 bytes) of a page, concurrently writing to an unrelated part (e.g., last byte) of the same page, and concurrently write-protecting the page via clear_refs SOFTDIRTY tracking [6]. For the reproducer, the issue is that O_DIRECT grabs a reference of the target page (via FOLL_GET) and clear_refs write-protects the relevant page table entry. On successive write access to the page from the process itself, we wrongly COW the page when resolving the write fault, resulting in a loss of synchronicity and consequently a memory corruption. While some people might think that using clear_refs in this combination is a corner cases, it turns out to be a more generic problem unfortunately. For example, it was just recently discovered that we can similarly create a memory corruption without clear_refs, simply by concurrently swapping out the buffer pages [7]. Note that we nowadays even use the swap infrastructure in Linux without an actual swap disk/partition: the prime example is zram which is enabled as default under Fedora [10]. The root issue is that a write-fault on a page that has additional references results in a COW and thereby a loss of synchronicity and consequently a memory corruption if two parties believe they are referencing the same page. " We don't particularly care about R/O FOLL_GET references: they were never reliable and O_DIRECT doesn't expect to observe modifications from a page after DMA was started. Note that: * this only fixes the issue on x86, arm64, s390x and ppc64/book3s ("enterprise architectures"). Other architectures have to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE to achieve the same. * this does *not * consider any kind of fork() after taking the reference: fork() after GUP never worked reliably with FOLL_GET. 
* Not losing PG_anon_exclusive during swapout was the last remaining piece. KSM already makes sure that there are no other references on a page before considering it for sharing. Page migration maintains PG_anon_exclusive and simply fails when there are additional references (freezing the refcount fails). Only swapout code dropped the PG_anon_exclusive flag because it requires more work to remember + restore it. With this series in place, most COW issues of [3] are fixed on said architectures. Other architectures can implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE fairly easily. [1] https://lkml.kernel.org/r/[email protected] [2] https://lkml.kernel.org/r/[email protected] [3] https://lore.kernel.org/r/[email protected] This patch (of 8): Currently, we clear PG_anon_exclusive in try_to_unmap() and forget about it. We do this to keep fork() logic on swap entries easy and efficient: for example, if we wouldn't clear it when unmapping, we'd have to look up the page in the swapcache for each and every swap entry during fork() and clear PG_anon_exclusive if set. Instead, we want to store that information directly in the swap pte, protected by the page table lock, similarly to how we handle SWP_MIGRATION_READ_EXCLUSIVE for migration entries. However, for actual swap entries, we don't want to mess with the swap type (e.g., still one bit) because it overcomplicates swap code. In try_to_unmap(), we already refuse to unmap in case the page might be pinned, because we must not lose PG_anon_exclusive on pinned pages ever. Checking if there are other unexpected references reliably *before* completely unmapping a page is unfortunately not really possible: THPs heavily overcomplicate the situation. Once fully unmapped it's easier -- we, for example, make sure that there are no unexpected references *after* unmapping a page before starting writeback on that page. So, we currently might end up unmapping a page and clearing PG_anon_exclusive if that page has additional references, for example, due to a FOLL_GET. do_swap_page() has to re-determine if a page is exclusive, which will easily fail if there are other references on a page, most prominently GUP references via FOLL_GET. This can currently result in memory corruptions when taking a FOLL_GET | FOLL_WRITE reference on a page even when fork() is never involved: try_to_unmap() will succeed, and when refaulting the page, it cannot be marked exclusive and will get replaced by a copy in the page tables on the next write access, resulting in writes via the GUP reference to the page being lost. In an ideal world, everybody that uses GUP and wants to modify page content, such as O_DIRECT, would properly use FOLL_PIN. However, that conversion will take a while. It's easier to fix what used to work in the past (FOLL_GET | FOLL_WRITE) remembering PG_anon_exclusive. In addition, by remembering PG_anon_exclusive we can further reduce unnecessary COW in some cases, so it's the natural thing to do. So let's transfer the PG_anon_exclusive information to the swap pte and store it via an architecture-dependent pte bit; use that information when restoring the swap pte in do_swap_page() and unuse_pte(). During fork(), we simply have to clear the pte bit and are done. Of course, there is one corner case to handle: swap backends that don't support concurrent page modifications while the page is under writeback. Special-case these, and drop the exclusive marker. Add a comment why that is just fine (also, reuse_swap_page() would have done the same in the past).
In the future, we'll hopefully have all architectures support __HAVE_ARCH_PTE_SWP_EXCLUSIVE, such that we can get rid of the empty stubs and the define completely. Then, we can also convert SWP_MIGRATION_READ_EXCLUSIVE. For architectures it's fairly easy to support: either simply use a yet-unused pte bit that can be used for swap entries, steal one from the arch type bits if they exceed 5, or steal one from the offset bits. Note: R/O FOLL_GET references were never really reliable, especially when taking one on a shared page and then writing to the page (e.g., GUP after fork()). FOLL_GET, including R/W references, was never really reliable once fork() was involved (e.g., GUP before fork(), GUP during fork()). KSM steps back in case it stumbles over unexpected references and is, therefore, fine. [[email protected]: fix SWP_STABLE_WRITES test] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Jann Horn <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Peter Xu <[email protected]> Cc: Don Dutile <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Jan Kara <[email protected]> Cc: Liang Zhang <[email protected]> Cc: Pedro Demarchi Gomes <[email protected]> Cc: Oded Gabbay <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Gerald Schaefer <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
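The empty stubs mentioned above look roughly like this; architectures that define __HAVE_ARCH_PTE_SWP_EXCLUSIVE supply real bit manipulation instead.

    #ifndef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
    /* Without an arch-provided swap pte bit, the exclusive marker is
     * simply dropped: still correct, but without the fix/optimization. */
    static inline pte_t pte_swp_mkexclusive(pte_t pte)
    {
            return pte;
    }

    static inline int pte_swp_exclusive(pte_t pte)
    {
            return false;
    }

    static inline pte_t pte_swp_clear_exclusive(pte_t pte)
    {
            return pte;
    }
    #endif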
2022-02-04  mm/pgtable: define pte_index so that preprocessor could recognize it  (Mike Rapoport, 1 file, +1/-0)
Since commit 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") pte_index is a static inline and there is no define for it that can be recognized by the preprocessor. As a result, vm_insert_pages() falls back to a slower loop over vm_insert_page() instead of insert_pages(), which amortizes the cost of spinlock operations when inserting multiple pages. Link: https://lkml.kernel.org/r/[email protected] Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") Signed-off-by: Mike Rapoport <[email protected]> Reported-by: Christian Dietrich <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
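The fix is the usual idiom for making a static inline visible to preprocessor checks:

    static inline unsigned long pte_index(unsigned long address)
    {
            return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
    }
    /* Let "#ifdef pte_index" succeed so vm_insert_pages() can use the
     * batched insert_pages() path instead of looping vm_insert_page(). */
    #define pte_index pte_index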
2022-01-15  mm: ptep_clear() page table helper  (Pasha Tatashin, 1 file, +8/-0)
We have ptep_get_and_clear() and ptep_get_and_clear_full() helpers to clear a PTE from user page tables, but there is no variant for a simple clear of a present PTE from user page tables without using a low-level pte_clear(), which can be either native or para-virtualised. Add a new ptep_clear() that can be used in common code to clear PTEs from page tables. We will need this call later in order to add a hook for page table check. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pasha Tatashin <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Greg Thelen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kees Cook <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Muchun Song <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wei Xu <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
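A sketch of the new helper, overridable by defining __HAVE_ARCH_PTEP_CLEAR:

    #ifndef __HAVE_ARCH_PTEP_CLEAR
    static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
                                  pte_t *ptep)
    {
            pte_clear(mm, addr, ptep);
    }
    #endif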
2021-07-21  Revert "mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge"  (Jonathan Marek, 1 file, +1/-25)
This reverts commit c742199a014de23ee92055c2473d91fe5561ffdf. c742199a014d ("mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge") breaks arm64 in at least two ways for configurations where PUD or PMD folding occur: 1. We no longer install huge-vmap mappings and silently fall back to page-granular entries, despite being able to install block entries at what is effectively the PGD level. 2. If the linear map is backed with block mappings, these will now silently fail to be created in alloc_init_pud(), causing a panic early during boot. The pgtable selftests caught this, although a fix has not been forthcoming and Christophe is AWOL at the moment, so just revert the change for now to get a working -rc3 on which we can queue patches for 5.15. A simple revert breaks the build for 32-bit PowerPC 8xx machines, which rely on the default function definitions when the corresponding page-table levels are folded, since commit a6a8f7c4aa7e ("powerpc/8xx: add support for huge pages on VMAP and VMALLOC"), eg: powerpc64-linux-ld: mm/vmalloc.o: in function `vunmap_pud_range': linux/mm/vmalloc.c:362: undefined reference to `pud_clear_huge' To avoid that, add stubs for pud_clear_huge() and pmd_clear_huge() in arch/powerpc/mm/nohash/8xx.c as suggested by Christophe. Cc: Christophe Leroy <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Fixes: c742199a014d ("mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge") Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Acked-by: Marc Zyngier <[email protected]> [mpe: Fold in 8xx.c changes from Christophe and mention in change log] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/linux-arm-kernel/CAMuHMdXShORDox-xxaeUfDW3wx2PeggFSqhVSHVZNKCGK-y_vQ@mail.gmail.com/ Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-07-08  mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *  (Aneesh Kumar K.V, 1 file, +1/-1)
No functional change in this patch. [[email protected]: m68k build error reported by kernel robot] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08  mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *  (Aneesh Kumar K.V, 1 file, +1/-1)
No functional change in this patch. [[email protected]: fix] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: another fix] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
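After the two renames, the generic mid-level walkers read naturally; a sketch of the resulting definitions:

    static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
    {
            /* A p4d entry points at a table of pud_t entries. */
            return p4d_pgtable(*p4d) + pud_index(address);
    }

    static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
    {
            /* A pud entry points at a table of pmd_t entries. */
            return pud_pgtable(*pud) + pmd_index(address);
    }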
2021-07-01  mm/thp: define default pmd_pgtable()  (Anshuman Khandual, 1 file, +9/-0)
Currently most platforms define pmd_pgtable() as pmd_page(), duplicating the same code all over. Instead just define a default value, i.e. pmd_page(), for pmd_pgtable() and let platforms override it when required via <asm/pgtable.h>. All the existing platforms that override pmd_pgtable() have been moved into their respective <asm/pgtable.h> headers in order to precede the new generic definition. This makes it much cleaner with reduced code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: Nick Hu <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Guo Ren <[email protected]> Cc: Brian Cain <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Michal Simek <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Stafford Horne <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Chris Zankel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
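The generic default amounts to:

    #ifndef pmd_pgtable
    #define pmd_pgtable(pmd) pmd_page(pmd)
    #endif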
2021-07-01  mm: define default value for FIRST_USER_ADDRESS  (Anshuman Khandual, 1 file, +9/-0)
Currently most platforms define FIRST_USER_ADDRESS as 0UL, duplicating the same code all over. Instead just define a generic default value (i.e. 0UL) for FIRST_USER_ADDRESS and let the platforms override it when required. This makes it much cleaner with reduced code. The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h> when the given platform overrides its value via <asm/pgtable.h>. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> [m68k] Acked-by: Guo Ren <[email protected]> [csky] Acked-by: Stafford Horne <[email protected]> [openrisc] Acked-by: Catalin Marinas <[email protected]> [arm64] Acked-by: Mike Rapoport <[email protected]> Acked-by: Palmer Dabbelt <[email protected]> [RISC-V] Cc: Richard Henderson <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Guo Ren <[email protected]> Cc: Brian Cain <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Michal Simek <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Stafford Horne <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Chris Zankel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
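The generic default amounts to:

    #ifndef FIRST_USER_ADDRESS
    #define FIRST_USER_ADDRESS 0UL
    #endif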
2021-06-30  mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge  (Christophe Leroy, 1 file, +25/-1)
For architectures with no PMD and/or no PUD, add stubs similar to what we have for architectures without P4D. [[email protected]: arm64: define only {pud/pmd}_{set/clear}_huge when useful] Link: https://lkml.kernel.org/r/73ec95f40cafbbb69bdfb43a7f53876fd845b0ce.1620990479.git.christophe.leroy@csgroup.eu [[email protected]: x86: define only {pud/pmd}_{set/clear}_huge when useful] Link: https://lkml.kernel.org/r/7fbf1b6bc3e15c07c24fa45278d57064f14c896b.1620930415.git.christophe.leroy@csgroup.eu Link: https://lkml.kernel.org/r/5ac5976419350e8e048d463a64cae449eb3ba4b0.1620795204.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Uladzislau Rezki <[email protected]> Cc: Naresh Kamboju <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
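A sketch of the PMD case, assuming the __PAGETABLE_PMD_FOLDED guard (the PUD case is analogous): real prototypes when the level exists, 0-returning stubs when it is folded. The revert further up this log explains why these stubs turned out to be too strong a default on arm64.

    #ifndef __PAGETABLE_PMD_FOLDED
    int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
    int pmd_clear_huge(pmd_t *pmd);
    #else
    /* Level is folded away: there is nothing to set or clear. */
    static inline int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
    {
            return 0;
    }
    static inline int pmd_clear_huge(pmd_t *pmd)
    {
            return 0;
    }
    #endif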
2021-06-29  mm: define default MAX_PTRS_PER_* in include/pgtable.h  (Daniel Axtens, 1 file, +22/-0)
Commit c65e774fb3f6 ("x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable") made PTRS_PER_P4D variable on x86 and introduced MAX_PTRS_PER_P4D as a constant for cases which need a compile-time constant (e.g. fixed-size arrays). powerpc likewise has boot-time selectable MMU features which can cause other mm "constants" to vary. For KASAN, we have some static PTE/PMD/PUD/P4D arrays so we need compile-time maximums for all these constants. Extend the MAX_PTRS_PER_ idiom, and place default definitions in include/pgtable.h. These define MAX_PTRS_PER_x to be PTRS_PER_x unless an architecture has defined MAX_PTRS_PER_x in its arch headers. Clean up pgtable-nop4d.h and s390's MAX_PTRS_PER_P4D definitions while we're at it: both can just pick up the default now. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Daniel Axtens <[email protected]> Acked-by: Andrey Konovalov <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
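The idiom, per level:

    #ifndef MAX_PTRS_PER_PTE
    #define MAX_PTRS_PER_PTE PTRS_PER_PTE
    #endif

    #ifndef MAX_PTRS_PER_PMD
    #define MAX_PTRS_PER_PMD PTRS_PER_PMD
    #endif

    #ifndef MAX_PTRS_PER_PUD
    #define MAX_PTRS_PER_PUD PTRS_PER_PUD
    #endif

    #ifndef MAX_PTRS_PER_P4D
    #define MAX_PTRS_PER_P4D PTRS_PER_P4D
    #endif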
2021-06-05  Revert "MIPS: make userspace mapping young by default"  (Thomas Bogendoerfer, 1 file, +8/-0)
This reverts commit f685a533a7fab35c5d069dcd663f59c8e4171a75. The MIPS cache flush logic needs to know whether the mapping was already established to decide how to flush caches. This is done by checking the valid bit in the PTE. The commit above breaks this logic by setting the valid bit in the PTE for new mappings, which causes kernel crashes. Link: https://lkml.kernel.org/r/[email protected] Fixes: f685a533a7f ("MIPS: make userspace mapping young by default") Reported-by: Zhou Yanjie <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]> Cc: Huang Pei <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
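The revert restores the generic pte_sw_mkyoung() no-op that the original commit had removed, roughly:

    #ifndef pte_sw_mkyoung
    /* Architectures that set the accessed bit in hardware need no help;
     * software-managed-TLB architectures like MIPS override this. */
    static inline pte_t pte_sw_mkyoung(pte_t pte)
    {
            return pte;
    }
    #define pte_sw_mkyoung pte_sw_mkyoung
    #endif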
2021-05-07  include/linux/pgtable.h: few spelling fixes  (Bhaskar Chowdhury, 1 file, +5/-5)
A few spelling fixes throughout the file. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bhaskar Chowdhury <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05  mm/gup: do not migrate zero page  (Pavel Tatashin, 1 file, +12/-0)
On some platforms ZERO_PAGE(0) might end up in a movable zone. Do not migrate the zero page in gup during longterm pinning, as migration of the zero page is not allowed. For example, in x86 QEMU with 16G of memory and the kernelcore=5G parameter, I see the following: Boot#1: zero_pfn 0x48a8d zero_pfn zone: ZONE_DMA32 Boot#2: zero_pfn 0x20168d zero_pfn zone: ZONE_MOVABLE On x86, empty_zero_page is declared in .bss and depending on the loader may end up in different physical locations during boots. Also, move the is_zero_pfn() and my_zero_pfn() functions under CONFIG_MMU, because the zero_pfn that they use is declared in memory.c, which is compiled with CONFIG_MMU. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
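A simplified sketch of the helpers being moved (the in-tree versions also handle the colored zero pages some architectures use):

    #ifdef CONFIG_MMU
    extern unsigned long zero_pfn;  /* defined in mm/memory.c (MMU only) */

    static inline int is_zero_pfn(unsigned long pfn)
    {
            return pfn == zero_pfn;
    }

    static inline unsigned long my_zero_pfn(unsigned long addr)
    {
            return zero_pfn;
    }
    #endif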
2021-03-10  arm64: mte: Map hotplugged memory as Normal Tagged  (Catalin Marinas, 1 file, +4/-0)
In a system supporting MTE, the linear map must allow reading/writing allocation tags by setting the memory type as Normal Tagged. Currently, this is only handled for memory present at boot. Hotplugged memory uses Normal non-Tagged memory. Introduce pgprot_mhp() for hotplugged memory and use it in add_memory_resource(). The arm64 code maps pgprot_mhp() to pgprot_tagged(). Note that ZONE_DEVICE memory should not be mapped as Tagged and therefore setting the memory type in arch_add_memory() is not feasible. Signed-off-by: Catalin Marinas <[email protected]> Fixes: 0178dc761368 ("arm64: mte: Use Normal Tagged attributes for the linear map") Reported-by: Patrick Daly <[email protected]> Tested-by: Patrick Daly <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: <[email protected]> # 5.10.x Cc: Will Deacon <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
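The generic default is the identity mapping; arm64 overrides it to pgprot_tagged():

    #ifndef pgprot_mhp
    #define pgprot_mhp(prot)        (prot)
    #endif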
2021-02-26  MIPS: make userspace mapping young by default  (Huang Pei, 1 file, +0/-8)
The MIPS page fault path (except for huge pages) takes three exceptions (1 TLB Miss + 2 TLB Invalid), but the second TLB Invalid exception is just triggered by __update_tlb from do_page_fault writing the TLB without _PAGE_VALID set. With this patch, userspace mapping prot is made young by default (with both _PAGE_VALID and _PAGE_YOUNG set), so it takes only 1 TLB Miss + 1 TLB Invalid exception. Remove pte_sw_mkyoung without polluting MM code and make the page fault delay of MIPS on par with other architectures. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Huang Pei <[email protected]> Reviewed-by: Nicholas Piggin <[email protected]> Acked-by: <[email protected]> Acked-by: Thomas Bogendoerfer <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: <[email protected]> Cc: Bibo Mao <[email protected]> Cc: Jiaxun Yang <[email protected]> Cc: Paul Burton <[email protected]> Cc: Li Xuefeng <[email protected]> Cc: Yang Tiezhu <[email protected]> Cc: Gao Juxin <[email protected]> Cc: Fuxin Zhang <[email protected]> Cc: Huacai Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-01-20  mm: Cleanup faultaround and finish_fault() codepaths  (Kirill A. Shutemov, 1 file, +11/-0)
alloc_set_pte() has two users with different requirements: in the faultaround code it is called from an atomic context, and the PTE page table has to be preallocated. finish_fault() can sleep and allocate the page table as needed. The PTL locking rules are also strange, hard to follow, and overkill for finish_fault(). Let's untangle the mess. alloc_set_pte() is gone now. All locking is explicit. The price is some code duplication to handle huge pages in the faultaround path, but it should be fine, having an overall improvement in readability. Link: https://lore.kernel.org/r/20201229132819.najtavneutnf7ajp@box Signed-off-by: Kirill A. Shutemov <[email protected]> [will: s/from from/from/ in comment; spotted by willy] Signed-off-by: Will Deacon <[email protected]>
2020-12-14  Merge tag 'perf-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, +71/-0)
Pull perf updates from Thomas Gleixner: "Core: - Better handling of page table leaves on architectures which have non-page-table-aligned huge/large pages. For such architectures a leaf can actually be part of a larger entry. - Prevent a deadlock vs exec_update_mutex Architectures: - The related updates for page size calculation of leaf entries - The usual churn to support new CPUs - Small fixes and improvements all over the place" * tag 'perf-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) perf/x86/intel: Add Tremont Topdown support uprobes/x86: Fix fall-through warnings for Clang perf/x86: Fix fall-through warnings for Clang kprobes/x86: Fix fall-through warnings for Clang perf/x86/intel/lbr: Fix the return type of get_lbr_cycles() perf/x86/intel: Fix rtm_abort_event encoding on Ice Lake x86/kprobes: Restore BTF if the single-stepping is cancelled perf: Break deadlock involving exec_update_mutex sparc64/mm: Implement pXX_leaf_size() support powerpc/8xx: Implement pXX_leaf_size() support arm64/mm: Implement pXX_leaf_size() support perf/core: Fix arch_perf_get_page_size() mm: Introduce pXX_leaf_size() mm/gup: Provide gup_get_pte() more generic perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY perf/x86/intel/uncore: Add Rocket Lake support perf/x86/msr: Add Rocket Lake CPU support perf/x86/cstate: Add Rocket Lake CPU support perf/x86/intel: Add Rocket Lake CPU support perf,mm: Handle non-page-table-aligned hugetlbfs ...
2020-12-03  mm: Introduce pXX_leaf_size()  (Peter Zijlstra, 1 file, +16/-0)
A number of architectures have non-pagetable aligned huge/large pages. For such architectures a leaf can actually be part of a larger entry. Provide generic helpers to determine the size of a page-table leaf. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
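The default leaf sizes fall back to the fixed per-level sizes; architectures with non-page-table-aligned large pages override them:

    #ifndef pgd_leaf_size
    #define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT)
    #endif
    #ifndef p4d_leaf_size
    #define p4d_leaf_size(x) P4D_SIZE
    #endif
    #ifndef pud_leaf_size
    #define pud_leaf_size(x) PUD_SIZE
    #endif
    #ifndef pmd_leaf_size
    #define pmd_leaf_size(x) PMD_SIZE
    #endif
    #ifndef pte_leaf_size
    #define pte_leaf_size(x) PAGE_SIZE
    #endif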
2020-12-03  mm/gup: Provide gup_get_pte() more generic  (Peter Zijlstra, 1 file, +55/-0)
In order to write another lockless page-table walker, we need gup_get_pte() exposed. While doing that, rename it to ptep_get_lockless() to match the existing ptep_get() naming. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
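A sketch of the two variants: a plain ptep_get() is already atomic for single-word entries, while split 32-bit PAE-style entries are read half-by-half until consistent.

    #ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
    static inline pte_t ptep_get_lockless(pte_t *ptep)
    {
            pte_t pte;

            /* Re-read until the low and high halves are consistent. */
            do {
                    pte.pte_low = ptep->pte_low;
                    smp_rmb();
                    pte.pte_high = ptep->pte_high;
                    smp_rmb();
            } while (unlikely(pte.pte_low != ptep->pte_low));

            return pte;
    }
    #else
    static inline pte_t ptep_get_lockless(pte_t *ptep)
    {
            return ptep_get(ptep);
    }
    #endif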
2020-11-16  arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed  (Arnd Bergmann, 1 file, +13/-0)
Stefan Agner reported a bug when using zsram on 32-bit Arm machines with RAM above the 4GB address boundary: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = a27bd01c [00000000] *pgd=236a0003, *pmd=1ffa64003 Internal error: Oops: 207 [#1] SMP ARM Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1 Hardware name: BCM2711 PC is at zs_map_object+0x94/0x338 LR is at zram_bvec_rw.constprop.0+0x330/0xa64 pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013 sp : e376bbe0 ip : 00000000 fp : c1e2921c r10: 00000002 r9 : c1dda730 r8 : 00000000 r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000 r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5383d Table: 235c2a80 DAC: fffffffd Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6) Stack: (0xe376bbe0 to 0xe376c000) As it turns out, zsram needs to know the maximum memory size, which is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture. The same problem will be hit on all 32-bit architectures that have a physical address space larger than 4GB and happen to not enable sparsemem and include asm/sparsemem.h from asm/pgtable.h. After the initial discussion, I suggested just always defining MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is set, or provoking a build error otherwise. This addresses all configurations that can currently have this runtime bug, but leaves all other configurations unchanged. I looked up the possible number of bits in source code and datasheets, here is what I found: - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never support more than 32 bits, even though supersections in theory allow up to 40 bits as well. - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5 XPA supports up to 60 bits in theory, but 40 bits are more than anyone will ever ship - On PowerPC, there are three different implementations of 36 bit addressing, but 32-bit is used without CONFIG_PTE_64BIT - On RISC-V, the normal page table format can support 34 bit addressing. There is no highmem support on RISC-V, so anything above 2GB is unused, but it might be useful to eventually support CONFIG_ZRAM for high pages. Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library") Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS") Acked-by: Thomas Bogendoerfer <[email protected]> Reviewed-by: Stefan Agner <[email protected]> Tested-by: Stefan Agner <[email protected]> Acked-by: Mike Rapoport <[email protected]> Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/ Signed-off-by: Arnd Bergmann <[email protected]>
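The resulting fallback, roughly: a 64-bit phys_addr_t on a 32-bit kernel must come with an explicit definition, and everything else defaults to 32 bits.

    #if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT)
    #ifdef CONFIG_PHYS_ADDR_T_64BIT
    /* 32-bit kernel with 64-bit physical addresses: the architecture
     * must say how many bits are actually possible. */
    #error Missing MAX_POSSIBLE_PHYSMEM_BITS definition
    #else
    #define MAX_POSSIBLE_PHYSMEM_BITS 32
    #endif
    #endif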
2020-11-02  mm: always have io_remap_pfn_range() set pgprot_decrypted()  (Jason Gunthorpe, 1 file, +0/-4)
The purpose of io_remap_pfn_range() is to map IO memory, such as a memory mapped IO exposed through a PCI BAR. IO devices do not understand encryption, so this memory must always be decrypted. Automatically call pgprot_decrypted() as part of the generic implementation. This fixes a bug where enabling AMD SME causes subsystems, such as RDMA, using io_remap_pfn_range() to expose BAR pages to user space to fail. The CPU will encrypt access to those BAR pages instead of passing unencrypted IO directly to the device. Places not mapping IO should use remap_pfn_range(). Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption") Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: "Dave Young" <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Larry Woodman <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Toshimitsu Kani <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
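The generic definition (which this change moves out of pgtable.h) now folds in pgprot_decrypted(); sketched:

    #ifndef io_remap_pfn_range
    static inline int io_remap_pfn_range(struct vm_area_struct *vma,
                                         unsigned long addr, unsigned long pfn,
                                         unsigned long size, pgprot_t prot)
    {
            /* IO memory never understands encryption: always map decrypted. */
            return remap_pfn_range(vma, addr, pfn, size,
                                   pgprot_decrypted(prot));
    }
    #endif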
2020-10-12  Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds, 1 file, +28/-0)
Pull arm64 updates from Will Deacon: "There's quite a lot of code here, but much of it is due to the addition of a new PMU driver as well as some arm64-specific selftests which is an area where we've traditionally been lagging a bit. In terms of exciting features, this includes support for the Memory Tagging Extension which narrowly missed 5.9, hopefully allowing userspace to run with use-after-free detection in production on CPUs that support it. Work is ongoing to integrate the feature with KASAN for 5.11. Another change that I'm excited about (assuming they get the hardware right) is preparing the ASID allocator for sharing the CPU page-table with the SMMU. Those changes will also come in via Joerg with the IOMMU pull. We do stray outside of our usual directories in a few places, mostly due to core changes required by MTE. Although much of this has been Acked, there were a couple of places where we unfortunately didn't get any review feedback. Other than that, we ran into a handful of minor conflicts in -next, but nothing that should pose any issues. Summary: - Userspace support for the Memory Tagging Extension introduced by Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11. - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context switching. - Fix and subsequent rewrite of our Spectre mitigations, including the addition of support for PR_SPEC_DISABLE_NOEXEC. - Support for the Armv8.3 Pointer Authentication enhancements. - Support for ASID pinning, which is required when sharing page-tables with the SMMU. - MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op. - Perf/PMU driver updates, including addition of the ARM CMN PMU driver and also support to handle CPU PMU IRQs as NMIs. - Allow prefetchable PCI BARs to be exposed to userspace using normal non-cacheable mappings. - Implementation of ARCH_STACKWALK for unwinding. - Improve reporting of unexpected kernel traps due to BPF JIT failure. - Improve robustness of user-visible HWCAP strings and their corresponding numerical constants. - Removal of TEXT_OFFSET. - Removal of some unused functions, parameters and prototypes. - Removal of MPIDR-based topology detection in favour of firmware description. - Cleanups to handling of SVE and FPSIMD register state in preparation for potential future optimisation of handling across syscalls. - Cleanups to the SDEI driver in preparation for support in KVM.
- Miscellaneous cleanups and refactoring work" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits) Revert "arm64: initialize per-cpu offsets earlier" arm64: random: Remove no longer needed prototypes arm64: initialize per-cpu offsets earlier kselftest/arm64: Check mte tagged user address in kernel kselftest/arm64: Verify KSM page merge for MTE pages kselftest/arm64: Verify all different mmap MTE options kselftest/arm64: Check forked child mte memory accessibility kselftest/arm64: Verify mte tag inclusion via prctl kselftest/arm64: Add utilities and a test to validate mte memory perf: arm-cmn: Fix conversion specifiers for node type perf: arm-cmn: Fix unsigned comparison to less than zero arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option arm64: Pull in task_stack_page() to Spectre-v4 mitigation code KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled arm64: Get rid of arm64_ssbd_state KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state() KVM: arm64: Get rid of kvm_arm_have_ssbd() KVM: arm64: Simplify handling of ARCH_WORKAROUND_2 ...
2020-09-26  mm/gup: fix gup_fast with dynamic page table folding  (Vasily Gorbik, 1 file, +10/-0)
Currently, to make sure that every page table entry is read just once, gup_fast walks perform READ_ONCE and pass the pXd value down to the next gup_pXd_range function by value, e.g.:

    static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                             unsigned int flags, struct page **pages, int *nr)
    ...
            pudp = pud_offset(&p4d, addr);

This function passes a reference on that local value copy to pXd_offset, and might get the very same pointer in return. This happens when the level is folded (on most arches), and that pointer should not be iterated. On s390, due to the fact that each task might have different 5-, 4- or 3-level address translation and hence different levels folded, the logic is more complex, and a non-iterable pointer to a local copy leads to severe problems. Here is an example of what happens with gup_fast on s390, for a task with 3-level paging, crossing a 2 GB pud boundary:

    // addr = 0x1007ffff000, end = 0x10080001000
    static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                             unsigned int flags, struct page **pages, int *nr)
    {
            unsigned long next;
            pud_t *pudp;

            // pud_offset returns &p4d itself (a pointer to a value on stack)
            pudp = pud_offset(&p4d, addr);
            do {
                    // on second iteration reading "random" stack value
                    pud_t pud = READ_ONCE(*pudp);

                    // next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
                    next = pud_addr_end(addr, end);
                    ...
            } while (pudp++, addr = next, addr != end); // pudp++ iterating over stack

            return 1;
    }

This happens since s390 moved to common gup code with commit d1874a0c2805 ("s390/mm: make the pxd_offset functions more robust") and commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code"). s390 tried to mimic static level folding by changing the pXd_offset primitives to always calculate the top level page table offset in pgd_offset and just return the value passed in when pXd_offset has to act as folded. What is crucial for gup_fast, and what has been overlooked, is that PxD_SIZE/MASK and thus pXd_addr_end should also change correspondingly. And the latter is not possible with dynamic folding. To fix the issue, in addition to the pXd values, pass the original pXdp pointers down to the gup_pXd_range functions. And introduce pXd_offset_lockless helpers, which take an additional pXd entry value parameter.
This has already been discussed in https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1 Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code") Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Gerald Schaefer <[email protected]> Reviewed-by: Alexander Gordeev <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Russell King <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Claudio Imbrenda <[email protected]> Cc: <[email protected]> [5.2+] Link: https://lkml.kernel.org/r/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours Signed-off-by: Linus Torvalds <[email protected]>
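The generic lockless helpers default to the existing accessors, taking the caller's local entry copy by reference:

    #ifndef p4d_offset_lockless
    #define p4d_offset_lockless(pgdp, pgd, address) p4d_offset(&(pgd), address)
    #endif
    #ifndef pud_offset_lockless
    #define pud_offset_lockless(p4dp, p4d, address) pud_offset(&(p4d), address)
    #endif
    #ifndef pmd_offset_lockless
    #define pmd_offset_lockless(pudp, pud, address) pmd_offset(&(pud), address)
    #endif

s390 instead overrides these to derive the offset from the original page-table pointer, which remains safely iterable.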
2020-09-04  mm: Add arch hooks for saving/restoring tags  (Steven Price, 1 file, +28/-0)
Arm's Memory Tagging Extension (MTE) adds some metadata (tags) to every physical page, when swapping pages out to disk it is necessary to save these tags, and later restore them when reading the pages back. Add some hooks along with dummy implementations to enable the arch code to handle this. Three new hooks are added to the swap code: * arch_prepare_to_swap() and * arch_swap_invalidate_page() / arch_swap_invalidate_area(). One new hook is added to shmem: * arch_swap_restore() Signed-off-by: Steven Price <[email protected]> [[email protected]: add unlock_page() on the error path] [[email protected]: dropped the _tags suffix] Signed-off-by: Catalin Marinas <[email protected]> Acked-by: Andrew Morton <[email protected]>
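The dummy implementations are straightforward no-ops, roughly:

    #ifndef __HAVE_ARCH_PREPARE_TO_SWAP
    static inline int arch_prepare_to_swap(struct page *page)
    {
            return 0;
    }
    #endif

    #ifndef __HAVE_ARCH_SWAP_INVALIDATE
    static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
    {
    }

    static inline void arch_swap_invalidate_area(int type)
    {
    }
    #endif

    #ifndef __HAVE_ARCH_SWAP_RESTORE
    static inline void arch_swap_restore(swp_entry_t entry, struct page *page)
    {
    }
    #endif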
2020-08-17  arch/ia64: Restore arch-specific pgd_offset_k implementation  (Jessica Clarke, 1 file, +2/-0)
IA-64 is special and treats pgd_offset_k() differently to pgd_offset(), using different formulae to calculate the indices into the kernel and user PGDs. The index into the user PGDs takes into account the region number, but the index into the kernel (init_mm) PGD always assumes a predefined kernel region number. Commit 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") made IA-64 use a generic pgd_offset_k() which incorrectly used pgd_index() for kernel page tables. As a result, the index into the kernel PGD was going out of bounds and the kernel hung during early boot. Allow overrides of pgd_offset_k() and override it on IA-64 with the old implementation that will correctly index the kernel PGD. Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") Reported-by: John Paul Adrian Glaubitz <[email protected]> Signed-off-by: Jessica Clarke <[email protected]> Tested-by: John Paul Adrian Glaubitz <[email protected]> Acked-by: Tony Luck <[email protected]> Signed-off-by: Mike Rapoport <[email protected]>
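The now-overridable generic definition amounts to:

    #ifndef pgd_offset_k
    /* Kernel mappings live in init_mm; IA-64 overrides this to account
     * for its region-number indexing. */
    #define pgd_offset_k(address)   pgd_offset(&init_mm, (address))
    #endif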
2020-08-12  mm: drop duplicated words in <linux/pgtable.h>  (Randy Dunlap, 1 file, +6/-6)
Drop the doubled words "used" and "by". Drop the repeated acronym "TLB" and make several other fixes around it. (capital letters, spellos) Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-07-30  mm: pgtable: Make generic pgprot_* macros available for no-MMU  (Pekka Enberg, 1 file, +37/-34)
The <linux/pgtable.h> header defines some generic pgprot_* implementations, but they are only available when CONFIG_MMU is enabled. The RISC-V architecture, for example, therefore defines some of these pgprot_* macros for the !MMU case. Let's make the generic pgprot_* macros available even without CONFIG_MMU so we can remove the RISC-V specific definitions. Compile-tested with x86 defconfig, and riscv defconfig and !MMU defconfig. Suggested-by: Palmer Dabbelt <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Pekka Enberg <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]>
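The pattern for the generic defaults is the identity mapping, e.g.:

    #ifndef pgprot_noncached
    #define pgprot_noncached(prot)  (prot)
    #endif

    #ifndef pgprot_writecombine
    #define pgprot_writecombine pgprot_noncached
    #endif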
2020-06-20  mm: Allow arches to provide ptep_get()  (Christophe Leroy, 1 file, +7/-0)
Since commit 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses") it is not possible anymore to use READ_ONCE() to access complex page table entries like the one defined for powerpc 8xx with 16k size pages. Define a ptep_get() helper that architectures can override instead of performing a READ_ONCE() on the page table entry pointer. Fixes: 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses") Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Will Deacon <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/087fa12b6e920e32315136b998aa834f99242695.1592225558.git.christophe.leroy@csgroup.eu
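The overridable helper:

    #ifndef __HAVE_ARCH_PTEP_GET
    static inline pte_t ptep_get(pte_t *ptep)
    {
            /* Fine for simple single-word ptes; architectures with complex
             * multi-word entries (e.g. powerpc 8xx with 16k pages) override
             * this instead of relying on READ_ONCE(). */
            return READ_ONCE(*ptep);
    }
    #endif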
2020-06-09  mmap locking API: convert mmap_sem comments  (Michel Lespinasse, 1 file, +3/-3)
Convert comments that reference mmap_sem to reference mmap_lock instead. [[email protected]: fix up linux-next leftovers] [[email protected]: s/lockaphore/lock/, per Vlastimil] [[email protected]: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Reviewed-by: Daniel Jordan <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Hubbard <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ying Han <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-09  mm: consolidate pte_index() and pte_offset_*() definitions  (Mike Rapoport, 1 file, +91/-0)
All architectures define pte_index() as (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1) and all architectures define pte_offset_kernel() as an entry in the array of PTEs indexed by the pte_index(). For most architectures the pte_offset_kernel() implementation relies on the availability of pmd_page_vaddr(), which converts a PMD entry value to the virtual address of the page containing the PTEs array. Let's move the x86 definitions of the PTE accessors to the generic place in <linux/pgtable.h> and then simply drop the respective definitions from the other architectures. The architectures that didn't provide pmd_page_vaddr() are updated to have that defined. The generic implementation of pte_offset_kernel() can be overridden by an architecture, and alpha makes use of this because it has special ordering requirements for its version of pte_offset_kernel(). [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: update] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: update] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix x86 warning] [[email protected]: fix powerpc build] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nick Hu <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
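The generic pte_offset_kernel(), which alpha alone overrides, boils down to:

    #ifndef pte_offset_kernel
    static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
    {
            /* pmd_page_vaddr() turns the PMD entry into the virtual
             * address of the PTE table it points to. */
            return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
    }
    #define pte_offset_kernel pte_offset_kernel
    #endif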
2020-06-09  mm: pgtable: add shortcuts for accessing kernel PMD and PTE  (Mike Rapoport, 1 file, +24/-0)
The powerpc 32-bit implementation of pgtable has nice shortcuts for accessing kernel PMD and PTE for a given virtual address. Make these helpers available for all architectures. [[email protected]: microblaze: fix page table traversal in setup_rt_frame()] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: s/pmd_ptr_k/pmd_off_k/ in various powerpc places] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nick Hu <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
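The shortcuts are plain compositions of the per-level offset helpers:

    static inline pmd_t *pmd_off(struct mm_struct *mm, unsigned long va)
    {
            return pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, va), va), va), va);
    }

    static inline pmd_t *pmd_off_k(unsigned long va)
    {
            return pmd_offset(pud_offset(p4d_offset(pgd_offset_k(va), va), va), va);
    }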
2020-06-09  mm: introduce include/linux/pgtable.h  (Mike Rapoport, 1 file, +1323/-0)
The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nick Hu <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>