aboutsummaryrefslogtreecommitdiff
path: root/arch/arm/include/asm/pgtable-3level.h
AgeCommit message (Collapse)AuthorFilesLines
2020-11-16arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where neededArnd Bergmann1-0/+2
Stefan Agner reported a bug when using zsram on 32-bit Arm machines with RAM above the 4GB address boundary: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = a27bd01c [00000000] *pgd=236a0003, *pmd=1ffa64003 Internal error: Oops: 207 [#1] SMP ARM Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1 Hardware name: BCM2711 PC is at zs_map_object+0x94/0x338 LR is at zram_bvec_rw.constprop.0+0x330/0xa64 pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013 sp : e376bbe0 ip : 00000000 fp : c1e2921c r10: 00000002 r9 : c1dda730 r8 : 00000000 r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000 r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5383d Table: 235c2a80 DAC: fffffffd Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6) Stack: (0xe376bbe0 to 0xe376c000) As it turns out, zsram needs to know the maximum memory size, which is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture. The same problem will be hit on all 32-bit architectures that have a physical address space larger than 4GB and happen to not enable sparsemem and include asm/sparsemem.h from asm/pgtable.h. After the initial discussion, I suggested just always defining MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is set, or provoking a build error otherwise. This addresses all configurations that can currently have this runtime bug, but leaves all other configurations unchanged. I looked up the possible number of bits in source code and datasheets, here is what I found: - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never support more than 32 bits, even though supersections in theory allow up to 40 bits as well. - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5 XPA supports up to 60 bits in theory, but 40 bits are more than anyone will ever ship - On PowerPC, there are three different implementations of 36 bit addressing, but 32-bit is used without CONFIG_PTE_64BIT - On RISC-V, the normal page table format can support 34 bit addressing. There is no highmem support on RISC-V, so anything above 2GB is unused, but it might be useful to eventually support CONFIG_ZRAM for high pages. Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library") Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS") Acked-by: Thomas Bogendoerfer <[email protected]> Reviewed-by: Stefan Agner <[email protected]> Tested-by: Stefan Agner <[email protected]> Acked-by: Mike Rapoport <[email protected]> Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/ Signed-off-by: Arnd Bergmann <[email protected]>
2020-06-09mm: consolidate pte_index() and pte_offset_*() definitionsMike Rapoport1-7/+0
All architectures define pte_index() as (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1) and all architectures define pte_offset_kernel() as an entry in the array of PTEs indexed by the pte_index(). For the most architectures the pte_offset_kernel() implementation relies on the availability of pmd_page_vaddr() that converts a PMD entry value to the virtual address of the page containing PTEs array. Let's move x86 definitions of the PTE accessors to the generic place in <linux/pgtable.h> and then simply drop the respective definitions from the other architectures. The architectures that didn't provide pmd_page_vaddr() are updated to have that defined. The generic implementation of pte_offset_kernel() can be overridden by an architecture and alpha makes use of this because it has special ordering requirements for its version of pte_offset_kernel(). [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: update] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: update] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix x86 warning] [[email protected]: fix powerpc build] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nick Hu <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-03mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()Anshuman Khandual1-1/+1
pmd_present() is expected to test positive after pmdp_mknotpresent() as the PMD entry still points to a valid huge page in memory. pmdp_mknotpresent() implies that given PMD entry is just invalidated from MMU perspective while still holding on to pmd_page() referred valid huge page thus also clearing pmd_present() test. This creates the following situation which is counter intuitive. [pmd_present(pmd_mknotpresent(pmd)) = true] This renames pmd_mknotpresent() as pmd_mkinvalid() reflecting the helper's functionality more accurately while changing the above mentioned situation as follows. This does not create any functional change. [pmd_present(pmd_mkinvalid(pmd)) = true] This is not applicable for platforms that define own pmdp_invalidate() via __HAVE_ARCH_PMDP_INVALIDATE. Suggestion for renaming came during a previous discussion here. https://patchwork.kernel.org/patch/11019637/ [[email protected]: change pmd_mknotvalid() to pmd_mkinvalid() per Will] Link: http://lkml.kernel.org/r/[email protected] Suggested-by: Catalin Marinas <[email protected]> Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Russell King <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-03-24arm: Remove HYP/Stage-2 page-table supportMarc Zyngier1-20/+0
Remove all traces of Stage-2 and HYP page table support. Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Olof Johansson <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: Will Deacon <[email protected]> Acked-by: Vladimir Murzin <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Linus Walleij <[email protected]> Acked-by: Christoffer Dall <[email protected]>
2020-02-04arm: mm: add p?d_leaf() definitionsSteven Price1-0/+1
walk_page_range() is going to be allowed to walk page tables other than those of user space. For this it needs to know when it has reached a 'leaf' entry in the page tables. This information is provided by the p?d_leaf() functions/macros. For arm pmd_large() already exists and does what we want. So simply provide the generic pmd_leaf() name. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Price <[email protected]> Cc: Russell King <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David S. Miller <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Hogan <[email protected]> Cc: James Morse <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: "Liang, Kan" <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Zong Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333Thomas Gleixner1-13/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 136 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-06-07mm: introduce ARCH_HAS_PTE_SPECIALLaurent Dufour1-1/+0
Currently the PTE special supports is turned on in per architecture header files. Most of the time, it is defined in arch/*/include/asm/pgtable.h depending or not on some other per architecture static definition. This patch introduce a new configuration variable to manage this directly in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. Here notes for some architecture where the definition of __HAVE_ARCH_PTE_SPECIAL is not obvious: arm __HAVE_ARCH_PTE_SPECIAL which is currently defined in arch/arm/include/asm/pgtable-3level.h which is included by arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. powerpc __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: - arch/powerpc/include/asm/book3s/64/pgtable.h - arch/powerpc/include/asm/pte-common.h The first one is included if (PPC_BOOK3S & PPC64) while the second is included in all the other cases. So select ARCH_HAS_PTE_SPECIAL all the time. sparc: __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && defined(__arch64__) which are defined through the compiler in sparc/Makefile if !SPARC32 which I assume to be if SPARC64. So select ARCH_HAS_PTE_SPECIAL if SPARC64 There is no functional change introduced by this patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Laurent Dufour <[email protected]> Suggested-by: Jerome Glisse <[email protected]> Reviewed-by: Jerome Glisse <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: David S. Miller <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: David Rientjes <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Christophe LEROY <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-31arm/mm: provide pmdp_establish() helperKirill A. Shutemov1-0/+3
ARM LPAE doesn't have hardware dirty/accessed bits. generic_pmdp_establish() is the right implementation of pmdp_establish for this case. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-11-29mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITEDan Williams1-1/+0
In response to compile breakage introduced by a series that added the pud_write helper to x86, Stephen notes: did you consider using the other paradigm: In arch include files: #define pud_write pud_write static inline int pud_write(pud_t pud) ..... Then in include/asm-generic/pgtable.h: #ifndef pud_write tatic inline int pud_write(pud_t pud) { .... } #endif If you had, then the powerpc code would have worked ... ;-) and many of the other interfaces in include/asm-generic/pgtable.h are protected that way ... Given that some architecture already define pmd_write() as a macro, it's a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE. Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Suggested-by: Stephen Rothwell <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Oliver OHalloran <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Russell King <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-09ARM: 8579/1: mm: Fix definition of pmd_mknotpresentSteve Capper1-2/+2
Currently pmd_mknotpresent will use a zero entry to respresent an invalidated pmd. Unfortunately this definition clashes with pmd_none, thus it is possible for a race condition to occur if zap_pmd_range sees pmd_none whilst __split_huge_pmd_locked is running too with pmdp_invalidate just called. This patch fixes the race condition by modifying pmd_mknotpresent to create non-zero faulting entries (as is done in other architectures), removing the ambiguity with pmd_none. [[email protected]: using L_PMD_SECT_VALID instead of PMD_TYPE_SECT] Fixes: 8d9625070073 ("ARM: mm: Transparent huge page support for LPAE systems.") Cc: <[email protected]> # 3.11+ Reported-by: Kirill A. Shutemov <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Steve Capper <[email protected]> Signed-off-by: Catalin Marinas <[email protected]> Signed-off-by: Russell King <[email protected]>
2016-06-09ARM: 8578/1: mm: ensure pmd_present only checks the valid bitWill Deacon1-0/+1
In a subsequent patch, pmd_mknotpresent will clear the valid bit of the pmd entry, resulting in a not-present entry from the hardware's perspective. Unfortunately, pmd_present simply checks for a non-zero pmd value and will therefore continue to return true even after a pmd_mknotpresent operation. Since pmd_mknotpresent is only used for managing huge entries, this is only an issue for the 3-level case. This patch fixes the 3-level pmd_present implementation to take into account the valid bit. For bisectability, the change is made before the fix to pmd_mknotpresent. [[email protected]: comment update regarding pmd_mknotpresent patch] Fixes: 8d9625070073 ("ARM: mm: Transparent huge page support for LPAE systems.") Cc: <[email protected]> # 3.11+ Cc: Russell King <[email protected]> Cc: Steve Capper <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Catalin Marinas <[email protected]> Signed-off-by: Russell King <[email protected]>
2016-05-19arch: fix has_transparent_hugepage()Hugh Dickins1-5/+0
I've just discovered that the useful-sounding has_transparent_hugepage() is actually an architecture-dependent minefield: on some arches it only builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when not, but on some of those (arm and arm64) it then gives the wrong answer; and on mips alone it's marked __init, which would crash if called later (but so far it has not been called later). Straighten this out: make it available to all configs, with a sensible default in asm-generic/pgtable.h, removing its definitions from those arches (arc, arm, arm64, sparc, tile) which are served by the default, adding #define has_transparent_hugepage has_transparent_hugepage to those (mips, powerpc, s390, x86) which need to override the default at runtime, and removing the __init from mips (but maybe that kind of code should be avoided after init: set a static variable the first time it's called). Signed-off-by: Hugh Dickins <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andres Lagar-Cavilla <[email protected]> Cc: Yang Shi <[email protected]> Cc: Ning Qu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Acked-by: David S. Miller <[email protected]> Acked-by: Vineet Gupta <[email protected]> [arch/arc] Acked-by: Gerald Schaefer <[email protected]> [arch/s390] Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15arch/arm/include/asm/pgtable-3level.h: add pmd_mkclean for THPMinchan Kim1-0/+1
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_mkclean for THP page MADV_FREE support. Signed-off-by: Minchan Kim <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Russell King <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Shaohua Li <[email protected]> Cc: <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Chen Gang <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Daniel Micay <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: David S. Miller <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Jason Evans <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mika Penttil <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15arm, thp: remove infrastructure for handling splitting PMDsKirill A. Shutemov1-9/+0
With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. pmdp_splitting_flush() is not needed too: on splitting PMD we will do pmdp_clear_flush() + set_pte_at(). pmdp_clear_flush() will do IPI as needed for fast_gup. [[email protected]: fix unterminated ifdef in header file] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Jerome Marchand <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Steve Capper <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...
2015-02-12mm: convert p[te|md]_mknonnuma and remaining page table manipulationsMel Gorman1-1/+4
With PROT_NONE, the traditional page table manipulation functions are sufficient. [[email protected]: fix compiler warning in pmdp_invalidate()] [[email protected]: fix build with STRICT_MM_TYPECHECKS] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Linus Torvalds <[email protected]> Acked-by: Aneesh Kumar <[email protected]> Tested-by: Sasha Levin <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Dave Jones <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kirill Shutemov <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-10arm: drop L_PTE_FILE and pte_file()-related helpersKirill A. Shutemov1-1/+0
We've replaced remap_file_pages(2) implementation with emulation. Nobody creates non-linear mapping anymore. This patch also adjust __SWP_TYPE_SHIFT, effectively increase size of possible swap file to 128G. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-01-16KVM: arm: Add initial dirty page locking supportMario Smarduch1-0/+1
Add support for initial write protection of VM memslots. This patch series assumes that huge PUDs will not be used in 2nd stage tables, which is always valid on ARMv7 Acked-by: Christoffer Dall <[email protected]> Signed-off-by: Mario Smarduch <[email protected]>
2014-10-09arm: mm: enable RCU fast_gupSteve Capper1-0/+8
Activate the RCU fast_gup for ARM. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Some pre-requisite functions pud_write and pud_page are also added. Signed-off-by: Steve Capper <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Dann Frazier <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Russell King <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Will Deacon <[email protected]> Cc: Christoffer Dall <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09arm: mm: introduce special ptes for LPAESteve Capper1-0/+7
We need a mechanism to tag ptes as being special, this indicates that no attempt should be made to access the underlying struct page * associated with the pte. This is used by the fast_gup when operating on ptes as it has no means to access VMAs (that also contain this information) locklessly. The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies pte_special and pte_mkspecial to make use of it, and defines __HAVE_ARCH_PTE_SPECIAL. This patch also excludes special ptes from the icache/dcache sync logic. Signed-off-by: Steve Capper <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Dann Frazier <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Russell King <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Will Deacon <[email protected]> Cc: Christoffer Dall <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-07-24ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAESteven Capper1-17/+24
For LPAE, we have the following means for encoding writable or dirty ptes: L_PTE_DIRTY L_PTE_RDONLY !pte_dirty && !pte_write 0 1 !pte_dirty && pte_write 0 1 pte_dirty && !pte_write 1 1 pte_dirty && pte_write 1 0 So we can't distinguish between writeable clean ptes and read only ptes. This can cause problems with ptes being incorrectly flagged as read only when they are writeable but not dirty. This patch renumbers L_PTE_RDONLY from AP[2] to a software bit #58, and adds additional logic to set AP[2] whenever the pte is read only or not dirty. That way we can distinguish between clean writeable ptes and read only ptes. HugeTLB pages will use this new logic automatically. We need to add some logic to Transparent HugePages to ensure that they correctly interpret the revised pgprot permissions (L_PTE_RDONLY has moved and no longer matches PMD_SECT_AP2). In the process of revising THP, the names of the PMD software bits have been prefixed with L_ to make them easier to distinguish from their hardware bit counterparts. Signed-off-by: Steve Capper <[email protected]> Reviewed-by: Will Deacon <[email protected]> Signed-off-by: Russell King <[email protected]>
2014-07-24ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclearSteven Capper1-4/+8
Long descriptors on ARM are 64 bits, and some pte functions such as pte_dirty return a bitwise-and of a flag with the pte value. If the flag to be tested resides in the upper 32 bits of the pte, then we run into the danger of the result being dropped if downcast. For example: gather_stats(page, md, pte_dirty(*pte), 1); where pte_dirty(*pte) is downcast to an int. This patch introduces a new macro pte_isset which performs the bitwise and, then performs a double logical invert (where needed) to ensure predictable downcasting. The logical inverse pte_isclear is also introduced. Equivalent pmd functions for Transparent HugePages have also been added. Signed-off-by: Steve Capper <[email protected]> Reviewed-by: Will Deacon <[email protected]> Signed-off-by: Russell King <[email protected]>
2014-02-10ARM: 7950/1: mm: Fix stage-2 device memory attributesChristoffer Dall1-6/+9
The stage-2 memory attributes are distinct from the Hyp memory attributes and the Stage-1 memory attributes. We were using the stage-1 memory attributes for stage-2 mappings causing device mappings to be mapped as normal memory. Add the S2 equivalent defines for memory attributes and fix the comments explaining the defines while at it. Add a prot_pte_s2 field to the mem_type struct and fill out the field for device mappings accordingly. Cc: <[email protected]> [3.9+] Acked-by: Marc Zyngier <[email protected]> Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Russell King <[email protected]>
2013-12-11ARM: add support to dump the kernel page tablesRussell King1-0/+1
This patch allows the kernel page tables to be dumped via a debugfs file, allowing kernel developers to check the layout of the kernel page tables and the verify the various permissions and type settings. Signed-off-by: Russell King <[email protected]>
2013-11-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+2
Pull KVM changes from Paolo Bonzini: "Here are the 3.13 KVM changes. There was a lot of work on the PPC side: the HV and emulation flavors can now coexist in a single kernel is probably the most interesting change from a user point of view. On the x86 side there are nested virtualization improvements and a few bugfixes. ARM got transparent huge page support, improved overcommit, and support for big endian guests. Finally, there is a new interface to connect KVM with VFIO. This helps with devices that use NoSnoop PCI transactions, letting the driver in the guest execute WBINVD instructions. This includes some nVidia cards on Windows, that fail to start without these patches and the corresponding userspace changes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) kvm, vmx: Fix lazy FPU on nested guest arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu arm/arm64: KVM: MMIO support for BE guest kvm, cpuid: Fix sparse warning kvm: Delete prototype for non-existent function kvm_check_iopl kvm: Delete prototype for non-existent function complete_pio hung_task: add method to reset detector pvclock: detect watchdog reset at pvclock read kvm: optimize out smp_mb after srcu_read_unlock srcu: API for barrier after srcu read unlock KVM: remove vm mmap method KVM: IOMMU: hva align mapping page size KVM: x86: trace cpuid emulation when called from emulator KVM: emulator: cleanup decode_register_operand() a bit KVM: emulator: check rex prefix inside decode_register() KVM: x86: fix emulation of "movzbl %bpl, %eax" kvm_host: typo fix KVM: x86: emulate SAHF instruction MAINTAINERS: add tree for kvm.git Documentation/kvm: add a 00-INDEX file ...
2013-10-29ARM: 7858/1: mm: make UACCESS_WITH_MEMCPY huge page awareSteven Capper1-0/+3
The memory pinning code in uaccess_with_memcpy.c does not check for HugeTLB or THP pmds, and will enter an infinite loop should a __copy_to_user or __clear_user occur against a huge page. This patch adds detection code for huge pages to pin_page_for_write. As this code can be executed in a fast path it refers to the actual pmds rather than the vma. If a HugeTLB or THP is found (they have the same pmd representation on ARM), the page table spinlock is taken to prevent modification whilst the page is pinned. On ARM, huge pages are only represented as pmds, thus no huge pud checks are performed. (For huge puds one would lock the page table in a similar manner as in the pmd case). Two helper functions are introduced; pmd_thp_or_huge will check whether or not a page is huge or transparent huge (which have the same pmd layout on ARM), and pmd_hugewillfault will detect whether or not a page fault will occur on write to the page. Running the following test (with the chunking from read_zero removed): $ dd if=/dev/zero of=/dev/null bs=10M count=1024 Gave: 2.3 GB/s backed by normal pages, 2.9 GB/s backed by huge pages, 5.1 GB/s backed by huge pages, with page mask=HPAGE_MASK. After some discussion, it was decided not to adopt the HPAGE_MASK, as this would have a significant detrimental effect on the overall system latency due to page_table_lock being held for too long. This could be revisited if split huge page locks are adopted. Signed-off-by: Steve Capper <[email protected]> Reviewed-by: Nicolas Pitre <[email protected]> Signed-off-by: Russell King <[email protected]>
2013-10-17KVM: ARM: Support hugetlbfs backed huge pagesChristoffer Dall1-0/+2
Support huge pages in KVM/ARM and KVM/ARM64. The pud_huge checking on the unmap path may feel a bit silly as the pud_huge check is always defined to false, but the compiler should be smart about this. Note: This deals only with VMAs marked as huge which are allocated by users through hugetlbfs only. Transparent huge pages can only be detected by looking at the underlying pages (or the page tables themselves) and this patch so far simply maps these on a page-by-page level in the Stage-2 page tables. Cc: Catalin Marinas <[email protected]> Cc: Russell King <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2013-06-18Merge branch 'for-rmk/lpae' of ↵Russell King1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into devel-stable Conflicts: arch/arm/kernel/smp.c Please pull these miscellaneous LPAE fixes I've been collecting for a while now for 3.11. They've been tested and reviewed by quite a few people, and most of the patches are pretty trivial. -- Will Deacon.
2013-06-04ARM: mm: Transparent huge page support for LPAE systems.Catalin Marinas1-0/+60
The patch adds support for THP (transparent huge pages) to LPAE systems. When this feature is enabled, the kernel tries to map anonymous pages as 2MB sections where possible. Signed-off-by: Catalin Marinas <[email protected]> [[email protected]: symbolic constants used, value of PMD_SECT_SPLITTING adjusted, tlbflush.h included in pgtable.h, added PROT_NONE support.] Signed-off-by: Steve Capper <[email protected]> Reviewed-by: Will Deacon <[email protected]>
2013-06-04ARM: mm: HugeTLB support for LPAE systems.Catalin Marinas1-0/+11
This patch adds support for hugetlbfs based on the x86 implementation. It allows mapping of 2MB sections (see Documentation/vm/hugetlbpage.txt for usage). The 64K pages configuration is not supported (section size is 512MB in this case). Signed-off-by: Catalin Marinas <[email protected]> [[email protected]: symbolic constants replace numbers in places. Split up into multiple files, to simplify future non-LPAE support, removed huge_pmd_share code, as this is very rarely executed, Added PROT_NONE support]. Signed-off-by: Steve Capper <[email protected]> Reviewed-by: Will Deacon <[email protected]>
2013-06-04ARM: mm: correct pte_same behaviour for LPAE.Steve Capper1-0/+17
For 3 levels of paging the PTE_EXT_NG bit will be set for user address ptes that are written to a page table but not for ptes created with mk_pte. This can cause some comparison tests made by pte_same to fail spuriously and lead to other problems. To correct this behaviour, we mask off PTE_EXT_NG for any pte that is present before running the comparison. Signed-off-by: Steve Capper <[email protected]> Reviewed-by: Will Deacon <[email protected]>
2013-05-30ARM: lpae: fix definition of PTE_HWTABLE_PTRSWill Deacon1-1/+1
For 2-level page tables, PTE_HWTABLE_PTRS describes the offset between Linux PTEs and hardware PTEs. On LPAE, there is no distinction (since we have 64-bit descriptors with plenty of space) so PTE_HWTABLE_PTRS should be 0. Unfortunately, it is wrongly defined as PTRS_PER_PTE, meaning that current pte table flushing is off by a page. Luckily, all current LPAE implementations are SMP, so the hardware walker can snoop L1. This patch fixes the broken definition. Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2013-05-30ARM: LPAE: use signed arithmetic for mask definitionsCyril Chemparathy1-3/+3
This patch applies to PAGE_MASK, PMD_MASK, and PGDIR_MASK, where forcing unsigned long math truncates the mask at the 32-bits. This clearly does bad things on PAE systems. This patch fixes this problem by defining these masks as signed quantities. We then rely on sign extension to do the right thing. Signed-off-by: Cyril Chemparathy <[email protected]> Signed-off-by: Vitaly Andrianov <[email protected]> Reviewed-by: Nicolas Pitre <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Tested-by: Santosh Shilimkar <[email protected]> Tested-by: Subash Patel <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2013-04-16ARM: KVM: fix L_PTE_S2_RDWR to actually be Read/WriteMarc Zyngier1-1/+1
Looks like our L_PTE_S2_RDWR definition is slightly wrong, and is actually write only (see ARM ARM Table B3-9, Stage 2 control of access permissions). Didn't make a difference for normal pages, as we OR the flags together, but I'm still wondering how it worked for Stage-2 mapped devices, such as the GIC. Brown paper bag time, again. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2013-01-23ARM: Add page table and page defines needed by KVMChristoffer Dall1-0/+18
KVM uses the stage-2 page tables and the Hyp page table format, so we define the fields and page protection flags needed by KVM. The nomenclature is this: - page_hyp: PL2 code/data mappings - page_hyp_device: PL2 device mappings (vgic access) - page_s2: Stage-2 code/data page mappings - page_s2_device: Stage-2 device mappings (vgic access) Reviewed-by: Will Deacon <[email protected]> Reviewed-by: Marcelo Tosatti <[email protected]> Christoffer Dall <[email protected]>
2012-11-09ARM: mm: introduce present, faulting entries for PAGE_NONEWill Deacon1-0/+1
PROT_NONE mappings apply the page protection attributes defined by _P000 which translate to PAGE_NONE for ARM. These attributes specify an XN, RDONLY pte that is inaccessible to userspace. However, on kernels configured without support for domains, such a pte *is* accessible to the kernel and can be read via get_user, allowing tasks to read PROT_NONE pages via syscalls such as read/write over a pipe. This patch introduces a new software pte flag, L_PTE_NONE, that is set to identify faulting, present entries. Signed-off-by: Will Deacon <[email protected]>
2012-11-09ARM: mm: introduce L_PTE_VALID for page table entriesWill Deacon1-1/+2
For long-descriptor translation table formats, the ARMv7 architecture defines the last two bits of the second- and third-level descriptors to be: x0b - Invalid 01b - Block (second-level), Reserved (third-level) 11b - Table (second-level), Page (third-level) This allows us to define L_PTE_PRESENT as (3 << 0) and use this value to create ptes directly. However, when determining whether a given pte value is present in the low-level page table accessors, we only need to check the least significant bit of the descriptor, allowing us to write faulting, present entries which are required for PROT_NONE mappings. This patch introduces L_PTE_VALID, which can be used to test whether a pte should fault, and updates the low-level page table accessors accordingly. Signed-off-by: Will Deacon <[email protected]>
2012-05-12ARM: 7416/1: LPAE: Remove unused L_PTE_(BUFFERABLE|CACHEABLE) macrosCatalin Marinas1-2/+0
These have already been removed from the classic MMU in favour of L_PTE_MT_* macros. Signed-off-by: Catalin Marinas <[email protected]> Signed-off-by: Russell King <[email protected]>
2011-12-08ARM: LPAE: Page table maintenance for the 3-level formatCatalin Marinas1-0/+53
This patch modifies the pgd/pmd/pte manipulation functions to support the 3-level page table format. Since there is no need for an 'ext' argument to cpu_set_pte_ext(), this patch conditionally defines a different prototype for this function when CONFIG_ARM_LPAE. The patch also introduces the L_PGD_SWAPPER flag to mark pgd entries pointing to pmd tables pre-allocated in the swapper_pg_dir and avoid trying to free them at run-time. This flag is 0 with the classic page table format. Signed-off-by: Catalin Marinas <[email protected]>
2011-12-08ARM: LPAE: Introduce the 3-level page table format definitionsCatalin Marinas1-0/+102
This patch introduces the pgtable-3level*.h files with definitions specific to the LPAE page table format (3 levels of page tables). Each table is 4KB and has 512 64-bit entries. An entry can point to a 40-bit physical address. The young, write and exec software bits share the corresponding hardware bits (negated). Other software bits use spare bits in the PTE. The patch also changes some variable types from unsigned long or int to pteval_t or pgprot_t. Signed-off-by: Catalin Marinas <[email protected]>