Age | Commit message | Author | Files | Lines |
|
This is required to make numa emulation code architecture independent so
that it can be moved to generic code in following commits.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64
Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rob Herring (Arm) <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This is required to make numa emulation code architecture independent so
that it can be moved to generic code in following commits.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64
Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rob Herring (Arm) <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
By the time numa_emulation() is called, all physical memory is already
mapped in the direct map and there is no need to define limits for
memblock allocation.
Replace memblock_phys_alloc_range() with memblock_alloc().
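A minimal sketch of the change (illustrative, not the exact hunk; variable
names follow the x86 numa_emulation() code): the bounded physical allocation
plus __va() conversion collapses into a plain virtual allocation:

	/* before: bounded physical allocation, converted to a virtual address */
	phys = memblock_phys_alloc_range(phys_size, PAGE_SIZE, 0,
					 PFN_PHYS(max_pfn_mapped));
	phys_dist = __va(phys);

	/* after: the direct map already covers all memory */
	phys_dist = memblock_alloc(phys_size, PAGE_SIZE);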
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64
Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rob Herring (Arm) <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The definitions of FAKE_NODE_MIN_SIZE and FAKE_NODE_MIN_HASH_MASK are only
used by numa emulation code, make them local to
arch/x86/mm/numa_emulation.c
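For reference, a sketch of the (unchanged) definitions that now live only in
arch/x86/mm/numa_emulation.c:

	#define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
	#define FAKE_NODE_MIN_HASH_MASK	(~(FAKE_NODE_MIN_SIZE - 1UL))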
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64
Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rob Herring (Arm) <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Instead of looping over the numa_meminfo array to detect a node's start and
end addresses, use get_pfn_range_for_nid().
This is shorter and makes it easier to lift numa_memblks to generic code.
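A sketch of the intended pattern (get_pfn_range_for_nid() is the existing
generic helper; the surrounding check is illustrative):

	unsigned long start_pfn, end_pfn;

	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
	if (start_pfn >= end_pfn)
		pr_warn("Node %d has no memory\n", nid);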
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rob Herring (Arm) <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Allocation of numa_distance uses memblock_phys_alloc_range() to limit
allocation to be below the last mapped page.
But NUMA initialization runs after the direct map is populated and there
is also code in setup_arch() that adjusts the memblock limit to reflect how
much memory is already mapped in the direct map.
Simplify the allocation of numa_distance and use plain memblock_alloc().
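A minimal sketch of the resulting allocation in numa_alloc_distance()
(illustrative, assuming the existing variable names and error handling):

	numa_distance = memblock_alloc(size, PAGE_SIZE);
	if (!numa_distance) {
		pr_warn("Warning: can't allocate distance table!\n");
		return -ENOMEM;
	}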
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rob Herring (Arm) <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Architectures that support NUMA duplicate the code that allocates
NODE_DATA on the node-local memory with slight variations in reporting of
the addresses where the memory was allocated.
Use x86 version as the basis for the generic alloc_node_data() function
and call this function in architecture specific numa initialization.
Round up node data size to SMP_CACHE_BYTES rather than to PAGE_SIZE like
x86 used to do since the bootmem era when allocation granularity was
PAGE_SIZE anyway.
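A sketch of the resulting generic helper, modeled on the x86 version (the
exact log messages are illustrative):

	void __init alloc_node_data(int nid)
	{
		const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
		u64 nd_pa;
		void *nd;
		int tnid;

		/* Allocate node data, preferring node-local memory. */
		nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
		if (!nd_pa)
			panic("Cannot allocate %zu bytes for node %d data\n",
			      nd_size, nid);
		nd = __va(nd_pa);

		/* Report the allocation and note if it landed off-node. */
		pr_info("NODE_DATA(%d) allocated [mem %#010Lx-%#010Lx]\n",
			nid, nd_pa, nd_pa + nd_size - 1);
		tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
		if (tnid != nid)
			pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);

		node_data[nid] = nd;
		memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
	}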
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64
Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rob Herring (Arm) <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Every architecture that supports NUMA defines node_data in the same way:
struct pglist_data *node_data[MAX_NUMNODES];
No reason to keep multiple copies of this definition and its forward
declarations, especially when such a forward declaration is the only thing
in include/asm/mmzone.h for many architectures.
Add definition and declaration of node_data to generic code and drop
architecture-specific versions.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Tested-by: Zi Yan <[email protected]> # for x86_64 and arm64
Tested-by: Jonathan Cameron <[email protected]> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rob Herring (Arm) <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
DMA ops are a helper for architectures and not for drivers to override
the DMA implementation.
Unfortunately driver authors keep ignoring this. Make the fact more
clear by renaming the symbol to ARCH_HAS_DMA_OPS and having the two drivers
overriding their dma_ops depend on that. These drivers should probably be
marked broken, but we can give them a bit of a grace period for that.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Sakari Ailus <[email protected]> # for IPU6
Acked-by: Robin Murphy <[email protected]>
|
|
There's an erratum that prevents the PAT from working correctly:
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-dual-core-specification-update.pdf
# Document 316515 Version 010
The kernel currently disables PAT support on those CPUs, but it
does it with some magic numbers.
Replace the magic numbers with the new "IFM" macros.
Make the check refer to the last affected CPU (INTEL_CORE_YONAH)
rather than the first fixed one. This makes it easier to find the
documentation of the erratum since Intel documents where it is
broken and not where it is fixed.
I don't think the Pentium Pro (or Pentium II) is actually affected.
But the old check included them, so it can't hurt to keep doing the
same. I'm also not completely sure about the "Pentium M" CPUs
(models 0x9 and 0xd). But, again, they were included in the
old checks and were close Pentium III derivatives, so are likely
affected.
While we're at it, revise the comment referring to the erratum name
and making sure it is a quote of the language from the actual errata
doc. That should make it easier to find in the future when the URL
inevitably changes.
Why bother with this in the first place? It actually gets rid of one
of the very few remaining direct references to c->x86{,_model}.
No change in functionality intended.
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Len Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Until recently, KVM_CAP_READONLY_MEM was unconditionally supported on
x86, but this is no longer the case for SEV-ES and SEV-SNP VMs.
When KVM_CHECK_EXTENSION is invoked on a VM, only advertise
KVM_CAP_READONLY_MEM when it's actually supported.
Fixes: 66155de93bcf ("KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)")
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Michael Roth <[email protected]>
Signed-off-by: Tom Dohrmann <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Fix the following 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/mm/testmmiotrace.o
Link: https://lkml.kernel.org/r/20240730-module_description_orphans-v1-2-7094088076c8@quicinc.com
Signed-off-by: Jeff Johnson <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Alistar Popple <[email protected]>
Cc: Andrew Jeffery <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eddie James <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jeremy Kerr <[email protected]>
Cc: Joel Stanley <[email protected]>
Cc: Karol Herbst <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Naveen N Rao <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Nouveau <[email protected]>
Cc: Pekka Paalanen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Russell King <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
All relevant architectures had already been converted to the new interface
(which just has an underscore in front of the name - not very imaginative
naming); this just force-converts the stragglers.
The modern interface is almost identical to the old one, except instead of
the page pointer it takes a "struct vm_special_mapping" that describes the
mapping (and contains the page pointer as one member), and it returns the
resulting 'vma' instead of just the error code.
Getting rid of the old interface also gets rid of some special casing,
which had caused problems with the mremap extensions to "struct
vm_special_mapping".
[[email protected]: coding-style cleanups]
Link: https://lkml.kernel.org/r/CAHk-=whvR+z=0=0gzgdfUiK70JTa-=+9vxD-4T=3BagXR6dciA@mail.gmail.com
Tested-by: Rob Landley <[email protected]> # arch/sh/
Link: https://lore.kernel.org/all/20240819195120.GA1113263@thelio-3990X/
Signed-off-by: Linus Torvalds <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Jeff Xu <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Pedro Falcato <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Landley <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Now that powerpc no longer uses arch_unmap() to handle VDSO unmapping,
there are no meaningful implementations left. Drop support for it entirely,
and update comments which refer to it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michael Ellerman <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Jeff Xu <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Pedro Falcato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Some new helpers will be needed for pud entry updates soon. Introduce
these helpers by referencing the pmd ones. Namely:
- pudp_invalidate(): this helper invalidates a huge pud before a
split happens, so that the invalidated pud entry will make sure no
race can happen (either with software, like a concurrent zap, or
hardware, like an A/D bit being lost).
- pud_modify(): this helper applies a new pgprot to an existing huge
pud mapping.
For more information on why we need these two helpers, please refer to the
corresponding pmd helpers in the mprotect() code path.
While at it, simplify the pud_modify()/pmd_modify() comments on shadow
stack pgtable entries to reference pte_modify() to avoid duplicating the
whole paragraph three times.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Edgecombe, Rick P" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Introduce arch_check_zapped_pud() to sanity check shadow stack on PUD
zaps. It has the same logic as the PMD helper.
One thing to mention is, it might be a good idea to use page_table_check
in the future for trapping wrong setups of shadow stack pgtable entries
[1]. That is left for the future as a separate effort.
[1] https://lore.kernel.org/all/[email protected]
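A minimal sketch of the x86 helper, mirroring arch_check_zapped_pmd()
(pud_shstk() is the PUD-level shadow-stack test introduced alongside it):

	void arch_check_zapped_pud(struct vm_area_struct *vma, pud_t pud)
	{
		/* See note in arch_check_zapped_pte() */
		VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) &&
				pud_shstk(pud));
	}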
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: "Edgecombe, Rick P" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When working on mprotect() on 1G dax entries, I hit a zap bad pud error
when zapping a huge pud that has PROT_NONE permission.
Here the problem is that x86's pud_leaf() requires both the PRESENT and PSE
bits to be set to report a pud entry as a leaf, but that doesn't look right,
as it's not following the pXd_leaf() definition that we stick with so far,
where PROT_NONE entries should be reported as leaves.
To fix it, change x86's pud_leaf() implementation to only check against the
PSE bit to report a leaf, regardless of whether the PRESENT bit is set.
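A sketch of the described change in arch/x86/include/asm/pgtable.h: check
only the PSE bit, so a PROT_NONE (non-PRESENT) huge pud is still a leaf:

	#define pud_leaf pud_leaf
	static inline bool pud_leaf(pud_t pud)
	{
		return pud_val(pud) & _PAGE_PSE;
	}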
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Edgecombe, Rick P" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Make accept_memory() and range_contains_unaccepted_memory() take 'start'
and 'size' arguments instead of 'start' and 'end'.
Remove accept_page(), replacing it with direct calls to accept_memory().
The accept_page() name is going to be used for a different function.
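A sketch of the changed prototypes (the size type shown is an assumption):

	void accept_memory(phys_addr_t start, unsigned long size);
	bool range_contains_unaccepted_memory(phys_addr_t start, unsigned long size);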
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Rapoport (Microsoft) <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: split PTE/PMD PT table Kconfig cleanups+clarifications".
This series is a follow up to the fixes:
"[PATCH v1 0/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"
When working on the fixes, I wondered why 8xx is fine (-> never uses split
PT locks) and how PT locking even works properly with PMD page table
sharing (-> always requires split PMD PT locks).
Let's improve the split PT lock detection, make hugetlb properly depend on
it and make 8xx bail out if it would ever get enabled by accident.
As an alternative to patch #3 we could extend the Kconfig
SPLIT_PTE_PTLOCKS option from patch #2 -- but enforcing it closer to the
code that actually implements it feels a bit nicer for documentation
purposes, and there is no need to actually disable it because it should
always be disabled (!SMP).
Did a bunch of cross-compilations to make sure that split PTE/PMD PT locks
are still getting used where we would expect them.
[1] https://lkml.kernel.org/r/[email protected]
This patch (of 3):
Let's clean that up a bit and prepare for depending on
CONFIG_SPLIT_PMD_PTLOCKS in other Kconfig options.
More cleanups would be reasonable (like the arch-specific "depends on" for
CONFIG_SPLIT_PTE_PTLOCKS), but we'll leave that for another day.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Qi Zheng <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- x2apic_disable() clears x2apic_state and x2apic_mode unconditionally,
even when the state is X2APIC_ON_LOCKED, which prevents the kernel from
disabling it, thereby creating inconsistent state.
Reorder the logic so it actually works correctly
- The XSTATE logic for handling LBR is incorrect as it assumes that
XSAVES supports LBR when the CPU supports LBR. In fact both
conditions need to be true. Otherwise the enablement of LBR in the
IA32_XSS MSR fails and subsequently the machine crashes on the next
XRSTORS operation because IA32_XSS is not initialized.
Cache the XSTATE support bit during init and make the related
functions use this cached information and the LBR CPU feature bit to
cure this.
- Cure a long standing bug in KASLR
KASLR uses the full address space between PAGE_OFFSET and vaddr_end
to randomize the starting points of the direct map, vmalloc and
vmemmap regions. It thereby limits the size of the direct map by
using the installed memory size plus an extra configurable margin for
hot-plug memory. This limitation is done to gain more randomization
space because otherwise only the holes between the direct map,
vmalloc, vmemmap and vaddr_end would be usable for randomizing.
The limited direct map size is not exposed to the rest of the kernel,
so the memory hot-plug and resource management related code paths
still operate under the assumption that the available address space
can be determined with MAX_PHYSMEM_BITS.
request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of
the direct map and if unlucky this address is in the vmalloc space,
which causes high_memory to become greater than VMALLOC_START and
consequently causes iounmap() to fail for valid ioremap addresses.
Cure this by exposing the end of the direct map via PHYSMEM_END and
use that for the memory hot-plug and resource management related
places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case
PHYSMEM_END maps to a variable which is initialized by the KASLR
initialization and otherwise it is based on MAX_PHYSMEM_BITS as
before.
- Prevent a data leak in mmio_read(). The TDVMCALL exposes the value of
an initialized variable on the stack to the VMM. The variable is
only required as an output value, so it does not have to be exposed to the
VMM in the first place.
- Prevent an array overrun in the resource control code on systems with
Sub-NUMA Clustering enabled because the code failed to adjust the
index by the number of SNC nodes per L3 cache.
* tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix arch_mbm_* array overrun on SNC
x86/tdx: Fix data leak in mmio_read()
x86/kaslr: Expose and use the end of the physical memory address space
x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported
x86/apic: Make x2apic_disable() work correctly
|
|
This reverts the following commits:
- 236dec051078 ("kconfig: tinyconfig: provide whole choice blocks to
avoid warnings")
- b0f269728ccd ("x86/config: Fix warning for 'make ARCH=x86_64
tinyconfig'")
Since commit f79dc03fe68c ("kconfig: refactor choice value calculation"),
it is no longer necessary to disable the remaining options in choice
blocks.
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
|
|
Sparse expects an __iomem pointer, but after converting the EISA probe to
memremap() the pointer is a regular memory pointer. Access it directly
instead.
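A sketch of the fixed-up probe (the signature check follows the pre-existing
code; the exact expression is illustrative):

	void *p = memremap(0x0FFFD9, 4, MEMREMAP_WB);

	if (p && *(u32 *)p == 'E' + ('I' << 8) + ('S' << 16) + ('A' << 24))
		EISA_bus = 1;
	memunmap(p);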
[ tglx: Converted it to fix the already applied version ]
Fixes: 80a4da05642c ("x86/EISA: Use memremap() to probe for the EISA BIOS signature")
Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
SEV-SNP support is present since commit 1dfe571c12cf ("KVM: SEV: Add
initial SEV-SNP support") but the Kconfig entry wasn't updated and still
mentions SEV and SEV-ES only. Add SEV-SNP there and, while at it, expand
'SEV' in the description as 'Encrypted VMs' is not what 'SEV' stands for.
No functional change.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
When using resctrl on systems with Sub-NUMA Clustering enabled, monitoring
groups may be allocated RMID values which would overrun the
arch_mbm_{local,total} arrays.
This is due to inconsistencies in whether the SNC-adjusted num_rmid value or
the unadjusted value in resctrl_arch_system_num_rmid_idx() is used. The
num_rmid value for the L3 resource is currently:
resctrl_arch_system_num_rmid_idx() / snc_nodes_per_l3_cache
As a simple fix, make resctrl_arch_system_num_rmid_idx() return the
SNC-adjusted, L3 num_rmid value on x86.
Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache")
Signed-off-by: Peter Newman <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Currently, struct snp_guest_msg includes a message header (96 bytes) and
a payload (4000 bytes). There is an implicit assumption here that the
SNP message header will always be 96 bytes, and with that assumption the
payload array size has been set to 4000 bytes - a magic number. If any
new member is added to the SNP message header, the SNP guest message
will span more than a page.
Instead of using a magic number for the payload, declare struct
snp_guest_msg in a way that payload plus the message header do not
exceed a page.
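A sketch of the described layout (assuming the existing struct names from
the SEV guest code):

	struct snp_guest_msg {
		struct snp_guest_msg_hdr hdr;
		u8 payload[PAGE_SIZE - sizeof(struct snp_guest_msg_hdr)];
	} __packed;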
[ bp: Massage. ]
Suggested-by: Tom Lendacky <[email protected]>
Signed-off-by: Nikunj A Dadhania <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The mmio_read() function makes a TDVMCALL to retrieve MMIO data for an
address from the VMM.
Sean noticed that mmio_read() unintentionally exposes the value of an
initialized variable (val) on the stack to the VMM.
This variable is only needed as an output value. It did not need to be
passed to the VMM in the first place.
Do not send the original value of *val to the VMM.
[ dhansen: clarify what 'val' is used for. ]
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Reported-by: Sean Christopherson <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/20240826125304.1566719-1-kirill.shutemov%40linux.intel.com
|
|
Allowing iounmap() on memory that was not ioremap()'d in the first
place is obviously a bad idea. There is currently a feeble attempt to
avoid errant iounmap()s by checking to see if the address is below
"high_memory". But that's imprecise at best because there are plenty
of high addresses that are also invalid to call iounmap() on.
Thankfully, there is a more precise helper: is_ioremap_addr(). x86
just does not use it in iounmap().
Restrict iounmap() to addresses in the ioremap region, by using
is_ioremap_addr(). This aligns x86 closer to the generic iounmap()
implementation.
Additionally, add a warning in case there is an attempt to iounmap()
invalid memory. This replaces an existing silent return and will
help alert folks to any incorrect usage of iounmap().
Due to VMALLOC_START on i386 not being present in asm/pgtable.h, an include
of asm/vmalloc.h had to be added to include/linux/ioremap.h.
[ dhansen: tweak subject and changelog ]
Signed-off-by: Max Ramanouski <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Link: https://lore.kernel.org/all/20240824220111.84441-1-max8rr8%40gmail.com
|
|
The FRED RSP0 MSR points to the top of the kernel stack for user level
event delivery. As this is the task stack it needs to be updated when a
task is scheduled in.
The update is done at context switch. That means it's also done when
switching to kernel threads, which is pointless as those never go out to
user space. For KVM threads this means there are two writes to FRED_RSP0 as
KVM has to switch to the guest value before VMENTER.
Defer the update to the exit to user space path and cache the per CPU
FRED_RSP0 value, so redundant writes can be avoided.
Provide fred_sync_rsp0() for KVM to keep the cache in sync with the actual
MSR value after returning from guest to host mode.
[ tglx: Massage change log ]
Suggested-by: Sean Christopherson <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Per the discussion about FRED MSR writes with WRMSRNS instruction [1],
use the alternatives mechanism to choose WRMSRNS when it's available,
otherwise fall back to WRMSR.
Remove the dependency on X86_FEATURE_WRMSRNS as WRMSRNS is no longer
dependent on FRED.
[1] https://lore.kernel.org/lkml/[email protected]/
Use DS prefix to pad WRMSR instead of a NOP. The prefix is ignored. At
least that's the current information from the hardware folks.
Signed-off-by: Andrew Cooper <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
In most cases, ti_work values passed to arch_exit_to_user_mode_prepare()
are zeros, e.g., 99% in kernel build tests. So an obvious optimization is
to test ti_work for zero before processing individual bits in it.
Omit the optimization when FPU debugging is enabled, otherwise the
FPU consistency check is never executed.
Intel 0day tests did not find a performance regression with this change.
Suggested-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Running the ltp test cve-2015-3290 concurrently reports the following
warnings.
perfevents: irq loop stuck!
WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174
intel_pmu_handle_irq+0x285/0x370
Call Trace:
<NMI>
? __warn+0xa4/0x220
? intel_pmu_handle_irq+0x285/0x370
? __report_bug+0x123/0x130
? intel_pmu_handle_irq+0x285/0x370
? __report_bug+0x123/0x130
? intel_pmu_handle_irq+0x285/0x370
? report_bug+0x3e/0xa0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x18/0x50
? asm_exc_invalid_op+0x1a/0x20
? irq_work_claim+0x1e/0x40
? intel_pmu_handle_irq+0x285/0x370
perf_event_nmi_handler+0x3d/0x60
nmi_handle+0x104/0x330
Thanks to Thomas Gleixner's analysis, the issue is caused by the low
initial period (1) of the frequency estimation algorithm, which triggers
the defects of the HW, specifically erratum HSW11 and HSW143. (For the
details, please refer https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/)
The HSW11 requires a period larger than 100 for the INST_RETIRED.ALL
event, but the initial period in the freq mode is 1. The erratum is the
same as the BDM11, which has been supported in the kernel. A minimum
period of 128 is enforced as well on HSW.
HSW143 concerns fixed counter 1 possibly overcounting by 32 when
Hyper-Threading is enabled. However, based on the test, the hardware
has more issues than it tells. Besides the fixed counter 1, the message
'interrupt took too long' can be observed on any counter which was armed
with a period < 32 and two events expired in the same NMI. A minimum
period of 32 is enforced for the rest of the events.
The recommended workaround code for HSW143 is not implemented, because it
only addresses the issue for the fixed counter and brings extra overhead
through additional MSR writes. No related overcounting issue has been
reported so far.
Fixes: 3a632cb229bf ("perf/x86/intel: Add simple Haswell PMU support")
Reported-by: Li Huafei <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/[email protected]
Closes: https://lore.kernel.org/lkml/[email protected]/
|
|
SS is initialized to NULL during boot time and not explicitly set to
__KERNEL_DS.
With FRED enabled, if a kernel event is delivered before a CPU goes to
user level for the first time, its SS is NULL thus NULL is pushed into
the SS field of the FRED stack frame. But before ERETS is executed,
the CPU may context switch to another task and go to user level. Then
when the CPU comes back to kernel mode, SS is changed to __KERNEL_DS.
Later when ERETS is executed to return from the kernel event handler,
a #GP fault is generated because SS doesn't match the SS saved in the
FRED stack frame.
Initialize SS to __KERNEL_DS when enabling FRED to prevent that.
Note, IRET doesn't check if SS matches the SS saved in its stack frame,
thus IDT doesn't have this problem. For IDT it doesn't matter whether
SS is set to __KERNEL_DS or not, because it's set to NULL upon interrupt
or exception delivery and __KERNEL_DS upon SYSCALL. Thus it's pointless
to initialize SS for IDT.
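A minimal sketch of the fix, placed in the FRED enabling path (the function
name is assumed):

	/* In cpu_init_fred_exceptions(), sketch: */
	loadsegment(ss, __KERNEL_DS);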
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Add new PCI IDs for Device 18h and Function 4 to enable the amd_atl driver
on those systems.
Signed-off-by: Richard Gong <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
Commit 15a416e8aaa7 ("x86/entry: Treat BUG/WARN as NMI-like entries")
removed the implementation but left the declaration.
Signed-off-by: Yue Haibing <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
init_per_cpu_var() returns a pointer in the percpu address space while
rip_rel_ptr() expects a pointer in the generic address space.
When strict address space checks are enabled, GCC's named address space
checks fail:
asm.h:124:63: error: passing argument 1 of 'rip_rel_ptr' from
pointer to non-enclosed address space
Add an explicit cast to remove the address space of the returned pointer.
Fixes: 11e36b0f7c21 ("x86/boot/64: Load the final kernel GDT during early boot directly, remove startup_gdt[]")
Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
When SGX is not supported by the BIOS, the kernel log contains the error
'SGX disabled by BIOS', which can be confusing since there might not be an
SGX-related option in the BIOS settings.
For the kernel it's difficult to distinguish between the BIOS not
supporting SGX and the BIOS supporting SGX but having it disabled.
Therefore, update the error message to 'SGX disabled or unsupported by
BIOS' to make it easier for those reading kernel logs to understand what's
happening.
Reported-by: Bo Wu <[email protected]>
Co-developed-by: Zelong Xiang <[email protected]>
Signed-off-by: Zelong Xiang <[email protected]>
Signed-off-by: WangYuli <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Kai Huang <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Closes: https://github.com/linuxdeepin/developer-center/issues/10032
|
|
The current assembly around swap_pages() in relocate_kernel() takes
some time to follow because it is easy to lose track of how registers are
used as the assembly grows long. Add a couple of comments to clarify the
code around swap_pages() to improve readability.
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/all/8b52b0b8513a34b2a02fb4abb05c6700c2821475.1724573384.git.kai.huang@intel.com
|
|
When relocate_kernel() gets called, %rdi holds 'indirection_page' and
%rsi holds 'page_list'. And %rdi always holds 'indirection_page' when
swap_pages() is called.
Therefore the comment of the first line code of swap_pages()
movq %rdi, %rcx /* Put the page_list in %rcx */
.. isn't correct because it actually moves the 'indirection_page' to
the %rcx. Fix it.
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/all/adafdfb1421c88efce04420fc9a996c0e2ca1b34.1724573384.git.kai.huang@intel.com
|
|
Building the SGX code with W=1 generates the following warning:
arch/x86/kernel/cpu/sgx/main.c:741: warning: Function parameter or
struct member 'low' not described in 'sgx_calc_section_metric'
arch/x86/kernel/cpu/sgx/main.c:741: warning: Function parameter or
struct member 'high' not described in 'sgx_calc_section_metric'
...
The function sgx_calc_section_metric() is a simple helper which is only
used in sgx/main.c. There's no need to use kernel-doc style comment for
it.
Downgrade to a normal comment.
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
The area at the 0x0FFFD9 physical location in the PC memory space is
regular memory, traditionally ROM BIOS and more recently a copy of BIOS
code and data in RAM, write-protected.
Therefore use memremap() to get access to it rather than ioremap(),
avoiding issues in virtualization scenarios and complementing changes such
as commit f7750a795687 ("x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use
memremap() for RAM mappings") or commit 5997efb96756 ("x86/boot: Use
memremap() to map the MPF and MPC data").
Reported-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Closes: https://lore.kernel.org/r/[email protected]
|
|
mtrr_bp_restore() has been removed in commit 0b9a6a8bedbf ("x86/mtrr: Add a
stop_machine() handler calling only cache_cpu_init()"), but the declaration
was left behind. Remove it.
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
|
|
After a recent LLVM change that deduces __cold on functions that only call
cold code (such as __init functions), there is a section mismatch warning
from percpu_setup_exception_stacks(), which got moved to .text.unlikely. as
a result of that optimization:
WARNING: modpost: vmlinux: section mismatch in reference:
percpu_setup_exception_stacks+0x3a (section: .text.unlikely.) ->
cea_map_percpu_pages (section: .init.text)
Drop the inline keyword, which does not guarantee inlining, and replace it
with __init, as percpu_setup_exception_stacks() is only called from __init
code, which clears up the warning.
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/20240822-x86-percpu_setup_exception_stacks-init-v1-1-57c5921b8209@kernel.org
|
|
The macros FOUR_ROUNDS_AND_SCHED and DO_4ROUNDS rely on an
unexpected/undocumented behavior of the GNU assembler, which might
change in the future
(https://sourceware.org/bugzilla/show_bug.cgi?id=32073).
M (1) (2) // 1 arg !? Future: 2 args
M 1 + 2 // 1 arg !? Future: 3 args
M 1 2 // 2 args
Add parentheses around the single arguments to support future GNU
assembler and LLVM integrated assembler (when the IsOperator hack from
the following link is dropped).
Link: https://github.com/llvm/llvm-project/commit/055006475e22014b28a070db1bff41ca15f322f0
Signed-off-by: Fangrui Song <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Use native_read_cr*() helpers to read control registers into vmsa->cr*
instead of open-coded assembly.
No functional change intended, unless there was a purpose to specifying
rax.
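A sketch of the resulting assignments (field names follow the VMSA layout
referenced above):

	vmsa->cr0 = native_read_cr0();
	vmsa->cr3 = __native_read_cr3();
	vmsa->cr4 = native_read_cr4();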
Signed-off-by: Yosry Ahmed <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
Message-ID: <[email protected]>
|
|
Modern (fortified) memcpy() prefers to avoid writing (or reading) beyond
the end of the addressed destination (or source) struct member:
In function ‘fortify_memcpy_chk’,
inlined from ‘syscall_get_arguments’ at ./arch/x86/include/asm/syscall.h:85:2,
inlined from ‘populate_seccomp_data’ at kernel/seccomp.c:258:2,
inlined from ‘__seccomp_filter’ at kernel/seccomp.c:1231:3:
./include/linux/fortify-string.h:580:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
580 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As already done for x86_64 and compat mode, do not use memcpy() to
extract syscall arguments from struct pt_regs but rather just perform
direct assignments. Binary output differences are negligible, and actually
ends up using less stack space:
- sub $0x84,%esp
+ sub $0x6c,%esp
and less text size:
text data bss dec hex filename
10794 252 0 11046 2b26 gcc-32b/kernel/seccomp.o.stock
10714 252 0 10966 2ad6 gcc-32b/kernel/seccomp.o.after
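A sketch of the resulting 32-bit syscall_get_arguments() (register-to-argument
mapping as used by the existing x86 syscall ABI):

	static inline void syscall_get_arguments(struct task_struct *task,
						 struct pt_regs *regs,
						 unsigned long *args)
	{
		args[0] = regs->bx;
		args[1] = regs->cx;
		args[2] = regs->dx;
		args[3] = regs->si;
		args[4] = regs->di;
		args[5] = regs->bp;
	}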
Closes: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Mirsad Todorovac <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Tested-by: Mirsad Todorovac <[email protected]>
Link: https://lore.kernel.org/all/20240708202202.work.477-kees%40kernel.org
|
|
If host supports Bus Lock Detect, KVM advertises it to guests even if
SVM support is absent. Additionally, the guest wouldn't be able to use it
despite the guest CPUID bit being set. Fix it by unconditionally clearing
the feature bit in KVM cpu capability.
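A minimal sketch of the fix (clearing the bit in KVM's CPU caps during SVM
setup; the exact location is assumed):

	kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT);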
Reported-by: Jim Mattson <[email protected]>
Closes: https://lore.kernel.org/r/CALMp9eRet6+v8Y1Q-i6mqPm4hUow_kJNhmVHfOV8tMfuSS=tVg@mail.gmail.com
Fixes: 76ea438b4afc ("KVM: X86: Expose bus lock debug exception to guest")
Cc: [email protected]
Signed-off-by: Ravi Bangoria <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
If these MSRs are read by the emulator (e.g. due to the 'force emulation'
prefix), SVM code currently fails to extract the corresponding segment bases
and return them to the emulator.
Fix that.
Cc: [email protected]
Signed-off-by: Maxim Levitsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Grab kvm->srcu when processing KVM_SET_VCPU_EVENTS, as KVM will forcibly
leave nested VMX/SVM if SMM mode is being toggled, and leaving nested VMX
reads guest memory.
Note, kvm_vcpu_ioctl_x86_set_vcpu_events() can also be called from KVM_RUN
via sync_regs(), which already holds SRCU. I.e. trying to precisely use
kvm_vcpu_srcu_read_lock() around the problematic SMM code would cause
problems. Acquiring SRCU isn't all that expensive, so for simplicity,
grab it unconditionally for KVM_SET_VCPU_EVENTS.
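A sketch of the resulting ioctl handling (assuming the surrounding
kvm_arch_vcpu_ioctl() structure):

	case KVM_SET_VCPU_EVENTS: {
		struct kvm_vcpu_events events;

		r = -EFAULT;
		if (copy_from_user(&events, argp, sizeof(struct kvm_vcpu_events)))
			break;

		kvm_vcpu_srcu_read_lock(vcpu);
		r = kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events);
		kvm_vcpu_srcu_read_unlock(vcpu);
		break;
	}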
=============================
WARNING: suspicious RCU usage
6.10.0-rc7-332d2c1d713e-next-vm #552 Not tainted
-----------------------------
include/linux/kvm_host.h:1027 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by repro/1071:
#0: ffff88811e424430 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x7d/0x970 [kvm]
stack backtrace:
CPU: 15 PID: 1071 Comm: repro Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #552
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0x90
lockdep_rcu_suspicious+0x13f/0x1a0
kvm_vcpu_gfn_to_memslot+0x168/0x190 [kvm]
kvm_vcpu_read_guest+0x3e/0x90 [kvm]
nested_vmx_load_msr+0x6b/0x1d0 [kvm_intel]
load_vmcs12_host_state+0x432/0xb40 [kvm_intel]
vmx_leave_nested+0x30/0x40 [kvm_intel]
kvm_vcpu_ioctl_x86_set_vcpu_events+0x15d/0x2b0 [kvm]
kvm_arch_vcpu_ioctl+0x1107/0x1750 [kvm]
? mark_held_locks+0x49/0x70
? kvm_vcpu_ioctl+0x7d/0x970 [kvm]
? kvm_vcpu_ioctl+0x497/0x970 [kvm]
kvm_vcpu_ioctl+0x497/0x970 [kvm]
? lock_acquire+0xba/0x2d0
? find_held_lock+0x2b/0x80
? do_user_addr_fault+0x40c/0x6f0
? lock_release+0xb7/0x270
__x64_sys_ioctl+0x82/0xb0
do_syscall_64+0x6c/0x170
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7ff11eb1b539
</TASK>
Fixes: f7e570780efc ("KVM: x86: Forcibly leave nested virt when SMM state is toggled")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Error out if kvm_mmu_reload() fails when pre-faulting memory, as trying to
fault-in SPTEs will fail miserably due to root.hpa pointing at garbage.
Note, kvm_mmu_reload() can return -EIO and thus trigger the WARN on -EIO
in kvm_vcpu_pre_fault_memory(), but all such paths also WARN, i.e. the
WARN isn't user-triggerable and won't run afoul of warn-on-panic because
the kernel would already be panicking.
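A minimal sketch of the fix in kvm_arch_vcpu_pre_fault_memory() (surrounding
code assumed):

	r = kvm_mmu_reload(vcpu);
	if (r)
		return r;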
BUG: unable to handle page fault for address: 000029ffffffffe8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP
CPU: 22 PID: 1069 Comm: pre_fault_memor Not tainted 6.10.0-rc7-332d2c1d713e-next-vm #548
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:is_page_fault_stale+0x3e/0xe0 [kvm]
RSP: 0018:ffffc9000114bd48 EFLAGS: 00010206
RAX: 00003fffffffffc0 RBX: ffff88810a07c080 RCX: ffffc9000114bd78
RDX: ffff88810a07c080 RSI: ffffea0000000000 RDI: ffff88810a07c080
RBP: ffffc9000114bd78 R08: 00007fa3c8c00000 R09: 8000000000000225
R10: ffffea00043d7d80 R11: 0000000000000000 R12: ffff88810a07c080
R13: 0000000100000000 R14: ffffc9000114be58 R15: 0000000000000000
FS: 00007fa3c9da0740(0000) GS:ffff888277d80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000029ffffffffe8 CR3: 000000011d698000 CR4: 0000000000352eb0
Call Trace:
<TASK>
kvm_tdp_page_fault+0xcc/0x160 [kvm]
kvm_mmu_do_page_fault+0xfb/0x1f0 [kvm]
kvm_arch_vcpu_pre_fault_memory+0xd0/0x1a0 [kvm]
kvm_vcpu_ioctl+0x761/0x8c0 [kvm]
__x64_sys_ioctl+0x82/0xb0
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
Modules linked in: kvm_intel kvm
CR2: 000029ffffffffe8
---[ end trace 0000000000000000 ]---
Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()")
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]
Reviewed-by: Kai Huang <[email protected]>
Tested-by: xingwei lee <[email protected]>
Tested-by: yuxin wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Replace "removed" with "frozen" in comments as appropriate to complete the
rename of REMOVED_SPTE to FROZEN_SPTE.
Fixes: 964cea817196 ("KVM: x86/tdp_mmu: Rename REMOVED_SPTE to FROZEN_SPTE")
Signed-off-by: Yan Zhao <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[sean: write changelog]
Signed-off-by: Sean Christopherson <[email protected]>
|