Age | Commit message (Collapse) | Author | Files | Lines |
|
Because FRED always restores the full value of %rsp, ESPFIX is
no longer needed when it's enabled.
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
A call is made to arch_get_random_longs() and rdtsc(), rather than just
using get_random_long(), because this was written during a time when
very early boot would give abysmal entropy. These days, a call to
get_random_long() at early boot will incorporate RDRAND, RDTSC, and
more, without having to do anything bespoke.
In fact, the situation is now such that on the majority of x86 systems,
the pool actually is initialized at this point, even though it doesn't
need to be for get_random_long() to still return something better than
what this function currently does.
So simplify this to just call get_random_long() instead.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The archrandom interface was originally designed for x86, which supplies
RDRAND/RDSEED for receiving random words into registers, resulting in
one function to generate an int and another to generate a long. However,
other architectures don't follow this.
On arm64, the SMCCC TRNG interface can return between one and three
longs. On s390, the CPACF TRNG interface can return arbitrary amounts,
with four longs having the same cost as one. On UML, the os_getrandom()
interface can return arbitrary amounts.
So change the api signature to take a "max_longs" parameter designating
the maximum number of longs requested, and then return the number of
longs generated.
Since callers need to check this return value and loop anyway, each arch
implementation does not bother implementing its own loop to try again to
fill the maximum number of longs. Additionally, all existing callers
pass in a constant max_longs parameter. Taken together, these two things
mean that the codegen doesn't really change much for one-word-at-a-time
platforms, while performance is greatly improved on platforms such as
s390.
Acked-by: Heiko Carstens <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.
Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 263 file(s).
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Allison Randal <[email protected]>
Reviewed-by: Alexios Zavras <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The "normal" kernel page table creation mechanisms using
PAGE_KERNEL_* page protections will never set _PAGE_GLOBAL with PTI.
The few places in the kernel that always want _PAGE_GLOBAL must
avoid using PAGE_KERNEL_*.
Document that we want it here and its use is not accidental.
Signed-off-by: Dave Hansen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Convert all allocations that used a NOTRACK flag to stop using it.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tim Hansen <[email protected]>
Cc: Vegard Nossum <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Changes to the existing page table macros will allow the SME support to
be enabled in a simple fashion with minimal changes to files that use these
macros. Since the memory encryption mask will now be part of the regular
pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
_KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
without the encryption mask before SME becomes active. Two new pgprot()
macros are defined to allow setting or clearing the page encryption mask.
The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO. SME does
not support encryption for MMIO areas so this define removes the encryption
mask from the page attribute.
Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
creating a physical address with the encryption mask. These are used when
working with the cr3 register so that the PGD can be encrypted. The current
__va() macro is updated so that the virtual address is generated based off
of the physical address without the encryption mask thus allowing the same
virtual address to be generated regardless of whether encryption is enabled
for that physical location or not.
Also, an early initialization function is added for SME. If SME is active,
this function:
- Updates the early_pmd_flags so that early page faults create mappings
with the encryption mask.
- Updates the __supported_pte_mask to include the encryption mask.
- Updates the protection_map entries to include the encryption mask so
that user-space allocations will automatically have the encryption mask
applied.
Signed-off-by: Tom Lendacky <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Toshimitsu Kani <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
With CONFIG_X86_5LEVEL=y, level 4 is no longer top level of page tables.
Let's give these variable more generic names: init_top_pgt and
early_top_pgt.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
We don't need extra virtual address space for ESPFIX, so it stays within
one PUD page table for both 4- and 5-level paging.
Redefining ESPFIX_BASE_ADDR using P4D_SHIFT instead of PGDIR_SHIFT would
make it stay in the same place regarding of paging mode.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.
PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this
flag is for more than order-0. This means that this flag has never been
actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: kvm ML <[email protected]>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that the ->read_tsc() paravirt hook is gone, rdtscll() is
just a wrapper around native_read_tsc(). Unwrap it.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: kvm ML <[email protected]>
Link: http://lkml.kernel.org/r/d2449ae62c1b1fb90195bcfb19ef4a35883a04dc.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
As we alloc pages with GFP_KERNEL in init_espfix_ap() which is
called before we enable local irqs, so the lockdep sub-system
would (correctly) trigger a warning about the potentially
blocking API.
So we allocate them on the boot CPU side when the secondary CPU is
brought up by the boot CPU, and hand them over to the secondary
CPU.
And we use alloc_pages_node() with the secondary CPU's node, to
make sure the espfix stack is NUMA-local to the CPU that is
going to use it.
Signed-off-by: Zhu Guihua <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/c97add2670e9abebb90095369f0cfc172373ac94.1435824469.git.zhugh.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add a CPU index parameter to init_espfix_ap(), so that the
parameter could be propagated to the function for espfix
page allocation.
Signed-off-by: Zhu Guihua <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/cde3fcf1b3211f3f03feb1a995bce3fee850f0fc.1435824469.git.zhugh.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Previous versions of the espfix had a single function which did setup
the pagetables. It was later split into BSP and AP version. Drop unused
leftovers after that split.
Signed-off-by: Borislav Petkov <[email protected]>
|
|
init_espfix_ap() is currently off by one level when informing hypervisor
that allocated pages will be used for ministacks' page tables.
The most immediate effect of this on a PV guest is that if
'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor
will refuse to use it for a page table (which it shouldn't be anyway). This will
result in warnings by both Xen and Linux.
More importantly, a subsequent write to that page (again, by a PV guest) is
likely to result in fatal page fault.
Signed-off-by: Boris Ostrovsky <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: <[email protected]>
|
|
Sparse warns that the percpu variables aren't declared before they are
defined. Rather than hacking around it, move espfix definitions into
a proper header file.
Reported-by: Fengguang Wu <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer. This
causes some 16-bit software to break, but it also leaks kernel state
to user space. We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.
In checkin:
b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.
This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart. When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace. The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.
(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)
Special thanks to:
- Andy Lutomirski, for the suggestion of using very small stack slots
and copy (as opposed to map) the IRET frame there, and for the
suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.
Reported-by: Brian Gerst <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andrew Lutomriski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Dirk Hohndel <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: comex <[email protected]>
Cc: Alexander van Heukelum <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: <[email protected]> # consider after upstream merge
|