2018-01-19  kprobes/x86: Blacklist indirect thunk functions for kprobes  (Masami Hiramatsu, 1 file, -1/+2)
Mark the __x86_indirect_thunk_* functions as blacklisted for kprobes, because those functions can be called from anywhere in the kernel, including from kprobes-blacklisted functions. Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: David Woodhouse <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/151629209111.10241.5444852823378068683.stgit@devbox
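The mechanics are simple enough to sketch. A hedged illustration in kernel-style C, assuming the check lands in arch_within_kprobe_blacklist() and uses the thunk markers introduced by the companion patch below:

    /* Reject probes on kprobes' own text and on the indirect-thunk
     * range; __indirect_thunk_start/end come from the marker patch. */
    bool arch_within_kprobe_blacklist(unsigned long addr)
    {
            return (addr >= (unsigned long)__kprobes_text_start &&
                    addr < (unsigned long)__kprobes_text_end) ||
                   (addr >= (unsigned long)__indirect_thunk_start &&
                    addr < (unsigned long)__indirect_thunk_end);
    }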
2018-01-19  retpoline: Introduce start/end markers of indirect thunk  (Masami Hiramatsu, 3 files, -1/+10)
Introduce start/end markers of the __x86_indirect_thunk_* functions. To make this easy, consolidate the .text.__x86.indirect_thunk.* sections into one .text.__x86.indirect_thunk section, put it at the end of the kernel text section, and add __indirect_thunk_start/end so that other subsystems (e.g. kprobes) can identify it. Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: David Woodhouse <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/151629206178.10241.6828804696410044771.stgit@devbox
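The consumer-facing side of the markers is just a pair of linker-provided symbols; a minimal sketch of the declarations other subsystems would use (the exact header location is an assumption):

    /* Defined by the linker script around the consolidated
     * .text.__x86.indirect_thunk section. */
    extern char __indirect_thunk_start[];
    extern char __indirect_thunk_end[];

Any code that needs to classify an address can then compare it against this [start, end) range, as the kprobes patch above does.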
2018-01-19  x86/mce: Make machine check speculation protected  (Thomas Gleixner, 3 files, -1/+7)
The machine check idtentry uses an indirect branch directly from the low level code. This evades the speculation protection. Replace it by a direct call into C code and issue the indirect call there so the compiler can apply the proper speculation protection. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Reviewed-by: David Woodhouse <[email protected]> Niced-by: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801181626290.1847@nanos
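A sketch of the resulting C-level indirection, with names assumed from the changelog (machine_check_vector being the usual x86 MCE handler pointer):

    /* The idtentry stub now makes a *direct* call here; the indirect
     * call through the handler pointer is emitted by the compiler,
     * which can wrap it in a retpoline. */
    dotraplinkage void do_mce(struct pt_regs *regs, long error_code)
    {
            machine_check_vector(regs, error_code);
    }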
2018-01-17  module: Add retpoline tag to VERMAGIC  (Andi Kleen, 1 file, -1/+7)
Add a marker for retpoline to the module VERMAGIC. This catches the case when a non-RETPOLINE-compiled module gets loaded into a retpoline kernel, making it insecure. It doesn't handle the case when retpoline has been runtime disabled. Even in this case the match of the retpoline compile status will be enforced. This implies that even with retpoline run time disabled all modules loaded need to be recompiled. Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Acked-by: David Woodhouse <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
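A minimal sketch of how such a vermagic tag is typically wired up (the macro name follows the existing MODULE_VERMAGIC_* pattern and is an assumption):

    #ifdef CONFIG_RETPOLINE
    #define MODULE_VERMAGIC_RETPOLINE "retpoline "
    #else
    #define MODULE_VERMAGIC_RETPOLINE ""
    #endif
    /* Appended into VERMAGIC_STRING, so a retpoline kernel refuses to
     * load a module whose vermagic lacks the "retpoline " tag. */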
2018-01-17  x86/cpufeature: Move processor tracing out of scattered features  (Paolo Bonzini, 2 files, -2/+1)
Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX), so do not duplicate it in the scattered features word. Besides being more tidy, this will be useful for KVM when it presents processor tracing to the guests. KVM selects host features that are supported by both the host kernel (depending on command line options, CPU errata, or whatever) and KVM. Whenever a full feature word exists, KVM's code is written in the expectation that the CPUID bit number matches the X86_FEATURE_* bit number, but this is not the case for X86_FEATURE_INTEL_PT. Signed-off-by: Paolo Bonzini <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luwei Kang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
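The payoff is that the feature bit position now mirrors CPUID directly; a sketch, assuming word 9 keeps the CPUID(EAX=7,ECX=0):EBX layout:

    /* CPUID(EAX=7,ECX=0):EBX bit 25 == Intel Processor Trace, so the
     * kernel bit number and the hardware bit number coincide. */
    #define X86_FEATURE_INTEL_PT    ( 9*32 + 25)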
2018-01-16  objtool: Improve error message for bad file argument  (Josh Poimboeuf, 1 file, -1/+3)
If a nonexistent file is supplied to objtool, it complains with a non-helpful error:

  open: No such file or directory

Improve it to:

  objtool: Can't open 'foo': No such file or directory

Reported-by: Markus <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
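A plausible shape for the fix (the exact call site inside objtool's ELF handling is an assumption):

    fd = open(name, flags);
    if (fd == -1) {
            fprintf(stderr, "objtool: Can't open '%s': %s\n",
                    name, strerror(errno));
            return NULL;
    }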
2018-01-16  objtool: Fix seg fault with gold linker  (Josh Poimboeuf, 1 file, -4/+10)
Objtool segfaults when the gold linker is used with CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y. With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before being passed to objtool. The gold linker seems to strip unused ELF symbols by default, which confuses objtool and causes the seg fault when it's trying to generate ORC metadata. Objtool should really be running immediately after GCC anyway, without a linker call in between. Change the makefile ordering so that objtool is called before the linker. Reported-and-tested-by: Markus <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2018-01-15  x86/retpoline: Add LFENCE to the retpoline/RSB-filling macros  (Tom Lendacky, 1 file, -1/+5)
The PAUSE instruction is currently used in the retpoline and RSB filling macros as a speculation trap. The use of PAUSE was originally suggested because it showed a very, very small difference in the amount of cycles/time used to execute the retpoline as compared to LFENCE. On AMD, the PAUSE instruction is not a serializing instruction, so the pause/jmp loop will use excess power as it is speculated over waiting for return to mispredict to the correct target. The RSB filling macro is applicable to AMD, and, if software is unable to verify that LFENCE is serializing on AMD (possible when running under a hypervisor), the generic retpoline support will be used and, so, is also applicable to AMD. Keep the current usage of PAUSE for Intel, but add an LFENCE instruction to the speculation trap for AMD. The same sequence has been adopted by GCC for the GCC generated retpolines. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Acked-by: David Woodhouse <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tim Chen <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Dan Williams <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
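For reference, a hedged sketch of the thunk with the added LFENCE (the real code is an asm macro in the retpoline headers; the label layout here is illustrative):

    __x86_indirect_thunk_rax:
            call 1f            /* pushes 2: as the predicted return     */
    2:      pause              /* speculation is trapped here ...       */
            lfence             /* ... and serialized on AMD (this fix)  */
            jmp 2b
    1:      mov %rax, (%rsp)   /* retarget the return address           */
            ret                /* architecturally jumps to *%rax        */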
2018-01-15  x86/retpoline: Fill RSB on context switch for affected CPUs  (David Woodhouse, 4 files, -0/+59)
On context switch from a shallow call stack to a deeper one, as the CPU does 'ret' up the deeper side it may encounter RSB entries (predictions for where the 'ret' goes to) which were populated in userspace. This is problematic if neither SMEP nor KPTI (the latter of which marks userspace pages as NX for the kernel) are active, as malicious code in userspace may then be executed speculatively. Overwrite the CPU's return prediction stack with calls which are predicted to return to an infinite loop, to "capture" speculation if this happens. This is required both for retpoline, and also in conjunction with IBRS for !SMEP && !KPTI. On Skylake+ the problem is slightly different, and an *underflow* of the RSB may cause errant branch predictions to occur. So there it's not so much overwrite, as *filling* the RSB to attempt to prevent it getting empty. This is only a partial solution for Skylake+ since there are many other conditions which may result in the RSB becoming empty. The full solution on Skylake+ is to use IBRS, which will prevent the problem even when the RSB becomes empty. With IBRS, the RSB-stuffing will not be required on context switch. [ tglx: Added missing vendor check and slighty massaged comments and changelog ] Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
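Conceptually the stuffing is a loop of calls whose predicted return target is a trap; a simplified sketch (the real macro is unrolled and patched in via alternatives on the affected-CPU feature bits):

            mov $32, %ecx
    1:      call 2f            /* plants one RSB entry pointing at 3:  */
    3:      pause              /* speculation capture loop             */
            lfence
            jmp 3b
    2:      dec %ecx
            jnz 1b
            add $(32*8), %rsp  /* reclaim the 32 pushed return slots   */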
2018-01-15  x86/kasan: Panic if there is not enough memory to boot  (Andrey Ryabinin, 1 file, -10/+14)
Currently KASAN doesn't panic when it doesn't have enough memory to boot. Instead, it crashes in some random place:

  kernel BUG at arch/x86/mm/physaddr.c:27!
  RIP: 0010:__phys_addr+0x268/0x276
  Call Trace:
   kasan_populate_shadow+0x3f2/0x497
   kasan_init+0x12e/0x2b2
   setup_arch+0x2825/0x2a2c
   start_kernel+0xc8/0x15f4
   x86_64_start_reservations+0x2a/0x2c
   x86_64_start_kernel+0x72/0x75
   secondary_startup_64+0xa5/0xb0

Use memblock_virt_alloc_try_nid() for the allocations instead; it has no failure fallback and will panic with an out-of-memory message. Reported-by: kernel test robot <[email protected]> Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Dmitry Vyukov <[email protected]> Cc: [email protected] Cc: Alexander Potapenko <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
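A sketch of the allocation change, assuming the 4.15-era memblock API:

    /* memblock_virt_alloc_try_nid() has no failure fallback: it panics
     * with an out-of-memory message instead of returning NULL. */
    shadow = memblock_virt_alloc_try_nid(size, PAGE_SIZE,
                                         0, MEMBLOCK_ALLOC_ACCESSIBLE,
                                         NUMA_NO_NODE);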
2018-01-14  x86/retpoline: Remove compile time warning  (Thomas Gleixner, 1 file, -2/+0)
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler does not have retpoline support. Linus' rationale for this is:

  It's wrong because it will just make people turn off RETPOLINE, and the asm updates - and return stack clearing - that are independent of the compiler are likely the most important parts because they are likely the ones easiest to target.

  And it's annoying because most people won't be able to do anything about it. The number of people building their own compiler? Very small. So if their distro hasn't got a compiler yet (and pretty much nobody does), the warning is just annoying crap.

It is already properly reported as part of the sysfs interface. The compile-time warning only encourages bad things. Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support") Requested-by: Linus Torvalds <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
2018-01-14  x86,perf: Disable intel_bts when PTI  (Peter Zijlstra, 1 file, -0/+18)
The intel_bts driver does not use the 'normal' BTS buffer which is exposed through the cpu_entry_area but instead uses the memory allocated for the perf AUX buffer. This obviously comes apart when using PTI, because the kernel mapping, which includes that AUX buffer memory, then disappears. Fixing this requires exposing a mapping which is visible in all contexts, and that's not trivial. As a quick fix, disable this driver when PTI is enabled to prevent malfunction. Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig") Reported-by: Vince Weaver <[email protected]> Reported-by: Robert Święcki <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-14  security/Kconfig: Correct the Documentation reference for PTI  (W. Trevor King, 1 file, -1/+1)
When the config option for PTI was added a reference to documentation was added as well. But the documentation did not exist at that point. The final documentation has a different file name. Fix it up to point to the proper file. Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig") Signed-off-by: W. Trevor King <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: [email protected] Cc: [email protected] Cc: James Morris <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/3009cc8ccbddcd897ec1e0cb6dda524929de0d14.1515799398.git.wking@tremily.us
2018-01-14  x86/pti: Fix !PCID and sanitize defines  (Thomas Gleixner, 3 files, -21/+23)
The switch to the user space page tables in the low level ASM code unconditionally sets bit 12 and bit 11 of CR3. Bit 12 is switching the base address of the page directory to the user part, bit 11 is switching the PCID to the PCID associated with the user page tables. This fails on a machine which lacks PCID support because bit 11 is set in CR3. Bit 11 is reserved when PCID is inactive. While the Intel SDM claims that the reserved bits are ignored when PCID is disabled, the AMD APM states that they should be cleared. This went unnoticed as the AMD APM was not checked when the code was developed and reviewed, and test systems with Intel CPUs never failed to boot. The report is against a CentOS 6 host where the guest fails to boot, so it's not yet clear whether this is a virt issue or can happen on real hardware too, but that's irrelevant as the AMD APM clearly asks for clearing the reserved bits. Make sure that on non-PCID machines bit 11 is not set by the page table switching code. Andy suggested renaming the related bits and masks so they clearly describe what they should be used for, which is done as well for clarity. That split could have been done with alternatives but the macro hell is horrible and ugly. This can be done on top if someone cares to remove the extra orq. For now it's a straightforward fix. Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") Reported-by: Laura Abbott <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: stable <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Willy Tarreau <[email protected]> Cc: David Woodhouse <[email protected]> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
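A sketch of the sanitized defines along the lines the changelog describes (exact names are assumptions):

    /* Bit 12 selects the user half of a PTI PGD pair; bit 11 selects
     * the user PCID and is reserved when CR4.PCIDE is clear. */
    #define PTI_USER_PGTABLE_BIT    PAGE_SHIFT      /* bit 12 */
    #define PTI_USER_PCID_BIT       11
    #define PTI_USER_PGTABLE_MASK   (1UL << PTI_USER_PGTABLE_BIT)
    #define PTI_USER_PCID_MASK      (1UL << PTI_USER_PCID_BIT)

The page-table switch code can then avoid setting PTI_USER_PCID_MASK on CPUs that lack PCID.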
2018-01-13  selftests/x86: Add test_vsyscall  (Andy Lutomirski, 2 files, -1/+501)
This tests that the vsyscall entries do what they're expected to do. It also confirms that attempts to read the vsyscall page behave as expected. If changes are made to the vsyscall code or its memory map handling, running this test in all three of vsyscall=none, vsyscall=emulate, and vsyscall=native is helpful. (Because it's easy, this also compares the vsyscall results to their vDSO equivalents.) Note to KAISER backporters: please test this under all three vsyscall modes. Also, in the emulate and native modes, make sure that test_vsyscall_64 agrees with the command line or config option as to which mode you're in. It's quite easy to mess up the kernel such that native mode accidentally emulates or vice versa. Greg, etc: please backport this to all your Meltdown-patched kernels. It'll help make sure the patches didn't regress vsyscalls. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2018-01-12  x86/retpoline: Fill return stack buffer on vmexit  (David Woodhouse, 3 files, -1/+85)
In accordance with the Intel and AMD documentation, we need to overwrite all entries in the RSB on exiting a guest, to prevent malicious branch target predictions from affecting the host kernel. This is needed both for retpoline and for IBRS. [ak: numbers again for the RSB stuffing labels] Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  x86/retpoline/irq32: Convert assembler indirect jumps  (Andi Kleen, 1 file, -4/+5)
Convert all indirect jumps in 32bit irq inline asm code to use non-speculative sequences. Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  x86/retpoline/checksum32: Convert assembler indirect jumps  (David Woodhouse, 1 file, -3/+4)
Convert all indirect jumps in 32bit checksum assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  x86/retpoline/xen: Convert Xen hypercall indirect jumps  (David Woodhouse, 1 file, -2/+3)
Convert the indirect call in the Xen hypercall to use a non-speculative sequence when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Acked-by: Ingo Molnar <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  x86/retpoline/hyperv: Convert assembler indirect jumps  (David Woodhouse, 1 file, -8/+10)
Convert all indirect jumps in hyperv inline asm code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  x86/retpoline/ftrace: Convert ftrace assembler indirect jumps  (David Woodhouse, 2 files, -6/+8)
Convert all indirect jumps in ftrace assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  x86/retpoline/entry: Convert entry assembler indirect jumps  (David Woodhouse, 2 files, -5/+12)
Convert indirect jumps in core 32/64bit entry assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return address after the 'call' instruction must be *precisely* at the .Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work, and the use of alternatives will mess that up unless we play horrid games to prepend with NOPs and make the variants the same length. It's not worth it; in the case where we ALTERNATIVE out the retpoline, the first instruction at __x86.indirect_thunk.rax is going to be a bare jmp *%rax anyway. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Ingo Molnar <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  x86/retpoline/crypto: Convert crypto assembler indirect jumps  (David Woodhouse, 4 files, -5/+9)
Convert all indirect jumps in crypto assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  x86/spectre: Add boot time option to select Spectre v2 mitigation  (David Woodhouse, 4 files, -5/+195)
Add a spectre_v2= option to select the mitigation used for the indirect branch speculation vulnerability. Currently, the only option available is retpoline, in its various forms. This will be expanded to cover the new IBRS/IBPB microcode features. The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a serializing instruction, which is indicated by the LFENCE_RDTSC feature. [ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS integration becomes simple ] Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  x86/retpoline: Add initial retpoline support  (David Woodhouse, 8 files, -0/+231)
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide the corresponding thunks. Provide assembler macros for invoking the thunks in the same way that GCC does, from native and inline assembler. This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In some circumstances, IBRS microcode features may be used instead, and the retpoline can be disabled. On AMD CPUs if lfence is serialising, the retpoline can be dramatically simplified to a simple "lfence; jmp *\reg". A future patch, after it has been verified that lfence really is serialising in all circumstances, can enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition to X86_FEATURE_RETPOLINE. Do not align the retpoline in the altinstr section, because there is no guarantee that it stays aligned when it's copied over the oldinstr during alternative patching. [ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks] [ tglx: Put actual function CALL/JMP in front of the macros, convert to symbolic labels ] [ dwmw2: Convert back to numeric labels, merge objtool fixes ] Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
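On the C side, the macros added here are used roughly as below (a hedged usage sketch; CALL_NOSPEC and THUNK_TARGET are the names the series introduces for inline asm, and clobbers are simplified for brevity):

    /* Without retpoline this degrades to "call *%[thunk_target]";
     * with X86_FEATURE_RETPOLINE it calls through the thunk. */
    static inline unsigned long indirect_call(void *target)
    {
            unsigned long ret;

            asm volatile(CALL_NOSPEC
                         : "=a" (ret)
                         : THUNK_TARGET(target)
                         : "memory");
            return ret;
    }

The hyperv and xen conversions earlier in this log follow this pattern for their hypercall pages.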
2018-01-12  objtool: Allow alternatives to be ignored  (Josh Poimboeuf, 2 files, -7/+57)
Getting objtool to understand retpolines is going to be a bit of a challenge. For now, take advantage of the fact that retpolines are patched in with alternatives. Just read the original (sane) non-alternative instruction, and ignore the patched-in retpoline. This allows objtool to understand the control flow *around* the retpoline, even if it can't yet follow what's inside. This means the ORC unwinder will fail to unwind from inside a retpoline, but will work fine otherwise. Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-12  objtool: Detect jumps to retpoline thunks  (Josh Poimboeuf, 1 file, -0/+7)
A direct jump to a retpoline thunk is really an indirect jump in disguise. Change the objtool instruction type accordingly. Objtool needs to know where indirect branches are so it can detect switch statement jump tables. This fixes a bunch of warnings with CONFIG_RETPOLINE like:

  arch/x86/events/intel/uncore_nhmex.o: warning: objtool: nhmex_rbox_msr_enable_event()+0x44: sibling call from callable instruction with modified stack frame
  kernel/signal.o: warning: objtool: copy_siginfo_to_user()+0x91: sibling call from callable instruction with modified stack frame
  ...

Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
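A sketch of the detection, assuming it keys off the relocation symbol's name:

    /* A direct jmp relocated against a retpoline thunk is semantically
     * an indirect jump; reclassify it so switch-table detection works. */
    if (rela && !strncmp(rela->sym->name, "__x86_indirect_thunk_", 21))
            insn->type = INSN_JUMP_DYNAMIC;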
2018-01-11  x86/pti: Make unpoison of pgd for trusted boot work for real  (Dave Hansen, 1 file, -1/+11)
The initial fix for trusted boot and PTI potentially misses the pgd clearing if pud_alloc() sets a PGD. It probably works in *practice* because for two adjacent calls to map_tboot_page() that share a PGD entry, the first will clear NX, *then* allocate and set the PGD (without NX clear). The second call will *not* allocate but will clear the NX bit. Defer the NX clearing to a point after it is known that all top-level allocations have occurred. Add a comment to clarify why. [ tglx: Massaged changelog ] Fixes: 262b6b30087 ("x86/tboot: Unbreak tboot with PTI enabled") Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: Jon Masters <[email protected]> Cc: "Tim Chen" <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-10  x86/alternatives: Fix optimize_nops() checking  (Borislav Petkov, 1 file, -2/+5)
The alternatives code checks only whether the first byte is a NOP, but NOPs in front of the payload followed by actual instructions break the "optimized" NOP test. Make sure to scan all bytes before deciding to optimize the NOPs in there. Reported-by: David Woodhouse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Tim Chen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andrew Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-09  sysfs/cpu: Fix typos in vulnerability documentation  (David Woodhouse, 1 file, -2/+2)
Fixes: 87590ce6e ("sysfs/cpu: Add vulnerability folder") Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2018-01-09  x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC  (Tom Lendacky, 2 files, -2/+17)
With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference to MFENCE_RDTSC. However, since the kernel could be running under a hypervisor that does not support writing that MSR, read the MSR back and verify that the bit has been set successfully. If the MSR can be read and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the MFENCE_RDTSC feature. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tim Chen <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dan Williams <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
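A sketch of the set-and-verify sequence described above (MSR and bit names assumed from the companion patch below; the read-back exists because a hypervisor may silently discard the write):

    /* MSR 0xc0011029 (DE_CFG), bit 1: make LFENCE serializing. */
    msr_set_bit(MSR_F10H_DECFG, MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);

    rdmsrl(MSR_F10H_DECFG, val);
    if (val & MSR_F10H_DECFG_LFENCE_SERIALIZE)
            set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
    else
            set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);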
2018-01-09  x86/cpu/AMD: Make LFENCE a serializing instruction  (Tom Lendacky, 2 files, -0/+12)
To aid in speculation control, make LFENCE a serializing instruction since it has less overhead than MFENCE. This is done by setting bit 1 of MSR 0xc0011029 (DE_CFG). Some families that support LFENCE do not have this MSR. For these families, the LFENCE instruction is already serializing. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tim Chen <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dan Williams <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-08  x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*()  (Jike Song, 1 file, -26/+6)
The following code contains dead logic:

  162 if (pgd_none(*pgd)) {
  163         unsigned long new_p4d_page = __get_free_page(gfp);
  164         if (!new_p4d_page)
  165                 return NULL;
  166
  167         if (pgd_none(*pgd)) {
  168                 set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
  169                 new_p4d_page = 0;
  170         }
  171         if (new_p4d_page)
  172                 free_page(new_p4d_page);
  173 }

There can't be any difference between the two pgd_none(*pgd) tests at L162 and L167, so the condition at L171 is always false. Dave Hansen explained:

  Yes, the double-test was part of an optimization where we attempted to avoid using a global spinlock in the fork() path. We would check for unallocated mid-level page tables without the lock. The lock was only taken when we needed to *make* an entry to avoid collisions.

  Now that it is all single-threaded, there is no chance of a collision, no need for a lock, and no need for the re-check.

As all these functions are only called during init, mark them __init as well. Fixes: 03f4424f348e ("x86/mm/pti: Add functions to clone kernel PMDs") Signed-off-by: Jike Song <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Alan Cox <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tim Chen <[email protected]> Cc: Jiri Koshina <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Kees Cook <[email protected]> Cc: Andi Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg KH <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Paul Turner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-08  x86/tboot: Unbreak tboot with PTI enabled  (Dave Hansen, 1 file, -0/+1)
This is another case similar to what EFI does: create a new set of page tables, map some code at a low address, and jump to it. PTI mistakes this low address for userspace and mistakenly marks it non-executable in an effort to make it unusable for userspace. Undo the poison to allow execution. Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig") Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Alan Cox <[email protected]> Cc: Tim Chen <[email protected]> Cc: Jon Masters <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jeff Law <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: David <[email protected]> Cc: Nick Clifton <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-08  x86/cpu: Implement CPU vulnerabilities sysfs functions  (Thomas Gleixner, 2 files, -0/+30)
Implement the CPU vulnerability show functions for meltdown, spectre_v1 and spectre_v2. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Woodhouse <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-08  sysfs/cpu: Add vulnerability folder  (Thomas Gleixner, 4 files, -0/+74)
As the meltdown/spectre problem affects several CPU architectures, it makes sense to have a common way to express whether a system is affected by a particular vulnerability or not. If affected, the way to express the mitigation should be common as well. Create the /sys/devices/system/cpu/vulnerabilities folder and files for meltdown, spectre_v1 and spectre_v2. Allow architectures to override the show function. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Woodhouse <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
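The generic plumbing boils down to weak default show functions that architectures can override; a hedged sketch for one of the three files:

    /* Default: an architecture that does not override this is
     * reported as not affected. */
    ssize_t __weak cpu_show_meltdown(struct device *dev,
                                     struct device_attribute *attr,
                                     char *buf)
    {
            return sprintf(buf, "Not affected\n");
    }
    static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);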
2018-01-06  x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]  (David Woodhouse, 2 files, -0/+5)
Add the bug bits for spectre v1/2 and force them unconditionally for all cpus. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-06  x86/Documentation: Add PTI description  (Dave Hansen, 2 files, -7/+200)
Add some details about how PTI works, what some of the downsides are, and how to debug it when things go wrong. Also document the kernel parameter: 'pti/nopti'. Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Moritz Lipp <[email protected]> Cc: Daniel Gruss <[email protected]> Cc: Michael Schwarz <[email protected]> Cc: Richard Fellner <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Andi Lutomirsky <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-06  x86/pti: Unbreak EFI old_memmap  (Jiri Kosina, 1 file, -0/+2)
EFI_OLD_MEMMAP's efi_call_phys_prolog() calls set_pgd() with a swapper PGD that has _PAGE_USER set, which makes PTI set NX on it, and therefore EFI can't execute its code. Fix that by forcefully clearing _PAGE_NX from the PGD (this can't be done by the pgprot API). _PAGE_NX will be automatically reintroduced in efi_call_phys_epilog(), as _set_pgd() will again notice that this is _PAGE_USER, and set _PAGE_NX on it. Tested-by: Dimitri Sivanich <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2018-01-05  x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN  (Thomas Gleixner, 3 files, -5/+5)
Use the name associated with the particular attack which needs page table isolation for mitigation. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: David Woodhouse <[email protected]> Cc: Alan Cox <[email protected]> Cc: Jiri Koshina <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Tim Chen <[email protected]> Cc: Andi Lutomirski <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Turner <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Greg KH <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos
2018-01-05  x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm  (David Woodhouse, 1 file, -2/+2)
Where an ALTERNATIVE is used in the middle of an inline asm block, this would otherwise lead to the following instruction being appended directly to the trailing ".popsection", and a failed compile. Fixes: 9cebed423c84 ("x86, alternative: Use .pushsection/.popsection") Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: [email protected] Cc: Tim Chen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Turner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
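A hypothetical illustration of the failure mode (dst/src are placeholders): C string concatenation in inline asm glues the next instruction onto the macro's final ".popsection" unless the expansion ends with a newline:

    /* Without a trailing '\n' inside ALTERNATIVE(), the assembler sees
     * ".popsectionmov %1, %0" on one line and rejects it. */
    asm volatile(ALTERNATIVE("", "lfence", X86_FEATURE_LFENCE_RDTSC)
                 "mov %1, %0"
                 : "=r" (dst)
                 : "m" (src));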
2018-01-05  x86/tlb: Drop the _GPL from the cpu_tlbstate export  (Thomas Gleixner, 1 file, -1/+1)
The recent changes for PTI touch cpu_tlbstate from various tlb_flush inlines. cpu_tlbstate is exported as GPL symbol, so this causes a regression when building out of tree drivers for certain graphics cards. Aside of that the export was wrong since it was introduced as it should have been EXPORT_PER_CPU_SYMBOL_GPL(). Use the correct PER_CPU export and drop the _GPL to restore the previous state which allows users to utilize the cards they paid for. As always I'm really thrilled to make this kind of change to support the #friends (or however the hot hashtag of today is spelled) from that closet sauce graphics corp. Fixes: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") Reported-by: Kees Cook <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: [email protected]
2018-01-05  x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers  (Peter Zijlstra, 1 file, -0/+16)
Thomas reported the following warning:

  BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
  caller is native_flush_tlb_single+0x57/0xc0
  native_flush_tlb_single+0x57/0xc0
  __set_pte_vaddr+0x2d/0x40
  set_pte_vaddr+0x2f/0x40
  cea_set_pte+0x30/0x40
  ds_update_cea.constprop.4+0x4d/0x70
  reserve_ds_buffers+0x159/0x410
  x86_reserve_hardware+0x150/0x160
  x86_pmu_event_init+0x3e/0x1f0
  perf_try_init_event+0x69/0x80
  perf_event_alloc+0x652/0x740
  SyS_perf_event_open+0x3f6/0xd60
  do_syscall_64+0x5c/0x190

set_pte_vaddr is used to map the ds buffers into the cpu entry area, but there are two problems with that:

  1) The resulting flush is not supposed to be called in preemptible context.

  2) The cpu entry area is supposed to be per CPU, but the debug store buffers are mapped for all CPUs so these mappings need to be flushed globally.

Add the necessary preemption protection across the mapping code and flush TLBs globally. Fixes: c1961a4631da ("x86/events/intel/ds: Map debug buffers in cpu_entry_area") Reported-by: Thomas Zeitlhofer <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Thomas Zeitlhofer <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-05  x86/kaslr: Fix the vaddr_end mess  (Thomas Gleixner, 3 files, -24/+22)
vaddr_end for KASLR is only documented in the KASLR code itself and is adjusted depending on config options. So it's not surprising that a change of the memory layout causes KASLR to have the wrong vaddr_end. This can map arbitrary stuff into other areas causing hard to understand problems. Remove the whole ifdef magic and define the start of the cpu_entry_area to be the end of the KASLR vaddr range. Add documentation to that effect. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Reported-by: Benjamin Gilbert <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Benjamin Gilbert <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: stable <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Garnier <[email protected]>, Cc: Alexander Kuleshov <[email protected]> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
2018-01-04  x86/mm: Map cpu_entry_area at the same place on 4/5 level  (Thomas Gleixner, 3 files, -6/+7)
There is no reason for 4 and 5 level pagetables to have a different layout. It just makes determining vaddr_end for KASLR harder than necessary. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Signed-off-by: Thomas Gleixner <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Gilbert <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: stable <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Garnier <[email protected]>, Cc: Alexander Kuleshov <[email protected]> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
2018-01-04  x86/mm: Set MODULES_END to 0xffffffffff000000  (Andrey Ryabinin, 2 files, -5/+2)
Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") kasan_mem_to_shadow(MODULES_END) may not be aligned to a page boundary, so passing a page-unaligned address to kasan_populate_zero_shadow() has two possible effects:

1) It may leave a one-page hole in the supposedly populated area. After commit 21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that hole happens to be in the shadow covering the fixmap area and leads to a crash:

  BUG: unable to handle kernel paging request at fffffbffffe8ee04
  RIP: 0010:check_memory_region+0x5c/0x190
  Call Trace:
   <NMI>
   memcpy+0x1f/0x50
   ghes_copy_tofrom_phys+0xab/0x180
   ghes_read_estatus+0xfb/0x280
   ghes_notify_nmi+0x2b2/0x410
   nmi_handle+0x115/0x2c0
   default_do_nmi+0x57/0x110
   do_nmi+0xf8/0x150
   end_repeat_nmi+0x1a/0x1e

Note, the crash likely disappeared after commit 92a0f81d8957, which changed the kasan_populate_zero_shadow() call back to the way it was before commit 21506525fb8d.

2) An attempt to load a module near MODULES_END will fail, because __vmalloc_node_range(), called from kasan_module_alloc(), will hit the WARN_ON(!pte_none(*pte)) in vmap_pte_range() and bail out with an error.

To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned, which means that MODULES_END should be 8*PAGE_SIZE aligned. The whole point of commit f06bdd4001c2 was to move MODULES_END down if NR_CPUS is big, so the cpu_entry_area takes a lot of space. But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") the cpu_entry_area is no longer in the fixmap, so we can just set MODULES_END to a fixed, 8*PAGE_SIZE-aligned address. Fixes: f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") Reported-by: Jakub Kicinski <[email protected]> Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Andy Lutomirski <[email protected]> Cc: Thomas Garnier <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-03  x86/process: Define cpu_tss_rw in same section as declaration  (Nick Desaulniers, 1 file, -1/+1)
cpu_tss_rw is declared with DECLARE_PER_CPU_PAGE_ALIGNED but then defined with DEFINE_PER_CPU_SHARED_ALIGNED leading to section mismatch warnings. Use DEFINE_PER_CPU_PAGE_ALIGNED consistently. This is necessary because it's mapped to the cpu entry area and must be page aligned. [ tglx: Massaged changelog a bit ] Fixes: 1a935bc3d4ea ("x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct") Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Borislav Petkov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-03  x86/pti: Switch to kernel CR3 early in entry_SYSCALL_compat()  (Thomas Gleixner, 1 file, -7/+6)
The preparation for PTI which added CR3 switching to the entry code misplaced the CR3 switch in entry_SYSCALL_compat(). With PTI enabled the entry code tries to access a per cpu variable after switching to kernel GS. This fails because that variable is not mapped to user space. This results in a double fault and in the worst case a kernel crash. Move the switch ahead of the access and clobber RSP which has been saved already. Fixes: 8a09317b895f ("x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching") Reported-by: Lars Wendler <[email protected]> Reported-by: Laura Abbott <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Greg KH <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031949200.1957@nanos
2018-01-03  x86/dumpstack: Print registers for first stack frame  (Josh Poimboeuf, 1 file, -1/+2)
In the stack dump code, if the frame after the starting pt_regs is also a regs frame, the registers don't get printed. Fix that. Reported-by: Andy Lutomirski <[email protected]> Tested-by: Alexander Tsoy <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Toralf Förster <[email protected]> Cc: [email protected] Fixes: 3b3fa11bc700 ("x86/dumpstack: Print any pt_regs found on the stack") Link: http://lkml.kernel.org/r/396f84491d2f0ef64eda4217a2165f5712f6a115.1514736742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2018-01-03  x86/dumpstack: Fix partial register dumps  (Josh Poimboeuf, 3 files, -13/+34)
The show_regs_safe() logic is wrong. When there's an iret stack frame, it prints the entire pt_regs -- most of which is random stack data -- instead of just the five registers at the end. show_regs_safe() is also poorly named: the on_stack() checks aren't for safety. Rename the function to show_regs_if_on_stack() and add a comment to explain why the checks are needed. These issues were introduced with the "partial register dump" feature of the following commit: b02fcf9ba121 ("x86/unwinder: Handle stack overflows more gracefully") That patch had gone through a few iterations of development, and the above issues were artifacts from a previous iteration of the patch where 'regs' pointed directly to the iret frame rather than to the (partially empty) pt_regs. Tested-by: Alexander Tsoy <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Toralf Förster <[email protected]> Cc: [email protected] Fixes: b02fcf9ba121 ("x86/unwinder: Handle stack overflows more gracefully") Link: http://lkml.kernel.org/r/5b05b8b344f59db2d3d50dbdeba92d60f2304c54.1514736742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>