aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-01-31x86/paravirt: Remove 'noreplace-paravirt' cmdline optionJosh Poimboeuf2-16/+0
The 'noreplace-paravirt' option disables paravirt patching, leaving the original pv indirect calls in place. That's highly incompatible with retpolines, unless we want to uglify paravirt even further and convert the paravirt calls to retpolines. As far as I can tell, the option doesn't seem to be useful for much other than introducing surprising corner cases and making the kernel vulnerable to Spectre v2. It was probably a debug option from the early paravirt days. So just remove it. Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ashok Raj <[email protected]> Cc: Greg KH <[email protected]> Cc: Jun Nakajima <[email protected]> Cc: Tim Chen <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Asit Mallick <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jason Baron <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Alok Kataria <[email protected]> Cc: Arjan Van De Ven <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Dan Williams <[email protected]> Link: https://lkml.kernel.org/r/20180131041333.2x6blhxirc2kclrq@treble
2018-01-30x86/speculation: Use Indirect Branch Prediction Barrier in context switchTim Chen2-1/+34
Flush indirect branches when switching into a process that marked itself non dumpable. This protects high value processes like gpg better, without having too high performance overhead. If done naïvely, we could switch to a kernel idle thread and then back to the original process, such as: process A -> idle -> process A In such scenario, we do not have to do IBPB here even though the process is non-dumpable, as we are switching back to the same process after a hiatus. To avoid the redundant IBPB, which is expensive, we track the last mm user context ID. The cost is to have an extra u64 mm context id to track the last mm we were using before switching to the init_mm used by idle. Avoiding the extra IBPB is probably worth the extra memory for this common scenario. For those cases where tlb_defer_switch_to_init_mm() returns true (non PCID), lazy tlb will defer switch to init_mm, so we will not be changing the mm for the process A -> idle -> process A switch. So IBPB will be skipped for this case. Thanks to the reviewers and Andy Lutomirski for the suggestion of using ctx_id which got rid of the problem of mm pointer recycling. Signed-off-by: Tim Chen <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-30x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on IntelDavid Woodhouse2-19/+29
Despite the fact that all the other code there seems to be doing it, just using set_cpu_cap() in early_intel_init() doesn't actually work. For CPUs with PKU support, setup_pku() calls get_cpu_cap() after c->c_init() has set those feature bits. That resets those bits back to what was queried from the hardware. Turning the bits off for bad microcode is easy to fix. That can just use setup_clear_cpu_cap() to force them off for all CPUs. I was less keen on forcing the feature bits *on* that way, just in case of inconsistencies. I appreciate that the kernel is going to get this utterly wrong if CPU features are not consistent, because it has already applied alternatives by the time secondary CPUs are brought up. But at least if setup_force_cpu_cap() isn't being used, we might have a chance of *detecting* the lack of the corresponding bit and either panicking or refusing to bring the offending CPU online. So ensure that the appropriate feature bits are set within get_cpu_cap() regardless of how many extra times it's called. Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags") Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-30x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"Colin Ian King1-1/+1
Trivial fix to spelling mistake in pr_err error message text. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Woodhouse <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-30x86/spectre: Report get_user mitigation for spectre_v1Dan Williams1-1/+1
Reflect the presence of get_user(), __get_user(), and 'syscall' protections in sysfs. The expectation is that new and better tooling will allow the kernel to grow more usages of array_index_nospec(), for now, only claim mitigation for __user pointer de-references. Reported-by: Jiri Slaby <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/151727420158.33451.11658324346540434635.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30nl80211: Sanitize array index in parse_txq_paramsDan Williams1-3/+6
Wireless drivers rely on parse_txq_params to validate that txq_params->ac is less than NL80211_NUM_ACS by the time the low-level driver's ->conf_tx() handler is called. Use a new helper, array_index_nospec(), to sanitize txq_params->ac with respect to speculation. I.e. ensure that any speculation into ->conf_tx() handlers is done with a value of txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS). Reported-by: Christian Lamparter <[email protected]> Reported-by: Elena Reshetova <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Johannes Berg <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: "David S. Miller" <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/151727419584.33451.7700736761686184303.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30vfs, fdtable: Prevent bounds-check bypass via speculative executionDan Williams1-1/+4
'fd' is a user controlled value that is used as a data dependency to read from the 'fdt->fd' array. In order to avoid potential leaks of kernel memory values, block speculative execution of the instruction stream that could issue reads based on an invalid 'file *' returned from __fcheck_files. Co-developed-by: Elena Reshetova <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Al Viro <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/151727418500.33451.17392199002892248656.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86/syscall: Sanitize syscall table de-references under speculationDan Williams1-1/+4
The syscall table base is a user controlled function pointer in kernel space. Use array_index_nospec() to prevent any out of bounds speculation. While retpoline prevents speculating into a userspace directed target it does not stop the pointer de-reference, the concern is leaking memory relative to the syscall table base, by observing instruction cache behavior. Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Andy Lutomirski <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86/get_user: Use pointer masking to limit speculationDan Williams1-0/+10
Quoting Linus: I do think that it would be a good idea to very expressly document the fact that it's not that the user access itself is unsafe. I do agree that things like "get_user()" want to be protected, but not because of any direct bugs or problems with get_user() and friends, but simply because get_user() is an excellent source of a pointer that is obviously controlled from a potentially attacking user space. So it's a prime candidate for then finding _subsequent_ accesses that can then be used to perturb the cache. Unlike the __get_user() case get_user() includes the address limit check near the pointer de-reference. With that locality the speculation can be mitigated with pointer narrowing rather than a barrier, i.e. array_index_nospec(). Where the narrowing is performed by: cmp %limit, %ptr sbb %mask, %mask and %mask, %ptr With respect to speculation the value of %ptr is either less than %limit or NULL. Co-developed-by: Linus Torvalds <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Kees Cook <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Al Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/151727417469.33451.11804043010080838495.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospecDan Williams4-14/+14
Quoting Linus: I do think that it would be a good idea to very expressly document the fact that it's not that the user access itself is unsafe. I do agree that things like "get_user()" want to be protected, but not because of any direct bugs or problems with get_user() and friends, but simply because get_user() is an excellent source of a pointer that is obviously controlled from a potentially attacking user space. So it's a prime candidate for then finding _subsequent_ accesses that can then be used to perturb the cache. __uaccess_begin_nospec() covers __get_user() and copy_from_iter() where the limit check is far away from the user pointer de-reference. In those cases a barrier_nospec() prevents speculation with a potential pointer to privileged memory. uaccess_try_nospec covers get_user_try. Suggested-by: Linus Torvalds <[email protected]> Suggested-by: Andi Kleen <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Kees Cook <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Al Viro <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end}Dan Williams1-4/+4
In preparation for converting some __uaccess_begin() instances to __uacess_begin_nospec(), make sure all 'from user' uaccess paths are using the _begin(), _end() helpers rather than open-coded stac() and clac(). No functional changes. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Tom Lendacky <[email protected]> Cc: Kees Cook <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Al Viro <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/151727416438.33451.17309465232057176966.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospecDan Williams1-0/+9
For __get_user() paths, do not allow the kernel to speculate on the value of a user controlled pointer. In addition to the 'stac' instruction for Supervisor Mode Access Protection (SMAP), a barrier_nospec() causes the access_ok() result to resolve in the pipeline before the CPU might take any speculative action on the pointer value. Given the cost of 'stac' the speculation barrier is placed after 'stac' to hopefully overlap the cost of disabling SMAP with the cost of flushing the instruction pipeline. Since __get_user is a major kernel interface that deals with user controlled pointers, the __uaccess_begin_nospec() mechanism will prevent speculative execution past an access_ok() permission check. While speculative execution past access_ok() is not enough to lead to a kernel memory leak, it is a necessary precondition. To be clear, __uaccess_begin_nospec() is addressing a class of potential problems near __get_user() usages. Note, that while the barrier_nospec() in __uaccess_begin_nospec() is used to protect __get_user(), pointer masking similar to array_index_nospec() will be used for get_user() since it incorporates a bounds check near the usage. uaccess_try_nospec provides the same mechanism for get_user_try. No functional changes. Suggested-by: Linus Torvalds <[email protected]> Suggested-by: Andi Kleen <[email protected]> Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Tom Lendacky <[email protected]> Cc: Kees Cook <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Al Viro <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/151727415922.33451.5796614273104346583.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86: Introduce barrier_nospecDan Williams2-2/+5
Rename the open coded form of this instruction sequence from rdtsc_ordered() into a generic barrier primitive, barrier_nospec(). One of the mitigations for Spectre variant1 vulnerabilities is to fence speculative execution after successfully validating a bounds check. I.e. force the result of a bounds check to resolve in the instruction pipeline to ensure speculative execution honors that result before potentially operating on out-of-bounds data. No functional changes. Suggested-by: Linus Torvalds <[email protected]> Suggested-by: Andi Kleen <[email protected]> Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Tom Lendacky <[email protected]> Cc: Kees Cook <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Al Viro <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/151727415361.33451.9049453007262764675.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86: Implement array_index_mask_nospecDan Williams1-0/+24
array_index_nospec() uses a mask to sanitize user controllable array indexes, i.e. generate a 0 mask if 'index' >= 'size', and a ~0 mask otherwise. While the default array_index_mask_nospec() handles the carry-bit from the (index - size) result in software. The x86 array_index_mask_nospec() does the same, but the carry-bit is handled in the processor CF flag without conditional instructions in the control flow. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/151727414808.33451.1873237130672785331.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30array_index_nospec: Sanitize speculative array de-referencesDan Williams1-0/+72
array_index_nospec() is proposed as a generic mechanism to mitigate against Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks via speculative execution. The array_index_nospec() implementation is expected to be safe for current generation CPUs across multiple architectures (ARM, x86). Based on an original implementation by Linus Torvalds, tweaked to remove speculative flows by Alexei Starovoitov, and tweaked again by Linus to introduce an x86 assembly implementation for the mask generation. Co-developed-by: Linus Torvalds <[email protected]> Co-developed-by: Alexei Starovoitov <[email protected]> Suggested-by: Cyril Novikov <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Russell King <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/151727414229.33451.18411580953862676575.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30Documentation: Document array_index_nospecMark Rutland1-0/+90
Document the rationale and usage of the new array_index_nospec() helper. Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: [email protected] Cc: Jonathan Corbet <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/151727413645.33451.15878817161436755393.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86/asm: Move 'status' from thread_struct to thread_infoAndy Lutomirski7-12/+11
The TS_COMPAT bit is very hot and is accessed from code paths that mostly also touch thread_info::flags. Move it into struct thread_info to improve cache locality. The only reason it was in thread_struct is that there was a brief period during which arch-specific fields were not allowed in struct thread_info. Linus suggested further changing: ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); to: if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED))) ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); on the theory that frequently dirtying the cacheline even in pure 64-bit code that never needs to modify status hurts performance. That could be a reasonable followup patch, but I suspect it matters less on top of this patch. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Kernel Hardening <[email protected]> Link: https://lkml.kernel.org/r/03148bcc1b217100e6e8ecf6a5468c45cf4304b6.1517164461.git.luto@kernel.org
2018-01-30x86/entry/64: Push extra regs right awayAndy Lutomirski1-3/+7
With the fast path removed there is no point in splitting the push of the normal and the extra register set. Just push the extra regs right away. [ tglx: Split out from 'x86/entry/64: Remove the SYSCALL64 fast path' ] Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Kernel Hardening <[email protected]> Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
2018-01-30x86/entry/64: Remove the SYSCALL64 fast pathAndy Lutomirski2-122/+2
The SYCALLL64 fast path was a nice, if small, optimization back in the good old days when syscalls were actually reasonably fast. Now there is PTI to slow everything down, and indirect branches are verboten, making everything messier. The retpoline code in the fast path is particularly nasty. Just get rid of the fast path. The slow path is barely slower. [ tglx: Split out the 'push all extra regs' part ] Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Kernel Hardening <[email protected]> Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
2018-01-30x86/spectre: Check CONFIG_RETPOLINE in command line parserDou Liyang1-3/+3
The spectre_v2 option 'auto' does not check whether CONFIG_RETPOLINE is enabled. As a consequence it fails to emit the appropriate warning and sets feature flags which have no effect at all. Add the missing IS_ENABLED() check. Fixes: da285121560e ("x86/spectre: Add boot time option to select Spectre v2 mitigation") Signed-off-by: Dou Liyang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Tomohiro" <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-30x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAPWilliam Grant2-4/+7
Since commit 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap"), i386's CPU_ENTRY_AREA has been mapped to the memory area just below FIXADDR_START. But already immediately before FIXADDR_START is the FIX_BTMAP area, which means that early_ioremap can collide with the entry area. It's especially bad on PAE where FIX_BTMAP_BEGIN gets aligned to exactly match CPU_ENTRY_AREA_BASE, so the first early_ioremap slot clobbers the IDT and causes interrupts during early boot to reset the system. The overlap wasn't a problem before the CPU entry area was introduced, as the fixmap has classically been preceded by the pkmap or vmalloc areas, neither of which is used until early_ioremap is out of the picture. Relocate CPU_ENTRY_AREA to below FIX_BTMAP, not just below the permanent fixmap area. Fixes: commit 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Signed-off-by: William Grant <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-30objtool: Warn on stripped section symbolJosh Poimboeuf1-0/+5
With the following fix: 2a0098d70640 ("objtool: Fix seg fault with gold linker") ... a seg fault was avoided, but the original seg fault condition in objtool wasn't fixed. Replace the seg fault with an error message. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/dc4585a70d6b975c99fc51d1957ccdde7bd52f3a.1517284349.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2018-01-30objtool: Add support for alternatives at the end of a sectionJosh Poimboeuf1-22/+31
Now that the previous patch gave objtool the ability to read retpoline alternatives, it shows a new warning: arch/x86/entry/entry_64.o: warning: objtool: .entry_trampoline: don't know how to handle alternatives at end of section This is due to the JMP_NOSPEC in entry_SYSCALL_64_trampoline(). Previously, objtool ignored this situation because it wasn't needed, and it would have required a bit of extra code. Now that this case exists, add proper support for it. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/2a30a3c2158af47d891a76e69bb1ef347e0443fd.1517284349.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2018-01-30objtool: Improve retpoline alternative handlingJosh Poimboeuf1-20/+16
Currently objtool requires all retpolines to be: a) patched in with alternatives; and b) annotated with ANNOTATE_NOSPEC_ALTERNATIVE. If you forget to do both of the above, objtool segfaults trying to dereference a NULL 'insn->call_dest' pointer. Avoid that situation and print a more helpful error message: quirks.o: warning: objtool: efi_delete_dummy_variable()+0x99: unsupported intra-function call quirks.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE. Future improvements can be made to make objtool smarter with respect to retpolines, but this is a good incremental improvement for now. Reported-and-tested-by: Guenter Roeck <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/819e50b6d9c2e1a22e34c1a636c0b2057cc8c6e5.1517284349.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2018-01-30Merge tag 'v4.15' into x86/pti, to be able to merge dependent changesIngo Molnar13079-284848/+604316
Time has come to switch PTI development over to a v4.15 base - we'll still try to make sure that all PTI fixes backport cleanly to v4.14 and earlier. Signed-off-by: Ingo Molnar <[email protected]>
2018-01-28Linux 4.15Linus Torvalds1-1/+1
2018-01-28Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds2-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 retpoline fixlet from Thomas Gleixner: "Remove the ESP/RSP thunks for retpoline as they cannot ever work. Get rid of them before they show up in a release" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Remove the esp/rsp thunk
2018-01-28Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds7-34/+60
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of small fixes for 4.15: - Fix vmapped stack synchronization on systems with 4-level paging and a large amount of memory caused by a missing 5-level folding which made the pgd synchronization logic to fail and causing double faults. - Add a missing sanity check in the vmalloc_fault() logic on 5-level paging systems. - Bring back protection against accessing a freed initrd in the microcode loader which was lost by a wrong merge conflict resolution. - Extend the Broadwell micro code loading sanity check. - Add a missing ENDPROC annotation in ftrace assembly code which makes ORC unhappy. - Prevent loading the AMD power module on !AMD platforms. The load itself is uncritical, but an unload attempt results in a kernel crash. - Update Peter Anvins role in the MAINTAINERS file" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ftrace: Add one more ENDPROC annotation x86: Mark hpa as a "Designated Reviewer" for the time being x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems x86/microcode: Fix again accessing initrd after having been freed x86/microcode/intel: Extend BDW late-loading further with LLC size check perf/x86/amd/power: Do not load AMD power module on !AMD platforms
2018-01-28Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for a ~10 years old problem which causes high resolution timers to stop after a CPU unplug/plug cycle due to a stale flag in the per CPU hrtimer base struct. Paul McKenney was hunting this for about a year, but the heisenbug nature made it resistant against debug attempts for quite some time" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Reset hrtimer cpu base proper on CPU hotplug
2018-01-28Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds3-5/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "A single bug fix to prevent a subtle deadlock in the scheduler core code vs cpu hotplug" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix cpu.max vs. cpuhotplug deadlock
2018-01-28Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2-20/+60
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Four patches which all address lock inversions and deadlocks in the perf core code and the Intel debug store" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix perf,x86,cpuhp deadlock perf/core: Fix ctx::mutex deadlock perf/core: Fix another perf,trace,cpuhp lock inversion perf/core: Fix lock inversion between perf,trace,cpuhp
2018-01-28Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Two final locking fixes for 4.15: - Repair the OWNER_DIED logic in the futex code which got wreckaged with the recent fix for a subtle race condition. - Prevent the hard lockup detector from triggering when dumping all held locks in the system" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks() futex: Fix OWNER_DEAD fixup
2018-01-28x86/ftrace: Add one more ENDPROC annotationJosh Poimboeuf1-1/+1
When ORC support was added for the ftrace_64.S code, an ENDPROC for function_hook() was missed. This results in the following warning: arch/x86/kernel/ftrace_64.o: warning: objtool: .entry.text+0x0: unreachable instruction Fixes: e2ac83d74a4d ("x86/ftrace: Fix ORC unwinding from ftrace handlers") Reported-by: Steven Rostedt <[email protected]> Reported-by: Borislav Petkov <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lkml.kernel.org/r/20180128022150.dqierscqmt3uwwsr@treble
2018-01-27x86/speculation: Simplify indirect_branch_prediction_barrier()Borislav Petkov3-9/+13
Make it all a function which does the WRMSR instead of having a hairy inline asm. [dwmw2: export it, fix CONFIG_RETPOLINE issues] Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-27x86/retpoline: Simplify vmexit_fill_RSB()Borislav Petkov6-65/+71
Simplify it to call an asm-function instead of pasting 41 insn bytes at every call site. Also, add alignment to the macro as suggested here: https://support.google.com/faqs/answer/7625886 [dwmw2: Clean up comments, let it clobber %ebx and just tell the compiler] Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-27x86/cpufeatures: Clean up Spectre v2 related CPUID flagsDavid Woodhouse4-24/+34
We want to expose the hardware features simply in /proc/cpuinfo as "ibrs", "ibpb" and "stibp". Since AMD has separate CPUID bits for those, use them as the user-visible bits. When the Intel SPEC_CTRL bit is set which indicates both IBRS and IBPB capability, set those (AMD) bits accordingly. Likewise if the Intel STIBP bit is set, set the AMD STIBP that's used for the generic hardware capability. Hide the rest from /proc/cpuinfo by putting "" in the comments. Including RETPOLINE and RETPOLINE_AMD which shouldn't be visible there. There are patches to make the sysfs vulnerabilities information non-readable by non-root, and the same should apply to all information about which mitigations are actually in use. Those *shouldn't* appear in /proc/cpuinfo. The feature bit for whether IBPB is actually used, which is needed for ALTERNATIVEs, is renamed to X86_FEATURE_USE_IBPB. Originally-by: Borislav Petkov <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-27x86/cpu/bugs: Make retpoline module warning conditionalThomas Gleixner1-3/+11
If sysfs is disabled and RETPOLINE not defined: arch/x86/kernel/cpu/bugs.c:97:13: warning: ‘spectre_v2_bad_module’ defined but not used [-Wunused-variable] static bool spectre_v2_bad_module; Hide it. Fixes: caf7501a1b4e ("module/retpoline: Warn about missing retpoline in module") Reported-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Woodhouse <[email protected]>
2018-01-27hrtimer: Reset hrtimer cpu base proper on CPU hotplugThomas Gleixner1-0/+3
The hrtimer interrupt code contains a hang detection and mitigation mechanism, which prevents that a long delayed hrtimer interrupt causes a continous retriggering of interrupts which prevent the system from making progress. If a hang is detected then the timer hardware is programmed with a certain delay into the future and a flag is set in the hrtimer cpu base which prevents newly enqueued timers from reprogramming the timer hardware prior to the chosen delay. The subsequent hrtimer interrupt after the delay clears the flag and resumes normal operation. If such a hang happens in the last hrtimer interrupt before a CPU is unplugged then the hang_detected flag is set and stays that way when the CPU is plugged in again. At that point the timer hardware is not armed and it cannot be armed because the hang_detected flag is still active, so nothing clears that flag. As a consequence the CPU does not receive hrtimer interrupts and no timers expire on that CPU which results in RCU stalls and other malfunctions. Clear the flag along with some other less critical members of the hrtimer cpu base to ensure starting from a clean state when a CPU is plugged in. Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the root cause of that hard to reproduce heisenbug. Once understood it's trivial and certainly justifies a brown paperbag. Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic") Reported-by: Paul E. McKenney <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Sewior <[email protected]> Cc: Anna-Maria Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
2018-01-27x86: Mark hpa as a "Designated Reviewer" for the time beingH. Peter Anvin1-11/+1
Due to some unfortunate events, I have not been directly involved in the x86 kernel patch flow for a while now. I have also not been able to ramp back up by now like I had hoped to, and after reviewing what I will need to work on both internally at Intel and elsewhere in the near term, it is clear that I am not going to be able to ramp back up until late 2018 at the very earliest. It is not acceptable to not recognize that this load is currently taken by Ingo and Thomas without my direct participation, so I mark myself as R: (designated reviewer) rather than M: (maintainer) until further notice. This is in fact recognizing the de facto situation for the past few years. I have obviously no intention of going away, and I will do everything within my power to improve Linux on x86 and x86 for Linux. This, however, puts credit where it is due and reflects a change of focus. This patch also removes stale entries for portions of the x86 architecture which have not been maintained separately from arch/x86 for a long time. If there is a reason to re-introduce them then that can happen later. Signed-off-by: H. Peter Anvin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Bruce Schlobohm <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-27KVM: VMX: introduce alloc_loaded_vmcsPaolo Bonzini1-14/+22
Group together the calls to alloc_vmcs and loaded_vmcs_init. Soon we'll also allocate an MSR bitmap there. Cc: [email protected] # prereq for Spectre mitigation Signed-off-by: Paolo Bonzini <[email protected]>
2018-01-27KVM: nVMX: Eliminate vmcs02 poolJim Mattson1-123/+23
The potential performance advantages of a vmcs02 pool have never been realized. To simplify the code, eliminate the pool. Instead, a single vmcs02 is allocated per VCPU when the VCPU enters VMX operation. Cc: [email protected] # prereq for Spectre mitigation Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Mark Kanda <[email protected]> Reviewed-by: Ameya More <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2018-01-26Merge tag 'riscv-for-linus-4.15-maintainers' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V update from Palmer Dabbelt: "RISC-V: We have a new mailing list and git repo! Sorry to send something essentially as late as possible (Friday after an rc9), but we managed to get a mailing list for the RISC-V Linux port. We've been using [email protected] for a while, but that list has some problems (it's Google Groups and it's shared over all RISC-V software projects). The new infaread.org list is much better. We just got it on Wednesday but I used it a bit on Thursday to shake out all the configuration problems and it appears to be in working order. When I updated the mailing list I noticed that the MAINTAINERS file was pointing to our github repo, but now that we have a kernel.org repo I'd like to point to that instead so I changed that as well. We'll be centralizing all RISC-V Linux related development here as that seems to be the saner way to go about it. I can understand if it's too late to get this into 4.15, but given that it's not a code change I was hoping it'd still be OK. It would be nice to have the new mailing list and git repo in the release tarballs so when people start to find bugs they'll get to the right place" * tag 'riscv-for-linus-4.15-maintainers' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: Update the RISC-V MAINTAINERS file
2018-01-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds16-28/+57
Pull networking fixes from David Miller: 1) The per-network-namespace loopback device, and thus its namespace, can have its teardown deferred for a long time if a kernel created TCP socket closes and the namespace is exiting meanwhile. The kernel keeps trying to finish the close sequence until it times out (which takes quite some time). Fix this by forcing the socket closed in this situation, from Dan Streetman. 2) Fix regression where we're trying to invoke the update_pmtu method on route types (in this case metadata tunnel routes) that don't implement the dst_ops method. Fix from Nicolas Dichtel. 3) Fix long standing memory corruption issues in r8169 driver by performing the chip statistics DMA programming more correctly. From Francois Romieu. 4) Handle local broadcast sends over VRF routes properly, from David Ahern. 5) Don't refire the DCCP CCID2 timer endlessly, otherwise the socket can never be released. From Alexey Kodanev. 6) Set poll flags properly in VSOCK protocol layer, from Stefan Hajnoczi. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSING dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state net: vrf: Add support for sends to local broadcast address r8169: fix memory corruption on retrieval of hardware statistics. net: don't call update_pmtu unconditionally net: tcp: close sock if net namespace is exiting
2018-01-26Merge tag 'drm-fixes-for-v4.15-rc10-2' of ↵Linus Torvalds2-21/+58
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "A fairly urgent nouveau regression fix for broken irqs across suspend/resume came in. This was broken before but a patch in 4.15 has made it much more obviously broken and now s/r fails a lot more often. The fix removes freeing the irq across s/r which never should have been done anyways. Also two vc4 fixes for a NULL deference and some misrendering / flickering on screen" * tag 'drm-fixes-for-v4.15-rc10-2' of git://people.freedesktop.org/~airlied/linux: drm/nouveau: Move irq setup/teardown to pci ctor/dtor drm/vc4: Fix NULL pointer dereference in vc4_save_hang_state() drm/vc4: Flush the caches before the bin jobs, as well.
2018-01-26VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSINGStefan Hajnoczi1-1/+1
select(2) with wfds but no rfds must return when the socket is shut down by the peer. This way userspace notices socket activity and gets -EPIPE from the next write(2). Currently select(2) does not return for virtio-vsock when a SEND+RCV shutdown packet is received. This is because vsock_poll() only sets POLLOUT | POLLWRNORM for TCP_CLOSE, not the TCP_CLOSING state that the socket is in when the shutdown is received. Signed-off-by: Stefan Hajnoczi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-01-26dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed stateAlexey Kodanev1-0/+3
ccid2_hc_tx_rto_expire() timer callback always restarts the timer again and can run indefinitely (unless it is stopped outside), and after commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time"), which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from dccp_destroy_sock() to sk_destruct(), this started to happen quite often. The timer prevents releasing the socket, as a result, sk_destruct() won't be called. Found with LTP/dccp_ipsec tests running on the bonding device, which later couldn't be unloaded after the tests were completed: unregister_netdevice: waiting for bond0 to become free. Usage count = 148 Fixes: 2a91aa396739 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation") Signed-off-by: Alexey Kodanev <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-01-26Update the RISC-V MAINTAINERS filePalmer Dabbelt1-2/+2
Now that we're upstream in Linux we've been able to make some infrastructure changes so our port works a bit more like other ports. Specifically: * We now have a mailing list specific to the RISC-V Linux port, hosted at lists.infreadead.org. * We now have a kernel.org git tree where work on our port is coordinated. This patch changes the RISC-V maintainers entry to reflect these new bits of infrastructure. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]>
2018-01-26x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernelsAndy Lutomirski1-13/+9
On a 5-level kernel, if a non-init mm has a top-level entry, it needs to match init_mm's, but the vmalloc_fault() code skipped over the BUG_ON() that would have checked it. While we're at it, get rid of the rather confusing 4-level folded "pgd" logic. Cleans-up: b50858ce3e2a ("x86/mm/vmalloc: Add 5-level paging support") Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Neil Berrington <[email protected]> Link: https://lkml.kernel.org/r/2ae598f8c279b0a29baf75df207e6f2fdddc0a1b.1516914529.git.luto@kernel.org
2018-01-26x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systemsAndy Lutomirski1-5/+29
Neil Berrington reported a double-fault on a VM with 768GB of RAM that uses large amounts of vmalloc space with PTI enabled. The cause is that load_new_mm_cr3() was never fixed to take the 5-level pgd folding code into account, so, on a 4-level kernel, the pgd synchronization logic compiles away to exactly nothing. Interestingly, the problem doesn't trigger with nopti. I assume this is because the kernel is mapped with global pages if we boot with nopti. The sequence of operations when we create a new task is that we first load its mm while still running on the old stack (which crashes if the old stack is unmapped in the new mm unless the TLB saves us), then we call prepare_switch_to(), and then we switch to the new stack. prepare_switch_to() pokes the new stack directly, which will populate the mapping through vmalloc_fault(). I assume that we're getting lucky on non-PTI systems -- the old stack's TLB entry stays alive long enough to make it all the way through prepare_switch_to() and switch_to() so that we make it to a valid stack. Fixes: b50858ce3e2a ("x86/mm/vmalloc: Add 5-level paging support") Reported-and-tested-by: Neil Berrington <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: [email protected] Cc: Dave Hansen <[email protected]> Cc: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/346541c56caed61abbe693d7d2742b4a380c5001.1516914529.git.luto@kernel.org
2018-01-26x86/bugs: Drop one "mitigation" from dmesgBorislav Petkov1-1/+1
Make [ 0.031118] Spectre V2 mitigation: Mitigation: Full generic retpoline into [ 0.031118] Spectre V2: Mitigation: Full generic retpoline to reduce the mitigation mitigations strings. Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: David Woodhouse <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]