|
Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Ingo Molnar:
"Two main changes:
- Remove no longer used parts of the paravirt infrastructure and put
large quantities of paravirt ops under a new config option
PARAVIRT_XXL=y, which is selected by XEN_PV only. (Juergen Gross)
- Enable PV spinlocks on Hyper-V (Yi Sun)"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Enable PV qspinlock for Hyper-V
x86/hyperv: Add GUEST_IDLE_MSR support
x86/paravirt: Clean up native_patch()
x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
x86/xen: Make xen_reservation_lock static
x86/paravirt: Remove unneeded mmu related paravirt ops bits
x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: Introduce new config option PARAVIRT_XXL
x86/paravirt: Remove unused paravirt bits
x86/paravirt: Use a single ops structure
x86/paravirt: Remove clobbers from struct paravirt_patch_site
x86/paravirt: Remove clobbers parameter from paravirt patch functions
x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
x86/xen: Add SPDX identifier in arch/x86/xen files
x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
|
|
Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were the fsgsbase related preparatory
patches from Chang S. Bae - but there's also an optimized
memcpy_flushcache() and a cleanup for the __cmpxchg_double() assembly
glue"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fsgsbase/64: Clean up various details
x86/segments: Introduce the 'CPUNODE' naming to better document the segment limit CPU/node NR trick
x86/vdso: Initialize the CPU/node NR segment descriptor earlier
x86/vdso: Introduce helper functions for CPU and node number
x86/segments/64: Rename the GDT PER_CPU entry to CPU_NUMBER
x86/fsgsbase/64: Factor out FS/GS segment loading from __switch_to()
x86/fsgsbase/64: Convert the ELF core dump code to the new FSGSBASE helpers
x86/fsgsbase/64: Make ptrace use the new FS/GS base helpers
x86/fsgsbase/64: Introduce FS/GS base helper functions
x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately
x86/asm: Use CC_SET()/CC_OUT() in __cmpxchg_double()
x86/asm: Optimize memcpy_flushcache()
|
|
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and misc x86 updates from Ingo Molnar:
"Lots of changes in this cycle - in part because locking/core attracted
a number of related x86 low level work which was easier to handle in a
single tree:
- Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
McKenney, Andrea Parri)
- lockdep scalability improvements and micro-optimizations (Waiman
Long)
- rwsem improvements (Waiman Long)
- spinlock micro-optimization (Matthew Wilcox)
- qspinlocks: Provide a liveness guarantee (more fairness) on x86.
(Peter Zijlstra)
- Add support for relative references in jump tables on arm64, x86
and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)
- Be a lot less permissive on weird (kernel address) uaccess faults
on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
Horn)
- macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
Amit)
- ... and a handful of other smaller changes as well"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
locking/lockdep: Make global debug_locks* variables read-mostly
locking/lockdep: Fix debug_locks off performance problem
locking/pvqspinlock: Extend node size when pvqspinlock is configured
locking/qspinlock_stat: Count instances of nested lock slowpaths
locking/qspinlock, x86: Provide liveness guarantee
x86/asm: 'Simplify' GEN_*_RMWcc() macros
locking/qspinlock: Rework some comments
locking/qspinlock: Re-order code
locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
futex: Replace spin_is_locked() with lockdep
locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
x86/refcount: Work around GCC inlining bug
x86/objtool: Use asm macros to work around GCC inlining bugs
...
|
|
Commit:
16561f27f94e ("x86/entry: Add some paranoid entry/exit CR3 handling comments")
... added some comments. This improves them a bit:
- When I first read the new comments, it was unclear to me whether
they were referring to the case where paranoid_entry interrupted
other entry code or where paranoid_entry was itself interrupted.
Clarify it.
- Remove the EBX comment. We no longer use EBX as a SWAPGS
indicator.
Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/c47daa1888dc2298e7e1d3f82bd76b776ea33393.1539542111.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Even if not on an entry stack, the CS's high bits must be
initialized because they are unconditionally evaluated in
PARANOID_EXIT_TO_KERNEL_MODE.
Failing to do so broke the boot on Galileo Gen2 and IOT2000 boards.
[ bp: Make the commit message tone passive and impartial. ]
Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack")
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Joerg Roedel <[email protected]>
Acked-by: Joerg Roedel <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andrea Arcangeli <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Boris Ostrovsky <[email protected]>
CC: Brian Gerst <[email protected]>
CC: Dave Hansen <[email protected]>
CC: David Laight <[email protected]>
CC: Denys Vlasenko <[email protected]>
CC: Eduardo Valentin <[email protected]>
CC: Greg KH <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Jiri Kosina <[email protected]>
CC: Josh Poimboeuf <[email protected]>
CC: Juergen Gross <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Will Deacon <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: linux-mm <[email protected]>
CC: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Andi Kleen was just asking me about the NMI CR3 handling and why
we restore it unconditionally. I was *sure* we had documented it
well. We did not.
Add some documentation. We have common entry code where the CR3
value is stashed, but three places in two big code paths where we
restore it. I put the bulk of the comments in this common path and
then refer to it from the other spots.
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
So:
- use 'extern' consistently for APIs
- fix weird header guard
- clarify code comments
- reorder APIs by type
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chang S. Bae <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Markus T Metzger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
x86/segments: Introduce the 'CPUNODE' naming to better document the segment
limit CPU/node NR trick
We have a special segment descriptor entry in the GDT, whose sole purpose is to
encode the CPU and node numbers in its limit (size) field. There are user-space
instructions that allow the reading of the limit field, which gives us a really
fast way to read the CPU and node IDs from the vDSO for example.
But the naming of related functionality does not make this clear, at all:
VDSO_CPU_SIZE
VDSO_CPU_MASK
__CPU_NUMBER_SEG
GDT_ENTRY_CPU_NUMBER
vdso_encode_cpu_node
vdso_read_cpu_node
There's a number of problems:
- The 'VDSO_CPU_SIZE' doesn't really make it clear that these are number
of bits, nor does it make it clear which 'CPU' this refers to, i.e.
that this is about a GDT entry whose limit encodes the CPU and node number.
- Furthermore, the 'CPU_NUMBER' naming is actively misleading as well,
because the segment limit encodes not just the CPU number but the
node ID as well ...
So use a better nomenclature all around: name everything related to this trick
as 'CPUNODE', to make it clear that this is something special, and add
_BITS to make it clear that these are number of bits, and propagate this to
every affected name:
VDSO_CPU_SIZE => VDSO_CPUNODE_BITS
VDSO_CPU_MASK => VDSO_CPUNODE_MASK
__CPU_NUMBER_SEG => __CPUNODE_SEG
GDT_ENTRY_CPU_NUMBER => GDT_ENTRY_CPUNODE
vdso_encode_cpu_node => vdso_encode_cpunode
vdso_read_cpu_node => vdso_read_cpunode
This, beyond being less confusing, also makes it easier to grep for all related
functionality:
$ git grep -i cpunode arch/x86
Also, while at it, fix "return is not a function" style sloppiness in vdso_encode_cpunode().
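For illustration, the read side of the trick boils down to the following
user-space sketch (the constants and the GDT_ENTRY_CPUNODE selector value
mirror this series, but are illustrative here, not the kernel's code
verbatim):
#include <stdio.h>
#define VDSO_CPUNODE_BITS 12
#define VDSO_CPUNODE_MASK 0xfff
#define __CPUNODE_SEG     ((15 * 8) | 3)   /* GDT_ENTRY_CPUNODE = 15, RPL 3 */
static void read_cpunode(unsigned int *cpu, unsigned int *node)
{
        unsigned int p;
        /* LSL loads the segment limit without any privileged access. */
        asm("lsl %[seg], %[p]" : [p] "=a" (p) : [seg] "r" (__CPUNODE_SEG));
        *cpu  = p & VDSO_CPUNODE_MASK;
        *node = p >> VDSO_CPUNODE_BITS;
}
int main(void)
{
        unsigned int cpu, node;
        read_cpunode(&cpu, &node);
        printf("cpu %u node %u\n", cpu, node);
        return 0;
}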
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chang S. Bae <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Markus T Metzger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Currently the CPU/node NR segment descriptor (GDT_ENTRY_CPU_NUMBER) is
initialized relatively late during CPU init, from the vCPU code, which
has a number of disadvantages, such as requiring hotplug CPU notifiers and
SMP cross-calls.
Instead just initialize it much earlier, directly in cpu_init().
This reduces complexity and increases robustness.
[ mingo: Wrote new changelog. ]
Suggested-by: H. Peter Anvin <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Markus T Metzger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Clean up the CPU/node number related code a bit, to make it more apparent
how we are encoding/extracting the CPU and node fields from the
segment limit.
No change in functionality intended.
[ mingo: Wrote new changelog. ]
Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Markus T Metzger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Rik van Riel <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The old 'per CPU' naming was misleading: 64-bit kernels don't use this
GDT entry for per CPU data, but to store the CPU (and node) ID.
[ mingo: Wrote new changelog. ]
Suggested-by: H. Peter Anvin <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Markus T Metzger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Rik van Riel <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
As described in:
77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block - which is also a minor cleanup for the jump-label code.
As a result the code size is slightly increased, but inlining decisions
are better:
text data bss dec hex filename
18163528 10226300 2957312 31347140 1de51c4 ./vmlinux before
18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+1128)
And functions such as intel_pstate_adjust_policy_max(),
kvm_cpu_accept_dm_intr(), kvm_register_readl() are inlined.
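Reduced to a stand-alone illustration, the pattern is (the macro name here
is made up; the real patch applies it to the jump-label asm templates):
/* Define the multi-line asm body once, as an assembler macro ... */
asm(".macro BUG_TRAP\n\t"
    "ud2\n\t"
    ".endm");
static inline void trap(void)
{
        /*
         * ... so each call site is a single asm line. GCC estimates the
         * size of an asm statement from its line count, so short
         * statements no longer scare the inliner away.
         */
        asm volatile("BUG_TRAP");
}
void f(int x)
{
        if (x < 0)
                trap();
}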
Tested-by: Kees Cook <[email protected]>
Signed-off-by: Nadav Amit <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Philippe Ombredanne <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]/T/#u
Signed-off-by: Ingo Molnar <[email protected]>
|
|
vgetcyc() is full of barriers, so fetching values out of the vvar
page before vgetcyc() for use after vgetcyc() results in poor code
generation. Put vgetcyc() first to avoid this problem.
Also, pull the tv_sec division into the loop and put all the ts
writes together. The old code wrote ts->tv_sec on each iteration
before the syscall fallback check and then added in the offset
afterwards, which forced the compiler to pointlessly copy base->sec
to ts->tv_sec on each iteration. The new version seems to generate
sensible code.
Saves several cycles. With this patch applied, the result is faster
than before the clock_gettime() rewrite.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/3c05644d010b72216aa286a6d20b5078d5fae5cd.1538762487.git.luto@kernel.org
|
|
When a vDSO clock function falls back to the syscall, no special
barriers or ordering is needed, and the syscall fallbacks don't
clobber any memory that is not explicitly listed in the asm
constraints. Remove the "memory" clobber.
This causes minor changes to the generated code, but otherwise has
no obvious performance impact. I think it's nice to have, though,
since it may help the optimizer in the future.
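E.g. the 64-bit clock_gettime() fallback's clobber list shrinks to the
registers the syscall instruction itself destroys (sketch):
asm ("syscall" : "=a" (ret), "=m" (*ts) :
     "0" (__NR_clock_gettime), "D" (clock), "S" (ts) :
     "rcx", "r11");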
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/3a7438f5fb2422ed881683d2ccffd7f987b2dc44.1538689401.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
With the storage array in place it's now trivial to support CLOCK_TAI in
the vdso. Extend the base time storage array and add the update code.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Matt Rickard <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Juergen Gross <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Dereferencing gtod->cycle_last all over the place and foing the cycles <
last comparison in the vclock read functions generates horrible code. Doing
it at the call site is much better and gains a few cycles both for TSC and
pvclock.
Caveat: This adds the comparison to the hyperv vclock as well, but I have
no way to test that.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matt Rickard <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Juergen Gross <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The code flow for the vclocks is convoluted: vclocks which can be
invalidated separately from the vsyscall_gtod_data sequence count have to
store that fact in a separate variable. That's inefficient.
Restructure the code so the vclock readout returns cycles and the
conversion to nanoseconds is handled at the call site.
If the clock gets invalidated or vclock is already VCLOCK_NONE, return
U64_MAX as the cycle value, which is invalid for all clocks and leave the
sequence loop immediately in that case by calling the fallback function
directly.
This allows removing the gettimeofday fallback, as it now uses the
clock_gettime() fallback and does the nanoseconds to microseconds
conversion in the same way as it does when the vclock is functional. It
does not make a difference whether the division by 1000 happens in the
kernel fallback or in userspace.
Generates way better code and gains a few cycles back.
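The resulting high-resolution read loop looks roughly like this
(simplified; names as used in the vDSO code of this series):
notrace static int do_hres(clockid_t clk, struct timespec *ts)
{
        struct vgtod_ts *base = &gtod->basetime[clk];
        u64 cycles, last, sec, ns;
        unsigned int seq;
        do {
                seq = gtod_read_begin(gtod);
                cycles = vgetcyc(gtod->vclock_mode);
                ns = base->nsec;
                last = gtod->cycle_last;
                /* U64_MAX signals an invalidated or absent vclock. */
                if (unlikely((s64)cycles < 0))
                        return vdso_fallback_gettime(clk, ts);
                if (cycles > last)
                        ns += (cycles - last) * gtod->mult;
                ns >>= gtod->shift;
                sec = base->sec;
        } while (unlikely(gtod_read_retry(gtod, seq)));
        ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
        ts->tv_nsec = ns;
        return 0;
}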
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matt Rickard <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Juergen Gross <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Now that the time getter functions use the clockid as index into the
storage array for the base time access, the switch case can be replaced.
- Check for clockid >= MAX_CLOCKS and for negative clockid (CPU/FD) first
and call the fallback function right away.
- After establishing that clockid is < MAX_CLOCKS, convert the clockid to a
bitmask
- Check for the supported high resolution and coarse functions by anding
the bitmask of supported clocks and check whether a bit is set.
This completely avoids jump tables, reduces the number of conditionals and
makes the VDSO extensible for other clock ids.
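Roughly, the resulting dispatch (simplified; the mask names are
illustrative):
notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
{
        unsigned int msk;
        /* Negative (CPU/FD) clock ids become huge unsigned values and take
           the fallback together with ids >= MAX_CLOCKS. */
        if (unlikely((unsigned int) clock >= MAX_CLOCKS))
                return vdso_fallback_gettime(clock, ts);
        /* One bit per clock id, tested against the supported-clock masks
           instead of a switch/jump table. */
        msk = 1U << clock;
        if (likely(msk & VGTOD_HRES))
                return do_hres(clock, ts);
        if (msk & VGTOD_COARSE) {
                do_coarse(clock, ts);
                return 0;
        }
        return vdso_fallback_gettime(clock, ts);
}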
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matt Rickard <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Juergen Gross <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
do_realtime_coarse() and do_monotonic_coarse() are now the same except for
the storage array index. Hand the index in as an argument and collapse the
functions.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matt Rickard <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Juergen Gross <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
do_realtime() and do_monotonic() are now the same except for the storage
array index. Hand the index in as an argument and collapse the functions.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matt Rickard <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Juergen Gross <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
It's desired to support more clocks in the VDSO, e.g. CLOCK_TAI. This
results either in indirect calls due to the larger switch case, which then
requires retpolines, or, when the compiler is forced to avoid jump tables,
in even more conditionals.
To avoid both variants which are bad for performance the high resolution
functions and the coarse grained functions will be collapsed into one for
each. That requires to store the clock specific base time in an array.
Introduce struct vgtod_ts for storage and convert the data store, the
update function and the individual clock functions over to use it.
The new storage no longer uses gtod_long_t for seconds, which depended on a
32-bit or 64-bit compile, because the seconds value needs to be the full
64-bit value even on 32-bit once a Y2038-safe function is added. No point in
keeping the distinction alive in the internal representation.
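Sketch of the new storage (simplified; unrelated fields omitted):
struct vgtod_ts {
        u64     sec;    /* full 64 bit even on 32-bit builds (Y2038) */
        u64     nsec;
};
struct vsyscall_gtod_data {
        unsigned int    seq;
        int             vclock_mode;
        u64             cycle_last;
        u64             mask;
        u32             mult;
        u32             shift;
        /* Base time for each clock id, indexed directly by the clockid */
        struct vgtod_ts basetime[VGTOD_BASES];
        /* ... timezone fields etc. ... */
};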
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matt Rickard <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Juergen Gross <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The sequence count in vgtod_data is unsigned int, but the call sites use
unsigned long, which is a pointless exercise. Fix the call sites and
replace the bare 'unsigned' with 'unsigned int' while at it.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matt Rickard <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Juergen Gross <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
All VDSO clock sources are TSC based and use CLOCKSOURCE_MASK(64). There is
no point in masking with all FF. Get rid of it and enforce the mask in the
sanity checker.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matt Rickard <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Juergen Gross <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
When I added the missing memory outputs, I failed to update the
index of the first argument (ebx) on 32-bit builds, which broke the
fallbacks. Somehow I must have screwed up my testing or gotten
lucky.
Add another test to cover gettimeofday() as well.
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: 715bd9d12f84 ("x86/vdso: Fix asm constraints on vDSO syscall fallbacks")
Link: http://lkml.kernel.org/r/21bd45ab04b6d838278fa5bebfa9163eceffa13c.1538608971.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When I fixed the vDSO build to use inline retpolines, I messed up
the Makefile logic and made it unconditional. It should have
depended on CONFIG_RETPOLINE and on the availability of compiler
support. This broke the build on some older compilers.
Reported-by: [email protected]
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Rickard <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 2e549b2ee0e3 ("x86/vdso: Fix vDSO build if a retpoline is emitted")
Link: http://lkml.kernel.org/r/08a1f29f2c238dd1f493945e702a521f8a5aa3ae.1538540801.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The syscall fallbacks in the vDSO have incorrect asm constraints.
They are not marked as writing to their outputs -- instead, they are
marked as clobbering "memory", which is useless. In particular, gcc
is smart enough to know that the timespec parameter hasn't escaped,
so a memory clobber doesn't clobber it. And passing a pointer as an
asm *input* does not tell gcc that the pointed-to value is changed.
Add in the fact that the asm instructions weren't volatile, and gcc
was free to omit them entirely unless their sole output (the return
value) is used. Which it is (phew!), but that stops happening with
some upcoming patches.
As a trivial example, the following code:
void test_fallback(struct timespec *ts)
{
vdso_fallback_gettime(CLOCK_MONOTONIC, ts);
}
compiles to:
00000000000000c0 <test_fallback>:
c0: c3 retq
To add insult to injury, the RCX and R11 clobbers on 64-bit
builds were missing.
The "memory" clobber is also unnecessary -- no ordering with respect to
other memory operations is needed, but that's going to be fixed in a
separate not-for-stable patch.
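For reference, the fixed 64-bit fallback looks roughly like this (outputs
marked and rcx/r11 clobbered; the "memory" clobber stays for now, per the
note above):
notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
{
        long ret;
        asm ("syscall" : "=a" (ret), "=m" (*ts) :
             "0" (__NR_clock_gettime), "D" (clock), "S" (ts) :
             "memory", "rcx", "r11");
        return ret;
}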
Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/2c0231690551989d2fafa60ed0e7b5cc8b403908.1538422295.git.luto@kernel.org
|
|
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
The SYSCALL64 trampoline has a couple of nice properties:
- The usual sequence of SWAPGS followed by two GS-relative accesses to
set up RSP is somewhat slow because the GS-relative accesses need
to wait for SWAPGS to finish. The trampoline approach allows
RIP-relative accesses to set up RSP, which avoids the stall.
- The trampoline avoids any percpu access before CR3 is set up,
which means that no percpu memory needs to be mapped in the user
page tables. This prevents using Meltdown to read any percpu memory
outside the cpu_entry_area and prevents using timing leaks
to directly locate the percpu areas.
The downsides of using a trampoline may outweigh the upsides, however.
It adds an extra non-contiguous I$ cache line to system calls, and it
forces an indirect jump to transfer control back to the normal kernel
text after CR3 is set up. The latter is because x86 lacks a 64-bit
direct jump instruction that could jump from the trampoline to the entry
text. With retpolines enabled, the indirect jump is extremely slow.
Change the code to map the percpu TSS into the user page tables to allow
the non-trampoline SYSCALL64 path to work under PTI. This does not add a
new direct information leak, since the TSS is readable by Meltdown from the
cpu_entry_area alias regardless. It does allow a timing attack to locate
the percpu area, but KASLR is more or less a lost cause against local
attack on CPUs vulnerable to Meltdown regardless. As far as I'm concerned,
on current hardware, KASLR is only useful to mitigate remote attacks that
try to attack the kernel without first gaining RCE against a vulnerable
user process.
On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall
overhead from ~237ns to ~228ns.
There is a possible alternative approach: Move the trampoline within 2G of
the entry text and make a separate copy for each CPU. This would allow a
direct jump to rejoin the normal entry path. There are pros and cons to
this approach:
+ It avoids a pipeline stall
- It executes from an extra page and reads from another extra page during
the syscall. The latter is because it needs to use a relative
addressing mode to find sp1 -- it's the same *cacheline*, but accessed
using an alias, so it's an extra TLB entry.
- Slightly more memory. This would be one page per CPU for a simple
implementation and 64-ish bytes per CPU or one page per node for a more
complex implementation.
- More code complexity.
The current approach is chosen for simplicity and because the alternative
does not provide a significant benefit that would make the extra complexity
worthwhile.
[ tglx: Added the alternative discussion to the changelog ]
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org
|
|
In the non-trampoline SYSCALL64 path, a percpu variable is used to
temporarily store the user RSP value.
Instead of a separate variable, use the otherwise unused sp2 slot in the
TSS. This will improve cache locality, as the sp1 slot is already used in
the same code to find the kernel stack. It will also simplify a future
change to make the non-trampoline path work in PTI mode.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/08e769a0023dbad4bac6f34f3631dbaf8ad59f4f.1536015544.git.luto@kernel.org
|
|
The idtentry macro is complicated and magical. Document what it
does to help future readers and to allow future patches to adjust
the code and docs at the same time.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/6e56c3ad94879e41afe345750bc28ccc0e820ea8.1536015544.git.luto@kernel.org
|
|
The STACKLEAK feature (initially developed by PaX Team) has the following
benefits:
1. Reduces the information that can be revealed through kernel stack leak
bugs. The idea of erasing the thread stack at the end of syscalls is
similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel
crypto, which all comply with FDP_RIP.2 (Full Residual Information
Protection) of the Common Criteria standard.
2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712,
CVE-2010-2963). That kind of bug should be killed by improving C
compilers in the future, which might take a long time.
This commit introduces the code filling the used part of the kernel
stack with a poison value before returning to userspace. Full
STACKLEAK feature also contains the gcc plugin which comes in a
separate commit.
The STACKLEAK feature is ported from grsecurity/PaX. More information at:
https://grsecurity.net/
https://pax.grsecurity.net/
This code is modified from Brad Spengler/PaX Team's code in the last
public patch of grsecurity/PaX based on our understanding of the code.
Changes or omissions from the original code are ours and don't reflect
the original grsecurity/PaX code.
Performance impact:
Hardware: Intel Core i7-4770, 16 GB RAM
Test #1: building the Linux kernel on a single core
0.91% slowdown
Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
4.2% slowdown
So the STACKLEAK description in Kconfig includes: "The tradeoff is the
performance impact: on a single CPU system kernel compilation sees a 1%
slowdown, other systems and workloads may vary and you are advised to
test this feature on your expected workload before deploying it".
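A simplified sketch of the erasing pass (the on-stack safety checks of the
real kernel/stackleak.c are omitted):
asmlinkage void notrace stackleak_erase(void)
{
        /* current->lowest_stack is updated on the syscall entry path. */
        unsigned long kstack_ptr = current->lowest_stack;
        unsigned long boundary = (unsigned long)end_of_stack(current);
        unsigned int poison_count = 0;
        const unsigned int depth = STACKLEAK_SEARCH_DEPTH / sizeof(unsigned long);
        /* Search for a long enough run of poison to find the true low mark. */
        while (kstack_ptr > boundary && poison_count <= depth) {
                if (*(unsigned long *)kstack_ptr == STACKLEAK_POISON)
                        poison_count++;
                else
                        poison_count = 0;
                kstack_ptr -= sizeof(unsigned long);
        }
        /* Poison everything between there and the current stack pointer. */
        while (kstack_ptr < current_stack_pointer) {
                *(unsigned long *)kstack_ptr = STACKLEAK_POISON;
                kstack_ptr += sizeof(unsigned long);
        }
        /* Reset the low mark for the next syscall. */
        current->lowest_stack = current_top_of_stack() - THREAD_SIZE / 64;
}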
Signed-off-by: Alexander Popov <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
|
|
All functions in arch/x86/xen/irq.c and arch/x86/xen/xen-asm*.S are
specific to PV guests. Include them in the kernel with CONFIG_XEN_PV only.
Make the PV specific code in arch/x86/entry/entry_*.S dependent on
CONFIG_XEN_PV instead of CONFIG_XEN.
The HVM specific code should depend on CONFIG_XEN_PVHVM.
While at it reformat the Makefile to make it more readable.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Currently, if the vDSO ends up containing an indirect branch or
call, GCC will emit the "external thunk" style of retpoline, and it
will fail to link.
Fix it by building the vDSO with inline retpoline thunks.
I haven't seen any reports of this triggering on an unpatched
kernel.
Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Matt Rickard <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jason Vas Dias <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/c76538cd3afbe19c6246c2d1715bc6a60bd63985.1534448381.git.luto@kernel.org
|
|
Merge tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- verify depmod is installed before modules_install
- support build salt in case build ids must be unique between builds
- allow users to specify additional host compiler flags via HOST*FLAGS,
and rename internal variables to KBUILD_HOST*FLAGS
- update buildtar script to drop vax support, add arm64 support
- update builddeb script for better debarch support
- document the pitfall of if_changed usage
- fix parallel build of UML with O= option
- make 'samples' target depend on headers_install to fix build errors
- remove deprecated host-progs variable
- add a new coccinelle script for refcount_t vs atomic_t check
- improve double-test coccinelle script
- misc cleanups and fixes
* tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
coccicheck: return proper error code on fail
Coccinelle: doubletest: reduce side effect false positives
kbuild: remove deprecated host-progs variable
kbuild: make samples really depend on headers_install
um: clean up archheaders recipe
kbuild: add %asm-generic to no-dot-config-targets
um: fix parallel building with O= option
scripts: Add Python 3 support to tracing/draw_functrace.py
builddeb: Add automatic support for sh{3,4}{,eb} architectures
builddeb: Add automatic support for riscv* architectures
builddeb: Add automatic support for m68k architecture
builddeb: Add automatic support for or1k architecture
builddeb: Add automatic support for sparc64 architecture
builddeb: Add automatic support for mips{,64}r6{,el} architectures
builddeb: Add automatic support for mips64el architecture
builddeb: Add automatic support for ppc64 and powerpcspe architectures
builddeb: Introduce functions to simplify kconfig tests in set_debarch
builddeb: Drop check for 32-bit s390
builddeb: Change architecture detection fallback to use dpkg-architecture
builddeb: Skip architecture detection when KBUILD_DEBARCH is set
...
|
|
Pull x86 PTI updates from Thomas Gleixner:
"The Speck brigade sadly provides yet another large set of patches
destroying the performance which we carefully built and preserved
- PTI support for 32bit PAE. The missing counter part to the 64bit
PTI code implemented by Joerg.
- A set of fixes for the Global Bit mechanics for non PCID CPUs which
were setting the Global Bit too widely and therefore possibly
exposing interesting memory needlessly.
- Protection against userspace-userspace SpectreRSB
- Support for the upcoming Enhanced IBRS mode, which is preferred
over IBRS. Unfortunately we don't know the performance impact of
this, but it's expected to be less horrible than the IBRS
hammering.
- Cleanups and simplifications"
* 'x86/pti' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
x86/mm/pti: Move user W+X check into pti_finalize()
x86/relocs: Add __end_rodata_aligned to S_REL
x86/mm/pti: Clone kernel-image on PTE level for 32 bit
x86/mm/pti: Don't clear permissions in pti_clone_pmd()
x86/mm/pti: Fix 32 bit PCID check
x86/mm/init: Remove freed kernel image areas from alias mapping
x86/mm/init: Add helper for freeing kernel image pages
x86/mm/init: Pass unconverted symbol addresses to free_init_pages()
mm: Allow non-direct-map arguments to free_reserved_area()
x86/mm/pti: Clear Global bit more aggressively
x86/speculation: Support Enhanced IBRS on future CPUs
x86/speculation: Protect against userspace-userspace spectreRSB
x86/kexec: Allocate 8k PGDs for PTI
Revert "perf/core: Make sure the ring-buffer is mapped in all page-tables"
x86/mm: Remove in_nmi() warning from vmalloc_fault()
x86/entry/32: Check for VM86 mode in slow-path check
perf/core: Make sure the ring-buffer is mapped in all page-tables
x86/pti: Check the return value of pti_user_pagetable_walk_pmd()
x86/pti: Check the return value of pti_user_pagetable_walk_p4d()
x86/entry/32: Add debug code to check entry/exit CR3
...
|
|
Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso update from Thomas Gleixner:
"Use LD to link the VDSO libs instead of indirecting trough CC which
causes build failures with Clang"
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: vdso: Use $LD instead of $CC to link
|
|
Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Thomas Gleixner:
"The lowlevel and ASM code updates for x86:
- Make stack trace unwinding more reliable
- ASM instruction updates for better code generation
- Various cleanups"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Add two more instruction suffixes
x86/asm/64: Use 32-bit XOR to zero registers
x86/build/vdso: Simplify 'cmd_vdso2c'
x86/build/vdso: Remove unused vdso-syms.lds
x86/stacktrace: Enable HAVE_RELIABLE_STACKTRACE for the ORC unwinder
x86/unwind/orc: Detect the end of the stack
x86/stacktrace: Do not fail for ORC with regs on stack
x86/stacktrace: Clarify the reliable success paths
x86/stacktrace: Remove STACKTRACE_DUMP_ONCE
x86/stacktrace: Do not unwind after user regs
x86/asm: Use CC_SET/CC_OUT in percpu_cmpxchg8b_double() to micro-optimize code generation
|
|
Integrate the PTI Global bit fixes which conflict with the 32bit PTI
support.
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The vdso{32,64}.so can fail to link with CC=clang when clang tries to find
a suitable GCC toolchain to link these libraries with.
/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
access beyond end of merged section (782)
This happens because the host environment leaked into the cross compiler
environment due to the way clang searches for suitable GCC toolchains.
Clang is a retargetable compiler, and each invocation of it must provide
--target=<something> --gcc-toolchain=<something> to allow it to find the
correct binutils for cross compilation. These flags had been added to
KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various
reasons) which breaks clang's ability to find the correct linker when cross
compiling.
Most of the time this goes unnoticed because the host linker is new enough
to work anyway, or is incompatible and skipped, but this cannot be reliably
assumed.
This change alters the vdso makefile to just use LD directly, which
bypasses clang and thus the searching problem. The makefile will just use
${CROSS_COMPILE}ld instead, which is always what we want. This matches the
method used to link vmlinux.
This drops references to DISABLE_LTO; this option doesn't seem to be set
anywhere, and without knowing what its possible values are, it's not clear
how to convert it from a CC flag to an LD flag.
Signed-off-by: Alistair Strachan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Andi Kleen <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
error_entry and error_exit communicate the user vs. kernel status of
the frame using %ebx. This is unnecessary -- the information is in
regs->cs. Just use regs->cs.
This makes error_entry simpler and makes error_exit more robust.
It also fixes a nasty bug. Before all the Spectre nonsense, the
xen_failsafe_callback entry point returned like this:
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
ENCODE_FRAME_POINTER
jmp error_exit
And it did not go through error_entry. This was bogus: RBX
contained garbage, and error_exit expected a flag in RBX.
Fortunately, it generally contained *nonzero* garbage, so the
correct code path was used. As part of the Spectre fixes, code was
added to clear RBX to mitigate certain speculation attacks. Now,
depending on kernel configuration, RBX gets zeroed and, when running
some Wine workloads, the kernel crashes. This was introduced by:
commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
With this patch applied, RBX is no longer needed as a flag, and the
problem goes away.
I suspect that malicious userspace could use this bug to crash the
kernel even without the offending patch applied, though.
[ Historical note: I wrote this patch as a cleanup before I was aware
of the bug it fixed. ]
[ Note to stable maintainers: this should probably get applied to all
kernels. If you're nervous about that, a more conservative fix to
add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should
also fix the problem. ]
Reported-and-tested-by: M. Vefa Bicakci <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
Link: http://lkml.kernel.org/r/b5010a090d3586b2d6e06c7ad3ec5542d1241c45.1532282627.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The SWITCH_TO_KERNEL_STACK macro only checks for CPL == 0 to go down the
slow and paranoid entry path. The problem is that this check also returns
true when coming from VM86 mode. This is not a problem by itself, as the
paranoid path handles VM86 stack-frames just fine, but it is not necessary
as the normal code path handles VM86 mode as well (and faster).
Extend the check to include VM86 mode. This also makes an optimization of
the paranoid path possible.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add code to check whether the kernel is entered and left with the correct
CR3 and make it depend on CONFIG_DEBUG_ENTRY. This is needed because there
is no NX protection of user-addresses in the kernel-CR3 on x86-32 and that
type of bug would not be detected otherwise.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The NMI handler is special, as it needs to leave with the same CR3 as it
was entered with. This is required because the NMI can happen within kernel
context but with user CR3 already loaded, i.e. after switching to user CR3
but before returning to user space.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add unconditional CR3 switches between user and kernel CR3 to all non-NMI
entry and exit points.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The common exception entry code now handles the entry-from-sysenter stack
situation and makes sure to leave with the same stack as it entered the
kernel.
So there is no need anymore for the special handling in the debug entry
code.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
It is possible that the kernel is entered from kernel-mode and on the
entry-stack. The most common way this happens is when an exception is
triggered while loading the user-space segment registers on the
kernel-to-userspace exit path.
The segment loading needs to be done after the entry-stack switch, because
the stack-switch needs kernel %fs for per_cpu access.
When this happens, make sure to leave the kernel with the entry-stack
again, so that the interrupted code-path runs on the right stack when
switching to the user-cr3.
Detect this condition on kernel-entry by checking CS.RPL and %esp, and if
it happens, copy over the complete content of the entry stack to the
task-stack. This needs to be done because once the exception handler is
entered, the task might be scheduled out or even migrated to a different
CPU, so this cannot rely on the entry-stack contents. Leave a marker in the
stack-frame to detect this condition on the exit path.
On the exit path the copy is reversed, copy all of the remaining task-stack
back to the entry-stack and switch to it.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
These macros will be used in the NMI handler code and replace plain
SAVE_ALL and RESTORE_REGS there.
The NMI-specific CR3-switch will be added to these macros later.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Switch back to the trampoline stack before returning to userspace.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Use the entry-stack as a trampoline to enter the kernel. The entry-stack is
already in the cpu_entry_area and will be mapped to userspace when PTI is
enabled.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Use a separate return path when returning to the kernel.
This allows putting the PTI CR3 switch and the switch to the entry-stack
into the return-to-user path without further checking.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|