path: root/arch/x86/kernel
Age    Commit message    Author    Files    Lines
2016-03-10x86/entry/32: Change INT80 to be an interrupt gateAndy Lutomirski1-1/+1
We want all of the syscall entries to run with interrupts off so that we can efficiently run context tracking before enabling interrupts. This will regress int $0x80 performance on 32-bit kernels by a couple of cycles. This shouldn't matter much -- int $0x80 is not a fast path. This effectively reverts: 657c1eea0019 ("x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on") ... and fixes the same issue differently. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Frédéric Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/59b4f90c9ebfccd8c937305dbbbca680bc74b905.1457558566.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
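The change itself is a one-liner in the IDT setup; a sketch, assuming the set_system_*_gate() helpers of this era (quoted from memory):

    /* Before: trap gate -- the CPU leaves IF set on entry */
    set_system_trap_gate(IA32_SYSCALL_VECTOR, entry_INT80_32);
    /* After: interrupt gate -- the CPU clears IF; the handler re-enables
     * interrupts once context tracking has run */
    set_system_intr_gate(IA32_SYSCALL_VECTOR, entry_INT80_32);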
2016-03-10Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar1-0/+7
Signed-off-by: Ingo Molnar <[email protected]>
2016-03-10x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")Yu-cheng Yu1-6/+0
Leonid Shatz noticed that the SDM interpretation of the following recent commit: 394db20ca240741 ("x86/fpu: Disable AVX when eagerfpu is off") ... is incorrect and that the original behavior of the FPU code was correct. Because AVX is not mentioned in the CR0.TS bit description, it was mistakenly believed not to be supported by lazy context switching. This turns out to be false: Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers: 'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task.' Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception Specification: 'AVX instructions refer to exceptions by classes that include #NM "Device Not Available" exception for lazy context switch.' So revert the commit. Reported-by: Leonid Shatz <[email protected]> Signed-off-by: Yu-cheng Yu <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi V. Shankar <[email protected]> Cc: Sai Praneeth Prakhya <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-10x86/entry/32: Add and check a stack canary for the SYSENTER stackAndy Lutomirski2-0/+11
The first instruction of the SYSENTER entry runs on its own tiny stack. That stack can be used if a #DB or NMI is delivered before the SYSENTER prologue switches to a real stack. We have code in place to prevent us from overflowing the tiny stack. For added paranoia, add a canary to the stack and check it in do_debug() -- that way, if something goes wrong with the #DB logic, we'll eventually notice. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/6ff9a806f39098b166dc2c41c1db744df5272f29.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
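A sketch of the canary scheme (field placement and the magic value are illustrative assumptions, not the literal patch):

    /* A guard word set up next to the tiny per-cpu SYSENTER stack: */
    this_cpu_write(cpu_tss.SYSENTER_stack_canary, STACK_END_MAGIC);

    /* Checked from do_debug(): */
    if (this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC)
        WARN(1, "SYSENTER stack canary was clobbered\n");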
2016-03-10x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixupAndy Lutomirski1-0/+5
Right after SYSENTER, we can get a #DB or NMI. On x86_32, there's no IST, so the exception handler is invoked on the temporary SYSENTER stack. Because the SYSENTER stack is very small, we have a fixup to switch off the stack quickly when this happens. The old fixup had several issues: 1. It checked the interrupt frame's CS and EIP. This wasn't obviously correct on Xen or if vm86 mode was in use [1]. 2. In the NMI handler, it did some frightening digging into the stack frame. I'm not convinced this digging was correct. 3. The fixup didn't switch stacks and then switch back. Instead, it synthesized a brand new stack frame that would redirect the IRET back to the SYSENTER code. That frame was highly questionable. For one thing, if NMI nested inside #DB, we would effectively abort the #DB prologue, which was probably safe but was frightening. For another, the code used PUSHFL to write the FLAGS portion of the frame, which was simply bogus -- by the time PUSHFL was called, at least TF, NT, VM, and all of the arithmetic flags were clobbered. Simplify this considerably. Instead of looking at the saved frame to see where we came from, check the hardware ESP register against the SYSENTER stack directly. Malicious user code cannot spoof the kernel ESP register, and by moving the check after SAVE_ALL, we can use normal PER_CPU accesses to find all the relevant addresses. With this patch applied, the improved syscall_nt_32 test finally passes on 32-bit kernels. [1] It isn't obviously correct, but it is nonetheless safe from vm86 shenanigans as far as I can tell. A user can't point EIP at entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32, like all kernel addresses, is greater than 0xffff and would thus violate the CS segment limit. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/b2cdbc037031c07ecf2c40a96069318aec0e7971.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2016-03-10x86/entry: Vastly simplify SYSENTER TF (single-step) handlingAndy Lutomirski1-9/+43
Due to a blatant design error, SYSENTER doesn't clear TF (single-step). As a result, if a user does SYSENTER with TF set, we will single-step through the kernel until something clears TF. There is absolutely nothing we can do to prevent this short of turning off SYSENTER [1]. Simplify the handling considerably with two changes: 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can add TF to that list of flags to sanitize with no overhead whatsoever. 2. Teach do_debug() to ignore single-step traps in the SYSENTER prologue. That's all we need to do. Don't get too excited -- our handling is still buggy on 32-bit kernels. There's nothing wrong with the SYSENTER code itself, but the #DB prologue has a clever fixup for traps on the very first instruction of entry_SYSENTER_32, and the fixup doesn't work quite correctly. The next two patches will fix that. [1] We could probably prevent it by forcing BTF on at all times and making sure we clear TF before any branches in the SYSENTER code. Needless to say, this is a bad idea. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/a30d2ea06fe4b621fe6a9ef911b02c0f38feb6f2.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
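The do_debug() half of the change might look like this sketch; is_sysenter_singlestep() is the helper such an approach implies and is an assumption here:

    /* A single-step trap taken inside the SYSENTER prologue is an
     * artifact of the user's TF leaking in; swallow it.  The prologue
     * sanitizes TF, and the user's flags are restored on return. */
    if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
        dr6 &= ~DR_STEP;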
2016-03-10x86/entry/traps: Clear DR6 early in do_debug() and improve the commentAndy Lutomirski1-3/+12
Leaving any bits set in DR6 on return from a debug exception is asking for trouble. Prevent it by writing zero right away and clarify the comment. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/3857676e1be8fb27db4b89bbb1e2052b7f435ff4.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
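The resulting pattern, roughly (get_debugreg()/set_debugreg() are the existing accessors):

    get_debugreg(dr6, 6);   /* snapshot the cause of this #DB */
    set_debugreg(0, 6);     /* zero DR6 immediately, before anything else
                             * can leave stale bits behind */
    dr6 &= ~DR6_RESERVED;   /* strip the reserved bits, which read as 1 */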
2016-03-10x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptionsAndy Lutomirski1-5/+7
The SDM says that debug exceptions clear BTF, and we need to keep TIF_BLOCKSTEP in sync with BTF. Clear it unconditionally and improve the comment. I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not to be cleared was just an oversight. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/fa86e55d196e6dde5b38839595bde2a292c52fdc.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
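In code form the clearing simply becomes unconditional; a sketch:

    /* The CPU clears DEBUGCTLMSR.BTF on any debug exception, so the
     * software mirror must be cleared unconditionally as well: */
    clear_thread_flag(TIF_BLOCKSTEP);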
2016-03-09x86/fpu: Fix 'no387' regressionAndy Lutomirski1-6/+8
After fixing FPU option parsing, we now parse the 'no387' boot option too early: no387 clears X86_FEATURE_FPU before it's even probed, so the boot CPU promptly re-enables it. I suspect it gets even more confused on SMP. Fix the probing code to leave X86_FEATURE_FPU off if it's been disabled by setup_clear_cpu_cap(). Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Casasnovas <[email protected]> Cc: Sai Praneeth Prakhya <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: yu-cheng yu <[email protected]> Fixes: 4f81cbafcce2 ("x86/fpu: Fix early FPU command-line parsing") Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+7
Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <[email protected]>
2016-03-08x86/mm, x86/mce: Add memcpy_mcsafe()Tony Luck1-0/+2
Make use of the EXTABLE_FAULT exception table entries to write a kernel copy routine that doesn't crash the system if it encounters a machine check. Prime use case for this is to copy from large arrays of non-volatile memory used as storage. We have to use an unrolled copy loop for now because current hardware implementations treat a machine check in "rep mov" as fatal. When that is fixed we can simplify. Return type is a "bool". True means that we copied OK, false means that we didn't. Signed-off-by: Tony Luck <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <[email protected]>
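A call-site sketch following the semantics spelled out above (dst/src/len are illustrative):

    /* Copy from persistent memory; a machine check mid-copy returns
     * false instead of taking the whole system down. */
    if (!memcpy_mcsafe(dst, src, len))
        return -EIO;    /* poisoned data; let the caller recover */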
2016-03-08Merge branch 'linus' into x86/fpu, to pick up fixesIngo Molnar2-0/+9
Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08x86/asm-offsets: Remove PARAVIRT_enabledAndy Lutomirski1-1/+0
It no longer has any users. Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luis R. Rodriguez <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabledAndy Lutomirski1-0/+25
x86_64 has very clean espfix handling on paravirt: espfix64 is set up in native_iret, so paravirt systems that override iret bypass espfix64 automatically. This is robust and straightforward. x86_32 is messier. espfix is set up before the IRET paravirt patch point, so it can't be directly conditionalized on whether we use native_iret. We also can't easily move it into native_iret without regressing performance due to a bizarre consideration. Specifically, on 64-bit kernels, the logic is: if (regs->ss & 0x4) setup_espfix; On 32-bit kernels, the logic is: if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 && (regs->flags & X86_EFLAGS_VM) == 0) setup_espfix; The performance of setup_espfix itself is essentially irrelevant, but the comparison happens on every IRET so its performance matters. On x86_64, there's no need for any registers except flags to implement the comparison, so we fold the whole thing into native_iret. On x86_32, we don't do that because we need a free register to implement the comparison efficiently. We therefore do espfix setup before restoring registers on x86_32. This patch gets rid of the explicit paravirt_enabled check by introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE to skip espfix on paravirt systems where iret != native_iret. This is also messy, but it's at least in line with other things we do. This improves espfix performance by removing a branch, but no one cares. More importantly, it removes a paravirt_enabled user, which is good because paravirt_enabled is ill-defined and is going away. Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luis R. Rodriguez <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
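The resulting entry-code shape, sketched (label name illustrative): where X86_BUG_ESPFIX is set the jump is patched away and the check runs; on paravirt with a non-native iret the bug bit stays clear and the block is skipped:

    ALTERNATIVE "jmp .Lno_espfix", "", X86_BUG_ESPFIX
        /* ... regs->ss/cs/flags comparison and espfix stack setup ... */
    .Lno_espfix: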
2016-03-08x86/nmi: Mark 'ignore_nmis' as __read_mostlyKostenzer Felix1-1/+2
ignore_nmis is used in two distinct places: 1. modified through {stop,restart}_nmi by alternative_instructions 2. read by do_nmi to determine if default_do_nmi should be called or not thus the access pattern conforms to __read_mostly and do_nmi() is a fastpath. Signed-off-by: Kostenzer Felix <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
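The change itself is essentially one line:

    static int ignore_nmis __read_mostly;   /* written rarely, read in the
                                             * do_nmi() fast path */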
2016-03-08x86/apic: Deinline _flat_send_IPI_mask, save ~150 bytesDenys Vlasenko1-1/+1
_flat_send_IPI_mask: 157 bytes, 3 callsites text data bss dec hex filename 96183823 20860520 36122624 153166967 9212477 vmlinux1_before 96183699 20860520 36122624 153166843 92123fb vmlinux Signed-off-by: Denys Vlasenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Daniel J Blueman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Travis <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08x86/apic: Deinline __default_send_IPI_*, save ~200 bytesDenys Vlasenko1-0/+60
__default_send_IPI_shortcut: 49 bytes, 2 callsites __default_send_IPI_dest_field: 108 bytes, 7 callsites text data bss dec hex filename 96184086 20860488 36122624 153167198 921255e vmlinux_before 96183823 20860520 36122624 153166967 9212477 vmlinux Signed-off-by: Denys Vlasenko <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Daniel J Blueman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Travis <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08x86/mce/AMD: Document some functionalityAravind Gopalakrishnan1-5/+2
In an attempt to aid in understanding of what the threshold_block structure holds, provide comments to describe the members here. Also, trim comments around threshold_restart_bank() and update copyright info. No functional change is introduced. Signed-off-by: Aravind Gopalakrishnan <[email protected]> [ Shorten comments. ] Signed-off-by: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: linux-edac <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08x86/mce/AMD: Fix logic to obtain block addressAravind Gopalakrishnan1-29/+55
In upcoming processors, the BLKPTR field is no longer used to indicate the MSR number of the additional register. Instead, it simply indicates the presence of additional MSRs. Fix the logic here to gather the MSR address from MSR_AMD64_SMCA_MCx_MISC() for newer processors and fall back to existing logic for older processors. [ Drop nextaddr_out label; style cleanups. ] Signed-off-by: Aravind Gopalakrishnan <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: linux-edac <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
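A sketch of the resulting selection logic (the SMCA test is shown in a plausible form; only MSR_AMD64_SMCA_MCx_MISC() is named by the patch, and the legacy helper is illustrative):

    u32 misc_addr;

    if (mce_flags.smca) {
        /* SMCA: the MISC MSR address is architectural per bank;
         * BLKPTR merely signals that additional MSRs exist. */
        misc_addr = MSR_AMD64_SMCA_MCx_MISC(bank);
    } else {
        /* Older parts: BLKPTR carries the MSR number itself. */
        misc_addr = legacy_get_block_address(bank, block);
    }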
2016-03-08x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errorsAravind Gopalakrishnan1-0/+29
For Scalable MCA enabled processors, errors are listed per IP block. And since it is not required for an IP to map to a particular bank, we need to use HWID and McaType values from the MCx_IPID register to figure out which IP a given bank represents. We also have a new bit (TCC) in the MCx_STATUS register to indicate Task context is corrupt. Add logic here to decode errors from all known IP blocks for Fam17h Model 00-0fh and to print TCC errors. [ Minor fixups. ] Signed-off-by: Aravind Gopalakrishnan <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: linux-edac <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08Merge branch 'linus' into ras/core, to pick up fixesIngo Molnar2-0/+9
Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08x86/microcode/intel: Drop orig_sum from ext signature checksumBorislav Petkov1-3/+3
It is 0 because for !0 values we would have exited already. Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-03-08x86/microcode/intel: Improve microcode sanity-checking error messagesBorislav Petkov1-10/+26
Turn them into proper sentences. Add comments to microcode_sanity_check() to explain what it does. Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-03-08x86/microcode/intel: Merge two consecutive if-statementsBorislav Petkov1-5/+5
Merge the two consecutive "if (ext_table_size)". No functional change. Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-03-08x86/microcode/intel: Get rid of DWSIZEBorislav Petkov1-2/+2
sizeof(u32) is perfectly clear as it is. Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-03-08x86/microcode/intel: Change checksum variables to u32Chris Bainbridge1-4/+4
Microcode checksum verification should be done using unsigned 32-bit values otherwise the calculation overflow results in undefined behaviour. This is also nicely documented in the SDM, section "Microcode Update Checksum": "To check for a corrupt microcode update, software must perform a unsigned DWORD (32-bit) checksum of the microcode update. Even though some fields are signed, the checksum procedure treats all DWORDs as unsigned. Microcode updates with a header version equal to 00000001H must sum all DWORDs that comprise the microcode update. A valid checksum check will yield a value of 00000000H." but for some reason the code has been using ints from the very beginning. In practice, this bug possibly manifested itself only when doing the microcode data checksum - apparently, currently shipped Intel microcode doesn't have an extended signature table for which we do checksum verification too. UBSAN: Undefined behaviour in arch/x86/kernel/cpu/microcode/intel_lib.c:105:12 signed integer overflow: -1500151068 + -2125470173 cannot be represented in type 'int' CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495 ... Call Trace: dump_stack ? inotify_ioctl ubsan_epilogue handle_overflow __ubsan_handle_add_overflow microcode_sanity_check get_matching_model_microcode.isra.2.constprop.8 ? early_idt_handler_common ? strlcpy ? find_cpio_data load_ucode_intel_bsp load_ucode_bsp ? load_ucode_bsp x86_64_start_kernel [ Expand and massage commit message. ] Signed-off-by: Chris Bainbridge <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
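The SDM rule, as unsigned 32-bit arithmetic (variable names illustrative):

    u32 sum = 0;
    unsigned int i;

    for (i = 0; i < total_size / sizeof(u32); i++)
        sum += ((u32 *)mc)[i];  /* wraps mod 2^32, as the SDM intends */

    if (sum)
        return -EINVAL;         /* a valid update sums to 00000000H */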
2016-03-07Merge tag 'v4.5-rc7' into x86/asm, to pick up SMAP fixIngo Molnar2-0/+9
Signed-off-by: Ingo Molnar <[email protected]>
2016-03-04Merge tag 'v4.5-rc6' into core/resources, to resolve conflictIngo Molnar10-83/+201
Signed-off-by: Ingo Molnar <[email protected]>
2016-03-03x86/tsc: Always Running Timer (ART) correlated clocksourceChristopher S. Hall1-0/+59
On modern Intel systems TSC is derived from the new Always Running Timer (ART). ART can be captured simultaneously with the capture of audio and network device clocks, allowing a correlation between timebases to be constructed. Upon capture, the driver converts the captured ART value to the appropriate system clock using the correlated clocksource mechanism. On systems that support ART a new CPUID leaf (0x15) returns parameters “m” and “n” such that: TSC_value = (ART_value * m) / n + k [n >= 1] [k is an offset that can be adjusted by a privileged agent. The IA32_TSC_ADJUST MSR is an example of an interface to adjust k. See 17.14.4 of the Intel SDM for more details] Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Christopher S. Hall <[email protected]> [jstultz: Tweaked to fix build issue, also reworked math for 64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build fixes] Signed-off-by: John Stultz <[email protected]>
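The conversion the driver performs, sketched (function and parameter names are assumptions; the offset k is omitted):

    /* TSC_value = (ART_value * m) / n, computed so the 64x32 multiply
     * doesn't overflow on 32-bit kernels: */
    static u64 art_to_tsc_cycles(u64 art, u32 m, u32 n)
    {
        return mul_u64_u32_div(art, m, n);
    }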
2016-03-03PM / sleep / x86: Fix crash on graph trace through x86 suspendTodd E Brandt1-0/+7
Pause/unpause graph tracing around do_suspend_lowlevel as it has inconsistent call/return info after it jumps to the wakeup vector. The graph trace buffer will otherwise become misaligned and may eventually crash and hang on suspend. To reproduce the issue and test the fix: Run a function_graph trace over suspend/resume and set the graph function to suspend_devices_and_enter. This consistently hangs the system without this fix. Signed-off-by: Todd Brandt <[email protected]> Cc: All applicable <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
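The fix pattern, in brief (placement sketched):

    pause_graph_tracing();      /* the tracer can't follow the jump to
                                 * the wakeup vector */
    do_suspend_lowlevel();
    unpause_graph_tracing();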
2016-03-01arch/hotplug: Call into idle with a proper stateThomas Gleixner1-1/+1
Let the non boot cpus call into idle with the corresponding hotplug state, so the hotplug core can handle the further bringup. That's a first step to convert the boot side of the hotplugged cpus to do all the synchronization with the other side through the state machine. For now it'll only start the hotplug thread and kick the full bringup of the cpu. Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Rafael Wysocki <[email protected]> Cc: "Srivatsa S. Bhat" <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Sebastian Siewior <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Paul McKenney <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul Turner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
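On x86 this lands as a one-line change in start_secondary(); a sketch of the diff (matching the -1/+1 stats above):

    -	cpu_startup_entry(CPUHP_ONLINE);
    +	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);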
2016-02-29Merge tag 'v4.5-rc6' into locking/core, to pick up fixesIngo Molnar1-0/+2
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29x86/topology: Create logical package idThomas Gleixner4-0/+129
For per package oriented services we must be able to rely on the number of CPU packages to be within bounds. Create a tracking facility, which - calculates the number of possible packages depending on nr_cpu_ids after boot - makes sure that the package id is within the number of possible packages. If the apic id is outside we map it to a logical package id if there is enough space available. Provide interfaces for drivers to query the mapping and do translations from physical to logical ids. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Harish Chegondi <[email protected]> Cc: Jacob Pan <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luis R. Rodriguez <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
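Driver-side usage of the query, sketched (the function name comes from this facility; the error convention is an assumption):

    int lpkg = topology_phys_to_logical_pkg(phys_pkg);

    if (lpkg < 0)
        return -ENODEV;  /* physical id could not be mapped */
    /* lpkg is now a dense index suitable for per-package arrays */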
2016-02-29x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standardJosh Poimboeuf1-0/+2
objtool reports the following warning for kretprobe_trampoline(): arch/x86/kernel/kprobes/core.o: warning: objtool: kretprobe_trampoline()+0x20: call without frame pointer save/setup kretprobes are a special case where the stack is intentionally wrong. The return address isn't known at the beginning of the trampoline, so the stack frame can't be set up properly before it calls trampoline_handler(). Because kretprobe handlers don't sleep, the frame pointer doesn't *have* to be accurate in the trampoline. So it's ok to tell objtool to ignore it. This results in no actual changes to the generated code. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Bernd Petrovitsch <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Chris J Arges <[email protected]> Cc: David S. Miller <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Michal Marek <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pedro Alves <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/7eaf37de52456ff822ffc86b928edb5d48a40ef1.1456719558.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
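The annotation itself, using the macro introduced in the objtool entry below:

    STACK_FRAME_NON_STANDARD(kretprobe_trampoline);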
2016-02-29objtool: Add STACK_FRAME_NON_STANDARD() macroJosh Poimboeuf1-1/+4
Add a new macro, STACK_FRAME_NON_STANDARD(), which is used to denote a function which does something unusual related to its stack frame. Use of the macro prevents objtool from emitting a false positive warning. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Bernd Petrovitsch <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Chris J Arges <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Marek <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pedro Alves <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/34487a17b23dba43c50941599d47054a9584b219.1456719558.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
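Its shape, sketched from the description: emit a pointer to the function into a discard section that objtool reads and the linker then throws away:

    #define STACK_FRAME_NON_STANDARD(func) \
        static void __used __section(.discard.func_stack_frame_non_standard) \
        *__func_stack_frame_non_standard_##func = func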
2016-02-29objtool: Mark non-standard object files and directoriesJosh Poimboeuf1-3/+8
Code which runs outside the kernel's normal mode of operation often does unusual things which can cause a static analysis tool like objtool to emit false positive warnings: - boot image - vdso image - relocation - realmode - efi - head - purgatory - modpost Set OBJECT_FILES_NON_STANDARD for their related files and directories, which will tell objtool to skip checking them. It's ok to skip them because they don't affect runtime stack traces. Also skip the following code which does the right thing with respect to frame pointers, but is too "special" to be validated by a tool: - entry - mcount Also skip the test_nx module because it modifies its exception handling table at runtime, which objtool can't understand. Fortunately it's just a test module so it doesn't matter much. Currently objtool is the only user of OBJECT_FILES_NON_STANDARD, but it might eventually be useful for other tools. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Bernd Petrovitsch <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Chris J Arges <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Marek <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pedro Alves <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/366c080e3844e8a5b6a0327dc7e8c2b90ca3baeb.1456719558.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
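Kbuild usage, per the description above (a sketch):

    # Skip objtool checking for every object built by this Makefile:
    OBJECT_FILES_NON_STANDARD := y

    # ... or opt out a single object:
    OBJECT_FILES_NON_STANDARD_test_nx.o := y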
2016-02-25Merge branch 'ras/core' into core/objtool, to pick up the new exception table formatIngo Molnar5-68/+145
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-25Merge branch 'x86/debug' into core/objtool, to pick up frame pointer fixesIngo Molnar29-136/+214
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-24x86: Fix misspellings in commentsAdam Buchbinder15-18/+18
Signed-off-by: Adam Buchbinder <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-24x86/kprobes: Get rid of kretprobe_trampoline_holder()Josh Poimboeuf1-29/+28
The kretprobe_trampoline_holder() wrapper around kretprobe_trampoline() isn't used anywhere and adds some unnecessary frame pointer instructions which never execute. Instead, just make kretprobe_trampoline() a proper ELF function. Signed-off-by: Josh Poimboeuf <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Bernd Petrovitsch <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Chris J Arges <[email protected]> Cc: David S. Miller <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Marek <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pedro Alves <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/92d921b102fb865a7c254cfde9e4a0a72b9a781e.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2016-02-24x86/asm/acpi: Create a stack frame in do_suspend_lowlevel()Josh Poimboeuf1-0/+3
do_suspend_lowlevel() is a callable non-leaf function which doesn't honor CONFIG_FRAME_POINTER, which can result in bad stack traces. Create a stack frame for it when CONFIG_FRAME_POINTER is enabled. Signed-off-by: Josh Poimboeuf <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Acked-by: Pavel Machek <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Bernd Petrovitsch <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Chris J Arges <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Marek <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pedro Alves <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/7383d87dd40a460e0d757a0793498b9d06a7ee0d.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2016-02-24x86/amd: Set ELF function type for vide()Josh Poimboeuf1-1/+4
vide() is a callable function, but is missing the ELF function type, which confuses tools like stacktool. Properly annotate it to be a callable function. The generated code is unchanged. Signed-off-by: Josh Poimboeuf <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Bernd Petrovitsch <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Chris J Arges <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Marek <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pedro Alves <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/a324095f5c9390ff39b15b4562ea1bbeda1a8282.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2016-02-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+2
Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <[email protected]>
2016-02-22x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig optionKees Cook5-26/+17
This removes the CONFIG_DEBUG_RODATA option and makes it always enabled. This simplifies the code and also makes it clearer that read-only mapped memory is just as fundamental a security feature in kernel-space as it is in user-space. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: David Brown <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Emese Revfy <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mathias Krause <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: PaX Team <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: linux-arch <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-22Merge tag 'v4.5-rc5' into efi/core, before queueing up new changesIngo Molnar1-0/+2
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-21x86/tsc: Use topology functionsThomas Gleixner1-4/+4
It's simpler to look at the topology mask than iterating over all online cpus to find a cpu on the same package. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]>
2016-02-20perf: generalize perf_callchainAlexei Starovoitov3-11/+17
. avoid walking the stack when there is no room left in the buffer . generalize get_perf_callchain() to be called from bpf helper Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-18mm/core, x86/mm/pkeys: Add execute-only protection keys supportDave Hansen1-2/+0
Protection keys provide new page-based protection in hardware. But, they have an interesting attribute: they only affect data accesses and never affect instruction fetches. That means that if we set up some memory which is set as "access-disabled" via protection keys, we can still execute from it. This patch uses protection keys to set up mappings to do just that. If a user calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only without PROT_READ/WRITE), the kernel will notice this, and set a special protection key on the memory. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. I haven't found any userspace that does this today. With this facility in place, we expect userspace to move to use it eventually. Userspace _could_ start doing this today. Any PROT_EXEC calls get converted to PROT_READ inside the kernel, and would transparently be upgraded to "true" PROT_EXEC with this code. IOW, userspace never has to do any PROT_EXEC runtime detection. This feature provides enhanced protection against leaking executable memory contents. This helps thwart attacks which are attempting to find ROP gadgets on the fly. But, the security provided by this approach is not comprehensive. The PKRU register which controls access permissions is a normal user register writable from unprivileged userspace. An attacker who can execute the 'wrpkru' instruction can easily disable the protection provided by this feature. The protection key that is used for execute-only support is permanently dedicated at compile time. This is fine for now because there is currently no API to set a protection key other than this one. Despite there being a constant PKRU value across the entire system, we do not set it unless this feature is in use in a process. That is to preserve the PKRU XSAVE 'init state', which can lead to faster context switches. PKRU *is* a user register and the kernel is modifying it. That means that code doing: pkru = rdpkru() pkru |= 0x100; mmap(..., PROT_EXEC); wrpkru(pkru); could lose the bits in PKRU that enforce execute-only permissions. To avoid this, we suggest avoiding ever calling mmap() or mprotect() when the PKRU value is expected to be unstable. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Chen Gang <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Piotr Kwapulinski <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Stephen Smalley <[email protected]> Cc: Vladimir Murzin <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
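From userspace nothing new is required; a sketch of the triggering call:

    #include <sys/mman.h>

    /* PROT_EXEC without PROT_READ/PROT_WRITE: on pkeys hardware the
     * kernel backs this with the dedicated execute-only protection key,
     * so instruction fetches work but data reads and writes fault. */
    void *code = mmap(NULL, 4096, PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);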
2016-02-18x86/mm/pkeys: Allow kernel to modify user pkey rights registerDave Hansen1-0/+74
The Protection Key Rights for User memory (PKRU) is a 32-bit user-accessible register. It contains two bits for each protection key: one to write-disable (WD) access to memory covered by the key and another to access-disable (AD). Userspace can read/write the register with the RDPKRU and WRPKRU instructions. But, the register is saved and restored with the XSAVE family of instructions, which means we have to treat it like a floating point register. The kernel needs to write to the register if it wants to implement execute-only memory or if it implements a system call to change PKRU. To do this, we need to create a 'pkru_state' buffer, read the old contents in to it, modify it, and then tell the FPU code that there is modified data in there so it can (possibly) move the buffer back in to the registers. This uses the fpu__xfeature_set_state() function that we defined in the previous patch. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
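The per-key layout described above, as bit arithmetic (a sketch):

    #define PKRU_BITS_PER_PKEY  2
    #define PKRU_AD_BIT(pkey)   (1u << ((pkey) * PKRU_BITS_PER_PKEY))     /* access-disable */
    #define PKRU_WD_BIT(pkey)   (1u << ((pkey) * PKRU_BITS_PER_PKEY + 1)) /* write-disable */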
2016-02-18x86/fpu: Allow setting of XSAVE stateDave Hansen2-2/+159
We want to modify the Protection Key rights inside the kernel, so we need to change PKRU's contents. But, if we do a plain 'wrpkru', when we return to userspace we might do an XRSTOR and wipe out the kernel's 'wrpkru'. So, we need to go after PKRU in the xsave buffer. We do this by: 1. Ensuring that we have the XSAVE registers (fpregs) in the kernel FPU buffer (fpstate) 2. Looking up the location of a given state in the buffer 3. Filling in the state 4. Ensuring that the hardware knows that state is present there (basically that the 'init optimization' is not in place). 5. Copying the newly-modified state back to the registers if necessary. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Casasnovas <[email protected]> Cc: Rik van Riel <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
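The five steps, condensed into a call sketch; the helper name comes from this series but its exact prototype is an assumption here:

    struct pkru_state pkru = { .pkru = new_rights };

    /* Flushes live registers into fpstate, writes the new value into
     * the buffer, marks the feature present so XRSTOR restores it, and
     * reloads the registers if this task's state is active: */
    fpu__xfeature_set_state(XFEATURE_MASK_PKRU, &pkru, sizeof(pkru));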