Make the doublefault exception handler unconditional on 32-bit. Yes,
it is important to be able to catch #DF exceptions instead of silent
reboots. Yes, the code size increase is worth every byte. And one less
CONFIG symbol is just the cherry on top.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
User space cannot disable interrupts any longer so trace return to user space
unconditionally as IRQS_ON.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Nothing cares about the -1 "mark as interrupt" in the error code of
exception entries. It's only used to fill the error code when a signal is
delivered, but this is already inconsistent with 64-bit, where all
exceptions which do not have an error code set it to 0. So if 32-bit
applications cared about this, they would have noticed more than
a decade ago.
Just use 0 consistently for all exceptions which do not have an error code.
Nor does this break /proc/$PID/syscall, because that interface examines
the error code / syscall number which is on the stack, and that is set to -1
(no syscall) in common_exception unconditionally for all exceptions. The
push in the entry stub is just there to fill the hardware error code slot
on the stack for consistency of the stack layout.
A transient observation of 0 is possible, but that's true for the other
exceptions which use 0 already as well and that interface is an unreliable
snapshot of dubious correctness anyway.
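As a rough C sketch of where that error code ends up (the thread_struct
fields are real, but the helper itself is only an illustration, not the
kernel's actual trap code):

  static void record_trap_sketch(struct task_struct *tsk, int trapnr,
                                 long error_code)
  {
      tsk->thread.error_code = error_code; /* now consistently 0 if the CPU pushed none */
      tsk->thread.trap_nr = trapnr;        /* nothing ever tests for -1 here */
  }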
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
int3 is not using the common_exception path for purely historical reasons,
but there is no reason to keep it the only exception which is different.
Make it use common_exception so the upcoming changes to autogenerate the
entry stubs do not have to special case int3.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Remove the pointless difference between 32 and 64 bit to make further
unifications simpler.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
All exception entry points must have ASM_CLAC right at the
beginning. The general_protection entry is missing one.
Fixes: e59d1b0a2419 ("x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry")
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The x86_32 doublefault_fn() was old and crufty, and it did not
even try to recover. do_double_fault() is much nicer. Rewrite the
32-bit double fault code to sanitize CPU state and call
do_double_fault(). This is mostly an exercise in i386 archaeology.
With this patch applied, 32-bit double faults get a real stack trace,
just like 64-bit double faults.
[ mingo: merged the patch to a later kernel base. ]
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 iopl updates from Ingo Molnar:
"This implements a nice simplification of the iopl and ioperm code that
Thomas Gleixner discovered: we can implement the IO privilege features
of the iopl system call by using the IO permission bitmap in
permissive mode, while trapping CLI/STI/POPF/PUSHF uses in user-space
if they change the interrupt flag.
This implements that feature, with testing facilities and related
cleanups"
[ "Simplification" may be an over-statement. The main goal is to avoid
the cli/sti of iopl by effectively implementing the IO port access
parts of iopl in terms of ioperm.
This may end up not working well in case people actually depend on
cli/sti being available, or if there are mixed uses of iopl and
ioperm. We will see.. - Linus ]
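A conceptual sketch of the idea in C (this is not the kernel's actual
iopl() implementation; ksys_ioperm() is the real ioperm() worker, and the
all-or-nothing grant below is only illustrative):

  static long iopl_via_ioperm_sketch(unsigned int level)
  {
      /* level 3 used to raise EFLAGS.IOPL; now it only opens the port
       * bitmap, so CLI/STI/POPF/PUSHF from user space trap instead */
      return ksys_ioperm(0, 65536, level == 3);
  }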
* 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/ioperm: Fix use of deprecated config option
x86/entry/32: Clarify register saving in __switch_to_asm()
selftests/x86/iopl: Extend test to cover IOPL emulation
x86/ioperm: Extend IOPL config to control ioperm() as well
x86/iopl: Remove legacy IOPL option
x86/iopl: Restrict iopl() permission scope
x86/iopl: Fixup misleading comment
selftests/x86/ioperm: Extend testing so the shared bitmap is exercised
x86/ioperm: Share I/O bitmap if identical
x86/ioperm: Remove bitmap if all permissions dropped
x86/ioperm: Move TSS bitmap update to exit to user work
x86/ioperm: Add bitmap sequence number
x86/ioperm: Move iobitmap data into a struct
x86/tss: Move I/O bitmap data into a seperate struct
x86/io: Speedup schedule out of I/O bitmap user
x86/ioperm: Avoid bitmap allocation if no permissions are set
x86/ioperm: Simplify first ioperm() invocation logic
x86/iopl: Cleanup include maze
x86/tss: Fix and move VMX BUILD_BUG_ON()
x86/cpu: Unify cpu_init()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- Cross-arch changes to move the linker sections for NOTES and
EXCEPTION_TABLE into the RO_DATA area, where they belong on most
architectures. (Kees Cook)
- Switch the x86 linker fill byte from 0x90 (NOP) to 0xcc (INT3), to
trap jumps into the middle of those padding areas instead of
sliding execution. (Kees Cook)
- A thorough cleanup of symbol definitions within x86 assembler code.
The rather randomly named macros got streamlined around a
(hopefully) straightforward naming scheme:
SYM_START(name, linkage, align...)
SYM_END(name, sym_type)
SYM_FUNC_START(name)
SYM_FUNC_END(name)
SYM_CODE_START(name)
SYM_CODE_END(name)
SYM_DATA_START(name)
SYM_DATA_END(name)
etc - with about three times as many variants of these basic primitives
carrying some label, local-symbol or attribute postfix.
No change in functionality intended. (Jiri Slaby)
- Misc other changes, cleanups and smaller fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
x86/entry/64: Remove pointless jump in paranoid_exit
x86/entry/32: Remove unused resume_userspace label
x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
m68k: Convert missed RODATA to RO_DATA
x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
x86/mm: Report actual image regions in /proc/iomem
x86/mm: Report which part of kernel image is freed
x86/mm: Remove redundant address-of operators on addresses
xtensa: Move EXCEPTION_TABLE to RO_DATA segment
powerpc: Move EXCEPTION_TABLE to RO_DATA segment
parisc: Move EXCEPTION_TABLE to RO_DATA segment
microblaze: Move EXCEPTION_TABLE to RO_DATA segment
ia64: Move EXCEPTION_TABLE to RO_DATA segment
h8300: Move EXCEPTION_TABLE to RO_DATA segment
c6x: Move EXCEPTION_TABLE to RO_DATA segment
arm64: Move EXCEPTION_TABLE to RO_DATA segment
alpha: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Actually use _etext for the end of the text segment
vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
...
|
|
UNWIND_ESPFIX_STACK needs to read the GDT, and the GDT mapping that
can be accessed via %fs is not mapped in the user pagetables. Use
SGDT to find the cpu_entry_area mapping and read the espfix offset
from that instead.
Reported-and-tested-by: Borislav Petkov <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When the NMI lands on an ESPFIX_SS, we are on the entry stack and must
swizzle, otherwise we'll run do_nmi() on the entry stack, which is
BAD.
Also, similar to the normal exception path, we need to correct the
ESPFIX magic before leaving the entry stack, otherwise pt_regs will
present a non-flat stack pointer.
Tested by running sigreturn_32 concurrent with perf-record.
Fixes: e5862d0515ad ("x86/entry/32: Leave the kernel via trampoline stack")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
|
|
Right now, we do some fancy parts of the exception entry path while SS
might have a nonzero base: we fill in regs->ss and regs->sp, and we
consider switching to the kernel stack. This results in regs->ss and
regs->sp referring to a non-flat stack and it may result in
overflowing the entry stack. The former issue means that we can try to
call iret_exc on a non-flat stack, which doesn't work.
Tested with selftests/x86/sigreturn_32.
Fixes: 45d7b255747c ("x86/entry/32: Enter the kernel via trampoline stack")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
|
|
This will allow us to get percpu access working before FIXUP_FRAME,
which will allow us to unwind ESPFIX earlier.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
|
|
When re-building the IRET frame we use %eax as a destination %esp;
make sure to then also match the segment for when there is a nonzero
SS base (ESPFIX).
[peterz: Changelog and minor edits]
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
|
|
As reported by Lai, the commit 3c88c692c287 ("x86/stackframe/32:
Provide consistent pt_regs") wrecked the IRET EXTABLE entry by making
.Lirq_return not point at IRET.
Fix this by placing IRET_FRAME in RESTORE_REGS, to mirror how
FIXUP_FRAME is part of SAVE_ALL.
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Reported-by: Lai Jiangshan <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
|
|
Now that SS:ESP always get saved by SAVE_ALL, this also needs to be
accounted for in xen_iret_crit_fixup(). Otherwise the old_ax value gets
interpreted as EFLAGS, and hence VM86 mode appears to be active all the
time, leading to random "vm86_32: no user_vm86: BAD" log messages alongside
processes randomly crashing.
Since following the previous model (sitting after SAVE_ALL) would further
complicate the code _and_ retain the dependency of xen_iret_crit_fixup() on
frame manipulations done by entry_32.S, switch things around and do the
adjustment ahead of SAVE_ALL.
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: Stable Team <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Once again RPL checks have been introduced which don't account for a 32-bit
kernel living in ring 1 when running in a PV Xen domain. The case in
FIXUP_FRAME has been preventing boot.
Adjust BUG_IF_WRONG_CR3 as well to guard against future uses of the macro
on a code path reachable when running in PV mode under Xen; I have to admit
that I stopped at a certain point trying to figure out whether any such
uses exist today.
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Stable Team <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The C reimplementation of SYSENTER left that unused ENTRY() label
around. Remove it.
Fixes: 5f310f739b4c ("x86/entry/32: Re-implement SYSENTER using the new C path")
Originally-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
commit 6690e86be83a ("sched/x86: Save [ER]FLAGS on context switch")
re-introduced the flags saving on context switch to prevent AC leakage.
The pushf/popf instructions are right among the callee saved register
section, so the comment explaining the save/restore is not entirely
correct.
Add a separate comment to pushf/popf explaining the reason.
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
These are all functions which are invoked from elsewhere, so annotate
them as global using the new SYM_FUNC_START and replace their ENDPROCs
with SYM_FUNC_END.
Now that ENTRY/ENDPROC can be forced to be undefined on X86, do so.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Allison Randal <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Bill Metzenthen <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: linux-efi <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: [email protected]
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Change all assembly code which is marked using END (and not ENDPROC) to
the appropriate new markings SYM_CODE_START and SYM_CODE_END.
And since the last user of END on X86 is gone now, make sure that END is
not defined there.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
All these are functions which are invoked from elsewhere, but they are
not typical C functions, so annotate them using the new SYM_CODE_START.
None of them were balanced with any END, so mark their ends with
SYM_CODE_END, appropriately.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]> [xen bits]
Reviewed-by: Rafael J. Wysocki <[email protected]> [hibernate]
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Len Brown <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Pavel Machek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: "Steven Rostedt (VMware)" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Convert the remaining 32-bit users and finally remove the GLOBAL macro.
In particular, this means using SYM_ENTRY for the singlestepping hack
region.
Exclude the global definition of GLOBAL from x86 too.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
* annotate functions properly by SYM_CODE_START, SYM_CODE_START_LOCAL*
and SYM_CODE_END -- these are not C-like functions, so they have to
be annotated using CODE.
* use SYM_INNER_LABEL* for labels in the middle of other functions.
This prevents nested label annotations.
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by
CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same
functionality which today depends on CONFIG_PREEMPT.
Switch the entry code, preempt and kprobes conditionals over to
CONFIG_PREEMPTION.
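The shape of the change, as a small illustration (not a literal hunk from
the entry code):

  #ifdef CONFIG_PREEMPTION    /* was: #ifdef CONFIG_PREEMPT */
      /* code needed by both PREEMPT and PREEMPT_RT kernels */
  #endif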
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption") added the
address parameter to do_async_page_fault(), but does not pass it from the
32-bit entry point. To plumb it through, factor out
common_exception_read_cr2 in the same fashion as common_exception, and use
it from both page_fault and async_page_fault.
For a 32-bit KVM guest, this fixes:
Run /sbin/init as init process
Starting init: /sbin/init exists but couldn't execute it (error -14)
Fixes: a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption")
Signed-off-by: Matt Mullins <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Despite the current efforts to read CR2 before tracing happens there still
exist a number of possible holes:
  idtentry page_fault do_page_fault has_error_code=1
    call error_entry
      TRACE_IRQS_OFF
        call trace_hardirqs_off*
          #PF // modifies CR2
      CALL_enter_from_user_mode
        __context_tracking_exit()
          trace_user_exit(0)
            #PF // modifies CR2
    call do_page_fault
      address = read_cr2(); /* whoopsie */
And similar for i386.
Fix it by pulling the CR2 read into the entry code, before any of that
stuff gets a chance to run and ruin things.
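The gist of the fix as a C sketch (the real change is in the asm entry
code; the three-argument do_page_fault() form is the one this series
introduces):

  static void page_fault_entry_sketch(struct pt_regs *regs,
                                      unsigned long error_code)
  {
      unsigned long address = native_read_cr2();  /* latch CR2 first */

      trace_hardirqs_off();                       /* may fault and clobber CR2 */
      do_page_fault(regs, error_code, address);
  }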
Reported-by: He Zhe <[email protected]>
Reported-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Debugged-by: Steven Rostedt <[email protected]>
|
|
Add one more option to SAVE_ALL that can be used in common_exception to
simplify things. This also saves duplication later, where page_fault will no
longer use common_exception.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
common_spurious is currently ENDed erroneously: common_interrupt's name is
used in its ENDPROC. Fix this mistake.
Found by my asm macros rewrite patchset.
Fixes: f8a8fe61fec8 ("x86/irq: Seperate unused system vectors from spurious entry again")
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"Most of the changes relate to Peter Zijlstra's cleanup of ptregs
handling, in particular the i386 part is now much simplified and
standardized - no more partial ptregs stack frames via the esp/ss
oddity. This simplifies ftrace, kprobes, the unwinder, ptrace, kdump
and kgdb.
There are also CR4 hardening enhancements by Kees Cook, to make the
generic platform functions such as native_write_cr4() less useful as
ROP gadgets that disable SMEP/SMAP. Also protect the WP bit of CR0
against similar attacks.
The rest is smaller cleanups/fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Add int3_emulate_call() selftest
x86/stackframe/32: Allow int3_emulate_push()
x86/stackframe/32: Provide consistent pt_regs
x86/stackframe, x86/ftrace: Add pt_regs frame annotations
x86/stackframe, x86/kprobes: Fix frame pointer annotations
x86/stackframe: Move ENCODE_FRAME_POINTER to asm/frame.h
x86/entry/32: Clean up return from interrupt preemption path
x86/asm: Pin sensitive CR0 bits
x86/asm: Pin sensitive CR4 bits
Documentation/x86: Fix path to entry_32.S
x86/asm: Remove unused TASK_TI_flags from asm-offsets.c
|
|
Quite some time ago the interrupt entry stubs for unused vectors in the
system vector range got removed and directly mapped to the spurious
interrupt vector entry point.
Sounds reasonable, but it's subtly broken. The spurious interrupt vector
entry point pushes vector number 0xFF on the stack which makes the whole
logic in __smp_spurious_interrupt() pointless.
As a consequence any spurious interrupt which comes from a vector != 0xFF
is treated as a real spurious interrupt (vector 0xFF) and not
acknowledged. That subsequently stalls all interrupt vectors of equal and
lower priority, which brings the system to a grinding halt.
This can happen because even on 64-bit the system vector space is not
guaranteed to be fully populated. A full compile-time handling of the
unused vectors is not possible because quite a few of them are conditionally
populated at runtime.
Bring the entry stubs back, which wastes 160 bytes if all stubs are unused,
but gains the proper handling back. There is no point in selectively sparing
the stubs which are known unused at compile time, as the required code in
the IDT management would be way larger and more convoluted.
Do not route the spurious entries through common_interrupt and do_IRQ() as
the original code did. Route them to smp_spurious_interrupt(), which
evaluates the vector number and acts accordingly now that the real vector
numbers are handed in.
Fix up the pr_warn so the actual spurious vector (0xff) is clearly
distinguished from the other vectors, and also note for the vectored case
whether it was pending in the ISR or not.
"Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen."
"Spurious interrupt vector 0xed on CPU#1. Acked."
"Spurious interrupt vector 0xee on CPU#1. Not pending!."
Fixes: 2414e021ac8d ("x86: Avoid building unused IRQ entry stubs")
Reported-by: Jan Kiszka <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Jan Beulich <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Currently pt_regs on x86_32 has an oddity in that kernel regs
(!user_mode(regs)) are short two entries (esp/ss). This means that any
code trying to use them (typically: regs->sp) needs to jump through
some unfortunate hoops.
Change the entry code to fix this up and create a full pt_regs frame.
This then simplifies various trampolines in ftrace and kprobes, the
stack unwinder, ptrace, kdump and kgdb.
Much thanks to Josh for help with the cleanups!
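One example of the hoops this removes, as a C sketch (simplified from the
old 32-bit kernel_stack_pointer() trick, not the exact code):

  static unsigned long regs_sp_sketch(struct pt_regs *regs)
  {
      if (user_mode(regs))
          return regs->sp;
      /* kernel regs were two slots short: the address of the (unsaved)
       * sp slot itself was the interrupted stack pointer */
      return (unsigned long)&regs->sp;
  }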
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In preparation for wider use, move the ENCODE_FRAME_POINTER macros to
a common header and provide inline asm versions.
These macros are used to encode a pt_regs frame for the unwinder; see
unwind_frame.c:decode_frame_pointer().
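Roughly, on x86_32 the encoding clears the top bit of the frame pointer
(kernel addresses always have it set), so the decode side looks about like
this sketch (see decode_frame_pointer() for the real thing):

  static struct pt_regs *decode_frame_pointer_sketch(unsigned long *bp)
  {
      unsigned long addr = (unsigned long)bp;

      if (addr & 0x80000000)
          return NULL;                               /* top bit set: ordinary frame */
      return (struct pt_regs *)(addr | 0x80000000);  /* encoded: restore the address */
  }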
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The code flow around the return from interrupt preemption point seems
needlessly complicated.
There is only one site jumping to resume_kernel, and none (outside of
resume_kernel) jumping to restore_all_kernel. Inline resume_kernel
in restore_all_kernel and avoid the CONFIG_PREEMPT dependent label.
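The resulting logic, as a hedged C sketch of the asm (the real code
operates on the saved frame in entry_32.S):

  static void irq_return_kernel_sketch(struct pt_regs *regs)
  {
      if (IS_ENABLED(CONFIG_PREEMPT) &&
          !preempt_count() && (regs->flags & X86_EFLAGS_IF))
          preempt_schedule_irq();
      /* then fall through to restoring registers and iret */
  }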
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry cleanup from Ingo Molnar:
"A single commit that removes a redundant complication from
preempt-schedule handling in the x86 entry code"
* 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Remove unneeded need_resched() loop
|
|
Since the enabling and disabling of IRQs within preempt_schedule_irq() is
contained in a need_resched() loop, there is no need for the outer
architecture specific loop.
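For reference, the core of preempt_schedule_irq() (simplified from
kernel/sched/core.c) already looks like this, so the outer asm loop around
the call adds nothing:

  do {
      preempt_disable();
      local_irq_enable();
      __schedule(true);        /* preempt = true */
      local_irq_disable();
      sched_preempt_enable_no_resched();
  } while (need_resched());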
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Effectively reverts commit:
2c7577a75837 ("sched/x86_64: Don't save flags on context switch")
Specifically because SMAP uses FLAGS.AC which invalidates the claim
that the kernel has clean flags.
In particular, while preemption from interrupt return is fine (the
IRET frame on the exception stack contains FLAGS), it breaks any code
that does synchronous scheduling, including preempt_enable().
This has become a significant issue ever since commit:
5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")
provided the means to have 'normal' C code between STAC / CLAC,
exposing the FLAGS.AC state. So far this hasn't led to trouble,
but fix it before it comes apart.
Reported-by: Julien Thierry <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull stackleak gcc plugin from Kees Cook:
"Please pull this new GCC plugin, stackleak, for v4.20-rc1. This plugin
was ported from grsecurity by Alexander Popov. It provides efficient
stack content poisoning at syscall exit. This creates a defense
against at least two classes of flaws:
- Uninitialized stack usage. (We continue to work on improving the
compiler to do this in other ways: e.g. unconditional zero init was
proposed to GCC and Clang, and more plugin work has started too).
- Stack content exposure. By greatly reducing the lifetime of valid
stack contents, exposures via either direct read bugs or unknown
cache side-channels become much more difficult to exploit. This
complements the existing buddy and heap poisoning options, but
provides the coverage for stacks.
The x86 hooks are included in this series (which have been reviewed by
Ingo, Dave Hansen, and Thomas Gleixner). The arm64 hooks have already
been merged through the arm64 tree (written by Laura Abbott and
reviewed by Mark Rutland and Will Deacon).
With VLAs having been removed this release, there is no need for
alloca() protection, so it has been removed from the plugin"
* tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
arm64: Drop unneeded stackleak_check_alloca()
stackleak: Allow runtime disabling of kernel stack erasing
doc: self-protection: Add information about STACKLEAK feature
fs/proc: Show STACKLEAK metrics in the /proc file system
lkdtm: Add a test for STACKLEAK
gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Ingo Molnar:
"Two main changes:
- Remove no longer used parts of the paravirt infrastructure and put
large quantities of paravirt ops under a new config option
PARAVIRT_XXL=y, which is selected by XEN_PV only. (Juergen Gross)
- Enable PV spinlocks on Hyperv (Yi Sun)"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Enable PV qspinlock for Hyper-V
x86/hyperv: Add GUEST_IDLE_MSR support
x86/paravirt: Clean up native_patch()
x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
x86/xen: Make xen_reservation_lock static
x86/paravirt: Remove unneeded mmu related paravirt ops bits
x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: Introduce new config option PARAVIRT_XXL
x86/paravirt: Remove unused paravirt bits
x86/paravirt: Use a single ops structure
x86/paravirt: Remove clobbers from struct paravirt_patch_site
x86/paravirt: Remove clobbers parameter from paravirt patch functions
x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
x86/xen: Add SPDX identifier in arch/x86/xen files
x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
|
|
Even if not on an entry stack, the CS's high bits must be
initialized because they are unconditionally evaluated in
PARANOID_EXIT_TO_KERNEL_MODE.
Failing to do so broke the boot on Galileo Gen2 and IOT2000 boards.
[ bp: Make the commit message tone passive and impartial. ]
Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack")
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Joerg Roedel <[email protected]>
Acked-by: Joerg Roedel <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andrea Arcangeli <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Boris Ostrovsky <[email protected]>
CC: Brian Gerst <[email protected]>
CC: Dave Hansen <[email protected]>
CC: David Laight <[email protected]>
CC: Denys Vlasenko <[email protected]>
CC: Eduardo Valentin <[email protected]>
CC: Greg KH <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Jiri Kosina <[email protected]>
CC: Josh Poimboeuf <[email protected]>
CC: Juergen Gross <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Will Deacon <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: linux-mm <[email protected]>
CC: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The STACKLEAK feature (initially developed by PaX Team) has the following
benefits:
1. Reduces the information that can be revealed through kernel stack leak
bugs. The idea of erasing the thread stack at the end of syscalls is
similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel
crypto, which all comply with FDP_RIP.2 (Full Residual Information
Protection) of the Common Criteria standard.
2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712,
CVE-2010-2963). That kind of bug should be killed by improving C
compilers in the future, which might take a long time.
This commit introduces the code filling the used part of the kernel
stack with a poison value before returning to userspace. Full
STACKLEAK feature also contains the gcc plugin which comes in a
separate commit.
The STACKLEAK feature is ported from grsecurity/PaX. More information at:
https://grsecurity.net/
https://pax.grsecurity.net/
This code is modified from Brad Spengler/PaX Team's code in the last
public patch of grsecurity/PaX based on our understanding of the code.
Changes or omissions from the original code are ours and don't reflect
the original grsecurity/PaX code.
Performance impact:
Hardware: Intel Core i7-4770, 16 GB RAM
Test #1: building the Linux kernel on a single core
0.91% slowdown
Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
4.2% slowdown
So the STACKLEAK description in Kconfig includes: "The tradeoff is the
performance impact: on a single CPU system kernel compilation sees a 1%
slowdown, other systems and workloads may vary and you are advised to
test this feature on your expected workload before deploying it".
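A much-simplified sketch of the erasing step (the real version is
stackleak_erase() in kernel/stackleak.c; the poison search and corner
cases are omitted here):

  static void stackleak_erase_sketch(void)
  {
      unsigned long ptr = current->lowest_stack;  /* lowest depth tracked by the plugin */

      while (ptr < current_stack_pointer) {
          *(unsigned long *)ptr = STACKLEAK_POISON;
          ptr += sizeof(unsigned long);
      }
  }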
Signed-off-by: Alexander Popov <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
|
|
All functions in arch/x86/xen/irq.c and arch/x86/xen/xen-asm*.S are
specific to PV guests. Include them in the kernel with CONFIG_XEN_PV only.
Make the PV specific code in arch/x86/entry/entry_*.S dependent on
CONFIG_XEN_PV instead of CONFIG_XEN.
The HVM specific code should depend on CONFIG_XEN_PVHVM.
While at it, reformat the Makefile to make it more readable.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The SWITCH_TO_KERNEL_STACK macro only checks for CPL == 0 to go down the
slow and paranoid entry path. The problem is that this check also returns
true when coming from VM86 mode. This is not a problem by itself, as the
paranoid path handles VM86 stack-frames just fine, but it is not necessary
as the normal code path handles VM86 mode as well (and faster).
Extend the check to include VM86 mode. This also makes an optimization of
the paranoid path possible.
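The extended condition, as a hedged C sketch of the asm check
(SWITCH_TO_KERNEL_STACK itself works on the hardware frame):

  static bool from_kernel_sketch(struct pt_regs *regs)
  {
      /* VM86 frames are treated like user frames: fast path, not paranoid path */
      return (regs->cs & SEGMENT_RPL_MASK) == 0 &&
             !(regs->flags & X86_EFLAGS_VM);
  }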
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add code to check whether the kernel is entered and left with the correct
CR3 and make it depend on CONFIG_DEBUG_ENTRY. This is needed because there
is no NX protection of user-addresses in the kernel-CR3 on x86-32 and that
type of bug would not be detected otherwise.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The NMI handler is special, as it needs to leave with the same CR3 as it
was entered with. This is required because the NMI can happen within kernel
context but with user CR3 already loaded, i.e. after switching to user CR3
but before returning to user space.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add unconditional cr3 switches between user and kernel cr3 to all non-NMI
entry and exit points.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The common exception entry code now handles the entry-from-sysenter stack
situation and makes sure to leave with the same stack as it entered the
kernel.
So there is no need anymore for the special handling in the debug entry
code.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
It is possible that the kernel is entered from kernel-mode and on the
entry-stack. The most common way this happens is when an exception is
triggered while loading the user-space segment registers on the
kernel-to-userspace exit path.
The segment loading needs to be done after the entry-stack switch, because
the stack-switch needs kernel %fs for per_cpu access.
When this happens, make sure to leave the kernel with the entry-stack
again, so that the interrupted code-path runs on the right stack when
switching to the user-cr3.
Detect this condition on kernel-entry by checking CS.RPL and %esp, and if
it happens, copy over the complete content of the entry stack to the
task-stack. This needs to be done because once the exception handler is
entered, the task might be scheduled out or even migrated to a different
CPU, so it cannot rely on the entry-stack contents. Leave a marker in the
stack-frame to detect this condition on the exit path.
On the exit path the copy is reversed, copy all of the remaining task-stack
back to the entry-stack and switch to it.
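As a C sketch of the detection described above (the real check is done in
asm on the hardware frame, using CS.RPL and the per-CPU entry-stack
bounds):

  static bool entered_on_entry_stack_sketch(struct pt_regs *regs)
  {
      struct entry_stack *es = cpu_entry_stack(smp_processor_id());
      unsigned long sp = regs->sp;

      return (regs->cs & SEGMENT_RPL_MASK) == 0 &&
             sp >= (unsigned long)es && sp < (unsigned long)(es + 1);
  }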
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Pavel Machek <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Laight <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrea Arcangeli <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: "David H . Gutteridge" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|