aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/entry/common.c
AgeCommit message (Collapse)AuthorFilesLines
2018-04-05x86/syscalls: Don't pointlessly reload the system call numberLinus Torvalds1-6/+6
We have it in a register in the low-level asm, just pass it in as an argument rather than have do_syscall_64() load it back in from the ptregs pointer. Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Dominik Brodowski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-02-04Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds1-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull spectre/meltdown updates from Thomas Gleixner: "The next round of updates related to melted spectrum: - The initial set of spectre V1 mitigations: - Array index speculation blocker and its usage for syscall, fdtable and the n180211 driver. - Speculation barrier and its usage in user access functions - Make indirect calls in KVM speculation safe - Blacklisting of known to be broken microcodes so IPBP/IBSR are not touched. - The initial IBPB support and its usage in context switch - The exposure of the new speculation MSRs to KVM guests. - A fix for a regression in x86/32 related to the cpu entry area - Proper whitelisting for known to be safe CPUs from the mitigations. - objtool fixes to deal proper with retpolines and alternatives - Exclude __init functions from retpolines which speeds up the boot process. - Removal of the syscall64 fast path and related cleanups and simplifications - Removal of the unpatched paravirt mode which is yet another source of indirect unproteced calls. - A new and undisputed version of the module mismatch warning - A couple of cleanup and correctness fixes all over the place Yet another step towards full mitigation. There are a few things still missing like the RBS underflow mitigation for Skylake and other small details, but that's being worked on. That said, I'm taking a belated christmas vacation for a week and hope that everything is magically solved when I'm back on Feb 12th" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KVM/x86: Add IBPB support KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL x86/pti: Mark constant arrays as __initconst x86/spectre: Simplify spectre_v2 command line parsing x86/retpoline: Avoid retpolines for built-in __init functions x86/kvm: Update spectre-v1 mitigation KVM: VMX: make MSR bitmaps per-VCPU x86/paravirt: Remove 'noreplace-paravirt' cmdline option x86/speculation: Use Indirect Branch Prediction Barrier in context switch x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable" x86/spectre: Report get_user mitigation for spectre_v1 nl80211: Sanitize array index in parse_txq_params vfs, fdtable: Prevent bounds-check bypass via speculative execution x86/syscall: Sanitize syscall table de-references under speculation x86/get_user: Use pointer masking to limit speculation ...
2018-01-30x86/syscall: Sanitize syscall table de-references under speculationDan Williams1-1/+4
The syscall table base is a user controlled function pointer in kernel space. Use array_index_nospec() to prevent any out of bounds speculation. While retpoline prevents speculating into a userspace directed target it does not stop the pointer de-reference, the concern is leaking memory relative to the syscall table base, by observing instruction cache behavior. Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Andy Lutomirski <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
2018-01-30x86/asm: Move 'status' from thread_struct to thread_infoAndy Lutomirski1-2/+2
The TS_COMPAT bit is very hot and is accessed from code paths that mostly also touch thread_info::flags. Move it into struct thread_info to improve cache locality. The only reason it was in thread_struct is that there was a brief period during which arch-specific fields were not allowed in struct thread_info. Linus suggested further changing: ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); to: if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED))) ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); on the theory that frequently dirtying the cacheline even in pure 64-bit code that never needs to modify status hurts performance. That could be a reasonable followup patch, but I suspect it matters less on top of this patch. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Kernel Hardening <[email protected]> Link: https://lkml.kernel.org/r/03148bcc1b217100e6e8ecf6a5468c45cf4304b6.1517164461.git.luto@kernel.org
2017-12-04livepatch: send a fake signal to all blocking tasksMiroslav Benes1-3/+3
Live patching consistency model is of LEAVE_PATCHED_SET and SWITCH_THREAD. This means that all tasks in the system have to be marked one by one as safe to call a new patched function. Safe means when a task is not (sleeping) in a set of patched functions. That is, no patched function is on the task's stack. Another clearly safe place is the boundary between kernel and userspace. The patching waits for all tasks to get outside of the patched set or to cross the boundary. The transition is completed afterwards. The problem is that a task can block the transition for quite a long time, if not forever. It could sleep in a set of patched functions, for example. Luckily we can force the task to leave the set by sending it a fake signal, that is a signal with no data in signal pending structures (no handler, no sign of proper signal delivered). Suspend/freezer use this to freeze the tasks as well. The task gets TIF_SIGPENDING set and is woken up (if it has been sleeping in the kernel before) or kicked by rescheduling IPI (if it was running on other CPU). This causes the task to go to kernel/userspace boundary where the signal would be handled and the task would be marked as safe in terms of live patching. There are tasks which are not affected by this technique though. The fake signal is not sent to kthreads. They should be handled differently. They can be woken up so they leave the patched set and their TIF_PATCH_PENDING can be cleared thanks to stack checking. For the sake of completeness, if the task is in TASK_RUNNING state but not currently running on some CPU it doesn't get the IPI, but it would eventually handle the signal anyway. Second, if the task runs in the kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not handled on return from the interrupt. It would be handled on return to the userspace in the future when the fake signal is sent again. Stack checking deals with these cases in a better way. If the task was sleeping in a syscall it would be woken by our fake signal, it would check if TIF_SIGPENDING is set (by calling signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with ERESTART* return values are restarted in case of the fake signal (see do_signal()). EINTR is propagated back to the userspace program. This could disturb the program, but... * each process dealing with signals should react accordingly to EINTR return values. * syscalls returning EINTR happen to be quite common situation in the system even if no fake signal is sent. * freezer sends the fake signal and does not deal with EINTR anyhow. Thus EINTR values are returned when the system is resumed. The very safe marking is done in architectures' "entry" on syscall and interrupt/exception exit paths, and in a stack checking functions of livepatch. TIF_PATCH_PENDING is cleared and the next recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also call klp_update_patch_state() before do_signal(), so that recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING immediately and thus prevent a double call of do_signal(). Note that the fake signal is not sent to stopped/traced tasks. Such task prevents the patching to finish till it continues again (is not traced anymore). Last, sending the fake signal is not automatic. It is done only when admin requests it by writing 1 to signal sysfs attribute in livepatch sysfs directory. Signed-off-by: Miroslav Benes <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: [email protected] Cc: [email protected] Acked-by: Michael Ellerman <[email protected]> (powerpc) Signed-off-by: Jiri Kosina <[email protected]>
2017-11-08x86: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker1-3/+1
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. It also makes no more sense to fix the IRQ flags when a bug is detected as the assertion is now pure config-dependent debugging. And to quote Peter Zijlstra: The whole if !disabled, disable logic is uber paranoid programming, but I don't think we've ever seen that WARN trigger, and if it does (and then burns the kernel) we at least know what happend. Signed-off-by: Frederic Weisbecker <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: David S . Miller <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-10-25locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns ↵Mark Rutland1-1/+1
to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-08x86/syscalls: Check address limit on user-mode returnThomas Garnier1-0/+3
Ensure the address limit is a user-mode segment before returning to user-mode. Otherwise a process can corrupt kernel-mode memory and elevate privileges [1]. The set_fs function sets the TIF_SETFS flag to force a slow path on return. In the slow path, the address limit is checked to be USER_DS if needed. The addr_limit_user_check function is added as a cross-architecture function to check the address limit. [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 Signed-off-by: Thomas Garnier <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Mark Rutland <[email protected]> Cc: [email protected] Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: David Howells <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Pratyush Anand <[email protected]> Cc: Russell King <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Kees Cook <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Will Drewry <[email protected]> Cc: [email protected] Cc: Oleg Nesterov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Paolo Bonzini <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-03-08livepatch/x86: add TIF_PATCH_PENDING thread flagJosh Poimboeuf1-3/+6
Add the TIF_PATCH_PENDING thread flag to enable the new livepatch per-task consistency model for x86_64. The bit getting set indicates the thread has a pending patch which needs to be applied when the thread exits the kernel. The bit is placed in the _TIF_ALLWORK_MASK macro, which results in exit_to_usermode_loop() calling klp_update_patch_state() when it's set. Signed-off-by: Josh Poimboeuf <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Reviewed-by: Miroslav Benes <[email protected]> Reviewed-by: Kamalesh Babulal <[email protected]> Acked-by: Ingo Molnar <[email protected]> # for the x86 changes Signed-off-by: Jiri Kosina <[email protected]>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/task_stack.h> We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task_stack.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds1-1/+1
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-09-15x86/entry: Get rid of pt_regs_to_thread_info()Linus Torvalds1-14/+6
It was a nice optimization while it lasted, but thread_info is moving and this optimization will no longer work. Quoting Linus: Oh Gods, Andy. That pt_regs_to_thread_info() thing made me want to do unspeakable acts on a poor innocent wax figure that looked _exactly_ like you. [ Changelog written by Andy. ] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jann Horn <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/6376aa81c68798cc81631673f52bd91a3e078944.1473801993.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2016-09-15x86/asm: Move the thread_info::status field to thread_structAndy Lutomirski1-2/+2
Because sched.h and thread_info.h are a tangled mess, I turned in_compat_syscall() into a macro. If we had current_thread_struct() or similar and we could use it from thread_info.h, then this would be a bit cleaner. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jann Horn <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/ccc8a1b2f41f9c264a41f771bb4a6539a642ad72.1473801993.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2016-08-06Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixes and a cleanup-fix, to the syscall entry code and to ptrace" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace x86/ptrace: Stop setting TS_COMPAT in ptrace code x86/vdso: Error out if the vDSO isn't a valid DSO
2016-07-29Merge branch 'next' of ↵Linus Torvalds1-86/+20
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: - TPM core and driver updates/fixes - IPv6 security labeling (CALIPSO) - Lots of Apparmor fixes - Seccomp: remove 2-phase API, close hole where ptrace can change syscall #" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits) apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family) tpm: Factor out common startup code tpm: use devm_add_action_or_reset tpm2_i2c_nuvoton: add irq validity check tpm: read burstcount from TPM_STS in one 32-bit transaction tpm: fix byte-order for the value read by tpm2_get_tpm_pt tpm_tis_core: convert max timeouts from msec to jiffies apparmor: fix arg_size computation for when setprocattr is null terminated apparmor: fix oops, validate buffer size in apparmor_setprocattr() apparmor: do not expose kernel stack apparmor: fix module parameters can be changed after policy is locked apparmor: fix oops in profile_unpack() when policy_db is not present apparmor: don't check for vmalloc_addr if kvzalloc() failed apparmor: add missing id bounds check on dfa verification apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task apparmor: use list_next_entry instead of list_entry_next apparmor: fix refcount race when finding a child profile apparmor: fix ref count leak when profile sha1 hash is read apparmor: check that xindex is in trans_table bounds ...
2016-07-27x86/ptrace: Stop setting TS_COMPAT in ptrace codeAndy Lutomirski1-1/+5
Setting TS_COMPAT in ptrace is wrong: if we happen to do it during syscall entry, then we'll confuse seccomp and audit. (The former isn't a security problem: seccomp is currently entirely insecure if a malicious ptracer is attached.) As a minimal fix, this patch adds a new flag TS_I386_REGS_POKED that handles the ptrace special case. Signed-off-by: Andy Lutomirski <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pedro Alves <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/5383ebed38b39fa37462139e337aff7f2314d1ca.1469599803.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2016-07-10x86/entry: Inline enter_from_user_mode()Paolo Bonzini1-1/+1
This matches what is already done for prepare_exit_to_usermode(), and saves about 60 clock cycles (4% speedup) with the benchmark in the previous commit message. Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-07-10x86/entry: Avoid interrupt flag save and restorePaolo Bonzini1-2/+2
Thanks to all the work that was done by Andy Lutomirski and others, enter_from_user_mode() and prepare_exit_to_usermode() are now called only with interrupts disabled. Let's provide them a version of user_enter()/user_exit() that skips saving and restoring the interrupt flag. On an AMD-based machine I tested this patch on, with force-enabled context tracking, the speed-up in system calls was 90 clock cycles or 6%, measured with the following simple benchmark: #include <sys/signal.h> #include <time.h> #include <unistd.h> #include <stdio.h> unsigned long rdtsc() { unsigned long result; asm volatile("rdtsc; shl $32, %%rdx; mov %%eax, %%eax\n" "or %%rdx, %%rax" : "=a" (result) : : "rdx"); return result; } int main() { unsigned long tsc1, tsc2; int pid = getpid(); int i; tsc1 = rdtsc(); for (i = 0; i < 100000000; i++) kill(pid, SIGWINCH); tsc2 = rdtsc(); printf("%ld\n", tsc2 - tsc1); } Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-06-14x86/ptrace: run seccomp after ptraceKees Cook1-10/+12
This moves seccomp after ptrace on x86 to that seccomp can catch changes made by ptrace. Emulation should skip the rest of processing too. We can get rid of test_thread_flag because there's no longer any opportunity for seccomp to mess with ptrace state before invoking ptrace. Suggested-by: Andy Lutomirski <[email protected]> Signed-off-by: Kees Cook <[email protected]> Cc: [email protected] Cc: Andy Lutomirski <[email protected]>
2016-06-14x86/entry: Get rid of two-phase syscall entry workAndy Lutomirski1-76/+8
I added two-phase syscall entry work back when the entry slow path was very slow. Nowadays, the entry slow path is fast and two-phase entry work serves no purpose. Remove it. Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2016-04-19x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()Dmitry Safonov1-1/+1
The is_ia32_task()/is_x32_task() function names are a big misnomer: they suggests that the compat-ness of a system call is a task property, which is not true, the compatness of a system call purely depends on how it was invoked through the system call layer. A task may call 32-bit and 64-bit and x32 system calls without changing any of its kernel visible state. This specific minomer is also actively dangerous, as it might cause kernel developers to use the wrong kind of security checks within system calls. So rename it to in_{ia32,x32}_syscall(). Suggested-by: Andy Lutomirski <[email protected]> Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> [ Expanded the changelog. ] Acked-by: Andy Lutomirski <[email protected]> Cc: [email protected] Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-10x86/entry: Call enter_from_user_mode() with IRQs offAndy Lutomirski1-22/+11
Now that slow-path syscalls always enter C before enabling interrupts, it's straightforward to call enter_from_user_mode() before enabling interrupts rather than doing it as part of entry tracing. With this change, we should finally be able to retire exception_enter(). This will also enable optimizations based on knowing that we never change context tracking state with interrupts on. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Frédéric Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/bc376ecf87921a495e874ff98139b1ca2f5c5dd7.1457558566.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2016-03-10x86/entry/32: Change INT80 to be an interrupt gateAndy Lutomirski1-12/+3
We want all of the syscall entries to run with interrupts off so that we can efficiently run context tracking before enabling interrupts. This will regress int $0x80 performance on 32-bit kernels by a couple of cycles. This shouldn't matter much -- int $0x80 is not a fast path. This effectively reverts: 657c1eea0019 ("x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on") ... and fixes the same issue differently. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Frédéric Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/59b4f90c9ebfccd8c937305dbbbca680bc74b905.1457558566.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2016-03-10x86/entry: Remove TIF_SINGLESTEP entry workAndy Lutomirski1-10/+0
Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall entry code. Remove it. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/2d15f24da52dafc9d2f0b8d76f55544f4779c517.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17x86/entry/compat: Keep TS_COMPAT set during signal deliveryAndy Lutomirski1-10/+13
Signal delivery needs to know the sign of an interrupted syscall's return value in order to detect -ERESTART variants. Normally this works independently of bitness because syscalls internally return long. Under ptrace, however, this can break, and syscall_get_error is supposed to sign-extend regs->ax if needed. We were clearing TS_COMPAT too early, though, and this prevented sign extension, which subtly broke syscall restart under ptrace. Reported-by: Robert O'Callahan <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] # 4.3.x- Fixes: c5c46f59e4e7 ("x86/entry: Add new, comprehensible entry and exit handlers written in C") Link: http://lkml.kernel.org/r/cbce3cf545522f64eb37f5478cb59746230db3b5.1455142412.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2016-01-30x86/cpufeature: Carve out X86_FEATURE_*Borislav Petkov1-0/+1
Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-01-29x86/entry/64: Migrate the 64-bit syscall slow path to CAndy Lutomirski1-0/+26
This is more complicated than the 32-bit and compat cases because it preserves an asm fast path for the case where the callee-saved regs aren't needed in pt_regs and no entry or exit work needs to be done. This appears to slow down fastpath syscalls by no more than one cycle on my Skylake laptop. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/ce2335a4d42dc164b24132ee5e8c7716061f947b.1454022279.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-12-21x86/entry: Restore traditional SYSENTER calling conventionAndy Lutomirski1-3/+3
It turns out that some Android versions hardcode the SYSENTER calling convention. This is buggy and will cause problems no matter what the kernel does. Nonetheless, we should try to support it. Credit goes to Linus for pointing out a clean way to handle the SYSENTER/SYSCALL clobber differences while preserving straightforward DWARF annotations. I believe that the original offending Android commit was: https://android.googlesource.com/platform%2Fbionic/+/7dc3684d7a2587e43e6d2a8e0e3f39bf759bd535 Reported-by: Qiuxu Zhuo <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-and-tested-by: Borislav Petkov <[email protected]> Cc: <[email protected]> Cc: Su Tao <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Mingwei Shi <[email protected]> Cc: Linus Torvalds <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2015-10-18x86/entry/32: Fix entry_INT80_32() to expect interrupts to be onAndy Lutomirski1-3/+12
When I rewrote entry_INT80_32, I thought that int80 was an interrupt gate. It's a trap gate. *facepalm* Thanks to Brian Gerst for pointing out that it's better to change the entry code than to change the gate type. Suggested-by: Brian Gerst <[email protected]> Reported-and-tested-by: Borislav Petkov <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 150ac78d63af ("x86/entry/32: Switch INT80 to the new C syscall path") Link: http://lkml.kernel.org/r/dc09d9b574a5c1dcca996847875c73f8341ce0ad.1445035014.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Split and inline syscall_return_slowpath()Andy Lutomirski1-21/+29
GCC is unable to properly optimize functions that have a very short likely case and a longer and register-heavier cold part -- it fails to sink all of the register saving and stack frame setup code into the unlikely part. Help it out with syscall_return_slowpath() by splitting it into two parts and inline the hot part. Saves 6 cycles for compat syscalls. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/0f773a894ab15c589ac794c2d34ca6ba9b5335c9.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Split and inline prepare_exit_to_usermode()Andy Lutomirski1-15/+28
GCC is unable to properly optimize functions that have a very short likely case and a longer and register-heavier cold part -- it fails to sink all of the register saving and stack frame setup code into the unlikely part. Help it out with prepare_exit_to_usermode() by splitting it into two parts and inline the hot part. Saves 6-8 cycles for compat syscalls. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/9fc53eda4a5b924070952f12fa4ae3e477640a07.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Use pt_regs_to_thread_info() in syscall entry tracingAndy Lutomirski1-11/+11
It generates simpler and faster code than current_thread_info(). Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/a3b6633e7dcb9f673c1b619afae602d29d27d2cf.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Hide two syscall entry assertions behind CONFIG_DEBUG_ENTRYAndy Lutomirski1-2/+4
This shaves a few cycles off the slow paths. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/ce383fa9e129286ce6da6e00b53acd4c9fb5d06a.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Micro-optimize compat fast syscall arg fetchAndy Lutomirski1-2/+14
We're following a 32-bit pointer, and the uaccess code isn't smart enough to figure out that the access_ok() check isn't needed. This saves about three cycles on a cache-hot fast syscall. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/bdff034e2f23c5eb974c760cf494cb5bddce8f29.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Force inlining of 32-bit syscall codeAndy Lutomirski1-3/+5
On systems that support fast syscalls, we only really care about the performance of the fast syscall path. Forcibly inline it and add a likely annotation. This saves 4-6 cycles. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/8472036ff1f4b426b4c4c3e3d0b3bf5264407c0c.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Make irqs_disabled checks in exit code depend on lockdepAndy Lutomirski1-3/+3
These checks are quite slow. Disable them in non-lockdep kernels to reduce the performance hit. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/eccff2a154ae6fb50f40228901003a6e9c24f3d0.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscallsAndy Lutomirski1-7/+11
This is slightly messy, but it eliminates an unnecessary cli;sti pair. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/22f34b1096694a37326f36c53407b8dd90f37948.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry/32: Re-implement SYSENTER using the new C pathAndy Lutomirski1-2/+15
Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/5b99659e8be70f3dd10cd8970a5c90293d9ad9a7.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry/compat: Implement opportunistic SYSRETL for compat syscallsAndy Lutomirski1-3/+20
If CS, SS and IP are as expected and FLAGS is compatible with SYSRETL, then return from fast compat syscalls (both SYSCALL and SYSENTER) using SYSRETL. Unlike native 64-bit opportunistic SYSRET, this is not invisible to user code: RCX and R8-R15 end up in a different state than shown saved in pt_regs. To compensate, we only do this when returning to the vDSO fast syscall return path. This won't interfere with syscall restart, as we won't use SYSRETL when returning to the INT80 restart instruction. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/aa15e49db33773eb10b73d73466b6d5466d7856a.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Add C code for fast system call entriesAndy Lutomirski1-0/+43
This handles both SYSENTER and SYSCALL. The asm glue will take care of the differences. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/6041a58a9b8ef6d2522ab4350deb1a1945eb563f.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-09x86/entry: Add do_syscall_32(), a C function to do 32-bit syscallsAndy Lutomirski1-0/+43
System calls are really quite simple. Add a helper to call a 32-bit system call. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/a77ed179834c27da436fb4a7fb23c8ee77abc11c.1444091585.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-10-07x86/entry, locking/lockdep: Move lockdep_sys_exit() to ↵Andy Lutomirski1-0/+2
prepare_exit_to_usermode() Rather than worrying about exactly where LOCKDEP_SYS_EXIT should go in the asm code, add it to prepare_exit_from_usermode() and remove all of the asm calls that are followed by prepare_exit_to_usermode(). LOCKDEP_SYS_EXIT now appears only in the syscall fast paths. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/1736ebe948b845e68120b86b89091f3ec27f5e8e.1444091584.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-08-05x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masksAndy Lutomirski1-57/+0
They are no longer used. Good riddance! Deleting the TIF_ macros is really nice. It was never clear why there were so many variants. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Eric Paris <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/22c61682f446628573dde0f1d573ab821677e06da.1438378274.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-07-17x86/entry: Fix _TIF_USER_RETURN_NOTIFY check in prepare_exit_to_usermodeAndy Lutomirski1-1/+2
Linus noticed that the early return check was missing _TIF_USER_RETURN_NOTIFY. If the only work flag was _TIF_USER_RETURN_NOTIFY, we'd skip user return notifiers. Fix it. (This is the only missing bit.) This fixes double faults on a KVM host. It's the same issue as last time, except that this time it's very easy to trigger. Apparently no one uses -next as a KVM host. ( I'm still not quite sure what it is that KVM does that blows up so badly if we miss a user return notifier. My best guess is that KVM lets KERNEL_GS_BASE (i.e. the user's gs base) be negative and fixes it up in a user return notifier. If we actually end up in user mode with a negative gs base, we blow up pretty badly. ) Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: c5c46f59e4e7 ("x86/entry: Add new, comprehensible entry and exit handlers written in C") Link: http://lkml.kernel.org/r/3f801104d24ee7a6bb1446408d9950777aa63277.1436995419.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-07-07x86/entry: Add new, comprehensible entry and exit handlers written in CAndy Lutomirski1-1/+111
The current x86 entry and exit code, written in a mixture of assembly and C code, is incomprehensible due to being open-coded in a lot of places without coherent documentation. It appears to work primary by luck and duct tape: i.e. obvious runtime failures were fixed on-demand, without re-thinking the design. Due to those reasons our confidence level in that code is low, and it is very difficult to incrementally improve. Add new code written in C, in preparation for simply deleting the old entry code. prepare_exit_to_usermode() is a new function that will handle all slow path exits to user mode. It is called with IRQs disabled and it leaves us in a state in which it is safe to immediately return to user mode. IRQs must not be re-enabled at any point after prepare_exit_to_usermode() returns and user mode is actually entered. (We can, of course, fail to enter user mode and treat that failure as a fresh entry to kernel mode.) All callers of do_notify_resume() will be migrated to call prepare_exit_to_usermode() instead; prepare_exit_to_usermode() needs to do everything that do_notify_resume() does today, but it also takes care of scheduling and context tracking. Unlike do_notify_resume(), it does not need to be called in a loop. syscall_return_slowpath() is exactly what it sounds like: it will be called on any syscall exit slow path. It will replace syscall_trace_leave() and it calls prepare_exit_to_usermode() on the way out. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/c57c8b87661a4152801d7d3786eac2d1a2f209dd.1435952415.git.luto@kernel.org [ Improved the changelog a bit. ] Signed-off-by: Ingo Molnar <[email protected]>
2015-07-07x86/entry: Add enter_from_user_mode() and use it in syscallsAndy Lutomirski1-1/+12
Changing the x86 context tracking hooks is dangerous because there are no good checks that we track our context correctly. Add a helper to check that we're actually in CONTEXT_USER when we enter from user mode and wire it up for syscall entries. Subsequent patches will wire this up for all non-NMI entries as well. NMIs are their own special beast and cannot currently switch overall context tracking state. Instead, they have their own special RCU hooks. This is a tiny speedup if !CONFIG_CONTEXT_TRACKING (removes a branch) and a tiny slowdown if CONFIG_CONTEXT_TRACING (adds a layer of indirection). Eventually, we should fix up the core context tracking code to supply a function that does what we want (and can be much simpler than user_exit), which will enable us to get rid of the extra call. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/853b42420066ec3fb856779cdc223a6dcb5d355b.1435952415.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-07-07x86/entry: Move C entry and exit code to arch/x86/entry/common.cAndy Lutomirski1-0/+253
The entry and exit C helpers were confusingly scattered between ptrace.c and signal.c, even though they aren't specific to ptrace or signal handling. Move them together in a new file. This change just moves code around. It doesn't change anything. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/324d686821266544d8572423cc281f961da445f4.1435952415.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>