aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-09-07x86/boot/compressed/64: Add stage1 #VC handlerJoerg Roedel1-0/+66
Add the first handler for #VC exceptions. At stage 1 there is no GHCB yet because the kernel might still be running on the EFI page table. The stage 1 handler is limited to the MSR-based protocol to talk to the hypervisor and can only support CPUID exit-codes, but that is enough to get to stage 2. [ bp: Zap superfluous newlines after rd/wrmsr instruction mnemonics. ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-07x86/umip: Factor out instruction decodingJoerg Roedel1-22/+1
Factor out the code used to decode an instruction with the correct address and operand sizes to a helper function. No functional changes. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-07x86/umip: Factor out instruction fetchJoerg Roedel1-20/+6
Factor out the code to fetch the instruction from user-space to a helper function. No functional changes. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-07x86/cpufeatures: Add SEV-ES CPU featureTom Lendacky2-1/+3
Add CPU feature detection for Secure Encrypted Virtualization with Encrypted State. This feature enhances SEV by also encrypting the guest register state, making it in-accessible to the hypervisor. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-07Merge 'x86/cpu' to pick up dependent bitsBorislav Petkov1-13/+27
Pick up work happening in parallel to avoid nasty merge conflicts later. Signed-off-by: Borislav Petkov <[email protected]>
2020-09-05x86/resctrl: Fix spelling in user-visible warning messagesColin Ian King1-2/+2
Fix spelling mistake "Could't" -> "Couldn't" in user-visible warning messages. [ bp: Massage commit message; s/cpu/CPU/g ] Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-04x86/debug: Change thread.debugreg6 to thread.virtual_dr6Peter Zijlstra4-23/+25
Current usage of thread.debugreg6 is convoluted at best. It starts life as a copy of the hardware DR6 value, but then various bits are cleared and set. Replace this with a new variable thread.virtual_dr6 that is initialized to 0 when DR6 is read and only gains bits, at the same time the actual (on stack) dr6 value which is read from the hardware only gets bits cleared. Suggested-by: Andy Lutomirski <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Support negative polarity DR6 bitsPeter Zijlstra3-6/+5
DR6 has a whole bunch of bits that have negative polarity; they were architecturally reserved and defined to be 1 and are now getting used. Since they're 1 by default, 0 becomes the signal value. Handle this by xor'ing the read DR6 value by the reserved mask, this will flip them around such that 1 is the signal value (positive polarity). Current Linux doesn't yet support any of these bits, but there's two defined: - DR6[11] Bus Lock Debug Exception (ISEr39) - DR6[16] Restricted Transactional Memory (SDM) Update ptrace_{set,get}_debugreg() to provide/consume the value in architectural polarity. Although afaict ptrace_set_debugreg(6) is pointless, the value is not consumed anywhere. Change hw_breakpoint_restore() to alway write the DR6_RESERVED value to DR6, again, no consumer for that write. Suggested-by: Andrew Cooper <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Simplify hw_breakpoint_handler()Peter Zijlstra1-6/+2
This is called with interrupts disabled, there's no point in using get_cpu() and per_cpu(). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Remove aout_dump_debugregs()Peter Zijlstra1-36/+0
Unused remnants for the bit-bucket. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Remove the historical junkPeter Zijlstra1-11/+12
Remove the historical junk and replace it with a WARN and a comment. The problem is that even though the kernel only uses TF single-step in kprobes and KGDB, both of which consume the event before this, QEMU/KVM has bugs in this area that can trigger this state so it has to be dealt with. Suggested-by: Brian Gerst <[email protected]> Suggested-by: Andy Lutomirski <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Move cond_local_irq_enable() block into exc_debug_user()Peter Zijlstra1-29/+29
The cond_local_irq_enable() block, dealing with vm86 and sending signals is only relevant for #DB-from-user, move it there. This then reduces handle_debug() to only the notifier call, so rename it to notify_debug(). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Move historical SYSENTER junk into exc_debug_kernel()Peter Zijlstra1-24/+25
The historical SYSENTER junk is explicitly for from-kernel, so move it to the #DB-from-kernel handler. It is ordered after the notifier, which is important for KGDB which uses TF single-step and needs to consume the event before that point. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Simplify #DB signal codePeter Zijlstra1-6/+9
There's no point in calculating si_code if it's not going to be used. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Remove handle_debug(.user) argumentPeter Zijlstra1-11/+10
The handle_debug(.user) argument is used to terminate the #DB handler early for the INT1-from-kernel case, since the kernel doesn't use INT1. Remove the argument and handle this explicitly in #DB-from-kernel. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Move kprobe_debug_handler() into exc_debug_kernel()Peter Zijlstra1-6/+4
Kprobes are on kernel text, and thus only matter for #DB-from-kernel. Kprobes are ordered before the generic notifier, preserve that order. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Sync BTF earlierPeter Zijlstra1-7/+7
Move the BTF sync near the DR6 load, as this will be the only common code guaranteed to run on every #DB. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-04x86/debug: Allow a single level of #DB recursionAndy Lutomirski1-34/+31
Trying to clear DR7 around a #DB from usermode malfunctions if the tasks schedules when delivering SIGTRAP. Rather than trying to define a special no-recursion region, just allow a single level of recursion. The same mechanism is used for NMI, and it hasn't caused any problems yet. Fixes: 9f58fdde95c9 ("x86/db: Split out dr6/7 handling") Reported-by: Kyle Huey <[email protected]> Debugged-by: Josh Poimboeuf <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Daniel Thompson <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/8b9bd05f187231df008d48cf818a6a311cbd5c98.1597882384.git.luto@kernel.org Link: https://lore.kernel.org/r/[email protected]
2020-09-04arm64: mte: Add specific SIGSEGV codesVincenzo Frascino1-1/+1
Add MTE-specific SIGSEGV codes to siginfo.h and update the x86 BUILD_BUG_ON(NSIGSEGV != 7) compile check. Signed-off-by: Vincenzo Frascino <[email protected]> [[email protected]: renamed precise/imprecise to sync/async] [[email protected]: dropped #ifdef __aarch64__, renumbered] Signed-off-by: Catalin Marinas <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Will Deacon <[email protected]>
2020-09-03dma-mapping: introduce dma_get_seg_boundary_nr_pages()Nicolin Chen1-2/+1
We found that callers of dma_get_seg_boundary mostly do an ALIGN with page mask and then do a page shift to get number of pages: ALIGN(boundary + 1, 1 << shift) >> shift However, the boundary might be as large as ULONG_MAX, which means that a device has no specific boundary limit. So either "+ 1" or passing it to ALIGN() would potentially overflow. According to kernel defines: #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask)) #define ALIGN(x, a) ALIGN_MASK(x, (typeof(x))(a) - 1) We can simplify the logic here into a helper function doing: ALIGN(boundary + 1, 1 << shift) >> shift = ALIGN_MASK(b + 1, (1 << s) - 1) >> s = {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s = [b + 1 + (1 << s) - 1] >> s = [b + (1 << s)] >> s = (b >> s) + 1 This patch introduces and applies dma_get_seg_boundary_nr_pages() as an overflow-free helper for the dma_get_seg_boundary() callers to get numbers of pages. It also takes care of the NULL dev case for non-DMA API callers. Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Nicolin Chen <[email protected]> Acked-by: Niklas Schnelle <[email protected]> Acked-by: Michael Ellerman <[email protected]> (powerpc) Signed-off-by: Christoph Hellwig <[email protected]>
2020-09-01x86/build: Add asserts for unwanted sectionsKees Cook1-0/+24
In preparation for warning on orphan sections, enforce other expected-to-be-zero-sized sections (since discarding them might hide problems with them suddenly gaining unexpected entries). Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01x86/build: Enforce an empty .got.plt sectionKees Cook1-1/+13
The .got.plt section should always be zero (or filled only with the linker-generated lazy dispatch entry). Enforce this with an assert and mark the section as INFO. This is more sensitive than just blindly discarding the section. Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01static_call: Allow early initPeter Zijlstra2-1/+6
In order to use static_call() to wire up x86_pmu, we need to initialize earlier, specifically before memory allocation works; copy some of the tricks from jump_label to enable this. Primarily we overload key->next to store a sites pointer when there are no modules, this avoids having to use kmalloc() to initialize the sites and allows us to run much earlier. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01static_call: Add some validationPeter Zijlstra1-2/+26
Verify the text we're about to change is as we expect it to be. Requested-by: Steven Rostedt <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01static_call: Handle tail-callsPeter Zijlstra1-3/+18
GCC can turn our static_call(name)(args...) into a tail call, in which case we get a JMP.d32 into the trampoline (which then does a further tail-call). Teach objtool to recognise and mark these in .static_call_sites and adjust the code patching to deal with this. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01static_call: Add static_call_cond()Peter Zijlstra1-10/+32
Extend the static_call infrastructure to optimize the following common pattern: if (func_ptr) func_ptr(args...) For the trampoline (which is in effect a tail-call), we patch the JMP.d32 into a RET, which then directly consumes the trampoline call. For the in-line sites we replace the CALL with a NOP5. NOTE: this is 'obviously' limited to functions with a 'void' return type. NOTE: DEFINE_STATIC_COND_CALL() only requires a typename, as opposed to a full function. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01x86/alternatives: Teach text_poke_bp() to emulate RETPeter Zijlstra1-0/+5
Future patches will need to poke a RET instruction, provide the infrastructure required for this. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Cc: Masami Hiramatsu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01x86/static_call: Add inline static call implementation for x86-64Josh Poimboeuf2-0/+4
Add the inline static call implementation for x86-64. The generated code is identical to the out-of-line case, except we move the trampoline into it's own section. Objtool uses the trampoline naming convention to detect all the call sites. It then annotates those call sites in the .static_call_sites section. During boot (and module init), the call sites are patched to call directly into the destination function. The temporary trampoline is then no longer used. [peterz: merged trampolines, put trampoline in section] Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01x86/static_call: Add out-of-line static call implementationJosh Poimboeuf2-0/+32
Add the x86 out-of-line static call implementation. For each key, a permanent trampoline is created which is the destination for all static calls for the given key. The trampoline has a direct jump which gets patched by static_call_update() when the destination function changes. [peterz: fixed trampoline, rewrote patching code] Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01static_call: Avoid kprobes on inline static_call()sPeter Zijlstra1-1/+3
Similar to how we disallow kprobes on any other dynamic text (ftrace/jump_label) also disallow kprobes on inline static_call()s. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-01vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUGKees Cook1-0/+1
The .comment section doesn't belong in STABS_DEBUG. Split it out into a new macro named ELF_DETAILS. This will gain other non-debug sections that need to be accounted for when linking with --orphan-handling=warn. Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2020-08-30Merge tag 'x86-urgent-2020-08-30' of ↵Linus Torvalds2-13/+29
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Three interrupt related fixes for X86: - Move disabling of the local APIC after invoking fixup_irqs() to ensure that interrupts which are incoming are noted in the IRR and not ignored. - Unbreak affinity setting. The rework of the entry code reused the regular exception entry code for device interrupts. The vector number is pushed into the errorcode slot on the stack which is then lifted into an argument and set to -1 because that's regs->orig_ax which is used in quite some places to check whether the entry came from a syscall. But it was overlooked that orig_ax is used in the affinity cleanup code to validate whether the interrupt has arrived on the new target. It turned out that this vector check is pointless because interrupts are never moved from one vector to another on the same CPU. That check is a historical leftover from the time where x86 supported multi-CPU affinities, but not longer needed with the now strict single CPU affinity. Famous last words ... - Add a missing check for an empty cpumask into the matrix allocator. The affinity change added a warning to catch the case where an interrupt is moved on the same CPU to a different vector. This triggers because a condition with an empty cpumask returns an assignment from the allocator as the allocator uses for_each_cpu() without checking the cpumask for being empty. The historical inconsistent for_each_cpu() behaviour of ignoring the cpumask and unconditionally claiming that CPU0 is in the mask struck again. Sigh. plus a new entry into the MAINTAINER file for the HPE/UV platform" * tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/matrix: Deal with the sillyness of for_each_cpu() on UP x86/irq: Unbreak interrupt affinity setting x86/hotplug: Silence APIC only after all interrupts are migrated MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers
2020-08-30Merge tag 'locking-urgent-2020-08-30' of ↵Linus Torvalds1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A set of fixes for lockdep, tracing and RCU: - Prevent recursion by using raw_cpu_* operations - Fixup the interrupt state in the cpu idle code to be consistent - Push rcu_idle_enter/exit() invocations deeper into the idle path so that the lock operations are inside the RCU watching sections - Move trace_cpu_idle() into generic code so it's called before RCU goes idle. - Handle raw_local_irq* vs. local_irq* operations correctly - Move the tracepoints out from under the lockdep recursion handling which turned out to be fragile and inconsistent" * tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep,trace: Expose tracepoints lockdep: Only trace IRQ edges mips: Implement arch_irqs_disabled() arm64: Implement arch_irqs_disabled() nds32: Implement arch_irqs_disabled() locking/lockdep: Cleanup x86/entry: Remove unused THUNKs cpuidle: Move trace_cpu_idle() into generic code cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic sched,idle,rcu: Push rcu_idle deeper into the idle path cpuidle: Fixup IRQ state lockdep: Use raw_cpu_*() for per-cpu variables
2020-08-27x86/mpparse: Remove duplicate io_apic.h includeWang Hai1-1/+0
Remove asm/io_apic.h which is included more than once. Reported-by: Hulk Robot <[email protected]> Signed-off-by: Wang Hai <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-08-27x86/irq: Unbreak interrupt affinity settingThomas Gleixner1-7/+9
Several people reported that 5.8 broke the interrupt affinity setting mechanism. The consolidation of the entry code reused the regular exception entry code for device interrupts and changed the way how the vector number is conveyed from ptregs->orig_ax to a function argument. The low level entry uses the hardware error code slot to push the vector number onto the stack which is retrieved from there into a function argument and the slot on stack is set to -1. The reason for setting it to -1 is that the error code slot is at the position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax indicates that the entry came via a syscall. If it's not set to a negative value then a signal delivery on return to userspace would try to restart a syscall. But there are other places which rely on pt_regs::orig_ax being a valid indicator for syscall entry. But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the interrupt affinity setting mechanism, which was overlooked when this change was made. Moving interrupts on x86 happens in several steps. A new vector on a different CPU is allocated and the relevant interrupt source is reprogrammed to that. But that's racy and there might be an interrupt already in flight to the old vector. So the old vector is preserved until the first interrupt arrives on the new vector and the new target CPU. Once that happens the old vector is cleaned up, but this cleanup still depends on the vector number being stored in pt_regs::orig_ax, which is now -1. That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector always false. As a consequence the interrupt is moved once, but then it cannot be moved anymore because the cleanup of the old vector never happens. There would be several ways to convey the vector information to that place in the guts of the interrupt handling, but on deeper inspection it turned out that this check is pointless and a leftover from the old affinity model of X86 which supported multi-CPU affinities. Under this model it was possible that an interrupt had an old and a new vector on the same CPU, so the vector match was required. Under the new model the effective affinity of an interrupt is always a single CPU from the requested affinity mask. If the affinity mask changes then either the interrupt stays on the CPU and on the same vector when that CPU is still in the new affinity mask or it is moved to a different CPU, but it is never moved to a different vector on the same CPU. Ergo the cleanup check for the matching vector number is not required and can be removed which makes the dependency on pt_regs:orig_ax go away. The remaining check for new_cpu == smp_processsor_id() is completely sufficient. If it matches then the interrupt was successfully migrated and the cleanup can proceed. For paranoia sake add a warning into the vector assignment code to validate that the assumption of never moving to a different vector on the same CPU holds. Fixes: 633260fa143 ("x86/irq: Convey vector as argument and not in ptregs") Reported-by: Alex bykov <[email protected]> Reported-by: Avi Kivity <[email protected]> Reported-by: Alexander Graf <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Alexander Graf <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2020-08-27x86/hotplug: Silence APIC only after all interrupts are migratedAshok Raj1-6/+20
There is a race when taking a CPU offline. Current code looks like this: native_cpu_disable() { ... apic_soft_disable(); /* * Any existing set bits for pending interrupt to * this CPU are preserved and will be sent via IPI * to another CPU by fixup_irqs(). */ cpu_disable_common(); { .... /* * Race window happens here. Once local APIC has been * disabled any new interrupts from the device to * the old CPU are lost */ fixup_irqs(); // Too late to capture anything in IRR. ... } } The fix is to disable the APIC *after* cpu_disable_common(). Testing was done with a USB NIC that provided a source of frequent interrupts. A script migrated interrupts to a specific CPU and then took that CPU offline. Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead") Reported-by: Evan Green <[email protected]> Signed-off-by: Ashok Raj <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Mathias Nyman <[email protected]> Tested-by: Evan Green <[email protected]> Reviewed-by: Evan Green <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/r/[email protected]
2020-08-26x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()Tony Luck1-5/+4
A long time ago, Linux cleared IA32_MCG_STATUS at the very end of machine check processing. Then, some fancy recovery and IST manipulation was added in: d4812e169de4 ("x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks") and clearing IA32_MCG_STATUS was pulled earlier in the function. Next change moved the actual recovery out of do_machine_check() and just used task_work_add() to schedule it later (before returning to the user): 5567d11c21a1 ("x86/mce: Send #MC singal from task work") Most recently the fancy IST footwork was removed as no longer needed: b052df3da821 ("x86/entry: Get rid of ist_begin/end_non_atomic()") At this point there is no reason remaining to clear IA32_MCG_STATUS early. It can move back to the very end of the function. Also move sync_core(). The comments for this function say that it should only be called when instructions have been changed/re-mapped. Recovery for an instruction fetch may change the physical address. But that doesn't happen until the scheduled work runs (which could be on another CPU). [ bp: Massage commit message. ] Reported-by: Gabriele Paoloni <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-08-26x86/resctrl: Enable user to view thread or core throttling modeFenghua Yu3-7/+87
Early Intel hardware implementations of Memory Bandwidth Allocation (MBA) could only control bandwidth at the processor core level. This meant that when two processes with different bandwidth allocations ran simultaneously on the same core the hardware had to resolve this difference. It did so by applying the higher throttling value (lower bandwidth) to both processes. Newer implementations can apply different throttling values to each thread on a core. Introduce a new resctrl file, "thread_throttle_mode", on Intel systems that shows to the user how throttling values are allocated, per-core or per-thread. On systems that support per-core throttling, the file will display "max". On newer systems that support per-thread throttling, the file will display "per-thread". AMD confirmed in [1] that AMD bandwidth allocation is already at thread level but that the AMD implementation does not use a memory delay throttle mode. So to avoid confusion the thread throttling mode would be UNDEFINED on AMD systems and the "thread_throttle_mode" file will not be visible. Originally-by: Reinette Chatre <[email protected]> Signed-off-by: Fenghua Yu <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: [1] https://lore.kernel.org/lkml/[email protected]
2020-08-26x86/resctrl: Enumerate per-thread MBA controlsFenghua Yu2-0/+2
Some systems support per-thread Memory Bandwidth Allocation (MBA) which applies a throttling delay value to each hardware thread instead of to a core. Per-thread MBA is enumerated by CPUID. No feature flag is shown in /proc/cpuinfo. User applications need to check a resctrl throttling mode info file to know if the feature is supported. Signed-off-by: Fenghua Yu <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-08-26cpuidle: Move trace_cpu_idle() into generic codePeter Zijlstra1-4/+0
Remove trace_cpu_idle() from the arch_cpu_idle() implementations and put it in the generic code, right before disabling RCU. Gets rid of more trace_*_rcuidle() users. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Tested-by: Marco Elver <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-08-24x86/fsgsbase: Replace static_cpu_has() with boot_cpu_has()Borislav Petkov1-4/+4
ptrace and prctl() are not really fast paths to warrant the use of static_cpu_has() and cause alternatives patching for no good reason. Replace with boot_cpu_has() which is simple and fast enough. No functional changes. Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva14-18/+18
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <[email protected]>
2020-08-22x86/msr: Make source of unrecognised MSR writes unambiguousChris Down1-2/+2
In many cases, task_struct.comm isn't enough to distinguish the offender, since for interpreted languages it's likely just going to be "python3" or whatever. Add the pid to make it unambiguous. [ bp: Make the printk string a single line for easier grepping. ] Signed-off-by: Chris Down <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/6f6fbd0ee6c99bc5e47910db700a6642159db01b.1598011595.git.chris@chrisdown.name
2020-08-22x86/msr: Prevent userspace MSR access from dominating the consoleChris Down1-3/+15
Applications which manipulate MSRs from userspace often do so infrequently, and all at once. As such, the default printk ratelimit architecture supplied by pr_err_ratelimited() doesn't do enough to prevent kmsg becoming completely overwhelmed with their messages and pushing other salient information out of the circular buffer. In one case, I saw over 80% of kmsg being filled with these messages, and the default kmsg buffer being completely filled less than 5 minutes after boot(!). Make things much less aggressive, while still achieving the original goal of fiter_write(). Operators will still get warnings that MSRs are being manipulated from userspace, but they won't have other also potentially useful messages pushed out of the kmsg buffer. Of course, one can boot with `allow_writes=1` to avoid these messages at all, but that then has the downfall that one doesn't get _any_ notification at all about these problems in the first place, and so is much less likely to forget to fix it. One might rather it was less binary: it was still logged, just less often, so that application developers _do_ have the incentive to improve their current methods, without the kernel having to push other useful stuff out of the kmsg buffer. This one example isn't the point, of course: I'm sure there are plenty of other non-ideal-but-pragmatic cases where people are writing to MSRs from userspace right now, and it will take time for those people to find other solutions. Overall, keep the intent of the original patch, while mitigating its sometimes heavy effects on kmsg composition. [ bp: Massage a bit. ] Signed-off-by: Chris Down <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/563994ef132ce6cffd28fc659254ca37d032b5ef.1598011595.git.chris@chrisdown.name
2020-08-20x86/umip: Add emulation/spoofing for SLDT and STR instructionsBrendan Shanks1-13/+27
Add emulation/spoofing of SLDT and STR for both 32- and 64-bit processes. Wine users have found a small number of Windows apps using SLDT that were crashing when run on UMIP-enabled systems. Originally-by: Ricardo Neri <[email protected]> Reported-by: Andreas Rammhold <[email protected]> Signed-off-by: Brendan Shanks <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Reviewed-by: Ricardo Neri <[email protected]> Tested-by: Ricardo Neri <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-08-20x86: switch to kernel_clone()Christian Brauner1-1/+1
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers. Signed-off-by: Christian Brauner <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2020-08-20x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmapYazen Ghannam1-22/+22
The Extended Error Code Bitmap (xec_bitmap) for a Scalable MCA bank type was intended to be used by the kernel to filter out invalid error codes on a system. However, this is unnecessary after a few product releases because the hardware will only report valid error codes. Thus, there's no need for it with future systems. Remove the xec_bitmap field and all references to it. Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-08-19cacheinfo: Move resctrl's get_cache_id() to the cacheinfo header fileJames Morse1-15/+2
resctrl/core.c defines get_cache_id() for use in its cpu-hotplug callbacks. This gets the id attribute of the cache at the corresponding level of a CPU. Later rework means this private function needs to be shared. Move it to the header file. The name conflicts with a different definition in intel_cacheinfo.c, name it get_cpu_cacheinfo_id() to show its relation with get_cpu_cacheinfo(). Now this is visible on other architectures, check the id attribute has actually been set. Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-08-19x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmapsJames Morse3-39/+22
Intel CPUs expect the cache bitmap provided by user-space to have on a single span of 1s, whereas AMD can support bitmaps like 0xf00f. Arm's MPAM support also allows sparse bitmaps. Similarly, Intel CPUs check at least one bit set, whereas AMD CPUs are quite happy with an empty bitmap. Arm's MPAM allows an empty bitmap. To move resctrl out to /fs/, platform differences like this need to be explained. Add two resource properties arch_has_{empty,sparse}_bitmaps. Test these around the relevant parts of cbm_validate(). Merging the validate calls causes AMD to gain the min_cbm_bits test needed for Haswell, but as it always sets this value to 1, it will never match. [ bp: Massage commit message. ] Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Babu Moger <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-08-19x86/resctrl: Merge AMD/Intel parse_bw() callsJames Morse3-61/+5
Now after arch_needs_linear has been added, the parse_bw() calls are almost the same between AMD and Intel. The difference is '!is_mba_sc()', which is not checked on AMD. This will always be true on AMD CPUs as mba_sc cannot be enabled as is_mba_linear() is false. Removing this duplication means user-space visible behaviour and error messages are not validated or generated in different places. Reviewed-by : Babu Moger <[email protected]> Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lkml.kernel.org/r/[email protected]