path: root/arch/x86/kernel
Age  Commit message  Author  Files  Lines
2020-11-18x86/dumpstack: Do not try to access user space code of other tasksThomas Gleixner1-4/+19
sysrq-t ends up invoking show_opcodes() for each task which tries to access the user space code of other processes, which is obviously bogus. It either manages to dump where the foreign task's regs->ip points to in a valid mapping of the current task or triggers a pagefault and prints "Code: Bad RIP value.". Both are just wrong. Add a safeguard in copy_code() and check whether the @regs pointer matches current's pt_regs. If not, do not even try to access it. While at it, add commentary why using copy_from_user_nmi() is safe in copy_code() even if the function name suggests otherwise. Reported-by: Oleg Nesterov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Tested-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
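The safeguard described above can be illustrated with a small user-space model. This is only a sketch: `regs_model`, `task_model`, `current_task` and `copy_code_model()` are hypothetical stand-ins for the kernel's pt_regs, task_struct, `current` and copy_code(), and the `-EPERM` return value is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's pt_regs and task_struct. */
struct regs_model {
	unsigned long ip;
	bool user_mode;
};

struct task_model {
	struct regs_model regs;
};

/* Stand-in for the kernel's `current` task pointer. */
static struct task_model *current_task;

/*
 * Copy code bytes only if the register set belongs to the task that is
 * running right now; user space code of other tasks is never touched.
 */
static int copy_code_model(const struct regs_model *regs, unsigned char *buf,
			   const unsigned char *src, size_t nbytes)
{
	assert(current_task != NULL);

	if (regs->user_mode && regs != &current_task->regs)
		return -EPERM; /* assumed error code for this sketch */

	memcpy(buf, src, nbytes);
	return 0;
}
```

The pointer comparison is the whole check: a foreign task's regs can never equal the current task's regs, so no address-space lookup is needed to reject the access.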
2020-11-17x86/dumpstack: Make show_trace_log_lvl() staticHui Su1-1/+1
show_trace_log_lvl() is not used by other compilation units so make it static and remove the declaration from the header file. Signed-off-by: Hui Su <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/20201113133943.GA136221@rlk
2020-11-17x86/sgx: Add SGX page allocator functionsJarkko Sakkinen2-0/+68
Add functions for runtime allocation and free. This allocator and its algorithms are as simple as it gets. They do a linear search across all EPC sections and find the first free page. They are not NUMA-aware and only hand out individual pages. The SGX hardware does not support large pages, so something more complicated like a buddy allocator is unwarranted. The free function (sgx_free_epc_page()) implicitly calls ENCLS[EREMOVE], which returns the page to the uninitialized state. This ensures that the page is ready for use at the next allocation. Co-developed-by: Sean Christopherson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Jethro Beekman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
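A first-free linear search of the kind described above might look like the following user-space sketch. The types, the fixed section and page counts, and the `in_use` flag are all hypothetical; the commit message does not show how the kernel actually tracks free pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SECTIONS      8
#define PAGES_PER_SECTION 4

/* Hypothetical models of an EPC page and an EPC section. */
struct epc_page_model {
	bool in_use;
};

struct epc_section_model {
	struct epc_page_model pages[PAGES_PER_SECTION];
};

static struct epc_section_model sections[MAX_SECTIONS];
static size_t nr_sections = 2; /* assumed number of populated sections */

/* Walk every section in order and hand out the first free page. */
static struct epc_page_model *alloc_epc_page_model(void)
{
	for (size_t s = 0; s < nr_sections; s++) {
		for (size_t p = 0; p < PAGES_PER_SECTION; p++) {
			if (!sections[s].pages[p].in_use) {
				sections[s].pages[p].in_use = true;
				return &sections[s].pages[p];
			}
		}
	}
	return NULL; /* every page is in use */
}

/* The real sgx_free_epc_page() also runs ENCLS[EREMOVE]; omitted here. */
static void free_epc_page_model(struct epc_page_model *page)
{
	assert(page != NULL && page->in_use);
	page->in_use = false;
}
```

Because only single pages are handed out, there is no splitting or coalescing to do, which is why nothing more elaborate than a scan is needed in this model.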
2020-11-17x86/cpu/intel: Add a nosgx kernel parameterJarkko Sakkinen1-0/+9
Add a kernel parameter to disable SGX kernel support and document it. [ bp: Massage. ] Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Acked-by: Jethro Beekman <[email protected]> Tested-by: Sean Christopherson <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-11-17x86/cpu/intel: Detect SGX supportSean Christopherson1-1/+28
Kernel support for SGX is ultimately decided by the state of the launch control bits in the feature control MSR (MSR_IA32_FEAT_CTL). If the hardware supports SGX, but neglects to support flexible launch control, the kernel will not enable SGX. Enable SGX at feature control MSR initialization and update the associated X86_FEATURE flags accordingly. Disable X86_FEATURE_SGX (and all derivatives) if the kernel is not able to establish itself as the authority over SGX Launch Control. All checks are performed for each logical CPU (not just boot CPU) in order to verify that MSR_IA32_FEATURE_CONTROL is correctly configured on all CPUs. All SGX code in this series expects the same configuration from all CPUs. This differs from VMX where X86_FEATURE_VMX is intentionally cleared only for the current CPU so that KVM can provide additional information if KVM fails to load like which CPU doesn't support VMX. There’s not much the kernel or an administrator can do to fix the situation, so SGX neglects to convey additional details about these kinds of failures if they occur. Signed-off-by: Sean Christopherson <[email protected]> Co-developed-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Jethro Beekman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-11-17x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sectionsSean Christopherson4-0/+253
Although carved out of normal DRAM, enclave memory is marked in the system memory map as reserved and is not managed by the core mm. There may be several regions spread across the system. Each contiguous region is called an Enclave Page Cache (EPC) section. EPC sections are enumerated via CPUID. Enclave pages can only be accessed when they are mapped as part of an enclave, by a hardware thread running inside the enclave. Parse CPUID data, create metadata for EPC pages and populate a simple EPC page allocator. Although much smaller, ‘struct sgx_epc_page’ metadata is the SGX analog of the core mm ‘struct page’. Similar to how the core mm’s page->flags encode zone and NUMA information, embed the EPC section index to the first eight bits of sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to EPC section. Existing client hardware supports only a single section, while upcoming server hardware will support at most eight sections. Thus, eight bits should be enough for long term needs. Signed-off-by: Sean Christopherson <[email protected]> Co-developed-by: Serge Ayoun <[email protected]> Signed-off-by: Serge Ayoun <[email protected]> Co-developed-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Jethro Beekman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
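The reverse lookup from page to section can be sketched as plain bit packing. The sketch assumes that the bits of the descriptor above the low eight hold a page-aligned physical address, which leaves the low bits free for the section index; that layout, and the names `make_desc()`, `desc_section()` and `desc_addr()`, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_MASK    0xffULL  /* low eight bits: EPC section index */
#define PAGE_SIZE_MODEL 4096ULL  /* assumed page size for the sketch */

/* Pack a page-aligned address and a section index into one descriptor. */
static uint64_t make_desc(uint64_t phys_addr, unsigned int section)
{
	assert((phys_addr & (PAGE_SIZE_MODEL - 1)) == 0);
	assert(section <= SECTION_MASK);
	return phys_addr | section;
}

/* Recover the section index with a single mask. */
static unsigned int desc_section(uint64_t desc)
{
	return (unsigned int)(desc & SECTION_MASK);
}

/* Recover the address by clearing the bits below the page boundary. */
static uint64_t desc_addr(uint64_t desc)
{
	return desc & ~(PAGE_SIZE_MODEL - 1);
}
```

With this layout the section lookup costs one AND, which matches the "quick reverse lookup" the message describes.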
2020-11-17x86/sgx: Add wrappers for ENCLS functionsJarkko Sakkinen1-0/+231
ENCLS is the supervisor-mode (ring 0) instruction which wraps virtually all privileged SGX functionality for managing enclaves. It is essentially the ioctl() of instructions with each function implementing different SGX-related functionality. Add macros to wrap the ENCLS functionality. There are two main groups, one for functions which do not return error codes and a “ret_” set for those that do. ENCLS functions are documented in Intel SDM section 36.6. Co-developed-by: Sean Christopherson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Jethro Beekman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-11-17x86/sgx: Add SGX architectural data structuresJarkko Sakkinen1-0/+338
Define the SGX architectural data structures used by various SGX functions. This is not an exhaustive representation of all SGX data structures but only those needed by the kernel. The goal is to sequester hardware structures in "sgx/arch.h" and keep them separate from kernel-internal or uapi structures. The data structures are described in Intel SDM section 37.6. Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Jethro Beekman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-11-17x86/microcode/intel: Check patch signature before saving microcode for early loadingChen Yu1-53/+10
Currently, scan_microcode() leverages microcode_matches() to check if the microcode matches the CPU by comparing the family and model. However, the processor stepping and flags of the microcode signature should also be considered when saving a microcode patch for early update. Use find_matching_signature() in scan_microcode() and get rid of the now-unused microcode_matches() which is a good cleanup in itself. Complete the verification of the patch being saved for early loading in save_microcode_patch() directly. This needs to be done there too because save_mc_for_early() will call save_microcode_patch() too. The second reason why this needs to be done is because the loader still tries to support, at least hypothetically, mixed-steppings systems and thus adds all patches to the cache that belong to the same CPU model albeit with different steppings. For example: microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6 microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23 The patch which is being saved for early loading, however, can only be the one which fits the CPU this runs on so do the signature verification before saving. [ bp: Do signature verification in save_microcode_patch() and rewrite commit message. ] Fixes: ec400ddeff20 ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU") Signed-off-by: Chen Yu <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535 Link: https://lkml.kernel.org/r/[email protected]
2020-11-16Merge branch 'x86/entry' into core/entryThomas Gleixner2-16/+14
Prepare for the merging of the syscall_work series which conflicts with the TIF bits overhaul in X86.
2020-11-16x86/kvm: remove unused macro HV_CLOCK_SIZEAlex Shi1-1/+0
This macro is useless, and could cause gcc warning: arch/x86/kernel/kvmclock.c:47:0: warning: macro "HV_CLOCK_SIZE" is not used [-Wunused-macros] Let's remove it. Signed-off-by: Alex Shi <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Jim Mattson <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Cc: [email protected] Cc: [email protected] Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-11-16x86/msr: Do not allow writes to MSR_IA32_ENERGY_PERF_BIASBorislav Petkov1-3/+0
Now that all in-kernel-tree users are converted to using the sysfs file, remove the MSR from the "allowlist". Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Shuah Khan <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-11-16x86/mce: Use "safe" MSR functions when enabling additional error loggingTony Luck1-2/+3
Booting as a guest under KVM results in error messages about unchecked MSR access: unchecked MSR access error: RDMSR from 0x17f at rIP: 0xffffffff84483f16 (mce_intel_feature_init+0x156/0x270) because KVM doesn't provide emulation for random model specific registers. Switch to using rdmsrl_safe()/wrmsrl_safe() to avoid the message. Fixes: 68299a42f842 ("x86/mce: Enable additional error logging on certain Intel CPUs") Reported-by: Qian Cai <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-11-15Merge tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-3/+3
Pull x86 fixes from Thomas Gleixner: "A small set of fixes for x86: - Cure the fallout from the MSI irqdomain overhaul which missed that the Intel IOMMU does not register virtual function devices and therefore never reaches the point where the MSI interrupt domain is assigned. This made the VF devices use the non-remapped MSI domain which is trapped by the IOMMU/remap unit - Remove an extra space in the SGI_UV architecture type procfs output for UV5 - Remove an unused function which was missed when removing the UV BAU TLB shootdown handler" * tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: iommu/vt-d: Cure VF irqdomain hickup x86/platform/uv: Fix copied UV5 output archtype x86/platform/uv: Drop last traces of uv_flush_tlb_others
2020-11-15Merge tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-4/+11
Pull perf fixes from Thomas Gleixner: "A set of fixes for perf: - A set of commits which reduce the stack usage of various perf event handling functions which allocated large data structs on stack causing stack overflows in the worst case - Use the proper mechanism for detecting soft interrupts in the recursion protection - Make the recursion protection simpler and more robust - Simplify the scheduling of event groups to make the code more robust and prepare for fixing the issues vs. scheduling of exclusive event groups - Prevent event multiplexing and rotation for exclusive event groups - Correct the perf event attribute exclusive semantics to take pinned events, e.g. the PMU watchdog, into account - Make the anythread filtering conditional for Intel's generic PMU counters as it is no longer guaranteed to be supported on newer CPUs. Check the corresponding CPUID leaf to make sure - Fixup a duplicate initialization in an array which was probably caused by the usual 'copy & paste - forgot to edit' mishap" * tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix Add BW copypasta perf/x86/intel: Make anythread filter support conditional perf: Tweak perf_event_attr::exclusive semantics perf: Fix event multiplexing for exclusive groups perf: Simplify group_sched_in() perf: Simplify group_sched_out() perf/x86: Make dummy_iregs static perf/arch: Remove perf_sample_data::regs_user_copy perf: Optimize get_recursion_context() perf: Fix get_recursion_context() perf/x86: Reduce stack usage for x86_pmu::drain_pebs() perf: Reduce stack usage of perf_output_begin()
2020-11-13livepatch: Use the default ftrace_ops instead of REGS when ARGS is availableSteven Rostedt (VMware)1-0/+4
When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS is available, the ftrace call will be able to set the ip of the calling function. This will improve the performance of live kernel patching where it does not need all the regs to be stored just to change the instruction pointer. If all archs that support live kernel patching also support HAVE_DYNAMIC_FTRACE_WITH_ARGS, then the architecture specific function klp_arch_set_pc() could be made generic. It is possible that an arch can support HAVE_DYNAMIC_FTRACE_WITH_ARGS but not HAVE_DYNAMIC_FTRACE_WITH_REGS and then have access to live patching. Cc: Josh Poimboeuf <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: [email protected] Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Miroslav Benes <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-13ftrace/x86: Allow for arguments to be passed in to ftrace_regs by defaultSteven Rostedt (VMware)1-2/+9
Currently, the only way to get access to the registers of a function via a ftrace callback is to set the "FL_SAVE_REGS" bit in the ftrace_ops. But as this saves all regs as if a breakpoint were to trigger (for use with kprobes), it is expensive. The regs are already saved on the stack for the default ftrace callbacks, as that is required; otherwise a function being traced will get the wrong arguments and possibly crash. And on x86, the arguments are already stored where they would be in a pt_regs structure, so that the same code can serve the regs version of a callback; it therefore makes sense to always pass that information to all functions. If an architecture does this (as x86_64 now does), it is to set HAVE_DYNAMIC_FTRACE_WITH_ARGS, and this will let the generic code know that it could have access to arguments without having to set the flags. This also includes having the stack pointer being saved, which could be used for accessing arguments on the stack, as well as having the function graph tracer not require its own trampoline! Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-13ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regsSteven Rostedt (VMware)1-1/+2
In preparation to have arguments of a function passed to callbacks attached to functions as default, change the default callback prototype to receive a struct ftrace_regs as the fourth parameter instead of a pt_regs. For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they will now need to get the pt_regs via a ftrace_get_regs() helper call. If this is called by a callback whose ftrace_ops did not have the FL_SAVE_REGS flag set, that helper function will return NULL. This will allow the ftrace_regs to hold just enough to get the parameters and stack pointer, but without the worry that callbacks may have a pt_regs that is not completely filled. Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-13x86/platform/uv: Fix copied UV5 output archtypeMike Travis1-3/+3
A test shows that the output contains a space: # cat /proc/sgi_uv/archtype NSGI4 U/UVX Remove that embedded space by copying the "trimmed" buffer instead of the untrimmed input character list. Use sizeof to remove size dependency on copy out length. Increase output buffer size by one character just in case BIOS sends an 8 character string for archtype. Fixes: 1e61f5a95f19 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steve Wahl <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-10x86/ioapic: Correct the PCI/ISA trigger type selectionThomas Gleixner1-2/+2
PCI's default trigger type is level and ISA's is edge. The recent refactoring made it the other way round, which went unnoticed as it seems only to cause havoc on some AMD systems. Make the comment and code do the right thing again. Fixes: a27dca645d2c ("x86/io_apic: Cleanup trigger/polarity helpers") Reported-by: Tom Lendacky <[email protected]> Reported-by: Borislav Petkov <[email protected]> Reported-by: Qian Cai <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Tom Lendacky <[email protected]> Cc: David Woodhouse <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-09perf/arch: Remove perf_sample_data::regs_user_copyPeter Zijlstra1-4/+11
struct perf_sample_data lives on-stack, we should be careful about its size. Furthermore, the pt_regs copy in there is only because x86_64 is a trainwreck, solve it differently. Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Steven Rostedt <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-11-08Merge tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-23/+51
Pull x86 fixes from Thomas Gleixner: "A set of x86 fixes: - Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a combination of .weak and SYM_FUNC_START_LOCAL which makes LLVM's integrated assembler upset - Correct the mitigation selection logic which prevented the related prctl from working correctly - Make the UV5 hubless system work correctly by fixing up the malformed table entries and adding the missing ones" * tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/uv: Recognize UV5 hubless system identifier x86/platform/uv: Remove spaces from OEM IDs x86/platform/uv: Fix missing OEM_TABLE_ID x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S
2020-11-07x86/platform/uv: Recognize UV5 hubless system identifierMike Travis1-3/+10
Testing shows a problem in that UV5 hubless systems were not being recognized. Add them to the list of OEM IDs checked. Fixes: 6c7794423a998 ("Add UV5 direct references") Signed-off-by: Mike Travis <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-07x86/platform/uv: Remove spaces from OEM IDsMike Travis1-0/+3
Testing shows that trailing spaces caused problems with the OEM_ID and the OEM_TABLE_ID. One being that the OEM_ID would not string compare correctly. Another the OEM_ID and OEM_TABLE_ID would be concatenated in the printout. Remove any trailing spaces. Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
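Trimming trailing spaces from a fixed-width identifier can be sketched as below. The helper name `trim_trailing_spaces()` is hypothetical and the sketch works on an ordinary NUL-terminated buffer; it is not the code from the patch.

```c
#include <assert.h>
#include <string.h>

/* Overwrite trailing spaces with NUL so string compares work as expected. */
static void trim_trailing_spaces(char *id)
{
	assert(id != NULL);

	size_t len = strlen(id);

	while (len > 0 && id[len - 1] == ' ')
		id[--len] = '\0';
}
```

After trimming, a padded identifier compares equal to its unpadded form, and two identifiers printed back to back no longer run together through the padding.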
2020-11-07x86/platform/uv: Fix missing OEM_TABLE_IDMike Travis1-2/+5
Testing shows a problem in that the OEM_TABLE_ID was missing for hubless systems. This is used to determine the APIC type (legacy or extended). Add the OEM_TABLE_ID to the early hubless processing. Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-06x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUsPaul E. McKenney1-0/+6
Currently, accessing /proc/cpuinfo sends IPIs to idle CPUs in order to learn their clock frequency. Which is a bit strange, given that waking them from idle likely significantly changes their clock frequency. This commit therefore avoids sending /proc/cpuinfo-induced IPIs to idle CPUs. [ paulmck: Also check for idle in arch_freq_prepare_all(). ] Signed-off-by: Paul E. McKenney <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: <[email protected]>
2020-11-06x86/cpu: Avoid cpuinfo-induced IPI pileupsPaul E. McKenney1-1/+9
The aperfmperf_snapshot_cpu() function is invoked upon access to /proc/cpuinfo, and it does do an early exit if the specified CPU has recently done a snapshot. Unfortunately, the indication that a snapshot has been completed is set in an IPI handler, and the execution of this handler can be delayed by any number of unfortunate events. This means that a system that starts a number of applications, each of which parses /proc/cpuinfo, can suffer from an smp_call_function_single() storm, especially given that each access to /proc/cpuinfo invokes smp_call_function_single() for all CPUs. Please note that this is not theoretical speculation. Note also that one CPU's pending IPI serves all requests, so there is no point in ever having more than one IPI pending to a given CPU. This commit therefore suppresses duplicate IPIs to a given CPU via a new ->scfpending field in the aperfmperf_sample structure. This field is set to the value one if an IPI is pending to the corresponding CPU and to zero otherwise. The aperfmperf_snapshot_cpu() function uses atomic_xchg() to set this field to the value one and sample the old value. If this function's "wait" parameter is zero, smp_call_function_single() is called only if the old value of the ->scfpending field was zero. The IPI handler uses atomic_set_release() to set this new field to zero just before returning, so that the prior stores into the aperfmperf_sample structure are seen by future requests that get to the atomic_xchg(). Future requests that pass the elapsed-time check are ordered by the fact that on x86 loads act as acquire loads, just as was the case prior to this change. The return value is based off of the age of the prior snapshot, just as before. Reported-by: Dave Jones <[email protected]> [ paulmck: Allow /proc/cpuinfo to take advantage of arch_freq_get_on_cpu(). ] [ paulmck: Add comment on memory barrier. ] Signed-off-by: Paul E. McKenney <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: <[email protected]>
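The duplicate-suppression scheme described above can be modelled in user space with C11 atomics. This is a sketch of the non-waiting path only: `sample_model`, `request_snapshot()` and `ipi_handler()` are hypothetical names, `atomic_exchange()` stands in for the kernel's atomic_xchg(), a release store stands in for atomic_set_release(), and the `ipis_sent` counter replaces the real smp_call_function_single() call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-CPU sample with its pending flag. */
struct sample_model {
	atomic_int scfpending; /* 1 while an IPI is outstanding, else 0 */
	int ipis_sent;         /* counts simulated IPIs */
};

/* Returns true if this request actually had to send a new IPI. */
static bool request_snapshot(struct sample_model *s)
{
	assert(s != NULL);

	/* Set the flag to one and look at what it was before. */
	if (atomic_exchange(&s->scfpending, 1) == 0) {
		s->ipis_sent++; /* stand-in for smp_call_function_single() */
		return true;
	}
	return false; /* an IPI is already pending and will serve us too */
}

/* Runs on the target CPU once the snapshot data has been stored. */
static void ipi_handler(struct sample_model *s)
{
	assert(s != NULL);

	/* Release ordering publishes the snapshot stores before the clear. */
	atomic_store_explicit(&s->scfpending, 0, memory_order_release);
}
```

However many requests arrive while the flag is set, only the first one sends an IPI, which is the property that prevents the storm.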
2020-11-06x86/mce: Correct the detection of invalid notifier prioritiesZhen Lei1-1/+2
Commit c9c6d216ed28 ("x86/mce: Rename "first" function as "early"") changed the enumeration of MCE notifier priorities. Correct the check for notifier priorities to cover the new range. [ bp: Rewrite commit message, remove superfluous brackets in conditional. ] Fixes: c9c6d216ed28 ("x86/mce: Rename "first" function as "early"") Signed-off-by: Zhen Lei <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-11-06ftrace: Add recording of functions that caused recursionSteven Rostedt (VMware)1-1/+1
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file "recursed_functions" all the functions that caused recursion while a callback to the function tracer was running. Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Guo Ren <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Cc: Kees Cook <[email protected]> Cc: Anton Vorontsov <[email protected]> Cc: Colin Cross <[email protected]> Cc: Tony Luck <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Joe Lawrence <[email protected]> Cc: Kamalesh Babulal <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-06kprobes/ftrace: Add recursion protection to the ftrace callbackSteven Rostedt (VMware)1-2/+10
If a ftrace callback does not supply its own recursion protection and does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will make a helper trampoline to do so before calling the callback instead of just calling the callback directly. The default for ftrace_ops is going to change. It will expect that handlers provide their own recursion protection, unless its ftrace_ops states otherwise. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Guo Ren <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-06x86/mce: Assign boolean values to a bool variableKaixu Xia1-2/+2
Fix the following coccinelle warnings: ./arch/x86/kernel/cpu/mce/core.c:1765:3-20: WARNING: Assignment of 0/1 to bool variable ./arch/x86/kernel/cpu/mce/core.c:1584:2-9: WARNING: Assignment of 0/1 to bool variable [ bp: Massage commit message. ] Reported-by: Tosk Robot <[email protected]> Signed-off-by: Kaixu Xia <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-11-06ima: generalize x86/EFI arch glue for other EFI architecturesChester Lin2-96/+0
Move the x86 IMA arch code into security/integrity/ima/ima_efi.c, so that we will be able to wire it up for arm64 in a future patch. Co-developed-by: Chester Lin <[email protected]> Signed-off-by: Chester Lin <[email protected]> Acked-by: Mimi Zohar <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2020-11-05x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBPAnand K Mistry1-18/+33
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED. At the same time, IBPB can be set to conditional. However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED condition leads to a return before the task flag is set. Similarly, ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to conditional. More generally, the following cases are possible: 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb 2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. At a high level, this change does one thing. If either STIBP or IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one. [ bp: Massage commit message and comment; space out statements for better readability. ] Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Signed-off-by: Anand K Mistry <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Acked-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
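The rule "if either STIBP or IBPB is conditional, allow the prctl to change the task flag" can be written as a small predicate. The enum below and the helper names are hypothetical simplifications for illustration; treating the prctl and seccomp modes as the "conditional" ones is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified mitigation modes for this sketch. */
enum ib_mode {
	IB_NONE,
	IB_STRICT,            /* unconditionally on */
	IB_STRICT_PREFERRED,  /* always-on STIBP case */
	IB_PRCTL,             /* conditional: opt in via prctl */
	IB_SECCOMP,           /* conditional: prctl or seccomp */
};

static bool is_conditional(enum ib_mode mode)
{
	assert(mode <= IB_SECCOMP);
	return mode == IB_PRCTL || mode == IB_SECCOMP;
}

/* The prctl may change the task flag if either feature is conditional. */
static bool ib_user_controlled(enum ib_mode stibp, enum ib_mode ibpb)
{
	return is_conditional(stibp) || is_conditional(ibpb);
}
```

Both cases listed in the message satisfy the predicate, while a configuration with both features unconditionally on does not.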
2020-11-04x86/entry: Move nmi entry/exit into common codeThomas Gleixner3-12/+13
Lockdep state handling on NMI enter and exit is nothing specific to X86. It's not any different on other architectures. Also the extra state type is not necessary, irqentry_state_t can carry the necessary information as well. Move it to common code and extend irqentry_state_t to carry lockdep state. [ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive between the IRQ and NMI exceptions, and add kernel documentation for struct irqentry_state_t ] Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ira Weiny <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-04Merge branch 'core/urgent' into core/entryThomas Gleixner77-1408/+4007
Pick up the entry fix before further modifications.
2020-11-04x86/ioapic: Use I/O-APIC ID for finding irqdomain, not indexDavid Woodhouse1-2/+2
In commit b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain") the I/O-APIC code was changed to find its parent irqdomain using irq_find_matching_fwspec(), but the key used for the lookup was wrong. It shouldn't use 'ioapic' which is the index into its own ioapics[] array. It should use the actual arbitration ID of the I/O-APIC in question, which is mpc_ioapic_id(ioapic). Fixes: b643128b917 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain") Reported-by: lkp <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-04x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports itDexuan Cui1-0/+29
When a Linux VM runs on Hyper-V, if the VM has CPUs with APIC IDs above 255, those CPUs can't be the destination of IOAPIC interrupts, because the IOAPIC RTE's Dest Field has only 8 bits. Currently the hackery driver drivers/iommu/hyperv-iommu.c is used to ensure IOAPIC interrupts are only routed to CPUs whose APIC IDs do not exceed 255. However, there is an issue with kdump, because the kdump kernel can run on any CPU, and hence IOAPIC interrupts can't work if the kdump kernel runs on a CPU with an APIC ID above 255. The kdump issue can be fixed by the Extended Dest ID, which was introduced recently by David Woodhouse (for IOAPIC, see the field virt_destid_8_14 in struct IO_APIC_route_entry). Of course, the Extended Dest ID needs the support of the underlying hypervisor. The latest Hyper-V has added the support recently: with this commit, on such a Hyper-V host, a Linux VM does not use hyperv-iommu.c because hyperv_prepare_irq_remapping() returns -ENODEV; instead, the Linux kernel's generic support of Extended Dest ID from David is used, meaning that a Linux VM is able to support up to 32K CPUs, and IOAPIC interrupts can be routed to all the CPUs. On an old Hyper-V host that doesn't support the Extended Dest ID, nothing changes with this commit: a Linux VM is still able to bring up the CPUs with APIC IDs above 255 with the help of hyperv-iommu.c, but IOAPIC interrupts still cannot go to such CPUs, and the kdump kernel still cannot work properly on such CPUs. [ tglx: Updated comment as suggested by David ] Signed-off-by: Dexuan Cui <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: David Woodhouse <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-03Merge tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-7/+144
Pull x86 SEV-ES fixes from Borislav Petkov: "A couple of changes to the SEV-ES code to perform more stringent hypervisor checks before enabling encryption (Joerg Roedel)" * tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev-es: Do not support MMIO to/from encrypted memory x86/head/64: Check SEV encryption before switching to kernel page-table x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler x86/boot/compressed/64: Introduce sev_status
2020-11-02x86/mtrr: Fix a kernel-doc markupMauro Carvalho Chehab1-1/+2
Kernel-doc markup should use this format: identifier - description Fix it. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/2217cd4ae9e561da2825485eb97de77c65741489.1603469755.git.mchehab+huawei@kernel.org
2020-11-02x86/mce: Enable additional error logging on certain Intel CPUsTony Luck1-0/+20
The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an optional additional error logging mode which is enabled by an MSR. Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu, but userspace should not be poking at MSRs. So move the enabling into the kernel. [ bp: Correct the explanation why this is done. ] Suggested-by: Boris Petkov <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-10-29x86/build: Fix vmlinux size check on 64-bitArvind Sankar2-20/+12
Commit b4e0409a36f4 ("x86: check vmlinux limits, 64-bit") added a check that the size of the 64-bit kernel is less than KERNEL_IMAGE_SIZE. The check uses (_end - _text), but this is not enough. The initial PMD used in startup_64() (level2_kernel_pgt) can only map up to KERNEL_IMAGE_SIZE from __START_KERNEL_map, not from _text, and the modules area (MODULES_VADDR) starts at KERNEL_IMAGE_SIZE. The correct check is what is currently done for 32-bit, since LOAD_OFFSET is defined appropriately for the two architectures. Just check (_end - LOAD_OFFSET) against KERNEL_IMAGE_SIZE unconditionally. Note that on 32-bit, the limit is not strict: KERNEL_IMAGE_SIZE is not really used by the main kernel. The higher the kernel is located, the less space is available for the vmalloc area. However, it is used by KASLR in the compressed stub to limit the maximum address of the kernel to a safe value. Clean up various comments to clarify that despite the name, KERNEL_IMAGE_SIZE is not a limit on the size of the kernel image, but a limit on the maximum virtual address that the image can occupy. Signed-off-by: Arvind Sankar <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
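A small arithmetic sketch may help show why measuring from _text is not enough. The constants below are illustrative assumptions (a 512 MiB limit and a base address standing in for __START_KERNEL_map), not values taken from any particular build, and the real check lives in the linker script rather than in C; old_check() and new_check() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only (assumptions, not from a real configuration). */
#define SKETCH_LOAD_OFFSET       0xffffffff80000000ULL /* stands in for __START_KERNEL_map */
#define SKETCH_KERNEL_IMAGE_SIZE (512ULL << 20)        /* 512 MiB */

/* Old 64-bit check: only the size of the image itself, (_end - _text). */
static inline bool old_check(uint64_t text, uint64_t end)
{
	return end - text <= SKETCH_KERNEL_IMAGE_SIZE;
}

/* Check described above: highest address relative to LOAD_OFFSET. */
static inline bool new_check(uint64_t end)
{
	return end - SKETCH_LOAD_OFFSET <= SKETCH_KERNEL_IMAGE_SIZE;
}
```

For an image whose _text sits 16 MiB above the base and whose _end sits 520 MiB above it, the image is 504 MiB long, so old_check() accepts it, yet its end lies beyond the 512 MiB that can be mapped from the base, so new_check() rejects it.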
2020-10-29x86/sev-es: Do not support MMIO to/from encrypted memoryJoerg Roedel1-7/+13
MMIO memory is usually not mapped encrypted, so there is no reason to support emulated MMIO when it is mapped encrypted. Prevent a possible hypervisor attack where a RAM page is mapped as an MMIO page in the nested page-table, so that any guest access to it will trigger a #VC exception and leak the data on that page to the hypervisor via the GHCB (like with valid MMIO). On the read side this attack would allow the HV to inject data into the guest. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-10-29x86/head/64: Check SEV encryption before switching to kernel page-tableJoerg Roedel1-0/+16
When SEV is enabled, the kernel requests the C-bit position again from the hypervisor to build its own page-table. Since the hypervisor is an untrusted source, the C-bit position needs to be verified before the kernel page-table is used. Call sev_verify_cbit() before writing the CR3. [ bp: Massage. ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-10-29x86/boot/compressed/64: Check SEV encryption in 64-bit boot-pathJoerg Roedel1-0/+89
Check whether the hypervisor reported the correct C-bit when running as an SEV guest. Using a wrong C-bit position could be used to leak sensitive data from the guest to the hypervisor. The check function is in a separate file: arch/x86/kernel/sev_verify_cbit.S so that it can be re-used in the running kernel image. [ bp: Massage. ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-10-29x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handlerJoerg Roedel1-0/+26
The early #VC handler which doesn't have a GHCB can only handle CPUID exit codes. It is needed by the early boot code to handle #VC exceptions raised in verify_cpu() and to get the position of the C-bit. But the CPUID information comes from the hypervisor which is untrusted and might return results which trick the guest into the no-SEV boot path with no C-bit set in the page-tables. All data written to memory would then be unencrypted and could leak sensitive data to the hypervisor. Add sanity checks to the early #VC handler to make sure the hypervisor can not pretend that SEV is disabled. [ bp: Massage a bit. ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-10-29entry: Add support for TIF_NOTIFY_SIGNALJens Axboe1-2/+2
Add TIF_NOTIFY_SIGNAL handling in the generic entry code, which if set, will return true if signal_pending() is used in a wait loop. That causes an exit of the loop so that notify_signal tracehooks can be run. If the wait loop is currently inside a system call, the system call is restarted once task_work has been processed. In preparation for only having arch_do_signal() handle syscall restarts if _TIF_SIGPENDING isn't set, rename it to arch_do_signal_or_restart(). Pass in a boolean that tells the architecture specific signal handler if it should attempt to get a signal, or just process a potential syscall restart. For !CONFIG_GENERIC_ENTRY archs, add the TIF_NOTIFY_SIGNAL handling to get_signal(). This is done to minimize the needed architecture changes to support this feature. Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detectedDavid Woodhouse1-0/+6
This allows the host to indicate that MSI emulation supports 15-bit destination IDs, allowing up to 32768 CPUs without interrupt remapping. cf. https://patchwork.kernel.org/patch/11816693/ for qemu Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/apic: Support 15 bits of APIC ID in MSI where availableDavid Woodhouse2-2/+25
Some hypervisors can allow the guest to use the Extended Destination ID field in the MSI address to address up to 32768 CPUs. This applies to all downstream devices which generate MSI cycles, including HPET, I/O-APIC and PCI MSI. HPET and PCI MSI use the same __irq_msi_compose_msg() function, while I/O-APIC generates its own and had support for the extended bits added in a previous commit. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-28x86/ioapic: Handle Extended Destination ID field in RTEDavid Woodhouse1-5/+15
Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to bits 19-4 of the address used in the resulting MSI cycle. Historically, the x86 MSI format only used the top 8 of those 16 bits as the destination APIC ID, and the "Extended Destination ID" in the lower 8 bits was unused. With interrupt remapping, the lowest bit of the Extended Destination ID (bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable format MSI. A hypervisor can use the other 7 bits of the Extended Destination ID to permit guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs before having to expose a vIOMMU and interrupt remapping to the guest. No behavioural change in this patch, since nothing yet permits APIC IDs above 255 to be used with the non-IR I/OAPIC domain. [ tglx: Converted it to the cleaned up entry/msi_msg format and added commentary ] Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
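The bit layout described in this message can be worked through in a short userspace sketch. The function names are hypothetical and the code only mirrors the mapping stated above (RTE bits 63-48 to MSI address bits 19-4, with bit 48 left clear); it is not the kernel's implementation. The 0xfee00000 base is the usual x86 MSI address window.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_ADDR_BASE 0xfee00000u  /* x86 MSI address window */

/*
 * Place a 15-bit APIC ID into the RTE destination bits:
 * ID bits 0-7 go to RTE bits 63-56, ID bits 8-14 go to RTE bits 55-49.
 */
static inline uint64_t rte_dest_from_apic_id(uint32_t apic_id)
{
	assert(apic_id < (1u << 15));

	uint64_t rte = 0;

	rte |= (uint64_t)(apic_id & 0xffu) << 56;
	rte |= (uint64_t)((apic_id >> 8) & 0x7fu) << 49;
	/* Bit 48 stays clear: it would mark a remappable-format MSI. */
	return rte;
}

/* RTE bits 63-48 map directly to MSI address bits 19-4. */
static inline uint32_t msi_addr_from_rte(uint64_t rte)
{
	return MSI_ADDR_BASE | ((uint32_t)((rte >> 48) & 0xffffu) << 4);
}
```

For example, APIC ID 0x1234 yields RTE destination bits 0x3424000000000000 and an MSI address of 0xfee34240, with address bit 4 clear.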
2020-10-28x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomainDavid Woodhouse1-12/+13
All possible parent domains have a select method now. Make use of it. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]