aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2020-09-11x86/mce: Make mce_rdmsrl() panic on an inaccessible MSRBorislav Petkov2-12/+70
If an exception needs to be handled while reading an MSR - which is in most of the cases caused by a #GP on a non-existent MSR - then this is most likely the incarnation of a BIOS or a hardware bug. Such bug violates the architectural guarantee that MCA banks are present with all MSRs belonging to them. The proper fix belongs in the hardware/firmware - not in the kernel. Handling an #MC exception which is raised while an NMI is being handled would cause the nasty NMI nesting issue because of the shortcoming of IRET of reenabling NMIs when executed. And the machine is in an #MC context already so <Deity> be at its side. Tracing MSR accesses while in #MC is another no-no due to tracing being inherently a bad idea in atomic context: vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section so remove all that "additional" functionality from mce_rdmsrl() and provide it with a special exception handler which panics the machine when that MSR is not accessible. The exception handler prints a human-readable message explaining what the panic reason is but, what is more, it panics while in the #GP handler and latter won't have executed an IRET, thus opening the NMI nesting issue in the case when the #MC has happened while handling an NMI. (#MC itself won't be reenabled until MCG_STATUS hasn't been cleared). Suggested-by: Andy Lutomirski <[email protected]> Suggested-by: Peter Zijlstra <[email protected]> [ Add missing prototypes for ex_handler_* ] Reported-by: kernel test robot <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-11crypto: poly1305-x86_64 - Use XORL r32,32Uros Bizjak1-4/+4
x86_64 zero extends 32bit operations, so for 64bit operands, XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids a REX prefix byte when legacy registers are used. Signed-off-by: Uros Bizjak <[email protected]> Cc: Herbert Xu <[email protected]> Cc: "David S. Miller" <[email protected]> Acked-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2020-09-11crypto: curve25519-x86_64 - Use XORL r32,32Uros Bizjak1-34/+34
x86_64 zero extends 32bit operations, so for 64bit operands, XORL r32,r32 is functionally equal to XORL r64,r64, but avoids a REX prefix byte when legacy registers are used. Signed-off-by: Uros Bizjak <[email protected]> Cc: Herbert Xu <[email protected]> Cc: "David S. Miller" <[email protected]> Acked-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2020-09-10x86/sev-es: Check required CPU features for SEV-ESMartin Radev5-6/+24
Make sure the machine supports RDRAND, otherwise there is no trusted source of randomness in the system. To also check this in the pre-decompression stage, make has_cpuflag() not depend on CONFIG_RANDOMIZE_BASE anymore. Signed-off-by: Martin Radev <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-10x86/efi: Add GHCB mappings when SEV-ES is activeTom Lendacky4-0/+43
Calling down to EFI runtime services can result in the firmware performing VMGEXIT calls. The firmware is likely to use the GHCB of the OS (e.g., for setting EFI variables), so each GHCB in the system needs to be identity-mapped in the EFI page tables, as unencrypted, to avoid page faults. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: Moved GHCB mapping loop to sev-es.c ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-10x86/fpu: Allow multiple bits in clearcpuid= parameterArvind Sankar1-8/+22
Commit 0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE argument") changed clearcpuid parsing from __setup() to cmdline_find_option(). While the __setup() function would have been called for each clearcpuid= parameter on the command line, cmdline_find_option() will only return the last one, so the change effectively made it impossible to disable more than one bit. Allow a comma-separated list of bit numbers as the argument for clearcpuid to allow multiple bits to be disabled again. Log the bits being disabled for informational purposes. Also fix the check on the return value of cmdline_find_option(). It returns -1 when the option is not found, so testing as a boolean is incorrect. Fixes: 0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE argument") Signed-off-by: Arvind Sankar <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-10objtool: Make unwind hint definitions available to other architecturesJulien Thierry3-84/+17
Unwind hints are useful to provide objtool with information about stack states in non-standard functions/code. While the type of information being provided might be very arch specific, the mechanism to provide the information can be useful for other architectures. Move the relevant unwint hint definitions for all architectures to see. [ jpoimboe: REGS_IRET -> REGS_PARTIAL ] Signed-off-by: Julien Thierry <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]>
2020-09-10objtool: Rename frame.h -> objtool.hJulien Thierry8-8/+8
Header frame.h is getting more code annotations to help objtool analyze object files. Rename the file to objtool.h. [ jpoimboe: add objtool.h to MAINTAINERS ] Signed-off-by: Julien Thierry <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]>
2020-09-10swiotlb: Declare swiotlb_late_init_with_default_size() in headerAndy Shevchenko1-1/+0
Compiler is not happy about one function prototype: CC kernel/dma/swiotlb.o kernel/dma/swiotlb.c:275:1: warning: no previous prototype for ‘swiotlb_late_init_with_default_size’ [-Wmissing-prototypes] 275 | swiotlb_late_init_with_default_size(size_t default_size) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since it's used outside of the module, move its declaration to the header from the user. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2020-09-10arch/x86/amd/ibs: Fix re-arming IBS FetchKim Phillips1-1/+14
Stephane Eranian found a bug in that IBS' current Fetch counter was not being reset when the driver would write the new value to clear it along with the enable bit set, and found that adding an MSR write that would first disable IBS Fetch would make IBS Fetch reset its current count. Indeed, the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54 - Sep 12, 2019 states "The periodic fetch counter is set to IbsFetchCnt [...] when IbsFetchEn is changed from 0 to 1." Explicitly set IbsFetchEn to 0 and then to 1 when re-enabling IBS Fetch, so the driver properly resets the internal counter to 0 and IBS Fetch starts counting again. A family 15h machine tested does not have this problem, and the extra wrmsr is also not needed on Family 19h, so only do the extra wrmsr on families 16h through 18h. Reported-by: Stephane Eranian <[email protected]> Signed-off-by: Kim Phillips <[email protected]> [peterz: optimized] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
2020-09-10perf/x86/rapl: Add AMD Fam19h RAPL supportKim Phillips1-0/+1
Family 19h RAPL support did not change from Family 17h; extend the existing Fam17h support to work on Family 19h too. Signed-off-by: Kim Phillips <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-10perf/x86/amd/ibs: Support 27-bit extended Op/cycle counterKim Phillips2-11/+32
IBS hardware with the OpCntExt feature gets a 7-bit wider internal counter. Both the maximum and current count bitfields in the IBS_OP_CTL register are extended to support reading and writing it. No changes are necessary to the driver for handling the extra contiguous current count bits (IbsOpCurCnt), as the driver already passes through 32 bits of that field. However, the driver has to do some extra bit manipulation when converting from a period to the non-contiguous (although conveniently aligned) extra bits in the IbsOpMaxCnt bitfield. This decreases IBS Op interrupt overhead when the period is over 1,048,560 (0xffff0), which would previously activate the driver's software counter. That threshold is now 134,217,712 (0x7fffff0). Signed-off-by: Kim Phillips <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-10perf/x86/amd/ibs: Fix raw sample data accumulationKim Phillips2-10/+17
Neither IbsBrTarget nor OPDATA4 are populated in IBS Fetch mode. Don't accumulate them into raw sample user data in that case. Also, in Fetch mode, add saving the IBS Fetch Control Extended MSR. Technically, there is an ABI change here with respect to the IBS raw sample data format, but I don't see any perf driver version information being included in perf.data file headers, but, existing users can detect whether the size of the sample record has reduced by 8 bytes to determine whether the IBS driver has this fix. Fixes: 904cb3677f3a ("perf/x86/amd/ibs: Update IBS MSRs and feature definitions") Reported-by: Stephane Eranian <[email protected]> Signed-off-by: Kim Phillips <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2020-09-10perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()Kim Phillips1-4/+8
get_ibs_op_count() adds hardware's current count (IbsOpCurCnt) bits to its count regardless of hardware's valid status. According to the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54, if the counter rolls over, valid status is set, and the lower 7 bits of IbsOpCurCnt are randomized by hardware. Don't include those bits in the driver's event count. Fixes: 8b1e13638d46 ("perf/x86-ibs: Fix usage of IBS op current count") Signed-off-by: Kim Phillips <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
2020-09-10perf/x86/amd: Fix sampling Large Increment per Cycle eventsKim Phillips1-2/+2
Commit 5738891229a2 ("perf/x86/amd: Add support for Large Increment per Cycle Events") mistakenly zeroes the upper 16 bits of the count in set_period(). That's fine for counting with perf stat, but not sampling with perf record when only Large Increment events are being sampled. To enable sampling, we sign extend the upper 16 bits of the merged counter pair as described in the Family 17h PPRs: "Software wanting to preload a value to a merged counter pair writes the high-order 16-bit value to the low-order 16 bits of the odd counter and then writes the low-order 48-bit value to the even counter. Reading the even counter of the merged counter pair returns the full 64-bit value." Fixes: 5738891229a2 ("perf/x86/amd: Add support for Large Increment per Cycle Events") Signed-off-by: Kim Phillips <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
2020-09-10perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviourKim Phillips1-20/+8
Commit 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs") inadvertently changed the uncore driver's behaviour wrt perf tool invocations with or without a CPU list, specified with -C / --cpu=. Change the behaviour of the driver to assume the former all-cpu (-a) case, which is the more commonly desired default. This fixes '-a -A' invocations without explicit cpu lists (-C) to not count L3 events only on behalf of the first thread of the first core in the L3 domain. BEFORE: Activity performed by the first thread of the last core (CPU#43) in CPU#40's L3 domain is not reported by CPU#40: sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default ... CPU36 21,835 l3_request_g1.caching_l3_cache_accesses CPU40 87,066 l3_request_g1.caching_l3_cache_accesses CPU44 17,360 l3_request_g1.caching_l3_cache_accesses ... AFTER: The L3 domain activity is now reported by CPU#40: sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default ... CPU36 354,891 l3_request_g1.caching_l3_cache_accesses CPU40 1,780,870 l3_request_g1.caching_l3_cache_accesses CPU44 315,062 l3_request_g1.caching_l3_cache_accesses ... Fixes: 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs") Signed-off-by: Kim Phillips <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2020-09-10perf/x86/intel/ds: Fix x86_pmu_stop warning for large PEBSKan Liang1-12/+20
A warning as below may be triggered when sampling with large PEBS. [ 410.411250] perf: interrupt took too long (72145 > 71975), lowering kernel.perf_event_max_sample_rate to 2000 [ 410.724923] ------------[ cut here ]------------ [ 410.729822] WARNING: CPU: 0 PID: 16397 at arch/x86/events/core.c:1422 x86_pmu_stop+0x95/0xa0 [ 410.933811] x86_pmu_del+0x50/0x150 [ 410.937304] event_sched_out.isra.0+0xbc/0x210 [ 410.941751] group_sched_out.part.0+0x53/0xd0 [ 410.946111] ctx_sched_out+0x193/0x270 [ 410.949862] __perf_event_task_sched_out+0x32c/0x890 [ 410.954827] ? set_next_entity+0x98/0x2d0 [ 410.958841] __schedule+0x592/0x9c0 [ 410.962332] schedule+0x5f/0xd0 [ 410.965477] exit_to_usermode_loop+0x73/0x120 [ 410.969837] prepare_exit_to_usermode+0xcd/0xf0 [ 410.974369] ret_from_intr+0x2a/0x3a [ 410.977946] RIP: 0033:0x40123c [ 411.079661] ---[ end trace bc83adaea7bb664a ]--- In the non-overflow context, e.g., context switch, with large PEBS, perf may stop an event twice. An example is below. //max_samples_per_tick is adjusted to 2 //NMI is triggered intel_pmu_handle_irq() handle_pmi_common() drain_pebs() __intel_pmu_pebs_event() perf_event_overflow() __perf_event_account_interrupt() hwc->interrupts = 1 return 0 //A context switch happens right after the NMI. //In the same tick, the perf_throttled_seq is not changed. perf_event_task_sched_out() perf_pmu_sched_task() intel_pmu_drain_pebs_buffer() __intel_pmu_pebs_event() perf_event_overflow() __perf_event_account_interrupt() ++hwc->interrupts >= max_samples_per_tick return 1 x86_pmu_stop(); # First stop perf_event_context_sched_out() task_ctx_sched_out() ctx_sched_out() event_sched_out() x86_pmu_del() x86_pmu_stop(); # Second stop and trigger the warning Perf should only invoke the perf_event_overflow() in the overflow context. Current drain_pebs() is called from: - handle_pmi_common() -- overflow context - intel_pmu_pebs_sched_task() -- non-overflow context - intel_pmu_pebs_disable() -- non-overflow context - intel_pmu_auto_reload_read() -- possible overflow context With PERF_SAMPLE_READ + PERF_FORMAT_GROUP, the function may be invoked in the NMI handler. But, before calling the function, the PEBS buffer has already been drained. The __intel_pmu_pebs_event() will not be called in the possible overflow context. To fix the issue, an indicator is required to distinguish between the overflow context aka handle_pmi_common() and other cases. The dummy regs pointer can be used as the indicator. In the non-overflow context, perf should treat the last record the same as other PEBS records, and doesn't invoke the generic overflow handler. Fixes: 21509084f999 ("perf/x86/intel: Handle multiple records in the PEBS buffer") Reported-by: Like Xu <[email protected]> Suggested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Like Xu <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-10x86/tsc: Use seqcount_latch_tAhmed S. Darwish1-5/+5
Latch sequence counters have unique read and write APIs, and thus seqcount_latch_t was recently introduced at seqlock.h. Use that new data type instead of plain seqcount_t. This adds the necessary type-safety and ensures that only latching-safe seqcount APIs are to be used. Signed-off-by: Ahmed S. Darwish <[email protected]> [peterz: unwreck cyc2ns_read_begin()] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle NMI StateJoerg Roedel4-0/+32
When running under SEV-ES, the kernel has to tell the hypervisor when to open the NMI window again after an NMI was injected. This is done with an NMI-complete message to the hypervisor. Add code to the kernel's NMI handler to send this message right at the beginning of do_nmi(). This always allows nesting NMIs. [ bp: Mark __sev_es_nmi_complete() noinstr: vmlinux.o: warning: objtool: exc_nmi()+0x17: call to __sev_es_nmi_complete() leaves .noinstr.text section While at it, use __pa_nodebug() for the same reason due to CONFIG_DEBUG_VIRTUAL=y: vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0xd9: call to __phys_addr() leaves .noinstr.text section ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Support CPU offline/onlineJoerg Roedel2-0/+64
Add a play_dead handler when running under SEV-ES. This is needed because the hypervisor can't deliver an SIPI request to restart the AP. Instead, the kernel has to issue a VMGEXIT to halt the VCPU until the hypervisor wakes it up again. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/head/64: Don't call verify_cpu() on starting APsJoerg Roedel3-0/+19
The APs are not ready to handle exceptions when verify_cpu() is called in secondary_startup_64(). Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/smpboot: Load TSS and getcpu GDT entry before loading IDTJoerg Roedel3-1/+25
The IDT on 64-bit contains vectors which use paranoid_entry() and/or IST stacks. To make these vectors work, the TSS and the getcpu GDT entry need to be set up before the IDT is loaded. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/realmode: Setup AP jump tableTom Lendacky4-2/+93
As part of the GHCB specification, the booting of APs under SEV-ES requires an AP jump table when transitioning from one layer of code to another (e.g. when going from UEFI to the OS). As a result, each layer that parks an AP must provide the physical address of an AP jump table to the next layer via the hypervisor. Upon booting of the kernel, read the AP jump table address from the hypervisor. Under SEV-ES, APs are started using the INIT-SIPI-SIPI sequence. Before issuing the first SIPI request for an AP, the start CS and IP is programmed into the AP jump table. Upon issuing the SIPI request, the AP will awaken and jump to that start CS:IP address. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: - Adapted to different code base - Moved AP table setup from SIPI sending path to real-mode setup code - Fix sparse warnings ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/realmode: Add SEV-ES specific trampoline entry pointJoerg Roedel3-0/+26
The code at the trampoline entry point is executed in real-mode. In real-mode, #VC exceptions can't be handled so anything that might cause such an exception must be avoided. In the standard trampoline entry code this is the WBINVD instruction and the call to verify_cpu(), which are both not needed anyway when running as an SEV-ES guest. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ESDoug Covelli1-5/+45
Add VMware-specific handling for #VC faults caused by VMMCALL instructions. Signed-off-by: Doug Covelli <[email protected]> Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: - Adapt to different paravirt interface ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/kvm: Add KVM-specific VMMCALL handling under SEV-ESTom Lendacky1-6/+29
Implement the callbacks to copy the processor state required by KVM to the GHCB. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: - Split out of a larger patch - Adapt to different callback functions ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ESJoerg Roedel2-1/+27
Add two new paravirt callbacks to provide hypervisor-specific processor state in the GHCB and to copy state from the hypervisor back to the processor. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle #DB EventsJoerg Roedel1-0/+17
Handle #VC exceptions caused by #DB exceptions in the guest. Those must be handled outside of instrumentation_begin()/end() so that the handler will not be raised recursively. Handle them by calling the kernel's debug exception handler. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle #AC EventsJoerg Roedel1-0/+19
Implement a handler for #VC exceptions caused by #AC exceptions. The #AC exception is just forwarded to do_alignment_check() and not pushed down to the hypervisor, as requested by the SEV-ES GHCB Standardization Specification. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle VMMCALL EventsTom Lendacky1-0/+23
Implement a handler for #VC exceptions caused by VMMCALL instructions. This is only a starting point, VMMCALL emulation under SEV-ES needs further hypervisor-specific changes to provide additional state. [ bp: Drop "this patch". ] Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle MWAIT/MWAITX EventsTom Lendacky1-0/+10
Implement a handler for #VC exceptions caused by MWAIT and MWAITX instructions. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle MONITOR/MONITORX EventsTom Lendacky1-0/+13
Implement a handler for #VC exceptions caused by MONITOR and MONITORX instructions. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle INVD EventsTom Lendacky1-0/+4
Implement a handler for #VC exceptions caused by INVD instructions. Since Linux should never use INVD, just mark it as unsupported. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle RDPMC EventsTom Lendacky1-0/+22
Implement a handler for #VC exceptions caused by RDPMC instructions. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle RDTSC(P) EventsTom Lendacky3-0/+31
Implement a handler for #VC exceptions caused by RDTSC and RDTSCP instructions. Also make it available in the pre-decompression stage because the KASLR code uses RDTSC/RDTSCP to gather entropy and some hypervisors intercept these instructions. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: - Adapt to #VC handling infrastructure - Make it available early ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle WBINVD EventsTom Lendacky1-0/+9
Implement a handler for #VC exceptions caused by WBINVD instructions. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: Adapt to #VC handling framework ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle DR7 read/write eventsTom Lendacky1-0/+85
Add code to handle #VC exceptions on DR7 register reads and writes. This is needed early because show_regs() reads DR7 to print it out. Under SEV-ES, there is currently no support for saving/restoring the DRx registers but software expects to be able to write to the DR7 register. For now, cache the value written to DR7 and return it on read attempts, but do not touch the real hardware DR7. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: - Adapt to #VC handling framework - Support early usage ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle MSR eventsTom Lendacky1-0/+28
Implement a handler for #VC exceptions caused by RDMSR/WRMSR instructions. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: Adapt to #VC handling infrastructure. ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle MMIO String InstructionsJoerg Roedel1-0/+77
Add handling for emulation of the MOVS instruction on MMIO regions, as done by the memcpy_toio() and memcpy_fromio() functions. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle MMIO eventsTom Lendacky2-0/+227
Add a handler for #VC exceptions caused by MMIO intercepts. These intercepts come along as nested page faults on pages with reserved bits set. Signed-off-by: Tom Lendacky <[email protected]> [ [email protected]: Adapt to VC handling framework ] Co-developed-by: Joerg Roedel <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Handle instruction fetches from user-spaceJoerg Roedel1-9/+22
When a #VC exception is triggered by user-space, the instruction decoder needs to read the instruction bytes from user addresses. Enhance vc_decode_insn() to safely fetch kernel and user instructions. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Wire up existing #VC exit-code handlersJoerg Roedel2-4/+9
Re-use the handlers for CPUID- and IOIO-caused #VC exceptions in the early boot handler. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Add a Runtime #VC Exception HandlerTom Lendacky3-8/+255
Add the handlers for #VC exceptions invoked at runtime. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/entry/64: Add entry code for #VC handlerJoerg Roedel5-0/+171
The #VC handler needs special entry code because: 1. It runs on an IST stack 2. It needs to be able to handle nested #VC exceptions To make this work, the entry code is implemented to pretend it doesn't use an IST stack. When entered from user-mode or early SYSCALL entry path it switches to the task stack. If entered from kernel-mode it tries to switch back to the previous stack in the IRET frame. The stack found in the IRET frame is validated first, and if it is not safe to use it for the #VC handler, the code will switch to a fall-back stack (the #VC2 IST stack). From there, it can cause nested exceptions again. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/dumpstack/64: Add noinstr version of get_stack_info()Joerg Roedel4-20/+30
The get_stack_info() functionality is needed in the entry code for the #VC exception handler. Provide a version of it in the .text.noinstr section which can be called safely from there. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Adjust #VC IST Stack on entering NMI handlerJoerg Roedel3-0/+81
When an NMI hits in the #VC handler entry code before it has switched to another stack, any subsequent #VC exception in the NMI code-path will overwrite the interrupted #VC handler's stack. Make sure this doesn't happen by explicitly adjusting the #VC IST entry in the NMI handler for the time it can cause #VC exceptions. [ bp: Touchups, spelling fixes. ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Allocate and map an IST stack for #VC handlerJoerg Roedel5-14/+63
Allocate and map an IST stack and an additional fall-back stack for the #VC handler. The memory for the stacks is allocated only when SEV-ES is active. The #VC handler needs to use an IST stack because a #VC exception can be raised from kernel space with unsafe stack, e.g. in the SYSCALL entry path. Since the #VC exception can be nested, the #VC handler switches back to the interrupted stack when entered from kernel space. If switching back is not possible, the fall-back stack is used. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Setup per-CPU GHCBs for the runtime handlerTom Lendacky3-1/+60
The runtime handler needs one GHCB per-CPU. Set them up and map them unencrypted. [ bp: Touchups and simplification. ] Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Setup GHCB-based boot #VC handlerJoerg Roedel9-8/+176
Add the infrastructure to handle #VC exceptions when the kernel runs on virtual addresses and has mapped a GHCB. This handler will be used until the runtime #VC handler takes over. Since the handler runs very early, disable instrumentation for sev-es.c. [ bp: Make vc_ghcb_invalidate() __always_inline so that it can be inlined in noinstr functions like __sev_es_nmi_complete(). ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-09x86/sev-es: Setup an early #VC handlerJoerg Roedel3-1/+57
Setup an early handler for #VC exceptions. There is no GHCB mapped yet, so just re-use the vc_no_ghcb_handler(). It can only handle CPUID exit-codes, but that should be enough to get the kernel through verify_cpu() and __startup_64() until it runs on virtual addresses. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> [ boot failure Error: kernel_ident_mapping_init() failed. ] Reported-by: kernel test robot <[email protected]> Link: https://lkml.kernel.org/r/[email protected]