path: root/arch/x86/kvm/svm/svm.c
2022-04-02  KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running  (Maxim Levitsky, 1 file, -19/+79)
When L2 is running without LBR virtualization, we should ensure that L1's LBR msrs continue to update as usual. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-04-02  KVM: x86: SVM: remove vgif_enabled()  (Maxim Levitsky, 1 file, -6/+6)
KVM always uses vGIF when allowed, so there is no need to query the current vmcb for it. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-04-02  KVM: x86: SVM: use vmcb01 in init_vmcb  (Maxim Levitsky, 1 file, -5/+6)
Clarify that this function is not used to initialize any part of the vmcb02. No functional change intended. Signed-off-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-04-02  KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW  (Maxim Levitsky, 1 file, -1/+1)
Use a dummy unused vmexit reason to mark the 'VM exit' that is happening when kvm exits to handle SMM, which is not a real VM exit. This makes it a bit easier to read the KVM trace, and avoids other potential problems due to a stale vmexit reason in the vmcb. If SVM_EXIT_SW somehow reaches svm_invoke_exit_handler(), instead, svm_check_exit_valid() will return false and a WARN will be logged. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
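A minimal sketch of the guard described above, assuming the svm.c helpers of this era (svm_check_exit_valid(), svm_handle_invalid_exit(), and the svm_exit_handlers dispatch table); the real function also fast-paths a few common exit codes:

    static int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
    {
            /* A synthetic reason such as SVM_EXIT_SW has no entry in the
             * handler table, so it is rejected (with a WARN) rather than
             * dispatched as if hardware had produced it. */
            if (!svm_check_exit_valid(exit_code))
                    return svm_handle_invalid_exit(vcpu, exit_code);

            return svm_exit_handlers[exit_code](vcpu);
    }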
2022-04-02  KVM: x86: SVM: allow to force AVIC to be enabled  (Maxim Levitsky, 1 file, -2/+9)
Apparently on some systems AVIC is disabled in CPUID but still usable. Allow the user to override the CPUID if the user is willing to take the risk. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
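A sketch of how such an override is typically wired up; the parameter name force_avic matches the commit subject, while the exact plumbing below is an assumption:

    static bool force_avic;
    module_param_unsafe(force_avic, bool, 0444);

    /* in svm_hardware_setup(), roughly: */
    if (!boot_cpu_has(X86_FEATURE_AVIC)) {
            if (force_avic)
                    pr_warn("AVIC is not supported in CPUID but force enabled\n");
            else
                    enable_apicv = false;
    }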
2022-04-02  KVM: x86: nSVM: implement nested VMLOAD/VMSAVE  (Maxim Levitsky, 1 file, -0/+7)
This was tested by booting L1,L2,L3 (all Linux) and checking that no VMLOAD/VMSAVE vmexits happened. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
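Conceptually, vmcb02 gets the hardware VMLOAD/VMSAVE acceleration only when both L0 (the vls module parameter) and L1 (its virt_ext bit) enable it; a hedged sketch, with the exact condition assumed:

    /* nested_vmcb02_prepare_control(), approximately: */
    if (vls && (svm->nested.ctl.virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK))
            vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
    else
            vmcb02->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;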
2022-04-02  KVM: x86: SVM: fix tsc scaling when the host doesn't support it  (Maxim Levitsky, 1 file, -2/+2)
It was decided that when TSC scaling is not supported, the virtual MSR_AMD64_TSC_RATIO should still have the default '1.0' value. However in this case kvm_max_tsc_scaling_ratio is not set, which breaks various assumptions. Fix this by always calculating kvm_max_tsc_scaling_ratio regardless of host support. For consistency, do the same for VMX. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
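A sketch of the shape of the fix in svm_hardware_setup(); the constant name follows the sibling commit below that moves the TSC-ratio defines into svm.h, and the exact placement is an assumption:

    /* Always computed, so the default 1.0 ratio in the virtual
     * MSR_AMD64_TSC_RATIO has a consistent reference value even when
     * tsc_scaling ends up disabled. */
    kvm_max_tsc_scaling_ratio = SVM_TSC_RATIO_MAX;

    if (tsc_scaling && !boot_cpu_has(X86_FEATURE_TSCRATEMSR))
            tsc_scaling = false;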
2022-04-02  kvm: x86: SVM: remove unused defines  (Maxim Levitsky, 1 file, -8/+0)
Remove some unused #defines from svm.c Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-04-02  KVM: x86: SVM: move tsc ratio definitions to svm.h  (Maxim Levitsky, 1 file, -10/+5)
Another piece of the SVM spec which should be in the header file. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-04-02  KVM: x86: Add wrappers for setting/clearing APICv inhibits  (Sean Christopherson, 1 file, -6/+5)
Add set/clear wrappers for toggling APICv inhibits to make the call sites more readable, and opportunistically rename the inner helpers to align with the new wrappers and to make them more readable as well. Invert the flag from "activate" to "set"; activate is painfully ambiguous as it's not obvious if the inhibit is being activated, or if APICv is being activated, in which case the inhibit is being deactivated. For the functions that take @set, swap the order of the inhibit reason and @set so that the call sites are visually similar to those that bounce through the wrapper. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
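The wrappers in question, as described above (declarations per the commit; the inhibit-reason enum is assumed to already exist in kvm_host.h at this point):

    void kvm_set_apicv_inhibit(struct kvm *kvm, enum kvm_apicv_inhibit reason);
    void kvm_clear_apicv_inhibit(struct kvm *kvm, enum kvm_apicv_inhibit reason);

    /* call sites then read as plain statements, e.g.: */
    kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_IRQWIN);
    kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_IRQWIN);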
2022-03-21  KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask  (Maxim Levitsky, 1 file, -0/+6)
KVM_X86_OP_OPTIONAL_RET0 can only be used with 32-bit return values on 32-bit systems, because unsigned long is only 32-bits wide there and 64-bit values are returned in edx:eax. Reported-by: Maxim Levitsky <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
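The ABI detail behind this, and a sketch of the explicit stub the commit adds to svm.c instead of relying on the RET0 default (body assumed):

    /* On 32-bit x86 a u64 return travels in edx:eax, but
     * __static_call_return0() returns a long, so only eax would be
     * defined; hence an explicit zero-returning hook: */
    static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
    {
            return 0;
    }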
2022-03-04  Merge branch 'kvm-bugfixes' into HEAD  (Paolo Bonzini, 1 file, -2/+17)
Merge bugfixes from 5.17 before merging more tricky work.
2022-03-01  KVM: SVM: Disable preemption across AVIC load/put during APICv refresh  (Sean Christopherson, 1 file, -2/+2)
Disable preemption when loading/putting the AVIC during an APICv refresh. If the vCPU task is preempted and migrated to a different pCPU, the unprotected avic_vcpu_load() could set the wrong pCPU in the physical ID cache/table. Pull the necessary code out of avic_vcpu_{,un}blocking() and into a new helper to reduce the probability of introducing this exact bug a third time. Fixes: df7e4827c549 ("KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC") Cc: [email protected] Reported-by: Maxim Levitsky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
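A minimal sketch of the fixed refresh path, assuming the new-helper shape described above:

    /* the load/put pair must observe one stable vcpu->cpu value */
    preempt_disable();
    if (kvm_vcpu_apicv_active(vcpu))
            avic_vcpu_load(vcpu, vcpu->cpu);
    else
            avic_vcpu_put(vcpu);
    preempt_enable();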
2022-02-24  KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non-default value when tsc scaling disabled  (Maxim Levitsky, 1 file, -2/+17)
If nested tsc scaling is disabled, MSR_AMD64_TSC_RATIO should never have a non-default value. Due to the way nested tsc scaling support was implemented in qemu, it would set this msr to 0 when nested tsc scaling was disabled. Ignore that value for now, as it causes no harm. Fixes: 5228eb96a487 ("KVM: x86: nSVM: implement nested TSC scaling") Cc: [email protected] Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
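A hedged sketch of the corresponding svm_set_msr() case; the special-cased 0 follows the qemu quirk described above, while the exact control flow is an assumption:

    case MSR_AMD64_TSC_RATIO:
            if (!svm->tsc_scaling_enabled) {
                    if (!msr->host_initiated)
                            return 1;
                    /* tolerate qemu writing 0 when scaling is disabled */
                    if (data != 0 && data != svm->tsc_ratio_msr)
                            return 1;
                    break;
            }
            /* otherwise validate and program the ratio as usual */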
2022-02-18  KVM: x86: allow defining return-0 static calls  (Paolo Bonzini, 1 file, -20/+0)
A few vendor callbacks are only used by VMX, but they return an integer or bool value. Introduce KVM_X86_OP_OPTIONAL_RET0 for them: if a func is NULL in struct kvm_x86_ops, it will be changed to __static_call_return0 when updating static calls. Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-18  KVM: x86: make several APIC virtualization callbacks optional  (Paolo Bonzini, 1 file, -4/+0)
All their invocations are conditional on vcpu->arch.apicv_active, meaning that they need not be implemented by vendor code: even though at the moment both vendors implement APIC virtualization, all of them can be optional. In fact SVM does not need many of them, and their implementation can be deleted now. Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-18  KVM: x86: return 1 unconditionally for availability of KVM_CAP_VAPIC  (Paolo Bonzini, 1 file, -6/+0)
The two ioctls used to implement userspace-accelerated TPR, KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available even if hardware-accelerated TPR can be used. So there is no reason not to report KVM_CAP_VAPIC. Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-14  KVM: SVM: Rename AVIC helpers to use "avic" prefix instead of "svm"  (Sean Christopherson, 1 file, -9/+9)
Use "avic" instead of "svm" for SVM's all of APICv hooks and make a few additional funciton name tweaks so that the AVIC functions conform to their associated kvm_x86_ops hooks. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-14  Merge remote-tracking branch 'kvm/master' into HEAD  (Paolo Bonzini, 1 file, -7/+41)
Merge bugfix patches from Linux 5.17-rc.
2022-02-11  KVM: SVM: fix race between interrupt delivery and AVIC inhibition  (Maxim Levitsky, 1 file, -7/+41)
If svm_deliver_avic_intr is called just after the target vcpu's AVIC got inhibited, it might read a stale value of vcpu->arch.apicv_active which can lead to the target vCPU not noticing the interrupt. To fix this use load-acquire/store-release so that, if the target vCPU is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the AVIC. If AVIC has been disabled in the meanwhile, proceed with the KVM_REQ_EVENT-based delivery. Incomplete IPI vmexit has the same races as svm_deliver_avic_intr, and in fact it can be handled in exactly the same way; the only difference lies in who has set IRR, whether svm_deliver_interrupt or the processor. Therefore, svm_complete_interrupt_delivery can be used to fix incomplete IPI vmexits as well. Co-developed-by: Paolo Bonzini <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
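A sketch of the fixed delivery path; svm_complete_interrupt_delivery() and avic_ring_doorbell() are named in the series, though the body below is paraphrased:

    /* Pairs with the store-release of vcpu->mode on the entry path: if
     * IN_GUEST_MODE is observed here, any prior AVIC inhibition is
     * observed too, so apicv_active cannot be stale-true. */
    bool in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);

    if (!READ_ONCE(vcpu->arch.apic->apicv_active)) {
            /* AVIC inhibited in the meanwhile: fall back to KVM_REQ_EVENT */
            kvm_make_request(KVM_REQ_EVENT, vcpu);
            kvm_vcpu_kick(vcpu);
            return;
    }

    if (in_guest_mode)
            avic_ring_doorbell(vcpu);       /* hardware picks up the vIRR bit */
    else
            kvm_vcpu_wake_up(vcpu);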
2022-02-11  KVM: SVM: set IRR in svm_deliver_interrupt  (Paolo Bonzini, 1 file, -1/+1)
SVM has to set IRR for both the AVIC and the software-LAPIC case, so pull it up to the common function that handles both configurations. Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-10  KVM: nSVM: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt  (Vitaly Kuznetsov, 1 file, -1/+2)
Similar to nVMX commit ed2a4800ae9d ("KVM: nVMX: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt"), introduce a flag to keep track of whether MSR bitmap for L2 needs to be rebuilt due to changes in MSR bitmap for L1 or switching to a different L2. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
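A sketch of the flag's life cycle, assuming the field name mirrors the nVMX analogue:

    /* whenever L1's bitmap (or the active L2) changes: */
    svm->nested.force_msr_bitmap_recalc = true;

    /* in nested_svm_vmrun_msrpm(), before re-merging the bitmaps: */
    if (!svm->nested.force_msr_bitmap_recalc)
            return true;    /* the cached vmcb02 bitmap is still valid */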
2022-02-10  KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names  (Sean Christopherson, 1 file, -20/+20)
Massage SVM's implementation names that still diverge from kvm_x86_ops to allow for wiring up all SVM-defined functions via kvm-x86-ops.h. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-10  KVM: SVM: Rename SEV implementations to conform to kvm_x86_ops hooks  (Sean Christopherson, 1 file, -7/+7)
Rename svm_vm_copy_asid_from() and svm_vm_migrate_from() to conform to the names used by kvm_x86_ops, and opportunistically use "sev" instead of "svm" to more precisely identify the role of the hooks. svm_vm_copy_asid_from() in particular was poorly named as the function does much more than simply copy the ASID. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-10  KVM: x86: Use more verbose names for mem encrypt kvm_x86_ops hooks  (Sean Christopherson, 1 file, -3/+3)
Use slightly more verbose names for the so-called "memory encrypt", a.k.a. "mem enc", kvm_x86_ops hooks to bridge the gap between the current super-short kvm_x86_ops names and SVM's more verbose, but non-conforming names. This is a step toward using kvm-x86-ops.h with KVM_X86_CVM_OP() to fill svm_x86_ops. Opportunistically rename mem_enc_op() to mem_enc_ioctl() to better reflect its true nature, as it really is a full-fledged ioctl() of its own. Ideally, the hook would be named confidential_vm_ioctl() or so, as the ioctl() is a gateway to more than just memory encryption, and because its underlying purpose is to support Confidential VMs, which can be provided without memory encryption, e.g. if the TCB of the guest includes the host kernel but not host userspace, or by isolation in hardware without encrypting memory. But, diverging from KVM_MEMORY_ENCRYPT_OP even further is undesirable, and short of creating aliases for all related ioctl()s, which introduces a different flavor of divergence, KVM is stuck with the nomenclature. Defer renaming SVM's functions to a future commit as there are additional changes needed to make SVM fully conforming and to match reality (looking at you, svm_vm_copy_asid_from()). No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-10  KVM: SVM: Remove unused MAX_INST_SIZE #define  (Sean Christopherson, 1 file, -2/+0)
Remove SVM's MAX_INST_SIZE, which has long since been obsoleted by the common MAX_INSN_SIZE. Note, the latter's "insn" is also the generally preferred abbreviation of instruction. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-10  KVM: SVM: Rename svm_flush_tlb() to svm_flush_tlb_current()  (Sean Christopherson, 1 file, -5/+7)
Rename svm_flush_tlb() to svm_flush_tlb_current() so that at least one of the flushing operations in svm_x86_ops can be filled via kvm-x86-ops.h, and to document the scope of the flush (specifically that it doesn't flush "all"). Opportunistically make svm_flush_tlb_current(), formerly svm_flush_tlb(), static. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-10  KVM: x86: Move get_cs_db_l_bits() helper to SVM  (Sean Christopherson, 1 file, -1/+10)
Move kvm_get_cs_db_l_bits() to SVM and rename it appropriately so that its svm_x86_ops entry can be filled via kvm-x86-ops, and to eliminate a superfluous export from KVM x86. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-10  KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names  (Sean Christopherson, 1 file, -9/+9)
Rename a variety of kvm_x86_op function pointers so that the preferred name for vendor implementations follows the pattern <vendor>_<function>, e.g. rename .run() to .vcpu_run() to match {svm,vmx}_vcpu_run(). This will allow vendor implementations to be wired up via the KVM_X86_OP macro.

In many cases, VMX and SVM "disagree" on the preferred name, though in reality it's VMX and x86 that disagree as SVM blindly prepended _svm to the kvm_x86_ops name. Justification for using the VMX nomenclature:

  - set_{irq,nmi} => inject_{irq,nmi} because the helper is injecting an event that has already been "set" in e.g. the vIRR. SVM's relevant VMCB field is even named event_inj, and KVM's stat is irq_injections.
  - prepare_guest_switch => prepare_switch_to_guest because the former is ambiguous, e.g. it could mean switching between multiple guests, switching from the guest to host, etc...
  - update_pi_irte => pi_update_irte to match the rest of VMX's posted interrupt naming scheme, which is vmx_pi_<blah>().
  - start_assignment => pi_start_assignment to again follow VMX's posted interrupt naming scheme, and to provide context for what bit of code might care about an otherwise undescribed "assignment".

The "tlb_flush" => "flush_tlb" rename creates an inconsistency with respect to Hyper-V's "tlb_remote_flush" hooks, but Hyper-V really is the one that's wrong. x86, VMX, and SVM all use flush_tlb, and even common KVM is on a variant of the bandwagon with "kvm_flush_remote_tlbs", e.g. a more appropriate name for the Hyper-V hooks would be flush_remote_tlbs. Leave that change for another time as the Hyper-V hooks always start as NULL, i.e. the name doesn't matter for using kvm-x86-ops.h, and changing all names requires an astounding amount of churn.

VMX and SVM function names are intentionally left as-is to minimize the diff. Both VMX and SVM will need to rename even more functions in order to fully utilize KVM_X86_OPS, i.e. an additional patch for each is inevitable.

No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-10  KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch  (Paolo Bonzini, 1 file, -3/+5)
KVM performs the VMSAVE to the host save area for both regular and SEV-ES guests, so hoist it up to svm_prepare_guest_switch. And because sev_es_prepare_guest_switch does not really need to know the details of struct svm_cpu_data *, just pass it the pointer to the host save area inside the HSAVE page. Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-10  KVM: x86/svm: Remove unused "vcpu" of svm_check_exit_valid()  (Jinrong Liang, 1 file, -2/+2)
The "struct kvm_vcpu *vcpu" parameter of svm_check_exit_valid() is not used, so remove it. No functional change intended. Signed-off-by: Jinrong Liang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-08  KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them  (Maxim Levitsky, 1 file, -4/+13)
Fix a corner case in which the L1 hypervisor intercepts interrupts (INTERCEPT_INTR) and either doesn't set virtual interrupt masking (V_INTR_MASKING) or enters a nested guest with EFLAGS.IF disabled prior to the entry. In this case, despite the fact that L1 intercepts the interrupts, KVM still needs to set up an interrupt window to wait before injecting the INTR vmexit. Currently KVM instead enters an endless loop of 'req_immediate_exit'. Exactly the same issue also happens for SMIs and NMIs. Fix this as well. Note that on VMX, this case is impossible as there is only a 'vmexit on external interrupts' execution control which is either set, in which case both host and guest's EFLAGS.IF are ignored, or not set, in which case no VMexits are delivered. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-08  KVM: x86: nSVM: expose clean bit support to the guest  (Maxim Levitsky, 1 file, -0/+1)
KVM already honours a few clean bits, thus it makes sense to let the nested guest know about it. Note that KVM also doesn't check if the hardware supports clean bits, and therefore nested KVM was already setting clean bits and L0 KVM was already honouring them. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-08  KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM  (Maxim Levitsky, 1 file, -0/+5)
While RSM-induced VM entries are not full VM entries, they still need to be followed by an actual VM entry to complete, unlike setting the nested state. This patch fixes the boot of Hyper-V and SMM-enabled Windows VMs running nested on KVM, which failed due to this issue combined with a lack of dirty bit setting. Signed-off-by: Maxim Levitsky <[email protected]> Cc: [email protected] Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-08  KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state  (Maxim Levitsky, 1 file, -0/+2)
While restoring the SMM state usually makes KVM enter the nested guest, and thus a different vmcb (vmcb02 vs. vmcb01), KVM should still mark vmcb01 as dirty, since hardware can in theory cache multiple vmcbs. Failure to do so, combined with a lack of setting nested_run_pending (which is fixed in the next patch), might make KVM re-enter vmcb01, which was just exited from, with a completely different set of guest state registers (SMM vs. non-SMM) and without proper dirty bits set, which results in the CPU reusing a stale IDTR pointer, which leads to a guest shutdown on any interrupt. On real hardware this usually doesn't happen, but when running nested, L0's KVM does check and honour a few dirty bits, causing this issue to happen. This patch fixes the boot of Hyper-V and SMM-enabled Windows VMs running nested on KVM. Signed-off-by: Maxim Levitsky <[email protected]> Cc: [email protected] Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-02-08  KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case  (Maxim Levitsky, 1 file, -2/+10)
When the guest doesn't enable paging, and NPT/EPT is disabled, we use the guest's paging CR3 as KVM's shadow paging pointer and are technically in direct mode, as if we were to use NPT/EPT. In direct mode we create SPTEs with user-mode permissions because usually in direct mode NPT/EPT doesn't need to restrict access based on guest CPL (there are MBE/GMET extensions for that, but KVM doesn't use them). In this special "use guest paging as direct" mode, however, if CR4.SMAP/CR4.SMEP are enabled, that will make the CPU fault on each access and KVM will enter an endless loop of page faults. Since page protection doesn't have any meaning in the !PG case, just don't passthrough these bits. The fix is the same as was done for VMX in commit 656ec4a4928a ("KVM: VMX: fix SMEP and SMAP without EPT"). This fixes the boot of Windows 10 without NPT for good. (Without this patch, the BSP boots, but APs were stuck in an endless loop of page faults, causing the VM to boot with 1 CPU.) Signed-off-by: Maxim Levitsky <[email protected]> Cc: [email protected] Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
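A sketch of the fix's shape in svm_set_cr4(), mirroring the VMX fix referenced above; the exact code is assumed:

    unsigned long hcr4 = cr4;

    if (!npt_enabled) {
            hcr4 |= X86_CR4_PAE;
            /* with gCR0.PG=0 these bits have no architectural meaning
             * for the guest; keeping them out of the effective CR4
             * avoids the fault loop described above */
            if (!is_paging(vcpu))
                    hcr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
    }
    to_svm(vcpu)->vmcb->save.cr4 = hcr4;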
2022-02-05  Merge tag 'kvmarm-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini, 1 file, -1/+1)
KVM/arm64 fixes for 5.17, take #2
  - A couple of fixes when handling an exception while a SError has been delivered
  - Workaround for Cortex-A510's single-step erratum
2022-02-01  kvm/x86: rework guest entry logic  (Mark Rutland, 1 file, -2/+2)
For consistency and clarity, migrate x86 over to the generic helpers for guest timing and lockdep/RCU/tracing management, and remove the x86-specific helpers.

Prior to this patch, the guest timing was entered in kvm_guest_enter_irqoff() (called by vmx_vcpu_enter_exit() and svm_vcpu_enter_exit()), and was exited by the call to vtime_account_guest_exit() within vcpu_enter_guest(). To minimize duplication and to more clearly balance entry and exit, both entry and exit of guest timing are placed in vcpu_enter_guest(), using the new guest_timing_{enter,exit}_irqoff() helpers. When context tracking is used a small amount of additional time will be accounted towards guests; tick-based accounting is unaffected as IRQs are disabled at this point and not enabled until after the return from the guest.

This also corrects (benign) mis-balanced context tracking accounting introduced in commits ae95f566b3d22ade ("KVM: X86: TSCDEADLINE MSR emulation fastpath") and 26efe2fd92e50822 ("KVM: VMX: Handle preemption timer fastpath"), where KVM can enter a guest multiple times, calling vtime_guest_enter() without a corresponding call to vtime_account_guest_exit(), and with vtime_account_system() called when vtime_account_guest() should be used. As account_system_time() checks PF_VCPU and calls account_guest_time(), this doesn't result in any functional problem, but is unnecessarily confusing.

Signed-off-by: Mark Rutland <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Reviewed-by: Nicolas Saenz Julienne <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jim Mattson <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
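The generic helpers referred to above, shown in the order the x86 run loop uses them; a simplified sketch of the split between vcpu_enter_guest() and svm_vcpu_enter_exit():

    /* vcpu_enter_guest(), simplified: */
    guest_timing_enter_irqoff();            /* start guest time accounting */

    /* vendor code, svm_vcpu_enter_exit(): */
    guest_state_enter_irqoff();             /* lockdep/RCU/tracing: entering guest */
    __svm_vcpu_run(vmcb_pa, (unsigned long *)&vcpu->arch.regs);
    guest_state_exit_irqoff();              /* lockdep/RCU/tracing: back in host */

    guest_timing_exit_irqoff();             /* stop guest time accounting */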
2022-02-01  KVM: x86: Move delivery of non-APICv interrupt into vendor code  (Sean Christopherson, 1 file, -1/+16)
Handle non-APICv interrupt delivery in vendor code, even though it means VMX and SVM will temporarily have duplicate code. SVM's AVIC has a race condition that requires KVM to fall back to legacy interrupt injection _after_ the interrupt has been logged in the vIRR, i.e. to fix the race, SVM will need to open code the full flow anyways[*]. Refactor the code so that the SVM bug can be fixed without introducing other issues, e.g. SVM would return "success" and thus invoke trace_kvm_apicv_accept_irq() even when delivery through the AVIC failed, and to opportunistically prepare for using KVM_X86_OP to fill each vendor's kvm_x86_ops struct, which will rely on the vendor function matching the kvm_x86_op pointer name. No functional change intended. [*] https://lore.kernel.org/all/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-28  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds, 1 file, -55/+122)
Pull kvm fixes from Paolo Bonzini:
 "Two larger x86 series:
   - Redo incorrect fix for SEV/SMAP erratum
   - Windows 11 Hyper-V workaround

  Other x86 changes:
   - Various x86 cleanups
   - Re-enable access_tracking_perf_test
   - Fix for #GP handling on SVM
   - Fix for CPUID leaf 0Dh in KVM_GET_SUPPORTED_CPUID
   - Fix for ICEBP in interrupt shadow
   - Avoid false-positive RCU splat
   - Enable Enlightened MSR-Bitmap support for real

  ARM:
   - Correctly update the shadow register on exception injection when running in nVHE mode
   - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker
   - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations

  Generic code changes:
   - Dead code cleanup"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: eventfd: Fix false positive RCU usage warning
  KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
  KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
  KVM: nVMX: Rename vmcs_to_field_offset{,_table}
  KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
  KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
  selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP
  KVM: x86: add system attribute to retrieve full set of supported xsave states
  KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr
  selftests: kvm: move vm_xsave_req_perm call to amx_test
  KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time
  KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
  KVM: x86: Keep MSR_IA32_XSS unchanged for INIT
  KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2}
  KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02
  KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest
  KVM: x86: Check .flags in kvm_cpuid_check_equal() too
  KVM: x86: Forcibly leave nested virt when SMM state is toggled
  KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments()
  KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real
  ...
2022-01-26  KVM: x86: Forcibly leave nested virt when SMM state is toggled  (Sean Christopherson, 1 file, -1/+1)
Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated.

Don't attempt to gracefully handle the transition as (a) most transitions are nonsensical, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active.

Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state.

  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Modules linked in:
  CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
  Call Trace:
   <TASK>
   kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
   kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
   kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
   kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
   kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
   kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
   kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
   kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
   __fput+0x286/0x9f0 fs/file_table.c:311
   task_work_run+0xdd/0x1a0 kernel/task_work.c:164
   exit_task_work include/linux/task_work.h:32 [inline]
   do_exit+0xb29/0x2a30 kernel/exit.c:806
   do_group_exit+0xd2/0x2f0 kernel/exit.c:935
   get_signal+0x4b0/0x28c0 kernel/signal.c:2862
   arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
   handle_signal_work kernel/entry/common.c:148 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
   exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
   __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
   syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
   do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Cc: [email protected] Reported-by: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26  KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode  (Sean Christopherson, 1 file, -1/+15)
Inject a #GP instead of synthesizing triple fault to try to avoid killing the guest if emulation of an SEV guest fails due to encountering the SMAP erratum. The injected #GP may still be fatal to the guest, e.g. if the userspace process is providing critical functionality, but KVM should make every attempt to keep the guest alive. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26  KVM: SVM: Don't apply SEV+SMAP workaround on code fetch or PT access  (Sean Christopherson, 1 file, -9/+34)
Resume the guest instead of synthesizing a triple fault shutdown if the instruction bytes buffer is empty due to the #NPF being on the code fetch itself or on a page table access. The SMAP errata applies if and only if the code fetch was successful and ucode's subsequent data read from the code page encountered a SMAP violation. In practice, the guest is likely hosed either way, but crashing the guest on a code fetch to emulated MMIO is technically wrong according to the behavior described in the APM. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26  KVM: SVM: Inject #UD on attempted emulation for SEV guest w/o insn buffer  (Sean Christopherson, 1 file, -34/+55)
Inject #UD if KVM attempts emulation for an SEV guest without an insn buffer and instruction decoding is required. The previous behavior of allowing emulation if there is no insn buffer is undesirable as doing so means KVM is reading guest private memory and thus decoding cyphertext, i.e. is emulating garbage. The check was previously necessary as the emulation type was not provided, i.e. SVM needed to allow emulation to handle completion of emulation after exiting to userspace to handle I/O. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
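A sketch of the check added to SVM's can_emulate_instruction() hook; kvm_queue_exception(), UD_VECTOR and sev_guest() exist in this tree, while the condition below is condensed:

    /* SEV guest, decoding required, but no instruction bytes supplied by
     * hardware: guest memory is encrypted, so decoding it would emulate
     * ciphertext.  Refuse and inject #UD instead. */
    if (sev_guest(vcpu->kvm) && unlikely(!insn)) {
            kvm_queue_exception(vcpu, UD_VECTOR);
            return false;
    }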
2022-01-26  KVM: SVM: WARN if KVM attempts emulation on #UD or #GP for SEV guests  (Sean Christopherson, 1 file, -0/+5)
WARN if KVM attempts to emulate in response to #UD or #GP for SEV guests, i.e. if KVM intercepts #UD or #GP, as emulation on any fault except #NPF is impossible since KVM cannot read guest private memory to get the code stream, and the CPU's DecodeAssists feature only provides the instruction bytes on #NPF. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> [Warn on EMULTYPE_TRAP_UD_FORCED according to Liam Merwick's review. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26  KVM: x86: Pass emulation type to can_emulate_instruction()  (Sean Christopherson, 1 file, -1/+2)
Pass the emulation type to kvm_x86_ops.can_emulate_instruction() so that a future commit can harden KVM's SEV support to WARN on emulation scenarios that should never happen. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26  KVM: SVM: Don't intercept #GP for SEV guests  (Sean Christopherson, 1 file, -3/+8)
Never intercept #GP for SEV guests as reading SEV guest private memory will return cyphertext, i.e. emulating on #GP can't work as intended. Cc: [email protected] Cc: Tom Lendacky <[email protected]> Cc: Brijesh Singh <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26Revert "KVM: SVM: avoid infinite loop on NPF from bad address"Sean Christopherson1-7/+0
Revert a completely broken check on an "invalid" RIP in SVM's workaround for the DecodeAssists SMAP errata. kvm_vcpu_gfn_to_memslot() obviously expects a gfn, i.e. operates in the guest physical address space, whereas RIP is a virtual (not even linear) address. The "fix" worked for the problematic KVM selftest because the test identity mapped RIP. Fully revert the hack instead of trying to translate RIP to a GPA, as the non-SEV case is now handled earlier, and KVM cannot access guest page tables to translate RIP. This reverts commit e72436bc3a5206f95bb384e741154166ddb3202e. Fixes: e72436bc3a52 ("KVM: SVM: avoid infinite loop on NPF from bad address") Reported-by: Liam Merwick <[email protected]> Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26  KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests  (Sean Christopherson, 1 file, -4/+6)
Always signal that emulation is possible for !SEV guests regardless of whether or not the CPU provided a valid instruction byte stream. KVM can read all guest state (memory and registers) for !SEV guests, i.e. can fetch the code stream from memory even if the CPU failed to do so because of the SMAP errata. Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)") Cc: [email protected] Cc: Tom Lendacky <[email protected]> Cc: Brijesh Singh <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-01-26  KVM: x86: nSVM: skip eax alignment check for non-SVM instructions  (Denis Valeev, 1 file, -5/+6)
The bug occurs on #GP triggered by the VMware backdoor when the eax value is unaligned. The eax alignment check should not be applied to non-SVM instructions, because it leads to incorrect omission of the instruction's emulation. Apply the alignment check only to SVM instructions to fix this. Fixes: d1cba6c92237 ("KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround") Signed-off-by: Denis Valeev <[email protected]> Message-Id: <Yexlhaoe1Fscm59u@q> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
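A condensed sketch of the reordered check in gp_interception(); svm_instr_opcode(), NONE_SVM_INSTR and emulate_svm_instr() are from this file, while the exact flow is assumed:

    opcode = svm_instr_opcode(vcpu);

    if (opcode == NONE_SVM_INSTR) {
            if (!enable_vmware_backdoor)
                    goto reinject;
            /* non-SVM instruction: no eax alignment requirement */
            return kvm_emulate_instruction(vcpu,
                            EMULTYPE_VMWARE_GP | EMULTYPE_NO_DECODE);
    }

    /* the 4K-alignment errata check applies only to VMRUN/VMLOAD/VMSAVE */
    if (kvm_rax_read(vcpu) & ~PAGE_MASK)
            goto reinject;

    return emulate_svm_instr(vcpu, opcode);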