aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-08-17KVM: x86: Disallow guest CPUID lookups when IRQs are disabledSean Christopherson1-0/+13
Now that KVM has a framework for caching guest CPUID feature flags, add a "rule" that IRQs must be enabled when doing guest CPUID lookups, and enforce the rule via a lockdep assertion. CPUID lookups are slow, and within KVM, IRQs are only ever disabled in hot paths, e.g. the core run loop, fast page fault handling, etc. I.e. querying guest CPUID with IRQs disabled, especially in the run loop, should be avoided. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: nSVM: Use KVM-governed feature framework to track "vNMI enabled"Sean Christopherson3-6/+3
Track "virtual NMI exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, checking KVM's capabilities instead of the "vnmi" param means that the code isn't strictly equivalent, as vnmi_enabled could have been set if nested=false where as that the governed feature cannot. But that's a glorified nop as the feature/flag is consumed only by paths that are gated by nSVM being enabled. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: nSVM: Use KVM-governed feature framework to track "vGIF enabled"Sean Christopherson4-5/+7
Track "virtual GIF exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, checking KVM's capabilities instead of the "vgif" param means that the code isn't strictly equivalent, as vgif_enabled could have been set if nested=false where as that the governed feature cannot. But that's a glorified nop as the feature/flag is consumed only by paths that are Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: nSVM: Use KVM-governed feature framework to track "Pause Filter enabled"Sean Christopherson4-9/+12
Track "Pause Filtering is exposed to L1" via governed feature flags instead of using dedicated bits/flags in vcpu_svm. No functional change intended. Reviewed-by: Yuan Yao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: nSVM: Use KVM-governed feature framework to track "LBRv enabled"Sean Christopherson4-14/+16
Track "LBR virtualization exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, checking KVM's capabilities instead of the "lbrv" param means that the code isn't strictly equivalent, as lbrv_enabled could have been set if nested=false where as that the governed feature cannot. But that's a glorified nop as the feature/flag is consumed only by paths that are gated by nSVM being enabled. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: nSVM: Use KVM-governed feature framework to track "vVM{SAVE,LOAD} enabled"Sean Christopherson4-5/+9
Track "virtual VMSAVE/VMLOAD exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Opportunistically add a comment explaining why KVM disallows virtual VMLOAD/VMSAVE when the vCPU model is Intel. No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: nSVM: Use KVM-governed feature framework to track "TSC scaling enabled"Sean Christopherson4-6/+8
Track "TSC scaling exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, this fixes a benign bug where KVM would mark TSC scaling as exposed to L1 even if overall nested SVM supported is disabled, i.e. KVM would let L1 write MSR_AMD64_TSC_RATIO even when KVM didn't advertise TSCRATEMSR support to userspace. Reviewed-by: Yuan Yao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: nSVM: Use KVM-governed feature framework to track "NRIPS enabled"Sean Christopherson4-7/+5
Track "NRIPS exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. No functional change intended. Reviewed-by: Yuan Yao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: nVMX: Use KVM-governed feature framework to track "nested VMX enabled"Sean Christopherson4-19/+11
Track "VMX exposed to L1" via a governed feature flag instead of using a dedicated helper to provide the same functionality. The main goal is to drive convergence between VMX and SVM with respect to querying features that are controllable via module param (SVM likes to cache nested features), avoiding the guest CPUID lookups at runtime is just a bonus and unlikely to provide any meaningful performance benefits. Note, X86_FEATURE_VMX is set in kvm_cpu_caps if and only if "nested" is true, and the CPU obviously supports VMX if KVM+VMX is running. I.e. the check on "nested" is now implicitly down by the kvm_cpu_cap_has() check in kvm_governed_feature_check_and_set(). No functional change intended. Reviewed-by: Yuan Yao <[email protected]> Reviwed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: x86: Use KVM-governed feature framework to track "XSAVES enabled"Sean Christopherson5-24/+35
Use the governed feature framework to track if XSAVES is "enabled", i.e. if XSAVES can be used by the guest. Add a comment in the SVM code to explain the very unintuitive logic of deliberately NOT checking if XSAVES is enumerated in the guest CPUID model. No functional change intended. Reviewed-by: Yuan Yao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: VMX: Rename XSAVES control to follow KVM's preferred "ENABLE_XYZ"Sean Christopherson7-9/+9
Rename the XSAVES secondary execution control to follow KVM's preferred style so that XSAVES related logic can use common macros that depend on KVM's preferred style. No functional change intended. Reviewed-by: Vitaly Kuznetsov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: VMX: Check KVM CPU caps, not just VMX MSR support, for XSAVE enablingSean Christopherson1-1/+1
Check KVM CPU capabilities instead of raw VMX support for XSAVES when determining whether or not XSAVER can/should be exposed to the guest. Practically speaking, it's nonsensical/impossible for a CPU to support "enable XSAVES" without XSAVES being supported natively. The real motivation for checking kvm_cpu_cap_has() is to allow using the governed feature's standard check-and-set logic. Reviewed-by: Yuan Yao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: VMX: Recompute "XSAVES enabled" only after CPUID updateSean Christopherson1-13/+11
Recompute whether or not XSAVES is enabled for the guest only if the guest's CPUID model changes instead of redoing the computation every time KVM generates vmcs01's secondary execution controls. The boot_cpu_has() and cpu_has_vmx_xsaves() checks should never change after KVM is loaded, and if they do the kernel/KVM is hosed. Opportunistically add a comment explaining _why_ XSAVES is effectively exposed to the guest if and only if XSAVE is also exposed to the guest. Practically speaking, no functional change intended (KVM will do fewer computations, but should still see the same xsaves_enabled value whenever KVM looks at it). Reviewed-by: Yuan Yao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: x86/mmu: Use KVM-governed feature framework to track "GBPAGES enabled"Sean Christopherson3-17/+22
Use the governed feature framework to track whether or not the guest can use 1GiB pages, and drop the one-off helper that wraps the surprisingly non-trivial logic surrounding 1GiB page usage in the guest. No functional change intended. Reviewed-by: Yuan Yao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: x86: Add a framework for enabling KVM-governed x86 featuresSean Christopherson4-0/+78
Introduce yet another X86_FEATURE flag framework to manage and cache KVM governed features (for lack of a better name). "Governed" in this case means that KVM has some level of involvement and/or vested interest in whether or not an X86_FEATURE can be used by the guest. The intent of the framework is twofold: to simplify caching of guest CPUID flags that KVM needs to frequently query, and to add clarity to such caching, e.g. it isn't immediately obvious that SVM's bundle of flags for "optional nested SVM features" track whether or not a flag is exposed to L1. Begrudgingly define KVM_MAX_NR_GOVERNED_FEATURES for the size of the bitmap to avoid exposing governed_features.h in arch/x86/include/asm/, but add a FIXME to call out that it can and should be cleaned up once "struct kvm_vcpu_arch" is no longer expose to the kernel at large. Cc: Zeng Guang <[email protected]> Reviewed-by: Binbin Wu <[email protected]> Reviewed-by: Kai Huang <[email protected]> Reviewed-by: Yuan Yao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17x86: kvm: x86: Remove unnecessary initial values of variablesLi zeming1-2/+2
bitmap and khz is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li zeming <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: x86: Remove WARN sanity check on hypervisor timer vs. UNINITIALIZED vCPUSean Christopherson1-4/+9
Drop the WARN in KVM_RUN that asserts that KVM isn't using the hypervisor timer, a.k.a. the VMX preemption timer, for a vCPU that is in the UNINITIALIZIED activity state. The intent of the WARN is to sanity check that KVM won't drop a timer interrupt due to an unexpected transition to UNINITIALIZED, but unfortunately userspace can use various ioctl()s to force the unexpected state. Drop the sanity check instead of switching from the hypervisor timer to a software based timer, as the only reason to switch to a software timer when a vCPU is blocking is to ensure the timer interrupt wakes the vCPU, but said interrupt isn't a valid wake event for vCPUs in UNINITIALIZED state *and* the interrupt will be dropped in the end. Reported-by: Yikebaer Aizezi <[email protected]> Closes: https://lore.kernel.org/all/CALcu4rbFrU4go8sBHk3FreP+qjgtZCGcYNpSiEXOLm==qFv7iQ@mail.gmail.com Reviewed-by: Paolo Bonzini <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-17KVM: x86: Remove break statements that will never be executedLike Xu3-4/+0
Fix compiler warnings when compiling KVM with [-Wunreachable-code-break]. No functional change intended. Cc: Vitaly Kuznetsov <[email protected]> Signed-off-by: Like Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: nSVM: Skip writes to MSR_AMD64_TSC_RATIO if guest state isn't loadedSean Christopherson1-1/+2
Skip writes to MSR_AMD64_TSC_RATIO that are done in the context of a vCPU if guest state isn't loaded, i.e. if KVM will update MSR_AMD64_TSC_RATIO during svm_prepare_switch_to_guest() before entering the guest. Checking guest_state_loaded may or may not be a net positive for performance as the current_tsc_ratio cache will optimize away duplicate WRMSRs in the vast majority of scenarios. However, the cost of the check is negligible, and the real motivation is to document that KVM needs to load the vCPU's value only when running the vCPU. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: x86: Always write vCPU's current TSC offset/ratio in vendor hooksSean Christopherson6-16/+15
Drop the @offset and @multiplier params from the kvm_x86_ops hooks for propagating TSC offsets/multipliers into hardware, and instead have the vendor implementations pull the information directly from the vCPU structure. The respective vCPU fields _must_ be written at the same time in order to maintain consistent state, i.e. it's not random luck that the value passed in by all callers is grabbed from the vCPU. Explicitly grabbing the value from the vCPU field in SVM's implementation in particular will allow for additional cleanup without introducing even more subtle dependencies. Specifically, SVM can skip the WRMSR if guest state isn't loaded, i.e. svm_prepare_switch_to_guest() will load the correct value for the vCPU prior to entering the guest. This also reconciles KVM's handling of related values that are stored in the vCPU, as svm_write_tsc_offset() already assumes/requires the caller to have updated l1_tsc_offset. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: SVM: Clean up preemption toggling related to MSR_AMD64_TSC_RATIOSean Christopherson1-5/+3
Explicitly disable preemption when writing MSR_AMD64_TSC_RATIO only in the "outer" helper, as all direct callers of the "inner" helper now run with preemption already disabled. And that isn't a coincidence, as the outer helper requires a vCPU and is intended to be used when modifying guest state and/or emulating guest instructions, which are typically done with preemption enabled. Direct use of the inner helper should be extremely limited, as the only time KVM should modify MSR_AMD64_TSC_RATIO without a vCPU is when sanitizing the MSR for a specific pCPU (currently done when {en,dis}abling disabling SVM). The other direct caller is svm_prepare_switch_to_guest(), which does have a vCPU, but is a one-off special case: KVM is about to enter the guest on a specific pCPU and thus must have preemption disabled. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: nSVM: Use the "outer" helper for writing multiplier to MSR_AMD64_TSC_RATIOSean Christopherson3-6/+5
When emulating nested SVM transitions, use the outer helper for writing the TSC multiplier for L2. Using the inner helper only for one-off cases, i.e. for paths where KVM is NOT emulating or modifying vCPU state, will allow for multiple cleanups: - Explicitly disabling preemption only in the outer helper - Getting the multiplier from the vCPU field in the outer helper - Skipping the WRMSR in the outer helper if guest state isn't loaded Opportunistically delete an extra newline. No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: nSVM: Load L1's TSC multiplier based on L1 state, not L2 stateSean Christopherson1-2/+2
When emulating nested VM-Exit, load L1's TSC multiplier if L1's desired ratio doesn't match the current ratio, not if the ratio L1 is using for L2 diverges from the default. Functionally, the end result is the same as KVM will run L2 with L1's multiplier if L2's multiplier is the default, i.e. checking that L1's multiplier is loaded is equivalent to checking if L2 has a non-default multiplier. However, the assertion that TSC scaling is exposed to L1 is flawed, as userspace can trigger the WARN at will by writing the MSR and then updating guest CPUID to hide the feature (modifying guest CPUID is allowed anytime before KVM_RUN). E.g. hacking KVM's state_test selftest to do vcpu_set_msr(vcpu, MSR_AMD64_TSC_RATIO, 0); vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_TSCRATEMSR); after restoring state in a new VM+vCPU yields an endless supply of: ------------[ cut here ]------------ WARNING: CPU: 10 PID: 206939 at arch/x86/kvm/svm/nested.c:1105 nested_svm_vmexit+0x6af/0x720 [kvm_amd] Call Trace: nested_svm_exit_handled+0x102/0x1f0 [kvm_amd] svm_handle_exit+0xb9/0x180 [kvm_amd] kvm_arch_vcpu_ioctl_run+0x1eab/0x2570 [kvm] kvm_vcpu_ioctl+0x4c9/0x5b0 [kvm] ? trace_hardirqs_off+0x4d/0xa0 __se_sys_ioctl+0x7a/0xc0 __x64_sys_ioctl+0x21/0x30 do_syscall_64+0x41/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Unlike the nested VMRUN path, hoisting the svm->tsc_scaling_enabled check into the if-statement is wrong as KVM needs to ensure L1's multiplier is loaded in the above scenario. Alternatively, the WARN_ON() could simply be deleted, but that would make KVM's behavior even more subtle, e.g. it's not immediately obvious why it's safe to write MSR_AMD64_TSC_RATIO when checking only tsc_ratio_msr. Fixes: 5228eb96a487 ("KVM: x86: nSVM: implement nested TSC scaling") Cc: Maxim Levitsky <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: nSVM: Check instead of asserting on nested TSC scaling supportSean Christopherson1-3/+2
Check for nested TSC scaling support on nested SVM VMRUN instead of asserting that TSC scaling is exposed to L1 if L1's MSR_AMD64_TSC_RATIO has diverged from KVM's default. Userspace can trigger the WARN at will by writing the MSR and then updating guest CPUID to hide the feature (modifying guest CPUID is allowed anytime before KVM_RUN). E.g. hacking KVM's state_test selftest to do vcpu_set_msr(vcpu, MSR_AMD64_TSC_RATIO, 0); vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_TSCRATEMSR); after restoring state in a new VM+vCPU yields an endless supply of: ------------[ cut here ]------------ WARNING: CPU: 164 PID: 62565 at arch/x86/kvm/svm/nested.c:699 nested_vmcb02_prepare_control+0x3d6/0x3f0 [kvm_amd] Call Trace: <TASK> enter_svm_guest_mode+0x114/0x560 [kvm_amd] nested_svm_vmrun+0x260/0x330 [kvm_amd] vmrun_interception+0x29/0x30 [kvm_amd] svm_invoke_exit_handler+0x35/0x100 [kvm_amd] svm_handle_exit+0xe7/0x180 [kvm_amd] kvm_arch_vcpu_ioctl_run+0x1eab/0x2570 [kvm] kvm_vcpu_ioctl+0x4c9/0x5b0 [kvm] __se_sys_ioctl+0x7a/0xc0 __x64_sys_ioctl+0x21/0x30 do_syscall_64+0x41/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x45ca1b Note, the nested #VMEXIT path has the same flaw, but needs a different fix and will be handled separately. Fixes: 5228eb96a487 ("KVM: x86: nSVM: implement nested TSC scaling") Cc: Maxim Levitsky <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: x86: Advertise AMX-COMPLEX CPUID to userspaceTao Su2-1/+3
Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds two instructions to perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single precision tile. AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8] Advertise AMX_COMPLEX if it's supported in hardware. There are no VMX controls for the feature, i.e. the instructions can't be interecepted, and KVM advertises base AMX in CPUID if AMX is supported in hardware, even if KVM doesn't advertise AMX as being supported in XCR0, e.g. because the process didn't opt-in to allocating tile data. Signed-off-by: Tao Su <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: tweak last paragraph of changelog] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: VMX: Skip VMCLEAR logic during emergency reboots if CR4.VMXE=0Sean Christopherson1-2/+10
Bail from vmx_emergency_disable() without processing the list of loaded VMCSes if CR4.VMXE=0, i.e. if the CPU can't be post-VMXON. It should be impossible for the list to have entries if VMX is already disabled, and even if that invariant doesn't hold, VMCLEAR will #UD anyways, i.e. processing the list is pointless even if it somehow isn't empty. Assuming no existing KVM bugs, this should be a glorified nop. The primary motivation for the change is to avoid having code that looks like it does VMCLEAR, but then skips VMXON, which is nonsensical. Suggested-by: Kai Huang <[email protected]> Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: SVM: Use "standard" stgi() helper when disabling SVMSean Christopherson1-10/+3
Now that kvm_rebooting is guaranteed to be true prior to disabling SVM in an emergency, use the existing stgi() helper instead of open coding STGI. In effect, eat faults on STGI if and only if kvm_rebooting==true. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: x86: Force kvm_rebooting=true during emergency reboot/crashSean Christopherson2-0/+4
Set kvm_rebooting when virtualization is disabled in an emergency so that KVM eats faults on virtualization instructions even if kvm_reboot() isn't reached. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/virt: KVM: Move "disable SVM" helper into KVM SVMSean Christopherson2-54/+25
Move cpu_svm_disable() into KVM proper now that all hardware virtualization management is routed through KVM. Remove the now-empty virtext.h. No functional change intended. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: VMX: Ensure CPU is stable when probing basic VMX supportSean Christopherson1-3/+14
Disable migration when probing VMX support during module load to ensure the CPU is stable, mostly to match similar SVM logic, where allowing migration effective requires deliberately writing buggy code. As a bonus, KVM won't report the wrong CPU to userspace if VMX is unsupported, but in practice that is a very, very minor bonus as the only way that reporting the wrong CPU would actually matter is if hardware is broken or if the system is misconfigured, i.e. if KVM gets migrated from a CPU that _does_ support VMX to a CPU that does _not_ support VMX. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: SVM: Check that the current CPU supports SVM in kvm_is_svm_supported()Sean Christopherson1-6/+19
Check "this" CPU instead of the boot CPU when querying SVM support so that the per-CPU checks done during hardware enabling actually function as intended, i.e. will detect issues where SVM isn't support on all CPUs. Disable migration for the use from svm_init() mostly so that the standard accessors for the per-CPU data can be used without getting yelled at by CONFIG_DEBUG_PREEMPT=y sanity checks. Preventing the "disabled by BIOS" error message from reporting the wrong CPU is largely a bonus, as ensuring a stable CPU during module load is a non-goal for KVM. Link: https://lore.kernel.org/all/[email protected] Cc: Kai Huang <[email protected]> Cc: Chao Gao <[email protected]> Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/virt: KVM: Open code cpu_has_svm() into kvm_is_svm_supported()Sean Christopherson2-31/+8
Fold the guts of cpu_has_svm() into kvm_is_svm_supported(), its sole remaining user. No functional change intended. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/virt: Drop unnecessary check on extended CPUID level in cpu_has_svm()Sean Christopherson1-6/+0
Drop the explicit check on the extended CPUID level in cpu_has_svm(), the kernel's cached CPUID info will leave the entire SVM leaf unset if said leaf is not supported by hardware. Prior to using cached information, the check was needed to avoid false positives due to Intel's rather crazy CPUID behavior of returning the values of the maximum supported leaf if the specified leaf is unsupported. Fixes: 682a8108872f ("x86/kvm/svm: Simplify cpu_has_svm()") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03KVM: SVM: Make KVM_AMD depend on CPU_SUP_AMD or CPU_SUP_HYGONSean Christopherson1-1/+1
Make building KVM SVM support depend on support for AMD or Hygon. KVM already effectively restricts SVM support to AMD and Hygon by virtue of the vendor string checks in cpu_has_svm(), and KVM VMX supports depends on one of its three known vendors (Intel, Centaur, or Zhaoxin). Add the CPU_SUP_HYGON clause even though CPU_SUP_HYGON selects CPU_SUP_AMD to document that KVM SVM support isn't just for AMD CPUs, and to prevent breakage should Hygon support ever become a standalone thing. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/virt: KVM: Move VMXOFF helpers into KVM VMXSean Christopherson2-45/+26
Now that VMX is disabled in emergencies via the virt callbacks, move the VMXOFF helpers into KVM, the only remaining user. No functional change intended. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/virt: KVM: Open code cpu_has_vmx() in KVM VMXSean Christopherson2-11/+1
Fold the raw CPUID check for VMX into kvm_is_vmx_supported(), its sole user. Keep the check even though KVM also checks X86_FEATURE_VMX, as the intent is to provide a unique error message if VMX is unsupported by hardware, whereas X86_FEATURE_VMX may be clear due to firmware and/or kernel actions. No functional change intended. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/reboot: Expose VMCS crash hooks if and only if KVM_{INTEL,AMD} is enabledSean Christopherson2-1/+8
Expose the crash/reboot hooks used by KVM to disable virtualization in hardware and unblock INIT only if there's a potential in-tree user, i.e. either KVM_INTEL or KVM_AMD is enabled. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/reboot: Disable virtualization during reboot iff callback is registeredSean Christopherson1-2/+1
Attempt to disable virtualization during an emergency reboot if and only if there is a registered virt callback, i.e. iff a hypervisor (KVM) is active. If there's no active hypervisor, then the CPU can't be operating with VMX or SVM enabled (barring an egregious bug). Checking for a valid callback instead of simply for SVM or VMX support can also eliminates spurious NMIs by avoiding the unecessary call to nmi_shootdown_cpus_on_restart(). Note, IRQs are disabled, which prevents KVM from coming along and enabling virtualization after the fact. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/reboot: Hoist "disable virt" helpers above "emergency reboot" pathSean Christopherson1-45/+45
Move the various "disable virtualization" helpers above the emergency reboot path so that emergency_reboot_disable_virtualization() can be stubbed out in a future patch if neither KVM_INTEL nor KVM_AMD is enabled, i.e. if there is no in-tree user of CPU virtualization. No functional change intended. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/reboot: Assert that IRQs are disabled when turning off virtualizationSean Christopherson1-1/+7
Assert that IRQs are disabled when turning off virtualization in an emergency. KVM enables hardware via on_each_cpu(), i.e. could re-enable hardware if a pending IPI were delivered after disabling virtualization. Remove a misleading comment from emergency_reboot_disable_virtualization() about "just" needing to guarantee the CPU is stable (see above). Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callbackSean Christopherson3-13/+17
Use the virt callback to disable SVM (and set GIF=1) during an emergency instead of blindly attempting to disable SVM. Like the VMX case, if a hypervisor, i.e. KVM, isn't loaded/active, SVM can't be in use. Acked-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/reboot: KVM: Handle VMXOFF in KVM's reboot callbackSean Christopherson3-33/+14
Use KVM VMX's reboot/crash callback to do VMXOFF in an emergency instead of manually and blindly doing VMXOFF. There's no need to attempt VMXOFF if a hypervisor, i.e. KVM, isn't loaded/active, i.e. if the CPU can't possibly be post-VMXON. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/reboot: Harden virtualization hooks for emergency rebootSean Christopherson3-12/+29
Provide dedicated helpers to (un)register virt hooks used during an emergency crash/reboot, and WARN if there is an attempt to overwrite the registered callback, or an attempt to do an unpaired unregister. Opportunsitically use rcu_assign_pointer() instead of RCU_INIT_POINTER(), mainly so that the set/unset paths are more symmetrical, but also because any performance gains from using RCU_INIT_POINTER() are meaningless for this code. Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-03x86/reboot: VMCLEAR active VMCSes before emergency rebootSean Christopherson5-40/+27
VMCLEAR active VMCSes before any emergency reboot, not just if the kernel may kexec into a new kernel after a crash. Per Intel's SDM, the VMX architecture doesn't require the CPU to flush the VMCS cache on INIT. If an emergency reboot doesn't RESET CPUs, cached VMCSes could theoretically be kept and only be written back to memory after the new kernel is booted, i.e. could effectively corrupt memory after reboot. Opportunistically remove the setting of the global pointer to NULL to make checkpatch happy. Cc: Andrew Cooper <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-02KVM: x86: Retry APIC optimized map recalc if vCPU is added/enabledSean Christopherson1-4/+25
Retry the optimized APIC map recalculation if an APIC-enabled vCPU shows up between allocating the map and filling in the map data. Conditionally reschedule before retrying even though the number of vCPUs that can be created is bounded by KVM. Retrying a few thousand times isn't so slow as to be hugely problematic, but it's not blazing fast either. Reset xapic_id_mistach on each retry as a vCPU could change its xAPIC ID between loops, but do NOT reset max_id. The map size also factors in whether or not a vCPU's local APIC is hardware-enabled, i.e. userspace and/or the guest can theoretically keep KVM retrying indefinitely. The only downside is that KVM will allocate more memory than is strictly necessary if the vCPU with the highest x2APIC ID disabled its APIC while the recalculation was in-progress. Refresh kvm->arch.apic_map_dirty to opportunistically change it from DIRTY => UPDATE_IN_PROGRESS to avoid an unnecessary recalc from a different task, i.e. if another task is waiting to attempt an update (which is likely since a retry happens if and only if an update is required). Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-02KVM: VMX: Drop unnecessary vmx_fb_clear_ctrl_available "cache"Sean Christopherson1-14/+3
Now that KVM snapshots the host's MSR_IA32_ARCH_CAPABILITIES, drop the similar snapshot/cache of whether or not KVM is allowed to manipulate MSR_IA32_MCU_OPT_CTRL.FB_CLEAR_DIS. The motivation for the cache was presumably to avoid the RDMSR, e.g. boot_cpu_has_bug() is quite cheap, and modifying the vCPU's MSR_IA32_ARCH_CAPABILITIES is an infrequent option and a relatively slow path. Cc: Pawan Gupta <[email protected]> Reviewed-by: Pawan Gupta <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-02KVM: x86: Snapshot host's MSR_IA32_ARCH_CAPABILITIESSean Christopherson3-22/+14
Snapshot the host's MSR_IA32_ARCH_CAPABILITIES, if it's supported, instead of reading the MSR every time KVM wants to query the host state, e.g. when initializing the default value during vCPU creation. The paths that query ARCH_CAPABILITIES aren't particularly performance sensitive, but creating vCPUs is a frequent enough operation that burning 8 bytes is a good trade-off. Alternatively, KVM could add a field in kvm_caps and thus skip the on-demand calculations entirely, but a pure snapshot isn't possible due to the way KVM handles the l1tf_vmx_mitigation module param. And unlike the other "supported" fields in kvm_caps, KVM doesn't enforce the "supported" value, i.e. KVM treats ARCH_CAPABILITIES like a CPUID leaf and lets userspace advertise whatever it wants. Those problems are solvable, but it's not clear there is real benefit versus snapshotting the host value, and grabbing the host value will allow additional cleanup of KVM's FB_CLEAR_CTRL code. Link: https://lore.kernel.org/all/[email protected] Cc: Chao Gao <[email protected]> Cc: Xiaoyao Li <[email protected]> Reviewed-by: Chao Gao <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-02KVM: x86: Advertise host CPUID 0x80000005 in KVM_GET_SUPPORTED_CPUIDTakahiro Itazuri1-0/+3
Advertise CPUID 0x80000005 (L1 cache and TLB info) to userspace so that VMMs that reflect KVM_GET_SUPPORTED_CPUID into KVM_SET_CPUID2 will enumerate sane cache/TLB information to the guest. CPUID 0x80000006 (L2 cache and TLB and L3 cache info) has been returned since commit 43d05de2bee7 ("KVM: pass through CPUID(0x80000006)"). Enumerating both 0x80000005 and 0x80000006 with KVM_GET_SUPPORTED_CPUID is better than reporting one or the other, and 0x80000005 could be helpful for VMM to pass it to KVM_SET_CPUID{,2} for the same reason with 0x80000006. Signed-off-by: Takahiro Itazuri <[email protected]> Link: https://lore.kernel.org/all/ZK7NmfKI9xur%[email protected] Link: https://lore.kernel.org/r/[email protected] [sean: add link, massage changelog] Signed-off-by: Sean Christopherson <[email protected]>
2023-08-02KVM: x86: Remove x86_emulate_ops::guest_has_long_modeMichal Luczaj2-7/+0
Remove x86_emulate_ops::guest_has_long_mode along with its implementation, emulator_guest_has_long_mode(). It has been unused since commit 1d0da94cdafe ("KVM: x86: do not go through ctxt->ops when emulating rsm"). No functional change intended. Signed-off-by: Michal Luczaj <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-07-31KVM: x86: Use sysfs_emit() instead of sprintf()Like Xu2-3/+3
Use sysfs_emit() instead of the sprintf() for sysfs entries. sysfs_emit() knows the maximum of the temporary buffer used for outputting sysfs content and avoids overrunning the buffer length. Signed-off-by: Like Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>