path: root/arch/x86/kvm/x86.c
Age  Commit message  Author  Files  Lines
2020-05-07KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properlyPeter Xu1-0/+1
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <[email protected]> Message-Id: <[email protected]> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
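The difference between the compile-time "#ifdef" check and a check against the running kernel can be sketched from the userspace side. This is a minimal sketch, not QEMU's actual code: the helper name guest_debug_supported() is hypothetical, while KVM_CHECK_EXTENSION and KVM_CAP_SET_GUEST_DEBUG are the ioctl and capability constant from <linux/kvm.h>.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: ask the *running* kernel whether it advertises
 * KVM_CAP_SET_GUEST_DEBUG. kvm_fd is expected to be an open descriptor
 * for /dev/kvm. Returns 1 if the capability is advertised, 0 otherwise
 * (including when the ioctl itself fails, e.g. on a bad descriptor).
 */
static int guest_debug_supported(int kvm_fd)
{
	int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SET_GUEST_DEBUG);

	return ret > 0;
}
```

A caller that still has to run on kernels predating this patch could keep the old "#ifdef" as a fallback when the helper returns 0, which is the compromise the message suggests.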
2020-05-06kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bitsPaolo Bonzini1-15/+5
Using CPUID data can be useful for the processor compatibility check, but that's it. Using it to compute guest-reserved bits can have both false positives (such as LA57 and UMIP which we are already handling) and false negatives: in particular, with this patch we don't allow anymore a KVM guest to set CR4.PKE when CR4.PKE is clear on the host. Fixes: b9dd21e104bc ("KVM: x86: simplify handling of PKRU") Reported-by: Jim Mattson <[email protected]> Tested-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-23KVM: x86: move nested-related kvm_x86_ops to a separate structPaolo Bonzini1-14/+14
Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to nested virtualization into a separate struct. As a result, these ops will always be non-NULL on VMX. This is not a problem: * check_nested_events is only called if is_guest_mode(vcpu) returns true * get_nested_state treats VMXOFF state the same as nested being disabled * set_nested_state fails if you attempt to set nested state while nesting is disabled * nested_enable_evmcs could already be called on a CPU without VMX enabled in CPUID. * nested_get_evmcs_version was fixed in the previous patch Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-23KVM: x86: check_nested_events is never NULLPaolo Bonzini1-3/+3
Both Intel and AMD now implement it, so there is no need to check if the callback is implemented. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-21Merge tag 'kvm-s390-master-5.7-2' of ↵Paolo Bonzini1-1/+11
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fix for 5.7 and maintainer update - Silence false positive lockdep warning - add Claudio as reviewer
2020-04-21KVM: Remove redundant argument to kvm_arch_vcpu_ioctl_runTianjia Zhang1-5/+6
In earlier versions of kvm, 'kvm_run' was an independent structure and was not included in the vcpu structure. At present, 'kvm_run' is already included in the vcpu structure, so the parameter 'kvm_run' is redundant. This patch simplifies the function definition, removes the extra 'kvm_run' parameter, and extracts it from the 'kvm_vcpu' structure if necessary. Signed-off-by: Tianjia Zhang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-21KVM: X86: Improve latency for single target IPI fastpathWanpeng Li1-3/+3
In cloud environments, IPIs and timers are observed to cause most of the MSR-write vmexits, so optimize virtual IPI latency more aggressively by injecting the target IPI as soon as possible. Running the kvm-unit-tests/vmexit.flat IPI test on an SKX server, with the adaptive advance LAPIC timer and adaptive halt-polling disabled to avoid interference, this patch gives another 7% improvement: w/o fastpath -> x86.c fastpath: 4238 -> 3543 (16.4%); x86.c fastpath -> vmx.c fastpath: 3543 -> 3293 (7%); w/o fastpath -> vmx.c fastpath: 4238 -> 3293 (22.3%). Cc: Haiwei Li <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
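The three percentages quoted above are relative reductions, each computed against its own baseline, which is why 16.4% and 7% do not simply add up to 22.3%. A small sketch of the arithmetic follows; reduction_pct() is a hypothetical helper, and the numbers are taken as quoted, without assuming a unit.

```c
#include <assert.h>

/* Relative reduction, in percent, when a cost drops from `before` to `after`. */
static double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

With the figures from the message, (4238 - 3543) / 4238 is about 16.4%, (3543 - 3293) / 3543 is about 7.1%, and (4238 - 3293) / 4238 is about 22.3%.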
2020-04-21kvm_host: unify VM_STAT and VCPU_STAT definitions in a single placeEmanuele Giuseppe Esposito1-42/+38
The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple files, each used by a different architecture to initialize the debugfs entries for statistics. Since they all have the same purpose, they can be unified in a single common definition in include/linux/kvm_host.h Signed-off-by: Emanuele Giuseppe Esposito <[email protected]> Message-Id: <[email protected]> Acked-by: Cornelia Huck <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-21KVM: x86: Replace "cr3" with "pgd" in "new cr3/pgd" related codeSean Christopherson1-1/+1
Rename functions and variables in kvm_mmu_new_cr3() and related code to replace "cr3" with "pgd", i.e. continue the work started by commit 727a7e27cf88a ("KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgd"). kvm_mmu_new_cr3() and company are not always loading a new CR3, e.g. when nested EPT is enabled "cr3" is actually an EPTP. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-21KVM: x86/mmu: Add separate override for MMU sync during fast CR3 switchSean Christopherson1-1/+1
Add a separate "skip" override for MMU sync, a future change to avoid TLB flushes on nested VMX transitions may need to sync the MMU even if the TLB flush is unnecessary. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-21KVM: VMX: Retrieve APIC access page HPA only when necessarySean Christopherson1-12/+1
Move the retrieval of the HPA associated with L1's APIC access page into VMX code to avoid unnecessarily calling gfn_to_page(), e.g. when the vCPU is in guest mode (L2). Alternatively, the optimization logic in VMX could be mirrored into the common x86 code, but that will get ugly fast when further optimizations are introduced. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-21KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASIDSean Christopherson1-2/+9
Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g. to flush L2's context during nested VM-Enter. Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where the flush is directly associated with vCPU-scoped instruction emulation, i.e. MOV CR3 and INVPCID. Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to make it clear that it deliberately requests a flush of all contexts. Service any pending flush request on nested VM-Exit as it's possible a nested VM-Exit could occur after requesting a flush for L2. Add the same logic for nested VM-Enter even though it's _extremely_ unlikely for flush to be pending on nested VM-Enter, but theoretically possible (in the future) due to RSM (SMM) emulation. [*] Intel also has an Address Space Identifier (ASID) concept, e.g. EPTP+VPID+PCID == ASID, it's just not documented in the SDM because the rules of invalidation are different based on which piece of the ASID is being changed, i.e. whether the EPTP, VPID, or PCID context must be invalidated. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-21KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all()Sean Christopherson1-3/+3
Rename ->tlb_flush() to ->tlb_flush_all() in preparation for adding a new hook to flush only the current ASID/context. Opportunistically replace the comment in vmx_flush_tlb() that explains why it flushes all EPTP/VPID contexts with a comment explaining why it unconditionally uses INVEPT when EPT is enabled. I.e. rely on the "all" part of the name to clarify why it does global INVEPT/INVVPID. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-21KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush()Sean Christopherson1-3/+3
Drop @invalidate_gpa from ->tlb_flush() and kvm_vcpu_flush_tlb() now that all callers pass %true for said param, or ignore the param (SVM has an internal call to svm_flush_tlb() in svm_flush_tlb_guest that somewhat arbitrarily passes %false). Remove __vmx_flush_tlb() as it is no longer used. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-21KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()Vitaly Kuznetsov1-1/+9
Hyper-V PV TLB flush mechanism does TLB flush on behalf of the guest so doing tlb_flush_all() is an overkill, switch to using tlb_flush_guest() (just like KVM PV TLB flush mechanism) instead. Introduce KVM_REQ_HV_TLB_FLUSH to support the change. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-20KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hookSean Christopherson1-1/+1
Add a dedicated hook to handle flushing TLB entries on behalf of the guest, i.e. for a paravirtualized TLB flush, and use it directly instead of bouncing through kvm_vcpu_flush_tlb(). For VMX, change the effective implementation to never do INVEPT and flush only the current context, i.e. to always flush via INVVPID(SINGLE_CONTEXT). The INVEPT performed by __vmx_flush_tlb() when @invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only flush guest-physical mappings; linear and combined mappings are flushed by VM-Enter when VPID is disabled, and changes in the guest page tables do not affect guest-physical mappings. When EPT and VPID are enabled, doing INVVPID is not required (by Intel's architecture) to invalidate guest-physical mappings, i.e. TLB entries that cache guest-physical mappings can live across INVVPID as the mappings are associated with an EPTP, not a VPID. The intent of @invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate gpa mappings", i.e. do INVEPT and not simply INVVPID. Other than nested VPID handling, which now calls vpid_sync_context() directly, the only scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is enabled) is if KVM is flushing TLB entries from the guest's perspective, i.e. is only required to invalidate linear mappings. For SVM, flushing TLB entries from the guest's perspective can be done by flushing the current ASID, as changes to the guest's page tables are associated only with the current ASID. Adding a dedicated ->tlb_flush_guest() paves the way toward removing @invalidate_gpa, which is a potentially dangerous control flag as its meaning is not exactly crystal clear, even for those who are familiar with the subtleties of what mappings Intel CPUs are/aren't allowed to keep across various invalidation scenarios. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-20KVM: x86: Sync SPTEs when injecting page/EPT fault into L1Junaid Shahid1-1/+10
When injecting a page fault or EPT violation/misconfiguration, KVM is not syncing any shadow PTEs associated with the faulting address, including those in previous MMUs that are associated with L1's current EPTP (in a nested EPT scenario), nor is it flushing any hardware TLB entries. All this is done by kvm_mmu_invalidate_gva. Page faults that are either !PRESENT or RSVD are exempt from the flushing, as the CPU is not allowed to cache such translations. Signed-off-by: Junaid Shahid <[email protected]> Co-developed-by: Sean Christopherson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-20KVM: x86: cleanup kvm_inject_emulated_page_faultPaolo Bonzini1-4/+4
To reconstruct the kvm_mmu to be used for page fault injection, we can simply use fault->nested_page_fault. This matches how fault->nested_page_fault is assigned in the first place by FNAME(walk_addr_generic). Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-17kvm: Handle reads of SandyBridge RAPL PMU MSRs rather than injecting #GPVenkatesh Srinivas1-0/+11
Linux 3.14 unconditionally reads the RAPL PMU MSRs on boot, without handling General Protection Faults on reading those MSRs. Rather than injecting a #GP, which prevents boot, handle the MSRs by returning 0 for their data. Zero was checked to be safe by code review of the RAPL PMU driver and in discussion with the original driver author ([email protected]). Signed-off-by: Venkatesh Srinivas <[email protected]> Signed-off-by: Jon Cargille <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
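The approach described above, answering reads of the RAPL MSRs with zero instead of faulting, can be sketched as a small read handler. This is an illustration only, not the code from the patch: demo_get_msr() is a hypothetical function, and the message does not list which MSRs are covered, so the five indices below (the RAPL MSR numbers as defined in the kernel's msr-index.h) are an assumption about the set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed set of RAPL MSRs; the commit message does not enumerate them. */
#define MSR_RAPL_POWER_UNIT     0x606
#define MSR_PKG_ENERGY_STATUS   0x611
#define MSR_DRAM_ENERGY_STATUS  0x619
#define MSR_PP0_ENERGY_STATUS   0x639
#define MSR_PP1_ENERGY_STATUS   0x641

/*
 * Hypothetical read handler. Returns true when the read is handled
 * (with *data filled in) and false when the caller would have to
 * inject #GP into the guest.
 */
static bool demo_get_msr(uint32_t index, uint64_t *data)
{
	switch (index) {
	case MSR_RAPL_POWER_UNIT:
	case MSR_PKG_ENERGY_STATUS:
	case MSR_DRAM_ENERGY_STATUS:
	case MSR_PP0_ENERGY_STATUS:
	case MSR_PP1_ENERGY_STATUS:
		*data = 0;	/* read as zero rather than faulting */
		return true;
	default:
		return false;	/* unknown MSR: caller injects #GP */
	}
}
```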
2020-04-17KVM: Remove CREATE_IRQCHIP/SET_PIT2 raceSteve Rutherford1-2/+8
Fixes a NULL pointer dereference, caused by the PIT firing an interrupt before the interrupt table has been initialized. SET_PIT2 can race with the creation of the IRQchip. In particular, if SET_PIT2 is called with a low PIT timer period (after the creation of the IOAPIC, but before the instantiation of the irq routes), the PIT can fire an interrupt at an uninitialized table. Signed-off-by: Steve Rutherford <[email protected]> Signed-off-by: Jon Cargille <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-15KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault)Sean Christopherson1-2/+6
Export the page fault propagation helper so that VMX can use it to correctly emulate TLB invalidation on page faults in an upcoming patch. In the (hopefully) not-too-distant future, SGX virtualization will also want access to the helper for injecting page faults to the correct level (L1 vs. L2) when emulating ENCLS instructions. Rename the function to kvm_inject_emulated_page_fault() to clarify that it is (a) injecting a fault and (b) only for page faults. WARN if it's invoked with an exception other than PF_VECTOR. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-04-11KVM: x86: Emulate split-lock access as a write in emulatorXiaoyao Li1-1/+11
Emulate split-lock accesses as writes if split lock detection is on to avoid #AC during emulation, which will result in a panic(). This should never occur for a well-behaved guest, but a malicious guest can manipulate the TLB to trigger emulation of a locked instruction[1]. More discussion can be found at [2][3]. [1] https://lkml.kernel.org/r/[email protected] [2] https://lkml.kernel.org/r/[email protected] [3] https://lkml.kernel.org/r/[email protected] Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Xiaoyao Li <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-04-07KVM: X86: Filter out the broadcast dest for IPI fastpathWanpeng Li1-1/+2
Besides destination shorthands, a destination value of 0xffffffff is also used to broadcast interrupts; filter this out as well for the single target IPI fastpath. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
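The filtering described above can be sketched as a predicate over the 64-bit value written to the x2APIC ICR. This is a minimal sketch under stated assumptions, not the kernel's fastpath code: is_single_target_ipi() and the DEMO_* names are hypothetical, and the field positions follow the x2APIC ICR layout (delivery mode in bits 10:8, destination mode in bit 11, destination shorthand in bits 19:18, destination in bits 63:32). The fixed-delivery and physical-destination checks are included as an assumption about what "single target" requires.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ICR_DELIVERY_MODE_MASK 0x00700ULL	/* bits 10:8, 0 = fixed */
#define DEMO_ICR_DEST_MODE_LOGICAL  0x00800ULL	/* bit 11, clear = physical */
#define DEMO_ICR_SHORTHAND_MASK     0xC0000ULL	/* bits 19:18, 0 = no shorthand */
#define DEMO_X2APIC_BROADCAST       0xFFFFFFFFu	/* destination meaning "everyone" */

/* True only for a fixed, physical-mode IPI aimed at exactly one destination. */
static bool is_single_target_ipi(uint64_t icr)
{
	uint32_t dest = (uint32_t)(icr >> 32);

	return (icr & DEMO_ICR_DELIVERY_MODE_MASK) == 0 &&
	       (icr & DEMO_ICR_DEST_MODE_LOGICAL) == 0 &&
	       (icr & DEMO_ICR_SHORTHAND_MASK) == 0 &&
	       dest != DEMO_X2APIC_BROADCAST;
}
```

Anything the predicate rejects would simply fall back to the regular, slower MSR-write exit path.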
2020-04-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-362/+425
Pull kvm updates from Paolo Bonzini: "ARM: - GICv4.1 support - 32bit host removal PPC: - secure (encrypted) using under the Protected Execution Framework ultravisor s390: - allow disabling GISA (hardware interrupt injection) and protected VMs/ultravisor support. x86: - New dirty bitmap flag that sets all bits in the bitmap when dirty page logging is enabled; this is faster because it doesn't require bulk modification of the page tables. - Initial work on making nested SVM event injection more similar to VMX, and less buggy. - Various cleanups to MMU code (though the big ones and related optimizations were delayed to 5.8). Instead of using cr3 in function names which occasionally means eptp, KVM too has standardized on "pgd". - A large refactoring of CPUID features, which now use an array that parallels the core x86_features. - Some removal of pointer chasing from kvm_x86_ops, which will also be switched to static calls as soon as they are available. - New Tigerlake CPUID features. - More bugfixes, optimizations and cleanups. 
Generic: - selftests: cleanups, new MMU notifier stress test, steal-time test - CSV output for kvm_stat" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits) x86/kvm: fix a missing-prototypes "vmread_error" KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y KVM: VMX: Add a trampoline to fix VMREAD error handling KVM: SVM: Annotate svm_x86_ops as __initdata KVM: VMX: Annotate vmx_x86_ops as __initdata KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup() KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes KVM: VMX: Configure runtime hooks using vmx_x86_ops KVM: VMX: Move hardware_setup() definition below vmx_x86_ops KVM: x86: Move init-only kvm_x86_ops to separate struct KVM: Pass kvm_init()'s opaque param to additional arch funcs s390/gmap: return proper error code on ksm unsharing KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move() KVM: Fix out of range accesses to memslots KVM: X86: Micro-optimize IPI fastpath delay KVM: X86: Delay read msr data iff writes ICR MSR KVM: PPC: Book3S HV: Add a capability for enabling secure guests KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs ...
2020-03-31KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirectionSean Christopherson1-178/+178
Replace the kvm_x86_ops pointer in common x86 with an instance of the struct to save one pointer dereference when invoking functions. Copy the struct by value to set the ops during kvm_init(). Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the ops have been initialized, i.e. a vendor KVM module has been loaded. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
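The idea of holding the ops table by value can be sketched in plain C. Everything named demo_* or vendor_* below is hypothetical and only mirrors the shape described in the message: common code owns an instance of the struct, the vendor table is copied into it at init time, and one member doubles as the "has a vendor module been loaded" marker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for an ops table. */
struct demo_ops {
	int (*hardware_enable)(void);
	int (*run)(int vcpu_id);
};

static int vendor_hardware_enable(void) { return 0; }
static int vendor_run(int vcpu_id) { return vcpu_id + 100; }

/* The vendor module's table. */
static const struct demo_ops vendor_ops = {
	.hardware_enable = vendor_hardware_enable,
	.run = vendor_run,
};

/*
 * Common code keeps an instance rather than a pointer to the vendor's
 * table, so a call loads the function pointer directly from this
 * object instead of first loading a pointer to the table.
 */
static struct demo_ops demo_ops;

/* A non-NULL hardware_enable marks the ops as initialized. */
static int demo_ops_ready(void)
{
	return demo_ops.hardware_enable != NULL;
}

static void demo_init(const struct demo_ops *ops)
{
	assert(ops != NULL && ops->hardware_enable != NULL);
	demo_ops = *ops;	/* struct copy by value */
}

static int demo_run(int vcpu_id)
{
	return demo_ops.run(vcpu_id);
}
```

The trade-off is that the copy must happen before any hook is invoked, which is why a readiness marker of some kind is needed.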
2020-03-31KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completesSean Christopherson1-2/+2
Set kvm_x86_ops with the vendor's ops only after ->hardware_setup() completes to "prevent" using kvm_x86_ops before they are ready, i.e. to generate a null pointer fault instead of silently consuming unconfigured state. An alternative implementation would be to have ->hardware_setup() return the vendor's ops, but that would require non-trivial refactoring, and would arguably result in less readable code, e.g. ->hardware_setup() would need to use ERR_PTR() in multiple locations, and each vendor's declaration of the runtime ops would be less obvious. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-31KVM: x86: Move init-only kvm_x86_ops to separate structSean Christopherson1-4/+6
Move the kvm_x86_ops functions that are used only within the scope of kvm_init() into a separate struct, kvm_x86_init_ops. In addition to identifying the init-only functions without resorting to code comments, this also sets the stage for waiting until after ->hardware_setup() to set kvm_x86_ops. Setting kvm_x86_ops after ->hardware_setup() is desirable as many of the hooks are not usable until ->hardware_setup() completes. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-31KVM: Pass kvm_init()'s opaque param to additional arch funcsSean Christopherson1-2/+2
Pass @opaque to kvm_arch_hardware_setup() and kvm_arch_check_processor_compat() to allow architecture specific code to reference @opaque without having to stash it away in a temporary global variable. This will enable x86 to separate its vendor specific callback ops, which are passed via @opaque, into "init" and "runtime" ops without having to stash away the "init" ops. No functional change intended. Reviewed-by: Cornelia Huck <[email protected]> Tested-by: Cornelia Huck <[email protected]> #s390 Acked-by: Marc Zyngier <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-31Merge tag 'kvmarm-5.7' of ↵Paolo Bonzini1-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for Linux 5.7 - GICv4.1 support - 32bit host removal
2020-03-30Merge tag 'timers-core-2020-03-30' of ↵Linus Torvalds1-11/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping and timer updates from Thomas Gleixner: "Core: - Consolidation of the vDSO build infrastructure to address the difficulties of cross-builds for ARM64 compat vDSO libraries by restricting the exposure of header content to the vDSO build. This is achieved by splitting out header content into separate headers, which contain only the minimally required information which is necessary to build the vDSO. These new headers are included from the kernel headers and the vDSO specific files. - Enhancements to the generic vDSO library allowing more fine grained control over the compiled in code, further reducing architecture specific storage and preparing for adopting the generic library by PPC. - Cleanup and consolidation of the exit related code in posix CPU timers. - Small cleanups and enhancements here and there Drivers: - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support - Correct the clock rate of PIT64b global clock - setup_irq() cleanup - Preparation for PWM and suspend support for the TI DM timer - Expand the fttmr010 driver to support ast2600 systems - The usual small fixes, enhancements and cleanups all over the place" * tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) Revert "clocksource/drivers/timer-probe: Avoid creating dead devices" vdso: Fix clocksource.h macro detection um: Fix header inclusion arm64: vdso32: Enable Clang Compilation lib/vdso: Enable common headers arm: vdso: Enable arm to use common headers x86/vdso: Enable x86 to use common headers mips: vdso: Enable mips to use common headers arm64: vdso32: Include common headers in the vdso library arm64: vdso: Include common headers in the vdso library arm64: Introduce asm/vdso/processor.h arm64: vdso32: Code clean up linux/elfnote.h: Replace elf.h with UAPI equivalent scripts: Fix the inclusion order in modpost common: Introduce processor.h linux/ktime.h: Extract common
header for vDSO linux/jiffies.h: Extract common header for vDSO linux/time64.h: Extract common header for vDSO linux/time32.h: Extract common header for vDSO linux/time.h: Extract common header for vDSO ...
2020-03-26KVM: X86: Micro-optimize IPI fastpath delayWanpeng Li1-1/+5
This patch optimizes the virtual IPI fastpath emulation sequence: write ICR2 send virtual IPI read ICR2 write ICR2 send virtual IPI ==> write ICR write ICR We can observe ~0.67% performance improvement for IPI microbenchmark (https://lore.kernel.org/kvm/[email protected]/) on Skylake server. Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-26KVM: X86: Delay read msr data iff writes ICR MSRWanpeng Li1-1/+2
Delay reading the MSR data until we have identified that the guest is accessing the ICR MSR, to avoid penalizing all other MSR writes. Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-26KVM: X86: Narrow down the IPI fastpath to single target IPIWanpeng Li1-1/+4
The original single target IPI fastpath patch forgot to filter the ICR destination shorthand field. Multicast IPIs are not suitable for this feature, since waking up multiple sleeping vCPUs extends the interrupt-disabled time; this is especially bad in over-subscribed scenarios and when the VM has a larger number of vCPUs. Let's narrow it down to single target IPI. With two VMs of 76 vCPUs each, one running 'ebizzy -M' and the other running cyclictest on all vCPUs, this patch improves the average cyclictest score by more than 5%. (PV TLB, PV IPI and PV sched yield were disabled during testing to avoid interference.) Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-20KVM: x86: remove bogus user-triggerable WARN_ONPaolo Bonzini1-1/+0
The WARN_ON is essentially comparing a user-provided value with 0. It is trivial to trigger it just by passing garbage to KVM_SET_CLOCK. Guests can break if you do so, but the same applies to every KVM_SET_* ioctl. So, if it hurts when you do like this, just do not do it. Reported-by: [email protected] Fixes: 9446e6fce0ab ("KVM: x86: fix WARN_ON check of an unsigned less than zero") Cc: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-18KVM: x86: Code style cleanup in kvm_arch_dev_ioctl()Xiaoyao Li1-2/+2
In kvm_arch_dev_ioctl(), the brackets of case KVM_X86_GET_MCE_CAP_SUPPORTED accidentally encapsulate case KVM_GET_MSR_FEATURE_INDEX_LIST and case KVM_GET_MSRS. It doesn't affect functionality but it's misleading. Remove unnecessary brackets and opportunistically add a "break" in the default path. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Xiaoyao Li <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: X86: correct meaningless kvm_apicv_activated() checkPaolo Bonzini1-9/+16
After test_and_set_bit() for kvm->arch.apicv_inhibit_reasons, we will always get false when calling kvm_apicv_activated(), because apicv_inhibit_reasons is then guaranteed to be nonzero. What the code wants to do is check whether APICv was *already* active and, if so, skip the costly request; we can do this using cmpxchg. Reported-by: Miaohe Lin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
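The cmpxchg-based approach can be sketched in userspace with C11 atomics. This is an illustration of the technique rather than the kernel code: demo_update_inhibit() and inhibit_reasons are hypothetical names, and atomic_compare_exchange_weak() stands in for the kernel's cmpxchg(). The key point is that the compare-and-exchange loop yields both the old and the new mask, so the caller can tell whether the update actually changed the overall state.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bitmask of reasons why the feature is currently inhibited. */
static _Atomic unsigned long inhibit_reasons;

/*
 * Set or clear one inhibit bit. Returns true only when the update flips
 * the overall state (no reasons <-> at least one reason), i.e. only when
 * the costly "notify everyone" request would be needed.
 */
static bool demo_update_inhibit(unsigned int bit, bool set)
{
	unsigned long old = atomic_load(&inhibit_reasons);
	unsigned long new;

	do {
		new = set ? (old | (1UL << bit)) : (old & ~(1UL << bit));
		/* On failure, `old` is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(&inhibit_reasons, &old, new));

	return !old != !new;
}
```

Testing the mask after a plain test-and-set, by contrast, can only ever observe a nonzero mask, which is the bug the message describes.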
2020-03-16kvm: svm: Introduce GA Log tracepoint for AVICSuravee Suthikulpanit1-0/+1
GA Log tracepoint is useful when debugging AVIC performance issues, as it can be used with perf to count the number of times the IOMMU AVIC injects interrupts through the slow path instead of directly injecting interrupts into the target vcpu. Signed-off-by: Suravee Suthikulpanit <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgdPaolo Bonzini1-2/+2
The set_cr3 callback is not setting the guest CR3, it is setting the root of the guest page tables, either shadow or two-dimensional. To make this clearer as well as to indicate that the MMU calls it via kvm_mmu_load_cr3, rename it to load_mmu_pgd. Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: x86: Refactor kvm_cpuid() param that controls out-of-range logicSean Christopherson1-2/+3
Invert and rename the kvm_cpuid() param that controls out-of-range logic to better reflect the semantics of the affected callers, i.e. callers that bypass the out-of-range logic do so because they are looking up an exact guest CPUID entry, e.g. to query the maxphyaddr. Similarly, rename kvm_cpuid()'s internal "found" to "exact" to clarify that it tracks whether or not the exact requested leaf was found, as opposed to any usable leaf being found. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM x86: Extend AMD specific guest behavior to Hygon virtual CPUsSean Christopherson1-1/+1
Extend guest_cpuid_is_amd() to cover Hygon virtual CPUs and rename it accordingly. Hygon CPUs use an AMD-based core and so have the same basic behavior as AMD CPUs. Fixes: b8f4abb652146 ("x86/kvm: Add Hygon Dhyana support to KVM") Cc: Pu Wen <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: CPUID: add support for supervisor statesPaolo Bonzini1-4/+9
Current CPUID 0xd enumeration code does not support supervisor states, because KVM only supports setting IA32_XSS to zero. Change it instead to use a new variable supported_xss, to be set from the hardware_setup callback which is in charge of CPU capabilities. Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: x86: Move VMX's host_efer to common x86 codeSean Christopherson1-0/+5
Move host_efer to common x86 code and use it for CPUID's is_efer_nx() to avoid constantly re-reading the MSR. No functional change intended. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: Drop largepages_enabled and its accessor/mutatorSean Christopherson1-4/+2
Drop largepages_enabled, kvm_largepages_enabled() and kvm_disable_largepages() now that all users are gone. Note, largepages_enabled was an x86-only flag that got left in common KVM code when KVM gained support for multiple architectures. No functional change intended. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: VMX: Directly query Intel PT mode when refreshing PMUsSean Christopherson1-4/+3
Use vmx_pt_mode_is_host_guest() in intel_pmu_refresh() instead of bouncing through kvm_x86_ops->pt_supported, and remove ->pt_supported() as the PMU code was the last remaining user. Opportunistically clean up the wording of a comment that referenced kvm_x86_ops->pt_supported(). No functional change intended. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: x86: Check for Intel PT MSR virtualization using KVM cpu capsSean Christopherson1-4/+4
Use kvm_cpu_cap_has() to check for Intel PT when processing the list of virtualized MSRs to pave the way toward removing ->pt_supported(). No functional change intended. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: x86: Use KVM cpu caps to detect MSR_TSC_AUX virt supportSean Christopherson1-1/+1
Check for MSR_TSC_AUX virtualization via kvm_cpu_cap_has() and drop ->rdtscp_supported(). Note, vmx_rdtscp_supported() needs to hang around a tiny bit longer due to other usage in VMX code. No functional change intended. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: x86: Use KVM cpu caps to track UMIP emulationSean Christopherson1-1/+1
Set UMIP in kvm_cpu_caps when it is emulated by VMX, even though the bit will effectively be dropped by do_host_cpuid(). This allows checking for UMIP emulation via kvm_cpu_caps instead of a dedicated kvm_x86_ops callback. No functional change intended. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: x86: Use KVM cpu caps to mark CR4.LA57 as not-reservedSean Christopherson1-1/+1
Add accessor(s) for KVM cpu caps and use said accessor to detect hardware support for LA57 instead of manually querying CPUID. Note, the explicit conversion to bool via '!!' in kvm_cpu_cap_has() is technically unnecessary, but it gives people a warm fuzzy feeling. No functional change intended. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
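The accessor pattern mentioned above, including the technically redundant '!!', can be sketched with a small capability bitmap. All demo_* names and DEMO_FEATURE_LA57 are hypothetical; the sketch only assumes that a feature number encodes a 32-bit word index and a bit position within that word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NCAPINTS      4
#define DEMO_FEATURE_LA57  (1 * 32 + 16)	/* hypothetical: word 1, bit 16 */

static uint32_t demo_cpu_caps[DEMO_NCAPINTS];

static void demo_cpu_cap_set(unsigned int feature)
{
	assert(feature / 32 < DEMO_NCAPINTS);
	demo_cpu_caps[feature / 32] |= UINT32_C(1) << (feature % 32);
}

/* Returns the raw masked word: zero, or the feature's bit in place. */
static uint32_t demo_cpu_cap_get(unsigned int feature)
{
	assert(feature / 32 < DEMO_NCAPINTS);
	return demo_cpu_caps[feature / 32] & (UINT32_C(1) << (feature % 32));
}

/*
 * The return type is bool, so any nonzero value would convert to true
 * implicitly; the '!!' only makes the normalization explicit.
 */
static bool demo_cpu_cap_has(unsigned int feature)
{
	return !!demo_cpu_cap_get(feature);
}
```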
2020-03-16KVM: x86: Calculate the supported xcr0 mask at load timeSean Christopherson1-3/+11
Add a new global variable, supported_xcr0, to track which xcr0 bits can be exposed to the guest instead of calculating the mask on every call. The supported bits are constant for a given instance of KVM. This paves the way toward eliminating the ->mpx_supported() call in kvm_mpx_supported(), e.g. eliminates multiple retpolines in VMX's nested VM-Enter path, and eventually toward eliminating ->mpx_supported() altogether. No functional change intended. Reviewed-by: Xiaoyao Li <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-03-16KVM: x86: Shrink the usercopy region of the emulation contextSean Christopherson1-6/+6
Shuffle a few operand structs to the end of struct x86_emulate_ctxt and update the cache creation to whitelist only the region of the emulation context that is expected to be copied to/from user memory, e.g. the instruction operands, registers, and fetch/io/mem caches. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>