Age  Commit message  (Author, files changed, lines -removed/+added)
2020-07-09  KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals  (Sean Christopherson, 1 file, -12/+12)
Use "mc" for local variables to shorten line lengths and provide consistent names, which will be especially helpful when some of the helpers are moved to common KVM code in future patches. No functional change intended. Reviewed-by: Ben Gardon <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  KVM: x86/mmu: Consolidate "page" variant of memory cache helpers  (Sean Christopherson, 1 file, -26/+11)
Drop the "page" variants of the topup/free memory cache helpers, using the existence of an associated kmem_cache to select the correct alloc or free routine. No functional change intended. Reviewed-by: Ben Gardon <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  KVM: x86/mmu: Track the associated kmem_cache in the MMU caches  (Sean Christopherson, 2 files, -13/+12)
Track the kmem_cache used for non-page KVM MMU memory caches instead of passing in the associated kmem_cache when filling the cache. This will allow consolidating code and other cleanups. No functional change intended. Reviewed-by: Ben Gardon <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
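For orientation, the cache structure this series works with has roughly the following shape; this is a sketch based on the commit description, and the array-size macro and exact field layout are recalled rather than quoted:

    /* arch/x86/include/asm/kvm_host.h (sketch) */
    struct kvm_mmu_memory_cache {
            int nobjs;
            struct kmem_cache *kmem_cache;  /* NULL for page-backed caches */
            void *objects[KVM_NR_MEM_OBJS];
    };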
2020-07-09  x86/kvm/vmx: Use native read/write_cr2()  (Thomas Gleixner, 1 file, -3/+3)
read/write_cr2() go through the paravirt XXL indirection, but nested VMX in a XEN_PV guest is not supported. Use the native variants. Signed-off-by: Thomas Gleixner <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
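A sketch of the resulting CR2 handling around VM entry/exit; the surrounding code is paraphrased from memory, only the use of the native accessors is the point:

    /* Before entering the guest: make CR2 hold the guest value. */
    if (vcpu->arch.cr2 != native_read_cr2())
            native_write_cr2(vcpu->arch.cr2);

    /* ... VM entry / VM exit ... */

    /* After the exit: capture whatever CR2 the guest left behind. */
    vcpu->arch.cr2 = native_read_cr2();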
2020-07-09  x86/kvm/svm: Use uninstrumented wrmsrl() to restore GS  (Thomas Gleixner, 1 file, -1/+1)
On guest exit MSR_GS_BASE contains whatever the guest wrote to it and the first action after returning from the ASM code is to set it to the host kernel value. This uses wrmsrl() which is interesting at least. wrmsrl() is either using native_write_msr() or the paravirt variant. The XEN_PV code is uninteresting as nested SVM in a XEN_PV guest does not work. But native_write_msr() can be placed out of line by the compiler especially when paravirtualization is enabled in the kernel configuration. The function is marked notrace, but it can still be probed if CONFIG_KPROBE_EVENTS_ON_NOTRACE is enabled. That would be a fatal problem as kprobe events use per-CPU variables which are GS based and would be accessed with the guest GS. Depending on the GS value this would either explode in colorful ways or lead to completely undebuggable data corruption. Aside from that, native_write_msr() contains a tracepoint which objtool complains about as it is invoked from the noinstr section. As this cannot run inside a XEN_PV guest there is no point in using wrmsrl(). Use native_wrmsrl() instead, which is just a plain native WRMSR without tracing or anything else attached. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Juergen Gross <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
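The restore then becomes a plain WRMSR with nothing attached to it; a one-line sketch, with the host-state field name recalled rather than quoted from the patch:

    /* First thing after the vmexit assembly returns: restore the host GS
     * base with a plain WRMSR, before anything touches per-CPU data. */
    native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);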
2020-07-09  x86/kvm/svm: Move guest enter/exit into .noinstr.text  (Thomas Gleixner, 2 files, -44/+56)
Move the functions which are inside the RCU off region into the non-instrumentable text section. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexandre Chartre <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  x86/kvm/vmx: Move guest enter/exit into .noinstr.text  (Thomas Gleixner, 6 files, -53/+81)
Move the functions which are inside the RCU off region into the non-instrumentable text section. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexandre Chartre <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
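As a sketch, the enter/exit helper ends up annotated with noinstr, which places it in .noinstr.text and lets objtool flag instrumentation inside it; the helper name below is how I recall this series naming it, and anything that legitimately needs instrumentation is bracketed explicitly:

    static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
                                            struct vcpu_vmx *vmx)
    {
            /* Runs with RCU not watching; no tracing, kprobes or KASAN here. */

            instrumentation_begin();
            /* ... the few calls that are allowed to be instrumented ... */
            instrumentation_end();
    }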
2020-07-09  x86/kvm/svm: Add hardirq tracing on guest enter/exit  (Thomas Gleixner, 1 file, -5/+22)
Entering guest mode is more or less the same as returning to user space. From an instrumentation point of view both leave kernel mode and the transition to guest or user mode reenables interrupts on the host. In user mode an interrupt is served directly and in guest mode it causes a VM exit which then handles or reinjects the interrupt. The transition from guest mode or user mode to kernel mode disables interrupts, which needs to be recorded in instrumentation to set the correct state again. This is important for e.g. latency analysis because otherwise the execution time in guest or user mode would be wrongly accounted as interrupt disabled and could trigger false positives. Add hardirq tracing to guest enter/exit functions in the same way as it is done in the user mode enter/exit code, respecting the RCU requirements. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexandre Chartre <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
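Conceptually the guest enter/exit path mirrors the user-mode enter/exit ordering; a rough sketch of that ordering using the generic entry/lockdep helpers (the exact KVM call sites may differ from this):

    /* About to "enable" interrupts from the host's point of view. */
    instrumentation_begin();
    trace_hardirqs_on_prepare();
    lockdep_hardirqs_on_prepare(CALLER_ADDR0);
    instrumentation_end();

    guest_enter_irqoff();                   /* RCU / context tracking */
    lockdep_hardirqs_on(CALLER_ADDR0);

    /* ... run the guest ... */

    lockdep_hardirqs_off(CALLER_ADDR0);     /* back in the kernel, IRQs off */
    guest_exit_irqoff();
    instrumentation_begin();
    trace_hardirqs_off_finish();
    instrumentation_end();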
2020-07-09  x86/kvm/vmx: Add hardirq tracing to guest enter/exit  (Thomas Gleixner, 1 file, -2/+25)
Entering guest mode is more or less the same as returning to user space. From an instrumentation point of view both leave kernel mode and the transition to guest or user mode reenables interrupts on the host. In user mode an interrupt is served directly and in guest mode it causes a VM exit which then handles or reinjects the interrupt. The transition from guest mode or user mode to kernel mode disables interrupts, which needs to be recorded in instrumentation to set the correct state again. This is important for e.g. latency analysis because otherwise the execution time in guest or user mode would be wrongly accounted as interrupt disabled and could trigger false positives. Add hardirq tracing to guest enter/exit functions in the same way as it is done in the user mode enter/exit code, respecting the RCU requirements. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexandre Chartre <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  x86/kvm: Move context tracking where it belongs  (Thomas Gleixner, 3 files, -2/+26)
Context tracking for KVM happens way too early in the vcpu_run() code. Anything after guest_enter_irqoff() and before guest_exit_irqoff() cannot use RCU and should also be not instrumented. The current way of doing this covers way too much code. Move it closer to the actual vmenter/exit code. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexandre Chartre <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host  (Maxim Levitsky, 4 files, -20/+24)
To avoid complex and in some cases incorrect logic in kvm_spec_ctrl_test_value, just try the guest's given value on the host processor instead, and if it doesn't #GP, allow the guest to set it. One such case is when the host CPU supports the STIBP mitigation but doesn't support IBRS (as is the case with some Zen2 AMD CPUs); in this case we were giving the guest a #GP when it tried to use STIBP. The reason why we can do the host test is that the IA32_SPEC_CTRL MSR is passed through to the guest after the guest sets it to a non-zero value for the first time (for performance reasons), and as a result of this, it is pointless to emulate the #GP condition on this first access in a different way than what the host CPU does. This is based on a patch from Sean Christopherson, who suggested this idea. Fixes: 6441fa6178f5 ("KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL") Cc: [email protected] Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
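A reconstructed sketch of the host-side probe described above; the function name comes from the commit title, the body is rebuilt from the description using the faulting-safe MSR accessors, so details may differ from the actual patch:

    static int kvm_spec_ctrl_test_value(u64 value)
    {
            u64 saved_value;
            unsigned long flags;
            int ret = 0;

            local_irq_save(flags);

            if (rdmsrl_safe(MSR_IA32_SPEC_CTRL, &saved_value))
                    ret = 1;
            else if (wrmsrl_safe(MSR_IA32_SPEC_CTRL, value))
                    ret = 1;        /* #GPs on this host: reject it for the guest too */
            else
                    wrmsrl(MSR_IA32_SPEC_CTRL, saved_value);

            local_irq_restore(flags);

            return ret;
    }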
2020-07-09  KVM/x86: pmu: Fix #GP condition check for RDPMC emulation  (Like Xu, 1 file, -0/+5)
In guest protected mode, if the current privilege level is not 0 and the PCE flag in the CR4 register is cleared, we will inject a #GP for RDPMC usage. Signed-off-by: Like Xu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
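A sketch of the added check in the RDPMC emulation path; accessor spellings are approximate for this kernel version:

    /* SDM: RDPMC #GPs in protected mode when CPL > 0 and CR4.PCE = 0. */
    if ((kvm_x86_ops.get_cpl(vcpu) != 0) &&
        !(kvm_read_cr4(vcpu) & X86_CR4_PCE))
            return 1;       /* caller injects #GP into the guest */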
2020-07-09  KVM: x86: take as_id into account when checking PGD  (Vitaly Kuznetsov, 3 files, -1/+10)
OVMF booted guest running on shadow pages crashes on TRIPLE FAULT after enabling paging from SMM. The crash is triggered from mmu_check_root() and is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0 while vCPU may be in a different context (address space). Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root(). Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
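A sketch of the new helper, mirroring kvm_is_visible_gfn() but resolving the memslot in the vCPU's current address space (so SMM gets its own as_id); the body is reconstructed, not quoted:

    bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
    {
            struct kvm_memory_slot *memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);

            if (!memslot || memslot->id >= KVM_USER_MEM_SLOTS ||
                memslot->flags & KVM_MEMSLOT_INVALID)
                    return false;

            return true;
    }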
2020-07-09  KVM: x86: Move kvm_x86_ops.vcpu_after_set_cpuid() into kvm_vcpu_after_set_cpuid()  (Xiaoyao Li, 1 file, -2/+2)
kvm_x86_ops.vcpu_after_set_cpuid() is used to update vmx/svm specific vcpu settings based on updated CPUID settings. So it's supposed to be called after CPUIDs are updated, i.e., after kvm_update_cpuid_runtime(). Currently, kvm_update_cpuid_runtime() only updates the CPUID bits of OSXSAVE, APIC, OSPKE, MWAIT, KVM_FEATURE_PV_UNHALT and CPUID(0xD,0).ebx and CPUID(0xD, 1).ebx. None of them is consumed by vmx/svm's update_vcpu_after_set_cpuid(), so there is no dependency between them. Moving kvm_x86_ops.vcpu_after_set_cpuid() into kvm_vcpu_after_set_cpuid() is therefore more reasonable. Signed-off-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  KVM: x86: Rename cpuid_update() callback to vcpu_after_set_cpuid()  (Xiaoyao Li, 5 files, -8/+9)
The name of the callback cpuid_update() is misleading: it is not about updating the vcpu's CPUID settings, but about updating the configuration of the vcpu based on the CPUIDs. So rename it to vcpu_after_set_cpuid(). Signed-off-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  KVM: x86: Rename kvm_update_cpuid() to kvm_vcpu_after_set_cpuid()  (Xiaoyao Li, 1 file, -3/+3)
Now that there is no CPUID-bit-updating behavior left in kvm_update_cpuid(), rename it to kvm_vcpu_after_set_cpuid(). Signed-off-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()  (Xiaoyao Li, 4 files, -24/+34)
Besides being called in kvm_vcpu_ioctl_set_cpuid*(), kvm_update_cpuid() is also called in 5 other places in x86.c and 1 other place in lapic.c. All 6 of those call sites only need the part that updates guest CPUIDs (OSXSAVE, OSPKE, APIC, KVM_FEATURE_PV_UNHALT, ...) based on the runtime vcpu state, so extract that part into a separate kvm_update_cpuid_runtime(). Signed-off-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  KVM: x86: Introduce kvm_check_cpuid()  (Xiaoyao Li, 2 files, -21/+36)
Use kvm_check_cpuid() to validate whether userspace provides legal cpuid settings, and call it before KVM takes any action to update CPUID or update vcpu states based on the given CPUID settings. Signed-off-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-09  KVM: X86: Move kvm_apic_set_version() to kvm_update_cpuid()  (Xiaoyao Li, 1 file, -2/+2)
There are no dependencies between kvm_apic_set_version() and kvm_update_cpuid(), because kvm_apic_set_version() queries the X2APIC CPUID bit, which is not touched/changed by kvm_update_cpuid(). Obviously, kvm_apic_set_version() belongs to the category of updating the vcpu model. Signed-off-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: lapic: Use guest_cpuid_has() in kvm_apic_set_version()  (Xiaoyao Li, 1 file, -3/+1)
Only code cleanup and no functional change. Signed-off-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent  (Xiaoyao Li, 1 file, -8/+7)
As handling of bits outside leaf 1 has been added over time, kvm_update_cpuid() should not return directly if leaf 1 is absent, but should go on updating the other CPUID leaves. Keep the update of apic->lapic_timer.timer_mode_mask in a separate wrapper, to minimize churn, since that code will be moved out of this function in a future patch. Signed-off-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID* fails  (Xiaoyao Li, 2 files, -0/+8)
The current implementation keeps userspace's input of the CPUID configuration and cpuid->nent even if kvm_update_cpuid() fails. Reset vcpu->arch.cpuid_nent to 0 for the failure case as a simple fix. Besides, update the doc to explicitly state that if the SET_CPUID* ioctls fail, KVM gives no guarantee that the previous valid CPUID configuration is kept. Signed-off-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  kvm: x86: limit the maximum number of vPMU fixed counters to 3  (Like Xu, 2 files, -1/+3)
Some new Intel platforms (such as TGL) already have the fourth fixed counter TOPDOWN.SLOTS, but it has not been fully enabled on KVM and the host. Therefore, we limit edx.split.num_counters_fixed to 3, so that it does not break the kvm-unit-tests PMU test case or userspace that handles the fourth counter badly. Signed-off-by: Like Xu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests  (Krish Sadhukhan, 4 files, -4/+31)
According to section "Canonicalization and Consistency Checks" in APM vol. 2 the following guest state is illegal: "Any MBZ bit of CR3 is set." "Any MBZ bit of CR4 is set." Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Krish Sadhukhan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86: Make CR4.VMXE reserved for the guest  (Paolo Bonzini, 1 file, -0/+2)
CR4.VMXE is reserved unless the VMX CPUID bit is set. On Intel, it is also tested by vmx_set_cr4, but AMD relies on kvm_valid_cr4, so fix it. Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
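Conceptually, the check in the common (vendor-agnostic) CR4 validation path becomes the following; this is a sketch of the effect, not the literal diff:

    /* CR4.VMXE is reserved unless the guest's CPUID advertises VMX. */
    if ((cr4 & X86_CR4_VMXE) && !guest_cpuid_has(vcpu, X86_FEATURE_VMX))
            return -EINVAL;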
2020-07-08  KVM: x86: Create mask for guest CR4 reserved bits in kvm_update_cpuid()  (Krish Sadhukhan, 4 files, -22/+25)
Instead of creating the mask for guest CR4 reserved bits in kvm_valid_cr4(), do it in kvm_update_cpuid() so that it can be reused instead of creating it each time kvm_valid_cr4() is called. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Krish Sadhukhan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  kvm: x86: Read PDPTEs on CR0.CD and CR0.NW changes  (Jim Mattson, 1 file, -4/+5)
According to the SDM, when PAE paging would be in use following a MOV-to-CR0 that modifies any of CR0.CD, CR0.NW, or CR0.PG, then the PDPTEs are loaded from the address in CR3. Previously, kvm only loaded the PDPTEs when PAE paging would be in use following a MOV-to-CR0 that modified CR0.PG. Signed-off-by: Jim Mattson <[email protected]> Reviewed-by: Oliver Upton <[email protected]> Reviewed-by: Peter Shier <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
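A sketch of the widened trigger condition in kvm_set_cr0(), with the surrounding details paraphrased: the PDPTE reload now keys off CD and NW as well as PG, whenever PAE paging is in use outside long mode.

    if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) &&
        is_pae(vcpu) &&
        ((cr0 ^ old_cr0) & (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)) &&
        !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)))
            return 1;       /* invalid PDPTEs: fail the MOV to CR0 */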
2020-07-08  xen: Mark "xen_nopvspin" parameter obsolete  (Zhenzhong Duan, 2 files, -5/+6)
Map "xen_nopvspin" to "nopvspin", fix stale description of "xen_nopvspin" as we use qspinlock now. Signed-off-by: Zhenzhong Duan <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  x86/kvm: Add "nopvspin" parameter to disable PV spinlocks  (Zhenzhong Duan, 4 files, -7/+45)
There are cases where a guest tries to switch spinlocks to bare metal behavior (e.g. by setting "xen_nopvspin" on XEN platform and "hv_nopvspin" on HYPER_V). That feature is missing on KVM, so add a new parameter "nopvspin" to disable PV spinlocks for KVM guests. The new 'nopvspin' parameter will also replace the Xen- and Hyper-V-specific parameters in future patches. Define the variable nopvspin as global because it will be used in future patches as above. Signed-off-by: Zhenzhong Duan <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Jim Mattson <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
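A sketch of how such a command-line switch is typically wired up with early_param(); the storage class and helper name are illustrative, and the documentation side in kernel-parameters.txt is omitted:

    bool nopvspin __initdata;

    static __init int parse_nopvspin(char *arg)
    {
            nopvspin = true;
            return 0;
    }
    early_param("nopvspin", parse_nopvspin);

    /* Later, in the spinlock setup path, bail out before enabling PV
     * spinlocks when nopvspin was given on the command line. */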
2020-07-08  x86/kvm: Change print code to use pr_*() format  (Zhenzhong Duan, 1 file, -9/+13)
pr_*() is preferred over printk(KERN_* ...); after this change, all prints in arch/x86/kernel/kvm.c will have the "kvm-guest: xxx" style. No functional change. Signed-off-by: Zhenzhong Duan <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Jim Mattson <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
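The usual way to get a uniform prefix is a pr_fmt() definition at the top of the file, before any includes that use pr_*(); a sketch with a made-up message:

    #define pr_fmt(fmt) "kvm-guest: " fmt

    #include <linux/kernel.h>

    static void __init example(void)
    {
            /* Prints "kvm-guest: hypothetical message". */
            pr_info("hypothetical message\n");
    }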
2020-07-08  Revert "KVM: X86: Fix setup the virt_spin_lock_key before static key get initialized"  (Zhenzhong Duan, 1 file, -9/+3)
This reverts commit 34226b6b70980a8f81fff3c09a2c889f77edeeff. Commit 8990cac6e5ea ("x86/jump_label: Initialize static branching early") adds jump_label_init() call in setup_arch() to make static keys initialized early, so we could use the original simpler code again. The similar change for XEN is in commit 090d54bcbc54 ("Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized"") Signed-off-by: Zhenzhong Duan <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Jim Mattson <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86/mmu: Rename page_header() to to_shadow_page()  (Sean Christopherson, 3 files, -15/+15)
Rename KVM's accessor for retrieving a 'struct kvm_mmu_page' from the associated host physical address to better convey what the function is doing. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
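The accessor itself is tiny: the shadow-page metadata pointer lives in the struct page's private field. A sketch of the renamed helper:

    static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
    {
            struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);

            return (struct kvm_mmu_page *)page_private(page);
    }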
2020-07-08  KVM: x86/mmu: Add sptep_to_sp() helper to wrap shadow page lookup  (Sean Christopherson, 4 files, -20/+23)
Introduce sptep_to_sp() to reduce the boilerplate code needed to get the shadow page associated with a spte pointer, and to improve readability as it's not immediately obvious that "page_header" is a KVM-specific accessor for retrieving a shadow page. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
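The wrapper layers on top of to_shadow_page() by converting the spte pointer to a physical address first; a sketch:

    static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
    {
            return to_shadow_page(__pa(sptep));
    }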
2020-07-08  KVM: x86/mmu: Make kvm_mmu_page definition and accessor internal-only  (Sean Christopherson, 2 files, -44/+50)
Make 'struct kvm_mmu_page' MMU-only, nothing outside of the MMU should be poking into the gory details of shadow pages. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86/mmu: Add MMU-internal header  (Sean Christopherson, 4 files, -5/+13)
Add mmu/mmu_internal.h to hold declarations and definitions that need to be shared between various mmu/ files, but should not be used by anything outside of the MMU. Begin populating mmu_internal.h with declarations of the helpers used by page_track.c. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86/mmu: Move kvm_mmu_available_pages() into mmu.c  (Sean Christopherson, 2 files, -9/+9)
Move kvm_mmu_available_pages() from mmu.h to mmu.c, it has a single caller and has no business being exposed via mmu.h. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86/mmu: Move mmu_audit.c and mmutrace.h into the mmu/ sub-directory  (Sean Christopherson, 2 files, -1/+1)
Move mmu_audit.c and mmutrace.h under mmu/ where they belong. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86/mmu: Exit to userspace on make_mmu_pages_available() error  (Sean Christopherson, 2 files, -2/+4)
Propagate any error returned by make_mmu_pages_available() out to userspace instead of resuming the guest if the error occurs while handling a page fault. Now that zapping the oldest MMU pages skips active roots, i.e. fails if and only if there are no zappable pages, there is no chance for a false positive, i.e. no chance of returning a spurious error to userspace. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86/mmu: Batch zap MMU pages when shrinking the slab  (Sean Christopherson, 1 file, -16/+1)
Use the recently introduced kvm_mmu_zap_oldest_mmu_pages() to batch zap MMU pages when shrinking a slab. This fixes a long standing issue where KVM's shrinker implementation is completely ineffective due to zapping only a single page. E.g. without batch zapping, forcing a scan via drop_caches basically has no impact on a VM with ~2k shadow pages. With batch zapping, the number of shadow pages can be reduced to a few hundred pages in one or two runs of drop_caches. Note, if the default batch size (currently 128) is problematic, e.g. zapping 128 pages holds mmu_lock for too long, KVM can bound the batch size by setting @batch in mmu_shrinker. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
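In the shrinker this reduces to a single call per scanned VM, with the batch size coming from the shrink request; a sketch of the relevant line, with the surrounding locking and VM iteration omitted:

    /* Inside mmu_shrink_scan(), with the chosen VM's mmu_lock held. */
    freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);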
2020-07-08  KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages  (Sean Christopherson, 1 file, -13/+39)
Collect MMU pages for zapping in a loop when making MMU pages available, and skip over active roots when doing so as zapping an active root can never immediately free up a page. Batching the zapping avoids multiple remote TLB flushes and remedies the issue where the loop would bail early if an active root was encountered. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86/mmu: Don't put invalid SPs back on the list of active pages  (Sean Christopherson, 1 file, -8/+20)
Delete a shadow page from the invalidation list instead of throwing it back on the list of active pages when it's a root shadow page with active users. Invalid active root pages will be explicitly freed by mmu_free_root_page() when the root_count hits zero, i.e. they don't need to be put on the active list to avoid leakage. Use sp->role.invalid to detect that a shadow page has already been zapped, i.e. is not on a list. WARN if an invalid page is encountered when zapping pages, as it should now be impossible. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86/mmu: Optimize MMU page cache lookup for fully direct MMUs  (Sean Christopherson, 1 file, -2/+7)
Skip the unsync checks and the write flooding clearing for fully direct MMUs, which are guaranteed to not have unsync'd or indirect pages (write flooding detection only applies to indirect pages). For TDP, this avoids unnecessary memory reads and writes, and for the write flooding count will also avoid dirtying a cache line (unsync_child_bitmap itself consumes a cache line, i.e. write_flooding_count is guaranteed to be in a different cache line than parent_ptes). Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-By: Jon Cargille <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86/mmu: Avoid multiple hash lookups in kvm_get_mmu_page()  (Sean Christopherson, 1 file, -8/+9)
Refactor for_each_valid_sp() to take the list of shadow pages instead of retrieving it from a gfn to avoid doing the gfn->list hash and lookup multiple times during kvm_get_mmu_page(). Cc: Peter Feiner <[email protected]> Cc: Jon Cargille <[email protected]> Cc: Jim Mattson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-By: Jon Cargille <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: x86: Use VMCALL and VMMCALL mnemonics in kvm_para.h  (Uros Bizjak, 1 file, -1/+1)
Current minimum required version of binutils is 2.23, which supports VMCALL and VMMCALL instruction mnemonics. Replace the byte-wise specification of VMCALL and VMMCALL with these proper mnemonics. Signed-off-by: Uros Bizjak <[email protected]> CC: Paolo Bonzini <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
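The hypercall macro in kvm_para.h then reads roughly as follows, with the alternatives mechanism patching in the AMD encoding at boot; a sketch of the end result:

    #define KVM_HYPERCALL \
            ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)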
2020-07-08  KVM: SVM: Rename svm_nested_virtualize_tpr() to nested_svm_virtualize_tpr()  (Joerg Roedel, 2 files, -4/+4)
Match the naming with other nested svm functions. No functional changes. Signed-off-by: Joerg Roedel <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: SVM: Add svm_ prefix to set/clr/is_intercept()  (Joerg Roedel, 3 files, -48/+48)
Make it clear that the symbols belong to the SVM code when they are built in. No functional changes. Signed-off-by: Joerg Roedel <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: SVM: Add vmcb_ prefix to mark_*() functions  (Joerg Roedel, 5 files, -31/+31)
Make it more clear what data structure these functions operate on. No functional changes. Signed-off-by: Joerg Roedel <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: SVM: Rename struct nested_state to svm_nested_state  (Joerg Roedel, 1 file, -2/+2)
Renaming is only needed in the svm.h header file. No functional changes. Signed-off-by: Joerg Roedel <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-07-08  KVM: nVMX: Wrap VM-Fail valid path in generic VM-Fail helper  (Sean Christopherson, 1 file, -41/+36)
Add nested_vmx_fail() to wrap VM-Fail paths that _may_ result in VM-Fail Valid to make it clear at the call sites that the Valid flavor isn't guaranteed. Suggested-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
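A reconstructed sketch of the wrapper; the nested-state field and helper names are as recalled from vmx/nested.c of that era, not quoted from the patch:

    static int nested_vmx_fail(struct kvm_vcpu *vcpu, u32 vm_instruction_error)
    {
            struct vcpu_vmx *vmx = to_vmx(vcpu);

            /* VM-Fail Valid needs a current VMCS to record the error in. */
            if (vmx->nested.current_vmptr == -1ull && !vmx->nested.hv_evmcs)
                    return nested_vmx_failInvalid(vcpu);

            return nested_vmx_failValid(vcpu, vm_instruction_error);
    }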
2020-07-08  kvm: x86: Set last_vmentry_cpu in vcpu_enter_guest  (Jim Mattson, 3 files, -2/+1)
Since this field is now in kvm_vcpu_arch, clean things up a little by setting it in vendor-agnostic code: vcpu_enter_guest. Note that it must be set after the call to kvm_x86_ops.run(), since it can't be updated before pre_sev_run(). Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Reviewed-by: Oliver Upton <[email protected]> Reviewed-by: Peter Shier <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
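A sketch of where the assignment lands in vcpu_enter_guest(); the surrounding variable name is approximate for this kernel version:

    exit_fastpath = kvm_x86_ops.run(vcpu);

    /* Updated only after .run(): pre_sev_run(), called inside it, still
     * needs the CPU recorded by the previous VM entry. */
    vcpu->arch.last_vmentry_cpu = vcpu->cpu;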