path: root/arch
2020-09-28  KVM: x86/mmu: Track write/user faults using bools  (Sean Christopherson, 2 files, -7/+7)
Use bools to track write and user faults throughout the page fault paths and down into mmu_set_spte(). The actual usage is purely boolean, but that's not obvious without digging into all paths as the current code uses a mix of bools (TDP and try_async_pf) and ints (shadow paging and mmu_set_spte()). No true functional change intended (although the pgprintk() will now print 0/1 instead of 0/PFERR_WRITE_MASK). Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
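A minimal standalone sketch of the boolean classification described above (the PFERR_* masks are the architectural x86 page-fault error-code bits; the helper name is illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    /* Architectural x86 page-fault error-code bits. */
    #define PFERR_WRITE_MASK  (1u << 1)
    #define PFERR_USER_MASK   (1u << 2)

    /* Illustrative helper: derive the boolean attributes once, up front,
     * instead of threading raw error-code ints through the fault paths. */
    void classify_fault(uint32_t error_code, bool *write_fault, bool *user_fault)
    {
        *write_fault = error_code & PFERR_WRITE_MASK;
        *user_fault  = error_code & PFERR_USER_MASK;
    }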
2020-09-28  KVM: x86/mmu: Hoist ITLB multi-hit workaround check up a level  (Sean Christopherson, 2 files, -3/+4)
Move the "ITLB multi-hit workaround enabled" check into the callers of disallowed_hugepage_adjust() to make it more obvious that the helper is specific to the workaround, and to be consistent with the accounting, i.e. account_huge_nx_page() is called if and only if the workaround is enabled. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86/mmu: Rename 'hlevel' to 'level' in FNAME(fetch)  (Sean Christopherson, 1 file, -5/+5)
Rename 'hlevel', which presumably stands for 'host level', to simply 'level' in FNAME(fetch). The variable hasn't tracked the host level for quite some time. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86/mmu: Account NX huge page disallowed iff huge page was requested  (Sean Christopherson, 2 files, -2/+3)
Condition the accounting of a disallowed huge NX page on the original requested level of the page being greater than the current iterator level. This does two things: accounts the page if and only if a huge page was actually disallowed, and accounts the shadow page if and only if it was the level at which the huge page was disallowed. For the latter case, the previous logic would account all shadow pages used to create the translation for the forced small page, e.g. even PML4, which can't be a huge page on current hardware, would be accounted as having been a disallowed huge page when using 5-level EPT. The overzealous accounting is purely a performance issue, i.e. the recovery thread will spuriously zap shadow pages, but otherwise the bad behavior is harmless. Cc: Junaid Shahid <[email protected]> Fixes: b8e8c8303ff28 ("kvm: mmu: ITLB_MULTIHIT mitigation") Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
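The accounting rule reduces to a small predicate; a toy model with illustrative names, not the kernel code itself:

    #include <stdbool.h>

    /* Charge a shadow page as a "disallowed NX huge page" only when a huge
     * mapping was actually requested at or above the level being installed. */
    bool account_nx_huge_page(bool huge_page_disallowed, int req_level, int cur_level)
    {
        return huge_page_disallowed && req_level >= cur_level;
    }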
2020-09-28  KVM: x86/mmu: Capture requested page level before NX huge page workaround  (Sean Christopherson, 2 files, -12/+18)
Apply the "huge page disallowed" adjustment of the max level only after capturing the original requested level. The requested level will be used in a future patch to skip adding pages to the list of disallowed huge pages if a huge page wasn't possible anyways, e.g. if the page isn't mapped as a huge page in the host. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86/mmu: Move "huge page disallowed" calculation into mapping helpers  (Sean Christopherson, 2 files, -20/+26)
Calculate huge_page_disallowed in __direct_map() and FNAME(fetch) in preparation for reworking the calculation so that it preserves the requested map level and eventually to avoid flagging a shadow page as being disallowed for being used as a large/huge page when it couldn't have been huge in the first place, e.g. because the backing page in the host is not large. Pass the error code into the helpers and use it to recalculate exec and write_fault instead of adding yet more booleans to the parameters. Opportunistically use huge_page_disallowed instead of lpage_disallowed to match the nomenclature used within the mapping helpers (though even they have existing inconsistencies). No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86/mmu: Refactor the zap loop for recovering NX lpages  (Sean Christopherson, 1 file, -4/+6)
Refactor the zap loop in kvm_recover_nx_lpages() to be a for loop that iterates on to_zap and drop the !to_zap check that leads to the in-loop calling of kvm_mmu_commit_zap_page(). The in-loop commit when to_zap hits zero is superfluous now that there's an unconditional commit after the loop to handle the case where lpage_disallowed_mmu_pages is emptied. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
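The reworked control flow, as a self-contained toy model (names and types are illustrative): iterate on a fixed zap budget, break out early if the list empties, and commit exactly once after the loop:

    #include <stdio.h>

    struct worklist { int pending; };   /* stand-in for lpage_disallowed_mmu_pages */

    static void prepare_zap(struct worklist *wl) { wl->pending--; }
    static void commit_zap(struct worklist *wl)  { printf("commit, %d pending left\n", wl->pending); }

    static void recover_nx_lpages(struct worklist *wl, unsigned long to_zap)
    {
        for (; to_zap; --to_zap) {
            if (!wl->pending)   /* list drained, possibly by another thread */
                break;
            prepare_zap(wl);
        }
        commit_zap(wl);         /* single unconditional commit after the loop */
    }

    int main(void)
    {
        struct worklist wl = { .pending = 3 };
        recover_nx_lpages(&wl, 5);
        return 0;
    }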
2020-09-28  KVM: x86/mmu: Commit zap of remaining invalid pages when recovering lpages  (Sean Christopherson, 1 file, -0/+1)
Call kvm_mmu_commit_zap_page() after exiting the "prepare zap" loop in kvm_recover_nx_lpages() to finish zapping pages in the unlikely event that the loop exited due to lpage_disallowed_mmu_pages being empty. Because the recovery thread drops mmu_lock() when rescheduling, it's possible that lpage_disallowed_mmu_pages could be emptied by a different thread without to_zap reaching zero despite to_zap being derived from the number of disallowed lpages. Fixes: 1aa9b9572b105 ("kvm: x86: mmu: Recovery of shattered NX large pages") Cc: Junaid Shahid <[email protected]> Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Rename ops.h to vmx_ops.h  (Sean Christopherson, 3 files, -2/+1)
Rename ops.h to vmx_ops.h to allow adding a tdx_ops.h in the future without causing massive confusion. Trust Domain Extensions (TDX) is built on VMX, but KVM cannot directly access the VMCS(es) for a TDX guest, thus TDX will need its own "ops" implementation for wrapping the low level operations. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Extract posted interrupt support to separate files  (Xiaoyao Li, 5 files, -405/+439)
Extract the posted interrupt code so that it can be reused for Trust Domain Extensions (TDX), which requires posted interrupts and can use KVM VMX's implementation almost verbatim. TDX is different enough from raw VMX that it is highly desirable to implement the guts of TDX in a separate file, i.e. reusing posted interrupt code by shoving TDX support into vmx.c would be a mess. Signed-off-by: Xiaoyao Li <[email protected]> Co-developed-by: Sean Christopherson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86/mmu: Bail early from final #PF handling on spurious faults  (Sean Christopherson, 2 files, -1/+19)
Detect spurious page faults, e.g. page faults that occur when multiple vCPUs simultaneously access a not-present page, and skip the SPTE write, prefetch, and stats update for spurious faults. Note, the performance benefits of skipping the write and prefetch are likely negligible, and the false positive stats adjustment is probably lost in the noise. The primary motivation is to play nice with TDX's SEPT in the long term. SEAMCALLs (to program SEPT entries) are quite costly, e.g. thousands of cycles, and a spurious SEPT update will result in a SEAMCALL error (which KVM will ideally treat as fatal). Reported-by: Kai Huang <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86/mmu: Return unique RET_PF_* values if the fault was fixed  (Sean Christopherson, 2 files, -30/+29)
Introduce RET_PF_FIXED and RET_PF_SPURIOUS to provide unique return values instead of overloading RET_PF_RETRY. In the short term, the unique values add clarity to the code and RET_PF_SPURIOUS will be used by set_spte() to avoid unnecessary work for spurious faults. In the long term, TDX will use RET_PF_FIXED to deterministically map memory during pre-boot. The page fault flow may bail early for benign reasons, e.g. if the mmu_notifier fires for an unrelated address. With only RET_PF_RETRY, it's impossible for the caller to distinguish between "cool, page is mapped" and "darn, need to try again", and thus cannot handle benign cases like the mmu_notifier retry. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
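A sketch of the resulting return-code set (numeric values are illustrative; the point is that fixed and spurious results are now distinguishable from retry):

    enum {
        RET_PF_RETRY = 0,   /* let the vCPU fault again, e.g. an mmu_notifier raced  */
        RET_PF_EMULATE,     /* the fault must be handled by the instruction emulator */
        RET_PF_INVALID,     /* internal error, callers should never see this         */
        RET_PF_FIXED,       /* the fault was faithfully mapped                       */
        RET_PF_SPURIOUS,    /* the SPTE was already up to date, nothing to do        */
    };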
2020-09-28  KVM: x86/mmu: Invert RET_PF_* check when falling through to emulation  (Sean Christopherson, 1 file, -2/+2)
Explicitly check for RET_PF_EMULATE instead of implicitly doing the same by checking for !RET_PF_RETRY (RET_PF_INVALID is handled earlier). This will allow adding new RET_PF_* types in future patches without breaking the emulation path. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86/mmu: Return -EIO if page fault returns RET_PF_INVALID  (Sean Christopherson, 1 file, -1/+2)
Exit to userspace with an error if the MMU is buggy and returns RET_PF_INVALID when servicing a page fault. This will allow a future patch to invert the emulation path, i.e. emulate only on RET_PF_EMULATE instead of emulating on anything but RET_PF_RETRY. This technically means that KVM will exit to userspace instead of emulating on RET_PF_INVALID, but practically speaking it's a nop as the MMU never returns RET_PF_INVALID. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
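An assumed sketch of the caller-side handling, not the verbatim KVM code: a buggy RET_PF_INVALID now surfaces as -EIO instead of silently falling into the emulation path:

    #include <errno.h>

    enum { RET_PF_RETRY, RET_PF_EMULATE, RET_PF_INVALID, RET_PF_FIXED };   /* illustrative */

    int handle_page_fault_result(int r)
    {
        if (r == RET_PF_INVALID)
            return -EIO;      /* buggy MMU: exit to userspace with an error */
        if (r == RET_PF_EMULATE)
            return 1;         /* fall through to instruction emulation      */
        return 0;             /* RET_PF_RETRY / RET_PF_FIXED: resume guest  */
    }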
2020-09-28  KVM: x86/MMU: Recursively zap nested TDP SPs when zapping last/only parent  (Ben Gardon, 2 files, -8/+24)
Recursively zap all to-be-orphaned children, unsynced or otherwise, when zapping a shadow page for a nested TDP MMU. KVM currently only zaps the unsynced child pages, but not the synced ones. This can create problems over time when running many nested guests because it leaves unlinked pages which will not be freed until the page quota is hit. With the default page quota of 20 shadow pages per 1000 guest pages, this looks like a memory leak and can degrade MMU performance. In a recent benchmark, substantial performance degradation was observed: An L1 guest was booted with 64G memory. 2G nested Windows guests were booted, 10 at a time for 20 iterations. (200 total boots) Windows was used in this benchmark because it touches all of its memory on startup. By the end of the benchmark, the nested guests were taking ~10% longer to boot. With this patch there is no degradation in boot time. Without this patch the benchmark ends with hundreds of thousands of stale EPT02 pages cluttering up rmaps and the page hash map. As a result, VM shutdown is also much slower: deleting memslot 0 was observed to take over a minute. With this patch it takes just a few milliseconds. Cc: Peter Shier <[email protected]> Signed-off-by: Ben Gardon <[email protected]> Co-developed-by: Sean Christopherson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Ben Gardon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86/mmu: Move flush logic from mmu_page_zap_pte() to FNAME(invlpg)  (Sean Christopherson, 2 files, -9/+8)
Move the logic that controls whether or not FNAME(invlpg) needs to flush fully into FNAME(invlpg) so that mmu_page_zap_pte() doesn't return a value. This allows a future patch to redefine the return semantics for mmu_page_zap_pte() so that it can recursively zap orphaned child shadow pages for nested TDP MMUs. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Ben Gardon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86: hyper-v: disallow configuring SynIC timers with no SynIC  (Vitaly Kuznetsov, 1 file, -0/+11)
Hyper-V Synthetic timers require SynIC but we don't seem to check that upon HV_X64_MSR_STIMER[X]_CONFIG/HV_X64_MSR_STIMER0_COUNT writes. Make the behavior match synic_set_msr(). Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86/mmu: Stash 'kvm' in a local variable in kvm_mmu_free_roots()  (Sean Christopherson, 1 file, -7/+7)
To make kvm_mmu_free_roots() a bit more readable, capture 'kvm' in a local variable instead of doing vcpu->kvm over and over (and over). No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Add a helper and macros to reduce boilerplate for sec exec ctls  (Sean Christopherson, 1 file, -87/+64)
Add a helper function and several wrapping macros to consolidate the copy-paste code in vmx_compute_secondary_exec_control() for adjusting controls that are dependent on guest CPUID bits. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Rename RDTSCP secondary exec control name to insert "ENABLE"  (Sean Christopherson, 4 files, -9/+9)
Rename SECONDARY_EXEC_RDTSCP to SECONDARY_EXEC_ENABLE_RDTSCP in preparation for consolidating the logic for adjusting secondary exec controls based on the guest CPUID model. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Unconditionally clear CPUID.INVPCID if !CPUID.PCID  (Sean Christopherson, 1 file, -5/+11)
If PCID is not exposed to the guest, clear INVPCID in the guest's CPUID even if the VMCS INVPCID enable is not supported. This will allow consolidating the secondary execution control adjustment code without having to special case INVPCID. Technically, this fixes a bug where !CPUID.PCID && CPUID.INVPCID would result in unexpected guest behavior (#UD instead of #GP/#PF), but KVM doesn't support exposing INVPCID if it's not supported in the VMCS, i.e. such a config is broken/bogus no matter what. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
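The rule can be stated as a toy model (feature flags as plain booleans; struct and function names are illustrative): if PCID is hidden from the guest, INVPCID must be hidden too, whether or not the VMCS can intercept it:

    #include <stdbool.h>

    struct guest_cpuid { bool pcid; bool invpcid; };

    void adjust_invpcid(struct guest_cpuid *c)
    {
        if (!c->pcid)
            c->invpcid = false;   /* unconditionally clear CPUID.INVPCID */
    }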
2020-09-28  KVM: VMX: Rename vmx_*_supported() helpers to cpu_has_vmx_*()  (Sean Christopherson, 2 files, -11/+11)
Rename helpers for a few controls to conform to the more prevalent style of cpu_has_vmx_<feature>(). Consistent names will allow adding macros to consolidate the boilerplate code for adjusting secondary execution controls. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  cpuidle-haltpoll: fix error comments in arch_haltpoll_disable  (Li Qiang, 1 file, -1/+1)
The 'arch_haltpoll_disable' is used to disable guest halt poll. Correct the comments. Fixes: a1c4423b02b21 ("cpuidle-haltpoll: disable host side polling when kvm virtualized") Signed-off-by: Li Qiang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Use "illegal GPA" helper for PT/RTIT output base check  (Sean Christopherson, 1 file, -1/+1)
Use kvm_vcpu_is_illegal_gpa() to check for a legal GPA when validating a PT output base instead of open coding a clever, but difficult to read, variant. Code readability is far more important than shaving a few uops in a slow path. No functional change intended. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86: Move illegal GPA helper out of the MMU code  (Sean Christopherson, 4 files, -7/+7)
Rename kvm_mmu_is_illegal_gpa() to kvm_vcpu_is_illegal_gpa() and move it to cpuid.h so that it's colocated with cpuid_maxphyaddr(). The helper is not MMU specific and will gain a user that is completely unrelated to the MMU in a future patch. No functional change intended. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
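The check itself reduces to a one-liner; a standalone sketch where MAXPHYADDR is passed in directly instead of being read via cpuid_maxphyaddr(vcpu):

    #include <stdbool.h>
    #include <stdint.h>

    /* A guest physical address is illegal if any bit at or above the guest's
     * MAXPHYADDR is set (maxphyaddr is assumed to be < 64). */
    bool gpa_is_illegal(uint64_t gpa, unsigned int maxphyaddr)
    {
        return gpa >> maxphyaddr;
    }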
2020-09-28  KVM: VMX: Replace MSR_IA32_RTIT_OUTPUT_BASE_MASK with helper function  (Sean Christopherson, 1 file, -4/+7)
Replace the subtly not-a-constant MSR_IA32_RTIT_OUTPUT_BASE_MASK with a proper helper function to check whether or not the specified base is valid. Blindly referencing the local 'vcpu' is especially nasty. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86: Unexport cpuid_query_maxphyaddr()  (Sean Christopherson, 1 file, -1/+0)
Stop exporting cpuid_query_maxphyaddr() now that it's not being abused by VMX. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Use precomputed MAXPHYADDR for RTIT base MSR check  (Sean Christopherson, 1 file, -1/+1)
Use cpuid_maxphyaddr() instead of cpuid_query_maxphyaddr() for the RTIT base MSR check. There is no reason to recompute MAXPHYADDR as the precomputed version is synchronized with CPUID updates, and MSR_IA32_RTIT_OUTPUT_BASE is not written between stuffing CPUID and refreshing vcpu->arch.maxphyaddr. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Do not perform emulation for INVD intercept  (Tom Lendacky, 1 file, -1/+2)
The INVD instruction is emulated as a NOP, just skip the instruction instead. Signed-off-by: Tom Lendacky <[email protected]> Message-Id: <addd41be2fbf50f5f4059e990a2a0cff182d2136.1600972918.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: SEV: shorten comments around sev_clflush_pages  (Paolo Bonzini, 1 file, -12/+7)
Very similar content is present in four comments in sev.c. Unfortunately there are small differences that make it harder to place the comment in sev_clflush_pages itself, but at least we can make it more concise. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: SVM: Mark SEV launch secret pages as dirty.  (Cfir Cohen, 1 file, -1/+14)
The LAUNCH_SECRET command performs encryption of the launch secret memory contents. Mark pinned pages as dirty, before unpinning them. This matches the logic in sev_launch_update_data(). Signed-off-by: Cfir Cohen <[email protected]> Message-Id: <[email protected]> Reviewed-by: Brijesh Singh <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: nVMX: Add VM-Enter failed tracepoints for super early checks  (Sean Christopherson, 1 file, -5/+5)
Add tracepoints for the early consistency checks in nested_vmx_run(). The "VMLAUNCH vs. VMRESUME" check in particular is useful to trace, as there is no architectural way to check VMCS.LAUNCH_STATE, and subtle bugs such as VMCLEAR on the wrong HPA can lead to confusing errors in the L1 VMM. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: nSVM: CR3 MBZ bits are only 63:52  (Krish Sadhukhan, 2 files, -2/+2)
Commit 761e4169346553c180bbd4a383aedd72f905bc9a created a wrong mask for the CR3 MBZ bits. According to APM vol 2, only the upper 12 bits are MBZ. Fixes: 761e41693465 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests", 2020-07-08) Signed-off-by: Krish Sadhukhan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
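A standalone sketch of the corrected check (the mask name is illustrative): per APM vol. 2, only CR3 bits 63:52, i.e. the top 12 bits, are MBZ in long mode:

    #include <stdbool.h>
    #include <stdint.h>

    #define CR3_LONG_MBZ_MASK  0xFFF0000000000000ULL   /* bits 63:52 only */

    bool nested_cr3_mbz_ok(uint64_t cr3)
    {
        return (cr3 & CR3_LONG_MBZ_MASK) == 0;
    }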
2020-09-28  KVM: x86: emulating RDPID failure shall return #UD rather than #GP  (Robert Hoo, 1 file, -1/+1)
Per Intel's SDM, RDPID takes a #UD if it is unsupported, which is more or less what KVM is emulating when MSR_TSC_AUX is not available. In fact, there are no scenarios in which RDPID is supposed to #GP. Fixes: fb6d4d340e ("KVM: x86: emulate RDPID") Signed-off-by: Robert Hoo <[email protected]> Message-Id: <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
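A toy model of the behavioral change (names and return codes are illustrative): when the underlying MSR (TSC_AUX) is unavailable, emulated RDPID now raises #UD, matching the SDM, rather than #GP:

    #include <stdbool.h>

    enum emul_fault { EMUL_NONE, EMUL_UD, EMUL_GP };

    enum emul_fault emulate_rdpid(bool tsc_aux_available)
    {
        if (!tsc_aux_available)
            return EMUL_UD;    /* was EMUL_GP before this fix */
        return EMUL_NONE;
    }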
2020-09-28  KVM: nVMX: Morph notification vector IRQ on nested VM-Enter to pending PI  (Sean Christopherson, 3 files, -0/+16)
On successful nested VM-Enter, check for pending interrupts and convert the highest priority interrupt to a pending posted interrupt if it matches L2's notification vector. If the vCPU receives a notification interrupt before nested VM-Enter (assuming L1 disables IRQs before doing VM-Enter), the pending interrupt (for L1) should be recognized and processed as a posted interrupt when interrupts become unblocked after VM-Enter to L2. This fixes a bug where L1/L2 will get stuck in an infinite loop if L1 is trying to inject an interrupt into L2 by setting the appropriate bit in L2's PIR and sending a self-IPI prior to VM-Enter (as opposed to KVM's method of manually moving the vector from PIR->vIRR/RVI). KVM will observe the IPI while the vCPU is in L1 context and so won't immediately morph it to a posted interrupt for L2. The pending interrupt will be seen by vmx_check_nested_events(), cause KVM to force an immediate exit after nested VM-Enter, and eventually be reflected to L1 as a VM-Exit. After handling the VM-Exit, L1 will see that L2 has a pending interrupt in PIR, send another IPI, and repeat until L2 is killed. Note, posted interrupts require virtual interrupt delivery, and virtual interrupt delivery requires exit-on-interrupt, ergo interrupts will be unconditionally unmasked on VM-Enter if posted interrupts are enabled. Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Cc: [email protected] Cc: Liran Alon <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: MIPS: clean up redundant kvm_run parameters in assembly  (Tianjia Zhang, 5 files, -18/+14)
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang <[email protected]> Reviewed-by: Huacai Chen <[email protected]> Tested-by: Jiaxun Yang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: SVM: Add tracepoint for cr_interception  (Haiwei Li, 1 file, -0/+2)
Add trace_kvm_cr_write and trace_kvm_cr_read for svm. Signed-off-by: Haiwei Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: SVM: Analyze is_guest_mode() in svm_vcpu_run()  (Wanpeng Li, 1 file, -5/+6)
Analyze is_guest_mode() in svm_vcpu_run() instead of svm_exit_handlers_fastpath() in conformity with VMX version. Suggested-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Invoke NMI handler via indirect call instead of INTn  (Sean Christopherson, 1 file, -15/+15)
Rework NMI VM-Exit handling to invoke the kernel handler by function call instead of INTn. INTn microcode is relatively expensive, and aligning the IRQ and NMI handling will make it easier to update KVM should some newfangled method for invoking the handlers come along. Suggested-by: Andi Kleen <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: VMX: Move IRQ invocation to assembly subroutine  (Sean Christopherson, 2 files, -30/+37)
Move the asm blob that invokes the appropriate IRQ handler after VM-Exit into a proper subroutine. Unconditionally create a stack frame in the subroutine so that, as objtool sees things, the function has standard stack behavior. The dynamic stack adjustment makes using unwind hints problematic. Suggested-by: Josh Poimboeuf <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86: Add kvm_x86_ops hook to short circuit emulation  (Sean Christopherson, 5 files, -35/+36)
Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a more generic is_emulatable(), and unconditionally call the new function in x86_emulate_instruction(). KVM will use the generic hook to support multiple security related technologies that prevent emulation in one way or another. Similar to the existing AMD #NPF case where emulation of the current instruction is not possible due to lack of information, AMD's SEV-ES and Intel's SGX and TDX will introduce scenarios where emulation is impossible due to the guest's register state being inaccessible. And again similar to the existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(), i.e. outside of the control of vendor-specific code. While the cause and architecturally visible behavior of the various cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean resume or complete shutdown, and SEV-ES and TDX "return" an error, the impact on the common emulation code is identical: KVM must stop emulation immediately and resume the guest. Query is_emulatable() in handle_ud() as well so that the force_emulation_prefix code doesn't incorrectly modify RIP before calling emulate_instruction() in the absurdly unlikely scenario that KVM encounters forced emulation in conjunction with "do not emulate". Cc: Tom Lendacky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
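A toy model of the hook's shape and how the common code would use it (the signature and names are assumptions based on the description above, not the exact kvm_x86_ops member):

    #include <stdbool.h>

    struct vcpu;    /* opaque stand-in */

    struct x86_ops_sketch {
        bool (*is_emulatable)(struct vcpu *vcpu, void *insn, int insn_len);
    };

    /* Common emulation path: ask the vendor module first; if emulation is not
     * possible (SEV-ES, SGX, TDX, AMD #NPF without decode info), bail out
     * immediately and let the caller resume the guest. */
    int emulate_instruction_sketch(const struct x86_ops_sketch *ops,
                                   struct vcpu *vcpu, void *insn, int insn_len)
    {
        if (!ops->is_emulatable(vcpu, insn, insn_len))
            return 1;
        /* ... actual emulation would go here ... */
        return 0;
    }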
2020-09-28  KVM: SVM: use __GFP_ZERO instead of clear_page()  (Haiwei Li, 1 file, -4/+2)
Use __GFP_ZERO when calling alloc_page() instead of clearing the page separately. Signed-off-by: Haiwei Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
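A hedged before/after sketch of the allocation pattern (the function names are illustrative; alloc_page(), clear_page(), page_address() and the GFP flags are the real kernel APIs):

    #include <linux/gfp.h>
    #include <linux/mm.h>

    static struct page *alloc_zeroed_page_old(void)
    {
        struct page *page = alloc_page(GFP_KERNEL_ACCOUNT);

        if (page)
            clear_page(page_address(page));    /* explicit clear after allocation */
        return page;
    }

    static struct page *alloc_zeroed_page_new(void)
    {
        return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);   /* zeroed by the allocator */
    }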
2020-09-28  KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it  (Krish Sadhukhan, 3 files, -8/+19)
Currently, prepare_vmcs02_early() does not check if the "unrestricted guest" VM-execution control in vmcs12 is turned off and leaves the corresponding bit on in vmcs02. Due to this setting, vmentry checks, which are supposed to render the nested guest state invalid when this VM-execution control is not set, pass in hardware. This patch turns off the "unrestricted guest" VM-execution control in vmcs02 if vmcs12 has turned it off. Suggested-by: Jim Mattson <[email protected]> Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Krish Sadhukhan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: x86: fix MSR_IA32_TSC read for nested migration  (Maxim Levitsky, 1 file, -2/+14)
MSR reads/writes should always access the L1 state, since the (nested) hypervisor should intercept all the MSRs it wants to adjust, and those that it doesn't should be read by the guest as if the host had read them. However, IA32_TSC is an exception: even when not intercepted, the guest still reads the value + TSC offset, whereas the write does not take any TSC offset into account. This is documented in Intel's SDM and seems to happen on AMD as well. This creates a problem when userspace wants to read the IA32_TSC value and then write it (e.g. for migration): it reads the L2 value but the write is interpreted as an L1 value. To fix this, make userspace-initiated reads of IA32_TSC return the L1 value as well. Huge thanks to Dave Gilbert for helping me understand this very confusing semantic of MSR writes. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: SVM: Enable INVPCID feature on AMD  (Babu Moger, 2 files, -0/+53)
The following intercept bit has been added to support VMEXIT for the INVPCID instruction: Code A2h, Name VMEXIT_INVPCID, Cause "INVPCID instruction". The following bit has been added to the VMCB layout control area to control interception of INVPCID: Byte Offset 14h, Bit 2, Function "intercept INVPCID". Enable the interceptions when the guest is running with shadow page tables enabled and handle the TLB flush based on the INVPCID instruction type. For guests with nested page table (NPT) support, the INVPCID feature works as if it were running natively; KVM does not need to do any special handling in this case. AMD documentation for the INVPCID feature is available in "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34 (or later)". The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985255929.11252.17346684135277453258.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
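The numbers in the message translate directly into definitions along these lines (macro names are illustrative; the values come from the text above and the APM):

    /* SVM exit code for the new intercept ("Code A2h, VMEXIT_INVPCID"). */
    #define SVM_EXIT_INVPCID        0x0a2

    /* VMCB control area: byte offset 14h, bit 2 enables the INVPCID intercept.
     * It is set only when the guest runs with shadow paging; with NPT the
     * instruction runs natively and needs no special handling. */
    #define VMCB_INTERCEPT_INVPCID_BYTE  0x14
    #define VMCB_INTERCEPT_INVPCID_BIT   2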
2020-09-28  KVM: X86: Move handling of INVPCID types to x86  (Babu Moger, 3 files, -67/+80)
INVPCID instruction handling is mostly same across both VMX and SVM. So, move the code to common x86.c. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
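A sketch of what the shared x86-level entry point dispatches on; the enum values are the architectural INVPCID types from the SDM, while the prototype is an assumed shape rather than the verbatim KVM export:

    #include <stdint.h>

    struct kvm_vcpu;                  /* opaque here; defined by KVM      */
    typedef uint64_t gva_t;           /* guest virtual address, as in KVM */

    enum invpcid_type {
        INVPCID_TYPE_INDIV_ADDR      = 0,   /* one address within a PCID  */
        INVPCID_TYPE_SINGLE_CTXT     = 1,   /* all mappings for one PCID  */
        INVPCID_TYPE_ALL_INCL_GLOBAL = 2,   /* everything, incl. global   */
        INVPCID_TYPE_ALL_NON_GLOBAL  = 3,   /* everything except global   */
    };

    int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);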
2020-09-28  KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c  (Babu Moger, 5 files, -36/+37)
Handling of kvm_read/write_guest_virt*() errors can be moved to common code. The same code can be used by both VMX and SVM. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept  (Babu Moger, 2 files, -42/+17)
Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead call the generic svm_set_intercept, svm_clr_intercept and svm_is_intercept for all CR intercepts. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985253016.11252.16945893859439811480.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28  KVM: SVM: Add new intercept word in vmcb_control_area  (Babu Moger, 3 files, -6/+17)
New intercept bits have been added to the vmcb control area to support a few more interceptions, among them: INTERCEPT_INVLPGB, INTERCEPT_INVLPGB_ILLEGAL, INTERCEPT_INVPCID, INTERCEPT_MCOMMIT and INTERCEPT_TLBSYNC. Add a new intercept word in vmcb_control_area to support these instructions. Also update the kvm_nested_vmrun trace function to cover the new addition. AMD documentation for these instructions is available in "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34 (or later)". The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
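Assuming the flat bit-numbering scheme introduced by the intercept-vector conversion below, the new word's intercepts could be sketched as follows (names are from the message above; exact values are illustrative, anchored on the "byte offset 14h, bit 2" INVPCID placement quoted earlier):

    enum {
        INTERCEPT_INVLPGB         = 5 * 32 + 0,
        INTERCEPT_INVLPGB_ILLEGAL = 5 * 32 + 1,
        INTERCEPT_INVPCID         = 5 * 32 + 2,   /* VMCB byte offset 14h, bit 2 */
        INTERCEPT_MCOMMIT         = 5 * 32 + 3,
        INTERCEPT_TLBSYNC         = 5 * 32 + 4,
    };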
2020-09-28  KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors  (Babu Moger, 5 files, -45/+41)
Convert all the intercepts to one array of 32-bit vectors in vmcb_control_area. This makes it easy to add intercept vectors in the future. Also update the trace functions. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
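A toy model of the reorganised control area (names mirror the idea, not the exact kernel structures): every intercept becomes a flat bit number indexing an array of 32-bit words, so adding another intercept word later only grows the array:

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_INTERCEPT 6        /* illustrative number of 32-bit intercept words */

    struct vmcb_ctrl_sketch {
        uint32_t intercepts[MAX_INTERCEPT];
    };

    static inline void svm_set_intercept_bit(struct vmcb_ctrl_sketch *c, unsigned int nr)
    {
        c->intercepts[nr / 32] |= 1u << (nr % 32);
    }

    static inline bool svm_is_intercept_bit(const struct vmcb_ctrl_sketch *c, unsigned int nr)
    {
        return c->intercepts[nr / 32] & (1u << (nr % 32));
    }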