aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kvm/svm
AgeCommit message (Collapse)AuthorFilesLines
2022-08-10KVM: SVM: Disable SEV-ES support if MMIO caching is disableSean Christopherson2-3/+16
Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs generating #NPF(RSVD), which are reflected by the CPU into the guest as a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access to the guest instruction stream or register state and so can't directly emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT), and those flavors of #NPF cause automatic VM-Exits, not #VC. Adjust KVM's MMIO masks to account for the C-bit location prior to doing SEV(-ES) setup, and document that dependency between adjusting the MMIO SPTE mask and SEV(-ES) setup. Fixes: b09763da4dd8 ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)") Reported-by: Michael Roth <michael.roth@amd.com> Tested-by: Michael Roth <michael.roth@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220803224957.1285926-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-01Merge remote-tracking branch 'kvm/next' into kvm-next-5.20Paolo Bonzini6-169/+467
KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID
2022-07-28KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klogJarkko Sakkinen1-0/+3
As Virtual Machine Save Area (VMSA) is essential in troubleshooting attestation, dump it to the klog with the KERN_DEBUG level of priority. Cc: Jarkko Sakkinen <jarkko@kernel.org> Suggested-by: Harald Hoyer <harald@profian.com> Signed-off-by: Jarkko Sakkinen <jarkko@profian.com> Message-Id: <20220728050919.24113-1-jarkko@profian.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-28KVM: SVM: Do not virtualize MSR accesses for APIC LVTT registerSuravee Suthikulpanit1-1/+8
AMD does not support APIC TSC-deadline timer mode. AVIC hardware will generate GP fault when guest kernel writes 1 to bits [18] of the APIC LVTT register (offset 0x32) to set the timer mode. (Note: bit 18 is reserved on AMD system). Therefore, always intercept and let KVM emulate the MSR accesses. Fixes: f3d7c8aa6882 ("KVM: SVM: Fix x2APIC MSRs interception") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220725033428.3699-1-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-28KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bitsSean Christopherson1-1/+2
Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits checks, into a separate helper, __kvm_is_valid_cr4(), and export only the inner helper to vendor code in order to prevent nested VMX from calling back into vmx_is_valid_cr4() via kvm_is_valid_cr4(). On SVM, this is a nop as SVM doesn't place any additional restrictions on CR4. On VMX, this is also currently a nop, but only because nested VMX is missing checks on reserved CR4 bits for nested VM-Enter. That bug will be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is, but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's restrictions on CR0/CR4. The cleanest and most intuitive way to fix the VMXON bug is to use nested_host_cr{0,4}_valid(). If the CR4 variant routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's restrictions if and only if the vCPU is post-VMXON. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-28KVM: SVM: Fix x2APIC MSRs interceptionSuravee Suthikulpanit1-25/+27
The index for svm_direct_access_msrs was incorrectly initialized with the APIC MMIO register macros. Fix by introducing a macro for calculating x2APIC MSRs. Fixes: 5c127c85472c ("KVM: SVM: Adding support for configuring x2APIC MSRs interception") Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220718083833.222117-1-suravee.suthikulpanit@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-28KVM: nSVM: Pull CS.Base from actual VMCB12 for soft int/ex re-injectionMaciej S. Szmigiero1-4/+5
enter_svm_guest_mode() first calls nested_vmcb02_prepare_control() to copy control fields from VMCB12 to the current VMCB, then nested_vmcb02_prepare_save() to perform a similar copy of the save area. This means that nested_vmcb02_prepare_control() still runs with the previous save area values in the current VMCB so it shouldn't take the L2 guest CS.Base from this area. Explicitly pull CS.Base from the actual VMCB12 instead in enter_svm_guest_mode(). Granted, having a non-zero CS.Base is a very rare thing (and even impossible in 64-bit mode), having it change between nested VMRUNs is probably even rarer, but if it happens it would create a really subtle bug so it's better to fix it upfront. Fixes: 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <4caa0f67589ae3c22c311ee0e6139496902f2edc.1658159083.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86: Restrict get_mt_mask() to a u8, use KVM_X86_OP_OPTIONAL_RET0Sean Christopherson1-6/+0
Restrict get_mt_mask() to a u8 and reintroduce using a RET0 static_call for the SVM implementation. EPT stores the memtype information in the lower 8 bits (bits 6:3 to be precise), and even returns a shifted u8 without an explicit cast to a larger type; there's no need to return a full u64. Note, RET0 doesn't play nice with a u64 return on 32-bit kernels, see commit bf07be36cd88 ("KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask"). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220714153707.3239119-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86: Add dedicated helper to get CPUID entry with significant indexSean Christopherson1-1/+1
Add a second CPUID helper, kvm_find_cpuid_entry_index(), to handle KVM queries for CPUID leaves whose index _may_ be significant, and drop the index param from the existing kvm_find_cpuid_entry(). Add a WARN in the inner helper, cpuid_entry2_find(), to detect attempts to retrieve a CPUID entry whose index is significant without explicitly providing an index. Using an explicit magic number and letting callers omit the index avoids confusion by eliminating the myriad cases where KVM specifies '0' as a dummy value. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: SVM: fix task switch emulation on INTn instruction.Maxim Levitsky1-0/+1
Recently KVM's SVM code switched to re-injecting software interrupt events, if something prevented their delivery. Task switch due to task gate in the IDT, however is an exception to this rule, because in this case, INTn instruction causes a task switch intercept and its emulation completes the INTn emulation as well. Add a missing case to task_switch_interception for that. This fixes 32 bit kvm unit test taskswitch2. Fixes: 7e5b5ef8dca322 ("KVM: SVM: Re-inject INTn instead of retrying the insn on "failure"") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <20220714124453.188655-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-27x86: Add magic AMD return-thunkPeter Zijlstra1-0/+18
Note: needs to be in a section distinct from Retpolines such that the Retpoline RET substitution cannot possibly use immediate jumps. ORC unwinding for zen_untrain_ret() and __x86_return_thunk() is a little tricky but works due to the fact that zen_untrain_ret() doesn't have any stack ops and as such will emit a single ORC entry at the start (+0x3f). Meanwhile, unwinding an IP, including the __x86_return_thunk() one (+0x40) will search for the largest ORC entry smaller or equal to the IP, these will find the one ORC entry (+0x3f) and all works. [ Alexandre: SVM part. ] [ bp: Build fix, massages. ] Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-24KVM: x86: nSVM: optimize svm_set_x2apic_msr_interceptionMaxim Levitsky3-0/+17
- Avoid toggling the x2apic msr interception if it is already up to date. - Avoid touching L0 msr bitmap when AVIC is inhibited on entry to the guest mode, because in this case the guest usually uses its own msr bitmap. Later on VM exit, the 1st optimization will allow KVM to skip touching the L0 msr bitmap as well. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220519102709.24125-18-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Add AVIC doorbell tracepointSuravee Suthikulpanit1-1/+3
Add a tracepoint to track number of doorbells being sent to signal a running vCPU to process IRQ after being injected. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-17-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possibleSuravee Suthikulpanit1-3/+18
For x2AVIC, the index from incomplete IPI #vmexit info is invalid for logical cluster mode. Only ICRH/ICRL values can be used to determine the IPI destination APIC ID. Since QEMU defines guest physical APIC ID to be the same as vCPU ID, it can be used to quickly identify the target vCPU to deliver IPI, and avoid the overhead from searching through all vCPUs to match the target vCPU. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-16-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Introduce hybrid-AVIC modeSuravee Suthikulpanit2-11/+11
Currently, AVIC is inhibited when booting a VM w/ x2APIC support. because AVIC cannot virtualize x2APIC MSR register accesses. However, the AVIC doorbell can be used to accelerate interrupt injection into a running vCPU, while all guest accesses to x2APIC MSRs will be intercepted and emulated by KVM. With hybrid-AVIC support, the APICV_INHIBIT_REASON_X2APIC is no longer enforced. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-14-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpuSuravee Suthikulpanit1-1/+0
Originalliy, this WARN_ON is designed to detect when calling avic_vcpu_load() on an already running vcpu in AVIC mode (i.e. the AVIC is_running bit is set). However, for x2AVIC, the vCPU can switch from xAPIC to x2APIC mode while in running state, in which the avic_vcpu_load() will be called from svm_refresh_apicv_exec_ctrl(). Therefore, remove this warning since it is no longer appropriate. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-13-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Introduce logic to (de)activate x2AVIC modeSuravee Suthikulpanit3-5/+53
Introduce logic to (de)activate AVIC, which also allows switching between AVIC to x2AVIC mode at runtime. When an AVIC-enabled guest switches from APIC to x2APIC mode, the SVM driver needs to perform the following steps: 1. Set the x2APIC mode bit for AVIC in VMCB along with the maximum APIC ID support for each mode accodingly. 2. Disable x2APIC MSRs interception in order to allow the hardware to virtualize x2APIC MSRs accesses. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-12-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: nSVM: always intercept x2apic msrsMaxim Levitsky2-0/+14
As a preparation for x2avic, this patch ensures that x2apic msrs are always intercepted for the nested guest. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220519102709.24125-11-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Refresh AVIC configuration when changing APIC modeSuravee Suthikulpanit3-0/+15
AMD AVIC can support xAPIC and x2APIC virtualization, which requires changing x2APIC bit VMCB and MSR intercepton for x2APIC MSRs. Therefore, call avic_refresh_apicv_exec_ctrl() to refresh configuration accordingly. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-10-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Adding support for configuring x2APIC MSRs interceptionSuravee Suthikulpanit2-2/+27
When enabling x2APIC virtualization (x2AVIC), the interception of x2APIC MSRs must be disabled to let the hardware virtualize guest MSR accesses. Current implementation keeps track of list of MSR interception state in the svm_direct_access_msrs array. Therefore, extends the array to include x2APIC MSRs. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-8-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Do not support updating APIC ID when in x2APIC modeSuravee Suthikulpanit1-1/+10
In X2APIC mode, the Logical Destination Register is read-only, which provides a fixed mapping between the logical and physical APIC IDs. Therefore, there is no Logical APIC ID table in X2AVIC and the processor uses the X2APIC ID in the backing page to create a vCPU’s logical ID. In addition, KVM does not support updating APIC ID in x2APIC mode, which means AVIC does not need to handle this case. Therefore, check x2APIC mode when handling physical and logical APIC ID update, and when invalidating logical APIC ID table. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-7-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC IDSuravee Suthikulpanit1-2/+8
In x2APIC mode, ICRH contains 32-bit destination APIC ID. So, update the avic_kick_target_vcpus() accordingly. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-6-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Update max number of vCPUs supported for x2AVIC modeSuravee Suthikulpanit1-3/+5
xAVIC and x2AVIC modes can support diffferent number of vcpus. Update existing logics to support each mode accordingly. Also, modify the maximum physical APIC ID for AVIC to 255 to reflect the actual value supported by the architecture. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-5-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SVM: Detect X2APIC virtualization (x2AVIC) supportSuravee Suthikulpanit3-13/+56
Add CPUID check for the x2APIC virtualization (x2AVIC) feature. If available, the SVM driver can support both AVIC and x2AVIC modes when load the kvm_amd driver with avic=1. The operating mode will be determined at runtime depending on the guest APIC mode. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-4-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: lapic: Rename [GET/SET]_APIC_DEST_FIELD to [GET/SET]_XAPIC_DEST_FIELDSuravee Suthikulpanit1-2/+2
To signify that the macros only support 8-bit xAPIC destination ID. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220519102709.24125-3-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: SEV: Init target VMCBs in sev_migrate_fromPeter Gonda3-33/+48
The target VMCBs during an intra-host migration need to correctly setup for running SEV and SEV-ES guests. Add sev_init_vmcb() function and make sev_es_init_vmcb() static. sev_init_vmcb() uses the now private function to init SEV-ES guests VMCBs when needed. Fixes: 0b020f5af092 ("KVM: SEV: Add support for SEV-ES intra host migration") Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration") Signed-off-by: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20220623173406.744645-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user()Mingwei Zhang1-2/+2
Adding the accounting flag when allocating pages within the SEV function, since these memory pages should belong to individual VM. No functional change intended. Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20220623171858.2083637-1-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20Revert "KVM: x86: always allow host-initiated writes to PMU MSRs"Sean Christopherson1-1/+1
Revert the hack to allow host-initiated accesses to all "PMU" MSRs, as intel_is_valid_msr() returns true for _all_ MSRs, regardless of whether or not it has a snowball's chance in hell of actually being a PMU MSR. That mostly gets papered over by the actual get/set helpers only handling MSRs that they knows about, except there's the minor detail that kvm_pmu_{g,s}et_msr() eat reads and writes when the PMU is disabled. I.e. KVM will happy allow reads and writes to _any_ MSR if the PMU is disabled, either via module param or capability. This reverts commit d1c88a4020567ba4da52f778bcd9619d87e4ea75. Fixes: d1c88a402056 ("KVM: x86: always allow host-initiated writes to PMU MSRs") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220611005755.753273-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20Revert "KVM: x86/pmu: Accept 0 for absent PMU MSRs when host-initiated if ↵Sean Christopherson1-10/+1
!enable_pmu" Eating reads and writes to all "PMU" MSRs when there is no PMU is wildly broken as it results in allowing accesses to _any_ MSR on Intel CPUs as intel_is_valid_msr() returns true for all host_initiated accesses. A revert of commit d1c88a402056 ("KVM: x86: always allow host-initiated writes to PMU MSRs") will soon follow. This reverts commit 8e6a58e28b34e8d247e772159b8fa8f6bae39192. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220611005755.753273-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: x86: Move "apicv_active" into "struct kvm_lapic"Sean Christopherson1-2/+3
Move the per-vCPU apicv_active flag into KVM's local APIC instance. APICv is fully dependent on an in-kernel local APIC, but that's not at all clear when reading the current code due to the flag being stored in the generic kvm_vcpu_arch struct. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614230548.3852141-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: SVM: Drop unused AVIC / kvm_x86_ops declarationsSean Christopherson1-4/+0
Drop a handful of unused AVIC function declarations whose implementations were removed during the conversion to optional static calls. No functional change intended. Fixes: abb6d479e226 ("KVM: x86: make several APIC virtualization callbacks optional") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614230548.3852141-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-15KVM: X86/SVM: Use root_level in svm_load_mmu_pgd()Lai Jiangshan1-1/+1
Use root_level in svm_load_mmu_pg() rather that looking up the root level in vcpu->arch.mmu->root_role.level. svm_load_mmu_pgd() has only one caller, kvm_mmu_load_pgd(), which always passes vcpu->arch.mmu->root_role.level as root_level. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Message-Id: <20220605063417.308311-7-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-15KVM: SVM: Hide SEV migration lockdep goo behind CONFIG_PROVE_LOCKINGSean Christopherson1-10/+7
Wrap the manipulation of @role and the manual mutex_{release,acquire}() invocations in CONFIG_PROVE_LOCKING=y to squash a clang-15 warning. When building with -Wunused-but-set-parameter and CONFIG_DEBUG_LOCK_ALLOC=n, clang-15 seees there's no usage of @role in mutex_lock_killable_nested() and yells. PROVE_LOCKING selects DEBUG_LOCK_ALLOC, and the only reason KVM manipulates @role is to make PROVE_LOCKING happy. To avoid true ugliness, use "i" and "j" to detect the first pass in the loops; the "idx" field that's used by kvm_for_each_vcpu() is guaranteed to be '0' on the first pass as it's simply the first entry in the vCPUs XArray, which is fully KVM controlled. kvm_for_each_vcpu() passes '0' for xa_for_each_range()'s "start", and xa_for_each_range() will not enter the loop if there's no entry at '0'. Fixes: 0c2c7c069285 ("KVM: SEV: Mark nested locking of vcpu->lock") Reported-by: kernel test robot <lkp@intel.com> Cc: Peter Gonda <pgonda@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220613214237.2538266-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-15KVM: SEV: fix misplaced closing parenthesisPaolo Bonzini1-2/+2
This caused a warning on 32-bit systems, but undoubtedly would have acted funny on 64-bit as well. The fix was applied directly on merge in 5.19, see commit 24625f7d91fb ("Merge tag for-linus of git://git.kernel.org/pub/scm/virt/kvm/kvm"). Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-119/+103
Pull kvm fixes from Paolo Bonzini: "While last week's pull request contained miscellaneous fixes for x86, this one covers other architectures, selftests changes, and a bigger series for APIC virtualization bugs that were discovered during 5.20 development. The idea is to base 5.20 development for KVM on top of this tag. ARM64: - Properly reset the SVE/SME flags on vcpu load - Fix a vgic-v2 regression regarding accessing the pending state of a HW interrupt from userspace (and make the code common with vgic-v3) - Fix access to the idreg range for protected guests - Ignore 'kvm-arm.mode=protected' when using VHE - Return an error from kvm_arch_init_vm() on allocation failure - A bunch of small cleanups (comments, annotations, indentation) RISC-V: - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry x86-64: - Fix error in page tables with MKTME enabled - Dirty page tracking performance test extended to running a nested guest - Disable APICv/AVIC in cases that it cannot implement correctly" [ This merge also fixes a misplaced end parenthesis bug introduced in commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") pointed out by Sean Christopherson ] Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/ * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits) KVM: selftests: Restrict test region to 48-bit physical addresses when using nested KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 KVM: selftests: Clean up LIBKVM files in Makefile KVM: selftests: Link selftests directly with lib object files KVM: selftests: Drop unnecessary rule for STATIC_LIBS KVM: selftests: Add a helper to check EPT/VPID capabilities KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h KVM: selftests: Refactor nested_map() to specify target level KVM: selftests: Drop stale function parameter comment for nested_map() KVM: selftests: Add option to create 2M and 1G EPT mappings KVM: selftests: Replace x86_page_size with PG_LEVEL_XX KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking KVM: x86: disable preemption while updating apicv inhibition KVM: x86: SVM: fix avic_kick_target_vcpus_fast KVM: x86: SVM: remove avic's broken code that updated APIC ID KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base KVM: x86: document AVIC/APICv inhibit reasons KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs ...
2022-06-09Merge branch 'kvm-5.20-early'Paolo Bonzini4-105/+214
s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to show tests x86: * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Rewrite gfn-pfn cache refresh * Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit
2022-06-09KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSEPaolo Bonzini2-20/+23
Commit 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") introduced passthrough support for nested pause filtering, (when the host doesn't intercept PAUSE) (either disabled with kvm module param, or disabled with '-overcommit cpu-pm=on') Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards, the feature was exposed as supported by KVM cpuid unconditionally, thus if L1 could try to use it even when the L0 KVM can't really support it. In this case the fallback caused KVM to intercept each PAUSE instruction; in some cases, such intercept can slow down the nested guest so much that it can fail to boot. Instead, before the problematic commit KVM was already setting both thresholds to 0 in vmcb02, but after the first userspace VM exit shrink_ple_window was called and would reset the pause_filter_count to the default value. To fix this, change the fallback strategy - ignore the guest threshold values, but use/update the host threshold values unless the guest specifically requests disabling PAUSE filtering (either simple or advanced). Also fix a minor bug: on nested VM exit, when PAUSE filter counter were copied back to vmcb01, a dirty bit was not set. Thanks a lot to Suravee Suthikulpanit for debugging this! Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/putMaxim Levitsky3-27/+8
Now that these functions are always called with preemption disabled, remove the preempt_disable()/preempt_enable() pair inside them. No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: fix avic_kick_target_vcpus_fastMaxim Levitsky1-36/+69
There are two issues in avic_kick_target_vcpus_fast 1. It is legal to issue an IPI request with APIC_DEST_NOSHORT and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic), which must be treated as a broadcast destination. Fix this by explicitly checking for it. Also don’t use ‘index’ in this case as it gives no new information. 2. It is legal to issue a logical IPI request to more than one target. Index field only provides index in physical id table of first such target and therefore can't be used before we are sure that only a single target was addressed. Instead, parse the ICRL/ICRH, double check that a unicast interrupt was requested, and use that info to figure out the physical id of the target vCPU. At that point there is no need to use the index field as well. In addition to fixing the above issues, also skip the call to kvm_apic_match_dest. It is possible to do this now, because now as long as AVIC is not inhibited, it is guaranteed that none of the vCPUs changed their apic id from its default value. This fixes boot of windows guest with AVIC enabled because it uses IPI with 0xFF destination and no destination shorthand. Fixes: 7223fd2d5338 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: remove avic's broken code that updated APIC IDMaxim Levitsky1-35/+0
AVIC is now inhibited if the guest changes the apic id, and therefore this code is no longer needed. There are several ways this code was broken, including: 1. a vCPU was only allowed to change its apic id to an apic id of an existing vCPU. 2. After such change, the vCPU whose apic id entry was overwritten, could not correctly change its own apic id, because its own entry is already overwritten. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC baseMaxim Levitsky1-1/+3
Neither of these settings should be changed by the guest and it is a burden to support it in the acceleration code, so just inhibit this code instead. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09Merge tag 'kvm-riscv-fixes-5.19-1' of https://github.com/kvm-riscv/linux ↵Paolo Bonzini3-17/+17
into HEAD KVM/riscv fixes for 5.19, take #1 - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry
2022-06-08KVM: x86/pmu: Accept 0 for absent PMU MSRs when host-initiated if !enable_pmuLike Xu1-1/+10
Whenever an MSR is part of KVM_GET_MSR_INDEX_LIST, as is the case for MSR_K7_EVNTSEL0 or MSR_F15H_PERF_CTL0, it has to be always retrievable and settable with KVM_GET_MSR and KVM_SET_MSR. Accept a zero value for these MSRs to obey the contract. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220601031925.59693-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-15/+25
Pull KVM fixes from Paolo Bonzini: - syzkaller NULL pointer dereference - TDP MMU performance issue with disabling dirty logging - 5.14 regression with SVM TSC scaling - indefinite stall on applying live patches - unstable selftest - memory leak from wrong copy-and-paste - missed PV TLB flush when racing with emulation * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: do not report a vCPU as preempted outside instruction boundaries KVM: x86: do not set st->preempted when going back to user space KVM: SVM: fix tsc scaling cache logic KVM: selftests: Make hyperv_clock selftest more stable KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm() KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots() entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set KVM: Don't null dereference ops->destroy
2022-06-08KVM: x86: Introduce "struct kvm_caps" to track misc caps/settingsSean Christopherson2-8/+9
Add kvm_caps to hold a variety of capabilites and defaults that aren't handled by kvm_cpu_caps because they aren't CPUID bits in order to reduce the amount of boilerplate code required to add a new feature. The vast majority (all?) of the caps interact with vendor code and are written only during initialization, i.e. should be tagged __read_mostly, declared extern in x86.h, and exported. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220524135624.22988-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Drop amd_event_mapping[] in the KVM contextLike Xu1-53/+3
All gp or fixed counters have been reprogrammed using PERF_TYPE_RAW, which means that the table that maps perf_hw_id to event select values is no longer useful, at least for AMD. For Intel, the logic to check if the pmu event reported by Intel cpuid is not available is still required, in which case pmc_perf_hw_id() could be renamed to hw_event_is_unavail() and a bool value is returned to replace the semantics of "PERF_COUNT_HW_MAX+1". Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220518132512.37864-12-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Use only the uniform interface reprogram_counter()Paolo Bonzini1-1/+1
Since reprogram_counter(), reprogram_{gp, fixed}_counter() currently have the same incoming parameter "struct kvm_pmc *pmc", the callers can simplify the conetxt by using uniformly exported interface, which makes reprogram_ {gp, fixed}_counter() static and eliminates EXPORT_SYMBOL_GPL. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220518132512.37864-8-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()Like Xu1-2/+4
Because inside reprogram_gp_counter() it is bound to assign the requested eventel to pmc->eventsel, this assignment step can be moved forward, thus simplifying the passing of parameters to "struct kvm_pmc *pmc" only. No functional change intended. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220518132512.37864-6-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86: always allow host-initiated writes to PMU MSRsPaolo Bonzini1-1/+1
Whenever an MSR is part of KVM_GET_MSR_INDEX_LIST, it has to be always retrievable and settable with KVM_GET_MSR and KVM_SET_MSR. Accept the PMU MSRs unconditionally in intel_is_valid_msr, if the access was host-initiated. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: nSVM: Transparently handle L1 -> L2 NMI re-injectionMaciej S. Szmigiero3-0/+20
A NMI that L1 wants to inject into its L2 should be directly re-injected, without causing L0 side effects like engaging NMI blocking for L1. It's also worth noting that in this case it is L1 responsibility to track the NMI window status for its L2 guest. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <f894d13501cd48157b3069a4b4c7369575ddb60e.1651440202.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>