path: root/arch/x86/kvm/svm
Age | Commit message | Author | Files | Lines
2023-08-02KVM: SVM: Use svm_get_lbr_vmcb() helper to handle writes to DEBUGCTLSean Christopherson1-6/+1
Use the recently introduced svm_get_lbr_vmcb() instead of an open coded equivalent to retrieve the target VMCB when emulating writes to MSR_IA32_DEBUGCTLMSR. No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
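For reference, a minimal sketch of the helper and the write path it simplifies (the helper body is paraphrased from the changelog's description, not quoted from the commit):

	static struct vmcb *svm_get_lbr_vmcb(struct vcpu_svm *svm)
	{
		/*
		 * LBR MSRs live in vmcb02 when LBR virtualization is enabled
		 * for the current VMCB, otherwise they stay in vmcb01.
		 */
		return svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK ?
		       svm->vmcb : svm->vmcb01.ptr;
	}

	case MSR_IA32_DEBUGCTLMSR:
		svm_get_lbr_vmcb(svm)->save.dbgctl = data;
		svm_update_lbrv(vcpu);
		break;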
2023-08-02KVM: SVM: Clean up handling of LBR virtualization enabledSean Christopherson1-9/+4
Clean up the enable_lbrv computation in svm_update_lbrv() to consolidate the logic for computing enable_lbrv into a single statement, and to remove the coding style violations (lack of curly braces on nested if). No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
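A sketch of the consolidated statement (the exact field/flag names are an assumption based on the surrounding commits):

	bool enable_lbrv = (svm_get_lbr_vmcb(svm)->save.dbgctl & DEBUGCTLMSR_LBR) ||
			   (is_guest_mode(vcpu) && svm->lbrv_enabled &&
			    (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));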
2023-08-02KVM: SVM: Fix dead KVM_BUG() code in LBR MSR virtualizationSean Christopherson1-29/+16
Refactor KVM's handling of LBR MSRs on SVM to avoid a second layer of case statements, and thus eliminate a dead KVM_BUG() call, which (a) will never be hit in the current code base and (b) if a future commit breaks things, will never fire as KVM passes "false" instead of "true" or '1' for the KVM_BUG() condition. Reported-by: Michal Luczaj <[email protected]> Cc: Yuan Yao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-07-29KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalidSean Christopherson1-0/+6
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid, e.g. due to setting bits 63:32, illegal combinations, or to a value that isn't allowed in VMX (non-)root mode. The VMX checks in particular are "fun" as failure to disallow Real Mode for an L2 that is configured with unrestricted guest disabled, when KVM itself has unrestricted guest enabled, will result in KVM forcing VM86 mode to virtual Real Mode for L2, but then fail to unwind the related metadata when synthesizing a nested VM-Exit back to L1 (which has unrestricted guest enabled). Opportunistically fix a benign typo in the prototype for is_valid_cr4(). Cc: [email protected] Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
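A sketch of the new gate in the KVM_SET_SREGS{2} path (the helper name matches the changelog's description of CR0 validity checks; its placement here is hedged):

	/* Reject obviously-broken CR0 values before touching vCPU state. */
	if (!kvm_is_valid_cr0(vcpu, sregs->cr0))
		return -EINVAL;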
2023-07-29Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"Sean Christopherson1-8/+2
Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows dereferencing memslots during WRMSR emulation, drop the requirement that "next RIP" is valid. In hindsight, acquiring kvm->srcu would have been a better fix than avoiding the fastpath, but at the time it was thought that accessing SRCU-protected data in the fastpath was a one-off edge case. This reverts commit 5c30e8101e8d5d020b1d7119117889756a6ed713. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-07-28KVM: SVM: Don't try to pointlessly single-step SEV-ES guests for NMI windowSean Christopherson1-0/+13
Bail early from svm_enable_nmi_window() for SEV-ES guests without trying to enable single-step of the guest, as single-stepping an SEV-ES guest is impossible and the guest is responsible for *telling* KVM when it is ready for a new NMI to be injected. Functionally, setting TF and RF in svm->vmcb->save.rflags is benign as the field is ignored by hardware, but it's all kinds of confusing. Signed-off-by: Alexey Kardashevskiy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
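A sketch of the early bail (predicate names as used elsewhere in this log):

	static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
	{
		struct vcpu_svm *svm = to_svm(vcpu);

		/*
		 * An SEV-ES guest tells KVM when it is ready for another NMI
		 * via #VMGEXIT(NMI complete); single-stepping it is impossible
		 * as its register state lives in the encrypted VMSA.
		 */
		if (sev_es_guest(vcpu->kvm))
			return;

		/* ... set TF/RF to single-step non-SEV-ES guests ... */
	}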
2023-07-28KVM: SVM: Don't defer NMI unblocking until next exit for SEV-ES guestsSean Christopherson2-6/+9
Immediately mark NMIs as unmasked in response to #VMGEXIT(NMI complete) instead of setting awaiting_iret_completion and waiting until the *next* VM-Exit to unmask NMIs. The whole point of "NMI complete" is that the guest is responsible for telling the hypervisor when it's safe to inject an NMI, i.e. there's no need to wait. And because there's no IRET to single-step, the next VM-Exit could be a long time coming, i.e. KVM could incorrectly hold an NMI pending for far longer than what is required and expected. Opportunistically fix a stale reference to HF_IRET_MASK. Fixes: 916b54a7688b ("KVM: x86: Move HF_NMI_MASK and HF_IRET_MASK into "struct vcpu_svm"") Fixes: 4444dfe4050b ("KVM: SVM: Add NMI support for an SEV-ES guest") Cc: Tom Lendacky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
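A sketch of the #VMGEXIT handling after this change (exit-code and field names follow the changelog and neighboring commits; exact statement order is hedged):

	case SVM_VMGEXIT_NMI_COMPLETE:
		/*
		 * Unmask NMIs immediately; the guest just declared it is
		 * ready for another NMI, and there is no IRET to single-step.
		 */
		svm->nmi_masked = false;
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		ret = 1;
		break;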
2023-07-28KVM: SEV-ES: Eliminate #DB intercept when DebugSwap enabledAlexey Kardashevskiy1-0/+11
Disable #DB for SEV-ES guests when DebugSwap is enabled. There is no point in such an intercept as KVM does not allow guest debug for SEV-ES guests. Signed-off-by: Alexey Kardashevskiy <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: add comment as to why KVM disables #DB intercept iff DebugSwap=1] Signed-off-by: Sean Christopherson <[email protected]>
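A sketch of the change in sev_es_init_vmcb() (the module-scoped flag name follows the DebugSwap commit below; this is a sketch, not the exact diff):

	/*
	 * Guest debug is unsupported for SEV-ES, and with DebugSwap the CPU
	 * swaps DR state itself, so the #DB intercept has nothing left to do;
	 * drop it to avoid pointless #VMGEXITs.
	 */
	if (sev_es_debug_swap_enabled)
		clr_exception_intercept(svm, DB_VECTOR);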
2023-07-28KVM: SEV: Enable data breakpoints in SEV-ESAlexey Kardashevskiy1-3/+33
Add support for "DebugSwap for SEV-ES guests", which provides support for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e. allows KVM to expose debug capabilities to SEV-ES guests. Without DebugSwap support, the CPU doesn't save/load most _guest_ debug registers (except DR6/7), and KVM cannot manually context switch guest DRs due to the VMSA being encrypted. Enable DebugSwap if and only if the CPU also supports NoNestedDataBp, which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when vectoring a #DB. Without NoNestedDataBp, a malicious guest can DoS the host by putting the CPU into an infinite loop of vectoring #DBs (see https://bugzilla.redhat.com/show_bug.cgi?id=1278496). Set the features bit in sev_es_sync_vmsa(), which is the last point at which the VMSA is not yet encrypted, as sev_(es_)init_vmcb() (where most of the init happens) is called not only when the vCPU is initialised but also on intrahost migration, when the VMSA is already encrypted. Eliminate DR7 intercepts as KVM can't modify guest DR7, and intercepting DR7 would completely defeat the purpose of enabling DebugSwap. Make X86_FEATURE_DEBUG_SWAP appear in /proc/cpuinfo (by not giving it an empty "" name string) to let the operator know if the VM can be debugged. Signed-off-by: Alexey Kardashevskiy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
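A sketch of the VMSA feature-bit setup described above (the SEV_FEATURES bit name follows the APM; this is a sketch of the intent, not the full diff):

	/*
	 * sev_es_sync_vmsa() is the last point where the VMSA is still
	 * unencrypted, so the DebugSwap feature bit must be set here.
	 */
	if (sev_es_debug_swap_enabled)
		save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;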
2023-07-28KVM: SVM/SEV/SEV-ES: Rework interceptsAlexey Kardashevskiy2-23/+20
Currently SVM setup is done sequentially in init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb() and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions is the DR intercepts, which are set up for SEV-ES before sev_es_init_vmcb() runs. Move the SEV-ES intercept setup to sev_es_init_vmcb(). From now on set_dr_intercepts()/clr_dr_intercepts() handle SVM/SEV only. No functional change intended. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Santosh Shukla <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: drop comment about intercepting DR7] Signed-off-by: Sean Christopherson <[email protected]>
2023-07-28KVM: SEV-ES: explicitly disable debugAlexey Kardashevskiy2-1/+13
SVM/SEV enable debug register intercepts to skip swapping DRs on entering/exiting the guest. When the guest is in control of debug registers (vcpu->guest_debug == 0), there is an optimisation to reduce the number of context switches: intercepts are cleared and the KVM_DEBUGREG_WONT_EXIT flag is set to tell KVM to do swapping on guest enter/exit. The same code also executes for SEV-ES, however it has no effect as:
 - it always takes the (vcpu->guest_debug == 0) branch;
 - KVM_DEBUGREG_WONT_EXIT is set but the DR7 intercept is not cleared;
 - vcpu_enter_guest() writes DRs, but VMRUN for SEV-ES swaps them with the values from the _encrypted_ VMSA.
Be explicit about SEV-ES not supporting debug:
 - return right away from dr_interception() and skip unnecessary processing;
 - return an error right away from the KVM_SEV_LAUNCH_UPDATE_VMSA handler if debugging was already enabled.
KVM_SET_GUEST_DEBUG already fails once KVM_SEV_LAUNCH_UPDATE_VMSA has finished, due to vcpu->arch.guest_state_protected being set to true. Add a WARN_ON to kvm_x86::sync_dirty_debug_regs() (which saves guest DRs on guest exit) to signify that SEV-ES won't hit that path. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
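A sketch of the early return described above (the comment condenses the rationale from the changelog):

	static int dr_interception(struct kvm_vcpu *vcpu)
	{
		/*
		 * SEV-ES: guest debug is unsupported and the real DRs live in
		 * the encrypted VMSA, so there is nothing to emulate here.
		 */
		if (sev_es_guest(vcpu->kvm))
			return 1;

		/* ... normal DR read/write emulation for SVM/SEV guests ... */
	}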
2023-07-28KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment about swap typesSean Christopherson1-10/+15
Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM needs to save a seemingly random subset of host state, and to provide a decoder for the APM's Type-A/B/C terminology. Signed-off-by: Alexey Kardashevskiy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-07-28KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEVAlexey Kardashevskiy2-6/+8
Currently SVM setup is done sequentially in init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb() and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions is the #GP intercept, which init_vmcb() skips setting for SEV guests and which sev_es_init_vmcb() then needlessly clears. Remove the SEV check from init_vmcb(). Clear the #GP intercept in sev_init_vmcb(). SEV-ES will use the SEV setting. No functional change intended. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Carlos Bilbao <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Reviewed-by: Santosh Shukla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-07-28KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the headerAlexey Kardashevskiy2-42/+42
Static functions set_dr_intercepts() and clr_dr_intercepts() are only called from SVM so move them to .c. No functional change intended. Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Carlos Bilbao <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Reviewed-by: Santosh Shukla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-07-27x86/srso: Add IBPB on VMEXITBorislav Petkov (AMD)2-1/+6
Add the option to flush IBPB only on VMEXIT in order to protect against malicious guests in setups where the rest of the software running on the hypervisor is otherwise trusted. Signed-off-by: Borislav Petkov (AMD) <[email protected]>
2023-07-01Merge tag 'kvm-x86-svm-6.5' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini3-35/+13
KVM SVM changes for 6.5:
 - Drop manual TR/TSS load after VM-Exit now that KVM uses VMLOAD for host state
 - Fix a not-yet-problematic missing call to trace_kvm_exit() for VM-Exits that are handled in the fastpath
 - Print more descriptive information about the status of SEV and SEV-ES during module load
 - Assert that misc_cg_set_capacity() doesn't fail to avoid should-be-impossible memory leaks
2023-07-01Merge tag 'kvm-x86-pmu-6.5' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2-22/+65
KVM x86/pmu changes for 6.5: - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes included along the way
2023-07-01Merge tag 'kvm-x86-misc-6.5' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-4/+5
KVM x86 changes for 6.5:
 * Move handling of PAT out of MTRR code and dedup SVM+VMX code
 * Fix output of PIC poll command emulation when there's an interrupt
 * Add a maintainer's handbook to document KVM x86 processes, preferred coding style, testing expectations, etc.
 * Misc cleanups
2023-06-13KVM: SVM: WARN, but continue, if misc_cg_set_capacity() failsSean Christopherson1-6/+2
WARN and continue if misc_cg_set_capacity() fails, as the only scenario in which it can fail is if the specified resource is invalid, which should never happen when CONFIG_KVM_AMD_SEV=y. Deliberately not bailing "fixes" a theoretical bug where KVM would leak the ASID bitmaps on failure, which again can't happen. If the impossible should happen, the end result is effectively the same with respect to SEV and SEV-ES (they are unusable), while continuing on has the advantage of letting KVM load, i.e. userspace can still run non-SEV guests. Reported-by: Alexander Mikhalitsyn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
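A sketch of the resulting setup code (resource enum and counter names as used by the misc cgroup controller and sev.c):

	/*
	 * misc_cg_set_capacity() can only fail if the resource type is
	 * invalid, i.e. only if there's a KVM bug; WARN and keep going so
	 * that KVM still loads and non-SEV guests remain usable.
	 */
	WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count));
	WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));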
2023-06-06KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022Like Xu1-0/+4
CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance monitoring features for AMD processors. Bit 0 of EAX indicates support for Performance Monitoring Version 2 (PerfMonV2) features. If found to be set during PMU initialization, the EBX bits of the same CPUID function can be used to determine the number of available PMCs for different PMU types. Expose the relevant bits via KVM_GET_SUPPORTED_CPUID so that guests can make use of the PerfMonV2 features. Co-developed-by: Sandipan Das <[email protected]> Signed-off-by: Sandipan Das <[email protected]> Signed-off-by: Like Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
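A simplified sketch of the leaf's handling in KVM's supported-CPUID code (the union type exists in perf_event.h; the EAX handling here is condensed and hedged):

	case 0x80000022: {
		union cpuid_0x80000022_ebx ebx = { };

		entry->ecx = entry->edx = 0;
		if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) {
			entry->eax = entry->ebx = 0;
			break;
		}

		entry->eax = BIT(0);	/* EAX[0] = PerfMonV2 supported */
		ebx.split.num_core_pmc = kvm_pmu_cap.num_counters_gp;
		entry->ebx = ebx.full;
		break;
	}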
2023-06-06KVM: x86/svm/pmu: Add AMD PerfMonV2 supportLike Xu1-10/+45
If AMD Performance Monitoring Version 2 (PerfMonV2) is detected by the guest, it can use a new scheme to manage the Core PMCs using the new global control and status registers. In addition to benefiting from the PerfMonV2 functionality in the same way as the host (higher precision), the guest also can reduce the number of VM-Exits by lowering the total number of MSR accesses. In terms of implementation details, amd_is_valid_msr() is resurrected since the three newly added MSRs could not be mapped to one vPMC. The possibility of emulating PerfMonV2 on the mainframe has also been eliminated for reasons of precision. Co-developed-by: Sandipan Das <[email protected]> Signed-off-by: Sandipan Das <[email protected]> Signed-off-by: Like Xu <[email protected]> [sean: drop "Based on the observed HW." comments] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
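A sketch of the new global-control MSR plumbing on the read side (the MSR names are the architectural ones from msr-index.h; handler placement is hedged):

	switch (msr_info->index) {
	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
		msr_info->data = pmu->global_status;
		break;
	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
		msr_info->data = pmu->global_ctrl;
		break;
	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
		/* Write-only "clear" register; reads back as zero. */
		msr_info->data = 0;
		break;
	}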
2023-06-06KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_capLike Xu1-0/+3
Cap the number of general purpose counters enumerated on AMD to what KVM actually supports, i.e. don't allow userspace to coerce KVM into thinking there are more counters than actually exist, e.g. by enumerating X86_FEATURE_PERFCTR_CORE in guest CPUID when it's not supported. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Like Xu <[email protected]> [sean: massage changelog] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
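A sketch of the clamp in the AMD vPMU refresh path:

	/* Don't let guest CPUID enumerate more counters than KVM supports. */
	pmu->nr_arch_gp_counters = min_t(unsigned int, pmu->nr_arch_gp_counters,
					 kvm_pmu_cap.num_counters_gp);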
2023-06-06KVM: x86/pmu: Advertise PERFCTR_CORE iff the min nr of counters is metLike Xu1-3/+12
Enable and advertise PERFCTR_CORE if and only if the minimum number of required counters is available, i.e. don't advertise PERFCTR_CORE if perf says there are fewer than six general purpose counters. Opportunistically, use kvm_cpu_cap_check_and_set() instead of open coding the check for host support. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Like Xu <[email protected]> [sean: massage shortlog and changelog] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-06KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't metLike Xu1-0/+1
Disable PMU support when running on AMD and perf reports fewer than four general purpose counters. All AMD PMUs must define at least four counters due to AMD's legacy architecture hardcoding the number of counters without providing a way to enumerate the number of counters to software, e.g. from AMD's APM: The legacy architecture defines four performance counters (PerfCtrn) and corresponding event-select registers (PerfEvtSeln). Virtualizing fewer than four counters can lead to guest instability as software expects four counters to be available. Rather than bleed AMD details into the common code, just define a const unsigned int and provide a convenient location to document why Intel and AMD have different mins (in particular, AMD's lack of any way to enumerate fewer than four counters to the guest). Keep the minimum number of counters at Intel at one, even though old P6 and Core Solo/Duo processors effectively require a minimum of two counters. KVM can, and more importantly has up until this point, supported a vPMU so long as the CPU has at least one counter. Perf's support for P6/Core CPUs does require two counters, but perf will happily chug along with a single counter when running on a modern CPU. Cc: Jim Mattson <[email protected]> Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Like Xu <[email protected]> [sean: set Intel min to '1', not '2'] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
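A sketch of the per-vendor minimum and the gate it feeds (the kvm_pmu_ops field name follows the changelog; the surrounding init code is hedged):

	/* In AMD's kvm_pmu_ops: the legacy PMU hardcodes four counters. */
	.MIN_NR_GP_COUNTERS = AMD64_NUM_COUNTERS,

	/* During PMU capability init (sketch): */
	if (enable_pmu && kvm_pmu_cap.num_counters_gp < pmu_ops->MIN_NR_GP_COUNTERS)
		enable_pmu = false;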
2023-06-06KVM: x86/pmu: Provide Intel PMU's pmc_is_enabled() as generic x86 codeLike Xu1-9/+0
Move the Intel PMU implementation of pmc_is_enabled() to common x86 code as pmc_is_globally_enabled(), and drop AMD's implementation. AMD PMU currently supports only v1, and thus doesn't support PERF_GLOBAL_CONTROL, so the semantics for AMD are unchanged. And when support for AMD PMU v2 comes along, the common behavior will also Just Work. Signed-off-by: Like Xu <[email protected]> Co-developed-by: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
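The moved helper, roughly as described (a sketch; the version check reflects "v1 has no PERF_GLOBAL_CONTROL"):

	static inline bool pmc_is_globally_enabled(struct kvm_pmc *pmc)
	{
		struct kvm_pmu *pmu = pmc_to_pmu(pmc);

		/* PMU v1 has no PERF_GLOBAL_CTRL; all counters are enabled. */
		if (pmu->version < 2)
			return true;

		return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
	}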
2023-06-06KVM: x86: Clean up: remove redundant bool conversionsMichal Luczaj1-1/+1
As test_bit() returns bool, explicitly converting result to bool is unnecessary. Get rid of '!!'. No functional change intended. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Michal Luczaj <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-06KVM: SVM: enhance info printk's in SEV initAlexander Mikhalitsyn1-2/+9
Let's print available ASID ranges for SEV/SEV-ES guests. This information can be useful for a system administrator to debug why SEV/SEV-ES failed to enable. There are a few possible reasons.
SEV:
 - NPT is disabled (module parameter)
 - CPU lacks some features (sev, decodeassists)
 - Maximum SEV ASID is 0
SEV-ES:
 - mmio_caching is disabled (module parameter)
 - CPU lacks sev_es feature
 - Minimum SEV ASID value is 1 (can be adjusted in BIOS/UEFI)
Cc: Sean Christopherson <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Stéphane Graber <[email protected]> Cc: [email protected] Cc: [email protected] Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Alexander Mikhalitsyn <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: print '0' for min SEV-ES ASID if there are no available ASIDs] Signed-off-by: Sean Christopherson <[email protected]>
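A sketch of the resulting messages in sev_hardware_setup() (format strings hedged; the SEV-ES min follows Sean's note above about printing '0' when no ASIDs are available):

	if (boot_cpu_has(X86_FEATURE_SEV))
		pr_info("SEV %s (ASIDs %u - %u)\n",
			sev_supported ? "enabled" : "disabled",
			min_sev_asid, max_sev_asid);
	if (boot_cpu_has(X86_FEATURE_SEV_ES))
		pr_info("SEV-ES %s (ASIDs %u - %u)\n",
			sev_es_supported ? "enabled" : "disabled",
			min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);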
2023-06-02KVM: SVM: Invoke trace_kvm_exit() for fastpath VM-ExitsSean Christopherson1-2/+2
Move SVM's call to trace_kvm_exit() from the "slow" VM-Exit handler to svm_vcpu_run() so that KVM traces fastpath VM-Exits that re-enter the guest without bouncing through the slow path. This bug is benign in the current code base as KVM doesn't currently support any such exits on SVM. Fixes: a9ab13ff6e84 ("KVM: X86: Improve latency for single target IPI fastpath") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-02KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASKMaciej S. Szmigiero1-1/+1
While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware I noticed that with vCPU count large enough (> 16) they sometimes froze at boot. With vCPU count of 64 they never booted successfully - suggesting some kind of a race condition. Since adding "vnmi=0" module parameter made these guests boot successfully it was clear that the problem is most likely (v)NMI-related. Running kvm-unit-tests quickly showed failing NMI-related tests cases, like "multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests and the NMI parts of eventinj test. The issue was that once one NMI was being serviced no other NMI was allowed to be set pending (NMI limit = 0), which was traced to svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather than for the "NMI pending" flag. Fix this by testing for the right flag in svm_is_vnmi_pending(). Once this is done, the NMI-related kvm-unit-tests pass successfully and the Windows guest no longer freezes at boot. Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI") Signed-off-by: Maciej S. Szmigiero <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com Signed-off-by: Sean Christopherson <[email protected]>
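The one-line fix, in context (a sketch):

	static bool svm_is_vnmi_pending(struct kvm_vcpu *vcpu)
	{
		struct vcpu_svm *svm = to_svm(vcpu);

		if (!is_vnmi_enabled(svm))
			return false;

		/* Was V_NMI_BLOCKING_MASK, i.e. tested the wrong int_ctl flag. */
		return !!(svm->vmcb->control.int_ctl & V_NMI_PENDING_MASK);
	}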
2023-06-01KVM: x86: Move common handling of PAT MSR writes to kvm_set_msr_common()Sean Christopherson1-3/+4
Move the common check-and-set handling of PAT MSR writes out of vendor code and into kvm_set_msr_common(). This aligns writes with reads, which are already handled in common code, i.e. makes the handling of reads and writes symmetrical in common code. Alternatively, the common handling in kvm_get_msr_common() could be moved to vendor code, but duplicating code is generally undesirable (even though the duplicated code is trivial in this case), and guest writes to PAT should be rare, i.e. the overhead of the extra function call is a non-issue in practice. Suggested-by: Kai Huang <[email protected]> Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
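A sketch of the consolidated write path in kvm_set_msr_common():

	case MSR_IA32_CR_PAT:
		if (!kvm_pat_valid(data))
			return 1;

		vcpu->arch.pat = data;
		break;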
2023-06-01KVM: SVM: Use kvm_pat_valid() directly instead of kvm_mtrr_valid()Ke Guo1-1/+1
Use kvm_pat_valid() directly instead of bouncing through kvm_mtrr_valid(). The PAT is not an MTRR, and kvm_mtrr_valid() just redirects to kvm_pat_valid(), i.e. is exempt from KVM's "zap SPTEs" logic that's needed to honor guest MTRRs when the VM has a passthrough device with non-coherent DMA (KVM does NOT set "ignore guest PAT" in this case, and so enables hardware virtualization of the guest's PAT, i.e. doesn't need to manually emulate the PAT memtype). Signed-off-by: Ke Guo <[email protected]> [sean: massage changelog] Reviewed-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-01KVM: SVM: Remove TSS reloading code after VMEXITMingwei Zhang2-25/+0
Remove the dedicated post-VMEXIT TSS reloading code now that KVM uses VMLOAD to load host segment state, which includes TSS state. Fixes: e79b91bb3c91 ("KVM: SVM: use vmsave/vmload for saving/restoring additional host state") Reported-by: Venkatesh Srinivas <[email protected]> Suggested-by: Jim Mattson <[email protected]> Signed-off-by: Mingwei Zhang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: massage changelog] Signed-off-by: Sean Christopherson <[email protected]>
2023-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-76/+252
Pull kvm updates from Paolo Bonzini:
"s390:
 - More phys_to_virt conversions
 - Improvement of AP management for VSIE (nested virtualization)
ARM64:
 - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever.
 - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel.
 - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top.
 - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed.
 - The usual selftest fixes and improvements.
x86:
 - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls)
 - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool
 - Move AMD_PSFD to cpufeatures.h and purge KVM's definition
 - Avoid unnecessary writes+flushes when the guest is only adding new PTEs
 - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations when emulating invalidations
 - Clean up the range-based flushing APIs
 - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry
 - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork()
 - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware
 - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features
 - Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
 - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest
 - AMD SVM:
   - Add support for virtual NMIs
   - Fixes for edge cases related to virtual interrupts
 - Intel AMX:
   - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl()
 - Fix a bug in emulation of ENCLS in compatibility mode
 - Allow emulation of NOP and PAUSE for L2
 - AMX selftests improvements
 - Misc cleanups
MIPS:
 - Constify MIPS's internal callbacks (a leftover from the hardware enabling rework that landed in 6.3)
Generic:
 - Drop unnecessary casts from "void *" throughout kvm_main.c
 - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct size by 8 bytes on 64-bit kernels by utilizing a padding hole
Documentation:
 - Fix goof introduced by the conversion to rST"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
  KVM: s390: pci: fix virtual-physical confusion on module unload/load
  KVM: s390: vsie: clarifications on setting the APCB
  KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
  KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
  KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
  KVM: selftests: Test the PMU event "Instructions retired"
  KVM: selftests: Copy full counter values from guest in PMU event filter test
  KVM: selftests: Use error codes to signal errors in PMU event filter test
  KVM: selftests: Print detailed info in PMU event filter asserts
  KVM: selftests: Add helpers for PMC asserts in PMU event filter test
  KVM: selftests: Add a common helper for the PMU event filter guest code
  KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
  KVM: arm64: vhe: Drop extra isb() on guest exit
  KVM: arm64: vhe: Synchronise with page table walker on MMU update
  KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
  KVM: arm64: nvhe: Synchronise with page table walker on TLBI
  KVM: arm64: Handle 32bit CNTPCTSS traps
  KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
  KVM: arm64: vgic: Don't acquire its_lock before config_lock
  KVM: selftests: Add test to verify KVM's supported XCR0
  ...
2023-04-28Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+4
Pull SMP cross-CPU function-call updates from Ingo Molnar:
 - Remove diagnostics and adjust config for CSD lock diagnostics
 - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available.
* tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  trace,smp: Trace all smp_function_call*() invocations
  trace: Add trace_ipi_send_cpu()
  sched, smp: Trace smp callback causing an IPI
  smp: reword smp call IPI comment
  treewide: Trace IPIs sent via smp_send_reschedule()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  trace: Add trace_ipi_send_cpumask()
  kernel/smp: Make csdlock_debug= resettable
  locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
  locking/csd_lock: Remove added data from CSD lock debugging
  locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-26Merge tag 'kvm-x86-svm-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini3-44/+229
KVM SVM changes for 6.4:
 - Add support for virtual NMIs
 - Fixes for edge cases related to virtual interrupts
2023-04-26Merge tag 'kvm-x86-pmu-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2-2/+2
KVM x86 PMU changes for 6.4:
 - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware
 - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
 - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest
 - Misc cleanups and fixes
2023-04-26Merge tag 'kvm-x86-mmu-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-3/+2
KVM x86 MMU changes for 6.4:
 - Tweak FNAME(sync_spte) to avoid unnecessary writes+flushes when the guest is only adding new PTEs
 - Overhaul .sync_page() and .invlpg() to share the .sync_page() implementation, i.e. utilize .sync_page()'s optimizations when emulating invalidations
 - Clean up the range-based flushing APIs
 - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry
 - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork()
 - Misc cleanups
2023-04-26Merge tag 'kvm-x86-misc-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-4/+2
KVM x86 changes for 6.4:
 - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls)
 - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool
 - Move AMD_PSFD to cpufeatures.h and purge KVM's definition
 - Misc cleanups
2023-04-26Merge tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-0/+1
Pull crypto updates from Herbert Xu:
"API:
 - Total usage stats now include all that returned errors (instead of just some)
 - Remove maximum hash statesize limit
 - Add cloning support for hmac and unkeyed hashes
 - Demote BUG_ON in crypto_unregister_alg to a WARN_ON
Algorithms:
 - Use RIP-relative addressing on x86 to prepare for PIE build
 - Add accelerated AES/GCM stitched implementation on powerpc P10
 - Add some test vectors for cmac(camellia)
 - Remove failure case where jent is unavailable outside of FIPS mode in drbg
 - Add permanent and intermittent health error checks in jitter RNG
Drivers:
 - Add support for 402xx devices in qat
 - Add support for HiSTB TRNG
 - Fix hash concurrency issues in stm32
 - Add OP-TEE firmware support in caam"
* tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (139 commits)
  i2c: designware: Add doorbell support for Mendocino
  i2c: designware: Use PCI PSP driver for communication
  powerpc: Move Power10 feature PPC_MODULE_FEATURE_P10
  crypto: p10-aes-gcm - Remove POWER10_CPU dependency
  crypto: testmgr - Add some test vectors for cmac(camellia)
  crypto: cryptd - Add support for cloning hashes
  crypto: cryptd - Convert hash to use modern init_tfm/exit_tfm
  crypto: hmac - Add support for cloning
  crypto: hash - Add crypto_clone_ahash/shash
  crypto: api - Add crypto_clone_tfm
  crypto: api - Add crypto_tfm_get
  crypto: x86/sha - Use local .L symbols for code
  crypto: x86/crc32 - Use local .L symbols for code
  crypto: x86/aesni - Use local .L symbols for code
  crypto: x86/sha256 - Use RIP-relative addressing
  crypto: x86/ghash - Use RIP-relative addressing
  crypto: x86/des3 - Use RIP-relative addressing
  crypto: x86/crc32c - Use RIP-relative addressing
  crypto: x86/cast6 - Use RIP-relative addressing
  crypto: x86/cast5 - Use RIP-relative addressing
  ...
2023-04-24Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-12/+14
Pull vfs fget updates from Al Viro:
"fget() to fdget() conversions"
* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fuse_dev_ioctl(): switch to fdget()
  cgroup_get_from_fd(): switch to fdget_raw()
  bpf: switch to fdget_raw()
  build_mount_idmapped(): switch to fdget()
  kill the last remaining user of proc_ns_fget()
  SVM-SEV: convert the rest of fget() uses to fdget() in there
  convert sgx_set_attribute() to fdget()/fdput()
  convert setns(2) to fdget()/fdput()
2023-04-20SVM-SEV: convert the rest of fget() uses to fdget() in thereAl Viro1-12/+14
Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
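The conversion follows the usual fget() -> fdget() idiom; a generic sketch of the pattern (not the exact diff):

	struct fd f = fdget(fd);

	if (!f.file)
		return -EBADF;

	/* ... use f.file where the old code used the fget()'d file ... */

	fdput(f);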
2023-04-10KVM: x86: Rename Hyper-V remote TLB hooks to match established schemeSean Christopherson1-3/+2
Rename the Hyper-V hooks for TLB flushing to match the naming scheme used by all the other TLB flushing hooks, e.g. in kvm_x86_ops, vendor code, arch hooks from common code, etc. Reviewed-by: David Matlack <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-04-07KVM: x86/pmu: Fix a typo in kvm_pmu_request_counter_reprogam()Like Xu1-1/+1
Fix a "reprogam" => "reprogram" typo in kvm_pmu_request_counter_reprogam(). Fixes: 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()") Signed-off-by: Like Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: trim the changelog] Signed-off-by: Sean Christopherson <[email protected]>
2023-04-06KVM: x86: Add macros to track first...last VMX feature MSRsSean Christopherson1-1/+1
Add macros to track the range of VMX feature MSRs that are emulated by KVM to reduce the maintenance cost of extending the set of emulated MSRs. Note, KVM doesn't necessarily emulate all known/consumed VMX MSRs, e.g. PROCBASED_CTLS3 is consumed by KVM to enable IPI virtualization, but is not emulated as KVM doesn't emulate/virtualize IPI virtualization for nested guests. No functional change intended. Reviewed-by: Xiaoyao Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
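The macros in question, sketched from the shortlog (the exact bounding MSRs are an assumption):

	#define KVM_FIRST_EMULATED_VMX_MSR	MSR_IA32_VMX_BASIC
	#define KVM_LAST_EMULATED_VMX_MSR	MSR_IA32_VMX_VMFUNC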
2023-04-06KVM: SVM: Return the local "r" variable from svm_set_msr()Sean Christopherson1-5/+5
Rename "r" to "ret" and actually return it from svm_set_msr() to reduce the probability of repeating the mistake of commit 723d5fb0ffe4 ("kvm: svm: Add IA32_FLUSH_CMD guest support"), which set "r" thinking that it would be propagated to the caller. Alternatively, the declaration of "r" could be moved into the handling of MSR_TSC_AUX, but that risks variable shadowing in the future. A wrapper for kvm_set_user_return_msr() would allow eliding a local variable, but that feels like delaying the inevitable. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-04-06KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMDSean Christopherson1-0/+5
Virtualize FLUSH_L1D so that the guest can use the performant L1D flush if one of the many mitigations might require a flush in the guest, e.g. Linux provides an option to flush the L1D when switching mms. Passthrough MSR_IA32_FLUSH_CMD for write when it's supported in hardware and exposed to the guest, i.e. always let the guest write it directly if FLUSH_L1D is fully supported. Forward writes to hardware in host context on the off chance that KVM ends up emulating a WRMSR, or in the really unlikely scenario where userspace wants to force a flush. Restrict these forwarded WRMSRs to the known command out of an abundance of caution. Passing through the MSR means the guest can throw any and all values at hardware, but doing so in host context is arguably a bit more dangerous. Link: https://lkml.kernel.org/r/CALMp9eTt3xzAEoQ038bJQ9LN0ZOXrSWsN7xnNUD%2B0SS%3DWwF7Pg%40mail.gmail.com Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
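A sketch of the WRMSR emulation side described above (command bit per the architectural definition; placement in kvm_set_msr_common() hedged):

	case MSR_IA32_FLUSH_CMD:
		if (!msr_info->host_initiated &&
		    !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D))
			return 1;

		/* Restrict forwarded writes to the known command. */
		if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
			return 1;
		if (!data)
			break;

		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
		break;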
2023-04-06KVM: x86: Move MSR_IA32_PRED_CMD WRMSR emulation to common codeSean Christopherson1-14/+0
Dedup the handling of MSR_IA32_PRED_CMD across VMX and SVM by moving the logic to kvm_set_msr_common(). Now that the MSR interception toggling is handled as part of setting guest CPUID, the VMX and SVM paths are identical. Opportunistically massage the code to make it a wee bit denser. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-04-06KVM: SVM: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUIDSean Christopherson1-1/+4
Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is supported and enabled, i.e. don't wait until the first write. There's no benefit to deferred passthrough, and the extra logic only adds complexity. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
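A sketch of the CPUID-driven toggle (helper names as used in SVM at the time, hedged):

	/* In svm_vcpu_after_set_cpuid(): passthrough writes iff supported. */
	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0,
			     !!guest_has_pred_cmd_msr(vcpu));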
2023-04-06KVM: x86: Revert MSR_IA32_FLUSH_CMD.FLUSH_L1D enablingSean Christopherson1-30/+13
Revert the recently added virtualizing of MSR_IA32_FLUSH_CMD, as both the VMX and SVM implementations are fatally buggy for guests that use MSR_IA32_FLUSH_CMD or MSR_IA32_PRED_CMD, and because the entire foundation of the logic is flawed. The most immediate problem is an inverted check on @cmd that results in rejecting legal values. SVM doubles down on bugs and drops the error, i.e. silently breaks all guest mitigations based on the command MSRs. The next issue is that neither VMX nor SVM was updated to mark MSR_IA32_FLUSH_CMD as being a possible passthrough MSR, which isn't hugely problematic, but does break MSR filtering and triggers a WARN on VMX designed to catch this exact bug. The foundational issues stem from the MSR_IA32_FLUSH_CMD code reusing logic from MSR_IA32_PRED_CMD, which in turn was likely copied from KVM's support for MSR_IA32_SPEC_CTRL. The copy+paste from MSR_IA32_SPEC_CTRL was misguided as MSR_IA32_PRED_CMD (and MSR_IA32_FLUSH_CMD) is a write-only MSR, i.e. doesn't need the same "deferred passthrough" shenanigans as MSR_IA32_SPEC_CTRL. Revert all MSR_IA32_FLUSH_CMD enabling in one fell swoop so that there is no point where KVM advertises, but does not support, L1D_FLUSH. This reverts commits 45cf86f26148e549c5ba4a8ab32a390e4bde216e, 723d5fb0ffe4c02bd4edf47ea02c02e454719f28, and a807b78ad04b2eaa348f52f5cc7702385b6de1ee. Reported-by: Nathan Chancellor <[email protected]> Link: https://lkml.kernel.org/r/20230317190432.GA863767%40dev-arch.thelio-3990X Cc: Emanuele Giuseppe Esposito <[email protected]> Cc: Pawan Gupta <[email protected]> Cc: Jim Mattson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Tested-by: Mathias Krause <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-03-27KVM: SVM: Flush Hyper-V TLB when requiredJeremi Piotrowski2-3/+49
The Hyper-V "EnlightenedNptTlb" enlightenment is always enabled when KVM is running on top of Hyper-V and Hyper-V exposes support for it (which is always). On AMD CPUs this enlightenment results in ASID invalidations not flushing TLB entries derived from the NPT. To force the underlying (L0) hypervisor to rebuild its shadow page tables, an explicit hypercall is needed. The original KVM implementation of Hyper-V's "EnlightenedNptTlb" on SVM only added remote TLB flush hooks. This worked out fine for a while, as sufficient remote TLB flushes where being issued in KVM to mask the problem. Since v5.17, changes in the TDP code reduced the number of flushes and the out-of-sync TLB prevents guests from booting successfully. Split svm_flush_tlb_current() into separate callbacks for the 3 cases (guest/all/current), and issue the required Hyper-V hypercall when a Hyper-V TLB flush is needed. The most important case where the TLB flush was missing is when loading a new PGD, which is followed by what is now svm_flush_tlb_current(). Cc: [email protected] # v5.17+ Fixes: 1e0c7d40758b ("KVM: SVM: hyper-v: Remote TLB flush for SVM") Link: https://lore.kernel.org/lkml/[email protected]/ Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Jeremi Piotrowski <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>