2021-02-22KVM: nSVM: prepare guest save area while is_guest_mode is truePaolo Bonzini1-1/+1
Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and nested_prepare_vmcb_control. This results in is_guest_mode being false until the end of nested_prepare_vmcb_control.

This is a problem because nested_prepare_vmcb_save can in turn cause changes to the intercepts, and these have to be applied to the "host VMCB" (stored in svm->nested.hsave) and then merged with the VMCB12 intercepts into svm->vmcb.

In particular, without this change we forget to set the CR0 read and CR0 write intercepts when running a real mode L2 guest with NPT disabled. The guest is therefore able to see the CR0.PG bit that KVM sets to enable "paged real mode". This patch fixes the svm.flat mode_switch test case with npt=0. There are no other problematic calls in nested_prepare_vmcb_save.

Setting is_guest_mode only at the end dates back to commit 06fc7772690d ("KVM: SVM: Activate nested state only when guest state is complete", 2010-04-25). However, back then KVM didn't grab a different VMCB when updating the intercepts: it had already copied/merged L1's state into L0's VMCB, and then updated L0's VMCB regardless of is_nested(). Later, recalc_intercepts was introduced in commit 384c63684397 ("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12). This introduced the bug, because recalc_intercepts now throws away the intercept manipulations that svm_set_cr0 had done in the meanwhile to svm->vmcb.

[1] https://lore.kernel.org/kvm/[email protected]/

Reviewed-by: Sean Christopherson <[email protected]>
Tested-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
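As a rough illustration of the ordering fix described above (a sketch only, not the upstream diff; the function and field names follow the commit text), the point is to mark the vCPU as being in guest mode before the save area is prepared, so intercept updates triggered from nested_prepare_vmcb_save() are merged by recalc_intercepts() rather than thrown away:

    /* Illustrative sketch, simplified from the flow described above. */
    static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb12_gpa,
                                     struct vmcb *vmcb12)
    {
        svm->nested.vmcb12_gpa = vmcb12_gpa;

        /*
         * Enter guest mode *before* touching the save area, so intercept
         * changes made by nested_prepare_vmcb_save() (e.g. svm_set_cr0()
         * for paged real mode) land in the host VMCB and are merged into
         * svm->vmcb by recalc_intercepts().
         */
        enter_guest_mode(&svm->vcpu);

        nested_prepare_vmcb_save(svm, vmcb12);
        nested_prepare_vmcb_control(svm);
    }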
2021-02-19KVM: x86/mmu: Remove a variety of unnecessary exportsSean Christopherson2-21/+15
Remove several exports from the MMU that are no longer necessary. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-19KVM: x86: Fold "write-protect large" use case into generic write-protectSean Christopherson2-47/+17
Drop kvm_mmu_slot_largepage_remove_write_access() and refactor its sole caller to use kvm_mmu_slot_remove_write_access(). Remove the now-unused slot_handle_large_level() and slot_handle_all_level() helpers. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-19KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PMLSean Christopherson5-153/+36
Stop setting dirty bits for MMU pages when dirty logging is disabled for a memslot, as PML is now completely disabled when there are no memslots with dirty logging enabled.

This means that spurious PML entries will be created for memslots with dirty logging disabled if at least one other memslot has dirty logging enabled. However, spurious PML entries are already possible since dirty bits are set only when dirty logging is turned off, i.e. memslots that are never dirty logged will have dirty bits cleared.

In the end, it's faster overall to eat a few spurious PML entries in the window where dirty logging is being disabled across all memslots.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-19KVM: VMX: Dynamically enable/disable PML based on memslot dirty loggingMakarand Sonare6-5/+70
Currently, if enable_pml=1, PML remains enabled for the entire lifetime of the VM irrespective of whether dirty logging is enabled or disabled. When dirty logging is disabled, all the pages of the VM are manually marked dirty, so that PML is effectively non-operational. Setting the dirty bits is an expensive operation which can cause severe MMU lock contention in a performance-sensitive path when dirty logging is disabled after a failed or canceled live migration. Manually setting dirty bits also fails to prevent PML activity if some code path clears dirty bits, which can incur unnecessary VM-Exits.

In order to avoid this extra overhead, dynamically enable/disable PML when dirty logging gets turned on/off for the first/last memslot.

Signed-off-by: Makarand Sonare <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
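A minimal sketch of the first/last-memslot bookkeeping this describes (the helper name and the vendor hook shown are assumptions for illustration, not necessarily the exact names used by the series):

    /* Sketch: toggle hardware PML only on the 0 <-> 1 transitions of the
     * count of memslots with dirty logging enabled. */
    static void kvm_mmu_update_cpu_dirty_logging(struct kvm *kvm, bool enable)
    {
        struct kvm_arch *ka = &kvm->arch;

        if (enable) {
            if (ka->cpu_dirty_logging_count++ == 0)      /* first such memslot */
                kvm_x86_ops.update_cpu_dirty_logging(kvm);   /* assumed hook */
        } else {
            if (--ka->cpu_dirty_logging_count == 0)      /* last such memslot */
                kvm_x86_ops.update_cpu_dirty_logging(kvm);
        }
    }

On the VMX side, the hook would then set or clear SECONDARY_EXEC_ENABLE_PML in the execution controls instead of relying on manually dirtied pages.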
2021-02-19KVM: x86: Further clarify the logic and comments for toggling log dirtySean Christopherson1-4/+11
Add a sanity check in kvm_mmu_slot_apply_flags to assert that the LOG_DIRTY_PAGES flag is indeed being toggled, and explicitly rely on that holding true when zapping collapsible SPTEs. Manipulating the CPU dirty log (PML) and write-protection also relies on this assertion, but that's not obvious in the current code. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-19KVM: x86: Move MMU's PML logic to common codeSean Christopherson5-100/+24
Drop the facade of KVM's PML logic being vendor specific and move the bits that aren't truly VMX specific into common x86 code. The MMU logic for dealing with PML is tightly coupled to the feature and to VMX's implementation; bouncing through kvm_x86_ops obfuscates the code without providing any meaningful separation of concerns or encapsulation. No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-19KVM: x86/mmu: Make dirty log size hook (PML) a value, not a functionSean Christopherson4-13/+4
Store the vendor-specific dirty log size in a variable; there's no need to wrap it in a function since the value is constant after hardware_setup() runs.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
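Roughly, the conversion looks like the sketch below (names such as vmx_x86_ops.cpu_dirty_log_size and PML_ENTITY_NUM follow the commit subjects and common VMX code, but treat the details as illustrative):

    /* Before: an indirect call just to fetch a constant. */
    int (*cpu_dirty_log_size)(void);

    /* After: a plain value in kvm_x86_ops, set once during setup. */
    int cpu_dirty_log_size;

    static __init int hardware_setup(void)
    {
        /* ... other setup elided ... */
        vmx_x86_ops.cpu_dirty_log_size = enable_pml ? PML_ENTITY_NUM : 0;
        return 0;
    }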
2021-02-19KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect()Sean Christopherson1-1/+4
Expand the comment about the need to use write-protection for nested EPT when PML is enabled, to clarify that the tagging is a nop when PML is _not_ enabled. Without the clarification, omitting the PML check looks wrong at first^Wfifth glance.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-19KVM: nVMX: Disable PML in hardware when running L2Sean Christopherson2-16/+25
Unconditionally disable PML in vmcs02; KVM emulates PML purely in the MMU, e.g. vmx_flush_pml_buffer() doesn't even try to copy the L2 GPAs from vmcs02's buffer to vmcs12. At best, enabling PML is a nop. At worst, it will cause vmx_flush_pml_buffer() to record bogus GFNs in the dirty logs.

Initialize vmcs02.GUEST_PML_INDEX such that PML writes would trigger VM-Exit if PML was somehow enabled, skip flushing the buffer for guest mode since the index is bogus, and freak out if a PML full exit occurs when L2 is active.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
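As a sketch of the vmcs02 handling described above (simplified, assuming the usual VMX field and flag names; the exact patch may differ):

    /* Sketch: prepare_vmcs02() path, secondary execution controls. */
    exec_control &= ~SECONDARY_EXEC_ENABLE_PML;      /* never run L2 with PML */
    secondary_exec_controls_set(vmx, exec_control);

    /*
     * Poison the PML index so that, if PML were somehow enabled anyway,
     * the very first PML write would cause a PML-full VM-Exit instead of
     * silently logging L2 GPAs.
     */
    vmcs_write16(GUEST_PML_INDEX, -1);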
2021-02-19KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEsSean Christopherson3-12/+14
When zapping SPTEs in order to rebuild them as huge pages, use the new helper that computes the max mapping level to detect whether or not a SPTE should be zapped. Doing so avoids zapping SPTEs that can't possibly be rebuilt as huge pages, e.g. due to hardware constraints, memslot alignment, etc...

This also avoids zapping SPTEs that are still large, e.g. if migration was canceled before write-protected huge pages were shattered to enable dirty logging. Note, such pages are still write-protected at this time, i.e. a page fault VM-Exit will still occur. This will hopefully be addressed in a future patch.

Sadly, the TDP MMU loses its const on the memslot, but that's a pervasive problem that's been around for quite some time.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
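A hedged sketch of the check this describes, as it might appear in the rmap-based zap-collapsible walk (the surrounding iterator variables sp, sptep and rmap_head are assumed from context):

    /* Sketch: skip the zap unless a larger mapping is actually possible. */
    pfn = spte_to_pfn(*sptep);
    if (kvm_mmu_max_mapping_level(kvm, slot, sp->gfn, pfn, PG_LEVEL_NUM) >
        sp->role.level)
        pte_list_remove(rmap_head, sptep);    /* zap; the refault can go huge */

Note the comparison is against the SPTE's current level, so still-huge SPTEs (e.g. left over from a canceled migration) are not pointlessly zapped.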
2021-02-19KVM: x86/mmu: Pass the memslot to the rmap callbacksSean Christopherson1-9/+15
Pass the memslot to the rmap callbacks; it will be used when zapping collapsible SPTEs to verify the memslot is compatible with hugepages before zapping its SPTEs. No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-19KVM: x86/mmu: Split out max mapping level calculation to helperSean Christopherson2-15/+24
Factor out the logic for determining the maximum mapping level given a memslot and a gpa. The helper will be used when zapping collapsible SPTEs when disabling dirty logging, e.g. to avoid zapping SPTEs that can't possibly be rebuilt as hugepages. No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-19KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pagesSean Christopherson1-1/+2
Zap SPTEs that are backed by ZONE_DEVICE pages when zapping SPTEs to rebuild them as huge pages in the TDP MMU. ZONE_DEVICE huge pages are managed differently than "regular" pages and are not compound pages. Likewise, PageTransCompoundMap() will not detect HugeTLB, so switch to PageCompound(). This matches the similar check in kvm_mmu_zap_collapsible_spte.

Cc: Ben Gardon <[email protected]>
Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
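A sketch of the resulting page-type check in the TDP MMU zap-collapsible loop (illustrative; the surrounding iterator setup is elided):

    pfn = spte_to_pfn(iter.old_spte);
    /*
     * Leave the SPTE alone unless the backing page could be part of a
     * huge mapping: PageCompound() also catches HugeTLB, and ZONE_DEVICE
     * huge pages are not compound pages at all.
     */
    if (kvm_is_reserved_pfn(pfn) ||
        (!PageCompound(pfn_to_page(pfn)) &&
         !kvm_is_zone_device_pfn(pfn)))
        continue;

    tdp_mmu_set_spte(kvm, &iter, 0);    /* zap so it can refault as huge */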
2021-02-18KVM: nVMX: no need to undo inject_page_fault change on nested vmexitPaolo Bonzini1-3/+0
This is not needed because the tweak was done on the guest_mmu, while nested_ept_uninit_mmu_context has just changed vcpu->arch.walk_mmu back to the root_mmu. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-18KVM: nSVM: fix running nested guests when npt=0Paolo Bonzini1-0/+20
With npt=0 on the host, nSVM needs the same .inject_page_fault tweak as VMX has, to make sure that shadow MMU faults are injected as vmexits. It is not clear why this is needed at all, but for now keep the same code as VMX; it can be fixed for both later.

Based on a patch by Maxim Levitsky <[email protected]>.

Fixes: 7c86663b68ba ("KVM: nSVM: inject exceptions via svm_check_nested_events")
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-18KVM: nSVM: move nested vmrun tracepoint to enter_svm_guest_modeMaxim Levitsky1-12/+14
This way the tracepoint will capture all nested mode entries (including entries after migration, and from SMM).

Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-18KVM: VMX: read idt_vectoring_info a bit earlierMaxim Levitsky1-1/+3
trace_kvm_exit prints this value (using vmx_get_exit_info) so it makes sense to read it before the trace point. Fixes: dcf068da7eb2 ("KVM: VMX: Introduce generic fastpath handler") Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-18KVM: VMX: Allow INVPCID in guest without PCIDSean Christopherson1-10/+0
Remove the restriction that prevents VMX from exposing INVPCID to the guest without PCID also being exposed to the guest. The justification of the restriction is that INVPCID will #UD if it's disabled in the VMCS. While that is a true statement, it's also true that RDTSCP will #UD if it's disabled in the VMCS. Neither of those things has any dependency whatsoever on the guest being able to set CR4.PCIDE=1, which is what is effectively allowed by exposing PCID to the guest.

Removing the bogus restriction aligns VMX with SVM, and also allows for an interesting configuration. INVPCID is the fastest way to do a global TLB flush, e.g. see native_flush_tlb_global(). Allowing INVPCID without PCID would let a guest use the expedited flush while also limiting the number of ASIDs consumed by the guest.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-18KVM: x86: Advertise INVPCID by defaultSean Christopherson3-6/+3
Advertise INVPCID by default (if supported by the host kernel) instead of having both SVM and VMX opt in. INVPCID was opt in when it was a VMX only feature so that KVM wouldn't prematurely advertise support if/when it showed up in the kernel on AMD hardware. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-18KVM: SVM: Intercept INVPCID when it's disabled to inject #UDSean Christopherson1-4/+4
Intercept INVPCID if it's disabled in the guest, even when using NPT, as KVM needs to inject #UD in this case. Fixes: 4407a797e941 ("KVM: SVM: Enable INVPCID feature on AMD") Cc: Babu Moger <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
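The resulting intercept decision can be sketched as follows (assuming the standard SVM intercept helpers; only the INVPCID-specific piece is shown):

    /*
     * Sketch: the intercept must track whether the *guest* may use
     * INVPCID, independent of NPT, because KVM has to see the attempt
     * in order to inject #UD when the feature is not exposed.
     */
    if (guest_cpuid_has(vcpu, X86_FEATURE_INVPCID))
        svm_clr_intercept(svm, INTERCEPT_INVPCID);
    else
        svm_set_intercept(svm, INTERCEPT_INVPCID);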
2021-02-15selftests: kvm: avoid uninitialized variable warningPaolo Bonzini1-1/+2
The variable in practice will never be uninitialized, because the loop will always go through at least one iteration. In case it would not, make vcpu_get_cpuid report an assertion failure. Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-15selftests: kvm: add hardware_disable testIgnacio Alvarado3-0/+167
This test launches 512 VMs in serial and kills them after a random amount of time.

The test was originally written to exercise KVM user notifiers in the context of commit 1650b4ebc99d:
- KVM: Disable irq while unregistering user notifier
- https://lore.kernel.org/kvm/CACXrx53vkO=HKfwWwk+fVpvxcNjPrYmtDZ10qWxFvVX_PTGp3g@mail.gmail.com/

Recently, this test piqued my interest because it proved useful for AMD SNP in exercising the "in-use" pages, described in APM section 15.36.12, "Running SNP-Active Virtual Machines".

Signed-off-by: Ignacio Alvarado <[email protected]>
Signed-off-by: Marc Orr <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-12Merge tag 'kvmarm-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini382-1963/+4984
KVM/arm64 updates for Linux 5.12

- Make the nVHE EL2 object relocatable, resulting in much more maintainable code
- Handle concurrent translation faults hitting the same page in a more elegant way
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Allow the disabling of symbol export from assembly code
- Simplification of the early init hypercall handling
2021-02-12Merge branch 'kvm-arm64/pmu-debug-fixes-5.11' into kvmarm-master/nextMarc Zyngier3-38/+64
Signed-off-by: Marc Zyngier <[email protected]>
2021-02-12Merge branch 'kvm-arm64/rng-5.12' into kvmarm-master/nextMarc Zyngier5-1/+125
Signed-off-by: Marc Zyngier <[email protected]>
2021-02-12Merge branch 'kvm-arm64/hyp-reloc' into kvmarm-master/nextMarc Zyngier20-135/+604
Signed-off-by: Marc Zyngier <[email protected]>
2021-02-12Merge branch 'kvm-arm64/concurrent-translation-fault' into kvmarm-master/nextMarc Zyngier3-41/+60
Signed-off-by: Marc Zyngier <[email protected]>
2021-02-12Merge branch 'kvm-arm64/misc-5.12' into kvmarm-master/nextMarc Zyngier3-14/+7
Signed-off-by: Marc Zyngier <[email protected]>
2021-02-12Merge tag 'kvmarm-fixes-5.11-2' into kvmarm-master/nextMarc Zyngier6-49/+74
KVM/arm64 fixes for 5.11, take #2

- Don't allow tagged pointers to point to memslots
- Filter out ARMv8.1+ PMU events on v8.0 hardware
- Hide PMU registers from userspace when no PMU is configured
- More PMU cleanups
- Don't try to handle broken PSCI firmware
- More sys_reg() to reg_to_encoding() conversions

Signed-off-by: Marc Zyngier <[email protected]>
2021-02-11KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytesSean Christopherson1-5/+6
Add a 2 byte pad to struct compat_vcpu_info so that the sum size of its fields is actually 64 bytes. The effective size without the padding is also 64 bytes due to the compiler aligning evtchn_pending_sel to a 4-byte boundary, but depending on compiler alignment is subtle and unnecessary. Opportunistically replace spaces with tabs in the other fields.

Cc: David Woodhouse <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
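The layout being described is roughly the following (a sketch; the compat_arch_vcpu_info and pvclock definitions are elided):

    struct compat_vcpu_info {
        uint8_t evtchn_upcall_pending;
        uint8_t evtchn_upcall_mask;
        uint16_t pad0;               /* explicit 2-byte pad: fields now sum to 64 */
        uint32_t evtchn_pending_sel; /* previously only implicitly 4-byte aligned */
        struct compat_arch_vcpu_info arch;
        struct pvclock_vcpu_time_info time;
    }; /* 64 bytes (x86) */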
2021-02-11KVM: selftests: Don't bother mapping GVA for Xen shinfo testSean Christopherson1-4/+3
Don't bother mapping the Xen shinfo pages into the guest; they don't need to be accessed using the GVAs, and passing a define with "GPA" in the name to addr_gva2hpa() is confusing.

Cc: David Woodhouse <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-11KVM: selftests: Fix hex vs. decimal snafu in Xen testSean Christopherson1-1/+1
The Xen shinfo selftest uses '40' when setting the GPA of the vCPU info struct, but checks for the result at '0x40'. Arbitrarily use the hex version to resolve the bug. Fixes: 8d4e7e80838f ("KVM: x86: declare Xen HVM shared info capability and add test case") Cc: David Woodhouse <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: David Woodhouse <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-11KVM: selftests: Fix size of memslots created by Xen testsSean Christopherson2-4/+2
For better or worse, the memslot APIs take the number of pages, not the size in bytes. The Xen tests need 2 pages, not 8192 pages. Fixes: 8d4e7e80838f ("KVM: x86: declare Xen HVM shared info capability and add test case") Cc: David Woodhouse <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: David Woodhouse <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
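For illustration, the selftest memslot helper takes a page count as its fifth argument, so the fix amounts to something like the call below (SHINFO_REGION_GPA/SLOT are the test's own names; treat the exact call as a sketch):

    /* Two pages, not 8192 pages (which would be 32 MiB). */
    vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
                                SHINFO_REGION_GPA, SHINFO_REGION_SLOT,
                                2, 0);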
2021-02-11KVM: selftests: Ignore recently added Xen tests' build outputSean Christopherson1-0/+2
Add the new Xen test binaries to KVM selftest's .gitignore.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-11KVM: selftests: Add missing header file needed by xAPIC IPI testsPeter Shier1-0/+55
Fixes: 678e90a349a4 ("KVM: selftests: Test IPI to halted vCPU in xAPIC while backing page moves") Cc: Andrew Jones <[email protected]> Cc: Jim Mattson <[email protected]> Signed-off-by: Peter Shier <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-11KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.cRicardo Koller1-4/+4
Building the KVM selftests with LLVM's integrated assembler fails with:

  $ CFLAGS=-fintegrated-as make -C tools/testing/selftests/kvm CC=clang
  lib/x86_64/svm.c:77:16: error: too few operands for instruction
          asm volatile ("vmsave\n\t" : : "a" (vmcb_gpa) : "memory");
                        ^
  <inline asm>:1:2: note: instantiated into assembly here
          vmsave
          ^
  lib/x86_64/svm.c:134:3: error: too few operands for instruction
          "vmload\n\t"
          ^
  <inline asm>:1:2: note: instantiated into assembly here
          vmload
          ^

This is because LLVM IAS does not currently support calling vmsave, vmload, or vmrun without an explicit %rax operand.

Add an explicit operand to vmsave, vmload, and vmrun in svm.c. Fixing this was suggested by Sean Christopherson.

Tested: building without this error in clang 11. The following patch (not queued yet) needs to be applied to solve the other remaining error: "selftests: kvm: remove reassignment of non-absolute variables".

Suggested-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/kvm/[email protected]/
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Ricardo Koller <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
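One workable form of the fix keeps the "a" constraint and simply names the register explicitly, which both GNU as and LLVM IAS accept (a sketch; the actual patch may spell the operand differently):

    asm volatile ("vmsave %%rax\n\t" : : "a" (vmcb_gpa) : "memory");
    asm volatile ("vmload %%rax\n\t" : : "a" (vmcb_gpa) : "memory");
    asm volatile ("vmrun %%rax\n\t"  : : "a" (vmcb_gpa) : "memory");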
2021-02-11KVM: SVM: Make symbol 'svm_gp_erratum_intercept' staticWei Yongjun1-1/+1
The sparse tool complains as follows: arch/x86/kvm/svm/svm.c:204:6: warning: symbol 'svm_gp_erratum_intercept' was not declared. Should it be static? This symbol is not used outside of svm.c, so this commit marks it static. Fixes: 82a11e9c6fa2b ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Wei Yongjun <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-11Merge tag 'kvm-ppc-next-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEADPaolo Bonzini16-281/+309
PPC KVM update for 5.12

- Support for second data watchpoint on POWER10, from Ravi Bangoria
- Remove some complex workarounds for buggy early versions of POWER9
- Guest entry/exit fixes from Nick Piggin and Fabiano Rosas
2021-02-11locking/arch: Move qrwlock.h include after qspinlock.hWaiman Long6-6/+6
include/asm-generic/qrwlock.h was trying to get arch_spin_is_locked via asm-generic/qspinlock.h. However, this does not work because architectures might be using queued rwlocks but not queued spinlocks (csky), or because they might be defining their own queued_* macros before including asm/qspinlock.h. To fix this, ensure that asm/spinlock.h always includes qrwlock.h after defining arch_spin_is_locked (either directly for csky, or via asm/qspinlock.h for other architectures). The only inclusion elsewhere is in kernel/locking/qrwlock.c. That one is really unnecessary because the file is only compiled in SMP configurations (config QUEUED_RWLOCKS depends on SMP) and in that case linux/spinlock.h already includes asm/qrwlock.h if needed, via asm/spinlock.h. Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Waiman Long <[email protected]> Fixes: 26128cb6c7e6 ("locking/rwlocks: Add contention detection for rwlocks") Tested-by: Guenter Roeck <[email protected]> Reviewed-by: Ben Gardon <[email protected]> [Add arch/sparc and kernel/locking parts per discussion with Waiman. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
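The resulting ordering in an architecture's asm/spinlock.h can be sketched as follows (illustrative; csky instead defines arch_spin_is_locked itself before the qrwlock include):

    /* arch_spin_is_locked must be visible before qrwlock.h is pulled in. */
    #include <asm/qspinlock.h>
    #include <asm/qrwlock.h>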
2021-02-11KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guestsNicholas Piggin1-0/+3
Commit 68ad28a4cdd4 ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") incorrectly removed the radix host instruction patch to skip re-loading the host SLB entries when exiting from a hash guest. Restore it. Fixes: 68ad28a4cdd4 ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2021-02-11KVM: PPC: Book3S HV: Ensure radix guest has no SLB entriesPaul Mackerras1-1/+5
Commit 68ad28a4cdd4 ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") changed the older guest entry path, with the side effect that vcpu->arch.slb_max no longer gets cleared for a radix guest. This means that a HPT guest which loads some SLB entries, switches to radix mode, runs the guest using the old guest entry path (e.g., because the indep_threads_mode module parameter has been set to false), and then switches back to HPT mode would now see the old SLB entries being present, whereas previously it would have seen no SLB entries. To avoid changing guest-visible behaviour, this adds a store instruction to clear vcpu->arch.slb_max for a radix guest using the old guest entry path. Signed-off-by: Paul Mackerras <[email protected]>
2021-02-10KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2Fabiano Rosas3-2/+13
These machines don't support running both MMU types at the same time, so remove the KVM_CAP_PPC_MMU_HASH_V3 capability when the host is using Radix MMU. [[email protected] - added defensive check on kvmppc_hv_ops->hash_v3_possible] Signed-off-by: Fabiano Rosas <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2021-02-10KVM: PPC: Book3S HV: Save and restore FSCR in the P9 pathFabiano Rosas1-0/+4
The Facility Status and Control Register is a privileged SPR that defines the availability of some features in problem state. Since it can be written by the guest, we must restore it to the previous host value after guest exit. This restoration is currently done by taking the value from current->thread.fscr, which in the P9 path is not enough anymore because the guest could context switch the QEMU thread, causing the guest-current value to be saved into the thread struct. The above situation manifested when running a QEMU linked against a libc with System Call Vectored support, which causes scv instructions to be run by QEMU early during the guest boot (during SLOF), at which point the FSCR is 0 due to guest entry. After a few scv calls (1 to a couple hundred), the context switching happens and the QEMU thread runs with the guest value, resulting in a Facility Unavailable interrupt. This patch saves and restores the host value of FSCR in the inner guest entry loop in a way independent of current->thread.fscr. The old way of doing it is still kept in place because it works for the old entry path. Signed-off-by: Fabiano Rosas <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
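The shape of the fix can be sketched as below (illustrative only; the surrounding P9 entry loop and the rest of the register switching are elided):

    /* Snapshot the host FSCR before the guest can change it. */
    unsigned long host_fscr = mfspr(SPRN_FSCR);

    /* ... switch to guest context, run the vCPU, handle the exit ... */

    /*
     * Restore the host value directly rather than from
     * current->thread.fscr, which may hold a guest value if the QEMU
     * thread was context-switched while the guest FSCR was live.
     */
    mtspr(SPRN_FSCR, host_fscr);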
2021-02-10KVM: PPC: remove unneeded semicolonYang Li1-1/+1
Eliminate the following coccicheck warning: ./arch/powerpc/kvm/booke.c:701:2-3: Unneeded semicolon Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2021-02-10KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLBNicholas Piggin1-3/+3
IH=6 may preserve hypervisor real-mode ERAT entries and is the recommended SLBIA hint for switching partitions. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2021-02-10KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guestNicholas Piggin1-1/+5
Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2021-02-10KVM: PPC: Book3S HV: Fix radix guest SLB side channelNicholas Piggin1-8/+31
The slbmte instruction is legal in radix mode, including radix guest mode. This means radix guests can load the SLB with arbitrary data. KVM host does not clear the SLB when exiting a guest if it was a radix guest, which would allow a rogue radix guest to use the SLB as a side channel to communicate with other guests. Fix this by ensuring the SLB is cleared when coming out of a radix guest. Only the first 4 entries are a concern, because radix guests always run with LPCR[UPRT]=1, which limits the reach of slbmte. slbia is not used (except in a non-performance-critical path) because it can clear cached translations. Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Fabiano Rosas <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2021-02-10KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode supportNicholas Piggin5-226/+32
This reverts much of commit c01015091a770 ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts"), which was required to run HPT guests on RPT hosts on early POWER9 CPUs without support for "mixed mode", which meant the host could not run with MMU on while guests were running.

This code has some corner case bugs, e.g., when the guest hits a machine check or HMI the primary locks up waiting for secondaries to switch LPCR to host, which they never do. This could all be fixed in software, but most CPUs in production have mixed mode support, and those that don't are believed to be all in installations that don't use this capability. So simplify things and remove support.

Signed-off-by: Nicholas Piggin <[email protected]>
Tested-by: Fabiano Rosas <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
2021-02-10KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWRRavi Bangoria6-0/+35
Introduce KVM_CAP_PPC_DAWR1 which can be used by QEMU to query whether KVM supports 2nd DAWR or not. The capability is by default disabled even when the underlying CPU supports 2nd DAWR. QEMU needs to check and enable it manually to use the feature. Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
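From userspace, probing and opting in would look roughly like the sketch below (QEMU's real code differs; this just shows the check-then-enable flow the commit describes):

    struct kvm_enable_cap cap = { .cap = KVM_CAP_PPC_DAWR1 };

    /* The capability is reported but left disabled by default ... */
    if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_DAWR1) > 0)
        /* ... so userspace must explicitly enable it per VM. */
        ioctl(vm_fd, KVM_ENABLE_CAP, &cap);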