2019-10-22  KVM: x86: Remove unneeded kvm_vcpu variable, guest_xcr0_loaded  (Aaron Lewis, 2 files, -12/+5)
The kvm_vcpu variable, guest_xcr0_loaded, is a waste of an 'int' and a conditional branch. VMX and SVM are the only users, and both unconditionally pair kvm_load_guest_xcr0() with kvm_put_guest_xcr0() making this check unnecessary. Without this variable, the predicates in kvm_load_guest_xcr0 and kvm_put_guest_xcr0 should match. Suggested-by: Sean Christopherson <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Aaron Lewis <[email protected]> Change-Id: I7b1eb9b62969d7bbb2850f27e42f863421641b23 Signed-off-by: Paolo Bonzini <[email protected]>
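A minimal sketch of the resulting pair (illustrative, not the verbatim upstream code), assuming the existing kvm_read_cr4_bits()/xsetbv() helpers and the host_xcr0 global:

static void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu)
{
        /* Same predicate as kvm_put_guest_xcr0(), so no cached flag is needed. */
        if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) &&
            vcpu->arch.xcr0 != host_xcr0)
                xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
}

static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
{
        if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) &&
            vcpu->arch.xcr0 != host_xcr0)
                xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
}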
2019-10-22  KVM: VMX: Fix conditions for guest IA32_XSS support  (Aaron Lewis, 1 file, -13/+11)
Volume 4 of the SDM says that IA32_XSS is supported if CPUID(EAX=0DH,ECX=1):EAX.XSS[bit 3] is set, so only the X86_FEATURE_XSAVES check is necessary (X86_FEATURE_XSAVES is the Linux name for CPUID(EAX=0DH,ECX=1):EAX.XSS[bit 3]). Fixes: 4d763b168e9c5 ("KVM: VMX: check CPUID before allowing read/write of IA32_XSS") Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Aaron Lewis <[email protected]> Change-Id: I9059b9f2e3595e4b09a4cdcf14b933b22ebad419 Signed-off-by: Paolo Bonzini <[email protected]>
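Illustratively, the guarded MSR access boils down to a single CPUID-based predicate. The helper name below is hypothetical; guest_cpuid_has() and X86_FEATURE_XSAVES are the existing kernel symbols:

static bool guest_may_access_ia32_xss(struct kvm_vcpu *vcpu, bool host_initiated)
{
        /* Only the XSAVES CPUID bit matters; no separate vmx_xsaves_supported() check. */
        return host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_XSAVES);
}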
2019-10-22  KVM: x86: Introduce vcpu->arch.xsaves_enabled  (Aaron Lewis, 3 files, -0/+9)
Cache whether XSAVES is enabled in the guest by adding xsaves_enabled to vcpu->arch. Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Aaron Lewis <[email protected]> Change-Id: If4638e0901c28a4494dad2e103e2c075e8ab5d68 Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: VMX: Rename {vmx,nested_vmx}_vcpu_setup()  (Xiaoyao Li, 3 files, -6/+7)
Rename {vmx,nested_vmx}_vcpu_setup() to match what they really do. Signed-off-by: Xiaoyao Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: VMX: Initialize vmx->guest_msrs[] right after allocation  (Xiaoyao Li, 1 file, -18/+16)
Move the initialization of vmx->guest_msrs[] from vmx_vcpu_setup() to vmx_create_vcpu(), and put it right after its allocation. This is also preparation for the next patch. Signed-off-by: Xiaoyao Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: VMX: Remove vmx->hv_deadline_tsc initialization from vmx_vcpu_setup()  (Xiaoyao Li, 1 file, -1/+0)
... It can be removed here because the same code is called later in vmx_vcpu_reset() as the flow: kvm_arch_vcpu_setup() -> kvm_vcpu_reset() -> vmx_vcpu_reset() Signed-off-by: Xiaoyao Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: VMX: Write VPID to vmcs when creating vcpu  (Xiaoyao Li, 1 file, -3/+3)
Move the code that writes vmx->vpid to the vmcs from vmx_vcpu_reset() to vmx_vcpu_setup(), because vmx->vpid is allocated when the vcpu is created and never changes, so there is no need to update vmcs.vpid when resetting the vcpu. Signed-off-by: Xiaoyao Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: x86/vPMU: Declare kvm_pmu->reprogram_pmi field using DECLARE_BITMAP  (Like Xu, 2 files, -11/+6)
Replace the explicit declaration of "u64 reprogram_pmi" with the generic macro DECLARE_BITMAP for all possible appropriate number of bits. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Like Xu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
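An illustrative before/after of the declaration (the struct names here are placeholders; DECLARE_BITMAP and X86_PMC_IDX_MAX are the existing kernel macro/constant):

/* Before: a fixed-width field that hard-caps the PMC count at 64. */
struct kvm_pmu_before {
        u64 reprogram_pmi;
};

/* After: sized from the PMC index limit, manipulated with the bitmap API. */
struct kvm_pmu_after {
        DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX);
};

static inline void mark_pmc_for_reprogram(struct kvm_pmu_after *pmu, int pmc_idx)
{
        set_bit(pmc_idx, pmu->reprogram_pmi);
}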
2019-10-22  KVM: remove redundant code in kvm_arch_vm_ioctl  (Miaohe Lin, 1 file, -3/+0)
If we reach here with r = 0, we will reassign r = 0 unnecessarily and then do the work at the set_irqchip_out label. If we reach here with r != 0, we will jump to the label directly. So this if statement and the r = 0 assignment are redundant. Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
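The shape of the removed redundancy, illustratively (not the verbatim upstream diff):

/* Before: if r != 0 we jump to the label anyway, and if r == 0 the
 * reassignment is a no-op, so both lines are dead weight. */
r = kvm_vm_ioctl_set_irqchip(kvm, chip);
if (r)
        goto set_irqchip_out;
r = 0;
set_irqchip_out:
        kfree(chip);

/* After: simply fall through to the label. */
r = kvm_vm_ioctl_set_irqchip(kvm, chip);
set_irqchip_out:
        kfree(chip);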
2019-10-22  kvm: x86: Modify kvm_x86_ops.get_enable_apicv() to use struct kvm parameter  (Suthikulpanit, Suravee, 4 files, -5/+5)
Generally, APICv is enabled or disabled for all vcpus in the VM in the same manner, so get_enable_apicv() should represent the APICv status of the VM instead of each vCPU. Modify kvm_x86_ops.get_enable_apicv() to take struct kvm as a parameter instead of struct kvm_vcpu. Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Suravee Suthikulpanit <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: x86: Fold decache_cr3() into cache_reg()  (Sean Christopherson, 4 files, -17/+8)
Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common cache_reg() callback and drop the dedicated decache_cr3(). The name decache_cr3() is somewhat confusing as the caching behavior of CR3 follows that of GPRs, RFLAGS and PDPTRs (handled via cache_reg()), and has nothing in common with the caching behavior of CR0/CR4 (whose decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage). This effectively adds a BUG() if KVM attempts to cache CR3 on SVM; change it to a WARN_ON_ONCE() -- if the cache never requires filling, the value is already in the right place -- and opportunistically add one in VMX to provide an equivalent check. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: x86: Add helpers to test/mark reg availability and dirtiness  (Sean Christopherson, 4 files, -32/+49)
Add helpers to prettify code that tests and/or marks whether or not a register is available and/or dirty. Suggested-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
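A sketch of the kind of helpers being added (names follow the description above; treat the exact signatures as illustrative):

static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
                                             enum kvm_reg reg)
{
        return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
}

static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
                                           enum kvm_reg reg)
{
        /* A dirty register is by definition also available. */
        __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
        __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
}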
2019-10-22  KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'  (Sean Christopherson, 2 files, -4/+2)
Now that indexing into arch.regs is either protected by WARN_ON_ONCE or done with hardcoded enums, combine all definitions for registers that are tracked by regs_avail and regs_dirty into 'enum kvm_reg'. Having a single enum type will simplify additional cleanup related to regs_avail and regs_dirty. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: x86: Add WARNs to detect out-of-bounds register indices  (Sean Christopherson, 2 files, -8/+10)
Add WARN_ON_ONCE() checks in kvm_register_{read,write}() to detect reg values that would cause KVM to overflow vcpu->arch.regs. Change the reg param to an 'int' to make it clear that the reg index is unverified. Regarding the overhead of WARN_ON_ONCE(), now that all fixed GPR reads and writes use dedicated accessors, e.g. kvm_rax_read(), the overhead is limited to flows where the reg index is generated at runtime. And there is at least one historical bug where KVM has generated an out-of-bounds access to arch.regs (see commit b68f3cc7d9789, "KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels"). Adding the WARN_ON_ONCE() protection paves the way for additional cleanup related to kvm_reg and kvm_reg_ex. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
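A sketch of the guarded accessor (illustrative, not the verbatim diff); the WARN_ON_ONCE() only fires for runtime-generated indices that escape the fixed accessors:

static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
{
        if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
                return 0;

        if (!test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail))
                kvm_x86_ops->cache_reg(vcpu, (enum kvm_reg)reg);

        return vcpu->arch.regs[reg];
}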
2019-10-22  KVM: VMX: Optimize vmx_set_rflags() for unrestricted guest  (Sean Christopherson, 1 file, -2/+9)
Rework vmx_set_rflags() to avoid the extra code needed to handle emulation of real mode and invalid state when unrestricted guest is disabled. The primary reason for doing so is to avoid the call to vmx_get_rflags(), which will incur a VMREAD when RFLAGS is not already available. When running nested VMs, the majority of calls to vmx_set_rflags() will occur without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS during transitions between vmcs01 and vmcs02. Note, vmx_get_rflags() guarantees RFLAGS is marked available. Signed-off-by: Sean Christopherson <[email protected]> [Replace "else" with early "return" in the unrestricted guest branch. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
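A sketch of the early-return structure described above (illustrative, not the verbatim upstream code):

static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
{
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        unsigned long old_rflags;

        if (enable_unrestricted_guest) {
                kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS);
                vmx->rflags = rflags;
                vmcs_writel(GUEST_RFLAGS, rflags);
                return;         /* no VMREAD, no real-mode fixups needed */
        }

        old_rflags = vmx_get_rflags(vcpu);      /* marks RFLAGS available */
        vmx->rflags = rflags;
        if (vmx->rmode.vm86_active) {
                vmx->rmode.save_rflags = rflags;
                rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
        }
        vmcs_writel(GUEST_RFLAGS, rflags);

        if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
                vmx->emulation_required = emulation_required(vcpu);
}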
2019-10-22  KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessors  (Sean Christopherson, 1 file, -9/+11)
Capture struct vcpu_vmx in a local variable to improve the readability of vmx_{g,s}et_rflags(). No functional change intended. Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date  (Sean Christopherson, 1 file, -3/+5)
Skip the VMWRITE to update GUEST_CR3 if CR3 is not available, i.e. has not been read from the VMCS since the last VM-Enter. If vcpu->arch.cr3 is stale, kvm_read_cr3(vcpu) will refresh vcpu->arch.cr3 from the VMCS, meaning KVM will do a VMREAD and then VMWRITE the value it just pulled from the VMCS. Note, this is a purely theoretical change, no instances of skipping the VMREAD+VMWRITE have been observed with this change. Tested-by: Reto Buerki <[email protected]> Tested-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
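The idea in vmx_set_cr3(), sketched (illustrative, not the verbatim diff):

bool update_guest_cr3 = true;
unsigned long guest_cr3 = cr3;

if (enable_ept) {
        if (!enable_unrestricted_guest && !is_paging(vcpu))
                guest_cr3 = to_kvm_vmx(vcpu->kvm)->ept_identity_map_addr;
        else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
                guest_cr3 = vcpu->arch.cr3;     /* KVM's copy is fresh */
        else
                update_guest_cr3 = false;       /* vmcs.GUEST_CR3 is already up-to-date */
}

if (update_guest_cr3)
        vmcs_writel(GUEST_CR3, guest_cr3);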
2019-10-22  KVM: SVM: Reduce WBINVD/DF_FLUSH invocations  (Tom Lendacky, 1 file, -15/+66)
Performing a WBINVD and DF_FLUSH are expensive operations. Currently, a WBINVD/DF_FLUSH is performed every time an SEV guest terminates. However, the WBINVD/DF_FLUSH is only required when an ASID is being re-allocated to a new SEV guest. Also, a single WBINVD/DF_FLUSH can enable all ASIDs that have been disassociated from guests through DEACTIVATE. To reduce the number of WBINVD/DF_FLUSH invocations, introduce a new ASID bitmap to track ASIDs that need to be reclaimed. When an SEV guest is terminated, add its ASID to the reclaim bitmap instead of clearing the bitmap in the existing SEV ASID bitmap. This delays the need to perform a WBINVD/DF_FLUSH invocation when an SEV guest terminates until all of the available SEV ASIDs have been used. At that point, the WBINVD/DF_FLUSH invocation can be performed and all ASIDs in the reclaim bitmap moved to the available ASIDs bitmap. The semaphore around DEACTIVATE can be changed to a read semaphore with the semaphore taken in write mode before performing the WBINVD/DF_FLUSH. Tested-by: David Rientjes <[email protected]> Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
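A rough sketch of the deferred-flush scheme (variable and helper names are assumptions based on the description above, not the verbatim upstream code):

static unsigned long *sev_asid_bitmap;          /* ASIDs currently in use */
static unsigned long *sev_reclaim_asid_bitmap;  /* ASIDs parked until the next flush */

static void sev_asid_free(int asid)
{
        /* Terminating a guest no longer flushes; just park the ASID. */
        __set_bit(asid, sev_reclaim_asid_bitmap);
}

static int sev_asid_new(void)
{
        int pos, error;

        pos = find_next_zero_bit(sev_asid_bitmap, max_sev_asid, min_sev_asid);
        if (pos >= max_sev_asid) {
                /* Out of ASIDs: flush once, then recycle everything parked. */
                wbinvd_on_all_cpus();
                if (sev_guest_df_flush(&error))
                        return -EBUSY;
                bitmap_andnot(sev_asid_bitmap, sev_asid_bitmap,
                              sev_reclaim_asid_bitmap, max_sev_asid);
                bitmap_zero(sev_reclaim_asid_bitmap, max_sev_asid);
                pos = find_next_zero_bit(sev_asid_bitmap, max_sev_asid, min_sev_asid);
        }

        __set_bit(pos, sev_asid_bitmap);
        return pos;
}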
2019-10-22  KVM: SVM: Remove unneeded WBINVD and DF_FLUSH when starting SEV guests  (Tom Lendacky, 2 files, -15/+9)
Performing a WBINVD and DF_FLUSH are expensive operations. The SEV support currently performs this WBINVD/DF_FLUSH combination when an SEV guest is terminated, so there is no need for it to be done before LAUNCH. However, when the SEV firmware transitions the platform from UNINIT state to INIT state, all ASIDs will be marked invalid across all threads. Therefore, as part of transitioning the platform to INIT state, perform a WBINVD/DF_FLUSH after a successful INIT in the PSP/SEV device driver. Since the PSP/SEV device driver is x86 only, it can reference and use the WBINVD related functions directly. Cc: Gary Hook <[email protected]> Cc: Herbert Xu <[email protected]> Cc: "David S. Miller" <[email protected]> Tested-by: David Rientjes <[email protected]> Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter  (Sean Christopherson, 2 files, -3/+17)
Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter instead of deferring the VMWRITE until vmx_set_cr3(). If the VMWRITE is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated VM-Exit occurs without actually entering L2, e.g. if the nested run is squashed because nested VM-Enter (from L1) is putting L2 into HLT. Note, the above scenario can occur regardless of whether L1 is intercepting HLT, e.g. L1 can intercept HLT and then re-enter L2 with vmcs.GUEST_ACTIVITY_STATE=HALTED. But practically speaking, a VMM will likely put a guest into HALTED if and only if it's not intercepting HLT. In an ideal world where EPT *requires* unrestricted guest (and vice versa), VMX could handle CR3 similar to how it handles RSP and RIP, e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run(). But the unrestricted guest silliness complicates the dirty tracking logic to the point that explicitly handling vmcs02.GUEST_CR3 during nested VM-Enter is a simpler overall implementation. Cc: [email protected] Reported-and-tested-by: Reto Buerki <[email protected]> Tested-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Liran Alon <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: SVM: Guard against DEACTIVATE when performing WBINVD/DF_FLUSH  (Tom Lendacky, 1 file, -0/+20)
The SEV firmware DEACTIVATE command disassociates an SEV guest from an ASID, clears the WBINVD indicator on all threads and indicates that the SEV firmware DF_FLUSH command must be issued before the ASID can be re-used. The SEV firmware DF_FLUSH command will return an error if a WBINVD has not been performed on every thread before it has been invoked. A window exists between the WBINVD and the invocation of the DF_FLUSH command where an SEV firmware DEACTIVATE command could be invoked on another thread, clearing the WBINVD indicator. This will cause the subsequent SEV firmware DF_FLUSH command to fail which, in turn, results in the SEV firmware ACTIVATE command failing for the reclaimed ASID. This results in the SEV guest failing to start. Use a mutex to close the WBINVD/DF_FLUSH window by obtaining the mutex before the DEACTIVATE and releasing it after the DF_FLUSH. This ensures that any DEACTIVATE cannot run before a DF_FLUSH has completed. Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command") Tested-by: David Rientjes <[email protected]> Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
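Sketched lock ordering (illustrative; the mutex name and exact placement are assumptions based on the description above):

static DEFINE_MUTEX(sev_deactivate_lock);

static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
{
        struct sev_data_deactivate deactivate = { .handle = handle };
        int error;

        /* Close the window: no other DEACTIVATE may slip in between the
         * WBINVD and the DF_FLUSH issued below. */
        mutex_lock(&sev_deactivate_lock);

        sev_guest_deactivate(&deactivate, &error);
        wbinvd_on_all_cpus();
        sev_guest_df_flush(&error);

        mutex_unlock(&sev_deactivate_lock);
}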
2019-10-22  KVM: SVM: Serialize access to the SEV ASID bitmap  (Tom Lendacky, 1 file, -12/+17)
The SEV ASID bitmap currently is not protected against parallel SEV guest startups. This can result in an SEV guest failing to start because another SEV guest could have been assigned the same ASID value. Use a mutex to serialize access to the SEV ASID bitmap. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Tested-by: David Rientjes <[email protected]> Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  Merge tag 'kvm-ppc-fixes-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD  (Paolo Bonzini, 3 files, -10/+32)
PPC KVM fix for 5.4
- Fix a bug in the XIVE code which can cause a host crash.
2019-10-22  Merge tag 'kvmarm-fixes-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini, 2 files, -13/+39)
KVM/arm fixes for 5.4, take #2
Special PMU edition:
- Fix cycle counter truncation
- Fix cycle counter overflow limit on pure 64bit system
- Allow chained events to be actually functional
- Correct sample period after overflow
2019-10-22  kvm: clear kvmclock MSR on reset  (Paolo Bonzini, 1 file, -5/+3)
After resetting the vCPU, the kvmclock MSR keeps the previous value but it is not enabled. This can be confusing, so fix it. Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: x86: fix bugon.cocci warnings  (kbuild test robot, 1 file, -2/+1)
Use BUG_ON instead of an if condition followed by BUG. Generated by: scripts/coccinelle/misc/bugon.cocci Fixes: 4b526de50e39 ("KVM: x86: Check kvm_rebooting in kvm_spurious_fault()") CC: Sean Christopherson <[email protected]> Signed-off-by: kbuild test robot <[email protected]> Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
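Illustratively, with the condition taken from the Fixes: reference above:

/* before */
if (!kvm_rebooting)
        BUG();

/* after */
BUG_ON(!kvm_rebooting);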
2019-10-22  KVM: VMX: Remove specialized handling of unexpected exit-reasons  (Liran Alon, 1 file, -12/+0)
Commit bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") introduced specialized handling of specific exit-reasons that should not be raised by the CPU because KVM configures the VMCS such that they should never be raised. However, since commit 7396d337cfad ("KVM: x86: Return to userspace with internal error on unexpected exit reason"), the VMX & SVM exit handlers were modified to generically handle all unexpected exit-reasons by returning to userspace with an internal error. Therefore, there is no need for specialized handling of specific unexpected exit-reasons (this specialized handling also introduced an inconsistency: for these exit-reasons the guest instruction was silently skipped instead of returning to userspace with an internal error). Fixes: bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") Signed-off-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  selftests: kvm: fix sync_regs_test with newer gccs  (Vitaly Kuznetsov, 1 file, -10/+11)
Commit 204c91eff798a ("KVM: selftests: do not blindly clobber registers in guest asm") was intended to make the test more gcc-proof; however, the result is exactly the opposite: on newer gccs (e.g. 8.2.1) the test breaks with

==== Test Assertion Failure ====
x86_64/sync_regs_test.c:168: run->s.regs.regs.rbx == 0xBAD1DEA + 1
pid=14170 tid=14170 - Invalid argument
1  0x00000000004015b3: main at sync_regs_test.c:166 (discriminator 6)
2  0x00007f413fb66412: ?? ??:0
3  0x000000000040191d: _start at ??:?
rbx sync regs value incorrect 0x1.

Apparently, the compiler is still free to play games with registers even when they have variables attached. Re-write the guest code with 'asm volatile' by embedding ucall there and making sure rbx is preserved. Fixes: 204c91eff798a ("KVM: selftests: do not blindly clobber registers in guest asm") Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  selftests: kvm: vmx_dirty_log_test: skip the test when VMX is not supported  (Vitaly Kuznetsov, 1 file, -0/+2)
vmx_dirty_log_test fails on AMD and this is no surprise as it is VMX specific. Bail early when nested VMX is unsupported. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  selftests: kvm: consolidate VMX support checks  (Vitaly Kuznetsov, 5 files, -15/+15)
vmx_* tests require VMX and three of them implement the same check. Move it to vmx library. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  selftests: kvm: vmx_set_nested_state_test: don't check for VMX support twice  (Vitaly Kuznetsov, 1 file, -6/+1)
vmx_set_nested_state_test() checks if VMX is supported twice: in the very beginning (and skips the whole test if it's not) and before doing test_vmx_nested_state(). One should be enough. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabled  (Wanpeng Li, 1 file, -13/+16)
Don't waste cycles to shrink/grow vCPU halt_poll_ns if host side polling is disabled. Acked-by: Marcelo Tosatti <[email protected]> Cc: Marcelo Tosatti <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
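A sketch of the adjusted tail of kvm_vcpu_block(), assuming the existing halt_poll_ns module parameter and the grow/shrink helpers (illustrative, not the verbatim diff):

if (halt_poll_ns) {                     /* host-side polling enabled */
        if (block_ns <= vcpu->halt_poll_ns)
                ;                       /* window already large enough */
        else if (vcpu->halt_poll_ns && block_ns > halt_poll_ns)
                shrink_halt_poll_ns(vcpu);      /* blocked far longer than polling helps */
        else if (block_ns < halt_poll_ns)
                grow_halt_poll_ns(vcpu);        /* short block: polling could have avoided it */
} else {
        vcpu->halt_poll_ns = 0;         /* polling disabled: nothing to tune */
}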
2019-10-22  selftests: kvm: synchronize .gitignore to Makefile  (Vitaly Kuznetsov, 1 file, -0/+2)
Because "Untracked files:" are annoying. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID  (Jim Mattson, 1 file, -1/+1)
When the RDPID instruction is supported on the host, enumerate it in KVM_GET_SUPPORTED_CPUID. Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-10-22  Merge tag 'pinctrl-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl  (Linus Torvalds, 10 files, -159/+131)
Pull pin control fixes from Linus Walleij:
"Here is a bunch of pin control fixes. I was lagging behind on this one, some fixes should have come in earlier, sorry about that. Anyways here it is, pretty straight-forward fixes, the Strago fix stand out as something serious affecting a lot of machines.
Summary:
- Handle multiple instances of Intel chips without complaining.
- Restore the Intel Strago DMI workaround
- Make the Armada 37xx handle pins over 32
- Fix the polarity of the LED group on Armada 37xx
- Fix an off-by-one bug in the NS2 driver
- Fix error path for iproc's platform_get_irq()
- Fix error path on the STMFX driver
- Fix a typo in the Berlin AS370 driver
- Fix up misc errors in the Aspeed 2600 BMC support
- Fix a stray SPDX tag"
* tag 'pinctrl-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: aspeed-g6: Rename SD3 to EMMC and rework pin groups
  pinctrl: aspeed-g6: Fix UART13 group pinmux
  pinctrl: aspeed-g6: Make SIG_DESC_CLEAR() behave intuitively
  pinctrl: aspeed-g6: Fix I3C3/I3C4 pinmux configuration
  pinctrl: aspeed-g6: Fix I2C14 SDA description
  pinctrl: aspeed-g6: Sort pins for sanity
  dt-bindings: pinctrl: aspeed-g6: Rework SD3 function and groups
  pinctrl: berlin: as370: fix a typo s/spififib/spdifib
  pinctrl: armada-37xx: swap polarity on LED group
  pinctrl: stmfx: fix null pointer on remove
  pinctrl: iproc: allow for error from platform_get_irq()
  pinctrl: ns2: Fix off by one bugs in ns2_pinmux_enable()
  pinctrl: bcm-iproc: Use SPDX header
  pinctrl: armada-37xx: fix control of pins 32 and up
  pinctrl: cherryview: restore Strago DMI workaround for all versions
  pinctrl: intel: Allocate IRQ chip dynamic
2019-10-22  KVM: PPC: Book3S HV: Reject mflags=2 (LPCR[AIL]=2) ADDR_TRANS_MODE mode  (Nicholas Piggin, 1 file, -0/+5)
AIL=2 mode has no known users, so is not well tested or supported. Disallow guests from selecting this mode because it may become deprecated in future versions of the architecture. This policy decision is not left to QEMU because KVM support is required for AIL=2 (when injecting interrupts). Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: Book3S HV: Implement LPCR[AIL]=3 mode for injected interrupts  (Nicholas Piggin, 1 file, -0/+15)
kvmppc_inject_interrupt does not implement LPCR[AIL]!=0 modes, which can result in the guest receiving interrupts as if LPCR[AIL]=0 contrary to the ISA. In practice, Linux guests cope with this deviation, but it should be fixed. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: Book3S HV: Reuse kvmppc_inject_interrupt for async guest delivery  (Nicholas Piggin, 3 files, -57/+56)
This consolidates the HV interrupt delivery logic into one place. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: Book3S: Replace reset_msr mmu op with inject_interrupt arch op  (Nicholas Piggin, 8 files, -62/+63)
reset_msr sets the MSR for interrupt injection, but it's cleaner and more flexible to provide a single op to set both MSR and PC for the interrupt. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: Book3S: Define and use SRR1_MSR_BITS  (Nicholas Piggin, 3 files, -2/+14)
Acked-by: Paul Mackerras <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPs  (Greg Kurz, 5 files, -2/+36)
Add a new attribute to both XIVE and XICS-on-XIVE KVM devices so that userspace can tell how many interrupt servers it needs. If a VM needs fewer than the current default of KVM_MAX_VCPUS (2048), we can allocate fewer VPs in OPAL. Combined with a core stride (VSMT) that matches the number of guest threads per core, this may substantially increase the number of VMs that can run concurrently with an in-kernel XIVE device. Since the legacy XIVE KVM device is exposed to userspace through the XICS KVM API, a new attribute group is added to it for this purpose. While here, fix the syntax of the existing KVM_DEV_XICS_GRP_SOURCES in the XICS documentation. Signed-off-by: Greg Kurz <[email protected]> Reviewed-by: Cédric Le Goater <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: Book3S HV: XIVE: Make VP block size configurable  (Greg Kurz, 3 files, -25/+62)
The XIVE VP is an internal structure which allows the XIVE interrupt controller to maintain the interrupt context state of vCPUs that are not dispatched on HW threads. When a guest is started, the XIVE KVM device allocates a block of XIVE VPs in OPAL, enough to accommodate the highest possible vCPU id, KVM_MAX_VCPU_ID (16384), packed down to KVM_MAX_VCPUS (2048). With a guest's core stride of 8 and a threading mode of 1 (QEMU's default), a VM must run at least 256 vCPUs to actually need such a range of VPs. A POWER9 system has a limited XIVE VP space (512k) and KVM currently wastes this HW resource with large VP allocations, especially since a typical VM likely runs with far fewer vCPUs. Make the size of the VP block configurable. Add an nr_servers field to the XIVE structure and a function to set it for this purpose. Split VP allocation out of the device create function. Since the VP block isn't used before the first vCPU connects to the XIVE KVM device, allocation is now performed by kvmppc_xive_connect_vcpu(). This gives the opportunity to set nr_servers in between:

kvmppc_xive_create() / kvmppc_xive_native_create()
    .
    .
kvmppc_xive_set_nr_servers()
    .
    .
kvmppc_xive_connect_vcpu() / kvmppc_xive_native_connect_vcpu()

The connect_vcpu() functions check that the vCPU id is below nr_servers and, if it is the first vCPU, they allocate the VP block. This is protected against a concurrent update of nr_servers by kvmppc_xive_set_nr_servers() with the xive->lock mutex. Also, the block is allocated once for the device lifetime: nr_servers should stay constant, otherwise connect_vcpu() could generate a bogus VP id and likely crash OPAL. It is thus forbidden to update nr_servers once the block is allocated. If the VP allocation fails, return ENOSPC, which seems more appropriate for reporting the depletion of a system-wide HW resource than ENOMEM or ENXIO. A VM using a stride of 8 and 1 thread per core with 32 vCPUs would hence only need 256 VPs instead of 2048. If the stride is set to match the number of threads per core, this goes further down to 32. This will be exposed to userspace by a subsequent patch. Signed-off-by: Greg Kurz <[email protected]> Reviewed-by: Cédric Le Goater <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper  (Greg Kurz, 3 files, -18/+36)
Reduce code duplication by consolidating the checking of vCPU ids and VP ids to a common helper used by both legacy and native XIVE KVM devices. And explain the magic with a comment. Signed-off-by: Greg Kurz <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
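A hypothetical sketch of such a helper (the function name and error codes are assumptions based on the description; kvmppc_xive_find_server() and kvmppc_pack_vcpu_id() are the existing helpers):

int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp)
{
        if (cpu >= KVM_MAX_VCPUS) {
                pr_devel("Out of bounds !\n");
                return -EINVAL;
        }

        if (kvmppc_xive_find_server(xive->kvm, cpu)) {
                pr_devel("Duplicate !\n");
                return -EEXIST;
        }

        /* The "magic": sparse vCPU ids are packed down into the VP block. */
        *vp = xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, cpu);
        return 0;
}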
2019-10-22  KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs  (Greg Kurz, 2 files, -4/+4)
Print out the VP id of each connected vCPU; this makes it possible to see:
- the VP block base, in which OPAL encodes information that may be useful when debugging
- the packed vCPU id, which may differ from the raw vCPU id if the latter is >= KVM_MAX_VCPUS (2048)
Signed-off-by: Greg Kurz <[email protected]> Reviewed-by: Cédric Le Goater <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated  (Greg Kurz, 2 files, -7/+6)
If we cannot allocate the XIVE VPs in OPAL, the creation of a XIVE or XICS-on-XIVE device is aborted as expected, but we leave kvm->arch.xive set forever since the release method isn't called in this case. Any subsequent attempt to create a XIVE or XICS-on-XIVE device for this VM will thus always fail (DoS). This is a problem for QEMU since it destroys and re-creates these devices when the VM is reset: the VM would be restricted to using the much slower emulated XIVE or XICS forever. As an alternative to adding rollback, do not assign kvm->arch.xive before making sure the XIVE VPs are allocated in OPAL. Cc: [email protected] # v5.2 Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method") Signed-off-by: Greg Kurz <[email protected]> Reviewed-by: Cédric Le Goater <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: E500: Replace current->mm by kvm->mm  (Leonardo Bras, 1 file, -3/+3)
Given that in kvm_create_vm() there is:

kvm->mm = current->mm;

and that on every kvm_*_ioctl we have:

if (kvm->mm != current->mm)
        return -EIO;

I see no reason to keep using current->mm instead of kvm->mm. By doing so, we reduce the use of 'global' variables in the code, relying more on the contents of the kvm struct. Signed-off-by: Leonardo Bras <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-22  KVM: PPC: Reduce calls to get current->mm by storing the value locally  (Leonardo Bras, 1 file, -5/+6)
Reduce the number of calls to get_current() for obtaining the value of current->mm by doing it once and storing the value, since it is not supposed to change within the same process. Signed-off-by: Leonardo Bras <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-21  KVM: PPC: Report single stepping capability  (Fabiano Rosas, 3 files, -0/+6)
When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request the next instruction to be single stepped via the KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure. This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order to inform userspace about the state of single stepping support. We currently don't have support for guest single stepping implemented in Book3S HV so the capability is only present for Book3S PR and BookE. Signed-off-by: Fabiano Rosas <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-10-20  Linux 5.4-rc4  (Linus Torvalds, 1 file, -1/+1)
2019-10-20  Merge tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild  (Linus Torvalds, 3 files, -6/+9)
Pull more Kbuild fixes from Masahiro Yamada:
- fix a bashism of setlocalversion
- do not use the too new --sort option of tar
* tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kheaders: substituting --sort in archive creation
  scripts: setlocalversion: fix a bashism
  kbuild: update comment about KBUILD_ALLDIRS