Age | Commit message (Collapse) | Author | Files | Lines |
|
The kvm_vcpu variable, guest_xcr0_loaded, is a waste of an 'int'
and a conditional branch. VMX and SVM are the only users, and both
unconditionally pair kvm_load_guest_xcr0() with kvm_put_guest_xcr0()
making this check unnecessary. Without this variable, the predicates in
kvm_load_guest_xcr0 and kvm_put_guest_xcr0 should match.
Suggested-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Aaron Lewis <[email protected]>
Change-Id: I7b1eb9b62969d7bbb2850f27e42f863421641b23
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Volume 4 of the SDM says that IA32_XSS is supported
if CPUID(EAX=0DH,ECX=1):EAX.XSS[bit 3] is set, so only the
X86_FEATURE_XSAVES check is necessary (X86_FEATURE_XSAVES is the Linux
name for CPUID(EAX=0DH,ECX=1):EAX.XSS[bit 3]).
Fixes: 4d763b168e9c5 ("KVM: VMX: check CPUID before allowing read/write of IA32_XSS")
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Aaron Lewis <[email protected]>
Change-Id: I9059b9f2e3595e4b09a4cdcf14b933b22ebad419
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Cache whether XSAVES is enabled in the guest by adding xsaves_enabled to
vcpu->arch.
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Aaron Lewis <[email protected]>
Change-Id: If4638e0901c28a4494dad2e103e2c075e8ab5d68
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Rename {vmx,nested_vmx}_vcpu_setup() to match what they really do.
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Move the initialization of vmx->guest_msrs[] from vmx_vcpu_setup() to
vmx_create_vcpu(), and put it right after its allocation.
This also is the preperation for next patch.
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
... It can be removed here because the same code is called later in
vmx_vcpu_reset() as the flow:
kvm_arch_vcpu_setup()
-> kvm_vcpu_reset()
-> vmx_vcpu_reset()
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Move the code that writes vmx->vpid to vmcs from vmx_vcpu_reset() to
vmx_vcpu_setup(), because vmx->vpid is allocated when creating vcpu and
never changed. So we don't need to update the vmcs.vpid when resetting
vcpu.
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Replace the explicit declaration of "u64 reprogram_pmi" with the generic
macro DECLARE_BITMAP for all possible appropriate number of bits.
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
If we reach here with r = 0, we will reassign r = 0
unnecesarry, then do the label set_irqchip_out work.
If we reach here with r != 0, then we will do the label
work directly. So this if statement and r = 0 assignment
is redundant.
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Generally, APICv for all vcpus in the VM are enable/disable in the same
manner. So, get_enable_apicv() should represent APICv status of the VM
instead of each VCPU.
Modify kvm_x86_ops.get_enable_apicv() to take struct kvm as parameter
instead of struct kvm_vcpu.
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common
cache_reg() callback and drop the dedicated decache_cr3(). The name
decache_cr3() is somewhat confusing as the caching behavior of CR3
follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and
has nothing in common with the caching behavior of CR0/CR4 (whose
decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage).
This would effectivel adds a BUG() if KVM attempts to cache CR3 on SVM.
Change it to a WARN_ON_ONCE() -- if the cache never requires filling,
the value is already in the right place -- and opportunistically add one
in VMX to provide an equivalent check.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add helpers to prettify code that tests and/or marks whether or not a
register is available and/or dirty.
Suggested-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Now that indexing into arch.regs is either protected by WARN_ON_ONCE or
done with hardcoded enums, combine all definitions for registers that
are tracked by regs_avail and regs_dirty into 'enum kvm_reg'. Having a
single enum type will simplify additional cleanup related to regs_avail
and regs_dirty.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add WARN_ON_ONCE() checks in kvm_register_{read,write}() to detect reg
values that would cause KVM to overflow vcpu->arch.regs. Change the reg
param to an 'int' to make it clear that the reg index is unverified.
Regarding the overhead of WARN_ON_ONCE(), now that all fixed GPR reads
and writes use dedicated accessors, e.g. kvm_rax_read(), the overhead
is limited to flows where the reg index is generated at runtime. And
there is at least one historical bug where KVM has generated an out-of-
bounds access to arch.regs (see commit b68f3cc7d9789, "KVM: x86: Always
use 32-bit SMRAM save state for 32-bit kernels").
Adding the WARN_ON_ONCE() protection paves the way for additional
cleanup related to kvm_reg and kvm_reg_ex.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Rework vmx_set_rflags() to avoid the extra code need to handle emulation
of real mode and invalid state when unrestricted guest is disabled. The
primary reason for doing so is to avoid the call to vmx_get_rflags(),
which will incur a VMREAD when RFLAGS is not already available. When
running nested VMs, the majority of calls to vmx_set_rflags() will occur
without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS
during transitions between vmcs01 and vmcs02.
Note, vmx_get_rflags() guarantees RFLAGS is marked available.
Signed-off-by: Sean Christopherson <[email protected]>
[Replace "else" with early "return" in the unrestricted guest branch. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Capture struct vcpu_vmx in a local variable to improve the readability
of vmx_{g,s}et_rflags().
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Skip the VMWRITE to update GUEST_CR3 if CR3 is not available, i.e. has
not been read from the VMCS since the last VM-Enter. If vcpu->arch.cr3
is stale, kvm_read_cr3(vcpu) will refresh vcpu->arch.cr3 from the VMCS,
meaning KVM will do a VMREAD and then VMWRITE the value it just pulled
from the VMCS.
Note, this is a purely theoretical change, no instances of skipping
the VMREAD+VMWRITE have been observed with this change.
Tested-by: Reto Buerki <[email protected]>
Tested-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Performing a WBINVD and DF_FLUSH are expensive operations. Currently, a
WBINVD/DF_FLUSH is performed every time an SEV guest terminates. However,
the WBINVD/DF_FLUSH is only required when an ASID is being re-allocated
to a new SEV guest. Also, a single WBINVD/DF_FLUSH can enable all ASIDs
that have been disassociated from guests through DEACTIVATE.
To reduce the number of WBINVD/DF_FLUSH invocations, introduce a new ASID
bitmap to track ASIDs that need to be reclaimed. When an SEV guest is
terminated, add its ASID to the reclaim bitmap instead of clearing the
bitmap in the existing SEV ASID bitmap. This delays the need to perform a
WBINVD/DF_FLUSH invocation when an SEV guest terminates until all of the
available SEV ASIDs have been used. At that point, the WBINVD/DF_FLUSH
invocation can be performed and all ASIDs in the reclaim bitmap moved to
the available ASIDs bitmap.
The semaphore around DEACTIVATE can be changed to a read semaphore with
the semaphore taken in write mode before performing the WBINVD/DF_FLUSH.
Tested-by: David Rientjes <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Performing a WBINVD and DF_FLUSH are expensive operations. The SEV support
currently performs this WBINVD/DF_FLUSH combination when an SEV guest is
terminated, so there is no need for it to be done before LAUNCH.
However, when the SEV firmware transitions the platform from UNINIT state
to INIT state, all ASIDs will be marked invalid across all threads.
Therefore, as part of transitioning the platform to INIT state, perform a
WBINVD/DF_FLUSH after a successful INIT in the PSP/SEV device driver.
Since the PSP/SEV device driver is x86 only, it can reference and use the
WBINVD related functions directly.
Cc: Gary Hook <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Tested-by: David Rientjes <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
instead of deferring the VMWRITE until vmx_set_cr3(). If the VMWRITE
is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
VM-Exit occurs without actually entering L2, e.g. if the nested run
is squashed because nested VM-Enter (from L1) is putting L2 into HLT.
Note, the above scenario can occur regardless of whether L1 is
intercepting HLT, e.g. L1 can intercept HLT and then re-enter L2 with
vmcs.GUEST_ACTIVITY_STATE=HALTED. But practically speaking, a VMM will
likely put a guest into HALTED if and only if it's not intercepting HLT.
In an ideal world where EPT *requires* unrestricted guest (and vice
versa), VMX could handle CR3 similar to how it handles RSP and RIP,
e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run(). But
the unrestricted guest silliness complicates the dirty tracking logic
to the point that explicitly handling vmcs02.GUEST_CR3 during nested
VM-Enter is a simpler overall implementation.
Cc: [email protected]
Reported-and-tested-by: Reto Buerki <[email protected]>
Tested-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Liran Alon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The SEV firmware DEACTIVATE command disassociates an SEV guest from an
ASID, clears the WBINVD indicator on all threads and indicates that the
SEV firmware DF_FLUSH command must be issued before the ASID can be
re-used. The SEV firmware DF_FLUSH command will return an error if a
WBINVD has not been performed on every thread before it has been invoked.
A window exists between the WBINVD and the invocation of the DF_FLUSH
command where an SEV firmware DEACTIVATE command could be invoked on
another thread, clearing the WBINVD indicator. This will cause the
subsequent SEV firmware DF_FLUSH command to fail which, in turn, results
in the SEV firmware ACTIVATE command failing for the reclaimed ASID.
This results in the SEV guest failing to start.
Use a mutex to close the WBINVD/DF_FLUSH window by obtaining the mutex
before the DEACTIVATE and releasing it after the DF_FLUSH. This ensures
that any DEACTIVATE cannot run before a DF_FLUSH has completed.
Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Tested-by: David Rientjes <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The SEV ASID bitmap currently is not protected against parallel SEV guest
startups. This can result in an SEV guest failing to start because another
SEV guest could have been assigned the same ASID value. Use a mutex to
serialize access to the SEV ASID bitmap.
Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command")
Tested-by: David Rientjes <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
PPC KVM fix for 5.4
- Fix a bug in the XIVE code which can cause a host crash.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm fixes for 5.4, take #2
Special PMU edition:
- Fix cycle counter truncation
- Fix cycle counter overflow limit on pure 64bit system
- Allow chained events to be actually functional
- Correct sample period after overflow
|
|
After resetting the vCPU, the kvmclock MSR keeps the previous value but it is
not enabled. This can be confusing, so fix it.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Use BUG_ON instead of a if condition followed by BUG.
Generated by: scripts/coccinelle/misc/bugon.cocci
Fixes: 4b526de50e39 ("KVM: x86: Check kvm_rebooting in kvm_spurious_fault()")
CC: Sean Christopherson <[email protected]>
Signed-off-by: kbuild test robot <[email protected]>
Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Commit bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit
and handle WAITPKG vmexit") introduced specialized handling of
specific exit-reasons that should not be raised by CPU because
KVM configures VMCS such that they should never be raised.
However, since commit 7396d337cfad ("KVM: x86: Return to userspace
with internal error on unexpected exit reason"), VMX & SVM
exit handlers were modified to generically handle all unexpected
exit-reasons by returning to userspace with internal error.
Therefore, there is no need for specialized handling of specific
unexpected exit-reasons (This specialized handling also introduced
inconsistency for these exit-reasons to silently skip guest instruction
instead of return to userspace on internal-error).
Fixes: bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit")
Signed-off-by: Liran Alon <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Commit 204c91eff798a ("KVM: selftests: do not blindly clobber registers in
guest asm") was intended to make test more gcc-proof, however, the result
is exactly the opposite: on newer gccs (e.g. 8.2.1) the test breaks with
==== Test Assertion Failure ====
x86_64/sync_regs_test.c:168: run->s.regs.regs.rbx == 0xBAD1DEA + 1
pid=14170 tid=14170 - Invalid argument
1 0x00000000004015b3: main at sync_regs_test.c:166 (discriminator 6)
2 0x00007f413fb66412: ?? ??:0
3 0x000000000040191d: _start at ??:?
rbx sync regs value incorrect 0x1.
Apparently, compile is still free to play games with registers even
when they have variables attached.
Re-write guest code with 'asm volatile' by embedding ucall there and
making sure rbx is preserved.
Fixes: 204c91eff798a ("KVM: selftests: do not blindly clobber registers in guest asm")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
vmx_dirty_log_test fails on AMD and this is no surprise as it is VMX
specific. Bail early when nested VMX is unsupported.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
vmx_* tests require VMX and three of them implement the same check. Move it
to vmx library.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
vmx_set_nested_state_test() checks if VMX is supported twice: in the very
beginning (and skips the whole test if it's not) and before doing
test_vmx_nested_state(). One should be enough.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Don't waste cycles to shrink/grow vCPU halt_poll_ns if host
side polling is disabled.
Acked-by: Marcelo Tosatti <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Because "Untracked files:" are annoying.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When the RDPID instruction is supported on the host, enumerate it in
KVM_GET_SUPPORTED_CPUID.
Signed-off-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here is a bunch of pin control fixes. I was lagging behind on this
one, some fixes should have come in earlier, sorry about that.
Anyways here it is, pretty straight-forward fixes, the Strago fix
stand out as something serious affecting a lot of machines.
Summary:
- Handle multiple instances of Intel chips without complaining.
- Restore the Intel Strago DMI workaround
- Make the Armada 37xx handle pins over 32
- Fix the polarity of the LED group on Armada 37xx
- Fix an off-by-one bug in the NS2 driver
- Fix error path for iproc's platform_get_irq()
- Fix error path on the STMFX driver
- Fix a typo in the Berlin AS370 driver
- Fix up misc errors in the Aspeed 2600 BMC support
- Fix a stray SPDX tag"
* tag 'pinctrl-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: aspeed-g6: Rename SD3 to EMMC and rework pin groups
pinctrl: aspeed-g6: Fix UART13 group pinmux
pinctrl: aspeed-g6: Make SIG_DESC_CLEAR() behave intuitively
pinctrl: aspeed-g6: Fix I3C3/I3C4 pinmux configuration
pinctrl: aspeed-g6: Fix I2C14 SDA description
pinctrl: aspeed-g6: Sort pins for sanity
dt-bindings: pinctrl: aspeed-g6: Rework SD3 function and groups
pinctrl: berlin: as370: fix a typo s/spififib/spdifib
pinctrl: armada-37xx: swap polarity on LED group
pinctrl: stmfx: fix null pointer on remove
pinctrl: iproc: allow for error from platform_get_irq()
pinctrl: ns2: Fix off by one bugs in ns2_pinmux_enable()
pinctrl: bcm-iproc: Use SPDX header
pinctrl: armada-37xx: fix control of pins 32 and up
pinctrl: cherryview: restore Strago DMI workaround for all versions
pinctrl: intel: Allocate IRQ chip dynamic
|
|
AIL=2 mode has no known users, so is not well tested or supported.
Disallow guests from selecting this mode because it may become
deprecated in future versions of the architecture.
This policy decision is not left to QEMU because KVM support is
required for AIL=2 (when injecting interrupts).
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
kvmppc_inject_interrupt does not implement LPCR[AIL]!=0 modes, which
can result in the guest receiving interrupts as if LPCR[AIL]=0
contrary to the ISA.
In practice, Linux guests cope with this deviation, but it should be
fixed.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This consolidates the HV interrupt delivery logic into one place.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
reset_msr sets the MSR for interrupt injection, but it's cleaner and
more flexible to provide a single op to set both MSR and PC for the
interrupt.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Acked-by: Paul Mackerras <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Add a new attribute to both XIVE and XICS-on-XIVE KVM devices so that
userspace can tell how many interrupt servers it needs. If a VM needs
less than the current default of KVM_MAX_VCPUS (2048), we can allocate
less VPs in OPAL. Combined with a core stride (VSMT) that matches the
number of guest threads per core, this may substantially increases the
number of VMs that can run concurrently with an in-kernel XIVE device.
Since the legacy XIVE KVM device is exposed to userspace through the
XICS KVM API, a new attribute group is added to it for this purpose.
While here, fix the syntax of the existing KVM_DEV_XICS_GRP_SOURCES
in the XICS documentation.
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The XIVE VP is an internal structure which allow the XIVE interrupt
controller to maintain the interrupt context state of vCPUs non
dispatched on HW threads.
When a guest is started, the XIVE KVM device allocates a block of
XIVE VPs in OPAL, enough to accommodate the highest possible vCPU
id KVM_MAX_VCPU_ID (16384) packed down to KVM_MAX_VCPUS (2048).
With a guest's core stride of 8 and a threading mode of 1 (QEMU's
default), a VM must run at least 256 vCPUs to actually need such a
range of VPs.
A POWER9 system has a limited XIVE VP space : 512k and KVM is
currently wasting this HW resource with large VP allocations,
especially since a typical VM likely runs with a lot less vCPUs.
Make the size of the VP block configurable. Add an nr_servers
field to the XIVE structure and a function to set it for this
purpose.
Split VP allocation out of the device create function. Since the
VP block isn't used before the first vCPU connects to the XIVE KVM
device, allocation is now performed by kvmppc_xive_connect_vcpu().
This gives the opportunity to set nr_servers in between:
kvmppc_xive_create() / kvmppc_xive_native_create()
.
.
kvmppc_xive_set_nr_servers()
.
.
kvmppc_xive_connect_vcpu() / kvmppc_xive_native_connect_vcpu()
The connect_vcpu() functions check that the vCPU id is below nr_servers
and if it is the first vCPU they allocate the VP block. This is protected
against a concurrent update of nr_servers by kvmppc_xive_set_nr_servers()
with the xive->lock mutex.
Also, the block is allocated once for the device lifetime: nr_servers
should stay constant otherwise connect_vcpu() could generate a boggus
VP id and likely crash OPAL. It is thus forbidden to update nr_servers
once the block is allocated.
If the VP allocation fail, return ENOSPC which seems more appropriate to
report the depletion of system wide HW resource than ENOMEM or ENXIO.
A VM using a stride of 8 and 1 thread per core with 32 vCPUs would hence
only need 256 VPs instead of 2048. If the stride is set to match the number
of threads per core, this goes further down to 32.
This will be exposed to userspace by a subsequent patch.
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Reduce code duplication by consolidating the checking of vCPU ids and VP
ids to a common helper used by both legacy and native XIVE KVM devices.
And explain the magic with a comment.
Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Print out the VP id of each connected vCPU, this allow to see:
- the VP block base in which OPAL encodes information that may be
useful when debugging
- the packed vCPU id which may differ from the raw vCPU id if the
latter is >= KVM_MAX_VCPUS (2048)
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
If we cannot allocate the XIVE VPs in OPAL, the creation of a XIVE or
XICS-on-XIVE device is aborted as expected, but we leave kvm->arch.xive
set forever since the release method isn't called in this case. Any
subsequent tentative to create a XIVE or XICS-on-XIVE for this VM will
thus always fail (DoS). This is a problem for QEMU since it destroys
and re-creates these devices when the VM is reset: the VM would be
restricted to using the much slower emulated XIVE or XICS forever.
As an alternative to adding rollback, do not assign kvm->arch.xive before
making sure the XIVE VPs are allocated in OPAL.
Cc: [email protected] # v5.2
Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method")
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Given that in kvm_create_vm() there is:
kvm->mm = current->mm;
And that on every kvm_*_ioctl we have:
if (kvm->mm != current->mm)
return -EIO;
I see no reason to keep using current->mm instead of kvm->mm.
By doing so, we would reduce the use of 'global' variables on code, relying
more in the contents of kvm struct.
Signed-off-by: Leonardo Bras <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Reduces the number of calls to get_current() in order to get the value of
current->mm by doing it once and storing the value, since it is not
supposed to change inside the same process).
Signed-off-by: Leonardo Bras <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
the next instruction to be single stepped via the
KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.
This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
to inform userspace about the state of single stepping support.
We currently don't have support for guest single stepping implemented
in Book3S HV so the capability is only present for Book3S PR and
BookE.
Signed-off-by: Fabiano Rosas <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild fixes from Masahiro Yamada:
- fix a bashism of setlocalversion
- do not use the too new --sort option of tar
* tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kheaders: substituting --sort in archive creation
scripts: setlocalversion: fix a bashism
kbuild: update comment about KBUILD_ALLDIRS
|