Age | Commit message (Collapse) | Author | Files | Lines |
|
Let's drop the goto and return the error value directly.
Suggested-by: Peter Xu <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
No need for the goto label + local variable "r".
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Let's rename it into a proper arch specific callback.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
We know there is an ioapic, so let's call it directly.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
kvm_ioapic_init() is guaranteed to be called without any created VCPUs,
so doing an all-vcpu request results in a NOP.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Currently, one could set pin 8-15, implicitly referring to
KVM_IRQCHIP_PIC_SLAVE.
Get rid of the two local variables max_pin and delta on the way.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Let's just move it to the place where it is actually needed.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
I don't see any reason any more for this lock, seemed to be used to protect
removal of kvm->arch.vpic / kvm->arch.vioapic when already partially
inititalized, now access is properly protected using kvm->arch.irqchip_mode
and this shouldn't be necessary anymore.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
When handling KVM_GET_IRQCHIP, we already check irqchip_kernel(), which
implies a fully inititalized ioapic.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Let's just use kvm->arch.vioapic directly.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
It seemed like a nice idea to encapsulate access to kvm->arch.vpic. But
as the usage is already mixed, internal locks are taken outside of i8259.c
and grepping for "vpic" only is much easier, let's just get rid of
pic_irqchip().
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
KVM_IRQCHIP_KERNEL implies a fully inititalized ioapic, while
kvm->arch.vioapic might temporarily be set but invalidated again if e.g.
setting of default routing fails when setting KVM_CREATE_IRQCHIP.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Let's avoid checking against kvm->arch.vpic. We have kvm->arch.irqchip_mode
for that now.
KVM_IRQCHIP_KERNEL implies a fully inititalized pic, while kvm->arch.vpic
might temporarily be set but invalidated again if e.g. kvm_ioapic_init()
fails when setting KVM_CREATE_IRQCHIP. Although current users seem to be
fine, this avoids future bugs.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Let's replace the checks for pic_in_kernel() and ioapic_in_kernel() by
checks against irqchip_mode.
Also make sure that creation of any route is only possible if we have
an lapic in kernel (irqchip_in_kernel()) or if we are currently
inititalizing the irqchip.
This is necessary to switch pic_in_kernel() and ioapic_in_kernel() to
irqchip_mode, too.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Let's add a new mode and set it while we create the irqchip via
KVM_CREATE_IRQCHIP and KVM_CAP_SPLIT_IRQCHIP.
This mode will be used later to test if adding routes
(in kvm_set_routing_entry()) is already allowed.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Avoid races between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP by taking
the kvm->lock when setting up routes.
If KVM_CREATE_IRQCHIP fails, KVM_SET_GSI_ROUTING could have already set
up routes pointing at pic/ioapic, being silently removed already.
Also, as a side effect, this patch makes sure that KVM_SET_GSI_ROUTING
and KVM_CAP_SPLIT_IRQCHIP cannot run in parallel.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
When delivering a machine check the CPU state is "loaded", which
means that some registers are already in the host registers.
Before writing the register content into the machine check
save area, we must make sure that we save the content of the
registers into the data structures that are used for delivering
the machine check.
We already do the right thing for access, vector/floating point
registers, let's do the same for guarded storage.
Fixes: 4e0b1ab72b8a ("KVM: s390: gs support for kvm guests")
Signed-off-by: Christian Borntraeger <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
|
|
If the guest does not use the host register management, but it uses
the sdnx area, we must fill in a proper sdnxo value (address of sdnx
and the sdnxc).
Reported-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
From: Christian Borntraeger <[email protected]>
KVM: s390: features for 4.12
1. guarded storage support for guests
This contains an s390 base Linux feature branch that is necessary
to implement the KVM part
2. Provide an interface to implement adapter interruption suppression
which is necessary for proper zPCI support
3. Use more defines instead of numbers
4. Provide logging for lazy enablement of runtime instrumentation
|
|
The userspace exception injection API and code path are entirely
unprepared for exceptions that might cause a VM-exit from L2 to L1, so
the best course of action may be to simply disallow this for now.
1. The API provides no mechanism for userspace to specify the new DR6
bits for a #DB exception or the new CR2 value for a #PF
exception. Presumably, userspace is expected to modify these registers
directly with KVM_SET_SREGS before the next KVM_RUN ioctl. However, in
the event that L1 intercepts the exception, these registers should not
be changed. Instead, the new values should be provided in the
exit_qualification field of vmcs12 (Intel SDM vol 3, section 27.1).
2. In the case of a userspace-injected #DB, inject_pending_event()
clears DR7.GD before calling vmx_queue_exception(). However, in the
event that L1 intercepts the exception, this is too early, because
DR7.GD should not be modified by a #DB that causes a VM-exit directly
(Intel SDM vol 3, section 27.1).
3. If the injected exception is a #PF, nested_vmx_check_exception()
doesn't properly check whether or not L1 is interested in the
associated error code (using the #PF error code mask and match fields
from vmcs12). It may either return 0 when it should call
nested_vmx_vmexit() or vice versa.
4. nested_vmx_check_exception() assumes that it is dealing with a
hardware-generated exception intercept from L2, with some of the
relevant details (the VM-exit interruption-information and the exit
qualification) live in vmcs02. For userspace-injected exceptions, this
is not the case.
5. prepare_vmcs12() assumes that when its exit_intr_info argument
specifies valid information with a valid error code that it can VMREAD
the VM-exit interruption error code from vmcs02. For
userspace-injected exceptions, this is not the case.
Signed-off-by: Jim Mattson <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
If we already entered/are about to enter SMM, don't allow switching to
INIT/SIPI_RECEIVED, otherwise the next call to kvm_apic_accept_events()
will report a warning.
Same applies if we are already in MP state INIT_RECEIVED and SMM is
requested to be turned on. Refuse to set the VCPU events in this case.
Fixes: cd7764fe9f73 ("KVM: x86: latch INITs while in system management mode")
Cc: [email protected] # 4.2+
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Its value has never changed; we might as well make it part of the ABI instead
of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO).
Because PPC does not always make MMIO available, the code has to be made
dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET.
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Remove code from architecture files that can be moved to virt/kvm, since there
is already common code for coalesced MMIO.
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
[Removed a pointless 'break' after 'return'.]
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
In order to simplify adding exit reasons in the future,
the array of exit reason names is now also sorted by
exit reason code.
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Now use bit 6 of EPTP to optionally enable A/D bits for EPTP. Another
thing to change is that, when EPT accessed and dirty bits are not in use,
VMX treats accesses to guest paging structures as data reads. When they
are in use (bit 6 of EPTP is set), they are treated as writes and the
corresponding EPT dirty bit is set. The MMU didn't know this detail,
so this patch adds it.
We also have to fix up the exit qualification. It may be wrong because
KVM sets bit 6 but the guest might not.
L1 emulates EPT A/D bits using write permissions, so in principle it may
be possible for EPT A/D bits to be used by L1 even though not available
in hardware. The problem is that guest page-table walks will be treated
as reads rather than writes, so they would not cause an EPT violation.
Signed-off-by: Paolo Bonzini <[email protected]>
[Fixed typo in walk_addr_generic() comment and changed bit clear +
conditional-set pattern in handle_ept_violation() to conditional-clear]
Signed-off-by: Radim Krčmář <[email protected]>
|
|
This prepares the MMU paging code for EPT accessed and dirty bits,
which can be enabled optionally at runtime. Code that updates the
accessed and dirty bits will need a pointer to the struct kvm_mmu.
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
handle_ept_violation is checking for "guest-linear-address invalid" +
"not a paging-structure walk". However, _all_ EPT violations without
a valid guest linear address are paging structure walks, because those
EPT violations happen when loading the guest PDPTEs.
Therefore, the check can never be true, and even if it were, KVM doesn't
care about the guest linear address; it only uses the guest *physical*
address VMCS field. So, remove the check altogether.
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Large pages at the PDPE level can be emulated by the MMU, so the bit
can be set unconditionally in the EPT capabilities MSR. The same is
true of 2MB EPT pages, though all Intel processors with EPT in practice
support those.
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Legacy device assignment has been deprecated since 4.2 (released
1.5 years ago). VFIO is better and everyone should have switched to it.
If they haven't, this should convince them. :)
Reviewed-by: Alex Williamson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Virtual NMIs are only missing in Prescott and Yonah chips. Both are obsolete
for virtualization usage---Yonah is 32-bit only even---so drop vNMI emulation.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
MCG_CAP[63:9] bits are reserved on AMD. However, on an AMD guest, this
MSR returns 0x100010a. More specifically, bit 24 is set, which is simply
wrong. That bit is MCG_SER_P and is present only on Intel. Thus, clean
up the reserved bits in order not to confuse guests.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Let's combine it in a single function vmx_switch_vmcs().
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
According to the Intel SDM, volume 3, section 28.3.2: Creating and
Using Cached Translation Information, "No linear mappings are used
while EPT is in use." INVEPT will invalidate both the guest-physical
mappings and the combined mappings in the TLBs and paging-structure
caches, so an INVVPID is superfluous.
Signed-off-by: Jim Mattson <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Introduce a cap to enable AIS facility bit, and add documentation
for this capability.
Signed-off-by: Yi Min Zhao <[email protected]>
Signed-off-by: Fei Li <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/kvm-mips
From: James Hogan <[email protected]>
KVM: MIPS: VZ support, Octeon III, and TLBR
Add basic support for the MIPS Virtualization Module (generally known as
MIPS VZ) in KVM. We primarily support the ImgTec P5600, P6600, I6400,
and Cavium Octeon III cores so far. Support is included for the
following VZ / guest hardware features:
- MIPS32 and MIPS64, r5 (VZ requires r5 or later) and r6
- TLBs with GuestID (IMG cores) or Root ASID Dealias (Octeon III)
- Shared physical root/guest TLB (IMG cores)
- FPU / MSA
- Cop0 timer (up to 1GHz for now due to soft timer limit)
- Segmentation control (EVA)
- Hardware page table walker (HTW) both for root and guest TLB
Also included is a proper implementation of the TLBR instruction for the
trap & emulate MIPS KVM implementation.
Preliminary MIPS architecture changes are applied directly with Ralf's
ack.
|
|
Inject adapter interrupts on a specified adapter which allows to
retrieve the adapter flags, e.g. if the adapter is subject to AIS
facility or not. And add documentation for this interface.
For adapters subject to AIS, handle the airq injection suppression
for a given ISC according to the interruption mode:
- before injection, if NO-Interruptions Mode, just return 0 and
suppress, otherwise, allow the injection.
- after injection, if SINGLE-Interruption Mode, change it to
NO-Interruptions Mode to suppress the following interrupts.
Besides, add tracepoint for suppressed airq and AIS mode transitions.
Signed-off-by: Yi Min Zhao <[email protected]>
Signed-off-by: Fei Li <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
Provide an interface for userspace to modify AIS
(adapter-interruption-suppression) mode state, and add documentation
for the interface. Allowed target modes are ALL-Interruptions mode
and SINGLE-Interruption mode.
We introduce the 'simm' and 'nimm' fields in kvm_s390_float_interrupt
to store interruption modes for each ISC. Each bit in 'simm' and
'nimm' targets to one ISC, and collaboratively indicate three modes:
ALL-Interruptions, SINGLE-Interruption and NO-Interruptions. This
interface can initiate most transitions between the states; transition
from SINGLE-Interruption to NO-Interruptions via adapter interrupt
injection will be introduced in a following patch. The meaningful
combinations are as follows:
interruption mode | simm bit | nimm bit
------------------|----------|----------
ALL | 0 | 0
SINGLE | 1 | 0
NO | 1 | 1
Besides, add tracepoint to track AIS mode transitions.
Co-Authored-By: Yi Min Zhao <[email protected]>
Signed-off-by: Yi Min Zhao <[email protected]>
Signed-off-by: Fei Li <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
In order to properly implement adapter-interruption suppression, we
need a way for userspace to specify which adapters are subject to
suppression. Let's convert the existing (and unused) 'pad' field into
a 'flags' field and define a flag value for suppressible adapters.
Besides, add documentation for the interface.
Signed-off-by: Fei Li <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
So far iommu_table obejcts were only used in virtual mode and had
a single owner. We are going to change this by implementing in-kernel
acceleration of DMA mapping requests. The proposed acceleration
will handle requests in real mode and KVM will keep references to tables.
This adds a kref to iommu_table and defines new helpers to update it.
This replaces iommu_free_table() with iommu_tce_table_put() and makes
iommu_free_table() static. iommu_tce_table_get() is not used in this patch
but it will be in the following patch.
Since this touches prototypes, this also removes @node_name parameter as
it has never been really useful on powernv and carrying it for
the pseries platform code to iommu_free_table() seems to be quite
useless as well.
This should cause no behavioral change.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
At the moment iommu_table can be disposed by either calling
iommu_table_free() directly or it_ops::free(); the only implementation
of free() is in IODA2 - pnv_ioda2_table_free() - and it calls
iommu_table_free() anyway.
As we are going to have reference counting on tables, we need an unified
way of disposing tables.
This moves it_ops::free() call into iommu_free_table() and makes use
of the latter. The free() callback now handles only platform-specific
data.
As from now on the iommu_free_table() calls it_ops->free(), we need
to have it_ops initialized before calling iommu_free_table() so this
moves this initialization in pnv_pci_ioda2_create_table().
This should cause no behavioral change.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In real mode, TCE tables are invalidated using special
cache-inhibited store instructions which are not available in
virtual mode
This defines and implements exchange_rm() callback. This does not
define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
exchange/exchange_rm are only to be used by KVM for VFIO.
The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.
This replaces list_for_each_entry_rcu with its lockless version as
from now on pnv_pci_ioda2_tce_invalidate() can be called in
the real mode too.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add column '%Total' next to 'Total' for easier comparison of numbers between
hosts.
Signed-off-by: Stefan Raspl <[email protected]>
Marc Hartmayer <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Provide an interactive command to reset the tracepoint statistics.
Requires some extra work for debugfs, as the counters cannot be reset.
On the up side, this offers us the opportunity to have debugfs values
reset on startup and whenever a filter is modified, becoming consistent
with the tracepoint provider. As a bonus, 'kvmstat -dt' will now provide
useful output, instead of mixing values in totally different orders of
magnitude.
Furthermore, we avoid unnecessary resets when any of the filters is
"changed" interactively to the previous value.
Signed-off-by: Stefan Raspl <[email protected]>
Acked-by: Janosch Frank <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Provide a real simple way to erase any active filter.
Signed-off-by: Stefan Raspl <[email protected]>
Reviewed-by: Marc Hartmayer <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Add a new option '-g'/'--guest' to select a particular process by providing
the QEMU guest name.
Notes:
- The logic to figure out the pid corresponding to the guest name might look
scary, but works pretty reliably in practice; in the unlikely event that it
returns add'l flukes, it will bail out and hint at using '-p' instead, no
harm done.
- Mixing '-g' and '-p' is possible, and the final instance specified on the
command line is the significant one. This is consistent with current
behavior for '-p' which, if specified multiple times, also regards the final
instance as the significant one.
Signed-off-by: Stefan Raspl <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Behavior on empty/0 input for regex and pid filtering was inconsistent, as
the former would keep the current filter, while the latter would (naturally)
remove any pid filtering.
Make things consistent by falling back to the default filter on empty input
for the regex filter dialogue.
Signed-off-by: Stefan Raspl <[email protected]>
Reviewed-by: Marc Hartmayer <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
If a user defines a regex filter through the interactive command, display
the active regex in the header's second line.
Signed-off-by: Stefan Raspl <[email protected]>
Reviewed-by: Marc Hartmayer <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|