Initialize the XSS exit bitmap. It is zero so there should be no XSAVES
or XRSTORS exits.
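A minimal sketch of why a zero bitmap rules out these exits. The Intel SDM states that when the "enable XSAVES/XRSTORS" VM-execution control is set, XSAVES or XRSTORS causes a VM exit only if some bit is set in the logical AND of EDX:EAX, the IA32_XSS MSR and the XSS-exiting bitmap. The helper name is hypothetical and only models that rule.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the SDM exit condition, assuming the
 * "enable XSAVES/XRSTORS" control is 1: exit iff
 * (EDX:EAX & IA32_XSS & XSS-exiting bitmap) != 0.
 */
static bool xsaves_causes_exit(uint32_t edx, uint32_t eax,
                               uint64_t ia32_xss, uint64_t xss_exit_bitmap)
{
    uint64_t rfbm = ((uint64_t)edx << 32) | eax;

    return (rfbm & ia32_xss & xss_exit_bitmap) != 0;
}
```

With the bitmap set to zero, the AND is always zero, whatever the guest puts in EDX:EAX or IA32_XSS.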
Signed-off-by: Wanpeng Li <[email protected]>
Reviewed-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
- EAX=0Dh, ECX=1: output registers EBX/ECX/EDX are reserved.
- EAX=0Dh, ECX>1: output register ECX bit 0 is clear for all the CPUID
leaves we support, because variable "supported" comes from XCR0 and not
XSS. Bits above 0 are reserved, so ECX is overall zero. Output register
EDX is reserved.
Source: Intel Architecture Instruction Set Extensions Programming
Reference, ref. number 319433-022
Reviewed-by: Radim Krčmář <[email protected]>
Tested-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
This is the size of the XSAVES area. This starts providing guest support
for XSAVES (with no support yet for supervisor states, i.e. XSS == 0
always in guests for now).
Wanpeng Li suggested testing XSAVEC as well as XSAVES, since in practice
no real processor exists that only has one of them, and there is no
other way for userspace programs to compute the size of the XSAVEC
save area. CPUID(EAX=0xd,ECX=1).EBX provides an upper bound.
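A sketch of how a compacted-format size (as used by XSAVES/XSAVEC) can be computed, assuming the layout described in the Intel SDM: the 512-byte legacy region and 64-byte XSAVE header come first, then each enabled component i >= 2 in index order, with its start rounded up to 64 bytes when CPUID(EAX=0xd,ECX=i).ECX bit 1 is set. The component table and function name are hypothetical; real sizes come from CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

#define XSAVE_LEGACY_SIZE 512u   /* x87 + SSE region */
#define XSAVE_HEADER_SIZE 64u    /* XSTATE_BV, XCOMP_BV, reserved */

/* Hypothetical per-component data, normally read from CPUID(0xd, i). */
struct xstate_comp {
    uint32_t size;     /* CPUID(0xd,i).EAX */
    bool align64;      /* CPUID(0xd,i).ECX bit 1 */
};

/* Size of a compacted XSAVE area holding the components in 'mask'. */
static uint32_t compacted_xsave_size(uint64_t mask,
                                     const struct xstate_comp *comp,
                                     int ncomp)
{
    uint32_t offset = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

    /* Components 0 (x87) and 1 (SSE) live in the legacy region. */
    for (int i = 2; i < ncomp; i++) {
        if (!(mask & (1ull << i)))
            continue;
        if (comp[i].align64)
            offset = (offset + 63u) & ~63u;
        offset += comp[i].size;
    }
    return offset;
}
```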
Suggested-by: Radim Krčmář <[email protected]>
Reviewed-by: Radim Krčmář <[email protected]>
Tested-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Expose the XSAVES feature to the guest if the kvm_x86_ops say it is
available.
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
For code that deals with cpuid, this makes things a bit more readable.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
struct xsave_struct might be using the compacted format. Convert
in order to preserve userspace ABI.
Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
but the kernel will pass it to XRSTORS, and we need to convert back.
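A minimal sketch of the KVM_GET_XSAVE direction of the conversion, under stated assumptions: standard-format offsets come from CPUID(0xd,i).EBX, the compacted image places each component listed in XCOMP_BV in index order (64-byte aligned where CPUID says so), and the component table and function name are hypothetical rather than the kernel's own helpers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical component table, normally derived from CPUID(0xd, i). */
struct xcomp {
    uint32_t size;        /* CPUID(0xd,i).EAX */
    uint32_t std_offset;  /* CPUID(0xd,i).EBX: offset in standard format */
    bool align64;         /* CPUID(0xd,i).ECX bit 1 */
};

/*
 * Copy a compacted XSAVE image into the standard layout userspace expects.
 * 'dst' must be zeroed and large enough for the standard format.
 * 'xcomp_bv' lists the components laid out in the compacted image.
 */
static void xsave_compacted_to_standard(uint8_t *dst, const uint8_t *src,
                                        uint64_t xcomp_bv,
                                        const struct xcomp *c, int n)
{
    uint32_t src_off = 576;   /* 512-byte legacy area + 64-byte header */

    /* Legacy region and XSTATE_BV sit at the same offsets in both formats. */
    memcpy(dst, src, 512 + 8);
    /* XCOMP_BV (bytes 520..527) stays zero: bit 63 clear = standard format. */

    for (int i = 2; i < n; i++) {
        if (!(xcomp_bv & (1ull << i)))
            continue;
        if (c[i].align64)
            src_off = (src_off + 63u) & ~63u;
        memcpy(dst + c[i].std_offset, src + src_off, c[i].size);
        src_off += c[i].size;
    }
}
```

KVM_SET_XSAVE needs the inverse walk, copying from the standard offsets into the compacted positions.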
Fixes: f31a9f7c71691569359fa7fb8b0acaa44bce0324
Cc: Fenghua Yu <[email protected]>
Cc: [email protected]
Cc: H. Peter Anvin <[email protected]>
Tested-by: Nadav Amit <[email protected]>
Reviewed-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
get_xsave_addr is the API to access XSAVE states, and KVM would
like to use it. Export it.
Cc: [email protected]
Cc: [email protected]
Cc: H. Peter Anvin <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixups for kvm/next (3.19)
Here we have two fixups of the latest interrupt rework and
one architectural fixup.
|
|
Instead of returning a possibly random or'ed together value, let's
always return -EFAULT if rc is set.
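A small sketch of the pattern described above, using a hypothetical helper: or'ing several negative error codes produces a value that need not be any of them (on Linux, -EFAULT | -EINVAL is -6, i.e. -ENXIO), so the combined rc is only used as a yes/no flag and mapped to one well-defined error.

```c
#include <errno.h>

/* Hypothetical helper: combine return codes of several guest accesses. */
static int combine_rc(const int *rcs, int n)
{
    int rc = 0;

    for (int i = 0; i < n; i++)
        rc |= rcs[i];          /* the or'ed value may be a meaningless mix */
    return rc ? -EFAULT : 0;   /* report a single, well-defined error */
}
```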
Signed-off-by: Jens Freimann <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
Currently we use a mixture of atomic/non-atomic bitops
and the local_int spin lock to protect the pending_irqs bitmap
and interrupt payload data.
We need to use atomic bitops for the pending_irqs bitmap everywhere
and in addition acquire the local_int lock where interrupt data needs
to be protected.
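A user-space sketch of the locking rule above, assuming C11 atomics stand in for the kernel's atomic bitops and an `atomic_flag` spinlock stands in for the local_int lock. The struct and function names are hypothetical. The pending bit is always changed atomically, and the payload is only touched with the lock held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a VCPU's local interrupt state. */
struct local_int {
    atomic_flag lock;           /* protects the payload below */
    atomic_ulong pending_irqs;  /* always updated with atomic ops */
    unsigned long ext_params;   /* interrupt payload */
};

static void li_lock(struct local_int *li)
{
    while (atomic_flag_test_and_set_explicit(&li->lock, memory_order_acquire))
        ;   /* spin */
}

static void li_unlock(struct local_int *li)
{
    atomic_flag_clear_explicit(&li->lock, memory_order_release);
}

/* Inject: store the payload under the lock, set the pending bit atomically. */
static void inject_irq(struct local_int *li, int irq, unsigned long param)
{
    li_lock(li);
    li->ext_params = param;
    atomic_fetch_or(&li->pending_irqs, 1ul << irq);
    li_unlock(li);
}

/* Deliver: read the payload and atomically clear the bit under the lock. */
static bool deliver_irq(struct local_int *li, int irq, unsigned long *param)
{
    bool was_pending;

    li_lock(li);
    *param = li->ext_params;
    was_pending = atomic_fetch_and(&li->pending_irqs, ~(1ul << irq)) & (1ul << irq);
    li_unlock(li);
    return was_pending;
}
```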
Signed-off-by: Jens Freimann <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
The cpu address of a source cpu (responsible for an external irq) is only to
be stored if bit 6 of the ext irq code is set.
If bit 6 is not set, it is to be zeroed out.
The special external irq code used for virtio and pfault uses the cpu addr as a
parameter field. As bit 6 is set, this implementation is correct.
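A sketch of the rule, with a hypothetical helper name. z/Architecture numbers bits from the left (bit 0 is the most significant), so bit 6 of the 16-bit external interrupt code is mask 0x0200. The codes in the usage below are assumptions for illustration: 0x2603 for the virtio/pfault code, 0x1201 for emergency signal and 0x1004 for clock comparator.

```c
#include <stdint.h>

/* Bit 6 in IBM (MSB-0) numbering of a 16-bit code is 0x0200. */
#define EXT_IRQ_CODE_CPU_ADDR_BIT 0x0200u

/* Hypothetical helper: the CPU address to store for an external irq. */
static uint16_t ext_irq_cpu_addr(uint16_t code, uint16_t src_cpu_addr)
{
    /* Store the source CPU address only if bit 6 is set, else zero it. */
    return (code & EXT_IRQ_CODE_CPU_ADDR_BIT) ? src_cpu_addr : 0;
}
```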
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
We currently track the pid of the task that runs the VCPU in vcpu_load.
If a yield to that VCPU is triggered while the PID of the wrong thread
is active, the wrong thread might receive a yield, but this will most
likely not help the executing thread at all. Instead, if we only track
the pid on the KVM_RUN ioctl, there are two possibilities:
1) the thread that did a non-KVM_RUN ioctl is holding a mutex that
the VCPU thread is waiting for. In this case, the VCPU thread is not
runnable, but we also do not do a wrong yield.
2) the thread that did a non-KVM_RUN ioctl is sleeping, or doing
something that does not block the VCPU thread. In this case, the
VCPU thread can receive the directed yield correctly.
Signed-off-by: Christian Borntraeger <[email protected]>
CC: Rik van Riel <[email protected]>
CC: Raghavendra K T <[email protected]>
CC: Michael Mueller <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
kvm_enter_guest() has to be called with preemption disabled and will
set PF_VCPU. Current code takes PF_VCPU as a hint that the VCPU thread
is running and therefore needs no yield.
However, the check on PF_VCPU is wrong on s390, where preemption has
to stay enabled in order to correctly process page faults. Thus,
s390 re-enables preemption and starts to execute the guest. The thread
might be scheduled out between kvm_enter_guest() and kvm_exit_guest(),
resulting in PF_VCPU being set while the thread is not running. When this
happens, the opportunity for a directed yield is missed.
Moreover, an equivalent check is already done in kvm_vcpu_on_spin before
calling kvm_vcpu_yield_loop:
        if (!ACCESS_ONCE(vcpu->preempted))
                continue;
so the check on PF_VCPU is superfluous in general, and this patch
removes it.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The current linear search doesn't scale well when a large number of
memslots is used and the slot being looked up is not near the
beginning of the memslots array.
Taking into account that memslots don't overlap, it's
possible to switch the sort order of the memslots array from
'npages' to 'base_gfn' and use binary search for
memslot lookup by GFN.
As a result of switching to binary search, lookup times
are reduced when a large number of memslots is used.
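A minimal sketch of a GFN -> memslot binary search, assuming a simplified slot struct and slots sorted by ascending base_gfn (the kernel may sort in the opposite direction). Because slots don't overlap, the only candidate is the last slot whose base_gfn is <= gfn.

```c
#include <stdint.h>

/* Simplified memslot: covers GFNs [base_gfn, base_gfn + npages). */
struct memslot {
    uint64_t base_gfn;
    uint64_t npages;
};

/* Returns the index of the slot containing gfn, or -1 if none does. */
static int search_memslot(const struct memslot *slots, int n, uint64_t gfn)
{
    int lo = 0, hi = n;   /* search window is [lo, hi) */

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;

        if (slots[mid].base_gfn <= gfn)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the first slot with base_gfn > gfn. */
    if (lo > 0 && gfn < slots[lo - 1].base_gfn + slots[lo - 1].npages)
        return lo - 1;
    return -1;
}
```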
Following is a table of search_memslot() cycles
during WS2008R2 guest boot.

                          boot, mostly same     boot + ~10 min of using it,
                          slot lookup           randomized lookup
                          max       average     average
                          cycles    cycles      cycles
13 slots                  1450      28          30
13 slots, binary search   1400      30          40
117 slots                 13000     30          460
117 slots, binary search  2000      35          180
Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
It will allow binary search to be used for GFN -> memslot
lookups, reducing the lookup cost when many slots are in use.
Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In a typical guest boot workload only 2-3 memslots are used
extensively, and most lookups hit the same memslot.
Adding an LRU cache improves the average lookup time from
46 to 28 cycles (~40%) for this workload.
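A sketch of the idea, assuming the "LRU cache" is a single remembered slot index that is checked before the normal lookup. The structs and names are hypothetical, and the fallback is a plain scan to keep the example self-contained.

```c
#include <stdint.h>

struct memslot {
    uint64_t base_gfn;
    uint64_t npages;
};

/* Hypothetical container with a one-entry most-recently-used cache. */
struct memslots {
    struct memslot slot[8];
    int nslots;
    int lru_slot;   /* index of the slot that matched last time */
};

static int slot_contains(const struct memslot *s, uint64_t gfn)
{
    return gfn >= s->base_gfn && gfn < s->base_gfn + s->npages;
}

/* Check the most recently used slot first, then fall back to a scan. */
static int lookup_memslot(struct memslots *ms, uint64_t gfn)
{
    if (ms->lru_slot < ms->nslots &&
        slot_contains(&ms->slot[ms->lru_slot], gfn))
        return ms->lru_slot;               /* fast path: cache hit */

    for (int i = 0; i < ms->nslots; i++) {
        if (slot_contains(&ms->slot[i], gfn)) {
            ms->lru_slot = i;              /* remember for next lookup */
            return i;
        }
    }
    return -1;
}
```

When most lookups hit the same slot, the first comparison usually succeeds and the scan is skipped.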
Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The UP/DOWN shift loops shift the array in the needed
direction and stop at the place where the new slot should
be placed, regardless of the old slot size.
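A sketch of the two shift loops on a plain array of keys kept in descending order (as with the old 'npages' ordering). The function is hypothetical. Only the new key is compared against neighbours, so the old value never matters, and at most one loop moves anything if the array was sorted beforehand.

```c
/*
 * Keep 'keys' sorted in descending order after keys[i] changes to
 * 'new_key'. Returns the final index of the updated element.
 */
static int update_sorted(unsigned long *keys, int n, int i,
                         unsigned long new_key)
{
    /* Shift larger successors one step toward the front. */
    while (i + 1 < n && keys[i + 1] > new_key) {
        keys[i] = keys[i + 1];
        i++;
    }
    /* Shift smaller predecessors one step toward the back. */
    while (i > 0 && keys[i - 1] < new_key) {
        keys[i] = keys[i - 1];
        i--;
    }
    keys[i] = new_key;
    return i;
}
```

If the key is unchanged, neither loop moves anything and the element stays where it was.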
Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
If the number of pages hasn't changed, the sorting algorithm
will do nothing, so there is no need for an extra check
to avoid entering the sorting logic.
Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
While fixing an x2apic bug,
17d68b7 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)
we've made only one cluster available. This means that the number of
logically addressable x2APICs was reduced to 16 and VCPUs kept
overwriting each other in that region, so even the first cluster wasn't
set up correctly.
This patch extends x2APIC support back to the logical_map's limit, and
keeps the CVE fixed as messages for non-present APICs are dropped.
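To show why a single cluster caps logical addressing at 16 APICs, here is the logical x2APIC ID derivation given in the Intel SDM: the cluster number (x2APIC ID bits 19:4) goes into LDR bits 31:16, and bit (x2APIC ID bits 3:0) is set in LDR bits 15:0. The function name is hypothetical.

```c
#include <stdint.h>

/* Logical x2APIC ID (LDR) derived from the x2APIC ID, per the Intel SDM. */
static uint32_t x2apic_ldr(uint32_t x2apic_id)
{
    uint32_t cluster = x2apic_id >> 4;         /* 16 APICs per cluster */
    uint32_t bit = 1u << (x2apic_id & 0xf);    /* position within cluster */

    return (cluster << 16) | bit;
}
```

APIC IDs 16 and above fall into clusters 1, 2, ..., which is why a logical map limited to cluster 0 cannot address them.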
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
They can't be violated now, but play it safe for the future.
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
x2apic allows destinations > 0xff and we don't want them delivered to
lower APICs. They are correctly handled by doing nothing.
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Physical mode can't address more than one APIC, but lowest-prio is
allowed, so we just reuse our paths.
SDM 10.6.2.1 Physical Destination:
Also, for any non-broadcast IPI or I/O subsystem initiated interrupt
with lowest priority delivery mode, software must ensure that APICs
defined in the interrupt address are present and enabled to receive
interrupts.
We could warn on top of that.
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
A false return from kvm_irq_delivery_to_apic_fast() means that the
interrupt is not handled in the fast path, but we still returned false
in cases that were handled perfectly well; fix that.
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR.
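The correspondence follows the Intel SDM mapping between the two register spaces: the x2APIC MSR index is 0x800 plus the xAPIC MMIO offset divided by 16. A one-line sketch with a hypothetical helper name:

```c
#include <stdint.h>

#define X2APIC_MSR_BASE 0x800u

/* x2APIC MSR index for an xAPIC MMIO register offset (16-byte stride). */
static uint32_t xapic_offset_to_x2apic_msr(uint32_t mmio_offset)
{
    return X2APIC_MSR_BASE + (mmio_offset >> 4);
}
```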
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 "x2APIC
Register Address Space"). KVM needs to cause #GP on such accesses.
Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes).
Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Certain x86 instructions that use ModRM operands only allow a memory operand
(i.e., mod012: ModRM.mod of 0, 1 or 2), and cause a #UD exception otherwise.
KVM ignores this fact. Currently, the instructions of this kind that KVM
emulates are MOVBE, MOVNTPS, MOVNTPD and MOVNTI. MOVBE is the most blatant
example, since it may be emulated by the host regardless of MMIO.
The fix introduces a new group for handling such instructions, marking mod3 as
an illegal instruction.
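A sketch of the decode check, assuming a hypothetical helper rather than KVM's opcode-table groups: the mod field is bits 7:6 of the ModRM byte, and for a memory-only instruction the value 3 (register operand) must raise #UD.

```c
#include <stdbool.h>
#include <stdint.h>

/* ModRM layout: mod in bits 7:6, reg in bits 5:3, rm in bits 2:0. */
static unsigned int modrm_mod(uint8_t modrm)
{
    return modrm >> 6;
}

/* Hypothetical check for memory-only instructions such as MOVBE or MOVNTI. */
static bool mem_only_operand_is_valid(uint8_t modrm)
{
    return modrm_mod(modrm) != 3;   /* mod == 3 means register: #UD */
}
```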
Signed-off-by: Nadav Amit <[email protected]>
Reviewed-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Several fixes, cleanups and reworks
Here is a bunch of fixes that deal mostly with architectural compliance:
- interrupt priorities
- interrupt handling
- instruction exit handling
We also provide a helper function for getting the guest visible storage key.
|
|
Allow specifying CR14, the logout area, the external damage code
and the failed storage address.
Since more than one machine check can be indicated to the guest at
a time, we need to combine all indication bits with already pending
requests.
Signed-off-by: Jens Freimann <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
This patch adapts handling of local interrupts to be more compliant with
the z/Architecture Principles of Operation and introduces a data
structure which allows more efficient handling of interrupts.
* get rid of li->active flag, use bitmap instead
* Keep interrupts in a bitmap instead of a list
* Deliver interrupts in the order of their priority as defined in the
PoP
* Use a second bitmap for sigp emergency requests, as a CPU can have
one request pending from every other CPU in the system.
Signed-off-by: Jens Freimann <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
Adds a bitmap to the vcpu structure which is used to keep track
of local pending interrupts. Also adds an enum with all interrupt
types sorted in order of priority (highest to lowest).
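A sketch of why sorting the enum by priority is convenient: if bit N of the pending bitmap stands for enum value N, the highest-priority pending interrupt is simply the lowest set bit. The enum names and their order below are illustrative only, not the kernel's enum and not the PoP ordering. `__builtin_ctzl` is a GCC/Clang builtin.

```c
/* Illustrative interrupt types, listed from highest to lowest priority. */
enum irq_type {
    IRQ_MCHK,
    IRQ_PROG,
    IRQ_EXT_EMERGENCY,
    IRQ_EXT_CLOCK_COMP,
    IRQ_IO,
    IRQ_COUNT
};

/* Return the highest-priority pending interrupt, or -1 if none. */
static int next_irq(unsigned long pending)
{
    if (!pending)
        return -1;
    return __builtin_ctzl(pending);   /* index of the lowest set bit */
}
```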
Signed-off-by: Jens Freimann <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
Move delivery code for cpu-local interrupts from the huge do_deliver_interrupt()
to smaller functions which handle one type of interrupt each.
Signed-off-by: Jens Freimann <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
Get rid of open-coded values for virtio and pfault completion interrupts.
Signed-off-by: Jens Freimann <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
The 32bit external interrupt parameter is only valid for timing-alert and
service-signal interrupts.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
In preparation for the rework of the local interrupt injection code,
factor out injection routines from kvm_s390_inject_vcpu().
Signed-off-by: Jens Freimann <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
Define get_guest_storage_key, which can be used to get the value of a guest
storage key. This complements the functionality provided by the helper function
set_guest_storage_key. Both functions are needed for live migration of s390
guests that use storage keys.
Signed-off-by: Jason J. Herne <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
When injecting a floating interrupt and no CPU is idle we
kick one CPU to do an external exit. In case of I/O we
should trigger an I/O exit instead. This does not matter
for Linux guests as external and I/O interrupts are
enabled/disabled at the same time, but play safe anyway.
The same holds true for machine checks. Since there is no
special exit, just reuse the generic stop exit. The injection
code inside the VCPU loop will recheck anyway and rearm the
proper exits (e.g. control registers) if necessary.
Signed-off-by: Christian Borntraeger <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
|
|
A couple of our interception handlers rewind the PSW to the beginning
of the instruction to run the intercepted instruction again during the
next SIE entry. This normally works fine, but there is also the
possibility that the instruction did not get run directly but via an
EXECUTE instruction.
In this case, the PSW does not point to the instruction that caused the
interception, but to the EXECUTE instruction! So we've got to rewind the
PSW to the beginning of the EXECUTE instruction instead.
This is now accomplished with a new helper function kvm_s390_rewind_psw().
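A hypothetical sketch of the rewind decision, not the signature of kvm_s390_rewind_psw(): the PSW address points just past the instruction responsible for the interception, so the rewind length is that of the intercepted instruction, or that of the EXECUTE instruction when the target ran via EXECUTE (4 bytes for EX, 6 for EXRL). Addressing-mode wraparound is ignored to keep it short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical intercept information. */
struct intercept_info {
    unsigned int ilen;          /* length of the intercepted instruction */
    bool via_execute;           /* run as the target of EX/EXRL? */
    unsigned int execute_ilen;  /* 4 for EX, 6 for EXRL */
};

/* Rewind so the next SIE entry re-runs the right instruction. */
static uint64_t rewind_psw_addr(uint64_t psw_addr,
                                const struct intercept_info *ic)
{
    unsigned int len = ic->via_execute ? ic->execute_ilen : ic->ilen;

    return psw_addr - len;
}
```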
Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
This patch includes two small fixes for the PFMF handler: First, the
start address for PFMF has to be masked according to the current
addressing mode, which is now done with kvm_s390_logical_to_effective().
Second, the protection exceptions have a lower priority than the
specification exceptions, so the check for low-address protection
has to be moved after the last spot where we inject a specification
exception.
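A sketch of the address masking step, assuming the usual PSW addressing-mode bits (EA, PSW bit 31, and BA, PSW bit 32): 24-bit mode keeps the low 24 bits, 31-bit mode the low 31 bits, and 64-bit mode the whole address. The helper below is an illustration, not the kernel's kvm_s390_logical_to_effective().

```c
#include <stdint.h>

#define PSW_MASK_EA 0x0000000100000000ull  /* extended addressing, bit 31 */
#define PSW_MASK_BA 0x0000000080000000ull  /* basic addressing, bit 32 */

/* Truncate a logical address according to the PSW addressing mode. */
static uint64_t logical_to_effective(uint64_t psw_mask, uint64_t addr)
{
    if ((psw_mask & PSW_MASK_EA) && (psw_mask & PSW_MASK_BA))
        return addr;                    /* 64-bit mode */
    if (psw_mask & PSW_MASK_BA)
        return addr & 0x7fffffffull;    /* 31-bit mode */
    return addr & 0x00ffffffull;        /* 24-bit mode */
}
```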
Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
|
|
cs.base is declared as a __u64 variable and vector is a u32, so the
shift that computes the base is done in 32 bits before being widened;
this causes a static checker warning. The user indeed can set "sipi_vector"
to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the
value should really have 8-bit precision only.
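A sketch of the real-mode CS setup for a SIPI, where the selector is vector << 8 and the base is vector << 12. Truncating the vector to 8 bits first is one way to honor the 8-bit precision; the struct and function are hypothetical.

```c
#include <stdint.h>

/* Hypothetical segment state for the CS register. */
struct seg {
    uint16_t selector;
    uint64_t base;
};

static struct seg sipi_cs(uint32_t sipi_vector)
{
    uint8_t vector = (uint8_t)sipi_vector;   /* only 8 bits are meaningful */
    struct seg cs;

    cs.selector = (uint16_t)(vector << 8);
    cs.base = (uint64_t)vector << 12;        /* widen before shifting */
    return cs;
}
```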
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Create a new header, and hide the device assignment functions there.
Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying
arch/x86/kvm/iommu.c to take a PCI device struct.
Based on a patch by Radim Krcmar <[email protected]>.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
This feature is not supported inside KVM guests yet, because we do not emulate
MSR_IA32_XSS. Mask it out.
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Now that ia64 is gone, we can hide deprecated device assignment in x86.
Notable changes:
- kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl()
The easy parts were removed from generic kvm code; what remains:
- kvm_iommu_(un)map_pages() would require new code to be moved
- struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
KVM ia64 is no longer present so new applications shouldn't use them.
The main problem is that they most likely didn't work even before,
because of a conflict in the #defines:
    #define KVM_SET_GUEST_DEBUG     _IOW(KVMIO, 0x9b, struct kvm_guest_debug)
    #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *)
The argument to KVM_SET_GUEST_DEBUG is:
    struct kvm_guest_debug {
            __u32 control;
            __u32 pad;
            struct kvm_guest_debug_arch arch;
    };
    struct kvm_guest_debug_arch {
    };
meaning that sizeof(struct kvm_guest_debug) == sizeof(void *) == 8
and KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK.
KVM_SET_GUEST_DEBUG is handled in virt/kvm/kvm_main.c before even calling
kvm_arch_vcpu_ioctl (which would have handled KVM_IA64_VCPU_SET_STACK),
so KVM_IA64_VCPU_SET_STACK would just return -EINVAL.
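To make the collision concrete, the sketch below reproduces the generic Linux ioctl number layout (nr in bits 7:0, type in 15:8, size in 29:16, direction in 31:30, write = 1). The helper and struct names are hypothetical, and the struct models the ia64 layout without the empty arch member. Two _IOW numbers with the same type, nr and size are identical.

```c
#include <stdint.h>

/* Generic Linux ioctl number layout. */
#define IOC_NRSHIFT   0
#define IOC_TYPESHIFT 8
#define IOC_SIZESHIFT 16
#define IOC_DIRSHIFT  30
#define IOC_WRITE     1u

#define KVMIO 0xAEu

/* Equivalent of _IOW(type, nr, <type of the given size>). */
static uint32_t ioc_w(uint32_t type, uint32_t nr, uint32_t size)
{
    return (IOC_WRITE << IOC_DIRSHIFT) | (size << IOC_SIZESHIFT) |
           (type << IOC_TYPESHIFT) | (nr << IOC_NRSHIFT);
}

/* ia64 layout of struct kvm_guest_debug (empty arch part omitted). */
struct guest_debug_ia64 {
    uint32_t control;
    uint32_t pad;
};
```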
Signed-off-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Signed-off-by: Radim Krcmar <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
ia64 does not need them anymore. Ack notifiers become x86-specific
too.
Suggested-by: Gleb Natapov <[email protected]>
Reviewed-by: Radim Krcmar <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
kvm/ia64 is gone, clean up Documentation too.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
KVM for ia64 has been marked as broken not just once, but twice even,
and the last patch from the maintainer is now roughly 5 years old.
Time for it to rest in peace.
Acked-by: Gleb Natapov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove FIXME comments about needing fault addresses to be returned. These
are propagated from walk_addr_generic to gva_to_gpa and from there to
ops->read_std and ops->write_std.
Signed-off-by: Nicholas Krause <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The check on the higher limit of the segment and the check on the
maximum accessible size are the same for both expand-up and
expand-down segments. Only the computation of "lim" varies.
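A sketch of the shared structure, modeled on the description above with hypothetical names: for an expand-down segment the valid offsets are above the descriptor limit, up to 0xffff or 0xffffffff depending on the D/B bit, so the code rejects ea <= limit and then only replaces "lim"; the upper-bound and max-size checks are common.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical data-segment limit check. On success, *max_size is the
 * number of bytes accessible starting at ea.
 */
static bool seg_limit_ok(uint32_t ea, uint32_t size, uint32_t limit,
                         bool expand_down, bool big, uint64_t *max_size)
{
    uint64_t lim = limit;

    if (expand_down) {
        /* Valid offsets are (limit, 0xffff] or (limit, 0xffffffff]. */
        if (ea <= limit)
            return false;
        lim = big ? 0xffffffffu : 0xffffu;
    }
    /* Checks shared by expand-up and expand-down segments. */
    if (ea > lim)
        return false;
    *max_size = lim + 1 - ea;
    return size <= *max_size;
}
```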
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
register_address has been a duplicate of address_mask ever since the
ancestor of __linearize was born in 90de84f50b42 (KVM: x86 emulator:
preserve an operand's segment identity, 2010-11-17).
However, we can put it to a better use by including the call to reg_read
in register_address. Similarly, the call to reg_rmw can be moved to
register_address_increment.
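A hypothetical stand-in for the helpers named above, assuming a simplified emulator context: address_mask() truncates a value to the current address size, register_address() folds in the register read, and register_address_increment() wraps the increment within the address size while leaving the upper register bits alone.

```c
#include <stdint.h>

/* Simplified emulator context (hypothetical). */
struct emu_ctxt {
    unsigned int ad_bytes;   /* address size: 2, 4 or 8 */
    uint64_t regs[16];
};

static uint64_t address_mask(const struct emu_ctxt *c, uint64_t val)
{
    if (c->ad_bytes == 8)
        return val;
    return val & ((1ull << (c->ad_bytes * 8)) - 1);
}

/* register_address() with the register read folded in. */
static uint64_t register_address(const struct emu_ctxt *c, int reg)
{
    return address_mask(c, c->regs[reg]);
}

/* Increment within the address size; bits above it are preserved. */
static void register_address_increment(struct emu_ctxt *c, int reg, int inc)
{
    uint64_t mask = c->ad_bytes == 8 ? ~0ull
                                     : (1ull << (c->ad_bytes * 8)) - 1;

    c->regs[reg] = (c->regs[reg] & ~mask) | ((c->regs[reg] + inc) & mask);
}
```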
Signed-off-by: Paolo Bonzini <[email protected]>
|