aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-12-04KVM: s390: clean up return code handling in irq delivery codeJens Freimann1-13/+13
Instead of returning a possibly random or'ed together value, let's always return -EFAULT if rc is set. Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Cornelia Huck <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-12-04KVM: s390: use atomic bitops to access pending_irqs bitmapJens Freimann1-2/+2
Currently we use a mixture of atomic/non-atomic bitops and the local_int spin lock to protect the pending_irqs bitmap and interrupt payload data. We need to use atomic bitops for the pending_irqs bitmap everywhere and in addition acquire the local_int lock where interrupt data needs to be protected. Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-12-04KVM: s390: some ext irqs have to clear the ext cpu addrDavid Hildenbrand1-0/+3
The cpu address of a source cpu (responsible for an external irq) is only to be stored if bit 6 of the ext irq code is set. If bit 6 is not set, it is to be zeroed out. The special external irq code used for virtio and pfault uses the cpu addr as a parameter field. As bit 6 is set, this implementation is correct. Reviewed-by: Thomas Huth <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-12-04KVM: track pid for VCPU only on KVM_RUN ioctlChristian Borntraeger1-9/+9
We currently track the pid of the task that runs the VCPU in vcpu_load. If a yield to that VCPU is triggered while the PID of the wrong thread is active, the wrong thread might receive a yield, but this will most likely not help the executing thread at all. Instead, if we only track the pid on the KVM_RUN ioctl, there are two possibilities: 1) the thread that did a non-KVM_RUN ioctl is holding a mutex that the VCPU thread is waiting for. In this case, the VCPU thread is not runnable, but we also do not do a wrong yield. 2) the thread that did a non-KVM_RUN ioctl is sleeping, or doing something that does not block the VCPU thread. In this case, the VCPU thread can receive the directed yield correctly. Signed-off-by: Christian Borntraeger <[email protected]> CC: Rik van Riel <[email protected]> CC: Raghavendra K T <[email protected]> CC: Michael Mueller <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04KVM: don't check for PF_VCPU when yieldingDavid Hildenbrand1-4/+0
kvm_enter_guest() has to be called with preemption disabled and will set PF_VCPU. Current code takes PF_VCPU as a hint that the VCPU thread is running and therefore needs no yield. However, the check on PF_VCPU is wrong on s390, where preemption has to stay enabled in order to correctly process page faults. Thus, s390 reenables preemption and starts to execute the guest. The thread might be scheduled out between kvm_enter_guest() and kvm_exit_guest(), resulting in PF_VCPU being set but not being run. When this happens, the opportunity for directed yield is missed. However, this check is done already in kvm_vcpu_on_spin before calling kvm_vcpu_yield_loop: if (!ACCESS_ONCE(vcpu->preempted)) continue; so the check on PF_VCPU is superfluous in general, and this patch removes it. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04kvm: optimize GFN to memslot lookup with large slots amountIgor Mammedov2-13/+29
Current linear search doesn't scale well when large amount of memslots is used and looked up slot is not in the beginning memslots array. Taking in account that memslots don't overlap, it's possible to switch sorting order of memslots array from 'npages' to 'base_gfn' and use binary search for memslot lookup by GFN. As result of switching to binary search lookup times are reduced with large amount of memslots. Following is a table of search_memslot() cycles during WS2008R2 guest boot. boot, boot + ~10 min mostly same of using it, slot lookup randomized lookup max average average cycles cycles cycles 13 slots : 1450 28 30 13 slots : 1400 30 40 binary search 117 slots : 13000 30 460 117 slots : 2000 35 180 binary search Signed-off-by: Igor Mammedov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04kvm: change memslot sorting rule from size to GFNIgor Mammedov1-6/+11
it will allow to use binary search for GFN -> memslot lookups, reducing lookup cost with large slots amount. Signed-off-by: Igor Mammedov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04kvm: search_memslots: add simple LRU memslot cachingIgor Mammedov1-2/+10
In typical guest boot workload only 2-3 memslots are used extensively, and at that it's mostly the same memslot lookup operation. Adding LRU cache improves average lookup time from 46 to 28 cycles (~40%) for this workload. Signed-off-by: Igor Mammedov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04kvm: update_memslots: drop not needed check for the same slotIgor Mammedov1-14/+11
UP/DOWN shift loops will shift array in needed direction and stop at place where new slot should be placed regardless of old slot size. Signed-off-by: Igor Mammedov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04kvm: update_memslots: drop not needed check for the same number of pagesIgor Mammedov1-15/+13
if number of pages haven't changed sorting algorithm will do nothing, so there is no need to do extra check to avoid entering sorting logic. Signed-off-by: Igor Mammedov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04KVM: x86: allow 256 logical x2APICs againRadim Krčmář2-7/+6
While fixing an x2apic bug, 17d68b7 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) we've made only one cluster available. This means that the amount of logically addressible x2APICs was reduced to 16 and VCPUs kept overwriting themselves in that region, so even the first cluster wasn't set up correctly. This patch extends x2APIC support back to the logical_map's limit, and keeps the CVE fixed as messages for non-present APICs are dropped. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04KVM: x86: check bounds of APIC mapsRadim Krčmář1-4/+5
They can't be violated now, but play it safe for the future. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04KVM: x86: fix APIC physical destination wrappingRadim Krčmář1-1/+4
x2apic allows destinations > 0xff and we don't want them delivered to lower APICs. They are correctly handled by doing nothing. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04KVM: x86: deliver phys lowest-prioRadim Krčmář1-2/+0
Physical mode can't address more than one APIC, but lowest-prio is allowed, so we just reuse our paths. SDM 10.6.2.1 Physical Destination: Also, for any non-broadcast IPI or I/O subsystem initiated interrupt with lowest priority delivery mode, software must ensure that APICs defined in the interrupt address are present and enabled to receive interrupts. We could warn on top of that. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04KVM: x86: don't retry hopeless APIC deliveryRadim Krčmář1-2/+2
False from kvm_irq_delivery_to_apic_fast() means that we don't handle it in the fast path, but we still return false in cases that were perfectly handled, fix that. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04KVM: x86: use MSR_ICR instead of a numberRadim Krčmář1-2/+2
0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04KVM: x86: Fix reserved x2apic registersNadav Amit1-0/+9
x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 "x2APIC Register Address Space"). KVM needs to cause #GP on such accesses. Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes). Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-04KVM: x86: Generate #UD when memory operand is requiredNadav Amit1-4/+34
Certain x86 instructions that use modrm operands only allow memory operand (i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact. Currently, the instructions that are such and are emulated by KVM are MOVBE, MOVNTPS, MOVNTPD and MOVNTI. MOVBE is the most blunt example, since it may be emulated by the host regardless of MMIO. The fix introduces a new group for handling such instructions, marking mod3 as illegal instruction. Signed-off-by: Nadav Amit <[email protected]> Reviewed-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-12-03Merge tag 'kvm-s390-next-20141128' of ↵Paolo Bonzini9-399/+872
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Several fixes,cleanups and reworks Here is a bunch of fixes that deal mostly with architectural compliance: - interrupt priorities - interrupt handling - intruction exit handling We also provide a helper function for getting the guest visible storage key.
2014-11-28KVM: s390: allow injecting all kinds of machine checksJens Freimann1-3/+11
Allow to specify CR14, logout area, external damage code and failed storage address. Since more then one machine check can be indicated to the guest at a time we need to combine all indication bits with already pending requests. Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-28KVM: s390: handle pending local interrupts via bitmapJens Freimann6-282/+380
This patch adapts handling of local interrupts to be more compliant with the z/Architecture Principles of Operation and introduces a data structure which allows more efficient handling of interrupts. * get rid of li->active flag, use bitmap instead * Keep interrupts in a bitmap instead of a list * Deliver interrupts in the order of their priority as defined in the PoP * Use a second bitmap for sigp emergency requests, as a CPU can have one request pending from every other CPU in the system. Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-28KVM: s390: add bitmap for handling cpu-local interruptsJens Freimann1-0/+86
Adds a bitmap to the vcpu structure which is used to keep track of local pending interrupts. Also add enum with all interrupt types sorted in order of priority (highest to lowest) Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: Thomas Huth <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-28KVM: s390: refactor interrupt delivery codeJens Freimann1-177/+282
Move delivery code for cpu-local interrupt from the huge do_deliver_interrupt() to smaller functions which handle one type of interrupt. Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-28KVM: s390: add defines for virtio and pfault interrupt codeJens Freimann1-2/+4
Get rid of open coded value for virtio and pfault completion interrupts. Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-28KVM: s390: external param not valid for cpu timer and ckcDavid Hildenbrand1-4/+2
The 32bit external interrupt parameter is only valid for timing-alert and service-signal interrupts. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-28KVM: s390: refactor interrupt injection codeJens Freimann1-54/+167
In preparation for the rework of the local interrupt injection code, factor out injection routines from kvm_s390_inject_vcpu(). Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-28KVM: S390: Create helper function get_guest_storage_keyJason J. Herne2-0/+40
Define get_guest_storage_key which can be used to get the value of a guest storage key. This compliments the functionality provided by the helper function set_guest_storage_key. Both functions are needed for live migration of s390 guests that use storage keys. Signed-off-by: Jason J. Herne <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-28KVM: s390: trigger the right CPU exit for floating interruptsChristian Borntraeger1-1/+11
When injecting a floating interrupt and no CPU is idle we kick one CPU to do an external exit. In case of I/O we should trigger an I/O exit instead. This does not matter for Linux guests as external and I/O interrupts are enabled/disabled at the same time, but play safe anyway. The same holds true for machine checks. Since there is no special exit, just reuse the generic stop exit. The injection code inside the VCPU loop will recheck anyway and rearm the proper exits (e.g. control registers) if necessary. Signed-off-by: Christian Borntraeger <[email protected]> Reviewed-by: Thomas Huth <[email protected]> Reviewed-by: David Hildenbrand <[email protected]>
2014-11-28KVM: s390: Fix rewinding of the PSW pointing to an EXECUTE instructionThomas Huth4-13/+23
A couple of our interception handlers rewind the PSW to the beginning of the instruction to run the intercepted instruction again during the next SIE entry. This normally works fine, but there is also the possibility that the instruction did not get run directly but via an EXECUTE instruction. In this case, the PSW does not point to the instruction that caused the interception, but to the EXECUTE instruction! So we've got to rewind the PSW to the beginning of the EXECUTE instruction instead. This is now accomplished with a new helper function kvm_s390_rewind_psw(). Signed-off-by: Thomas Huth <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-28KVM: s390: Small fixes for the PFMF handlerThomas Huth1-4/+7
This patch includes two small fixes for the PFMF handler: First, the start address for PFMF has to be masked according to the current addressing mode, which is now done with kvm_s390_logical_to_effective(). Second, the protection exceptions have a lower priority than the specification exceptions, so the check for low-address protection has to be moved after the last spot where we inject a specification exception. Signed-off-by: Thomas Huth <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-11-26arm/arm64: KVM: vgic: kick the specific vcpu instead of iterating through allShannon Zhao1-5/+10
When call kvm_vgic_inject_irq to inject interrupt, we can known which vcpu the interrupt for by the irq_num and the cpuid. So we should just kick this vcpu to avoid iterating through all. Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Shannon Zhao <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-11-25arm/arm64: vgic: Remove unreachable irq_clear_pendingChristoffer Dall1-2/+0
When 'injecting' an edge-triggered interrupt with a falling edge we shouldn't clear the pending state on the distributor. In fact, we don't, because the check in vgic_validate_injection would prevent us from ever reaching this bit of code. Remove the unreachable snippet. Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-11-25arm/arm64: KVM: avoid unnecessary guest register mangling on MMIO readAndre Przywara1-6/+9
Currently we mangle the endianness of the guest's register even on an MMIO _read_, where it is completely useless, because we will not use the value of that register. Rework the io_mem_abort() function to clearly separate between reads and writes and only do the endianness mangling on MMIO writes. Signed-off-by: Andre Przywara <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-11-25arm, arm64: KVM: handle potential incoherency of readonly memslotsArd Biesheuvel1-5/+15
Readonly memslots are often used to implement emulation of ROMs and NOR flashes, in which case the guest may legally map these regions as uncached. To deal with the incoherency associated with uncached guest mappings, treat all readonly memslots as incoherent, and ensure that pages that belong to regions tagged as such are flushed to DRAM before being passed to the guest. Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-11-25arm, arm64: KVM: allow forced dcache flush on page faultsLaszlo Ersek3-6/+13
To allow handling of incoherent memslots in a subsequent patch, this patch adds a paramater 'ipa_uncached' to cache_coherent_guest_page() so that we can instruct it to flush the page's contents to DRAM even if the guest has caching globally enabled. Signed-off-by: Laszlo Ersek <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-11-25kvm: add a memslot flag for incoherent memory regionsArd Biesheuvel1-0/+1
Memory regions may be incoherent with the caches, typically when the guest has mapped a host system RAM backed memory region as uncached. Add a flag KVM_MEMSLOT_INCOHERENT so that we can tag these memslots and handle them appropriately when mapping them. Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-11-25kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()Ard Biesheuvel4-13/+13
This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. The problem being addressed by the patch above was that some ARM code based the memory mapping attributes of a pfn on the return value of kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should be mapped as device memory. However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, and the existing non-ARM users were already using it in a way which suggests that its name should probably have been 'kvm_is_reserved_pfn' from the beginning, e.g., whether or not to call get_page/put_page on it etc. This means that returning false for the zero page is a mistake and the patch above should be reverted. Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-11-25arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()Ard Biesheuvel1-1/+6
Instead of using kvm_is_mmio_pfn() to decide whether a host region should be stage 2 mapped with device attributes, add a new static function kvm_is_device_pfn() that disregards RAM pages with the reserved bit set, as those should usually not be mapped as device memory. Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-11-25KVM: ARM: VGIC: Optimize the vGIC vgic_update_irq_pending function.wanghaibin1-0/+3
When vgic_update_irq_pending with level-sensitive false, it is need to deactivates an interrupt, and, it can go to out directly. Here return a false value, because it will be not need to kick. Signed-off-by: wanghaibin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-11-24kvm: x86: avoid warning about potential shift wrapping bugPaolo Bonzini3-3/+3
cs.base is declared as a __u64 variable and vector is a u32 so this causes a static checker warning. The user indeed can set "sipi_vector" to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the value should really have 8-bit precision only. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-24KVM: x86: move device assignment out of kvm_host.hPaolo Bonzini6-62/+64
Create a new header, and hide the device assignment functions there. Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying arch/x86/kvm/iommu.c to take a PCI device struct. Based on a patch by Radim Krcmar <[email protected]>. Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-23kvm: x86: mask out XSAVESPaolo Bonzini1-1/+10
This feature is not supported inside KVM guests yet, because we do not emulate MSR_IA32_XSS. Mask it out. Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-23kvm: x86: move assigned-dev.c and iommu.c to arch/x86/Radim Krčmář7-30/+25
Now that ia64 is gone, we can hide deprecated device assignment in x86. Notable changes: - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl() The easy parts were removed from generic kvm code, remaining - kvm_iommu_(un)map_pages() would require new code to be moved - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-21kvm: remove IA64 ioctlsRadim Krčmář1-3/+0
KVM ia64 is no longer present so new applications shouldn't use them. The main problem is that they most likely didn't work even before, because of a conflict in the #defines: #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) The argument to KVM_SET_GUEST_DEBUG is: struct kvm_guest_debug { __u32 control; __u32 pad; struct kvm_guest_debug_arch arch; }; struct kvm_guest_debug_arch { }; meaning that sizeof(struct kvm_guest_debug) == sizeof(void *) == 8 and KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK. KVM_SET_GUEST_DEBUG is handled in virt/kvm/kvm_main.c before even calling kvm_arch_vcpu_ioctl (which would have handled KVM_IA64_VCPU_SET_STACK), so KVM_IA64_VCPU_SET_STACK would just return -EINVAL. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-21kvm: remove CONFIG_X86 #ifdefs from files formerly shared with ia64Radim Krcmar2-24/+2
Signed-off-by: Radim Krcmar <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-21kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/Paolo Bonzini9-30/+29
ia64 does not need them anymore. Ack notifiers become x86-specific too. Suggested-by: Gleb Natapov <[email protected]> Reviewed-by: Radim Krcmar <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-20kvm: Documentation: remove ia64Tiejun Chen2-106/+22
kvm/ia64 is gone, clean up Documentation too. Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-20KVM: ia64: removePaolo Bonzini31-13272/+0
KVM for ia64 has been marked as broken not just once, but twice even, and the last patch from the maintainer is now roughly 5 years old. Time for it to rest in peace. Acked-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-19KVM: x86: Remove FIXMEs in emulate.cNicholas Krause1-4/+0
Remove FIXME comments about needing fault addresses to be returned. These are propaagated from walk_addr_generic to gva_to_gpa and from there to ops->read_std and ops->write_std. Signed-off-by: Nicholas Krause <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-19KVM: emulator: remove duplicated limit checkPaolo Bonzini1-9/+4
The check on the higher limit of the segment, and the check on the maximum accessible size, is the same for both expand-up and expand-down segments. Only the computation of "lim" varies. Signed-off-by: Paolo Bonzini <[email protected]>