aboutsummaryrefslogtreecommitdiff
path: root/include/uapi/linux/kvm.h
AgeCommit message (Collapse)AuthorFilesLines
2015-03-06KVM: s390: Allocate and save/restore vector registersEric Farman1-1/+2
Define and allocate space for both the host and guest views of the vector registers for a given vcpu. The 32 vector registers occupy 128 bits each (512 bytes total), but architecturally are paired with 512 additional bytes of reserved space for future expansion. The kvm_sync_regs structs containing the registers are union'ed with 1024 bytes of padding in the common kvm_run struct. The addition of 1024 bytes of new register information clearly exceeds the existing union, so an expansion of that padding is required. When changing environments, we need to appropriately save and restore the vector registers viewed by both the host and guest, into and out of the sync_regs space. The floating point registers overlay the upper half of vector registers 0-15, so there's a bit of data duplication here that needs to be carefully avoided. Signed-off-by: Eric Farman <[email protected]> Reviewed-by: Thomas Huth <[email protected]> Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2015-01-23Merge tag 'kvm-s390-next-20150122' of ↵Paolo Bonzini1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next KVM: s390: fixes and features for kvm/next (3.20) 1. Generic - sparse warning (make function static) - optimize locking - bugfixes for interrupt injection - fix MVPG addressing modes 2. hrtimer/wakeup fun A recent change can cause KVM hangs if adjtime is used in the host. The hrtimer might wake up too early or too late. Too early is fatal as vcpu_block will see that the wakeup condition is not met and sleep again. This CPU might never wake up again. This series addresses this problem. adjclock slowing down the host clock will result in too late wakeups. This will require more work. In addition to that we also change the hrtimer from REALTIME to MONOTONIC to avoid similar problems with timedatectl set-time. 3. sigp rework We will move all "slow" sigps to QEMU (protected with a capability that can be enabled) to avoid several races between concurrent SIGP orders. 4. Optimize the shadow page table Provide an interface to announce the maximum guest size. The kernel will use that to make the pagetable 2,3,4 (or theoretically) 5 levels. 5. Provide an interface to set the guest TOD We now use two vm attributes instead of two oneregs, as oneregs are vcpu ioctl and we don't want to call them from other threads. 6. Protected key functions The real HMC allows to enable/disable protected key CPACF functions. Lets provide an implementation + an interface for QEMU to activate this the protected key instructions.
2015-01-23KVM: s390: forward most SIGP orders to user spaceDavid Hildenbrand1-0/+1
Most SIGP orders are handled partially in kernel and partially in user space. In order to: - Get a correct SIGP SET PREFIX handler that informs user space - Avoid race conditions between concurrently executed SIGP orders - Serialize SIGP orders per VCPU We need to handle all "slow" SIGP orders in user space. The remaining ones to be handled completely in kernel are: - SENSE - SENSE RUNNING - EXTERNAL CALL - EMERGENCY SIGNAL - CONDITIONAL EMERGENCY SIGNAL According to the PoP, they have to be fast. They can be executed without conflicting to the actions of other pending/concurrently executing orders (e.g. STOP vs. START). This patch introduces a new capability that will - when enabled - forward all but the mentioned SIGP orders to user space. The instruction counters in the kernel are still updated. Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2015-01-23KVM: s390: new parameter for SIGP STOP irqsDavid Hildenbrand1-0/+6
In order to get rid of the action_flags and to properly migrate pending SIGP STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember whether to store the status when stopping. For this reason, a new parameter (flags) for the SIGP STOP irq is introduced. These flags further define details of the requested STOP and can be easily migrated. Reviewed-by: Thomas Huth <[email protected]> Acked-by: Cornelia Huck <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2015-01-20arm/arm64: KVM: add virtual GICv3 distributor emulationAndre Przywara1-0/+2
With everything separated and prepared, we implement a model of a GICv3 distributor and redistributors by using the existing framework to provide handler functions for each register group. Currently we limit the emulation to a model enforcing a single security state, with SRE==1 (forcing system register access) and ARE==1 (allowing more than 8 VCPUs). We share some of the functions provided for GICv2 emulation, but take the different ways of addressing (v)CPUs into account. Save and restore is currently not implemented. Similar to the split-off of the GICv2 specific code, the new emulation code goes into a new file (vgic-v3-emul.c). Signed-off-by: Andre Przywara <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2014-11-21kvm: remove IA64 ioctlsRadim Krčmář1-3/+0
KVM ia64 is no longer present so new applications shouldn't use them. The main problem is that they most likely didn't work even before, because of a conflict in the #defines: #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) The argument to KVM_SET_GUEST_DEBUG is: struct kvm_guest_debug { __u32 control; __u32 pad; struct kvm_guest_debug_arch arch; }; struct kvm_guest_debug_arch { }; meaning that sizeof(struct kvm_guest_debug) == sizeof(void *) == 8 and KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK. KVM_SET_GUEST_DEBUG is handled in virt/kvm/kvm_main.c before even calling kvm_arch_vcpu_ioctl (which would have handled KVM_IA64_VCPU_SET_STACK), so KVM_IA64_VCPU_SET_STACK would just return -EINVAL. Signed-off-by: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-11-03kvm: drop unsupported capabilities, fix documentationMichael S. Tsirkin1-8/+0
No kernel ever reported KVM_CAP_DEVICE_MSIX, KVM_CAP_DEVICE_MSI, KVM_CAP_DEVICE_ASSIGNMENT, KVM_CAP_DEVICE_DEASSIGNMENT. This makes the documentation wrong, and no application ever written to use these capabilities has a chance to work correctly. The only way to detect support is to try, and test errno for ENOTTY. That's unfortunate, but we can't fix the past. Document the actual semantics, and drop the definitions from the exported header to make it easier for application developers to note and fix the bug. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-17KVM: device: add simple registration mechanism for kvm_device_opsWill Deacon1-6/+16
kvm_ioctl_create_device currently has knowledge of all the device types and their associated ops. This is fairly inflexible when adding support for new in-kernel device emulations, so move what we currently have out into a table, which can support dynamic registration of ops by new drivers for virtual hardware. Cc: Alex Williamson <[email protected]> Cc: Alex Graf <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Marc Zyngier <[email protected]> Acked-by: Cornelia Huck <[email protected]> Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-08-29KVM: Unconditionally export KVM_CAP_USER_NMIChristoffer Dall1-3/+1
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that userspace can, at run-time, determine if a feature is supported or not. This allows KVM to being supporting a new feature with a new kernel version without any need to update user space. Unfortunately, since the definition of KVM_CAP_USER_NMI was guarded by #ifdef __KVM_HAVE_USER_NMI, such discovery still required a user space update. Therefore, unconditionally export KVM_CAP_USER_NMI and change the the typo in the comment for the IOCTL number definition as well. Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-08-29KVM: Unconditionally export KVM_CAP_READONLY_MEMChristoffer Dall1-2/+0
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that userspace can, at run-time, determine if a feature is supported or not. This allows KVM to being supporting a new feature with a new kernel version without any need to update user space. Unfortunately, since the definition of KVM_CAP_READONLY_MEM was guarded by #ifdef __KVM_HAVE_READONLY_MEM, such discovery still required a user space update. Therefore, unconditionally export KVM_CAP_READONLY_MEM and change the in-kernel conditional to rely on __KVM_HAVE_READONLY_MEM. Signed-off-by: Christoffer Dall <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-08-05Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvmPaolo Bonzini1-2/+4
Patch queue for ppc - 2014-08-01 Highlights in this release include: - BookE: Rework instruction fetch, not racy anymore now - BookE HV: Fix ONE_REG accessors for some in-hardware registers - Book3S: Good number of LE host fixes, enable HV on LE - Book3S: Some misc bug fixes - Book3S HV: Add in-guest debug support - Book3S HV: Preload cache lines on context switch - Remove 440 support Alexander Graf (31): KVM: PPC: Book3s PR: Disable AIL mode with OPAL KVM: PPC: Book3s HV: Fix tlbie compile error KVM: PPC: Book3S PR: Handle hyp doorbell exits KVM: PPC: Book3S PR: Fix ABIv2 on LE KVM: PPC: Book3S PR: Fix sparse endian checks PPC: Add asm helpers for BE 32bit load/store KVM: PPC: Book3S HV: Make HTAB code LE host aware KVM: PPC: Book3S HV: Access guest VPA in BE KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE KVM: PPC: Book3S HV: Access XICS in BE KVM: PPC: Book3S HV: Fix ABIv2 on LE KVM: PPC: Book3S HV: Enable for little endian hosts KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct KVM: PPC: Deflect page write faults properly in kvmppc_st KVM: PPC: Book3S: Stop PTE lookup on write errors KVM: PPC: Book3S: Add hack for split real mode KVM: PPC: Book3S: Make magic page properly 4k mappable KVM: PPC: Remove 440 support KVM: Rename and add argument to check_extension KVM: Allow KVM_CHECK_EXTENSION on the vm fd KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode KVM: PPC: Implement kvmppc_xlate for all targets KVM: PPC: Move kvmppc_ld/st to common code KVM: PPC: Remove kvmppc_bad_hva() KVM: PPC: Use kvm_read_guest in kvmppc_ld KVM: PPC: Handle magic page in kvmppc_ld/st KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Expose helper functions for data/inst faults KVM: PPC: Remove DCR handling KVM: PPC: HV: Remove generic instruction emulation KVM: PPC: PR: Handle FSCR feature deselects Alexey Kardashevskiy (1): KVM: PPC: Book3S: Fix LPCR one_reg interface Aneesh Kumar K.V (4): KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation KVM: PPC: BOOK3S: PR: Emulate virtual timebase register KVM: PPC: BOOK3S: PR: Emulate instruction counter KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page Anton Blanchard (2): KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC() Bharat Bhushan (10): kvm: ppc: bookehv: Added wrapper macros for shadow registers kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1 kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR kvm: ppc: booke: Add shared struct helpers of SPRN_ESR kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7 kvm: ppc: Add SPRN_EPR get helper function kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit KVM: PPC: Booke-hv: Add one reg interface for SPRG9 KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr Michael Neuling (1): KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling Mihai Caraman (8): KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule KVM: PPC: e500: Fix default tlb for victim hint KVM: PPC: e500: Emulate power management control SPR KVM: PPC: e500mc: Revert "add load inst fixup" KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 KVM: PPC: Book3s: Remove kvmppc_read_inst() function KVM: PPC: Allow kvmppc_get_last_inst() to fail KVM: PPC: Bookehv: Get vcpu's last instruction for emulation Paul Mackerras (4): KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication Stewart Smith (2): Split out struct kvmppc_vcore creation to separate function Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 Conflicts: Documentation/virtual/kvm/api.txt
2014-07-28KVM: PPC: Remove DCR handlingAlexander Graf1-2/+2
DCR handling was only needed for 440 KVM. Since we removed it, we can also remove handling of DCR accesses. Signed-off-by: Alexander Graf <[email protected]>
2014-07-28KVM: Allow KVM_CHECK_EXTENSION on the vm fdAlexander Graf1-0/+1
The KVM_CHECK_EXTENSION is only available on the kvm fd today. Unfortunately on PPC some of the capabilities change depending on the way a VM was created. So instead we need a way to expose capabilities as VM ioctl, so that we can see which VM type we're using (HV or PR). To enable this, add the KVM_CHECK_EXTENSION ioctl to our vm ioctl portfolio. Signed-off-by: Alexander Graf <[email protected]> Acked-by: Paolo Bonzini <[email protected]>
2014-07-28KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handlingPaul Mackerras1-0/+1
This provides a way for userspace controls which sPAPR hcalls get handled in the kernel. Each hcall can be individually enabled or disabled for in-kernel handling, except for H_RTAS. The exception for H_RTAS is because userspace can already control whether individual RTAS functions are handled in-kernel or not via the KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for H_RTAS is out of the normal sequence of hcall numbers. Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM. The args field of the struct kvm_enable_cap specifies the hcall number in args[0] and the enable/disable flag in args[1]; 0 means disable in-kernel handling (so that the hcall will always cause an exit to userspace) and 1 means enable. Enabling or disabling in-kernel handling of an hcall is effective across the whole VM. The ability for KVM_ENABLE_CAP to be used on a VM file descriptor on PowerPC is new, added by this commit. The KVM_CAP_ENABLE_CAP_VM capability advertises that this ability exists. When a VM is created, an initial set of hcalls are enabled for in-kernel handling. The set that is enabled is the set that have an in-kernel implementation at this point. Any new hcall implementations from this point onwards should not be added to the default set without a good reason. No distinction is made between real-mode and virtual-mode hcall implementations; the one setting controls them both. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2014-07-10KVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state controlDavid Hildenbrand1-0/+4
This patch - adds s390 specific MP states to linux headers and documents them - implements the KVM_{SET,GET}_MP_STATE ioctls - enables KVM_CAP_MP_STATE - allows user space to control the VCPU state on s390. If user space sets the VCPU state using the ioctl KVM_SET_MP_STATE, we can disable manual changing of the VCPU state and trust user space to do the right thing. Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-07-10KVM: prepare for KVM_(S|G)ET_MP_STATE on other architecturesDavid Hildenbrand1-1/+2
Highlight the aspects of the ioctls that are actually specific to x86 and ia64. As defined restrictions (irqchip) and mp states may not apply to other architectures, these parts are flagged to belong to x86 and ia64. In preparation for the use of KVM_(S|G)ET_MP_STATE by s390. Fix a spelling error (KVM_SET_MP_STATE vs. KVM_SET_MPSTATE) on the way. Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-05-30Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into ↵Paolo Bonzini1-0/+1
kvm-next Patch queue for ppc - 2014-05-30 In this round we have a few nice gems. PR KVM gains initial POWER8 support as well as LE host awareness, ihe e500 targets can now properly run u-boot, LE guests now work with PR KVM including KVM hypercalls and HV KVM guests can now use huge pages. On top of this there are some bug fixes. Conflicts: include/uapi/linux/kvm.h
2014-05-30KVM: PPC: Add CAP to indicate hcall fixesAlexander Graf1-0/+1
We worked around some nasty KVM magic page hcall breakages: 1) NX bit not honored, so ignore NX when we detect it 2) LE guests swizzle hypercall instruction Without these fixes in place, there's no way it would make sense to expose kvm hypercalls to a guest. Chances are immensely high it would trip over and break. So add a new CAP that gives user space a hint that we have workarounds for the bugs above in place. It can use those as hint to disable PV hypercalls when the guest CPU is anything POWER7 or higher and the host does not have fixes in place. Signed-off-by: Alexander Graf <[email protected]>
2014-05-27Merge tag 'kvm-arm-for-3.16' of ↵Paolo Bonzini1-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next Changed for the 3.16 merge window. This includes KVM support for PSCI v0.2 and also includes generic Linux support for PSCI v0.2 (on hosts that advertise that feature via their DT), since the latter depends on headers introduced by the former. Finally there's a small patch from Marc that enables Cortex-A53 support.
2014-05-06KVM: s390: Add clock comparator and CPU timer IRQ injectionThomas Huth1-0/+2
Add an interface to inject clock comparator and CPU timer interrupts into the guest. This is needed for handling the external interrupt interception. Signed-off-by: Thomas Huth <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-04-30KVM: Add KVM_EXIT_SYSTEM_EVENT to user space API headerAnup Patel1-0/+8
Currently, we don't have an exit reason to notify user space about a system-level event (for e.g. system reset or shutdown) triggered by the VCPU. This patch adds exit reason KVM_EXIT_SYSTEM_EVENT for this purpose. We can also inform user space about the 'type' and architecture specific 'flags' of a system-level event using the kvm_run structure. This newly added KVM_EXIT_SYSTEM_EVENT will be used by KVM ARM/ARM64 in-kernel PSCI v0.2 support to reset/shutdown VMs. Signed-off-by: Anup Patel <[email protected]> Signed-off-by: Pranavkumar Sawargaonkar <[email protected]> Reviewed-by: Christoffer Dall <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2014-04-30KVM: Add capability to advertise PSCI v0.2 supportAnup Patel1-0/+1
User space (i.e. QEMU or KVMTOOL) should be able to check whether KVM ARM/ARM64 supports in-kernel PSCI v0.2 emulation. For this purpose, we define KVM_CAP_ARM_PSCI_0_2 in KVM user space interface header. Signed-off-by: Anup Patel <[email protected]> Signed-off-by: Pranavkumar Sawargaonkar <[email protected]> Acked-by: Christoffer Dall <[email protected]> Acked-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2014-04-22Merge tag 'kvm-s390-20140422' of ↵Marcelo Tosatti1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into queue Lazy storage key handling ------------------------- Linux does not use the ACC and F bits of the storage key. Newer Linux versions also do not use the storage keys for dirty and reference tracking. We can optimize the guest handling for those guests for faults as well as page-in and page-out by simply not caring about the guest visible storage key. We trap guest storage key instruction to enable those keys only on demand. Migration bitmap Until now s390 never provided a proper dirty bitmap. Let's provide a proper migration bitmap for s390. We also change the user dirty tracking to a fault based mechanism. This makes the host completely independent from the storage keys. Long term this will allow us to back guest memory with large pages. per-VM device attributes ------------------------ To avoid the introduction of new ioctls, let's provide the attribute semanantic also on the VM-"device". Userspace controlled CMMA ------------------------- The CMMA assist is changed from "always on" to "on if requested" via per-VM device attributes. In addition a callback to reset all usage states is provided. Proper guest DAT handling for intercepts ---------------------------------------- While instructions handled by SIE take care of all addressing aspects, KVM/s390 currently does not care about guest address translation of intercepts. This worked out fine, because - the s390 Linux kernel has a 1:1 mapping between kernel virtual<->real for all pages up to memory size - intercepts happen only for a small amount of cases - all of these intercepts happen to be in the kernel text for current distros Of course we need to be better for other intercepts, kernel modules etc. We provide the infrastructure and rework all in-kernel intercepts to work on logical addresses (paging etc) instead of real ones. The code has been running internally for several months now, so it is time for going public. GDB support ----------- We provide breakpoints, single stepping and watchpoints. Fixes/Cleanups -------------- - Improve program check delivery - Factor out the handling of transactional memory on program checks - Use the existing define __LC_PGM_TDB - Several cleanups in the lowcore structure - Documentation NOTES ----- - All patches touching base s390 are either ACKed or written by the s390 maintainers - One base KVM patch "KVM: add kvm_is_error_gpa() helper" - One patch introduces the notion of VM device attributes Signed-off-by: Marcelo Tosatti <[email protected]> Conflicts: include/uapi/linux/kvm.h
2014-04-22KVM: s390: Per-vm kvm device controlsDominik Dingel1-0/+1
We sometimes need to get/set attributes specific to a virtual machine and so need something else than ONE_REG. Let's copy the KVM_DEVICE approach, and define the respective ioctls for the vm file descriptor. Signed-off-by: Dominik Dingel <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Acked-by: Alexander Graf <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-04-17KVM: VMX: speed up wildcard MMIO EVENTFDMichael S. Tsirkin1-0/+1
With KVM, MMIO is much slower than PIO, due to the need to do page walk and emulation. But with EPT, it does not have to be: we know the address from the VMCS so if the address is unique, we can look up the eventfd directly, bypassing emulation. Unfortunately, this only works if userspace does not need to match on access length and data. The implementation adds a separate FAST_MMIO bus internally. This serves two purposes: - minimize overhead for old userspace that does not use eventfd with lengtth = 0 - minimize disruption in other code (since we don't know the length, devices on the MMIO bus only get a valid address in write, this way we don't need to touch all devices to teach them to handle an invalid length) At the moment, this optimization only has effect for EPT on x86. It will be possible to speed up MMIO for NPT and MMU using the same idea in the future. With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO. I was unable to detect any measureable slowdown to non-eventfd MMIO. Making MMIO faster is important for the upcoming virtio 1.0 which includes an MMIO signalling capability. The idea was suggested by Peter Anvin. Lots of thanks to Gleb for pre-review and suggestions. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2014-04-17KVM: support any-length wildcard ioeventfdMichael S. Tsirkin1-1/+2
It is sometimes benefitial to ignore IO size, and only match on address. In hindsight this would have been a better default than matching length when KVM_IOEVENTFD_FLAG_DATAMATCH is not set, In particular, this kind of access can be optimized on VMX: there no need to do page lookups. This can currently be done with many ioeventfds but in a suboptimal way. However we can't change kernel/userspace ABI without risk of breaking some applications. Use len = 0 to mean "ignore length for matching" in a more optimal way. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2014-03-21KVM: s390: irq routing for adapter interrupts.Cornelia Huck1-0/+11
Introduce a new interrupt class for s390 adapter interrupts and enable irqfds for s390. This is depending on a new s390 specific vm capability, KVM_CAP_S390_IRQCHIP, that needs to be enabled by userspace. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: Cornelia Huck <[email protected]>
2014-03-21KVM: Add per-vm capability enablement.Cornelia Huck1-0/+5
Allow KVM_ENABLE_CAP to act on a vm as well as on a vcpu. This makes more sense when the caller wants to enable a vm-related capability. s390 will be the first user; wire it up. Reviewed-by: Thomas Huth <[email protected]> Reviewed-by: Christian Borntraeger <[email protected]> Signed-off-by: Cornelia Huck <[email protected]>
2014-03-13kvm: x86: ignore ioapic polarityGabriel L. Somlo1-0/+1
Both QEMU and KVM have already accumulated a significant number of optimizations based on the hard-coded assumption that ioapic polarity will always use the ActiveHigh convention, where the logical and physical states of level-triggered irq lines always match (i.e., active(asserted) == high == 1, inactive == low == 0). QEMU guests are expected to follow directions given via ACPI and configure the ioapic with polarity 0 (ActiveHigh). However, even when misbehaving guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow), QEMU will still use the ActiveHigh signaling convention when interfacing with KVM. This patch modifies KVM to completely ignore ioapic polarity as set by the guest OS, enabling misbehaving guests to work alongside those which comply with the ActiveHigh polarity specified by QEMU's ACPI tables. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Gabriel L. Somlo <[email protected]> [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2014-01-30KVM: async_pf: Async page fault support on s390Dominik Dingel1-0/+2
This patch enables async page faults for s390 kvm guests. It provides the userspace API to enable and disable_wait this feature. The disable_wait will enforce that the feature is off by waiting on it. Also it includes the diagnose code, called by the guest to enable async page faults. The async page faults will use an already existing guest interface for this purpose, as described in "CP Programming Services (SC24-6084)". Signed-off-by: Dominik Dingel <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-01-30KVM: s390: add floating irq controllerJens Freimann1-0/+1
This patch adds a floating irq controller as a kvm_device. It will be necessary for migration of floating interrupts as well as for hardening the reset code by allowing user space to explicitly remove all pending floating interrupts. Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-01-29KVM: s390: add and extend interrupt information data structsJens Freimann1-0/+63
With the currently available struct kvm_s390_interrupt it is not possible to inject every kind of interrupt as defined in the z/Architecture. Add additional interruption parameters to the structures and move it to kvm.h Signed-off-by: Jens Freimann <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2014-01-17add support for Hyper-V reference time counterVadim Rozenfeld1-0/+1
Signed-off: Peter Lieven <[email protected]> Signed-off: Gleb Natapov Signed-off: Vadim Rozenfeld <[email protected]> After some consideration I decided to submit only Hyper-V reference counters support this time. I will submit iTSC support as a separate patch as soon as it is ready. v1 -> v2 1. mark TSC page dirty as suggested by Eric Northup <[email protected]> and Gleb 2. disable local irq when calling get_kernel_ns, as it was done by Peter Lieven <[email protected]> 3. move check for TSC page enable from second patch to this one. v3 -> v4     Get rid of ref counter offset. v4 -> v5 replace __copy_to_user with kvm_write_guest when updateing iTSC page. Signed-off-by: Paolo Bonzini <[email protected]>
2013-12-21KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGICChristoffer Dall1-0/+1
Support creating the ARM VGIC device through the KVM_CREATE_DEVICE ioctl, which can then later be leveraged to use the KVM_{GET/SET}_DEVICE_ATTR, which is useful both for setting addresses in a more generic API than the ARM-specific one and is useful for save/restore of VGIC state. Adds KVM_CAP_DEVICE_CTRL to ARM capabilities. Note that we change the check for creating a VGIC from bailing out if any VCPUs were created, to bailing out if any VCPUs were ever run. This is an important distinction that shouldn't break anything, but allows creating the VGIC after the VCPUs have been created. Acked-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2013-11-04Merge branch 'kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into queueGleb Natapov1-0/+4
Conflicts: arch/powerpc/include/asm/processor.h
2013-10-30kvm: Add VFIO deviceAlex Williamson1-0/+4
So far we've succeeded at making KVM and VFIO mostly unaware of each other, but areas are cropping up where a connection beyond eventfds and irqfds needs to be made. This patch introduces a KVM-VFIO device that is meant to be a gateway for such interaction. The user creates the device and can add and remove VFIO groups to it via file descriptors. When a group is added, KVM verifies the group is valid and gets a reference to it via the VFIO external user interface. Signed-off-by: Alex Williamson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-10-30kvm: Add KVM_GET_EMULATED_CPUIDBorislav Petkov1-0/+2
Add a kvm ioctl which states which system functionality kvm emulates. The format used is that of CPUID and we return the corresponding CPUID bits set for which we do emulate functionality. Make sure ->padding is being passed on clean from userspace so that we can use it for something in the future, after the ioctl gets cast in stone. s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at it. Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2013-10-17kvm: powerpc: book3s: Allow the HV and PR selection per virtual machineAneesh Kumar K.V1-0/+4
This moves the kvmppc_ops callbacks to be a per VM entity. This enables us to select HV and PR mode when creating a VM. We also allow both kvm-hv and kvm-pr kernel module to be loaded. To achieve this we move /dev/kvm ownership to kvm.ko module. Depending on which KVM mode we select during VM creation we take a reference count on respective module Signed-off-by: Aneesh Kumar K.V <[email protected]> [agraf: fix coding style] Signed-off-by: Alexander Graf <[email protected]>
2013-10-02ARM/ARM64: KVM: Implement KVM_ARM_PREFERRED_TARGET ioctlAnup Patel1-0/+1
For implementing CPU=host, we need a mechanism for querying preferred VCPU target type on underlying Host. This patch implements KVM_ARM_PREFERRED_TARGET vm ioctl which returns struct kvm_vcpu_init instance containing information about preferred VCPU target type and target specific features available for it. Signed-off-by: Anup Patel <[email protected]> Signed-off-by: Pranavkumar Sawargaonkar <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2013-08-26KVM: PPC: reserve a capability number for multitce supportAlexey Kardashevskiy1-0/+1
This is to reserve a capablity number for upcoming support of H_PUT_TCE_INDIRECT and H_STUFF_TCE pseries hypercalls which support mulptiple DMA map/unmap operations per one call. Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
2013-07-03Merge tag 'arm64-upstream' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 Pull ARM64 updates from Catalin Marinas: "Main features: - KVM and Xen ports to AArch64 - Hugetlbfs and transparent huge pages support for arm64 - Applied Micro X-Gene Kconfig entry and dts file - Cache flushing improvements For arm64 huge pages support, there are x86 changes moving part of arch/x86/mm/hugetlbpage.c into mm/hugetlb.c to be re-used by arm64" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (66 commits) arm64: Add initial DTS for APM X-Gene Storm SOC and APM Mustang board arm64: Add defines for APM ARMv8 implementation arm64: Enable APM X-Gene SOC family in the defconfig arm64: Add Kconfig option for APM X-Gene SOC family arm64/Makefile: provide vdso_install target ARM64: mm: THP support. ARM64: mm: Raise MAX_ORDER for 64KB pages and THP. ARM64: mm: HugeTLB support. ARM64: mm: Move PTE_PROT_NONE bit. ARM64: mm: Make PAGE_NONE pages read only and no-execute. ARM64: mm: Restore memblock limit when map_mem finished. mm: thp: Correct the HPAGE_PMD_ORDER check. x86: mm: Remove general hugetlb code from x86. mm: hugetlb: Copy general hugetlb code from x86 to mm. x86: mm: Remove x86 version of huge_pmd_share. mm: hugetlb: Copy huge_pmd_share from x86 to mm. arm64: KVM: document kernel object mappings in HYP arm64: KVM: MAINTAINERS update arm64: KVM: userspace API documentation arm64: KVM: enable initialization of a 32bit vcpu ...
2013-06-12arm64: KVM: enable initialization of a 32bit vcpuMarc Zyngier1-0/+1
Wire the init of a 32bit vcpu by allowing 32bit modes in pstate, and providing sensible defaults out of reset state. This feature is of course conditioned by the presence of 32bit capability on the physical CPU, and is checked by the KVM_CAP_ARM_EL1_32BIT capability. Reviewed-by: Catalin Marinas <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2013-06-11kvm: Add definition of KVM_REG_MIPSDavid Daney1-0/+1
We use 0x7000000000000000ULL as 0x6000000000000000ULL is reserved for ARM64. Signed-off-by: David Daney <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
2013-06-07arm64: KVM: system register handlingMarc Zyngier1-0/+1
Provide 64bit system register handling, modeled after the cp15 handling for ARM. Reviewed-by: Christopher Covington <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2013-05-02KVM: PPC: Book3S: Add API for in-kernel XICS emulationPaul Mackerras1-0/+2
This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <[email protected]> Acked-by: David Gibson <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-04-28kvm: Allow build-time configuration of KVM device assignmentAlex Williamson1-4/+0
We hope to at some point deprecate KVM legacy device assignment in favor of VFIO-based assignment. Towards that end, allow legacy device assignment to be deconfigured. Signed-off-by: Alex Williamson <[email protected]> Reviewed-by: Alexander Graf <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
2013-04-26KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS callsMichael Ellerman1-0/+3
For pseries machine emulation, in order to move the interrupt controller code to the kernel, we need to intercept some RTAS calls in the kernel itself. This adds an infrastructure to allow in-kernel handlers to be registered for RTAS services by name. A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to associate token values with those service names. Then, when the guest requests an RTAS service with one of those token values, it will be handled by the relevant in-kernel handler rather than being passed up to userspace as at present. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Paul Mackerras <[email protected]> [agraf: fix warning] Signed-off-by: Alexander Graf <[email protected]>
2013-04-26kvm/ppc/mpic: add KVM_CAP_IRQ_MPICScott Wood1-0/+1
Enabling this capability connects the vcpu to the designated in-kernel MPIC. Using explicit connections between vcpus and irqchips allows for flexibility, but the main benefit at the moment is that it simplifies the code -- KVM doesn't need vm-global state to remember which MPIC object is associated with this vm, and it doesn't need to care about ordering between irqchip creation and vcpu creation. Signed-off-by: Scott Wood <[email protected]> [agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu] Signed-off-by: Alexander Graf <[email protected]>
2013-04-26kvm/ppc/mpic: in-kernel MPIC emulationScott Wood1-0/+3
Hook the MPIC code up to the KVM interfaces, add locking, etc. Signed-off-by: Scott Wood <[email protected]> [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit] Signed-off-by: Alexander Graf <[email protected]>
2013-04-26kvm: add device control APIScott Wood1-0/+27
Currently, devices that are emulated inside KVM are configured in a hardcoded manner based on an assumption that any given architecture only has one way to do it. If there's any need to access device state, it is done through inflexible one-purpose-only IOCTLs (e.g. KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is cumbersome and depletes a limited numberspace. This API provides a mechanism to instantiate a device of a certain type, returning an ID that can be used to set/get attributes of the device. Attributes may include configuration parameters (e.g. register base address), device state, operational commands, etc. It is similar to the ONE_REG API, except that it acts on devices rather than vcpus. Both device types and individual attributes can be tested without having to create the device or get/set the attribute, without the need for separately managing enumerated capabilities. Signed-off-by: Scott Wood <[email protected]> Signed-off-by: Alexander Graf <[email protected]>