aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-09-22powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbersMihai Caraman2-6/+6
Book3E specification defines shared interrupt numbers for SPE and AltiVec units. Still SPE is present in e200/e500v2 cores while AltiVec is present in e6500 core. So we can currently decide at compile-time which unit to support exclusively. As Alexander Graf suggested, this will improve code readability especially in KVM. Use distinct defines to identify SPE/AltiVec interrupt numbers, reverting c58ce397 and 6b310fc5 patches that added common defines. Signed-off-by: Mihai Caraman <[email protected]> Acked-by: Scott Wood <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2014-09-22powerpc/booke: Restrict SPE exception handlers to e200/e500 coresMihai Caraman4-7/+34
SPE exception handlers are now defined for 32-bit e500mc cores even though SPE unit is not present and CONFIG_SPE is undefined. Restrict SPE exception handlers to e200/e500 cores adding CONFIG_SPE_POSSIBLE and consequently guard __stup_ivors and __setup_cpu functions. Signed-off-by: Mihai Caraman <[email protected]> Acked-by: Scott Wood <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2014-09-22KVM: PPC: BOOKE: Add one reg interface for DBSRBharat Bhushan2-0/+7
Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2014-09-22KVM: PPC: BOOKE: Guest and hardware visible debug registers are sameBharat Bhushan3-11/+9
Guest visible debug register and hardware visible debug registers are same, so ther is no need to have arch->shadow_dbg_reg, instead use arch->dbg_reg. Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2014-09-22KVM: PPC: BOOKE: Clear guest dbsr in userspace exit KVM_EXIT_DEBUGBharat Bhushan1-0/+2
Dbsr is not visible to userspace and we do not think any need to expose this to userspace because: Userspace cannot inject debug interrupt to guest (as this does not know guest ability to handle debug interrupt), so userspace will always clear DBSR. Now if userspace has to always clear DBSR in KVM_EXIT_DEBUG handling then clearing dbsr in kernel looks simple as this avoid doing SET_SREGS/set_one_reg() to clear DBSR Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2014-09-22KVM: PPC: BOOKE: Allow guest to change MSR_DEBharat Bhushan1-1/+1
This patch changes the default behavior of MSRP_DEP, that is guest is not allowed to change the MSR_DE, to guest can change MSR_DE. When userspace is debugging guest then it override the default behavior and set MSRP_DEP. This stops guest to change MSR_DE when userspace is debugging guest. Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2014-09-22KVM: PPC: BOOKE : Emulate rfdi instructionBharat Bhushan2-0/+14
This patch adds "rfdi" instruction emulation which is required for guest debug hander on BOOKE-HV Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2014-09-22KVM: PPC: BOOKE: allow debug interrupt at "debug level"Bharat Bhushan1-1/+5
Debug interrupt can be either "critical level" or "debug level". There are separate set of save/restore registers used for different level. Example: DSRR0/DSRR1 are used for "debug level" and CSRR0/CSRR1 are used for critical level debug interrupt. Using CPU_FTR_DEBUG_LVL_EXC to decide which interrupt level to be used. Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2014-09-18arm/arm64: KVM: vgic: make number of irqs a configurable attributeMarc Zyngier4-0/+49
In order to make the number of interrupts configurable, use the new fancy device management API to add KVM_DEV_ARM_VGIC_GRP_NR_IRQS as a VGIC configurable attribute. Userspace can now specify the exact size of the GIC (by increments of 32 interrupts). Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-09-18arm/arm64: KVM: vgic: delay vgic allocation until init timeMarc Zyngier3-21/+29
It is now quite easy to delay the allocation of the vgic tables until we actually require it to be up and running (when the first vcpu is kicking around, or someones tries to access the GIC registers). This allow us to allocate memory for the exact number of CPUs we have. As nobody configures the number of interrupts just yet, use a fallback to VGIC_NR_IRQS_LEGACY. Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-09-18arm/arm64: KVM: vgic: kill VGIC_NR_IRQSMarc Zyngier2-9/+14
Nuke VGIC_NR_IRQS entierly, now that the distributor instance contains the number of IRQ allocated to this GIC. Also add VGIC_NR_IRQS_LEGACY to preserve the current API. Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-09-18arm/arm64: KVM: vgic: handle out-of-range MMIO accessesMarc Zyngier2-12/+47
Now that we can (almost) dynamically size the number of interrupts, we're facing an interesting issue: We have to evaluate at runtime whether or not an access hits a valid register, based on the sizing of this particular instance of the distributor. Furthermore, the GIC spec says that accessing a reserved register is RAZ/WI. For this, add a new field to our range structure, indicating the number of bits a single interrupts uses. That allows us to find out whether or not the access is in range. Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-09-18arm/arm64: KVM: vgic: kill VGIC_MAX_CPUSMarc Zyngier2-5/+4
We now have the information about the number of CPU interfaces in the distributor itself. Let's get rid of VGIC_MAX_CPUS, and just rely on KVM_MAX_VCPUS where we don't have the choice. Yet. Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-09-18arm/arm64: KVM: vgic: Parametrize VGIC_NR_SHARED_IRQSMarc Zyngier2-6/+11
Having a dynamic number of supported interrupts means that we cannot relly on VGIC_NR_SHARED_IRQS being fixed anymore. Instead, make it take the distributor structure as a parameter, so it can return the right value. Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-09-18arm/arm64: KVM: vgic: switch to dynamic allocationMarc Zyngier3-53/+269
So far, all the VGIC data structures are statically defined by the *maximum* number of vcpus and interrupts it supports. It means that we always have to oversize it to cater for the worse case. Start by changing the data structures to be dynamically sizeable, and allocate them at runtime. The sizes are still very static though. Signed-off-by: Marc Zyngier <[email protected]>
2014-09-18KVM: ARM: vgic: plug irq injection raceMarc Zyngier1-1/+2
As it stands, nothing prevents userspace from injecting an interrupt before the guest's GIC is actually initialized. This goes unnoticed so far (as everything is pretty much statically allocated), but ends up exploding in a spectacular way once we switch to a more dynamic allocation (the GIC data structure isn't there yet). The fix is to test for the "ready" flag in the VGIC distributor before trying to inject the interrupt. Note that in order to avoid breaking userspace, we have to ignore what is essentially an error. Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Christoffer Dall <[email protected]>
2014-09-18arm/arm64: KVM: vgic: Clarify and correct vgic documentationChristoffer Dall1-6/+7
The VGIC virtual distributor implementation documentation was written a very long time ago, before the true nature of the beast had been partially absorbed into my bloodstream. Clarify the docs. Plus, it fixes an actual bug. ICFRn, pfff. Signed-off-by: Christoffer Dall <[email protected]>
2014-09-18arm/arm64: KVM: vgic: Fix SGI writes to GICD_I{CS}PENDR0Christoffer Dall1-2/+16
Writes to GICD_ISPENDR0 and GICD_ICPENDR0 ignore all settings of the pending state for SGIs. Make sure the implementation handles this correctly. Signed-off-by: Christoffer Dall <[email protected]>
2014-09-18arm/arm64: KVM: vgic: Improve handling of GICD_I{CS}PENDRnChristoffer Dall2-12/+123
Writes to GICD_ISPENDRn and GICD_ICPENDRn are currently not handled correctly for level-triggered interrupts. The spec states that for level-triggered interrupts, writes to the GICD_ISPENDRn activate the output of a flip-flop which is in turn or'ed with the actual input interrupt signal. Correspondingly, writes to GICD_ICPENDRn simply deactivates the output of that flip-flop, but does not (of course) affect the external input signal. Reads from GICC_IAR will also deactivate the flip-flop output. This requires us to track the state of the level-input separately from the state in the flip-flop. We therefore introduce two new variables on the distributor struct to track these two states. Astute readers may notice that this is introducing more state than required (because an OR of the two states gives you the pending state), but the remaining vgic code uses the pending bitmap for optimized operations to figure out, at the end of the day, if an interrupt is pending or not on the distributor side. Refactoring the code to consider the two state variables all the places where we currently access the precomputed pending value, did not look pretty. Signed-off-by: Christoffer Dall <[email protected]>
2014-09-18arm/arm64: KVM: vgic: Clear queued flags on unqueueChristoffer Dall1-1/+3
If we unqueue a level-triggered interrupt completely, and the LR does not stick around in the active state (and will therefore no longer generate a maintenance interrupt), then we should clear the queued flag so that the vgic can actually queue this level-triggered interrupt at a later time and deal with its pending state then. Note: This should actually be properly fixed to handle the active state on the distributor. Acked-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2014-09-18arm/arm64: KVM: Rename irq_active to irq_queuedChristoffer Dall2-16/+21
We have a special bitmap on the distributor struct to keep track of when level-triggered interrupts are queued on the list registers. This was named irq_active, which is confusing, because the active state of an interrupt as per the GIC spec is a different thing, not specifically related to edge-triggered/level-triggered configurations but rather indicates an interrupt which has been ack'ed but not yet eoi'ed. Rename the bitmap and the corresponding accessor functions to irq_queued to clarify what this is actually used for. Signed-off-by: Christoffer Dall <[email protected]>
2014-09-18arm/arm64: KVM: Rename irq_state to irq_pendingChristoffer Dall2-28/+28
The irq_state field on the distributor struct is ambiguous in its meaning; the comment says it's the level of the input put, but that doesn't make much sense for edge-triggered interrupts. The code actually uses this state variable to check if the interrupt is in the pending state on the distributor so clarify the comment and rename the actual variable and accessor methods. Acked-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2014-09-18Merge remote-tracking branch 'kvm/next' into queueChristoffer Dall46-1179/+1293
Conflicts: arch/arm64/include/asm/kvm_host.h virt/kvm/arm/vgic.c
2014-09-17kvm: Make init_rmode_identity_map() return 0 on success.Tang Chen1-10/+8
In init_rmode_identity_map(), there two variables indicating the return value, r and ret, and it return 0 on error, 1 on success. The function is only called by vmx_create_vcpu(), and ret is redundant. This patch removes the redundant variable, and makes init_rmode_identity_map() return 0 on success, -errno on failure. Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-17kvm: Remove ept_identity_pagetable from struct kvm_arch.Tang Chen3-28/+22
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But it is never used to refer to the page at all. In vcpu initialization, it indicates two things: 1. indicates if ept page is allocated 2. indicates if a memory slot for identity page is initialized Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept identity pagetable is initialized. So we can remove ept_identity_pagetable. NOTE: In the original code, ept identity pagetable page is pinned in memroy. As a result, it cannot be migrated/hot-removed. After this patch, since kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page is no longer pinned in memory. And it can be migrated/hot-removed. Signed-off-by: Tang Chen <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-17KVM: VFIO: register kvm_device_ops dynamicallyWill Deacon3-12/+15
Now that we have a dynamic means to register kvm_device_ops, use that for the VFIO kvm device, instead of relying on the static table. This is achieved by a module_init call to register the ops with KVM. Cc: Gleb Natapov <[email protected]> Cc: Paolo Bonzini <[email protected]> Acked-by: Alex Williamson <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-17KVM: s390: register flic ops dynamicallyCornelia Huck4-6/+3
Using the new kvm_register_device_ops() interface makes us get rid of an #ifdef in common code. Cc: Gleb Natapov <[email protected]> Cc: Paolo Bonzini <[email protected]> Signed-off-by: Cornelia Huck <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-17KVM: ARM: vgic: register kvm_device_ops dynamicallyWill Deacon3-83/+79
Now that we have a dynamic means to register kvm_device_ops, use that for the ARM VGIC, instead of relying on the static table. Cc: Gleb Natapov <[email protected]> Cc: Paolo Bonzini <[email protected]> Acked-by: Marc Zyngier <[email protected]> Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-17KVM: device: add simple registration mechanism for kvm_device_opsWill Deacon3-33/+55
kvm_ioctl_create_device currently has knowledge of all the device types and their associated ops. This is fairly inflexible when adding support for new in-kernel device emulations, so move what we currently have out into a table, which can support dynamic registration of ops by new drivers for virtual hardware. Cc: Alex Williamson <[email protected]> Cc: Alex Graf <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Marc Zyngier <[email protected]> Acked-by: Cornelia Huck <[email protected]> Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-16kvm: ioapic: conditionally delay irq delivery duringeoi broadcastZhang Haoyu3-2/+66
Currently, we call ioapic_service() immediately when we find the irq is still active during eoi broadcast. But for real hardware, there's some delay between the EOI writing and irq delivery. If we do not emulate this behavior, and re-inject the interrupt immediately after the guest sends an EOI and re-enables interrupts, a guest might spend all its time in the ISR if it has a broken handler for a level-triggered interrupt. Such livelock actually happens with Windows guests when resuming from hibernation. As there's no way to recognize the broken handle from new raised ones, this patch delays an interrupt if 10.000 consecutive EOIs found that the interrupt was still high. The guest can then make a little forward progress, until a proper IRQ handler is set or until some detection routine in the guest (such as Linux's note_interrupt()) recognizes the situation. Cc: Michael S. Tsirkin <[email protected]> Signed-off-by: Jason Wang <[email protected]> Signed-off-by: Zhang Haoyu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-16KVM: x86: Use kvm_make_request when applicableGuo Hui Liu1-8/+7
This patch replace the set_bit method by kvm_make_request to make code more readable and consistent. Signed-off-by: Guo Hui Liu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-11KVM: EVENTFD: remove inclusion of irq.hEric Auger1-1/+0
No more needed. irq.h would be void on ARM. Acked-by: Paolo Bonzini <[email protected]> Signed-off-by: Eric Auger <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-09-11ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()Ard Biesheuvel3-26/+10
The ISS encoding for an exception from a Data Abort has a WnR bit[6] that indicates whether the Data Abort was caused by a read or a write instruction. While there are several fields in the encoding that are only valid if the ISV bit[24] is set, WnR is not one of them, so we can read it unconditionally. Instead of fixing both implementations of kvm_is_write_fault() in place, reimplement it just once using kvm_vcpu_dabt_iswrite(), which already does the right thing with respect to the WnR bit. Also fix up the callers to pass 'vcpu' Acked-by: Laszlo Ersek <[email protected]> Acked-by: Marc Zyngier <[email protected]> Acked-by: Christoffer Dall <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2014-09-11KVM: x86: make apic_accept_irq tracepoint more genericPaolo Bonzini2-9/+6
Initially the tracepoint was added only to the APIC_DM_FIXED case, also because it reported coalesced interrupts that only made sense for that case. However, the coalesced argument is not used anymore and tracing other delivery modes is useful, so hoist the call out of the switch statement. Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-11kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.Tang Chen2-4/+5
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the address of apic access page. So use this macro. Signed-off-by: Tang Chen <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-11Merge tag 'kvm-s390-next-20140910' of ↵Paolo Bonzini6-20/+74
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next KVM: s390: Fixes and features for next (3.18) 1. Crypto/CPACF support: To enable the MSA4 instructions we have to provide a common control structure for each SIE control block 2. Two cleanups found by a static code checker: one redundant assignment and one useless if 3. Fix the page handling of the diag10 ballooning interface. If the guest freed the pages at absolute 0 some checks and frees were incorrect 4. Limit guests to 16TB 5. Add __must_check to interrupt injection code
2014-09-10KVM: s390/interrupt: remove double assignmentChristian Borntraeger1-1/+0
r is already initialized to 0. Signed-off-by: Christian Borntraeger <[email protected]> Reviewed-by: Thomas Huth <[email protected]>
2014-09-10KVM: s390/cmm: Fix prefix handling for diag 10 balloonChristian Borntraeger1-8/+18
The old handling of prefix pages was broken in the diag10 ballooner. We now rely on gmap_discard to check for start > end and do a slow path if the prefix swap pages are affected: 1. discard the pages from start to prefix 2. discard the absolute 0 pages 3. discard the pages after prefix swap to end Signed-off-by: Christian Borntraeger <[email protected]> Reviewed-by: Thomas Huth <[email protected]>
2014-09-10KVM: s390: get rid of constant condition in ipte_unlock_simpleChristian Borntraeger1-2/+1
Due to the earlier check we know that ipte_lock_count must be 0. No need to add a useless if. Let's make clear that we are going to always wakeup when we execute that code. Signed-off-by: Christian Borntraeger <[email protected]> Acked-by: Heiko Carstens <[email protected]>
2014-09-10KVM: s390: unintended fallthrough for external callChristian Borntraeger1-0/+1
We must not fallthrough if the conditions for external call are not met. Signed-off-by: Christian Borntraeger <[email protected]> Reviewed-by: Thomas Huth <[email protected]> Cc: [email protected]
2014-09-10KVM: s390: Limit guest size to 16TBChristian Borntraeger1-1/+1
Currently we fill up a full 5 level page table to hold the guest mapping. Since commit "support gmap page tables with less than 5 levels" we can do better. Having more than 4 TB might be useful for some testing scenarios, so let's just limit ourselves to 16TB guest size. Having more than that is totally untested as I do not have enough swap space/memory. We continue to allow ucontrol the full size. Signed-off-by: Christian Borntraeger <[email protected]> Acked-by: Cornelia Huck <[email protected]> Cc: Martin Schwidefsky <[email protected]>
2014-09-10KVM: s390: add __must_check to interrupt deliver functionsChristian Borntraeger2-7/+7
We now propagate interrupt injection errors back to the ioctl. We should mark functions that might fail with __must_check. Signed-off-by: Christian Borntraeger <[email protected]> Acked-by: Jens Freimann <[email protected]>
2014-09-10KVM: CPACF: Enable MSA4 instructions for kvm guestTony Krowiak2-1/+46
We have to provide a per guest crypto block for the CPUs to enable MSA4 instructions. According to icainfo on z196 or later this enables CCM-AES-128, CMAC-AES-128, CMAC-AES-192 and CMAC-AES-256. Signed-off-by: Tony Krowiak <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Michael Mueller <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]> [split MSA4/protected key into two patches]
2014-09-10KVM: fix api documentation of KVM_GET_EMULATED_CPUIDAlex Bennée1-70/+70
It looks like when this was initially merged it got accidentally included in the following section. I've just moved it back in the correct section and re-numbered it as other ioctls have been added since. Signed-off-by: Alex Bennée <[email protected]> Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-10KVM: document KVM_SET_GUEST_DEBUG apiAlex Bennée1-0/+44
In preparation for working on the ARM implementation I noticed the debug interface was missing from the API document. I've pieced together the expected behaviour from the code and commit messages written it up as best I can. Signed-off-by: Alex Bennée <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-05KVM: remove redundant assignments in __kvm_set_memory_regionChristian Borntraeger1-3/+0
__kvm_set_memory_region sets r to EINVAL very early. Doing it again is not necessary. The same is true later on, where r is assigned -ENOMEM twice. Signed-off-by: Christian Borntraeger <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-05KVM: remove redundant assigment of return value in kvm_dev_ioctlChristian Borntraeger1-2/+0
The first statement of kvm_dev_ioctl is long r = -EINVAL; No need to reassign the same value. Signed-off-by: Christian Borntraeger <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-05KVM: remove redundant check of in_spin_loopChristian Borntraeger1-2/+1
The expression `vcpu->spin_loop.in_spin_loop' is always true, because it is evaluated only when the condition `!vcpu->spin_loop.in_spin_loop' is false. Signed-off-by: Christian Borntraeger <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-05KVM: x86: propagate exception from permission checks on the nested page faultPaolo Bonzini4-11/+16
Currently, if a permission error happens during the translation of the final GPA to HPA, walk_addr_generic returns 0 but does not fill in walker->fault. To avoid this, add an x86_exception* argument to the translate_gpa function, and let it fill in walker->fault. The nested_page_fault field will be true, since the walk_mmu is the nested_mmu and translate_gpu instead operates on the "outer" (NPT) instance. Reported-by: Valentine Sinitsyn <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2014-09-05KVM: x86: skip writeback on injection of nested exceptionPaolo Bonzini2-6/+10
If a nested page fault happens during emulation, we will inject a vmexit, not a page fault. However because writeback happens after the injection, we will write ctxt->eip from L2 into the L1 EIP. We do not write back if an instruction caused an interception vmexit---do the same for page faults. Suggested-by: Gleb Natapov <[email protected]> Reviewed-by: Gleb Natapov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>