Age | Commit message | Author | Files | Lines
2016-07-05 | KVM: MMU: try to fix up page faults before giving up | Paolo Bonzini | 2 | -3/+43
The vGPU folks would like to trap the first access to a BAR by setting vm_ops on the VMAs produced by mmap-ing a VFIO device. The fault handler can then use remap_pfn_range to place some non-reserved pages in the VMA. This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn and fixup_user_fault together help support it. The patch also supports VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to reference counting. Cc: Xiao Guangrong <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Radim Krčmář <[email protected]> Tested-by: Neo Jia <[email protected]> Reported-by: Kirti Wankhede <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-07-05 | KVM: MMU: prepare to support mapping of VM_IO and VM_PFNMAP frames | Paolo Bonzini | 1 | -5/+15
Handle VM_IO like VM_PFNMAP, as is common in the rest of Linux; extract the formula to convert hva->pfn into a new function, which will soon gain more capabilities. Cc: Xiao Guangrong <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Radim Krčmář <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-07-05 | KVM: s390: inject PER i-fetch events on applicable icpts | David Hildenbrand | 3 | -3/+34
In case we have to emulate an instruction or part of it (instruction, partial instruction, operation exception), we have to inject a PER instruction-fetching event for that instruction, if hardware told us to do so. In case we retry an instruction, we must not inject the PER event. Please note that we don't filter the events properly yet, so guest debugging will be visible to the guest. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-07-03 | arm/arm64: Get rid of KERN_TO_HYP | Marc Zyngier | 4 | -13/+10
We have both KERN_TO_HYP and kern_hyp_va, which do the exact same thing. Let's standardize on the latter. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range | Marc Zyngier | 1 | -0/+15
This is more of a safety measure than anything else: If we end up with an idmap page that intersects with the range picked for the HYP VA space, abort the KVM setup, as it is unsafe to go further. I cannot imagine it happening on 64bit (we have a mechanism to work around it), but it could potentially occur on a 32bit system with the kernel loaded high enough in memory so that it conflicts with the kernel VA. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
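The safety check described in this message amounts to a range-overlap test. A minimal sketch, assuming half-open address ranges; `ranges_intersect` and its parameter names are hypothetical and are not the helper the kernel patch actually adds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the half-open range
 * [a_start, a_end) overlaps the half-open range [b_start, b_end).
 */
static bool ranges_intersect(uint64_t a_start, uint64_t a_end,
                             uint64_t b_start, uint64_t b_end)
{
    /* Both ranges must be well formed for the comparison to make sense. */
    assert(a_start <= a_end && b_start <= b_end);

    /* Two ranges overlap when each one starts before the other ends. */
    return a_start < b_end && b_start < a_end;
}
```

With such a test, setup would be aborted whenever the idmap page range and the chosen HYP VA range overlap; ranges that merely touch at a boundary do not count as intersecting.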
2016-07-03 | arm/arm64: KVM: Prune unused #defines | Marc Zyngier | 2 | -19/+0
We can now remove a number of dead #defines, thanks to the trampoline code being gone. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm: KVM: Allow hyp teardown | Marc Zyngier | 5 | -7/+24
So far, KVM was getting in the way of kexec on 32bit (and the arm64 kexec hackers couldn't be bothered to fix it on 32bit...). With simpler page tables, tearing KVM down becomes very easy, so let's just do it. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm: KVM: Simplify HYP init | Marc Zyngier | 2 | -50/+14
Just like for arm64, we can now make the HYP setup a lot simpler, and we can now initialise it in one go (instead of the two phases we currently have). Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm/arm64: KVM: Kill free_boot_hyp_pgd | Marc Zyngier | 4 | -29/+7
There is no way to free the boot PGD, because it doesn't exist anymore as a standalone entity. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm/arm64: KVM: Drop boot_pgd | Marc Zyngier | 6 | -31/+8
Since we now only have one set of page tables, the concept of boot_pgd is useless and can be removed. We still keep it as an element of the "extended idmap" thing. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm64: KVM: Simplify HYP init/teardown | Marc Zyngier | 5 | -92/+26
Now that we only have the "merged page tables" case to deal with, there is a bunch of things we can simplify in the HYP code (both at init and teardown time). Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm/arm64: KVM: Always have merged page tables | Marc Zyngier | 2 | -64/+41
We're in a position where we can now always have "merged" page tables, where both the runtime mapping and the idmap coexist. This results in some code being removed, but there is more to come. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm64: KVM: Runtime detection of lower HYP offset | Marc Zyngier | 1 | -0/+19
Add the code that enables the switch to the lower HYP VA range. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm/arm64: KVM: Export __hyp_text_start/end symbols | Marc Zyngier | 3 | -2/+8
Declare the __hyp_text_start/end symbols in asm/virt.h so that they can be reused without having to declare them locally. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm64: KVM: Refactor kern_hyp_va to deal with multiple offsets | Marc Zyngier | 2 | -14/+39
As we move towards a selectable HYP VA range, it is obvious that we don't want to test a variable to find out if we need to use the bottom VA range, the top VA range, or use the address as is (for VHE). Instead, we can expand our current helper to generate the right mask or nop with code patching. We default to using the top VA space, with alternatives to switch to the bottom one or to nop out the instructions. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm64: KVM: Define HYP offset masks | Marc Zyngier | 1 | -2/+6
Define the two possible HYP VA regions in terms of VA_BITS, and keep HYP_PAGE_OFFSET_MASK as a temporary compatibility definition. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm64: Add ARM64_HYP_OFFSET_LOW capability | Marc Zyngier | 1 | -1/+2
As we need to indicate to the rest of the kernel which region of the HYP VA space is safe to use, add a capability that will indicate that KVM should use the [VA_BITS-2:0] range. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
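To make the two candidate regions concrete, here is a small sketch of masks derived from VA_BITS. The low range [VA_BITS-2:0] comes from the message above; the high range [VA_BITS-1:0] and both helper names are assumptions for illustration, not the kernel's definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Mask keeping bits [va_bits-2:0], the "low" HYP range named above. */
static uint64_t hyp_low_mask(unsigned int va_bits)
{
    assert(va_bits >= 2 && va_bits < 64);
    return (UINT64_C(1) << (va_bits - 1)) - 1;
}

/* Assumed "high" range: bits [va_bits-1:0]. */
static uint64_t hyp_high_mask(unsigned int va_bits)
{
    assert(va_bits >= 2 && va_bits < 64);
    return (UINT64_C(1) << va_bits) - 1;
}
```

Under these assumptions, applying the selected mask to a kernel address with a bitwise AND yields the corresponding HYP address, which is the mask-based operation the neighbouring commits refer to.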
2016-07-03 | arm64: KVM: Kill HYP_PAGE_OFFSET | Marc Zyngier | 1 | -2/+1
HYP_PAGE_OFFSET is not massively useful. And the way we use it in KERN_HYP_VA is inconsistent with the equivalent operation in EL2, where we use a mask instead. Let's replace the uses of HYP_PAGE_OFFSET with HYP_PAGE_OFFSET_MASK, and get rid of the pointless macro. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm/arm64: KVM: Remove hyp_kern_va helper | Marc Zyngier | 2 | -13/+0
hyp_kern_va is now completely unused, so let's remove it entirely. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm64: KVM: Always reference __hyp_panic_string via its kernel VA | Marc Zyngier | 1 | -2/+9
__hyp_panic_string is passed via the HYP panic code to the panic function, and is being "upgraded" to a kernel address, as it is referenced by the HYP code (in a PC-relative way). This is a bit silly, and we'd be better off obtaining the kernel address and not mess with it at all. This patch implements this with a tiny bit of asm glue, by forcing the string pointer to be read from the literal pool. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | arm64: KVM: Merged page tables documentation | Marc Zyngier | 1 | -3/+37
Since dealing with VA ranges tends to hurt my brain badly, let's start with a bit of documentation that will hopefully help understanding what comes next... Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-03 | KVM: arm/arm64: The GIC is dead, long live the GIC | Marc Zyngier | 13 | -5603/+130
I don't think any single piece of the KVM/ARM code ever generated as much hatred as the GIC emulation. It was written by someone who had zero experience in modeling hardware (me), was riddled with design flaws, should have been scrapped and rewritten from scratch long before having a remote chance of reaching mainline, and yet we supported it for a good three years. No need to mention the names of those who suffered, the git log is singing their praises. Thankfully, we now have a much more maintainable implementation, and we can safely put the grumpy old GIC to rest. Fellow hackers, please raise your glass in memory of the GIC: The GIC is dead, long live the GIC! Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-07-01 | KVM: vmx: fix missed cancellation of TSC deadline timer | Wanpeng Li | 1 | -24/+24
INFO: rcu_sched detected stalls on CPUs/tasks:
 1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663
 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R running task 0 3529 3525 0x00080808
 ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
 ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
 0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
 ? kvm_write_guest_cached+0xb9/0x160 [kvm]
 ? __delay+0xf/0x20
 ? wait_lapic_expire+0x14a/0x200 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
 ? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
 ? __fget+0x5/0x210
 ? do_vfs_ioctl+0x96/0x6a0
 ? __fget_light+0x2a/0x90
 ? SyS_ioctl+0x79/0x90
 ? do_syscall_64+0x7c/0x1e0
 ? entry_SYSCALL64_slow_path+0x25/0x25
This can be reproduced readily by running a full dynticks guest (since hrtimers are heavily used in the guest) with lapic_timer_advance disabled. If we fail to program the hardware preemption timer, we fall back to the hrtimer-based method. However, a previously programmed preemption timer is not cancelled in this scenario, so one hardware preemption timer and one hrtimer-emulated TSC deadline timer run simultaneously. The target guest deadline TSC can then be earlier than the guest TSC, so the computation in vmx_set_hv_timer underflows and sets delta_tsc to a huge value, which leads to the host soft lockup shown above. This patch fixes it by cancelling the previously programmed preemption timer, if there is one, once we fail to program the new preemption timer and fall back to the hrtimer-based method.
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Yunhong Jiang <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
2016-07-01 | KVM: x86: introduce cancel_hv_tscdeadline | Wanpeng Li | 1 | -8/+10
Introduce cancel_hv_tscdeadline() to encapsulate preemption timer cancel stuff. Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Yunhong Jiang <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-07-01 | KVM: vmx: fix underflow in TSC deadline calculation | Paolo Bonzini | 1 | -3/+3
If the TSC deadline timer is programmed really close to the deadline or even in the past, the computation in vmx_set_hv_timer can underflow and cause delta_tsc to be set to a huge value. This generally results in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting delta_tsc to be positive or zero. Reported-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
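The underflow is ordinary unsigned wraparound. The sketch below contrasts a naive subtraction with a clamped one; the function and parameter names are hypothetical and the code is an illustration of the idea, not the body of vmx_set_hv_timer:

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: wraps to a huge value when the deadline is already past. */
static uint64_t delta_naive(uint64_t deadline_tsc, uint64_t now_tsc)
{
    return deadline_tsc - now_tsc;
}

/* Clamped version: a deadline at or before "now" yields zero. */
static uint64_t delta_clamped(uint64_t deadline_tsc, uint64_t now_tsc)
{
    return deadline_tsc > now_tsc ? deadline_tsc - now_tsc : 0;
}

/* Usage example: a deadline 50 ticks in the past. */
static void example_delta(void)
{
    assert(delta_naive(100, 150) == UINT64_MAX - 49); /* wrapped around */
    assert(delta_clamped(100, 150) == 0);             /* limited to zero */
}
```

Clamping to zero means an already-expired deadline fires as soon as possible instead of being treated as a timer far in the future.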
2016-07-01 | KVM: x86: use guest_exit_irqoff | Paolo Bonzini | 3 | -12/+9
This gains a few clock cycles per vmexit. On Intel there is no need anymore to enable the interrupts in vmx_handle_external_intr, since we are using the "acknowledge interrupt on exit" feature. AMD needs to do that, and must be careful to avoid the interrupt shadow. Signed-off-by: Paolo Bonzini <[email protected]>
2016-07-01 | KVM: x86: always use "acknowledge interrupt on exit" | Paolo Bonzini | 1 | -4/+3
This is necessary to simplify handle_external_intr in the next patch. Signed-off-by: Paolo Bonzini <[email protected]>
2016-07-01 | KVM: remove kvm_guest_enter/exit wrappers | Paolo Bonzini | 10 | -41/+19
Use the functions from context_tracking.h directly. Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-06-29 | arm/arm64: KVM: Make default HYP mappings non-excutable | Marc Zyngier | 2 | -2/+2
Structures that can be generally written to don't have any requirement to be executable (quite the opposite). This includes the kvm and vcpu structures, as well as the stacks. Let's change the default to incorporate the XN flag. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-06-29 | arm/arm64: KVM: Map the HYP text as read-only | Marc Zyngier | 4 | -4/+6
There should be no reason for mapping the HYP text read/write. As such, let's have a new set of flags (PAGE_HYP_EXEC) that allows execution, but marks the page as read-only, and update the two call sites that deal with mapping code. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-06-29 | arm/arm64: KVM: Enforce HYP read-only mapping of the kernel's rodata section | Marc Zyngier | 3 | -1/+3
In order to be able to use C code in HYP, we're now mapping the kernel's rodata in HYP. It works absolutely fine, except that we're mapping it RWX, which is not what it should be. Add a new HYP_PAGE_RO protection, and pass it as the protection flags when mapping the rodata section. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-06-29 | arm64: Add PTE_HYP_XN page table flag | Marc Zyngier | 1 | -0/+1
EL2 page tables can be configured to deny code from being executed, which is done by setting bit 54 in the page descriptor. It is the same bit as PTE_UXN, but the "USER" reference felt odd in the hypervisor code. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
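As a small illustration of the bit described above, the sketch below sets and tests bit 54 of a 64-bit descriptor. The `EXAMPLE_` macro and the helper names are hypothetical stand-ins, not the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 54 of the page descriptor: execute-never, as stated in the message. */
#define EXAMPLE_PTE_HYP_XN (UINT64_C(1) << 54)

/* Mark a descriptor as non-executable. */
static uint64_t pte_mk_noexec(uint64_t pte)
{
    return pte | EXAMPLE_PTE_HYP_XN;
}

/* A descriptor is executable when the XN bit is clear. */
static bool pte_is_exec(uint64_t pte)
{
    return (pte & EXAMPLE_PTE_HYP_XN) == 0;
}

/* Usage example. */
static void example_pte(void)
{
    assert(pte_is_exec(0));
    assert(!pte_is_exec(pte_mk_noexec(0)));
}
```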
2016-06-29 | arm/arm64: KVM: Add a protection parameter to create_hyp_mappings | Marc Zyngier | 4 | -10/+12
Currently, create_hyp_mappings applies a "one size fits all" page protection (PAGE_HYP). As we're heading towards separate protections for different sections, let's make this protection a parameter, and let the callers pass their preferred protection (PAGE_HYP for everyone for the time being). Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Christoffer Dall <[email protected]>
2016-06-28 | context_tracking: move rcu_virt_note_context_switch out of kvm_host.h | Paolo Bonzini | 2 | -25/+38
Make kvm_guest_{enter,exit} and __kvm_guest_{enter,exit} trivial wrappers around the code in context_tracking.h. Name the context_tracking.h functions consistently with those for the kernel<->user switch. Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-06-23 | MIPS: KVM: Combine entry trace events into class | James Hogan | 1 | -40/+22
Combine the kvm_enter, kvm_reenter and kvm_out trace events into a single kvm_transition event class to reduce duplication and bloat. Suggested-by: Steven Rostedt <[email protected]> Fixes: 93258604ab6d ("MIPS: KVM: Add guest mode switch trace events") Signed-off-by: James Hogan <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2016-06-23 | kvm: x86: use getboottime64 | Arnd Bergmann | 1 | -5/+5
KVM reads the current boottime value as a struct timespec in order to calculate the guest wallclock time, resulting in an overflow in 2038 on 32-bit systems. The data then gets passed as an unsigned 32-bit number to the guest, and that in turn overflows in 2106. We cannot do much about the second overflow, which affects both 32-bit and 64-bit hosts, but we can ensure that they both behave the same way and don't overflow until 2106, by using getboottime64() to read a timespec64 value. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
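The two overflow points follow from the width of the seconds field. A minimal sketch, assuming seconds since the Unix epoch held in a 64-bit value; the helper names are hypothetical and this is not KVM's wallclock code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A signed 32-bit seconds counter runs out at 2^31 - 1 (year 2038). */
static bool fits_signed_32(int64_t seconds)
{
    return seconds >= 0 && seconds <= INT32_MAX;
}

/* The value handed to the guest is unsigned 32-bit: it wraps modulo 2^32. */
static uint32_t guest_wall_seconds(int64_t boot_seconds)
{
    return (uint32_t)boot_seconds;
}

/* Usage example: one second past each limit. */
static void example_overflow(void)
{
    assert(!fits_signed_32(INT64_C(2147483648)));        /* past 2038 */
    assert(guest_wall_seconds(INT64_C(4294967296)) == 0); /* past 2106 */
}
```

Reading the boot time as a 64-bit value removes the first limit on the host side; the second limit remains because the guest-visible field itself is only 32 bits wide.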
2016-06-23 | KVM: VMX: enable guest access to LMCE related MSRs | Ashok Raj | 3 | -6/+43
On Intel platforms, this patch adds LMCE to KVM MCE supported capabilities and handles guest access to LMCE related MSRs. Signed-off-by: Ashok Raj <[email protected]> [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported Only enable LMCE on Intel platform Check MSR_IA32_FEATURE_CONTROL when handling guest access to MSR_IA32_MCG_EXT_CTL] Signed-off-by: Haozhong Zhang <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-06-23 | KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL | Haozhong Zhang | 1 | -1/+24
KVM currently does not check the value written to guest MSR_IA32_FEATURE_CONTROL, though bits corresponding to disabled features may be set. This patch makes KVM validate individual bits written to guest MSR_IA32_FEATURE_CONTROL according to enabled features. Signed-off-by: Haozhong Zhang <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
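Per-bit validation of this kind is usually a mask test: build the set of bits the guest may set from the enabled features, then reject any write that touches a bit outside it. A minimal sketch, not the patch's code; the bit positions follow the architectural layout of IA32_FEATURE_CONTROL (lock at bit 0, VMXON outside SMX at bit 2, LMCE at bit 20), while the `EX_` names and the helper are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names for architectural bits of IA32_FEATURE_CONTROL. */
#define EX_FC_LOCKED            (UINT64_C(1) << 0)
#define EX_FC_VMXON_OUTSIDE_SMX (UINT64_C(1) << 2)
#define EX_FC_LMCE              (UINT64_C(1) << 20)

/* A write is acceptable only if it sets no bit outside valid_bits. */
static bool feature_control_write_ok(uint64_t value, uint64_t valid_bits)
{
    return (value & ~valid_bits) == 0;
}

/* Usage example: LMCE is not enabled for this guest, so its bit is refused. */
static void example_feature_control(void)
{
    uint64_t valid = EX_FC_LOCKED | EX_FC_VMXON_OUTSIDE_SMX;

    assert(feature_control_write_ok(EX_FC_LOCKED, valid));
    assert(!feature_control_write_ok(EX_FC_LOCKED | EX_FC_LMCE, valid));
}
```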
2016-06-23 | KVM: VMX: move msr_ia32_feature_control to vcpu_vmx | Haozhong Zhang | 1 | -7/+6
msr_ia32_feature_control will be used for LMCE and not depend only on nested anymore, so move it from struct nested_vmx to struct vcpu_vmx. Signed-off-by: Haozhong Zhang <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-06-21 | Merge tag 'kvm-s390-next-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD | Paolo Bonzini | 23 | -133/+3211
KVM: s390: vSIE (nested virtualization) feature for 4.8 (kvm/next) With an updated QEMU this allows creating nested KVM guests (KVM under KVM) on s390. s390 memory management changes from Martin Schwidefsky or acked by Martin. One common code memory management change (pageref) acked by Andrew Morton. The feature has to be enabled with the nested module parameter.
2016-06-21 | KVM: s390: vsie: add module parameter "nested" | David Hildenbrand | 1 | -1/+6
Let's be careful first and allow nested virtualization only if enabled by the system administrator. In addition, user space still has to explicitly enable it via SCLP features for it to work. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-06-21 | KVM: s390: vsie: add indication for future features | David Hildenbrand | 2 | -0/+22
We have certain SIE features that we cannot support for now. Let's add these features, so user space can directly prepare to enable them, so we don't have to update yet another component. In addition, add a comment block, telling why it is for now not possible to forward/enable these features. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-06-21 | KVM: s390: vsie: correctly set and handle guest TOD | David Hildenbrand | 2 | -0/+11
Guest 2 sets up the epoch of guest 3 from his point of view. Therefore, we have to add the guest 2 epoch to the guest 3 epoch. We also have to take care of guest 2 epoch changes on STP syncs. This will work just fine by also updating the guest 3 epoch when a vsie_block has been set for a VCPU. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-06-21 | KVM: s390: vsie: speed up VCPU external calls | David Hildenbrand | 1 | -0/+6
Whenever a SIGP external call is injected via the SIGP external call interpretation facility, the VCPU is not kicked. When a VCPU is currently in the VSIE, the external call might not be processed immediately. Therefore we have to provoke partial execution exceptions, which leads to a kick of the VCPU and therefore also kick out of VSIE. This is done by simulating the WAIT state. This bit has no other side effects. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-06-21 | KVM: s390: don't use CPUSTAT_WAIT to detect if a VCPU is idle | David Hildenbrand | 2 | -6/+6
As we want to make use of CPUSTAT_WAIT also when a VCPU is not idle but to force interception of external calls, let's check in the bitmap instead. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-06-21 | KVM: s390: vsie: speed up VCPU irq delivery when handling vsie | David Hildenbrand | 4 | -0/+43
Whenever we want to wake up a VCPU (e.g. when injecting an IRQ), we have to kick it out of vsie, so the request will be handled faster. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-06-21 | KVM: s390: vsie: try to refault after a reported fault to g2 | David Hildenbrand | 1 | -1/+23
We can avoid one unneeded SIE entry after we reported a fault to g2. Theoretically, g2 resolves the fault and we can create the shadow mapping directly, instead of failing again when entering the SIE. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-06-21 | KVM: s390: vsie: support IBS interpretation | David Hildenbrand | 3 | -0/+5
We can easily enable ibs for guest 2, so he can use it for guest 3. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-06-21 | KVM: s390: vsie: support conditional-external-interception | David Hildenbrand | 3 | -0/+5
We can easily enable cei for guest 2, so he can use it for guest 3. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-06-21 | KVM: s390: vsie: support intervention-bypass | David Hildenbrand | 3 | -0/+5
We can easily enable intervention bypass for guest 2, so it can use it for guest 3. Acked-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>