|
The spec places those controls at bit position 2, which corresponds to the
value 4. The impact of this mistake is effectively zero, as we only use them
to ensure that these features are set at position 2 (or, previously, 1) in
MSR_IA32_VMX_{EXIT,ENTRY}_CTLS - which is, and always will be, true
according to the spec.
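For illustration only (the macro names below are hypothetical), a control
advertised at bit position 2 corresponds to a mask value of 4:

    /* Bit position 2 in the control word corresponds to 1 << 2 = 4. */
    #define EXAMPLE_VMX_CTRL_BIT   2
    #define EXAMPLE_VMX_CTRL_MASK  (1u << EXAMPLE_VMX_CTRL_BIT)  /* == 0x4 */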
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
From caddc009a6d2019034af8f2346b2fd37a81608d0 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <[email protected]>
Date: Mon, 24 Feb 2014 18:11:11 +0800
Subject: [PATCH v5 1/3] KVM: x86: Intel MPX vmx and msr handle
This patch handles the VMX and MSR parts of the Intel MPX feature.
Signed-off-by: Xudong Hao <[email protected]>
Signed-off-by: Liu Jinsong <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
We can easily emulate the HLT activity state for L1: If it decides that
L2 shall be halted on entry, just invoke the normal emulation of halt
after switching to L2. We do not depend on specific host features to
provide this, so we can expose the capability unconditionally.
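A minimal sketch of that emulation path, assuming the existing halt
emulation helper and the vmcs12 activity-state field:

    /* If L1 requested the HLT activity state for L2, reuse the regular
     * halt emulation right after switching to L2. */
    if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
            return kvm_emulate_halt(vcpu);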
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
If we let L1 use EPT, we should probably also support the INVEPT instruction.
In our current nested EPT implementation, when L1 changes its EPT table
for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
the course of this modification already calls INVEPT. But if the last level
of the shadow page is unsync, not all of L1's changes to EPT12 are
intercepted, which means the roots need to be synced when L1 calls INVEPT.
Global INVEPT should be no different, since the roots are synced by
kvm_mmu_load() each time EPTP02 changes.
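A rough sketch of the sync performed when emulating INVEPT for L1, assuming
the existing MMU helpers:

    /* Unsynced last-level sptes in the shadow EPT (EPT02) may hide L1's
     * edits to EPT12, so bring the roots back in sync and flush. */
    kvm_mmu_sync_roots(vcpu);
    kvm_mmu_flush_tlb(vcpu);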
Reviewed-by: Xiao Guangrong <[email protected]>
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Jun Nakajima <[email protected]>
Signed-off-by: Xinhao Xu <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add definitions for all the vmcs control fields/bits
required to enable vmcs-shadowing
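For reference, the key secondary execution control involved (value as
encoded in the SDM/KVM headers; treat the exact constant as an assumption):

    #define SECONDARY_EXEC_SHADOW_VMCS  0x00004000  /* bit 14: VMCS shadowing */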
Signed-off-by: Abel Gordon <[email protected]>
Reviewed-by: Orit Wasserman <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Detect the posted interrupt feature. If it exists, then set it in vmcs_config.
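A minimal sketch of the detection step in setup_vmcs_config(), assuming the
SDM's pin-based control encoding and KVM's existing adjust helper:

    /* Posted-interrupt processing is bit 7 of the pin-based controls. */
    #define PIN_BASED_POSTED_INTR  0x00000080

    /* Request it as optional; the adjust helper clears it again if
     * MSR_IA32_VMX_PINBASED_CTLS does not allow it to be set. */
    opt |= PIN_BASED_POSTED_INTR;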
Signed-off-by: Yang Zhang <[email protected]>
Reviewed-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Provided the host has this feature, it's straightforward to offer it to
the guest as well. We just need to load the timer value on L2 entry if
the feature was enabled by L1 and watch out for the corresponding exit
reason.
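For reference, the control bit and exit reason involved (values per the
SDM/KVM headers; treat them as assumptions):

    #define PIN_BASED_VMX_PREEMPTION_TIMER  0x00000040  /* bit 6 */
    #define EXIT_REASON_PREEMPTION_TIMER    52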
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
We will need EFER.LMA saving to provide unrestricted guest mode. All
that is missing for this is picking up EFER.LMA from VM_ENTRY_CONTROLS
on L2->L1 switches. If the host does not support EFER.LMA saving,
no change is performed, otherwise we properly emulate for L1 what the
hardware does for L0. Advertise the support, depending on the host
feature.
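A sketch of the L2->L1 switch handling, assuming KVM's vmcs12 layout and
the vmcs_read32() accessor:

    /* On exit, hardware reflects EFER.LMA in the "IA-32e mode guest"
     * VM-entry control; forward that bit into L1's vmcs12. */
    vmcs12->vm_entry_controls =
            (vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
            (vmcs_read32(VM_ENTRY_CONTROLS) & VM_ENTRY_IA32E_MODE);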
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Only interrupt and NMI exiting are mandatory for KVM to work and can thus
be exposed to the guest unconditionally; virtual NMI exiting is optional,
so we must not advertise it unless the host supports it. Take this chance
to introduce the symbolic constant PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR.
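The constant mirrors the pin-based control bits that default to 1 when the
TRUE_PINBASED_CTLS MSR is not used (value as in the KVM header; treat it as
an assumption here):

    /* Bits 1, 2 and 4 are reserved and read as 1 without the TRUE MSRs. */
    #define PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR  0x00000016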
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Properly set to 1 those bits that the spec demands to be 1 in case bit 55
of VMX_BASIC is 0 - as it is in our case.
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Pull KVM updates from Marcelo Tosatti:
"KVM updates for the 3.9 merge window, including x86 real mode
emulation fixes, stronger memory slot interface restrictions, mmu_lock
spinlock hold time reduction, improved handling of large page faults
on shadow, initial APICv HW acceleration support, s390 channel IO
based virtio, amongst others"
* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
Revert "KVM: MMU: lazily drop large spte"
x86: pvclock kvm: align allocation size to page size
KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
x86 emulator: fix parity calculation for AAD instruction
KVM: PPC: BookE: Handle alignment interrupts
booke: Added DBCR4 SPR number
KVM: PPC: booke: Allow multiple exception types
KVM: PPC: booke: use vcpu reference from thread_struct
KVM: Remove user_alloc from struct kvm_memory_slot
KVM: VMX: disable apicv by default
KVM: s390: Fix handling of iscs.
KVM: MMU: cleanup __direct_map
KVM: MMU: remove pt_access in mmu_set_spte
KVM: MMU: cleanup mapping-level
KVM: MMU: lazily drop large spte
KVM: VMX: cleanup vmx_set_cr0().
KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
KVM: VMX: disable SMEP feature when guest is in non-paging mode
KVM: Remove duplicate text in api.txt
Revert "KVM: MMU: split kvm_mmu_free_page"
...
|
|
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
With virtual interrupt delivery, KVM no longer needs to inject vAPIC
interrupts manually; this is fully taken care of by the hardware. It
requires some special awareness in the existing interrupt injection path:
- For a pending interrupt, instead of direct injection, we may need to
  update architecture-specific indicators before resuming the guest.
- A pending interrupt that is masked by the ISR should also be considered
  in the above update, since the hardware decides when to inject it at the
  right time. The current has_interrupt and get_interrupt only return a
  valid vector from the injection point of view.
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Kevin Tian <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Basically, to benefit from APICv we need to enable virtualized x2APIC mode.
Currently, we only enable it when the guest is really using x2APIC.
Also, clear the MSR bitmap for the corresponding x2APIC MSRs when the guest
has enabled x2APIC:
0x800 - 0x8ff: no read intercept for APICv register virtualization,
               except for APIC ID and TMCCT, which need software's
               assistance to get the right value.
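A sketch of the bitmap update (MSR numbers per the x2APIC specification;
the helper name is an assumption based on the VMX MSR-bitmap code):

    u32 msr;

    /* x2APIC MSRs occupy 0x800-0x8ff; drop the read intercept for all of
     * them except APIC ID (0x802) and TMCCT (0x839), which still need
     * software's help to return the right value. */
    for (msr = 0x800; msr <= 0x8ff; msr++) {
            if (msr == 0x802 || msr == 0x839)
                    continue;
            vmx_disable_intercept_msr_read_x2apic(msr);
    }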
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Kevin Tian <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like
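For reference, this behavior is gated by the "APIC-register virtualization"
secondary execution control (encoding per the SDM; treat the value as an
assumption):

    #define SECONDARY_EXEC_APIC_REGISTER_VIRT  0x00000100  /* bit 8 */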
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Kevin Tian <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Signed-off-by: David Howells <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Michael Kerrisk <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Dave Jones <[email protected]>
|
|
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is
the user accessible slots and the other is user + private. Make this
more obvious.
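A sketch of the clarified relationship (macro names follow the rename in
this patch; the exact slot counts are omitted here):

    /* User-visible slots get an explicit name; the total also counts the
     * internal (private) slots that userspace never sees. */
    #define KVM_MEM_SLOTS_NUM  (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)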
Reviewed-by: Gleb Natapov <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Bit 24 in the VMX_EPT_VPID_CAP MSR is not used for reporting the
address-specific invalidation capability, so remove it from KVM to avoid
future conflicts.
Signed-off-by: Zhang Xiantao <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Exporting KVM exit information to userspace to be consumed by perf.
Signed-off-by: Dong Hao <[email protected]>
[ Dong Hao <[email protected]>: rebase it on acme's git tree ]
Signed-off-by: Xiao Guangrong <[email protected]>
Acked-by: Marcelo Tosatti <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: [email protected]
Cc: Runzhen Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch handles PCID/INVPCID for guests.
Process-context identifiers (PCIDs) are a facility by which a logical processor
may cache information for multiple linear-address spaces so that the processor
may retain cached information when software switches to a different linear
address space. Refer to section 4.10.1 of the Intel Software Developer's
Manual, Volume 3A, for details.
For guests with EPT, the PCID feature is enabled and INVPCID behaves as it
does when running natively.
For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.
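For reference, INVPCID interception is governed by a secondary execution
control (value per the SDM/KVM headers; treat it as an assumption):

    /* Without this control enabled, INVPCID in the guest raises #UD. */
    #define SECONDARY_EXEC_ENABLE_INVPCID  0x00001000  /* bit 12 */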
Signed-off-by: Junjie Mao <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Signed-off-by: Haitao Shan <[email protected]>
Signed-off-by: Xudong Hao <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Instruction emulation for EOI writes can be skipped, since a sane
guest simply uses MOV instead of string operations. This is a nice
improvement when the guest doesn't support x2APIC or Hyper-V EOI.
With a single VM, an ~8% bandwidth improvement is observed
(7.4 Gbps -> 8 Gbps), by saving ~5% of cycles on EOI emulation.
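A sketch of the fast path in the APIC-access exit handler, assuming the
existing kvm_lapic_set_eoi() helper and the APIC_EOI register offset:

    /* A plain MOV to the EOI register can be completed directly,
     * skipping the instruction emulator entirely. */
    if (offset == APIC_EOI) {
            kvm_lapic_set_eoi(vcpu);
            skip_emulated_instruction(vcpu);
            return 1;
    }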
Signed-off-by: Kevin Tian <[email protected]>
Based on earlier work from:
Signed-off-by: Eddie Dong <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
This patch adds a bunch of tests of the validity of the vmcs12 fields,
according to what the VMX spec and our implementation allows. If fields
we cannot (or don't want to) honor are discovered, an entry failure is
emulated.
According to the spec, there are two types of entry failures: If the problem
was in vmcs12's host state or control fields, the VMLAUNCH instruction simply
fails. But if the problem is found in the guest state, the behavior is more
similar to that of an exit.
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
This patch implements nested_vmx_vmexit(), called when the nested L2 guest
exits and we want to run its L1 parent and let it handle this exit.
Note that this will not necessarily be called on every L2 exit. L0 may decide
to handle a particular exit on its own, without L1's involvement; in that
case, L0 will handle the exit, and resume running L2, without running L1 and
without calling nested_vmx_vmexit(). The logic for deciding whether to handle
a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(),
will appear in a separate patch below.
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
VMX instructions specify success or failure by setting certain RFLAGS bits.
This patch contains common functions to do this, and they will be used in
the following patches which emulate the various VMX instructions.
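A condensed sketch of the VMsucceed convention (flag masks per the x86
EFLAGS layout; the helper's exact shape is an assumption):

    static void nested_vmx_succeed(struct kvm_vcpu *vcpu)
    {
            /* VMsucceed: CF, PF, AF, ZF, SF and OF are all cleared. */
            vmx_set_rflags(vcpu, vmx_get_rflags(vcpu) &
                    ~(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF |
                      X86_EFLAGS_ZF | X86_EFLAGS_SF | X86_EFLAGS_OF));
    }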
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling. There are many approaches to achieve
this but the most direct is to simply avoid trapping the HLT instruction which
lets the guest directly execute the instruction putting the processor to sleep.
Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all.
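A minimal sketch of such a knob (parameter name and wiring are assumptions;
CPU_BASED_HLT_EXITING is the existing VMX control):

    static bool __read_mostly yield_on_hlt = true;
    module_param(yield_on_hlt, bool, S_IRUGO);

    /* In setup_vmcs_config(): only require HLT exiting when yielding on
     * HLT is desired; otherwise the guest executes HLT natively. */
    if (yield_on_hlt)
            min |= CPU_BASED_HLT_EXITING;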
Signed-off-by: Anthony Liguori <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Currently the exit is unhandled, so the guest halts with an error if it
tries to execute the INVD instruction. Instead, call into the emulator
when a guest executes INVD. This instruction is not needed by ordinary
guests, but firmware (like OpenBIOS) uses it and fails.
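A sketch of the new exit handler, assuming the emulator entry point of that
era:

    static int handle_invd(struct kvm_vcpu *vcpu)
    {
            /* INVD has no effect worth modelling; let the emulator
             * decode and skip it instead of failing the exit. */
            return emulate_instruction(vcpu, 0) == EMULATE_DONE;
    }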
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
This patch enables the guest to use the XSAVE/XRSTOR instructions.
We assume that host_xcr0 uses all the possible bits that the OS supports.
And we load xcr0 the same way we handle the FPU - as late as we can.
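A sketch of the XSETBV exit handling, assuming the common kvm_set_xcr()
helper that validates the requested value against host_xcr0:

    static int handle_xsetbv(struct kvm_vcpu *vcpu)
    {
            u64 new_bv = kvm_read_edx_eax(vcpu);
            u32 index = kvm_register_read(vcpu, VCPU_REGS_RCX);

            if (kvm_set_xcr(vcpu, index, new_bv) == 0)
                    skip_emulated_instruction(vcpu);
            return 1;
    }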
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Add all-context INVVPID type support.
Signed-off-by: Gui Jianfeng <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
According to the SDM, we need to check whether the single-context INVVPID
type is supported before issuing the invvpid instruction.
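A sketch of the check, assuming the capability helpers used elsewhere in
vmx.c:

    /* Prefer single-context invalidation when the VPID capability MSR
     * reports it; otherwise fall back to an all-context INVVPID. */
    if (cpu_has_vmx_invvpid_single())
            vpid_sync_vcpu_single(vmx);
    else
            vpid_sync_vcpu_global();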
Signed-off-by: Gui Jianfeng <[email protected]>
Reviewed-by: Sheng Yang <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
For the sake of completeness, this patch adds a symbolic
constant for VMX exit reason 0x21 (invalid guest state).
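For reference, 0x21 is 33 decimal, so the constant would look like the
following (name as used by KVM; treat it as an assumption):

    #define EXIT_REASON_INVALID_STATE  33  /* 0x21: invalid guest state */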
Signed-off-by: Mohammed Gamal <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Signed-off-by: Avi Kivity <[email protected]>
|
|
Signed-off-by: Avi Kivity <[email protected]>
|
|
Following the new SDM, the bit is now named "Ignore PAT memory type".
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Before enabling it, execution of "rdtscp" in the guest results in a #UD.
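For reference, RDTSCP is gated by a secondary execution control (value per
the SDM/KVM headers; treat it as an assumption):

    #define SECONDARY_EXEC_RDTSCP  0x00000008  /* bit 3 */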
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
We don't support these instructions, but the guest can execute them even
if the 'monitor' feature hasn't been exposed in CPUID. So trap them and
inject a #UD if the guest tries to execute them.
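A sketch of the trap-and-inject path (the handler name is illustrative;
kvm_queue_exception() and UD_VECTOR are existing KVM/x86 symbols):

    static int handle_monitor_mwait(struct kvm_vcpu *vcpu)
    {
            /* The 'monitor' feature is not exposed in CPUID, so the
             * architectural response is a #UD. */
            kvm_queue_exception(vcpu, UD_VECTOR);
            return 1;
    }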
Cc: [email protected]
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap - upper bound on the amount of time between two successive
executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
a PAUSE loop
If the time between this execution of PAUSE and the previous one exceeds
PLE_Gap, the processor considers this PAUSE to belong to a new loop.
Otherwise, the processor determines the total execution time of this loop
(since the first PAUSE in the loop) and triggers a VM exit if the total
time exceeds PLE_Window.
* Refer to SDM Volume 3B, sections 21.6.13 & 22.1.3.
Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one
VP is scheduled out after holding a spinlock and other VPs waiting for the
same lock are scheduled in, wasting CPU time.
Our tests indicate that most spinlocks are held for less than 212 cycles.
Performance tests show that with 2x LP over-commitment we can get a +2%
performance improvement for a kernel build (even more gain with more LPs).
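For reference, the control bit and the two new VMCS fields (encodings per
the SDM/KVM headers; treat the values as assumptions):

    #define SECONDARY_EXEC_PAUSE_LOOP_EXITING  0x00000400  /* bit 10 */
    #define PLE_GAP     0x00004020  /* 32-bit VM-execution control field */
    #define PLE_WINDOW  0x00004022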
Signed-off-by: Zhai Edwin <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Required for EPT misconfiguration handler.
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
"Unrestricted Guest" feature is added in the VMX specification.
Intel Westmere and onwards processors will support this feature.
It allows kvm guests to run real mode and unpaged mode
code natively in the VMX mode when EPT is turned on. With the
unrestricted guest there is no need to emulate the guest real mode code
in the vm86 container or in the emulator. Also the guest big real mode
code works like native.
The attached patch enhances KVM to use the unrestricted guest feature
if available on the processor. It also adds a new kernel/module
parameter to disable the unrestricted guest feature at the boot time.
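A sketch of the control bit and the module parameter (names follow KVM's
vmx.c conventions; treat the exact declaration as an assumption):

    #define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x00000080  /* bit 7 */

    static bool __read_mostly enable_unrestricted_guest = true;
    module_param_named(unrestricted_guest,
                       enable_unrestricted_guest, bool, S_IRUGO);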
Signed-off-by: Nitin A Kamble <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
VT-x needs an explicit MC vector intercept to handle machine checks in
the hypervisor.
It also has a special option to catch machine checks that happen
during VT entry.
Do these interceptions and forward them to the Linux machine check
handler. Make it always look like user space is interrupted because
the machine check handler treats kernel/user space differently.
Thanks to Jiang Yunhong for help and testing.
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Huang Ying <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
So far KVM only had basic x86 debug register support, once introduced to
realize guest debugging that way. The guest itself was not able to use
those registers.
This patch now adds (almost) full support for guest self-debugging via
hardware registers. It refactors the code, moving generic parts out of
SVM (VMX was already cleaned up by the KVM_SET_GUEST_DEBUG patches), and
it ensures that the registers are properly switched between host and
guest.
This patch also prepares debug register usage by the host. The latter
will (once wired-up by the following patch) allow for hardware
breakpoints/watchpoints in guest code. If this is enabled, the guest
will only see faked debug registers without functionality, but with
content reflecting the guest's modifications.
Tested on Intel only, but SVM /should/ work as well, but who knows...
Known limitations: Trapping on tss switch won't work - most probably on
Intel.
Credits also go to Joerg Roedel - I used his once posted debugging
series as platform for this patch.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
VMX differentiates between processor and software generated exceptions
when injecting them into the guest. Extend vmx_queue_exception
accordingly (and refactor related constants) so that we can use this
service reliably for the new guest debugging framework.
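For reference, the interruption-type encodings the refactoring
distinguishes (values per the SDM's VM-entry interruption-information
format; treat them as assumptions):

    #define INTR_TYPE_HARD_EXCEPTION  (3 << 8)  /* processor-generated */
    #define INTR_TYPE_SOFT_EXCEPTION  (6 << 8)  /* INT3/INTO: software-generated */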
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Those definitions will be used by code outside KVM, so move them out of
a KVM-specific source file.
Those definitions are used only in kvm/vmx.c, which already includes
asm/vmx.h, so they can be moved safely.
Signed-off-by: Eduardo Habkost <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
vmx.h will be used by core code that is independent of KVM, so I am
moving it outside the arch/x86/kvm directory.
Signed-off-by: Eduardo Habkost <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|