Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Now use bit 6 of EPTP to optionally enable A/D bits for EPTP. Another
thing to change is that, when EPT accessed and dirty bits are not in use,
VMX treats accesses to guest paging structures as data reads. When they
are in use (bit 6 of EPTP is set), they are treated as writes and the
corresponding EPT dirty bit is set. The MMU didn't know this detail,
so this patch adds it.
We also have to fix up the exit qualification. It may be wrong because
KVM sets bit 6 but the guest might not.
L1 emulates EPT A/D bits using write permissions, so in principle it may
be possible for EPT A/D bits to be used by L1 even though not available
in hardware. The problem is that guest page-table walks will be treated
as reads rather than writes, so they would not cause an EPT violation.
Signed-off-by: Paolo Bonzini <[email protected]>
[Fixed typo in walk_addr_generic() comment and changed bit clear +
conditional-set pattern in handle_ept_violation() to conditional-clear]
Signed-off-by: Radim Krčmář <[email protected]>
|
|
Rename the EPT_VIOLATION_READ/WRITE/INSTR constants to
EPT_VIOLATION_ACC_READ/WRITE/INSTR to more clearly indicate that these
signify the type of the memory access as opposed to the permissions
granted by the PTE.
Signed-off-by: Junaid Shahid <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
This change implements lockless access tracking for Intel CPUs without EPT
A bits. This is achieved by marking the PTEs as not-present (but not
completely clearing them) when clear_flush_young() is called after marking
the pages as accessed. When an EPT Violation is generated as a result of
the VM accessing those pages, the PTEs are restored to their original values.
Signed-off-by: Junaid Shahid <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
MMIO SPTEs currently set both bits 62 and 63 to distinguish them as special
PTEs. However, bit 63 is used as the SVE bit in Intel EPT PTEs. The SVE bit
is ignored for misconfigured PTEs but not necessarily for not-Present PTEs.
Since MMIO SPTEs use an EPT misconfiguration, so using bit 63 for them is
acceptable. However, the upcoming fast access tracking feature adds another
type of special tracking PTE, which uses not-Present PTEs and hence should
not set bit 63.
In order to use common bits to distinguish both type of special PTEs, we
now use only bit 62 as the special bit.
Signed-off-by: Junaid Shahid <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
This change adds some symbolic constants for VM Exit Qualifications
related to EPT Violations and updates handle_ept_violation() to use
these constants instead of hard-coded numbers.
Signed-off-by: Junaid Shahid <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The VMX capability MSRs advertise the set of features the KVM virtual
CPU can support. This set of features varies across different host CPUs
and KVM versions. This patch aims to addresses both sources of
differences, allowing VMs to be migrated across CPUs and KVM versions
without guest-visible changes to these MSRs. Note that cross-KVM-
version migration is only supported from this point forward.
When the VMX capability MSRs are restored, they are audited to check
that the set of features advertised are a subset of what KVM and the
CPU support.
Since the VMX capability MSRs are read-only, they do not need to be on
the default MSR save/restore lists. The userspace hypervisor can set
the values of these MSRs or read them from KVM at VCPU creation time,
and restore the same value after every save/restore.
Signed-off-by: David Matlack <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
- Remove VMX_EPT_EXTENT_INDIVIDUAL_ADDR, since there is no such type of
EPT invalidation
- Add missing VPID types names
Signed-off-by: Jan Dakinevich <[email protected]>
Tested-by: Ladi Prosek <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
|
|
These are never used by the host, but they can still be reflected to
the guest.
Tested-by: Ladi Prosek <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
This reverts commit 8b3e34e46aca9b6d349b331cd9cf71ccbdc91b2e.
Given the deprecation of the pcommit instruction, the relevant VMX
features and CPUID bits are not going to be rolled into the SDM. Remove
their usage from KVM.
Cc: Xiao Guangrong <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Ross Zwisler <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
This patch exhances kvm-intel module to enable VMX TSC scaling and
collects information of TSC scaling ratio during initialization.
Signed-off-by: Haozhong Zhang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add the INVVPID instruction emulation.
Reviewed-by: Wincy Van <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Pass PCOMMIT CPU feature to guest to enable PCOMMIT instruction
Currently we do not catch pcommit instruction for L1 guest and
allow L1 to catch this instruction for L2 if, as required by the spec,
L1 can enumerate the PCOMMIT instruction via CPUID:
| IA32_VMX_PROCBASED_CTLS2[53] (which enumerates support for the
| 1-setting of PCOMMIT exiting) is always the same as
| CPUID.07H:EBX.PCOMMIT[bit 22]. Thus, software can set PCOMMIT exiting
| to 1 if and only if the PCOMMIT instruction is enumerated via CPUID
The spec can be found at
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
VMX encodes access rights differently from LAR, and the latter is
most likely what x86 people think of when they think of "access
rights".
Rename them to avoid confusion.
Cc: [email protected]
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Allow a nested hypervisor to single step its guests.
Signed-off-by: Mihai Donțu <[email protected]>
[Fix overlong line. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
to allow user to enable/disable it manually.
Signed-off-by: Kai Huang <[email protected]>
Reviewed-by: Xiao Guangrong <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Initialize the XSS exit bitmap. It is zero so there should be no XSAVES
or XRSTORS exits.
Signed-off-by: Wanpeng Li <[email protected]>
Reviewed-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Expose the XSAVES feature to the guest if the kvm_x86_ops say it is
available.
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
SDM says bits 1, 4-6, 8, 13-16, and 26 have to be set.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The spec says those controls are at bit position 2 - makes 4 as value.
The impact of this mistake is effectively zero as we only use them to
ensure that these features are set at position 2 (or, previously, 1) in
MSR_IA32_VMX_{EXIT,ENTRY}_CTLS - which is and will be always true
according to the spec.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
From caddc009a6d2019034af8f2346b2fd37a81608d0 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <[email protected]>
Date: Mon, 24 Feb 2014 18:11:11 +0800
Subject: [PATCH v5 1/3] KVM: x86: Intel MPX vmx and msr handle
This patch handle vmx and msr of Intel MPX feature.
Signed-off-by: Xudong Hao <[email protected]>
Signed-off-by: Liu Jinsong <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
We can easily emulate the HLT activity state for L1: If it decides that
L2 shall be halted on entry, just invoke the normal emulation of halt
after switching to L2. We do not depend on specific host features to
provide this, so we can expose the capability unconditionally.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
If we let L1 use EPT, we should probably also support the INVEPT instruction.
In our current nested EPT implementation, when L1 changes its EPT table
for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
the course of this modification already calls INVEPT. But if last level
of shadow page is unsync not all L1's changes to EPT12 are intercepted,
which means roots need to be synced when L1 calls INVEPT. Global INVEPT
should not be different since roots are synced by kvm_mmu_load() each
time EPTP02 changes.
Reviewed-by: Xiao Guangrong <[email protected]>
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Jun Nakajima <[email protected]>
Signed-off-by: Xinhao Xu <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add definitions for all the vmcs control fields/bits
required to enable vmcs-shadowing
Signed-off-by: Abel Gordon <[email protected]>
Reviewed-by: Orit Wasserman <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Detect the posted interrupt feature. If it exists, then set it in vmcs_config.
Signed-off-by: Yang Zhang <[email protected]>
Reviewed-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Provided the host has this feature, it's straightforward to offer it to
the guest as well. We just need to load to timer value on L2 entry if
the feature was enabled by L1 and watch out for the corresponding exit
reason.
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
We will need EFER.LMA saving to provide unrestricted guest mode. All
what is missing for this is picking up EFER.LMA from VM_ENTRY_CONTROLS
on L2->L1 switches. If the host does not support EFER.LMA saving,
no change is performed, otherwise we properly emulate for L1 what the
hardware does for L0. Advertise the support, depending on the host
feature.
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Only interrupt and NMI exiting are mandatory for KVM to work, thus can
be exposed to the guest unconditionally, virtual NMI exiting is
optional. So we must not advertise it unless the host supports it.
Introduce the symbolic constant PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR at
this chance.
Reviewed-by:: Paolo Bonzini <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Properly set those bits to 1 that the spec demands in case bit 55 of
VMX_BASIC is 0 - like in our case.
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Pull KVM updates from Marcelo Tosatti:
"KVM updates for the 3.9 merge window, including x86 real mode
emulation fixes, stronger memory slot interface restrictions, mmu_lock
spinlock hold time reduction, improved handling of large page faults
on shadow, initial APICv HW acceleration support, s390 channel IO
based virtio, amongst others"
* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
Revert "KVM: MMU: lazily drop large spte"
x86: pvclock kvm: align allocation size to page size
KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
x86 emulator: fix parity calculation for AAD instruction
KVM: PPC: BookE: Handle alignment interrupts
booke: Added DBCR4 SPR number
KVM: PPC: booke: Allow multiple exception types
KVM: PPC: booke: use vcpu reference from thread_struct
KVM: Remove user_alloc from struct kvm_memory_slot
KVM: VMX: disable apicv by default
KVM: s390: Fix handling of iscs.
KVM: MMU: cleanup __direct_map
KVM: MMU: remove pt_access in mmu_set_spte
KVM: MMU: cleanup mapping-level
KVM: MMU: lazily drop large spte
KVM: VMX: cleanup vmx_set_cr0().
KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
KVM: VMX: disable SMEP feature when guest is in non-paging mode
KVM: Remove duplicate text in api.txt
Revert "KVM: MMU: split kvm_mmu_free_page"
...
|
|
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
manually, which is fully taken care of by the hardware. This needs
some special awareness into existing interrupr injection path:
- for pending interrupt, instead of direct injection, we may need
update architecture specific indicators before resuming to guest.
- A pending interrupt, which is masked by ISR, should be also
considered in above update action, since hardware will decide
when to inject it at right time. Current has_interrupt and
get_interrupt only returns a valid vector from injection p.o.v.
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Kevin Tian <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
basically to benefit from apicv, we need to enable virtualized x2apic mode.
Currently, we only enable it when guest is really using x2apic.
Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic:
0x800 - 0x8ff: no read intercept for apicv register virtualization,
except APIC ID and TMCCT which need software's assistance to
get right value.
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Kevin Tian <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Kevin Tian <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Signed-off-by: David Howells <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Michael Kerrisk <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Dave Jones <[email protected]>
|
|
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is
the user accessible slots and the other is user + private. Make this
more obvious.
Reviewed-by: Gleb Natapov <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Bit24 in VMX_EPT_VPID_CAP_MASI is not used for address-specific invalidation capability
reporting, so remove it from KVM to avoid conflicts in future.
Signed-off-by: Zhang Xiantao <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
|
|
Exporting KVM exit information to userspace to be consumed by perf.
Signed-off-by: Dong Hao <[email protected]>
[ Dong Hao <[email protected]>: rebase it on acme's git tree ]
Signed-off-by: Xiao Guangrong <[email protected]>
Acked-by: Marcelo Tosatti <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: [email protected]
Cc: Runzhen Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch handles PCID/INVPCID for guests.
Process-context identifiers (PCIDs) are a facility by which a logical processor
may cache information for multiple linear-address spaces so that the processor
may retain cached information when software switches to a different linear
address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual
Volume 3A for details.
For guests with EPT, the PCID feature is enabled and INVPCID behaves as running
natively.
For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.
Signed-off-by: Junjie Mao <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Signed-off-by: Haitao Shan <[email protected]>
Signed-off-by: Xudong Hao <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Instruction emulation for EOI writes can be skipped, since sane
guest simply uses MOV instead of string operations. This is a nice
improvement when guest doesn't support x2apic or hyper-V EOI
support.
a single VM bandwidth is observed with ~8% bandwidth improvement
(7.4Gbps->8Gbps), by saving ~5% cycles from EOI emulation.
Signed-off-by: Kevin Tian <[email protected]>
<Based on earlier work from>:
Signed-off-by: Eddie Dong <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
This patch adds a bunch of tests of the validity of the vmcs12 fields,
according to what the VMX spec and our implementation allows. If fields
we cannot (or don't want to) honor are discovered, an entry failure is
emulated.
According to the spec, there are two types of entry failures: If the problem
was in vmcs12's host state or control fields, the VMLAUNCH instruction simply
fails. But a problem is found in the guest state, the behavior is more
similar to that of an exit.
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
This patch implements nested_vmx_vmexit(), called when the nested L2 guest
exits and we want to run its L1 parent and let it handle this exit.
Note that this will not necessarily be called on every L2 exit. L0 may decide
to handle a particular exit on its own, without L1's involvement; In that
case, L0 will handle the exit, and resume running L2, without running L1 and
without calling nested_vmx_vmexit(). The logic for deciding whether to handle
a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(),
will appear in a separate patch below.
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
VMX instructions specify success or failure by setting certain RFLAGS bits.
This patch contains common functions to do this, and they will be used in
the following patches which emulate the various VMX instructions.
Signed-off-by: Nadav Har'El <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling. There are many approaches to achieve
this but the most direct is to simply avoid trapping the HLT instruction which
lets the guest directly execute the instruction putting the processor to sleep.
Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all.
Signed-off-by: Anthony Liguori <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Currently the exit is unhandled, so guest halts with error if it tries
to execute INVD instruction. Call into emulator when INVD instruction
is executed by a guest instead. This instruction is not needed by ordinary
guests, but firmware (like OpenBIOS) use it and fail.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|
|
This patch enable guest to use XSAVE/XRSTOR instructions.
We assume that host_xcr0 would use all possible bits that OS supported.
And we loaded xcr0 in the same way we handled fpu - do it as late as we can.
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
Add all-context INVVPID type support.
Signed-off-by: Gui Jianfeng <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
|
|
instruction
According to SDM, we need check whether single-context INVVPID type is supported
before issuing invvpid instruction.
Signed-off-by: Gui Jianfeng <[email protected]>
Reviewed-by: Sheng Yang <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
|