Age | Commit message (Collapse) | Author | Files | Lines |
|
Introduces kvm_x86_call(), to streamline the usage of static calls of
kvm_x86_ops. The current implementation of these calls is verbose and
could lead to alignment challenges. This makes the code susceptible to
exceeding the "80 columns per single line of code" limit as defined in
the coding-style document. Another issue with the existing implementation
is that the addition of kvm_x86_ prefix to hooks at the static_call sites
hinders code readability and navigation. kvm_x86_call() is added to
improve code readability and maintainability, while adhering to the coding
style guidelines.
Signed-off-by: Wei Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Introduce the VM variable "nanoseconds per APIC bus cycle" in
preparation to make the APIC bus frequency configurable.
The TDX architecture hard-codes the core crystal clock frequency to
25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
does not allow the VMM to override the value.
In addition, per Intel SDM:
"The APIC timer frequency will be the processor’s bus clock or core
crystal clock frequency (when TSC/core crystal clock ratio is
enumerated in CPUID leaf 0x15) divided by the value specified in
the divide configuration register."
The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
APIC bus frequency of 1GHz.
Introduce the VM variable "nanoseconds per APIC bus cycle" to prepare
for allowing userspace to tell KVM to use the frequency that TDX mandates
instead of the default 1Ghz. Doing so ensures that the guest doesn't have
a conflicting view of the APIC bus frequency.
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Rick Edgecombe <[email protected]>
[reinette: rework changelog]
Signed-off-by: Reinette Chatre <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Link: https://lore.kernel.org/r/ae75ce37c6c38bb4efd10a0a41932984c40b24ac.1714081726.git.reinette.chatre@intel.com
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Remove APIC_BUS_FREQUENCY and calculate it based on nanoseconds per APIC
bus cycle. APIC_BUS_FREQUENCY is used only for HV_X64_MSR_APIC_FREQUENCY.
The MSR is not frequently read, calculate it every time.
There are two constants related to the APIC bus frequency:
APIC_BUS_FREQUENCY and APIC_BUS_CYCLE_NS.
Only one value is required because one can be calculated from the other:
APIC_BUS_CYCLES_NS = 1000 * 1000 * 1000 / APIC_BUS_FREQUENCY.
Remove APIC_BUS_FREQUENCY and instead calculate it when needed.
This prepares for support of configurable APIC bus frequency by
requiring to change only a single variable.
Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Reviewed-by: Rick Edgecombe <[email protected]>
[reinette: rework changelog]
Signed-off-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/76a659d0898e87ebd73ee7c922f984a87a6ab370.1714081726.git.reinette.chatre@intel.com
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Since commit b0563468eeac ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
kernel unconditionally clears the XSAVES CPU feature bit on Zen1/2 CPUs.
Because KVM CPU caps are initialized from the kernel boot CPU features this
makes the XSAVES feature also unavailable for KVM guests in this case.
At the same time the XSAVEC feature is left enabled.
Unfortunately, having XSAVEC but no XSAVES in CPUID breaks Hyper-V enabled
Windows Server 2016 VMs that have more than one vCPU.
Let's at least give users hint in the kernel log what could be wrong since
these VMs currently simply hang at boot with a black screen - giving no
clue what suddenly broke them and how to make them work again.
Trigger the kernel message hint based on the particular guest ID written to
the Guest OS Identity Hyper-V MSR implemented by KVM.
Defer this check to when the L1 Hyper-V hypervisor enables SVM in EFER
since we want to limit this message to Hyper-V enabled Windows guests only
(Windows session running nested as L2) but the actual Guest OS Identity MSR
write is done by L1 and happens before it enables SVM.
Fixes: b0563468eeac ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
Signed-off-by: Maciej S. Szmigiero <[email protected]>
Message-Id: <b83ab45c5e239e5d148b0ae7750133a67ac9575c.1706127425.git.maciej.szmigiero@oracle.com>
[Move some checks before mutex_lock(), rename function. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Ever since the eventfd type was introduced back in 2007 in commit
e1ad7468c77d ("signal/timer/event: eventfd core") the eventfd_signal()
function only ever passed 1 as a value for @n. There's no point in
keeping that additional argument.
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Xu Yilun <[email protected]>
Acked-by: Andrew Donnellan <[email protected]> # ocxl
Acked-by: Eric Farman <[email protected]> # s390
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Don't apply the stimer's counter side effects when modifying its
value from user-space, as this may trigger spurious interrupts.
For example:
- The stimer is configured in auto-enable mode.
- The stimer's count is set and the timer enabled.
- The stimer expires, an interrupt is injected.
- The VM is live migrated.
- The stimer config and count are deserialized, auto-enable is ON, the
stimer is re-enabled.
- The stimer expires right away, and injects an unwarranted interrupt.
Cc: [email protected]
Fixes: 1f4b34f825e8 ("kvm/x86: Hyper-V SynIC timers")
Signed-off-by: Nicolas Saenz Julienne <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Fix compiler warnings when compiling KVM with [-Wunreachable-code-break].
No functional change intended.
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
KVM x86 PMU changes for 6.3:
- Add support for created masked events for the PMU filter to allow
userspace to heavily restrict what events the guest can use without
needing to create an absurd number of events
- Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
support is disabled
- Add PEBS support for Intel SPR
|
|
Add support for extended hypercall in Hyper-v. Hyper-v TLFS 6.0b
describes hypercalls above call code 0x8000 as extended hypercalls.
A Hyper-v hypervisor's guest VM finds availability of extended
hypercalls via CPUID.0x40000003.EBX BIT(20). If the bit is set then the
guest can call extended hypercalls.
All extended hypercalls will exit to userspace by default. This allows
for easy support of future hypercalls without being dependent on KVM
releases.
If there will be need to process the hypercall in KVM instead of
userspace then KVM can create a capability which userspace can query to
know which hypercalls can be handled by the KVM and enable handling
of those hypercalls.
Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Remove duplicate code to exit to userspace for hyper-v hypercalls and
use a common place to exit.
No functional change intended.
Signed-off-by: Vipin Sharma <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Add helpers to print unimplemented MSR accesses and condition all such
prints on report_ignored_msrs, i.e. honor userspace's request to not
print unimplemented MSRs. Even though vcpu_unimpl() is ratelimited,
printing can still be problematic, e.g. if a print gets stalled when host
userspace is writing MSRs during live migration, an effective stall can
result in very noticeable disruption in the guest.
E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU
counters while the PMU was disabled in KVM.
- 99.75% 0.00% [.] __ioctl
- __ioctl
- 99.74% entry_SYSCALL_64_after_hwframe
do_syscall_64
sys_ioctl
- do_vfs_ioctl
- 92.48% kvm_vcpu_ioctl
- kvm_arch_vcpu_ioctl
- 85.12% kvm_set_msr_ignored_check
svm_set_msr
kvm_set_msr_common
printk
vprintk_func
vprintk_default
vprintk_emit
console_unlock
call_console_drivers
univ8250_console_write
serial8250_console_write
uart_console_write
Reported-by: Aaron Lewis <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code. In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems.
Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.
Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.
Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Normally, genuine Hyper-V doesn't expose architectural invariant TSC
(CPUID.80000007H:EDX[8]) to its guests by default. A special PV MSR
(HV_X64_MSR_TSC_INVARIANT_CONTROL, 0x40000118) and corresponding CPUID
feature bit (CPUID.0x40000003.EAX[15]) were introduced. When bit 0 of the
PV MSR is set, invariant TSC bit starts to show up in CPUID. When the
feature is exposed to Hyper-V guests, reenlightenment becomes unneeded.
Add the feature to KVM. Keep CPUID output intact when the feature
wasn't exposed to L1 and implement the required logic for hiding
invariant TSC when the feature was exposed and invariant TSC control
MSR wasn't written to. Copy genuine Hyper-V behavior and forbid to
disable the feature once it was enabled.
For the reference, for linux guests, support for the feature was added
in commit dce7cd62754b ("x86/hyperv: Allow guests to enable InvariantTSC").
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In kvm_hv_flush_tlb(), 'data_offset' and 'consumed_xmm_halves' variables
are used in a mutually exclusive way: in 'hc->fast' we count in 'XMM
halves' and increase 'data_offset' otherwise. Coverity discovered, that in
one case both variables are incremented unconditionally. This doesn't seem
to cause any issues as the only user of 'data_offset'/'consumed_xmm_halves'
data is kvm_hv_get_tlb_flush_entries() -> kvm_hv_get_hc_data() which also
takes into account 'hc->fast' but is still worth fixing.
To make things explicit, put 'data_offset' and 'consumed_xmm_halves' to
'struct kvm_hv_hcall' as a union and use at call sites. This allows to
remove explicit 'data_offset'/'consumed_xmm_halves' parameters from
kvm_hv_get_hc_data()/kvm_get_sparse_vp_set()/kvm_hv_get_tlb_flush_entries()
helpers.
Note: 'struct kvm_hv_hcall' is allocated on stack in kvm_hv_hypercall() and
is not zeroed, consumers are supposed to initialize the appropriate field
if needed.
Reported-by: coverity-bot <[email protected]>
Addresses-Coverity-ID: 1527764 ("Uninitialized variables")
Fixes: 260970862c88 ("KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
With both nSVM and nVMX implementations in place, KVM can now expose
Hyper-V L2 TLB flush feature to userspace.
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Convert kvm_hv_get_assist_page() to return 'int' and propagate possible
errors from kvm_read_guest_cached().
Suggested-by: Sean Christopherson <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In preparation to enabling L2 TLB flush, cache VP assist page in
'struct kvm_vcpu_hv'. While on it, rename nested_enlightened_vmentry()
to nested_get_evmptr() and make it return eVMCS GPA directly.
No functional change intended.
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Handle L2 TLB flush requests by going through all vCPUs and checking
whether there are vCPUs running the same VM_ID with a VP_ID specified
in the requests. Perform synthetic exit to L2 upon finish.
Note, while checking VM_ID/VP_ID of running vCPUs seem to be a bit
racy, we count on the fact that KVM flushes the whole L2 VPID upon
transition. Also, KVM_REQ_HV_TLB_FLUSH request needs to be done upon
transition between L1 and L2 to make sure all pending requests are
always processed.
For the reference, Hyper-V TLFS refers to the feature as "Direct
Virtual Flush".
Note, nVMX/nSVM code does not handle VMCALL/VMMCALL from L2 yet.
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
of on-stack 'sparse_banks'
To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1
may use vCPU overcommit for L2. To avoid growing on-stack allocation, make
'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
dynamically.
Note: sparse_set_to_vcpu_mask() can't currently be used to handle L2
requests as KVM does not keep L2 VM_ID -> L2 VCPU_ID -> L1 vCPU mappings,
i.e. its vp_bitmap array is still bounded by the number of L1 vCPUs and so
can remain an on-stack allocation.
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
To handle L2 TLB flush requests, KVM needs to use a separate fifo from
regular (L1) Hyper-V TLB flush requests: e.g. when a request to flush
something in L2 is made, the target vCPU can transition from L2 to L1,
receive a request to flush a GVA for L1 and then try to enter L2 back.
The first request needs to be processed at this point. Similarly,
requests to flush GVAs in L1 must wait until L2 exits to L1.
No functional change as KVM doesn't handle L2 TLB flush requests from
L2 yet.
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Get rid of on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi()
for a smaller number of vCPUs in the request. When Hyper-V TLB flush
is in use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to
send IPIs to a large number of vCPUs (and are rarely used in general).
Introduce hv_is_vp_in_sparse_set() to directly check if the specified
VP_ID is present in sparse vCPU set.
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
instead of raw '64'
It may not be clear from where the '64' limit for the maximum sparse
bank number comes from, use HV_MAX_SPARSE_VCPU_BANKS define instead.
Use HV_VCPUS_PER_SPARSE_BANK in KVM_HV_MAX_SPARSE_VCPU_SET_BITS's
definition. Opportunistically adjust the comment around BUILD_BUG_ON().
No functional change.
Suggested-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
To handle L2 TLB flush requests, KVM needs to translate the specified
L2 GPA to L1 GPA to read hypercall arguments from there.
No functional change as KVM doesn't handle VMCALL/VMMCALL from L2 yet.
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Extended GVA ranges support bit seems to indicate whether lower 12
bits of GVA can be used to specify up to 4095 additional consequent
GVAs to flush. This is somewhat described in TLFS.
Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX}
requests by flushing the whole VPID so technically, extended GVA
ranges were already supported. As such requests are handled more
gently now, advertizing support for extended ranges starts making
sense to reduce the size of TLB flush requests.
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Currently, HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls are handled
the exact same way as HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE{,EX}: by
flushing the whole VPID and this is sub-optimal. Switch to handling
these requests with 'flush_tlb_gva()' hooks instead. Use the newly
introduced TLB flush fifo to queue the requests.
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for
reading a guest-provided array can be reused in the future, e.g. for
getting a list of virtual addresses whose TLB entries need to be flushed.
Opportunisticaly swap the order of the data and XMM adjustment so that
the XMM/gpa offsets are bundled together.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
To allow flushing individual GVAs instead of always flushing the whole
VPID a per-vCPU structure to pass the requests is needed. Use standard
'kfifo' to queue two types of entries: individual GVA (GFN + up to 4095
following GFNs in the lower 12 bits) and 'flush all'.
The size of the fifo is arbitrarily set to '16'.
Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now and
kvm_hv_vcpu_flush_tlb() doesn't actually read the fifo just resets the
queue before returning -EOPNOTSUPP (which triggers full TLB flush) so
the functional change is very small but the infrastructure is prepared
to handle individual GVA flush requests.
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In preparation to implementing fine-grained Hyper-V TLB flush and
L2 TLB flush, resurrect dedicated KVM_REQ_HV_TLB_FLUSH request bit. As
KVM_REQ_TLB_FLUSH_GUEST is a stronger operation, clear KVM_REQ_HV_TLB_FLUSH
request in kvm_vcpu_flush_tlb_guest().
The flush itself is temporary handled by kvm_vcpu_flush_tlb_guest().
No functional change intended.
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Enlightened VMCS v1 got updated and now includes the required fields
for loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT features. For KVM on
Hyper-V enablement, KVM can just observe VMX control MSRs and use the
features (with or without eVMCS) when possible.
Hyper-V on KVM is messier as Windows 11 guests fail to boot if the
controls are advertised and a new PV feature flag, CPUID.0x4000000A.EBX
BIT(0), is not set. Honor the Hyper-V CPUID feature flag to play nice
with Windows guests.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
KVM has to check guest visible HYPERV_CPUID_NESTED_FEATURES.EBX CPUID
leaf to know which Enlightened VMCS definition to use (original or 2022
update). Cache the leaf along with other Hyper-V CPUID feature leaves
to make the check quick.
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Return -ENOMEM back to userspace if allocating the Hyper-V vCPU struct
fails when enabling Hyper-V in guest CPUID. Silently ignoring failure
means that KVM will not have an up-to-date CPUID cache if allocating the
struct succeeds later on, e.g. when activating SynIC.
Rejecting the CPUID operation also guarantess that vcpu->arch.hyperv is
non-NULL if hyperv_enabled is true, which will allow for additional
cleanup, e.g. in the eVMCS code.
Note, the initialization needs to be done before CPUID is set, and more
subtly before kvm_check_cpuid(), which potentially enables dynamic
XFEATURES. Sadly, there's no easy way to avoid exposing Hyper-V details
to CPUID or vice versa. Expose kvm_hv_vcpu_init() and the Hyper-V CPUID
signature to CPUID instead of exposing cpuid_entry2_find() outside of
CPUID code. It's hard to envision kvm_hv_vcpu_init() being misused,
whereas cpuid_entry2_find() absolutely shouldn't be used outside of core
CPUID code.
Fixes: 10d7bf1e46dc ("KVM: x86: hyper-v: Cache guest CPUID leaves determining features availability")
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When potentially allocating/initializing the Hyper-V vCPU struct, check
for an existing instance in kvm_hv_vcpu_init() instead of requiring
callers to perform the check. Relying on callers to do the check is
risky as it's all too easy for KVM to overwrite vcpu->arch.hyperv and
leak memory, and it adds additional burden on callers without much
benefit.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Wipe the whole 'hv_vcpu->cpuid_cache' with memset() instead of having to
zero each particular member when the corresponding CPUID entry was not
found.
No functional change intended.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
[sean: split to separate patch]
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add a second CPUID helper, kvm_find_cpuid_entry_index(), to handle KVM
queries for CPUID leaves whose index _may_ be significant, and drop the
index param from the existing kvm_find_cpuid_entry(). Add a WARN in the
inner helper, cpuid_entry2_find(), to detect attempts to retrieve a CPUID
entry whose index is significant without explicitly providing an index.
Using an explicit magic number and letting callers omit the index avoids
confusion by eliminating the myriad cases where KVM specifies '0' as a
dummy value.
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Pull bitmap updates from Yury Norov:
- bitmap: optimize bitmap_weight() usage, from me
- lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
Carvalho Chehab
- include/linux/find: Fix documentation, from Anna-Maria Behnsen
- bitmap: fix conversion from/to fix-sized arrays, from me
- bitmap: Fix return values to be unsigned, from Kees Cook
It has been in linux-next for at least a week with no problems.
* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
nodemask: Fix return values to be unsigned
bitmap: Fix return values to be unsigned
KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
KVM: x86: hyper-v: fix type of valid_bank_mask
ia64: cleanup remove_siblinginfo()
drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
lib/bitmap: add test for bitmap_{from,to}_arr64
lib: add bitmap_{from,to}_arr64
lib/bitmap: extend comment for bitmap_(from,to)_arr32()
include/linux/find: Fix documentation
lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
MAINTAINERS: add cpumask and nodemask files to BITMAP_API
arch/x86: replace nodes_weight with nodes_empty where appropriate
mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
irq: mips: replace cpumask_weight with cpumask_empty where appropriate
drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
arch/x86: replace cpumask_weight with cpumask_empty where appropriate
...
|
|
kvm_hv_flush_tlb() applies bitmap API to a u64 variable valid_bank_mask.
Since valid_bank_mask has a fixed size, we can use hweight64() and avoid
excessive bloating.
CC: Borislav Petkov <[email protected]>
CC: Dave Hansen <[email protected]>
CC: H. Peter Anvin <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Jim Mattson <[email protected]>
CC: Joerg Roedel <[email protected]>
CC: Paolo Bonzini <[email protected]>
CC: Sean Christopherson <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Vitaly Kuznetsov <[email protected]>
CC: Wanpeng Li <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
Acked-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Yury Norov <[email protected]>
|
|
In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long,
but is used as u64, which is wrong for i386, and has been spotted by
LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with
hweight64()"
https://lore.kernel.org/lkml/[email protected]/
But it's wrong even without that patch because now bitmap_weight()
dereferences a word after valid_bank_mask on i386.
>> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type
+[-Wshift-count-overflow]
21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
| ^~
include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8'
10 | ((!!((w) & (1ULL << 0))) + \
| ^
include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16'
20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16))
| ^~~~~~~~~~~~~~~~~
include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32'
21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
| ^~~~~~~~~~~~~~~~~
include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64'
29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w))
| ^~~~~~~~~~~~~~~~~
arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64'
1983 | if (hc->var_cnt != hweight64(valid_bank_mask))
| ^~~~~~~~~
CC: Borislav Petkov <[email protected]>
CC: Dave Hansen <[email protected]>
CC: H. Peter Anvin <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Jim Mattson <[email protected]>
CC: Joerg Roedel <[email protected]>
CC: Paolo Bonzini <[email protected]>
CC: Sean Christopherson <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Vitaly Kuznetsov <[email protected]>
CC: Wanpeng Li <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Yury Norov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long,
but is used as u64, which is wrong for i386, and has been spotted by
LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with
hweight64()"
https://lore.kernel.org/lkml/[email protected]/
But it's wrong even without that patch because now bitmap_weight()
dereferences a word after valid_bank_mask on i386.
>> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type
+[-Wshift-count-overflow]
21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
| ^~
include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8'
10 | ((!!((w) & (1ULL << 0))) + \
| ^
include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16'
20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16))
| ^~~~~~~~~~~~~~~~~
include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32'
21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
| ^~~~~~~~~~~~~~~~~
include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64'
29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w))
| ^~~~~~~~~~~~~~~~~
arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64'
1983 | if (hc->var_cnt != hweight64(valid_bank_mask))
| ^~~~~~~~~
CC: Borislav Petkov <[email protected]>
CC: Dave Hansen <[email protected]>
CC: H. Peter Anvin <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Jim Mattson <[email protected]>
CC: Joerg Roedel <[email protected]>
CC: Paolo Bonzini <[email protected]>
CC: Sean Christopherson <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Vitaly Kuznetsov <[email protected]>
CC: Wanpeng Li <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Yury Norov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In some places kvm/hyperv.c code calls bitmap_weight() to check if any bit
of a given bitmap is set. It's better to use bitmap_empty() in that case
because bitmap_empty() stops traversing the bitmap as soon as it finds
first set bit, while bitmap_weight() counts all bits unconditionally.
Signed-off-by: Yury Norov <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
|
|
The following WARN is triggered from kvm_vm_ioctl_set_clock():
WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm]
...
CPU: 10 PID: 579353 Comm: qemu-system-x86 Tainted: G W O 5.16.0.stable #20
Hardware name: LENOVO 20UF001CUS/20UF001CUS, BIOS R1CET65W(1.34 ) 06/17/2021
RIP: 0010:mark_page_dirty_in_slot+0x6c/0x80 [kvm]
...
Call Trace:
<TASK>
? kvm_write_guest+0x114/0x120 [kvm]
kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm]
kvm_arch_vm_ioctl+0xa26/0xc50 [kvm]
? schedule+0x4e/0xc0
? __cond_resched+0x1a/0x50
? futex_wait+0x166/0x250
? __send_signal+0x1f1/0x3d0
kvm_vm_ioctl+0x747/0xda0 [kvm]
...
The WARN was introduced by commit 03c0304a86bc ("KVM: Warn if
mark_page_dirty() is called without an active vCPU") but the change seems
to be correct (unlike Hyper-V TSC page update mechanism). In fact, there's
no real need to actually write to guest memory to invalidate TSC page, this
can be done by the first vCPU which goes through kvm_guest_time_update().
Reported-by: Maxim Levitsky <[email protected]>
Reported-by: Naresh Kamboju <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
|
|
Add set/clear wrappers for toggling APICv inhibits to make the call sites
more readable, and opportunistically rename the inner helpers to align
with the new wrappers and to make them more readable as well. Invert the
flag from "activate" to "set"; activate is painfully ambiguous as it's
not obvious if the inhibit is being activated, or if APICv is being
activated, in which case the inhibit is being deactivated.
For the functions that take @set, swap the order of the inhibit reason
and @set so that the call sites are visually similar to those that bounce
through the wrapper.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Setting non-zero values to SYNIC/STIMER MSRs activates certain features,
this should not happen when KVM_CAP_HYPERV_SYNIC{,2} was not activated.
Note, it would've been better to forbid writing anything to SYNIC/STIMER
MSRs, including zeroes, however, at least QEMU tries clearing
HV_X64_MSR_STIMER0_CONFIG without SynIC. HV_X64_MSR_EOM MSR is somewhat
'special' as writing zero there triggers an action, this also should not
happen when SynIC wasn't activated.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When KVM_CAP_HYPERV_SYNIC{,2} is activated, KVM already checks for
irqchip_in_kernel() so normally SynIC irqs should never be set. It is,
however, possible for a misbehaving VMM to write to SYNIC/STIMER MSRs
causing erroneous behavior.
The immediate issue being fixed is that kvm_irq_delivery_to_apic()
(kvm_irq_delivery_to_apic_fast()) crashes when called with
'irq.shorthand = APIC_DEST_SELF' and 'src == NULL'.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The fixes for 5.17 conflict with cleanups made in the same area
earlier in the 5.18 development cycle.
|
|
It has been proven on practice that at least Windows Server 2019 tries
using HVCALL_SEND_IPI_EX in 'XMM fast' mode when it has more than 64 vCPUs
and it needs to send an IPI to a vCPU > 63. Similarly to other XMM Fast
hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}{,_EX}), this
information is missing in TLFS as of 6.0b. Currently, KVM returns an error
(HV_STATUS_INVALID_HYPERCALL_INPUT) and Windows crashes.
Note, HVCALL_SEND_IPI is a 'standard' fast hypercall (not 'XMM fast') as
all its parameters fit into RDX:R8 and this is handled by KVM correctly.
Cc: [email protected] # 5.14.x: 3244867af8c0: KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req
Cc: [email protected] # 5.14.x
Fixes: d8f5537a8816 ("KVM: hyper-v: Advertise support for fast XMM hypercalls")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
flush hypercalls
When TLB flush hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX are
issued in 'XMM fast' mode, the maximum number of allowed sparse_banks is
not 'HV_HYPERCALL_MAX_XMM_REGISTERS - 1' (5) but twice as many (10) as each
XMM register is 128 bit long and can hold two 64 bit long banks.
Cc: [email protected] # 5.14.x
Fixes: 5974565bc26d ("KVM: x86: kvm_hv_flush_tlb use inputs from XMM registers")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
'struct kvm_hv_hcall' has all the required information already,
there's no need to pass 'ex' additionally.
No functional change intended.
Cc: [email protected] # 5.14.x
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
'struct kvm_hv_hcall' has all the required information already,
there's no need to pass 'ex' additionally.
No functional change intended.
Cc: [email protected] # 5.14.x
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Similar to nVMX commit 502d2bf5f2fd ("KVM: nVMX: Implement Enlightened MSR
Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM).
Notable differences from nVMX implementation:
- As the feature uses SW reserved fields in VMCB control, KVM needs to
make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()).
- 'msrpm_base_pa' needs to be always be overwritten in
nested_svm_vmrun_msrpm(), even when the update is skipped. As an
optimization, nested_vmcb02_prepare_control() copies it from VMCB01
so when MSR-Bitmap feature for L2 is disabled nothing needs to be done.
- 'struct vmcb_ctrl_area_cached' needs to be extended with clean
fields/sw reserved data and __nested_copy_vmcb_control_to_cache() needs to
copy it so nested_svm_vmrun_msrpm() can use it later.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In preparation for using kvm_hv_hypercall_enabled() from SVM code, make
it static inline to avoid the need to export it. The function is a
simple check with only two call sites currently.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|