aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-10-23kvm: x86/mmu: Support changed pte notifier in tdp MMUBen Gardon3-1/+67
In order to interoperate correctly with the rest of KVM and other Linux subsystems, the TDP MMU must correctly handle various MMU notifiers. Add a hook and handle the change_pte MMU notifier. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-23kvm: x86/mmu: Add access tracking for tdp_mmuBen Gardon3-7/+128
In order to interoperate correctly with the rest of KVM and other Linux subsystems, the TDP MMU must correctly handle various MMU notifiers. The main Linux MM uses the access tracking MMU notifiers for swap and other features. Add hooks to handle the test/flush HVA (range) family of MMU notifiers. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-23kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMUBen Gardon3-6/+86
In order to interoperate correctly with the rest of KVM and other Linux subsystems, the TDP MMU must correctly handle various MMU notifiers. Add hooks to handle the invalidate range family of MMU notifiers. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-23kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMUBen Gardon2-3/+14
Attach struct kvm_mmu_pages to every page in the TDP MMU to track metadata, facilitate NX reclaim, and enable inproved parallelism of MMU operations in future patches. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-23kvm: x86/mmu: Add TDP MMU PF handlerBen Gardon5-37/+194
Add functions to handle page faults in the TDP MMU. These page faults are currently handled in much the same way as the x86 shadow paging based MMU, however the ordering of some operations is slightly different. Future patches will add eager NX splitting, a fast page fault handler, and parallel page faults. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator argBen Gardon2-7/+9
In order to avoid creating executable hugepages in the TDP MMU PF handler, remove the dependency between disallowed_hugepage_adjust and the shadow_walk_iterator. This will open the function up to being used by the TDP MMU PF handler in a future patch. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Support zapping SPTEs in the TDP MMUBen Gardon5-0/+136
Add functions to zap SPTEs to the TDP MMU. These are needed to tear down TDP MMU roots properly and implement other MMU functions which require tearing down mappings. Future patches will add functions to populate the page tables, but as for this patch there will not be any work for these functions to do. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: Cache as_id in kvm_memory_slotPeter Xu2-0/+7
Cache the address space ID just like the slot ID. It will be used in order to fill in the dirty ring entries. Suggested-by: Paolo Bonzini <[email protected]> Suggested-by: Sean Christopherson <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Peter Xu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Add functions to handle changed TDP SPTEsBen Gardon3-1/+115
The existing bookkeeping done by KVM when a PTE is changed is spread around several functions. This makes it difficult to remember all the stats, bitmaps, and other subsystems that need to be updated whenever a PTE is modified. When a non-leaf PTE is marked non-present or becomes a leaf PTE, page table memory must also be freed. To simplify the MMU and facilitate the use of atomic operations on SPTEs in future patches, create functions to handle some of the bookkeeping required as a result of a change. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Allocate and free TDP MMU rootsBen Gardon5-6/+146
The TDP MMU must be able to allocate paging structure root pages and track the usage of those pages. Implement a similar, but separate system for root page allocation to that of the x86 shadow paging implementation. When future patches add synchronization model changes to allow for parallel page faults, these pages will need to be handled differently from the x86 shadow paging based MMU's root pages. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Init / Uninit the TDP MMUBen Gardon5-1/+55
The TDP MMU offers an alternative mode of operation to the x86 shadow paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU will require new fields that need to be initialized and torn down. Add hooks into the existing KVM MMU initialization process to do that initialization / cleanup. Currently the initialization and cleanup fucntions do not do very much, however more operations will be added in future patches. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Introduce tdp_iterBen Gardon3-1/+234
The TDP iterator implements a pre-order traversal of a TDP paging structure. This iterator will be used in future patches to create an efficient implementation of the KVM MMU for the TDP case. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: mmu: extract spte.h and spte.cPaolo Bonzini5-548/+607
The SPTE format will be common to both the shadow and the TDP MMU. Extract code that implements the format to a separate module, as a first step towards adding the TDP MMU and putting mmu.c on a diet. Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: mmu: Separate updating a PTE from kvm_set_pte_rmappPaolo Bonzini1-7/+17
The TDP MMU's own function for the changed-PTE notifier will need to be update a PTE in the exact same way as the shadow MMU. Rather than re-implementing this logic, factor the SPTE creation out of kvm_set_pte_rmapp. Extracted out of a patch by Ben Gardon. <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86/mmu: Separate making SPTEs from set_spteBen Gardon1-16/+33
Separate the functions for generating leaf page table entries from the function that inserts them into the paging structure. This refactoring will facilitate changes to the MMU sychronization model to use atomic compare / exchanges (which are not guaranteed to succeed) instead of a monolithic MMU lock. No functional change expected. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This commit introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Reviewed-by: Peter Shier <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: mmu: Separate making non-leaf sptes from link_shadow_pageBen Gardon1-6/+15
The TDP MMU page fault handler will need to be able to create non-leaf SPTEs to build up the paging structures. Rather than re-implementing the function, factor the SPTE creation out of link_shadow_page. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21Merge branch 'kvm-fixes' into 'next'Paolo Bonzini6-23/+47
Pick up bugfixes from 5.9, otherwise various tests fail.
2020-10-21KVM: PPC: Book3S HV: Make struct kernel_param_ops definition constJoe Perches1-1/+1
This should be const, so make it so. Signed-off-by: Joe Perches <[email protected]> Message-Id: <d130e88dd4c82a12d979da747cc0365c72c3ba15.1601770305.git.joe@perches.com> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: Let the guest own CR4.FSGSBASELai Jiangshan1-1/+1
Add FSGSBASE to the set of possible guest-owned CR4 bits, i.e. let the guest own it on VMX. KVM never queries the guest's CR4.FSGSBASE value, thus there is no reason to force VM-Exit on FSGSBASE being toggled. Note, because FSGSBASE is conditionally available, this is dependent on recent changes to intercept reserved CR4 bits and to update the CR4 guest/host mask in response to guest CPUID changes. Cc: Paolo Bonzini <[email protected]> Signed-off-by: Lai Jiangshan <[email protected]> [sean: added justification in changelog] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: VMX: Intercept guest reserved CR4 bits to inject #GP faultSean Christopherson1-5/+10
Intercept CR4 bits that are guest reserved so that KVM correctly injects a #GP fault if the guest attempts to set a reserved bit. If a feature is supported by the CPU but is not exposed to the guest, and its associated CR4 bit is not intercepted by KVM by default, then KVM will fail to inject a #GP if the guest sets the CR4 bit without triggering an exit, e.g. by toggling only the bit in question. Note, KVM doesn't give the guest direct access to any CR4 bits that are also dependent on guest CPUID. Yet. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: Move call to update_exception_bitmap() into VMX codeSean Christopherson2-1/+3
Now that vcpu_after_set_cpuid() and update_exception_bitmap() are called back-to-back, subsume the exception bitmap update into the common CPUID update. Drop the SVM invocation entirely as SVM's exception bitmap doesn't vary with respect to guest CPUID. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: Invoke vendor's vcpu_after_set_cpuid() after all common updatesSean Christopherson1-2/+3
Move the call to kvm_x86_ops.vcpu_after_set_cpuid() to the very end of kvm_vcpu_after_set_cpuid() to allow the vendor implementation to react to changes made by the common code. In the near future, this will be used by VMX to update its CR4 guest/host masks to account for reserved bits. In the long term, SGX support will update the allowed XCR0 mask for enclaves based on the vCPU's allowed XCR0. vcpu_after_set_cpuid() (nee kvm_update_cpuid()) was originally added by commit 2acf923e38fb ("KVM: VMX: Enable XSAVE/XRSTOR for guest"), and was called separately after kvm_x86_ops.vcpu_after_set_cpuid() (nee kvm_x86_ops->cpuid_update()). There is no indication that the placement of the common code updates after the vendor updates was anything more than a "new function at the end" decision. Inspection of the current code reveals no dependency on kvm_x86_ops' vcpu_after_set_cpuid() in kvm_vcpu_after_set_cpuid() or any of its helpers. The bulk of the common code depends only on the guest's CPUID configuration, kvm_mmu_reset_context() does not consume dynamic vendor state, and there are no collisions between kvm_pmu_refresh() and VMX's update of PT state. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: Intercept LA57 to inject #GP fault when it's reservedLai Jiangshan1-1/+1
Unconditionally intercept changes to CR4.LA57 so that KVM correctly injects a #GP fault if the guest attempts to set CR4.LA57 when it's supported in hardware but not exposed to the guest. Long term, KVM needs to properly handle CR4 bits that can be under guest control but also may be reserved from the guest's perspective. But, KVM currently sets the CR4 guest/host mask only during vCPU creation, and reworking flows to change that will take a bit of elbow grease. Even if/when generic support for intercepting reserved bits exists, it's probably not worth letting the guest set CR4.LA57 directly. LA57 can't be toggled while long mode is enabled, thus it's all but guaranteed to be set once (maybe twice, e.g. by BIOS and kernel) during boot and never touched again. On the flip side, letting the guest own CR4.LA57 may incur extra VMREADs. In other words, this temporary "hack" is probably also the right long term fix. Fixes: fd8cb433734e ("KVM: MMU: Expose the LA57 feature to VM.") Cc: [email protected] Cc: Lai Jiangshan <[email protected]> Signed-off-by: Lai Jiangshan <[email protected]> [sean: rewrote changelog] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: SVM: Initialize prev_ga_tag before useSuravee Suthikulpanit1-0/+1
The function amd_ir_set_vcpu_affinity makes use of the parameter struct amd_iommu_pi_data.prev_ga_tag to determine if it should delete struct amd_iommu_pi_data from a list when not running in AVIC mode. However, prev_ga_tag is initialized only when AVIC is enabled. The non-zero uninitialized value can cause unintended code path, which ends up making use of the struct vcpu_svm.ir_list and ir_list_lock without being initialized (since they are intended only for the AVIC case). This triggers NULL pointer dereference bug in the function vm_ir_list_del with the following call trace: svm_update_pi_irte+0x3c2/0x550 [kvm_amd] ? proc_create_single_data+0x41/0x50 kvm_arch_irq_bypass_add_producer+0x40/0x60 [kvm] __connect+0x5f/0xb0 [irqbypass] irq_bypass_register_producer+0xf8/0x120 [irqbypass] vfio_msi_set_vector_signal+0x1de/0x2d0 [vfio_pci] vfio_msi_set_block+0x77/0xe0 [vfio_pci] vfio_pci_set_msi_trigger+0x25c/0x2f0 [vfio_pci] vfio_pci_set_irqs_ioctl+0x88/0xb0 [vfio_pci] vfio_pci_ioctl+0x2ea/0xed0 [vfio_pci] ? alloc_file_pseudo+0xa5/0x100 vfio_device_fops_unl_ioctl+0x26/0x30 [vfio] ? vfio_device_fops_unl_ioctl+0x26/0x30 [vfio] __x64_sys_ioctl+0x96/0xd0 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Therefore, initialize prev_ga_tag to zero before use. This should be safe because ga_tag value 0 is invalid (see function avic_vm_init). Fixes: dfa20099e26e ("KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()") Signed-off-by: Suravee Suthikulpanit <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: nSVM: implement on demand allocation of the nested stateMaxim Levitsky3-28/+83
This way we don't waste memory on VMs which don't use nesting virtualization even when the host enabled it for them. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: allow kvm_x86_ops.set_efer to return an error valueMaxim Levitsky6-7/+15
This will be used to signal an error to the userspace, in case the vendor code failed during handling of this msr. (e.g -ENOMEM) Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: report negative values from wrmsr emulation to userspaceMaxim Levitsky2-5/+8
This will allow the KVM to report such errors (e.g -ENOMEM) to the userspace. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: xen_hvm_config: cleanup return valuesMaxim Levitsky1-14/+9
Return 1 on errors that are caused by wrong guest behavior (which will inject #GP to the guest) And return a negative error value on issues that are the kernel's fault (e.g -ENOMEM) Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm x86/mmu: Make struct kernel_param_ops definitions constJoe Perches1-2/+2
These should be const, so make it so. Signed-off-by: Joe Perches <[email protected]> Message-Id: <ed95eef4f10fc1317b66936c05bc7dd8f943a6d5.1601770305.git.joe@perches.com> Reviewed-by: Ben Gardon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: bump KVM_MAX_CPUID_ENTRIESVitaly Kuznetsov1-1/+1
As vcpu->arch.cpuid_entries is now allocated dynamically, the only remaining use for KVM_MAX_CPUID_ENTRIES is to check KVM_SET_CPUID/ KVM_SET_CPUID2 input for sanity. Since it was reported that the current limit (80) is insufficient for some CPUs, bump KVM_MAX_CPUID_ENTRIES and use an arbitrary value '256' as the new limit. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: x86: allocate vcpu->arch.cpuid_entries dynamicallyVitaly Kuznetsov3-39/+53
The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80) is reported to be insufficient but before we bump it let's switch to allocating vcpu->arch.cpuid_entries[] array dynamically. Currently, 'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is 3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch' but having it pre-allocated (for all vCPUs which we also pre-allocate) gives us no real benefits. Another plus of the dynamic allocation is that we now do kvm_check_cpuid() check before we assign anything to vcpu->arch.cpuid_nent/cpuid_entries so no changes are made in case the check fails. Opportunistically remove unneeded 'out' labels from kvm_vcpu_ioctl_set_cpuid()/kvm_vcpu_ioctl_set_cpuid2() and return directly whenever possible. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]>
2020-10-21KVM: x86: disconnect kvm_check_cpuid() from vcpu->arch.cpuid_entriesVitaly Kuznetsov1-15/+23
As a preparatory step to allocating vcpu->arch.cpuid_entries dynamically make kvm_check_cpuid() check work with an arbitrary 'struct kvm_cpuid_entry2' array. Currently, when kvm_check_cpuid() fails we reset vcpu->arch.cpuid_nent to 0 and this is kind of weird, i.e. one would expect CPUIDs to remain unchanged when KVM_SET_CPUID[2] call fails. No functional change intended. It would've been possible to move the updated kvm_check_cpuid() in kvm_vcpu_ioctl_set_cpuid2() and check the supplied input before we start updating vcpu->arch.cpuid_entries/nent but we can't do the same in kvm_vcpu_ioctl_set_cpuid() as we'll have to copy 'struct kvm_cpuid_entry' entries first. The change will be made when vcpu->arch.cpuid_entries[] array becomes allocated dynamically. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21Documentation: kvm: fix some typos in cpuid.rstOliver Upton1-44/+44
Reviewed-by: Jim Mattson <[email protected]> Reviewed-by: Peter Shier <[email protected]> Signed-off-by: Oliver Upton <[email protected]> Change-Id: I0c6355b09fedf8f9cc4cc5f51be418e2c1c82b7b Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86: only provide PV features if enabled in guest's CPUIDOliver Upton6-5/+106
KVM unconditionally provides PV features to the guest, regardless of the configured CPUID. An unwitting guest that doesn't check KVM_CPUID_FEATURES before use could access paravirt features that userspace did not intend to provide. Fix this by checking the guest's CPUID before performing any paravirtual operations. Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the aforementioned enforcement. Migrating a VM from a host w/o this patch to a host with this patch could silently change the ABI exposed to the guest, warranting that we default to the old behavior and opt-in for the new one. Reviewed-by: Jim Mattson <[email protected]> Reviewed-by: Peter Shier <[email protected]> Signed-off-by: Oliver Upton <[email protected]> Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98 Message-Id: <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86: set wall_clock in kvm_write_wall_clock()Oliver Upton1-1/+2
Small change to avoid meaningless duplication in the subsequent patch. No functional change intended. Reviewed-by: Jim Mattson <[email protected]> Reviewed-by: Peter Shier <[email protected]> Signed-off-by: Oliver Upton <[email protected]> Change-Id: I77ab9cdad239790766b7a49d5cbae5e57a3005ea Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fnOliver Upton1-26/+32
No functional change intended. Reviewed-by: Jim Mattson <[email protected]> Reviewed-by: Peter Shier <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Signed-off-by: Oliver Upton <[email protected]> Change-Id: I7cbe71069db98d1ded612fd2ef088b70e7618426 Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21x86/kvm: Update the comment about asynchronous page fault in exc_page_fault()Vitaly Kuznetsov1-5/+8
KVM was switched to interrupt-based mechanism for 'page ready' event delivery in Linux-5.8 (see commit 2635b5c4a0e4 ("KVM: x86: interrupt based APF 'page ready' event delivery")) and #PF (ab)use for 'page ready' event delivery was removed. Linux guest switched to this new mechanism exclusively in 5.9 (see commit b1d405751cd5 ("KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery")) so it is not possible to get #PF for a 'page ready' event even when the guest is running on top of an older KVM (APF mechanism won't be enabled). Update the comment in exc_page_fault() to reflect the new reality. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21x86/kvm: hide KVM options from menuconfig when KVM is not compiledMatteo Croce1-0/+1
Let KVM_WERROR depend on KVM, so it doesn't show in menuconfig alone. Signed-off-by: Matteo Croce <[email protected]> Message-Id: <[email protected]> Fixes: 4f337faf1c55e ("KVM: allow disabling -Werror") Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21Documentation: kvm: fix a typoLi Qiang1-1/+1
Fixes: e287d6de62f74 ("Documentation: kvm: Convert cpuid.txt to .rst") Signed-off-by: Li Qiang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: VMX: Forbid userspace MSR filters for x2APICPaolo Bonzini2-9/+18
Allowing userspace to intercept reads to x2APIC MSRs when APICV is fully enabled for the guest simply can't work. But more in general, the LAPIC could be set to in-kernel after the MSR filter is setup and allowing accesses by userspace would be very confusing. We could in principle allow userspace to intercept reads and writes to TPR, and writes to EOI and SELF_IPI, but while that could be made it work, it would still be silly. Cc: Alexander Graf <[email protected]> Cc: Aaron Lewis <[email protected]> Cc: Peter Xu <[email protected]> Cc: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-21KVM: VMX: Ignore userspace MSR filters for x2APICSean Christopherson3-29/+47
Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore userspace filtering. Allowing userspace to intercept reads to x2APIC MSRs when APICV is fully enabled for the guest simply can't work; the LAPIC and thus virtual APIC is in-kernel and cannot be directly accessed by userspace. To keep things simple we will in fact forbid intercepting x2APIC MSRs altogether, independent of the default_allow setting. Cc: Alexander Graf <[email protected]> Cc: Aaron Lewis <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> [Modified to operate even if APICv is disabled, adjust documentation. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-20Merge tag 'kvmarm-5.10' of ↵Paolo Bonzini64-3468/+3505
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.10 - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation
2020-10-19KVM: VMX: Fix x2APIC MSR intercept handling on !APICV platformsPeter Xu1-2/+3
Fix an inverted flag for intercepting x2APIC MSRs and intercept writes by default, even when APICV is enabled. Fixes: 3eb900173c71 ("KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied") Co-developed-by: Peter Xu <[email protected]> [sean: added changelog] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-03Merge tag 'kvmarm-fixes-5.9-3' of ↵Paolo Bonzini1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.9, take #3 - Fix synchronization of VTTBR update on TLB invalidation for nVHE systems
2020-10-03KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF interceptPaolo Bonzini1-10/+12
The PFEC_MASK and PFEC_MATCH fields in the VMCS reverse the meaning of the #PF intercept bit in the exception bitmap when they do not match. This means that, if PFEC_MASK and/or PFEC_MATCH are set, the hypervisor can get a vmexit for #PF exceptions even when the corresponding bit is clear in the exception bitmap. This is unexpected and is promptly detected by a WARN_ON_ONCE. To fix it, reset PFEC_MASK and PFEC_MATCH when the #PF intercept is disabled (as is common with enable_ept && !allow_smaller_maxphyaddr). Reported-by: Qian Cai <[email protected]>> Reported-by: Naresh Kamboju <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-10-02Merge branches 'kvm-arm64/pt-new' and 'kvm-arm64/pmu-5.9' into ↵Marc Zyngier3-21/+30
kvmarm-master/next Signed-off-by: Marc Zyngier <[email protected]>
2020-10-02KVM: arm64: Ensure user_mem_abort() return value is initialisedWill Deacon1-1/+1
If a change in the MMU notifier sequence number forces user_mem_abort() to return early when attempting to handle a stage-2 fault, we return uninitialised stack to kvm_handle_guest_abort(), which could potentially result in the injection of an external abort into the guest or a spurious return to userspace. Neither or these are what we want to do. Initialise 'ret' to 0 in user_mem_abort() so that bailing due to a change in the MMU notrifier sequence number is treated as though the fault was handled. Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Alexandru Elisei <[email protected]> Reviewed-by: Gavin Shan <[email protected]> Cc: Gavin Shan <[email protected]> Cc: Alexandru Elisei <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-02KVM: arm64: Pass level hint to TLBI during stage-2 permission faultWill Deacon1-7/+16
Alex pointed out that we don't pass a level hint to the TLBI instruction when handling a stage-2 permission fault, even though the walker does at some point have the level information in its hands. Rework stage2_update_leaf_attrs() so that it can optionally return the level of the updated pte to its caller, which can in turn be used to provide the correct TLBI level hint. Reported-by: Alexandru Elisei <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Alexandru Elisei <[email protected]> Reviewed-by: Gavin Shan <[email protected]> Cc: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected]
2020-10-02KVM: arm64: Fix some documentation build warningsMauro Carvalho Chehab1-13/+13
As warned with make htmldocs: .../Documentation/virt/kvm/devices/vcpu.rst:70: WARNING: Malformed table. Text in column margin in table line 2. ======= ====================================================== -ENODEV: PMUv3 not supported or GIC not initialized -ENXIO: PMUv3 not properly configured or in-kernel irqchip not configured as required prior to calling this attribute -EBUSY: PMUv3 already initialized -EINVAL: Invalid filter range ======= ====================================================== The ':' character for two lines are above the size of the column. Besides that, other tables at the file doesn't use ':', so just drop them. While here, also fix this warning also introduced at the same patch: .../Documentation/virt/kvm/devices/vcpu.rst:88: WARNING: Block quote ends without a blank line; unexpected unindent. By marking the C code as a literal block. Fixes: 8be86a5eec04 ("KVM: arm64: Document PMU filtering API") Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Link: https://lore.kernel.org/r/b5385dd0213f1f070667925bf7a807bf5270ba78.1601616399.git.mchehab+huawei@kernel.org
2020-10-01KVM: arm64: Restore missing ISB on nVHE __tlb_switch_to_guestMarc Zyngier1-0/+7
Commit a0e50aa3f4a8 ("KVM: arm64: Factor out stage 2 page table data from struct kvm") dropped the ISB after __load_guest_stage2(), only leaving the one that is required when the speculative AT workaround is in effect. As Andrew points it: "This alternative is 'backwards' to avoid a double ISB as there is one in __load_guest_stage2 when the workaround is active." Restore the missing ISB, conditionned on the AT workaround not being active. Fixes: a0e50aa3f4a8 ("KVM: arm64: Factor out stage 2 page table data from struct kvm") Reported-by: Andrew Scull <[email protected]> Reported-by: Thomas Tai <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>