2021-02-04  KVM: x86: Take KVM's SRCU lock only if steal time update is needed  (Sean Christopherson, 1 file changed, -9/+11)
Enter a SRCU critical section for a memslots lookup during steal time update if and only if a steal time update is actually needed. Taking the lock can be avoided if steal time is disabled by the guest, or if KVM knows it has already flagged the vCPU as being preempted. Reword the comment to be more precise as to exactly why memslots will be queried. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
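As an illustration of the pattern described above, here is a hedged sketch only: the helper name and the exact st.* fields are assumptions based on the commit text, not the actual diff.

/*
 * Hedged sketch: only enter the SRCU read-side critical section when the
 * steal-time record, which lives in guest memory and therefore needs a
 * memslots lookup, actually has to be updated.
 */
static void kvm_steal_time_set_preempted_sketch(struct kvm_vcpu *vcpu)
{
	int idx;

	/* Steal time disabled by the guest: nothing to write back. */
	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
		return;

	/* vCPU already flagged as preempted: the shared record is current. */
	if (vcpu->arch.st.preempted)
		return;

	idx = srcu_read_lock(&vcpu->kvm->srcu);	/* memslots are SRCU-protected */
	__set_preempted_in_guest_memory(vcpu);	/* assumed helper doing the actual write */
	srcu_read_unlock(&vcpu->kvm->srcu, idx);
}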
2021-02-04  KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put()  (Sean Christopherson, 1 file changed, -10/+0)
Remove the disabling of page faults across kvm_steal_time_set_preempted() as KVM now accesses the steal time struct (shared with the guest) via a cached mapping (see commit b043138246a4, "x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed".) The cache lookup is flagged as atomic, thus it would be a bug if KVM tried to resolve a new pfn, i.e. we want the splat that would be reached via might_fault(). Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-04  KVM: do not assume PTE is writable after follow_pfn  (Paolo Bonzini, 1 file changed, -3/+12)
In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This, however, is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn. In doing so, however, KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, but it is still a bug. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped or passed back to the caller via *writable. Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738af33 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable. Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page") Reported-by: David Stevens <[email protected]> Cc: [email protected] Cc: Jann Horn <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
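A rough sketch of the resulting hva_to_pfn_remapped() logic follows; the follow_pte() signature shown here is an assumption (it has changed across kernel versions), but the pte_write()-under-PTE-lock idea is the one described above.

/*
 * Hedged sketch: resolve the PFN for a VM_IO/VM_PFNMAP vma while also
 * learning whether the mapping is writable, under the PTE lock.
 */
static int hva_to_pfn_remapped_sketch(struct vm_area_struct *vma,
				      unsigned long addr, bool write_fault,
				      bool *writable, kvm_pfn_t *p_pfn)
{
	spinlock_t *ptl;
	pte_t *ptep;
	int r;

	r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);	/* signature is version-dependent */
	if (r)
		return r;

	if (write_fault && !pte_write(*ptep)) {
		/* Caller wanted a writable mapping but the PTE is read-only. */
		pte_unmap_unlock(ptep, ptl);
		return -EFAULT;
	}

	if (writable)
		*writable = pte_write(*ptep);
	*p_pfn = pte_pfn(*ptep);

	pte_unmap_unlock(ptep, ptl);
	return 0;
}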
2021-02-04  KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs  (Ben Gardon, 1 file changed, -3/+3)
There is a bug in the TDP MMU function to zap SPTEs which could be replaced with a larger mapping which prevents the function from doing anything. Fix this by correctly zapping the last level SPTEs. Cc: [email protected] Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Ben Gardon <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-03  KVM: arm64: Stub EXPORT_SYMBOL for nVHE EL2 code  (Quentin Perret, 1 file changed, -2/+2)
In order to ensure the module loader does not get confused if a symbol is exported in EL2 nVHE code (as will be the case when we will compile e.g. lib/memset.S into the EL2 object), make sure to stub all exports using __DISABLE_EXPORTS in the nvhe folder. Suggested-by: Ard Biesheuvel <[email protected]> Signed-off-by: Quentin Perret <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-02-03  asm-generic: export: Stub EXPORT_SYMBOL with __DISABLE_EXPORTS  (Quentin Perret, 1 file changed, -1/+1)
It is currently possible to stub EXPORT_SYMBOL() macros in C code using __DISABLE_EXPORTS, which is necessary to run in constrained environments such as the EFI stub or the decompressor. But this currently doesn't apply to exports from assembly, which can lead to somewhat confusing situations. Consolidate the __DISABLE_EXPORTS infrastructure by checking it from asm-generic/export.h as well. Signed-off-by: Quentin Perret <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
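Conceptually the consolidation amounts to one extra preprocessor condition; the following is only a sketch of the idea, not the actual diff, and the internal macro names are assumptions.

/*
 * Hedged sketch: in asm-generic/export.h, compile the assembly-side
 * EXPORT_SYMBOL machinery out not only when modules are disabled but also
 * when the object is built with __DISABLE_EXPORTS (EFI stub, decompressor,
 * nVHE hyp code, ...).
 */
#if !defined(CONFIG_MODULES) || defined(__DISABLE_EXPORTS)
#define __EXPORT_SYMBOL(name, val, sec)		/* stubbed out */
#else
#define __EXPORT_SYMBOL(name, val, sec)	___EXPORT_SYMBOL name, val, sec
#endif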
2021-02-03  KVM: arm64: Correct spelling of DBGDIDR register  (Alexandru Elisei, 1 file changed, -3/+3)
The aarch32 debug ID register is called DBG*D*IDR (emphasis added), not DBGIDR, use the correct spelling. Signed-off-by: Alexandru Elisei <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-02-03  KVM: arm64: Use symbolic names for the PMU versions  (Marc Zyngier, 1 file changed, -4/+4)
Instead of using a bunch of magic numbers, use the existing definitions that have been added since 8673e02e58410 ("arm64: perf: Add support for ARMv8.5-PMU 64-bit counters") Reviewed-by: Alexandru Elisei <[email protected]> Reviewed-by: Eric Auger <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2021-02-03  KVM: arm64: Upgrade PMU support to ARMv8.4  (Marc Zyngier, 3 files changed, -4/+16)
Upgrading the PMU code from ARMv8.1 to ARMv8.4 turns out to be pretty easy. All that is required is support for PMMIR_EL1, which is read-only, and for which returning 0 is a valid option as long as we don't advertise STALL_SLOT as an implemented event. Let's just do that and adjust what we return to the guest. Signed-off-by: Marc Zyngier <[email protected]>
2021-02-03  KVM: arm64: Limit the debug architecture to ARMv8.0  (Marc Zyngier, 1 file changed, -0/+3)
Let's not pretend we support anything but ARMv8.0 as far as the debug architecture is concerned. Reviewed-by: Eric Auger <[email protected]> Reviewed-by: Alexandru Elisei <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2021-02-03  KVM: arm64: Refactor filtering of ID registers  (Marc Zyngier, 1 file changed, -23/+28)
Our current ID register filtering is starting to be a mess of if() statements, and isn't going to get any saner. Let's turn it into a switch(), which has a chance of being more readable, and introduce a FEATURE() macro that allows easy generation of feature masks. No functional change intended. Reviewed-by: Eric Auger <[email protected]> Reviewed-by: Alexandru Elisei <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2021-02-03  KVM: arm64: Add handling of AArch32 PMCEID{2,3} PMUv3 registers  (Marc Zyngier, 1 file changed, -3/+9)
Despite advertising support for AArch32 PMUv3p1, we fail to handle the PMCEID{2,3} registers, which conveniently alias with the top bits of PMCEID{0,1}_EL1. Implement these registers with the usual AA32(HI/LO) aliasing mechanism. Reviewed-by: Eric Auger <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2021-02-03  KVM: arm64: Fix AArch32 PMUv3 capping  (Marc Zyngier, 1 file changed, -2/+2)
We shouldn't expose *any* PMU capability when no PMU has been configured for this VM. Reviewed-by: Eric Auger <[email protected]> Reviewed-by: Alexandru Elisei <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2021-02-03  KVM: arm64: Fix missing RES1 in emulation of DBGDIDR  (Marc Zyngier, 1 file changed, -1/+1)
The AArch32 CP14 DBGDIDR has bit 15 set to RES1, which our current emulation doesn't set. Just add the missing bit. Reported-by: Peter Maydell <[email protected]> Reviewed-by: Eric Auger <[email protected]> Reviewed-by: Alexandru Elisei <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2021-02-03  KVM: x86: cleanup CR3 reserved bits checks  (Paolo Bonzini, 3 files changed, -13/+5)
If not in long mode, the low bits of CR3 are reserved but not enforced to be zero, so remove those checks. If in long mode, however, the MBZ bits extend down to the highest physical address bit of the guest, excluding the encryption bit. Make the checks consistent with the above, and match them between nested_vmcb_checks and KVM_SET_SREGS. Cc: [email protected] Fixes: 761e41693465 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests") Fixes: a780a3ea6282 ("KVM: X86: Fix reserved bits check for MOV to CR3") Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-03  KVM: SVM: Treat SVM as unsupported when running as an SEV guest  (Sean Christopherson, 2 files changed, -0/+6)
Don't let KVM load when running as an SEV guest, regardless of what CPUID says. Memory is encrypted with a key that is not accessible to the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll see garbage when reading the VMCB. Technically, KVM could decrypt all memory that needs to be accessible to the L0 and use shadow paging so that L0 does not need to shadow NPT, but exposing such information to L0 largely defeats the purpose of running as an SEV guest. This can always be revisited if someone comes up with a use case for running VMs inside SEV guests. Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM is doomed even if the SEV guest is debuggable and the hypervisor is willing to decrypt the VMCB. This may or may not be fixed on CPUs that have the SVME_ADDR_CHK fix. Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
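A hedged sketch of the check follows; the exact hook it lives in and the sev_active() helper are assumptions based on the commit text, not the literal diff.

/*
 * Hedged sketch: bail out of SVM hardware setup when the kernel itself
 * runs with SEV memory encryption, since L0 cannot read our VMCB anyway.
 */
static bool has_svm_sketch(const char **msg)
{
	if (!boot_cpu_has(X86_FEATURE_SVM)) {
		*msg = "not amd";
		return false;
	}

	if (sev_active()) {
		*msg = "KVM is unsupported when running as an SEV guest";
		return false;
	}

	return true;
}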
2021-02-02  KVM: x86: Update emulator context mode if SYSENTER xfers to 64-bit mode  (Sean Christopherson, 1 file changed, -0/+2)
Set the emulator context to PROT64 if SYSENTER transitions from 32-bit userspace (compat mode) to a 64-bit kernel, otherwise the RIP update at the end of x86_emulate_insn() will incorrectly truncate the new RIP. Note, this bug is mostly limited to running an Intel virtual CPU model on an AMD physical CPU, as other combinations of virtual and physical CPUs do not trigger full emulation. On Intel CPUs, SYSENTER in compatibility mode is legal, and unconditionally transitions to 64-bit mode. On AMD CPUs, SYSENTER is illegal in compatibility mode and #UDs. If the vCPU is AMD, KVM injects a #UD on SYSENTER in compat mode. If the pCPU is Intel, SYSENTER will execute natively and not trigger #UD->VM-Exit (ignoring guest TLB shenanigans). Fixes: fede8076aab4 ("KVM: x86: handle wrap around 32-bit address space") Cc: [email protected] Signed-off-by: Jonny Barker <[email protected]> [sean: wrote changelog] Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
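The idea boils down to a tiny addition in the emulator's SYSENTER handler; a sketch, assuming the usual em_sysenter()/ctxt naming:

/*
 * Hedged sketch: with EFER.LMA set, SYSENTER targets a 64-bit kernel, so
 * switch the emulator context to PROT64 before x86_emulate_insn() writes
 * back (and would otherwise truncate) the new RIP.
 */
if (efer & EFER_LMA)
	ctxt->mode = X86EMUL_MODE_PROT64;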
2021-02-01  KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check  (Vitaly Kuznetsov, 1 file changed, -0/+2)
Commit 7a873e455567 ("KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2") reveals that KVM allows setting X86_CR4_PCIDE even when PCID support is missing:

==== Test Assertion Failure ====
  x86_64/set_sregs_test.c:41: rc
  pid=6956 tid=6956 - Invalid argument
     1	0x000000000040177d: test_cr4_feature_bit at set_sregs_test.c:41
     2	0x00000000004014fc: main at set_sregs_test.c:119
     3	0x00007f2d9346d041: ?? ??:0
     4	0x000000000040164d: _start at ??:?
  KVM allowed unsupported CR4 bit (0x20000)

Add an X86_FEATURE_PCID feature check to __cr4_reserved_bits() to make kvm_is_valid_cr4() fail. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
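A sketch of the addition, following the existing __cr4_reserved_bits() pattern of one clause per CPUID-gated CR4 bit; treat it as illustrative rather than the literal diff.

/*
 * Hedged sketch: if PCID is not exposed, CR4.PCIDE becomes a reserved bit
 * and kvm_is_valid_cr4() will reject attempts to set it.
 */
if (!__cpu_has(__c, X86_FEATURE_PCID))
	__reserved_bits |= X86_CR4_PCIDE;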
2021-02-01  KVM/x86: assign hva with the right value to vm_munmap the pages  (Zheng Zhan Liang, 1 file changed, -1/+1)
Cc: Paolo Bonzini <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Zheng Zhan Liang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-01  KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off  (Paolo Bonzini, 2 files changed, -13/+30)
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST will generally use the default value for MSR_IA32_ARCH_CAPABILITIES. When this happens and the host has tsx=on, it is possible to end up with virtual machines that have HLE and RTM disabled, but TSX_CTRL available. If the fleet is then switched to tsx=off, kvm_get_arch_capabilities() will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to use the tsx=off hosts as migration destinations, even though the guests do not have TSX enabled. To allow this migration, allow guests to write to their TSX_CTRL MSR, while keeping the host MSR unchanged for the entire life of the guests. This ensures that TSX remains disabled and also saves MSR reads and writes, and it's okay to do because with tsx=off we know that guests will not have the HLE and RTM features in their CPUID. (If userspace sets bogus CPUID data, we do not expect HLE and RTM to work in guests anyway). Cc: [email protected] Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES") Signed-off-by: Paolo Bonzini <[email protected]>
2021-02-01  KVM: arm64: Make gen-hyprel endianness agnostic  (Marc Zyngier, 2 files changed, -16/+42)
gen-hyprel is, for better or worse, a native-endian program: it assumes that the ELF data structures are in the host's endianness, and in one particular case even assumes that the compiled kernel is little-endian. None of these assumptions holds true, though: people actually build (use?) BE arm64 kernels, and seem to avoid doing so on BE hosts. Madness! In order to solve this, wrap each access to the ELF data structures with the required byte-swapping magic. This requires obtaining the endianness of the kernel data structures and providing per-endianness wrappers. The result is a kernel that links and even boots in a model. Fixes: 8c49b5d43d4c ("KVM: arm64: Generate hyp relocation data") Reported-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Acked-by: David Brazdil <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2021-01-28  Fix unsynchronized access to sev members through svm_register_enc_region  (Peter Gonda, 1 file changed, -7/+10)
Grab kvm->lock before pinning memory when registering an encrypted region; sev_pin_memory() relies on kvm->lock being held to ensure correctness when checking and updating the number of pinned pages. Add a lockdep assertion to help prevent future regressions. Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active") Signed-off-by: Peter Gonda <[email protected]> V2 - Fix up patch description - Correct file paths svm.c -> sev.c - Add unlock of kvm->lock on sev_pin_memory error V1 - https://lore.kernel.org/kvm/[email protected]/ Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
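A hedged sketch of the ordering in svm_register_enc_region() and the new assertion in sev_pin_memory(); error-path details and exact field names are assumptions.

/* In svm_register_enc_region(): pin only while holding kvm->lock. */
mutex_lock(&kvm->lock);
region->pages = sev_pin_memory(kvm, range->addr, range->size,
			       &region->npages, 1);
if (IS_ERR(region->pages)) {
	ret = PTR_ERR(region->pages);
	mutex_unlock(&kvm->lock);
	goto e_free;
}

/* In sev_pin_memory(): document and enforce the locking rule. */
lockdep_assert_held(&kvm->lock);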
2021-01-28  KVM: Documentation: Fix documentation for nested.  (Yu Zhang, 2 files changed, -3/+5)
Nested VMX was enabled by default in commit 1e58e5e59148 ("KVM: VMX: enable nested virtualization by default"), which was merged in Linux 4.20. Fix the documentation accordingly. Signed-off-by: Yu Zhang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-28  Merge tag 'kvmarm-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini, 1 file changed, -9/+11)
KVM/arm64 fixes for 5.11, take #3:
- Avoid clobbering extra registers on initialisation
2021-01-28  KVM: x86: fix CPUID entries returned by KVM_GET_CPUID2 ioctl  (Michael Roth, 1 file changed, -1/+1)
Recent commit 255cbecfe0 modified struct kvm_vcpu_arch to make 'cpuid_entries' a pointer to an array of kvm_cpuid_entry2 entries rather than embedding the array in the struct. KVM_SET_CPUID and KVM_SET_CPUID2 were updated accordingly, but KVM_GET_CPUID2 was missed. As a result, KVM_GET_CPUID2 currently returns random fields from struct kvm_vcpu_arch to userspace rather than the expected CPUID values. Fix this by treating 'cpuid_entries' as a pointer when copying its contents to the userspace buffer. Fixes: 255cbecfe0c9 ("KVM: x86: allocate vcpu->arch.cpuid_entries dynamically") Cc: Vitaly Kuznetsov <[email protected]> Signed-off-by: Michael Roth <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
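The fix is essentially a small change in kvm_vcpu_ioctl_get_cpuid2(); a sketch of the corrected copy, with argument names assumed:

/*
 * Hedged sketch: cpuid_entries is now a pointer to a dynamically allocated
 * array, so copy from the pointer value itself rather than from the address
 * of the struct member.
 */
if (copy_to_user(entries, vcpu->arch.cpuid_entries,
		 vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2)))
	return -EFAULT;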
2021-01-27  Merge branch 'arm64/for-next/misc' into kvm-arm64/hyp-reloc  (Marc Zyngier, 0 files changed, -0/+0)
Signed-off-by: Marc Zyngier <[email protected]>
2021-01-25  KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX  (Paolo Bonzini, 3 files changed, -9/+29)
VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS, which may need to be loaded outside guest mode. Therefore we cannot WARN in that case. However, that part of nested_get_vmcs12_pages is _not_ needed at vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling, so that both vmentry and migration (and in the latter case, independent of is_guest_mode) do the parts that are needed. Cc: <[email protected]> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES Cc: <[email protected]> # 5.10.x Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-25  KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"  (Sean Christopherson, 1 file changed, -26/+25)
Revert the dirty/available tracking of GPRs now that KVM copies the GPRs to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty. Per commit de3cd117ed2f ("KVM: x86: Omit caching logic for always-available GPRs"), tracking for GPRs noticeably impacts KVM's code footprint. This reverts commit 1c04d8c986567c27c56c05205dceadc92efb14ff. Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-25  KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest  (Sean Christopherson, 1 file changed, -9/+6)
Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB; the GPRs' dirty bits are set from time zero and never cleared, i.e. will always be seen as dirty. The obvious alternative would be to clear the dirty bits when appropriate, but removing the dirty checks is desirable as it allows reverting GPR dirty+available tracking, which adds overhead to all flavors of x86 VMs. Note, unconditionally writing the GPRs in the GHCB is tacitly allowed by the GHCB spec, which allows the hypervisor (or guest) to provide unnecessary info; it's the guest's responsibility to consume only what it needs (the hypervisor is untrusted after all). The guest and hypervisor can supply additional state if desired but must not rely on that additional state being provided. Cc: Brijesh Singh <[email protected]> Cc: Tom Lendacky <[email protected]> Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-25  KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration  (Maxim Levitsky, 1 file changed, -5/+8)
Even when we are outside the nested guest, some vmcs02 fields may not be in sync vs vmcs12. This is intentional, even across nested VM-exit, because the sync can be delayed until the nested hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those rarely accessed fields. However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to be able to restore it. To fix that, call copy_vmcs02_to_vmcs12_rare() before the vmcs12 contents are copied to userspace. Fixes: 7952d769c29ca ("KVM: nVMX: Sync rarely accessed guest fields only when needed") Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-25  kvm: tracing: Fix unmatched kvm_entry and kvm_exit events  (Lorenzo Brescia, 3 files changed, -2/+5)
On VMX, if we exit and then re-enter immediately without leaving the vmx_vcpu_run() function, the kvm_entry event is not logged. That means we will see one (or more) kvm_exit, without its (their) corresponding kvm_entry, as shown here:

 CPU-1979  [002]    89.871187: kvm_entry: vcpu 1
 CPU-1979  [002]    89.871218: kvm_exit:  reason MSR_WRITE
 CPU-1979  [002]    89.871259: kvm_exit:  reason MSR_WRITE

It also seems possible for a kvm_entry event to be logged, but then we leave vmx_vcpu_run() right away (if vmx->emulation_required is true). In this case, we will have a spurious kvm_entry event in the trace. Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run() (where trace_kvm_exit() already is). A trace obtained with this patch applied looks like this:

 CPU-14295 [000]  8388.395387: kvm_entry: vcpu 0
 CPU-14295 [000]  8388.395392: kvm_exit:  reason MSR_WRITE
 CPU-14295 [000]  8388.395393: kvm_entry: vcpu 0
 CPU-14295 [000]  8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT

Of course, not calling trace_kvm_entry() in common x86 code any longer means that we need to adjust the SVM side of things too. Signed-off-by: Lorenzo Brescia <[email protected]> Signed-off-by: Dario Faggioli <[email protected]> Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath> Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-25  KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG  (Zenghui Yu, 1 file changed, -9/+7)
Update various words, including the wrong parameter name and the vague description of the usage of "slot" field. Signed-off-by: Zenghui Yu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-25  KVM: x86: get smi pending status correctly  (Jay Zhou, 1 file changed, -0/+4)
The injection of an SMI has two steps:

  Step 1:
    Qemu: cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
          kvm_vcpu_ioctl(cpu, KVM_SMI)
    KVM:  calls kvm_vcpu_ioctl_smi() and kvm_make_request(KVM_REQ_SMI, vcpu);

  Step 2:
    Qemu: kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
    KVM:  calls process_smi() if kvm_check_request(KVM_REQ_SMI, vcpu) is true,
          and marks vcpu->arch.smi_pending = true;

vcpu->arch.smi_pending is only set to true in step 2. Unfortunately, if the vcpu is paused between step 1 and step 2, kvm_run->immediate_exit will be set and the vcpu has to exit back to Qemu immediately during step 2, before vcpu->arch.smi_pending is marked true. During VM migration, Qemu gets the SMI pending status from KVM using the KVM_GET_VCPU_EVENTS ioctl at the downtime, so the pending SMI is lost. Signed-off-by: Jay Zhou <[email protected]> Signed-off-by: Shengen Zhuang <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-25  KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]  (Like Xu, 1 file changed, -1/+1)
The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as 0x0300 in the intel_perfmon_event_map[]. Correct its usage. Fixes: 62079d8a4312 ("KVM: PMU: add proper support for fixed counter 2") Signed-off-by: Like Xu <[email protected]> Message-Id: <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
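For reference, 0x0300 decodes as unit mask 0x03 with event select 0x00; the following is a hedged sketch of the corrected table entry, where the index and the { eventsel, unit_mask, event_type } field layout of intel_arch_events[] are assumptions.

/* Hedged sketch: fixed counter 2, pseudo-encoded as umask 0x03, event 0x00. */
[7] = { 0x00, 0x03, PERF_COUNT_HW_REF_CPU_CYCLES },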
2021-01-25  KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()  (Like Xu, 1 file changed, -0/+4)
We know the vPMU will not work properly when (1) the guest bit_width(s) of the [gp|fixed] counters are greater than the host's, or (2) the guest's requested architectural events exceed the range supported by the host. In those cases, set up a smaller left shift value and refresh the guest cpuid entry, thus fixing the following UBSAN shift-out-of-bounds warning:

shift exponent 197 is too large for 64-bit type 'long long unsigned int'

Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
 kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
 kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
 kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
 kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: [email protected] Signed-off-by: Like Xu <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-25  KVM: x86: Add more protection against undefined behavior in rsvd_bits()  (Sean Christopherson, 1 file changed, -1/+8)
Add compile-time asserts in rsvd_bits() to guard against KVM passing in garbage hardcoded values, and cap the upper bound at '63' for dynamic values to prevent generating a mask that would overflow a u64. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
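A sketch of the hardened helper described above; it mirrors the existing rsvd_bits() mask computation and adds the compile-time and runtime guards (treat the exact form as an approximation):

static inline u64 rsvd_bits(int s, int e)
{
	/* Reject nonsensical constant ranges at compile time. */
	BUILD_BUG_ON(__builtin_constant_p(e) && __builtin_constant_p(s) && e < s);

	/* Cap dynamic end bits so the shift below cannot overflow a u64. */
	if (__builtin_constant_p(e))
		BUILD_BUG_ON(e > 63);
	else
		e &= 63;

	if (e < s)
		return 0;

	return ((2ULL << (e - s)) - 1) << s;
}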
2021-01-25  KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM  (Quentin Perret, 1 file changed, -1/+1)
The documentation classifies KVM_ENABLE_CAP with KVM_CAP_ENABLE_CAP_VM as a vcpu ioctl, which is incorrect. Fix it by specifying it as a VM ioctl. Fixes: e5d83c74a580 ("kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic") Signed-off-by: Quentin Perret <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-01-25  Merge tag 'kvmarm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini, 6 files changed, -49/+74)
KVM/arm64 fixes for 5.11, take #2:
- Don't allow tagged pointers to point to memslots
- Filter out ARMv8.1+ PMU events on v8.0 hardware
- Hide PMU registers from userspace when no PMU is configured
- More PMU cleanups
- Don't try to handle broken PSCI firmware
- More sys_reg() to reg_to_encoding() conversions
2021-01-25  KVM: arm64: Implement the TRNG hypervisor call  (Ard Biesheuvel, 4 files changed, -1/+94)
Provide a hypervisor implementation of the ARM architected TRNG firmware interface described in ARM spec DEN0098. All function IDs are implemented, including both 32-bit and 64-bit versions of the TRNG_RND service, which is the centerpiece of the API. The API is backed by the kernel's entropy pool only, to avoid guests draining more precious direct entropy sources. Signed-off-by: Ard Biesheuvel <[email protected]> [Andre: minor fixes, drop arch_get_random() usage] Signed-off-by: Andre Przywara <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-01-25  KVM: arm64: Mark the page dirty only if the fault is handled successfully  (Yanan Wang, 1 file changed, -5/+8)
We now set the pfn dirty and mark the page dirty before calling the fault handlers in user_mem_abort(), so we might end up with spurious dirty pages if the update of permissions or the mapping has failed. Let's move these two operations after the fault handlers, so they are done only if the fault has been handled successfully. When an -EAGAIN errno is returned from the map handler, we want the vcpu to enter the guest directly instead of exiting back to userspace, so adjust the return value at the end of the function. Signed-off-by: Yanan Wang <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
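A hedged sketch of the resulting ordering at the end of user_mem_abort(); the helper name is invented for illustration, but the dirty/accessed calls and the -EAGAIN handling follow the description above.

/* Record dirty state only once the stage-2 update has succeeded. */
static int finish_user_mem_abort_sketch(struct kvm *kvm, kvm_pfn_t pfn,
					gfn_t gfn, bool writable, int ret)
{
	if (writable && !ret) {
		kvm_set_pfn_dirty(pfn);
		mark_page_dirty(kvm, gfn);
	}

	kvm_set_pfn_accessed(pfn);
	kvm_release_pfn_clean(pfn);

	/* -EAGAIN means "re-enter the guest", not an error to userspace. */
	return ret != -EAGAIN ? ret : 0;
}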
2021-01-25  KVM: arm64: Filter out the case of only changing permissions from stage-2 map path  (Yanan Wang, 2 files changed, -11/+26)
(1) While a VM with many vCPUs is running, if several vCPUs access the same GPA at almost the same time and the stage-2 mapping of that GPA has not been built yet, they will all take translation faults. The first vCPU builds the mapping, and the following ones end up updating the already-valid leaf PTE. Note that these vCPUs might want different access permissions (RO, RW, RX, RWX, etc.). (2) It is inevitable that we will sometimes update an existing valid leaf PTE in the map path, and we perform break-before-make in this case. More unnecessary translation faults can then be caused if the *break stage* of BBM is observed by other vCPUs. With (1) and (2), something unsatisfactory can happen: vCPU A takes a translation fault and builds the mapping with RW permissions, vCPU B then updates the valid leaf PTE with break-before-make and the permissions are dropped back to RO. Besides, the *break stage* of BBM may trigger more translation faults, so some useless small loops can occur. We can optimize this as follows: when we need to update a valid leaf PTE in the map path, filter out the case where the update only changes access permissions, and don't update the valid leaf PTE there. Instead, let the vCPU return to the guest; if it still wants more permissions, it will exit again and go through the relax_perms path without break-before-make. Signed-off-by: Yanan Wang <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-01-25  KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map  (Yanan Wang, 1 file changed, -27/+28)
The procedures for hyp stage-1 map and guest stage-2 map are quite different, but they are tied closely together by the function kvm_set_valid_leaf_pte(). Adjust the relevant code for ease of maintenance in the future. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Yanan Wang <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-01-25  KVM: arm64: Simplify __kvm_hyp_init HVC detection  (Andrew Scull, 1 file changed, -11/+4)
The arguments for __do_hyp_init are now passed with a pointer to a struct which means there are scratch registers available for use. Thanks to this, we no longer need to use clever, but hard to read, tricks that avoid the need for scratch registers when checking for the __kvm_hyp_init HVC. Tested-by: David Brazdil <[email protected]> Signed-off-by: Andrew Scull <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-01-25  KVM: arm64: Don't clobber x4 in __do_hyp_init  (Andrew Scull, 1 file changed, -9/+11)
arm_smccc_1_1_hvc() only adds write constraints for x0-3 in the inline assembly for the HVC instruction, so make sure those are the only registers that change when __do_hyp_init is called. Tested-by: David Brazdil <[email protected]> Signed-off-by: Andrew Scull <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-01-23  KVM: arm64: Remove hyp_symbol_addr  (David Brazdil, 5 files changed, -43/+17)
Hyp code used the hyp_symbol_addr helper to force PC-relative addressing because absolute addressing results in kernel VAs due to the way hyp code is linked. This is not true anymore, so remove the helper and update all of its users. Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: David Brazdil <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-01-23  KVM: arm64: Remove patching of fn pointers in hyp  (David Brazdil, 4 files changed, -32/+4)
Storing a function pointer in hyp now generates relocation information used at early boot to convert the address to hyp VA. The existing alternative-based conversion mechanism is therefore obsolete. Remove it and simplify its users. Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: David Brazdil <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-01-23  KVM: arm64: Fix constant-pool users in hyp  (David Brazdil, 3 files changed, -42/+31)
Hyp code uses absolute addressing to obtain a kimg VA of a small number of kernel symbols. Since the kernel now converts constant pool addresses to hyp VAs, this trick does not work anymore. Change the helpers to convert from hyp VA back to kimg VA or PA, as needed and rework the callers accordingly. Signed-off-by: David Brazdil <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-01-23  KVM: arm64: Apply hyp relocations at runtime  (David Brazdil, 4 files changed, -1/+33)
KVM nVHE code runs under a different VA mapping than the kernel, hence it has so far avoided using absolute addressing because the VA in a constant pool is relocated by the linker to a kernel VA (see hyp_symbol_addr). Now the kernel has access to a list of positions that contain a kimg VA but will be accessed only in hyp execution context. These are generated by the gen-hyprel build-time tool and stored in .hyp.reloc. Add an early-boot pass over the entries and convert the kimg VAs to hyp VAs. Note that this requires the .hyp* ELF sections to be mapped read-write at that point. Signed-off-by: David Brazdil <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
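A hedged sketch of the early-boot pass; the section boundary symbols and the VA-conversion helper are illustrative approximations of the arm64 implementation.

static void __init apply_hyp_relocations_sketch(void)
{
	u32 *rel;

	for (rel = (u32 *)__hyp_reloc_begin; rel < (u32 *)__hyp_reloc_end; rel++) {
		/* Each entry stores "target - &entry" as a 32-bit offset. */
		unsigned long *slot = (unsigned long *)((char *)rel + *rel);

		/* Rewrite the kernel image VA with the corresponding hyp VA. */
		*slot = __early_kern_hyp_va(*slot);
	}
}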
2021-01-23  KVM: arm64: Generate hyp relocation data  (David Brazdil, 4 files changed, -3/+451)
Add a post-processing step to compilation of KVM nVHE hyp code which calls a custom host tool (gen-hyprel) on the partially linked object file (hyp sections' names prefixed). The tool lists all R_AARCH64_ABS64 data relocations targeting hyp sections and generates an assembly file that will form a new section .hyp.reloc in the kernel binary. The new section contains an array of 32-bit offsets to the positions targeted by these relocations. Since the addresses of those positions will not be determined until linking of `vmlinux`, each 32-bit entry carries an R_AARCH64_PREL32 relocation with addend <section_base_sym> + <r_offset>. The linker of `vmlinux` will therefore fill the slot accordingly. This relocation data will be used at runtime to convert the kernel VAs at those positions to hyp VAs. Signed-off-by: David Brazdil <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-01-23  KVM: arm64: Add symbol at the beginning of each hyp section  (David Brazdil, 2 files changed, -4/+29)
Generating hyp relocations will require referencing positions at a given offset from the beginning of hyp sections. Since the final layout will not be determined until the linking of `vmlinux`, modify the hyp linker script to insert a symbol at the first byte of each hyp section to use as an anchor. The linker of `vmlinux` will place the symbols together with the sections. Signed-off-by: David Brazdil <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]