path: root/arch
2018-10-17kvm: vmx: Defer setting of DR6 until #DB deliveryJim Mattson2-31/+62
When exception payloads are enabled by userspace (which is not yet possible) and a #DB is raised in L2, defer the setting of DR6 until later. Under VMX, this allows the L1 hypervisor to intercept the fault before DR6 is modified. Under SVM, DR6 is modified before L1 can intercept the fault (as has always been the case with DR7). Note that the payload associated with a #DB exception includes only the "new DR6 bits." When the payload is delivered, DR6.B0-B3 will be cleared and DR6.RTM will be set prior to merging in the new DR6 bits. Also note that bit 16 in the "new DR6 bits" is set to indicate that a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. Though this is the reverse of DR6.RTM, it makes the #DB payload field compatible with both the pending debug exceptions field under VMX and the exit qualification for #DB exceptions under VMX. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
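As a rough illustration of the delivery-time merge described above, here is a minimal C sketch; the helper name db_payload_to_dr6() is invented for the example, and DR_TRAP_BITS and DR6_RTM are assumed to denote DR6.B0-B3 and DR6.RTM respectively:

    /* Sketch only: fold a deferred #DB payload ("new DR6 bits") into DR6. */
    static void db_payload_to_dr6(unsigned long *dr6, unsigned long payload)
    {
            *dr6 &= ~DR_TRAP_BITS;          /* clear DR6.B0-B3 */
            *dr6 |= DR6_RTM;                /* set DR6.RTM */
            *dr6 |= payload;                /* merge in the new DR6 bits */
            /* Payload bit 16 has the opposite polarity of DR6.RTM, so flip it back. */
            *dr6 ^= payload & DR6_RTM;
    }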
2018-10-17kvm: x86: Defer setting of CR2 until #PF deliveryJim Mattson4-21/+62
When exception payloads are enabled by userspace (which is not yet possible) and a #PF is raised in L2, defer the setting of CR2 until the #PF is delivered. This allows the L1 hypervisor to intercept the fault before CR2 is modified. For backwards compatibility, when exception payloads are not enabled by userspace, kvm_multiple_exception modifies CR2 when the #PF exception is raised. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17kvm: x86: Add payload operands to kvm_multiple_exceptionJim Mattson1-7/+15
kvm_multiple_exception now takes two additional operands: has_payload and payload, so that updates to CR2 (and DR6 under VMX) can be delayed until the exception is delivered. This is necessary to properly emulate VMX or SVM hardware behavior for nested virtualization. The new behavior is triggered by vcpu->kvm->arch.exception_payload_enabled, which will (later) be set by a new per-VM capability, KVM_CAP_EXCEPTION_PAYLOAD. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
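For orientation, the extended helper ends up with a signature along these lines (a sketch; the exact upstream parameter order may differ slightly):

    static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
                                       unsigned nr, bool has_error, u32 error_code,
                                       bool has_payload, unsigned long payload,
                                       bool reinject);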
2018-10-17kvm: x86: Add exception payload fields to kvm_vcpu_eventsJim Mattson3-18/+51
The per-VM capability KVM_CAP_EXCEPTION_PAYLOAD (to be introduced in a later commit) adds the following fields to struct kvm_vcpu_events: exception_has_payload, exception_payload, and exception.pending. With this capability set, all of the details of vcpu->arch.exception, including the payload for a pending exception, are reported to userspace in response to KVM_GET_VCPU_EVENTS. With this capability clear, the original ABI is preserved, and the exception.injected field is set for either pending or injected exceptions. When userspace calls KVM_SET_VCPU_EVENTS with KVM_CAP_EXCEPTION_PAYLOAD clear, exception.injected is no longer translated to exception.pending. KVM_SET_VCPU_EVENTS can now only establish a pending exception when KVM_CAP_EXCEPTION_PAYLOAD is set. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
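An abridged sketch of the userspace-visible structure, showing only the exception-related fields (placement and padding of the new fields is illustrative, not the exact uapi layout):

    struct kvm_vcpu_events {
            struct {
                    __u8  injected;
                    __u8  nr;
                    __u8  has_error_code;
                    __u8  pending;          /* reported when the capability is set */
                    __u32 error_code;
            } exception;
            /* ... interrupt, nmi, sipi_vector, flags, smi, reserved ... */
            __u8  exception_has_payload;    /* new */
            __u64 exception_payload;        /* new */
    };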
2018-10-17parisc: Add alternative coding infrastructureHelge Deller12-43/+223
This patch adds the necessary code to patch a running kernel at runtime to improve performance. The current implementation offers a few optimization variants: - When running an SMP kernel on a single UP processor, unwanted assembler statements like locking functions are overwritten with NOPs. When multiple instructions shall be skipped, one branch instruction is used instead of multiple nop instructions. - In the UP case, some pdtlb and pitlb instructions are patched to become pdtlb,l and pitlb,l which only flush the CPU-local tlb entries instead of broadcasting the flush to other CPUs in the system and thus may improve performance. - fic and fdc instructions are skipped if no I- or D-caches are installed. This should speed up qemu emulation and cacheless systems. - If no cache coherence is needed for IO operations, the relevant fdc and sync instructions in the sba and ccio drivers are replaced by nops. - On systems which share I- and D-TLBs and thus don't have a separate instruction TLB, the pitlb instruction is replaced by a nop. Live-patching is done early in the boot process, just after having run the system inventory. No drivers are running and thus no external interrupts should arrive. So the hope is that no TLB exceptions will occur during the patching. If this turns out to be wrong we will probably need to do the patching in real-mode. Signed-off-by: Helge Deller <[email protected]>
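Conceptually, each optimization records its patch sites in a table that is walked once at boot. The sketch below is heavily simplified and hypothetical: the struct layout, condition handling and the patch_site_applies()/patch_insn() helpers are illustrative rather than the actual parisc implementation, and only the single-instruction case is shown (the real code replaces longer sequences with a single branch instead of multiple NOPs):

    struct alt_instr {
            s32 orig_offset;        /* patch site, relative to this entry */
            s32 len;                /* number of instructions at the site */
            u32 cond;               /* condition: no SMP, no D-cache, no split TLB, ... */
            u32 replacement;        /* replacement instruction, e.g. a NOP */
    };

    static void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
    {
            struct alt_instr *entry;

            for (entry = start; entry < end; entry++) {
                    if (!patch_site_applies(entry->cond))
                            continue;
                    patch_insn((u32 *)((void *)entry + entry->orig_offset),
                               entry->replacement);
            }
    }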
2018-10-17Merge branch 'parisc-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linuxGreg Kroah-Hartman1-1/+1
Helge writes: "parisc fix: Fix an uninitialized variable usage in the parisc unwind code." * 'parisc-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix uninitialized variable usage in unwind.c
2018-10-17x86/fpu: Fix i486 + no387 boot crash by only saving FPU registers on context switch if there is an FPUSebastian Andrzej Siewior1-1/+1
Booting an i486 with "no387 nofxsr" ends with the following crash: math_emulate: 0060:c101987d Kernel panic - not syncing: Math emulation needed in kernel on the first context switch in user land. The reason is that copy_fpregs_to_fpstate() tries FNSAVE which does not work as the FPU is turned off. This bug was introduced in: f1c8cd0176078 ("x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active") Add a check for X86_FEATURE_FPU before trying to save FPU registers (we have such a check in switch_fpu_finish() already). Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: f1c8cd0176078 ("x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
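The shape of the fix, sketched (abridged from switch_fpu_prepare(); the tracing and bookkeeping around the save are elided):

    static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
    {
            /* Don't attempt FNSAVE/FXSAVE/XSAVE when no FPU is present at all. */
            if (static_cpu_has(X86_FEATURE_FPU) && old_fpu->initialized) {
                    if (!copy_fpregs_to_fpstate(old_fpu))
                            old_fpu->last_cpu = -1;
                    else
                            old_fpu->last_cpu = cpu;
            }
    }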
2018-10-17x86/fpu: Remove second definition of fpu in __fpu__restore_sig()Sebastian Andrzej Siewior1-1/+0
Commit: c5bedc6847c3b ("x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active") introduced the 'fpu' variable at top of __restore_xstate_sig(), which now shadows the other definition: arch/x86/kernel/fpu/signal.c:318:28: warning: symbol 'fpu' shadows an earlier one arch/x86/kernel/fpu/signal.c:271:20: originally declared here Remove the shadowed definition of 'fpu', as the two definitions are the same. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: c5bedc6847c3b ("x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-17x86/entry/64: Further improve paranoid_entry commentsAndy Lutomirski1-6/+4
Commit: 16561f27f94e ("x86/entry: Add some paranoid entry/exit CR3 handling comments") ... added some comments. This improves them a bit: - When I first read the new comments, it was unclear to me whether they were referring to the case where paranoid_entry interrupted other entry code or where paranoid_entry was itself interrupted. Clarify it. - Remove the EBX comment. We no longer use EBX as a SWAPGS indicator. Signed-off-by: Andy Lutomirski <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/c47daa1888dc2298e7e1d3f82bd76b776ea33393.1539542111.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2018-10-17x86/entry/32: Clear the CS high bitsJan Kiszka1-6/+7
Even if not on an entry stack, the CS's high bits must be initialized because they are unconditionally evaluated in PARANOID_EXIT_TO_KERNEL_MODE. Failing to do so broke the boot on Galileo Gen2 and IOT2000 boards. [ bp: Make the commit message tone passive and impartial. ] Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack") Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Joerg Roedel <[email protected]> Acked-by: Joerg Roedel <[email protected]> CC: "H. Peter Anvin" <[email protected]> CC: Andrea Arcangeli <[email protected]> CC: Andy Lutomirski <[email protected]> CC: Boris Ostrovsky <[email protected]> CC: Brian Gerst <[email protected]> CC: Dave Hansen <[email protected]> CC: David Laight <[email protected]> CC: Denys Vlasenko <[email protected]> CC: Eduardo Valentin <[email protected]> CC: Greg KH <[email protected]> CC: Ingo Molnar <[email protected]> CC: Jiri Kosina <[email protected]> CC: Josh Poimboeuf <[email protected]> CC: Juergen Gross <[email protected]> CC: Linus Torvalds <[email protected]> CC: Peter Zijlstra <[email protected]> CC: Thomas Gleixner <[email protected]> CC: Will Deacon <[email protected]> CC: [email protected] CC: [email protected] CC: [email protected] CC: [email protected] CC: linux-mm <[email protected]> CC: x86-ml <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-17x86/kconfig: Remove redundant 'default n' lines from all x86 Kconfig'sBartlomiej Zolnierkiewicz3-9/+0
'default n' is the default value for any bool or tristate Kconfig setting so there is no need to write it explicitly. Also, since commit: f467c5640c29 ("kconfig: only write '# CONFIG_FOO is not set' for visible symbols") ... the Kconfig behavior is the same regardless of 'default n' being present or not:

  ... One side effect of (and the main motivation for) this change is making the following two definitions behave exactly the same:

    config FOO
            bool

    config FOO
            bool
            default n

  With this change, neither of these will generate a '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied). That might make it clearer to people that a bare 'default n' is redundant. ...

Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/20181016134217eucas1p2102984488b89178a865162553369025b%[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-17parisc: Include compressed vmlinux file in vmlinuz boot kernelHelge Deller4-30/+91
Change the parisc vmlinuz boot code to include and process the real compressed vmlinux.gz ELF file instead of a compressed memory dump. This brings parisc in sync with how it's done on x86_64. The benefit of this change is that, e.g. for debugging purposes, one can then extract the vmlinux file out of the vmlinuz which was booted, which wasn't possible before. This can be achieved with the existing scripts/extract-vmlinux script, which just needs a small tweak to prefer to extract a compressed file before trying the existing given binary. The downside of this approach is that due to the extra round of decompression/ELF processing we need more physical memory installed to be able to boot a kernel. Signed-off-by: Helge Deller <[email protected]>
2018-10-17parisc: Fix address in HPMC IVAJohn David Anglin2-2/+3
Helge noticed that the address of the os_hpmc handler was not being correctly calculated in the hpmc macro. As a result, PDCE_CHECK would fail to call os_hpmc:

 <Cpu2> e800009802e00000  0000000000000000  CC_ERR_CHECK_HPMC
 <Cpu2> 37000f7302e00000  8040004000000000  CC_ERR_CPU_CHECK_SUMMARY
 <Cpu2> f600105e02e00000  fffffff0f0c00000  CC_MC_HPMC_MONARCH_SELECTED
 <Cpu2> 140003b202e00000  000000000000000b  CC_ERR_HPMC_STATE_ENTRY
 <Cpu2> 5600100b02e00000  00000000000001a0  CC_MC_OS_HPMC_LEN_ERR
 <Cpu2> 5600106402e00000  fffffff0f0438e70  CC_MC_BR_TO_OS_HPMC_FAILED
 <Cpu2> e800009802e00000  0000000000000000  CC_ERR_CHECK_HPMC
 <Cpu2> 37000f7302e00000  8040004000000000  CC_ERR_CPU_CHECK_SUMMARY
 <Cpu2> 4000109f02e00000  0000000000000000  CC_MC_HPMC_INITIATED
 <Cpu2> 4000101902e00000  0000000000000000  CC_MC_MULTIPLE_HPMCS
 <Cpu2> 030010d502e00000  0000000000000000  CC_CPU_STOP

The address problem can be seen by dumping the fault vector:

 0000000040159000 <fault_vector_20>:
 40159000: 63 6f 77 73   stb r15,-2447(dp)
 40159004: 20 63 61 6e   ldil L%b747000,r3
 40159008: 20 66 6c 79   ldil L%-1c3b3000,r3
 ...
 40159020: 08 00 02 40   nop
 40159024: 20 6e 60 02   ldil L%15d000,r3
 40159028: 34 63 00 00   ldo 0(r3),r3
 4015902c: e8 60 c0 02   bv,n r0(r3)
 40159030: 08 00 02 40   nop
 40159034: 00 00 00 00   break 0,0
 40159038: c0 00 70 00   bb,*< r0,sar,40159840 <fault_vector_20+0x840>
 4015903c: 00 00 00 00   break 0,0

Location 40159038 should contain the physical address of os_hpmc:

 000000004015d000 <os_hpmc>:
 4015d000: 08 1a 02 43   copy r26,r3
 4015d004: 01 c0 08 a4   mfctl iva,r4
 4015d008: 48 85 00 68   ldw 34(r4),r5

This patch moves the address setup into initialize_ivt to resolve the above problem. I tested the change by dumping the HPMC entry after setup:

 0000000040209020: 8000240
 0000000040209024: 206a2004
 0000000040209028: 34630ac0
 000000004020902c: e860c002
 0000000040209030: 8000240
 0000000040209034: 1bdddce6
 0000000040209038: 15d000
 000000004020903c: 1a0

Signed-off-by: John David Anglin <[email protected]> Cc: <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2018-10-17parisc: Fix exported address of os_hpmc handlerHelge Deller1-2/+1
In the C code we need to put the physical address of the hpmc handler in the interrupt vector table (IVA) in order to get HPMCs working. Since on parisc64 function pointers are indirect (in fact they are function descriptors) we instead export the address as a variable and not as a function. This reverts a small part of commit f39cce654f9a ("parisc: Add cfi_startproc and cfi_endproc to assembly code"). Signed-off-by: Helge Deller <[email protected]> Cc: <[email protected]> [4.9+]
2018-10-17parisc: Fix map_pages() to not overwrite existing pte entriesHelge Deller1-6/+2
Fix a long-existing small nasty bug in the map_pages() implementation which leads to overwriting already written pte entries with zero, *if* map_pages() is called a second time with an end address which isn't aligned on a pmd boundary. This happens for example if we want to remap only the text segment read/write in order to run alternative patching on the code. Exiting the loop when we reach the end address fixes this. Cc: [email protected] Signed-off-by: Helge Deller <[email protected]>
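The essence of the fix, sketched from the innermost pte loop of map_pages() (surrounding pgd/pmd walking elided; exact variable names may differ):

    /* inner pte loop of map_pages(), per page */
    if (address >= end_paddr)
            break;          /* previously a zero pte was written here, clobbering
                               entries established by an earlier map_pages() call */
    set_pte(pg_table, __mk_pte(address, pgprot));
    address += PAGE_SIZE;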
2018-10-17parisc: Purge TLB entries after updating page table entry and set page accessed flag in TLB handlerJohn David Anglin2-15/+13
This patch may resolve some races in TLB handling. Hopefully, TLB inserts are accesses and are protected by the spin lock. If not, we may need to use IPI calls and do local purges on PA 2.0. Signed-off-by: John David Anglin <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2018-10-17parisc: Release spinlocks using ordered storeJohn David Anglin2-10/+6
This patch updates the spin unlock code to use an ordered store with release semantics. All prior accesses are guaranteed to be performed before an ordered store is performed. Using an ordered store is significantly faster than using the sync memory barrier. Signed-off-by: John David Anglin <[email protected]> Signed-off-by: Helge Deller <[email protected]>
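A sketch of the unlock path after the change, assuming the parisc convention that a zero-displacement stw,ma assembles to the ordered-store (stw,o) form:

    static inline void arch_spin_unlock(arch_spinlock_t *x)
    {
            volatile unsigned int *a = __ldcw_align(x);

            /* Ordered store: all prior accesses complete before the lock word
             * is written, so no explicit "sync" barrier is needed. */
            __asm__ __volatile__("stw,ma %1,0(%0)"
                                 : : "r" (a), "r" (1) : "memory");
    }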
2018-10-17parisc: Clean up crash header outputHelge Deller1-2/+2
On kernel crash, this is the current output: Kernel Fault: Code=26 (Data memory access rights trap) regs=(ptrval) (Addr=00000004) Drop the address of regs, it's of no use for debugging, and show the faulty address without parentheses. Signed-off-by: Helge Deller <[email protected]>
2018-10-17parisc: Add SYSTEM_INFO and REGISTER TOC PAT functionsHelge Deller1-0/+8
Signed-off-by: Helge Deller <[email protected]>
2018-10-17parisc: Remove PTE load and fault check from L2_ptep macroJohn David Anglin1-6/+6
This change removes the PTE load and present check from the L2_ptep macro. The load and check for kernel pages is now done in the tlb_lock macro. This avoids a double load and check for user pages. The load and check for user pages is now done inside the lock so the fault handler can't be called while the entry is being updated. This version uses an ordered store to release the lock when the page table entry isn't present. It also corrects the check in the non-SMP case. Signed-off-by: John David Anglin <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2018-10-17parisc: Reorder TLB flush timing calculationJohn David Anglin1-8/+10
On boot (mostly reboot), my c8000 sometimes crashes after it prints the TLB flush threshold. The lockup is hard. The front LED flashes red and the box must be unplugged to reset the error. I noticed that when the crash occurs the TLB flush threshold is about one quarter what it is on a successful boot. If I disabled the calculation, the crash didn't occur. There also seemed to be a timing dependency affecting the crash. I finally realized that the flush_tlb_all() timing test runs just after the secondary CPUs are started. There seems to be a problem with running flush_tlb_all() too soon after the CPUs are started. The timing for the range test always seemed okay. So, I reversed the order of the two timing tests and I haven't had a crash at this point so far. I added a couple of information messages which I have left to help with diagnosis if the problem should appear on another machine. This version reduces the minimum TLB flush threshold to 16 KiB. Signed-off-by: John David Anglin <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2018-10-17parisc: remove check for minimum required GCC versionMasahiro Yamada1-9/+0
Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6") bumped the minimum GCC version to 4.6 for all architectures. The version check in arch/parisc/Makefile is obsolete now. Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2018-10-17parisc: Use PARISC_ITLB_TRAP constant in entry.SHelge Deller1-1/+1
Fixes: 5b00ca0b8035 ("parisc: Restore possibility to execute 64-bit applications") Signed-off-by: Helge Deller <[email protected]>
2018-10-17kvm: x86: Add has_payload and payload to kvm_queued_exceptionJim Mattson2-0/+10
The payload associated with a #PF exception is the linear address of the fault to be loaded into CR2 when the fault is delivered. The payload associated with a #DB exception is a mask of the DR6 bits to be set (or in the case of DR6.RTM, cleared) when the fault is delivered. Add fields has_payload and payload to kvm_queued_exception to track payloads for pending exceptions. The new fields are introduced here, but for now, they are just cleared. Reported-by: Jim Mattson <[email protected]> Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
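An abridged sketch of the structure with the new fields (existing members shown for context, others elided):

    struct kvm_queued_exception {
            bool pending;
            bool injected;
            bool has_error_code;
            u8   nr;
            u32  error_code;
            /* new: deferred payload (CR2 for #PF, "new DR6 bits" for #DB) */
            bool          has_payload;
            unsigned long payload;
            /* ... */
    };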
2018-10-16MIPS: Cleanup DSP ASE detectionPaul Burton3-19/+21
Currently we hardcode a list of files for which we specify that the toolchain has DSP ASE support when building for MIPSr2 only. This has a number of problems: 1) It doesn't actually ensure that the toolchain supports the DSP ASE at all. 2) It's fragile if we try to use DSP ASE macros in other files. 3) It makes no provision for MIPSr6 & later systems which also support the DSP ASE & end up using the .word directive implementation of the DSP macros. Fix this by detecting assembler support for the DSP ASE globally, not just for a small set of files, and not just for MIPSr2. This now exposes use of toolchain DSP support to kernel builds targeting MIPSr1 and older, so we add .set MIPS_ISA_LEVEL directives prior to all .set dsp directives in order to prevent the assembler from complaining that the DSP ASE is only supported with MIPSr2 & higher. Signed-off-by: Paul Burton <[email protected]> Patchwork: https://patchwork.linux-mips.org/patch/20901/ Cc: [email protected]
2018-10-17x86/kvm/nVMX: nested state migration for Enlightened VMCSVitaly Kuznetsov3-20/+65
Add support for get/set of nested state when Enlightened VMCS is in use. A new KVM_STATE_NESTED_EVMCS flag is added to indicate that eVMCS was enabled on the vCPU. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/nVMX: allow bare VMXON state migrationVitaly Kuznetsov1-7/+8
It is perfectly valid for a guest to do VMXON and not do VMPTRLD. This state needs to be preserved on migration. Cc: [email protected] Fixes: 8fcc4b5923af5de58b80b53a069453b135693304 Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/lapic: preserve gfn_to_hva_cache len on cache reinitVitaly Kuznetsov1-2/+10
vcpu->arch.pv_eoi is accessible through both HV_X64_MSR_VP_ASSIST_PAGE and MSR_KVM_PV_EOI_EN so on migration userspace may try to restore them in any order. Values match, however, kvm_lapic_enable_pv_eoi() uses a different length: for the Hyper-V case it's the whole struct hv_vp_assist_page, for the KVM-native case it is 8. In case we restore the KVM-native MSR last, the cache will be reinitialized with len=8, so trying to access the VP assist page beyond 8 bytes with kvm_read_guest_cached() will fail. Check if we are re-initializing the cache for the same address and preserve the length in case it was greater. Signed-off-by: Vitaly Kuznetsov <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
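The length-preserving check, sketched (loosely following kvm_lapic_enable_pv_eoi(); field names abridged):

    struct gfn_to_hva_cache *ghc = &vcpu->arch.pv_eoi.data;
    unsigned long new_len;

    if (addr == ghc->gpa && len <= ghc->len)
            new_len = ghc->len;     /* same page as before: keep the longer mapping */
    else
            new_len = len;

    return kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, addr, new_len);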
2018-10-17x86/kvm/hyperv: don't clear VP assist pages on initVitaly Kuznetsov1-1/+7
VP assist pages may hold valuable data which needs to be preserved across migration. Clean PV EOI portion of the data on init, the guest is responsible for making sure there's no garbage in the rest. This will be used for nVMX migration, eVMCS address needs to be preserved. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: nVMX: optimize prepare_vmcs02{,_full} for Enlightened VMCS caseVitaly Kuznetsov1-53/+65
When Enlightened VMCS is in use by L1 hypervisor we can avoid vmwriting VMCS fields which did not change. Our first goal is to achieve minimal impact on traditional VMCS case so we're not wrapping each vmwrite() with an if-changed checker. We also can't utilize static keys as Enlightened VMCS usage is per-guest. This patch implements the simplest solution: checking fields in groups. We skip single vmwrite() statements as doing the check will cost us something even in non-evmcs case and the win is tiny. Unfortunately, this makes prepare_vmcs02{,_full}() code Enlightened VMCS-dependent (and a bit ugly). Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
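The group-check pattern looks roughly like this (the clean-field constant is indicative of the HV_VMX_ENLIGHTENED_CLEAN_FIELD_* family rather than the exact name guarding these particular fields):

    struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;

    if (!evmcs || !(evmcs->hv_clean_fields &
                    HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) {
            /* L1 marked this group dirty (or we're not on eVMCS): vmwrite it. */
            vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
            vmcs_writel(GUEST_RFLAGS, vmcs12->guest_rflags);
            /* ... remaining fields of the same group ... */
    }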
2018-10-17KVM: nVMX: implement enlightened VMPTRLD and VMCLEARVitaly Kuznetsov1-7/+108
Per Hyper-V TLFS 5.0b: "The L1 hypervisor may choose to use enlightened VMCSs by writing 1 to the corresponding field in the VP assist page (see section 7.8.7). Another field in the VP assist page controls the currently active enlightened VMCS. Each enlightened VMCS is exactly one page (4 KB) in size and must be initially zeroed. No VMPTRLD instruction must be executed to make an enlightened VMCS active or current. After the L1 hypervisor performs a VM entry with an enlightened VMCS, the VMCS is considered active on the processor. An enlightened VMCS can only be active on a single processor at the same time. The L1 hypervisor can execute a VMCLEAR instruction to transition an enlightened VMCS from the active to the non-active state. Any VMREAD or VMWRITE instructions while an enlightened VMCS is active is unsupported and can result in unexpected behavior." Keep Enlightened VMCS structure for the current L2 guest permanently mapped from struct nested_vmx instead of mapping it every time. Suggested-by: Ladi Prosek <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: nVMX: add enlightened VMCS stateVitaly Kuznetsov1-17/+423
Add an hv_evmcs pointer and implement copy_enlightened_to_vmcs12() and copy_vmcs12_to_enlightened(). The prepare_vmcs02()/prepare_vmcs02_full() separation is not valid for Enlightened VMCS, do full sync for now. Suggested-by: Ladi Prosek <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capabilityVitaly Kuznetsov4-0/+64
Enlightened VMCS is opt-in. The current version does not contain all fields supported by nested VMX so we must not advertise the corresponding VMX features if enlightened VMCS is enabled. Userspace is given the enlightened VMCS version supported by KVM as part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to be advertised to the nested hypervisor, currently done via a cpuid leaf for Hyper-V. Suggested-by: Ladi Prosek <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
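From userspace, enabling the capability and retrieving the supported version might look like the fragment below (assuming, as described, that the version is written back through a pointer passed in args[0]; treat this as a sketch rather than a verbatim API reference):

    __u16 evmcs_version = 0;
    struct kvm_enable_cap cap = {
            .cap     = KVM_CAP_HYPERV_ENLIGHTENED_VMCS,
            .args[0] = (__u64)(unsigned long)&evmcs_version,
    };

    if (ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap) == 0)
            printf("KVM supports enlightened VMCS version %u\n", evmcs_version);
    /* evmcs_version is then advertised to the nested hypervisor via CPUID. */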
2018-10-17KVM: VMX: refactor evmcs_sanitize_exec_ctrls()Vitaly Kuznetsov1-61/+47
Split off EVMCS1_UNSUPPORTED_* macros so we can re-use them when enabling Enlightened VMCS for Hyper-V on KVM. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: hyperv: define VP assist page helpersLadi Prosek5-6/+29
The state related to the VP assist page is still managed by the LAPIC code in the pv_eoi field. Signed-off-by: Ladi Prosek <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM: x86: reintroduce pte_list_remove, but including mmu_spte_clear_track_bitsWei Yang1-3/+9
rmap_remove() removes the sptep after locating the correct rmap_head but, in several cases, the caller already knows the correct rmap_head. This patch introduces a new pte_list_remove(); because it is known that the spte is present (or it would not have an rmap_head), it is safe to remove the tracking bits without any previous check. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
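The new wrapper is essentially (as a sketch):

    static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
    {
            /* The spte is known to be present (it has an rmap_head), so clear
             * its tracking bits and drop it from the rmap without rechecking. */
            mmu_spte_clear_track_bits(sptep);
            __pte_list_remove(sptep, rmap_head);
    }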
2018-10-17KVM: x86: rename pte_list_remove to __pte_list_removeWei Yang1-8/+8
This is a preparatory patch for a further change. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17kvm/x86 : fix some typoPeng Hao1-2/+2
Signed-off-by: Peng Hao <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM/VMX: Change hv flush logic when ept tables are mismatched.Lan Tianyu1-7/+8
If ept table pointers are mismatched, flushing the tlb for each vcpu via the hv flush interface still helps to reduce vmexits which are triggered by IPI and INVEPT emulation. Signed-off-by: Lan Tianyu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM/x86: Use 32bit xor to clear registerUros Bizjak1-2/+2
x86_64 zero-extends 32bit xor to a full 64bit register. Use %k asm operand modifier to force 32bit register and save 268 bytes in kvm.o Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
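The idea, shown with a hypothetical extended-asm fragment (not the actual kvm source): the %k operand modifier makes the compiler emit the 32-bit name of whichever register it allocated, and the hardware zero-extends the result to the full 64-bit register.

    unsigned long reg;

    /* Before: 64-bit form, needs a REX.W prefix (e.g. xor %rax,%rax). */
    asm("xor %0, %0" : "=r" (reg));

    /* After: %k forces the 32-bit name (e.g. xor %eax,%eax); the upper
     * 32 bits are zeroed anyway and the encoding is one byte shorter. */
    asm("xor %k0, %k0" : "=r" (reg));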
2018-10-17KVM/x86: Use assembly instruction mnemonics instead of .byte streamsUros Bizjak3-40/+21
Recently the minimum required version of binutils was changed to 2.20, which supports all VMX instruction mnemonics. The patch removes all .byte #defines and uses real instruction mnemonics instead. The compiler is now able to pass memory operand to the instruction, so there is no need for memory clobber anymore. Also, the compiler adds CC register clobber automatically to all extended asm clauses, so the patch also removes explicit CC clobber. The immediate benefit of the patch is removal of many unnecessary register moves, resulting in 1434 saved bytes in vmx.o:

     text    data     bss     dec     hex filename
   151257   18246    8500  178003   2b753 vmx.o
   152691   18246    8500  179437   2bced vmx-old.o

Some examples of improvement include removal of unneeded moves of %rsp to %rax in front of invept and invvpid instructions:

    a57e: b9 01 00 00 00          mov    $0x1,%ecx
    a583: 48 89 04 24             mov    %rax,(%rsp)
    a587: 48 89 e0                mov    %rsp,%rax
    a58a: 48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
    a591: 00 00
    a593: 66 0f 38 80 08          invept (%rax),%rcx

to:

    a45c: 48 89 04 24             mov    %rax,(%rsp)
    a460: b8 01 00 00 00          mov    $0x1,%eax
    a465: 48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
    a46c: 00 00
    a46e: 66 0f 38 80 04 24       invept (%rsp),%rax

and the ability to use more optimal registers and memory operands in the instruction:

    8faa: 48 8b 44 24 28          mov    0x28(%rsp),%rax
    8faf: 4c 89 c2                mov    %r8,%rdx
    8fb2: 0f 79 d0                vmwrite %rax,%rdx

to:

    8e7c: 44 0f 79 44 24 28       vmwrite 0x28(%rsp),%r8

Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17KVM/x86: Fix invvpid and invept register operand size in 64-bit modeUros Bizjak1-2/+2
In 64-bit mode, the register operand of the invvpid and invept instructions is always 64 bits. Adjust the inline function argument type to reflect the correct size. Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/mmu: check if MMU reconfiguration is needed in init_kvm_nested_mmu()Vitaly Kuznetsov1-0/+6
We don't use root page role for nested_mmu, however, optimizing out re-initialization in case nothing changed is still valuable as this is done for every nested vmentry. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is neededVitaly Kuznetsov2-34/+63
MMU reconfiguration in init_kvm_tdp_mmu()/kvm_init_shadow_mmu() can be avoided if the source data used to configure it didn't change; enhance MMU extended role with the required fields and consolidate common code in kvm_calc_mmu_role_common(). Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
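The avoidance check boils down to comparing the freshly computed role with the cached one, roughly (helper name approximate):

    union kvm_mmu_role new_role = kvm_calc_tdp_mmu_root_page_role(vcpu);

    if (new_role.as_u64 == context->mmu_role.as_u64)
            return;         /* nothing relevant changed, skip the re-init */

    context->mmu_role.as_u64 = new_role.as_u64;
    /* ... continue with the full MMU (re)configuration ... */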
2018-10-17x86/kvm/nVMX: introduce source data cache for kvm_init_shadow_ept_mmu()Vitaly Kuznetsov2-15/+52
MMU re-initialization is expensive, in particular, update_permission_bitmask() and update_pkru_bitmask() are. Cache the data used to set up shadow EPT MMU and avoid full re-init when it is unchanged. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/mmu: make space for source data caching in struct kvm_mmuVitaly Kuznetsov3-9/+35
In preparation to MMU reconfiguration avoidance we need a space to cache source data. As this partially intersects with kvm_mmu_page_role, create 64bit sized union kvm_mmu_role holding both base and extended data. No functional change. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
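The shape of the new union, abridged (the exact set of extended bits is illustrative):

    union kvm_mmu_extended_role {
            u32 word;
            struct {
                    unsigned int valid:1;
                    unsigned int execonly:1;
                    unsigned int cr4_pse:1;
                    unsigned int cr4_pke:1;
                    unsigned int cr4_smap:1;
                    unsigned int cr4_smep:1;
                    /* ... */
            };
    };

    union kvm_mmu_role {
            u64 as_u64;
            struct {
                    union kvm_mmu_page_role base;           /* existing 32-bit role */
                    union kvm_mmu_extended_role ext;        /* new cached source data */
            };
    };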
2018-10-17x86/kvm/mmu: get rid of redundant kvm_mmu_setup()Paolo Bonzini3-14/+1
Just inline the contents into the sole caller, kvm_init_mmu is now public. Suggested-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Sean Christopherson <[email protected]>
2018-10-17x86/kvm/mmu: introduce guest_mmuVitaly Kuznetsov3-18/+40
When EPT is used for nested guest we need to re-init MMU as shadow EPT MMU (nested_ept_init_mmu_context() does that). When we return back from L2 to L1 kvm_mmu_reset_context() in nested_vmx_load_cr3() resets MMU back to normal TDP mode. Add a special 'guest_mmu' so we can use separate root caches; the improved hit rate is not very important for single vCPU performance, but it avoids contention on the mmu_lock for many vCPUs. On the nested CPUID benchmark, with 16 vCPUs, an L2->L1->L2 vmexit goes from 42k to 26k cycles. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2018-10-17x86/kvm/mmu.c: add kvm_mmu parameter to kvm_mmu_free_roots()Vitaly Kuznetsov3-6/+8
Add an option to specify which MMU root we want to free. This will be used when nested and non-nested MMUs for L1 are split. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Sean Christopherson <[email protected]>
2018-10-17x86/kvm/mmu.c: set get_pdptr hook in kvm_init_shadow_ept_mmu()Vitaly Kuznetsov2-0/+2
kvm_init_shadow_ept_mmu() doesn't set the get_pdptr() hook, and this is not a problem just because the MMU context is already initialized and this hook points to kvm_pdptr_read(). As we intend to use a dedicated MMU for shadow EPT, set this hook explicitly. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: Sean Christopherson <[email protected]>