aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2020-09-28KVM: x86: emulating RDPID failure shall return #UD rather than #GPRobert Hoo1-1/+1
Per Intel's SDM, RDPID takes a #UD if it is unsupported, which is more or less what KVM is emulating when MSR_TSC_AUX is not available. In fact, there are no scenarios in which RDPID is supposed to #GP. Fixes: fb6d4d340e ("KVM: x86: emulate RDPID") Signed-off-by: Robert Hoo <[email protected]> Message-Id: <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: nVMX: Morph notification vector IRQ on nested VM-Enter to pending PISean Christopherson3-0/+16
On successful nested VM-Enter, check for pending interrupts and convert the highest priority interrupt to a pending posted interrupt if it matches L2's notification vector. If the vCPU receives a notification interrupt before nested VM-Enter (assuming L1 disables IRQs before doing VM-Enter), the pending interrupt (for L1) should be recognized and processed as a posted interrupt when interrupts become unblocked after VM-Enter to L2. This fixes a bug where L1/L2 will get stuck in an infinite loop if L1 is trying to inject an interrupt into L2 by setting the appropriate bit in L2's PIR and sending a self-IPI prior to VM-Enter (as opposed to KVM's method of manually moving the vector from PIR->vIRR/RVI). KVM will observe the IPI while the vCPU is in L1 context and so won't immediately morph it to a posted interrupt for L2. The pending interrupt will be seen by vmx_check_nested_events(), cause KVM to force an immediate exit after nested VM-Enter, and eventually be reflected to L1 as a VM-Exit. After handling the VM-Exit, L1 will see that L2 has a pending interrupt in PIR, send another IPI, and repeat until L2 is killed. Note, posted interrupts require virtual interrupt deliveriy, and virtual interrupt delivery requires exit-on-interrupt, ergo interrupts will be unconditionally unmasked on VM-Enter if posted interrupts are enabled. Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Cc: [email protected] Cc: Liran Alon <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Add tracepoint for cr_interceptionHaiwei Li1-0/+2
Add trace_kvm_cr_write and trace_kvm_cr_read for svm. Signed-off-by: Haiwei Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Analyze is_guest_mode() in svm_vcpu_run()Wanpeng Li1-5/+6
Analyze is_guest_mode() in svm_vcpu_run() instead of svm_exit_handlers_fastpath() in conformity with VMX version. Suggested-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: VMX: Invoke NMI handler via indirect call instead of INTnSean Christopherson1-15/+15
Rework NMI VM-Exit handling to invoke the kernel handler by function call instead of INTn. INTn microcode is relatively expensive, and aligning the IRQ and NMI handling will make it easier to update KVM should some newfangled method for invoking the handlers come along. Suggested-by: Andi Kleen <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: VMX: Move IRQ invocation to assembly subroutineSean Christopherson2-30/+37
Move the asm blob that invokes the appropriate IRQ handler after VM-Exit into a proper subroutine. Unconditionally create a stack frame in the subroutine so that, as objtool sees things, the function has standard stack behavior. The dynamic stack adjustment makes using unwind hints problematic. Suggested-by: Josh Poimboeuf <[email protected]> Cc: Uros Bizjak <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: x86: Add kvm_x86_ops hook to short circuit emulationSean Christopherson5-35/+36
Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a more generic is_emulatable(), and unconditionally call the new function in x86_emulate_instruction(). KVM will use the generic hook to support multiple security related technologies that prevent emulation in one way or another. Similar to the existing AMD #NPF case where emulation of the current instruction is not possible due to lack of information, AMD's SEV-ES and Intel's SGX and TDX will introduce scenarios where emulation is impossible due to the guest's register state being inaccessible. And again similar to the existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(), i.e. outside of the control of vendor-specific code. While the cause and architecturally visible behavior of the various cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean resume or complete shutdown, and SEV-ES and TDX "return" an error, the impact on the common emulation code is identical: KVM must stop emulation immediately and resume the guest. Query is_emulatable() in handle_ud() as well so that the force_emulation_prefix code doesn't incorrectly modify RIP before calling emulate_instruction() in the absurdly unlikely scenario that KVM encounters forced emulation in conjunction with "do not emulate". Cc: Tom Lendacky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: use __GFP_ZERO instead of clear_page()Haiwei Li1-4/+2
Use __GFP_ZERO while alloc_page(). Signed-off-by: Haiwei Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in ↵Krish Sadhukhan3-8/+19
vmcs02 if vmcs12 doesn't set it Currently, prepare_vmcs02_early() does not check if the "unrestricted guest" VM-execution control in vmcs12 is turned off and leaves the corresponding bit on in vmcs02. Due to this setting, vmentry checks which are supposed to render the nested guest state as invalid when this VM-execution control is not set, are passing in hardware. This patch turns off the "unrestricted guest" VM-execution control in vmcs02 if vmcs12 has turned it off. Suggested-by: Jim Mattson <[email protected]> Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Krish Sadhukhan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: x86: fix MSR_IA32_TSC read for nested migrationMaxim Levitsky1-2/+14
MSR reads/writes should always access the L1 state, since the (nested) hypervisor should intercept all the msrs it wants to adjust, and these that it doesn't should be read by the guest as if the host had read it. However IA32_TSC is an exception. Even when not intercepted, guest still reads the value + TSC offset. The write however does not take any TSC offset into account. This is documented in Intel's SDM and seems also to happen on AMD as well. This creates a problem when userspace wants to read the IA32_TSC value and then write it. (e.g for migration) In this case it reads L2 value but write is interpreted as an L1 value. To fix this make the userspace initiated reads of IA32_TSC return L1 value as well. Huge thanks to Dave Gilbert for helping me understand this very confusing semantic of MSR writes. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Enable INVPCID feature on AMDBabu Moger2-0/+53
The following intercept bit has been added to support VMEXIT for INVPCID instruction: Code Name Cause A2h VMEXIT_INVPCID INVPCID instruction The following bit has been added to the VMCB layout control area to control intercept of INVPCID: Byte Offset Bit(s) Function 14h 2 intercept INVPCID Enable the interceptions when the the guest is running with shadow page table enabled and handle the tlbflush based on the invpcid instruction type. For the guests with nested page table (NPT) support, the INVPCID feature works as running it natively. KVM does not need to do any special handling in this case. AMD documentation for INVPCID feature is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985255929.11252.17346684135277453258.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: X86: Move handling of INVPCID types to x86Babu Moger3-67/+80
INVPCID instruction handling is mostly same across both VMX and SVM. So, move the code to common x86.c. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.cBabu Moger5-36/+37
Handling of kvm_read/write_guest_virt*() errors can be moved to common code. The same code can be used by both VMX and SVM. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_interceptBabu Moger2-42/+17
Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead call generic svm_set_intercept, svm_clr_intercept an dsvm_is_intercep tfor all cr intercepts. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985253016.11252.16945893859439811480.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Add new intercept word in vmcb_control_areaBabu Moger3-6/+17
The new intercept bits have been added in vmcb control area to support few more interceptions. Here are the some of them. - INTERCEPT_INVLPGB, - INTERCEPT_INVLPGB_ILLEGAL, - INTERCEPT_INVPCID, - INTERCEPT_MCOMMIT, - INTERCEPT_TLBSYNC, Add a new intercept word in vmcb_control_area to support these instructions. Also update kvm_nested_vmrun trace function to support the new addition. AMD documentation for these instructions is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Modify 64 bit intercept field to two 32 bit vectorsBabu Moger5-45/+41
Convert all the intercepts to one array of 32 bit vectors in vmcb_control_area. This makes it easy for future intercept vector additions. Also update trace functions. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Modify intercept_exceptions to generic interceptsBabu Moger4-12/+15
Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_exceptions bits. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985250037.11252.1361972528657052410.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Change intercept_dr to generic interceptsBabu Moger4-42/+38
Modify intercept_dr to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_dr bits. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985249255.11252.10000868032136333355.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Change intercept_cr to generic interceptsBabu Moger4-24/+23
Change intercept_cr to generic intercepts in vmcb_control_area. Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept where applicable. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985248506.11252.9081085950784508671.stgit@bmoger-ubuntu> [Change constant names. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)Babu Moger3-0/+39
This is in preparation for the future intercept vector additions. Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept using kernel APIs __set_bit, __clear_bit and test_bit espectively. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985247876.11252.16039238014239824460.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: nSVM: Remove unused fieldBabu Moger2-3/+0
host_intercept_exceptions is not used anywhere. Clean it up. Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <159985252277.11252.8819848322175521354.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: refactor exit labels in svm_create_vcpuMaxim Levitsky1-7/+7
Kernel coding style suggests not to use labels like error1,error2 Suggested-by: Jim Mattson <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: use __GFP_ZERO instead of clear_pageMaxim Levitsky1-4/+2
Another small refactoring. Suggested-by: Jim Mattson <[email protected]> Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: refactor msr permission bitmap allocationMaxim Levitsky1-22/+23
Replace svm_vcpu_init_msrpm with svm_vcpu_alloc_msrpm, that also allocates the msr bitmap and add svm_vcpu_free_msrpm to free it. This will be used later to move the nested msr permission bitmap allocation to nested.c Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: nSVM: rename nested vmcb to vmcb12Maxim Levitsky3-119/+117
This is to be more consistient with VMX, and to support upcoming addition of vmcb02 Hopefully no functional changes. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: SVM: rename a variable in the svm_create_vcpuMaxim Levitsky1-6/+6
The 'page' is to hold the vcpu's vmcb so name it as such to avoid confusion. Signed-off-by: Maxim Levitsky <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: LAPIC: Reduce world switch latency caused by timer_advance_nsWanpeng Li3-12/+7
All the checks in lapic_timer_int_injected(), __kvm_wait_lapic_expire(), and these function calls waste cpu cycles when the timer mode is not tscdeadline. We can observe ~1.3% world switch time overhead by kvm-unit-tests/vmexit.flat vmcall testing on AMD server. This patch reduces the world switch latency caused by timer_advance_ns feature when the timer mode is not tscdeadline by simpling move the check against apic->lapic_timer.expired_tscdeadline much earlier. Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: LAPIC: Narrow down the kick target vCPUWanpeng Li3-8/+3
The kick after setting KVM_REQ_PENDING_TIMER is used to handle the timer fires on a different pCPU which vCPU is running on. This kick costs about 1000 clock cycles and we don't need this when injecting already-expired timer or when using the VMX preemption timer because kvm_lapic_expired_hv_timer() is called from the target vCPU. Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: LAPIC: Guarantee the timer is in tsc-deadline mode when settingWanpeng Li1-2/+1
Check apic_lvtt_tscdeadline() mode directly instead of apic_lvtt_oneshot() and apic_lvtt_period() to guarantee the timer is in tsc-deadline mode when wrmsr MSR_IA32_TSCDEADLINE. Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: LAPIC: Return 0 when getting the tscdeadline timer if the lapic is hw ↵Wanpeng Li1-2/+1
disabled Return 0 when getting the tscdeadline timer if the lapic is hw disabled. Suggested-by: Paolo Bonzini <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: LAPIC: Fix updating DFR missing apic map recalculationWanpeng Li1-5/+10
There is missing apic map recalculation after updating DFR, if it is INIT RESET, in x2apic mode, local apic is software enabled before. This patch fix it by introducing the function kvm_apic_set_dfr() to be called in INIT RESET handling path. Signed-off-by: Wanpeng Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: nVMX: Simplify the initialization of nested_vmx_msrsChenyi Qiang1-2/+1
The nested VMX controls MSRs can be initialized by the global capability values stored in vmcs_config. Signed-off-by: Chenyi Qiang <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabledChenyi Qiang1-2/+4
KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they are not exposed by the system ioctl KVM_GET_MSR. Add them to the setup of nested VMX controls MSR. Signed-off-by: Chenyi Qiang <[email protected]> Message-Id: <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28KVM: nSVM: Avoid freeing uninitialized pointers in svm_set_nested_state()Vitaly Kuznetsov1-5/+3
The save and ctl pointers are passed uninitialized to kfree() when svm_set_nested_state() follows the 'goto out_set_gif' path. While the issue could've been fixed by initializing these on-stack varialbles to NULL, it seems preferable to eliminate 'out_set_gif' label completely as it is not actually a failure path and duplicating a single svm_set_gif() call doesn't look too bad. [ bp: Drop obscure Addresses-Coverity: tag. ] Fixes: 6ccbd29ade0d ("KVM: SVM: nested: Don't allocate VMCB structures on stack") Reported-by: Dan Carpenter <[email protected]> Reported-by: Joerg Roedel <[email protected]> Reported-by: Colin King <[email protected]> Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Acked-by: Joerg Roedel <[email protected]> Tested-by: Tom Lendacky <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-28x86/hyperv: Remove aliases with X64 in their nameJoseph Salisbury5-52/+19
In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Michael Kelley <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Signed-off-by: Wei Liu <[email protected]>
2020-09-27x86/apic/msi: Unbreak DMAR and HPET MSIThomas Gleixner1-0/+2
Switching the DMAR and HPET MSI code to use the generic MSI domain ops missed to add the flag which tells the core code to update the domain operations with the defaults. As a consequence the core code crashes when an interrupt in one of those domains is allocated. Add the missing flags. Fixes: 9006c133a422 ("x86/msi: Use generic MSI domain ops") Reported-by: Qian Cai <[email protected]> Reported-by: Peter Zijlstra <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-27Merge tag 'x86-urgent-2020-09-27' of ↵Linus Torvalds7-12/+68
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for the x86 interrupt code: - Unbreak the magic 'search the timer interrupt' logic in IO/APIC code which got wreckaged when the core interrupt code made the state tracking logic stricter. That caused the interrupt line to stay masked after switching from IO/APIC to PIC delivery mode, which obviously prevents interrupts from being delivered. - Make run_on_irqstack_code() typesafe. The function argument is a void pointer which is then cast to 'void (*fun)(void *). This breaks Control Flow Integrity checking in clang. Use proper helper functions for the three variants reuqired" * tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Unbreak check_timer() x86/irq: Make run_on_irqstack_cond() typesafe
2020-09-27x86/hyperv: Remove aliases with X64 in their nameJoseph Salisbury5-52/+19
In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Michael Kelley <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-09-26arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writebackMikulas Patocka1-1/+1
If we copy less than 8 bytes and if the destination crosses a cache line, __copy_user_flushcache would invalidate only the first cache line. This patch makes it invalidate the second cache line as well. Fixes: 0aed55af88345b ("x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations") Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Dan Williams <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Toshi Kani <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Al Viro <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2009161451140.21915@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Linus Torvalds <[email protected]>
2020-09-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-11/+39
Pull more kvm fixes from Paolo Bonzini: "Five small fixes. The nested migration bug will be fixed with a better API in 5.10 or 5.11, for now this is a fix that works with existing userspace but keeps the current ugly API" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Add a dedicated INVD intercept routine KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE KVM: x86: fix MSR_IA32_TSC read for nested migration selftests: kvm: Fix assert failure in single-step test KVM: x86: VMX: Make smaller physical guest address space support user-configurable
2020-09-25KVM: SVM: Add a dedicated INVD intercept routineTom Lendacky1-1/+7
The INVD instruction intercept performs emulation. Emulation can't be done on an SEV guest because the guest memory is encrypted. Provide a dedicated intercept routine for the INVD intercept. And since the instruction is emulated as a NOP, just skip it instead. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Signed-off-by: Tom Lendacky <[email protected]> Message-Id: <a0b9a19ffa7fef86a3cc700c7ea01cb2731e04e5.1600972918.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-25x86/sev-es: Use GHCB accessor for setting the MMIO scratch bufferTom Lendacky1-1/+1
Use ghcb_set_sw_scratch() to set the GHCB scratch field, which will also set the corresponding bit in the GHCB valid_bitmap field to denote that sw_scratch is actually valid. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Joerg Roedel <[email protected]> Link: https://lkml.kernel.org/r/ba84deabdf44a7a880454fb351d189c6ad79d4ba.1601041106.git.thomas.lendacky@amd.com
2020-09-25KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKESean Christopherson1-1/+2
Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled. Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are toggled inadvertantly skipped the MMU context reset due to the mask of bits that triggers PDPTR loads also being used to trigger MMU context resets. Fixes: 427890aff855 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode") Fixes: cb957adb4ea4 ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode") Cc: Jim Mattson <[email protected]> Cc: Peter Shier <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-25dma-mapping: add a new dma_alloc_pages APIChristoph Hellwig1-0/+2
This API is the equivalent of alloc_pages, except that the returned memory is guaranteed to be DMA addressable by the passed in device. The implementation will also be used to provide a more sensible replacement for DMA_ATTR_NON_CONSISTENT flag. Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages as its backend. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Thomas Bogendoerfer <[email protected]> (MIPS part)
2020-09-25Merge branch 'master' of ↵Christoph Hellwig23-82/+230
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into dma-mapping-for-next Pull in the latest 5.9 tree for the commit to revert the V4L2_FLAG_MEMORY_NON_CONSISTENT uapi addition.
2020-09-24KVM: x86: fix MSR_IA32_TSC read for nested migrationMaxim Levitsky1-2/+15
MSR reads/writes should always access the L1 state, since the (nested) hypervisor should intercept all the msrs it wants to adjust, and these that it doesn't should be read by the guest as if the host had read it. However IA32_TSC is an exception. Even when not intercepted, guest still reads the value + TSC offset. The write however does not take any TSC offset into account. This is documented in Intel's SDM and seems also to happen on AMD as well. This creates a problem when userspace wants to read the IA32_TSC value and then write it. (e.g for migration) In this case it reads L2 value but write is interpreted as an L1 value. To fix this make the userspace initiated reads of IA32_TSC return L1 value as well. Huge thanks to Dave Gilbert for helping me understand this very confusing semantic of MSR writes. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-09-24perf/x86/intel/uncore: Support PCIe3 unit on Snow RidgeKan Liang1-0/+53
The Snow Ridge integrated PCIe3 uncore unit can be used to collect performance data, e.g. utilization, between PCIe devices, plugged into the PCIe port, and the components (in M2IOSF) responsible for translating and managing requests to/from the device. The performance data is very useful for analyzing the performance of PCIe devices. The device with the PCIe3 uncore PMON units is owned by the portdrv_pci driver. Create a PCI sub driver for the PCIe3 uncore PMON units. Here are some difference between PCIe3 uncore unit and other uncore pci units. - There may be several Root Ports on a system. But the uncore counters only exist in the Root Port A. A user can configure the channel mask to collect the data from other Root Ports. - The event format of the PCIe3 uncore unit is the same as IIO unit of SKX. - The Control Register of PCIe3 uncore unit is 64 bits. - The offset of each counters is 8, which is the same as M2M unit of SNR. - New MSR addresses for unit control, counter and counter config. Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-24perf/x86/intel/uncore: Generic support for the PCI sub driverKan Liang2-0/+82
Some uncore counters may be located in the configuration space of a PCI device, which already has a bonded driver. Currently, the uncore driver cannot register a PCI uncore PMU for these counters, because, to register a PCI uncore PMU, the uncore driver must be bond to the device. However, one device can only have one bonded driver. Add an uncore PCI sub driver to support such kind of devices. The sub driver doesn't own the device. In initialization, the sub driver searches the device via pci_get_device(), and register the corresponding PMU for the device. In the meantime, the sub driver registers a PCI bus notifier, which is used to notify the sub driver once the device is removed. The sub driver can unregister the PMU accordingly. The sub driver only searches the devices defined in its id table. The id table varies on different platforms, which will be implemented in the following platform-specific patch. Suggested-by: Bjorn Helgaas <[email protected]> Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-24perf/x86/intel/uncore: Factor out uncore_pci_pmu_unregister()Kan Liang1-10/+25
The PMU unregistration in the uncore PCI sub driver is similar as the normal PMU unregistration for a PCI device. The codes to unregister a PCI PMU can be shared. Factor out uncore_pci_pmu_unregister(), which will be used later. Use uncore_pci_get_dev_die_info() to replace the codes which retrieve the socket and die informaion. The pci_set_drvdata() is not included in uncore_pci_pmu_unregister() as well, because the uncore PCI sub driver will not touch the private driver data pointer of the device. Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-24perf/x86/intel/uncore: Factor out uncore_pci_pmu_register()Kan Liang1-31/+51
The PMU registration in the uncore PCI sub driver is similar as the normal PMU registration for a PCI device. The codes to register a PCI PMU can be shared. Factor out uncore_pci_pmu_register(), which will be used later. The pci_set_drvdata() is not included in uncore_pci_pmu_register(). The uncore PCI sub driver doesn't own the PCI device. It will not touch the private driver data pointer for the device. Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]