blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2017-08-10	kvm: nVMX: Add support for fast unprotection of nested guest page tables	Paolo Bonzini	3	-6/+5
	This is the same as commit 147277540bbc ("kvm: svm: Add support for additional SVM NPF error codes", 2016-11-23), but for Intel processors. In this case, the exit qualification field's bit 8 says whether the EPT violation occurred while translating the guest's final physical address or rather while translating the guest page tables. Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-10	KVM: SVM: Limit PFERR_NESTED_GUEST_PAGE error_code check to L1 guest	Brijesh Singh	1	-1/+2
	Commit 147277540bbc ("kvm: svm: Add support for additional SVM NPF error codes", 2016-11-23) added a new error code to aid nested page fault handling. The commit unprotects (kvm_mmu_unprotect_page) the page when we get a NPF due to guest page table walk where the page was marked RO. However, if an L0->L2 shadow nested page table can also be marked read-only when a page is read only in L1's nested page table. If such a page is accessed by L2 while walking page tables it can cause a nested page fault (page table walks are write accesses). However, after kvm_mmu_unprotect_page we may get another page fault, and again in an endless stream. To cover this use case, we qualify the new error_code check with vcpu->arch.mmu_direct_map so that the error_code check would run on L1 guest, and not the L2 guest. This avoids hitting the above scenario. Fixes: 147277540bbc54119172481c8ef6d930cc9fbfc2 Cc: [email protected] Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Thomas Lendacky <[email protected]> Signed-off-by: Brijesh Singh <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-10	KVM: X86: Fix residual mmio emulation request to userspace	Wanpeng Li	2	-0/+2
	Reported by syzkaller: The kvm-intel.unrestricted_guest=0 WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm] CPU: 5 PID: 1014 Comm: warn_test Tainted: G W OE 4.13.0-rc3+ #8 RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm] Call Trace: ? put_pid+0x3a/0x50 ? rcu_read_lock_sched_held+0x79/0x80 ? kmem_cache_free+0x2f2/0x350 kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 ? __this_cpu_preempt_check+0x13/0x20 The syszkaller folks reported a residual mmio emulation request to userspace due to vm86 fails to emulate inject real mode interrupt(fails to read CS) and incurs a triple fault. The vCPU returns to userspace with vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN exit reason. However, the syszkaller testcase constructs several threads to launch the same vCPU, the thread which lauch this vCPU after the thread whichs get the vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN will trigger the warning. #define _GNU_SOURCE #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/wait.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/mman.h> #include <fcntl.h> #include <unistd.h> #include <linux/kvm.h> #include <stdio.h> int kvmcpu; struct kvm_run run; void thr(void* arg) { int res; res = ioctl(kvmcpu, KVM_RUN, 0); printf("ret1=%d exit_reason=%d suberror=%d\n", res, run->exit_reason, run->internal.suberror); return 0; } void test() { int i, kvm, kvmvm; pthread_t th[4]; kvm = open("/dev/kvm", O_RDWR); kvmvm = ioctl(kvm, KVM_CREATE_VM, 0); kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0); run = (struct kvm_run*)mmap(0, 4096, PROT_READ\|PROT_WRITE, MAP_SHARED, kvmcpu, 0); srand(getpid()); for (i = 0; i < 4; i++) { pthread_create(&th[i], 0, thr, 0); usleep(rand() % 10000); } for (i = 0; i < 4; i++) pthread_join(th[i], 0); } int main() { for (;;) { int pid = fork(); if (pid < 0) exit(1); if (pid == 0) { test(); exit(0); } int status; while (waitpid(pid, &status, __WALL) != pid) {} } return 0; } This patch fixes it by resetting the vcpu->mmio_needed once we receive the triple fault to avoid the residue. Reported-by: Dmitry Vyukov <[email protected]> Tested-by: Dmitry Vyukov <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-10	kprobes/x86: Do not jump-optimize kprobes on irq entry code	Masami Hiramatsu	1	-3/+6
	Since the kernel segment registers are not prepared at the entry of irq-entry code, if a kprobe on such code is jump-optimized, accessing per-CPU variables may cause a kernel panic. However, if the kprobe is not optimized, it triggers an int3 exception and sets segment registers correctly. With this patch we check the probe-address and if it is in the irq-entry code, it prohibits optimizing such kprobes. This means we can continue probing such interrupt handlers by kprobes but it is not optimized anymore. Reported-by: Francis Deslauriers <[email protected]> Tested-by: Francis Deslauriers <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David S . Miller <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Max Filippov <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/150172795654.27216.9824039077047777477.stgit@devbox Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	irq: Make the irqentry text section unconditional	Masami Hiramatsu	2	-9/+2
	Generate irqentry and softirqentry text sections without any Kconfig dependencies. This will add extra sections, but there should be no performace impact. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David S . Miller <[email protected]> Cc: Francis Deslauriers <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Max Filippov <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/150172789110.27216.3955739126693102122.stgit@devbox Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	x86/platform/intel-mid: Make 'bt_sfi_data' const	Bhumika Goyal	1	-1/+1
	Make this structure const as it is only used during copy operation. Signed-off-by: Bhumika Goyal <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	x86/build: Drop unused mflags-y	Cao jin	1	-3/+0
	Subarchitecture support (mflags-y) was removed from x86 in this commit: 6bda2c8b32fe ("x86: remove subarchitecture support") So drop the mflags-y usage from the Makefile. Signed-off-by: Cao jin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	x86/asm: Fix UNWIND_HINT_REGS macro for older binutils	Josh Poimboeuf	1	-4/+6
	Apparently the binutils 2.20 assembler can't handle the '&&' operator in the UNWIND_HINT_REGS macro. Rearrange the macro to do without it. This fixes the following error: arch/x86/entry/entry_64.S: Assembler messages: arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement Reported-by: Andrew Morton <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 39358a033b2e ("objtool, x86: Add facility for asm code to provide unwind hints") Link: http://lkml.kernel.org/r/e2ad97c1ae49a484644b4aaa4dd3faa4d6d969b2.1502116651.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	x86/asm/32: Fix regs_get_register() on segment registers	Andy Lutomirski	1	-0/+11
	The segment register high words on x86_32 may contain garbage. Teach regs_get_register() to read them as u16 instead of unsigned long. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/0b76f6dbe477b7b1a81938fddcc3c483d48f0ff2.1502314765.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	x86/xen/64: Rearrange the SYSCALL entries	Andy Lutomirski	3	-25/+14
	Xen's raw SYSCALL entries are much less weird than native. Rather than fudging them to look like native entries, use the Xen-provided stack frame directly. This lets us eliminate entry_SYSCALL_64_after_swapgs and two uses of the SWAPGS_UNSAFE_STACK paravirt hook. The SYSENTER code would benefit from similar treatment. This makes one change to the native code path: the compat instruction that clears the high 32 bits of %rax is moved slightly later. I'd be surprised if this affects performance at all. Tested-by: Juergen Gross <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/7c88ed36805d36841ab03ec3b48b4122c4418d71.1502164668.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	Merge branch 'x86/urgent' into x86/asm, to pick up fixes	Ingo Molnar	48	-236/+659
	Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	x86/asm/64: Clear AC on NMI entries	Andy Lutomirski	1	-0/+2
	This closes a hole in our SMAP implementation. This patch comes from grsecurity. Good catch! Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/314cc9f294e8f14ed85485727556ad4f15bb1659.1502159503.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	Merge branch 'linus' into locking/core, to pick up fixes	Ingo Molnar	19	-146/+347
	Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	Merge branch 'linus' into sched/core, to pick up fixes	Ingo Molnar	19	-146/+347
	Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	perf/x86/amd/uncore: Get correct number of cores sharing last level cache	Janakarajan Natarajan	1	-3/+16
	In Family 17h, the number of cores sharing a cache level is obtained from the Cache Properties CPUID leaf (0x8000001d) by passing in the cache level in ECX. In prior families, a cache level of 2 was used to determine this information. To get the right information, irrespective of Family, iterate over the cache levels using CPUID 0x8000001d. The last level cache is the last value to return a non-zero value in EAX. Signed-off-by: Janakarajan Natarajan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/5ab569025b39cdfaeca55b571d78c0fc800bdb69.1497452002.git.Janakarajan.Natarajan@amd.com Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	perf/x86/amd/uncore: Rename cpufeatures macro for cache counters	Janakarajan Natarajan	2	-2/+2
	In Family 17h, L3 is the last level cache as opposed to L2 in previous families. Avoid this name confusion and rename X86_FEATURE_PERFCTR_L2 to X86_FEATURE_PERFCTR_LLC to indicate the performance counter on the last level of cache. Signed-off-by: Janakarajan Natarajan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/016311029fdecdc3fdc13b7ed865c6cbf48b2f15.1497452002.git.Janakarajan.Natarajan@amd.com Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	Merge branch 'perf/urgent' into perf/core, to pick up fixes	Ingo Molnar	18	-144/+304
	Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10	perf/x86: Fix RDPMC vs. mm_struct tracking	Peter Zijlstra	1	-9/+7
	Vince reported the following rdpmc() testcase failure: > Failing test case: > > fd=perf_event_open(); > addr=mmap(fd); > exec() // without closing or unmapping the event > fd=perf_event_open(); > addr=mmap(fd); > rdpmc() // GPFs due to rdpmc being disabled The problem is of course that exec() plays tricks with what is current->mm, only destroying the old mappings after having installed the new mm. Fix this confusion by passing along vma->vm_mm instead of relying on current->mm. Reported-by: Vince Weaver <[email protected]> Tested-by: Vince Weaver <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: 1e0fb9ec679c ("perf: Add pmu callbacks to track event mapping and unmapping") Link: http://lkml.kernel.org/r/[email protected] [ Minor cleanups. ] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-09	bpf, x86: implement jiting of BPF_J{LT,LE,SLT,SLE}	Daniel Borkmann	1	-0/+26
	This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions with BPF_X/BPF_K variants for the x86_64 eBPF JIT. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-08-09	crypto: x86/sha1 - Fix reads beyond the number of blocks passed	[email protected]	2	-32/+37
	It was reported that the sha1 AVX2 function(sha1_transform_avx2) is reading ahead beyond its intended data, and causing a crash if the next block is beyond page boundary: http://marc.info/?l=linux-crypto-vger&m=149373371023377 This patch makes sure that there is no overflow for any buffer length. It passes the tests written by Jan Stancek that revealed this problem: https://github.com/jstancek/sha1-avx2-crash I have re-enabled sha1-avx2 by reverting commit b82ce24426a4071da9529d726057e4e642948667 Cc: <[email protected]> Fixes: b82ce24426a4 ("crypto: sha1-ssse3 - Disable avx2") Originally-by: Ilya Albrekht <[email protected]> Tested-by: Jan Stancek <[email protected]> Signed-off-by: Megha Dey <[email protected]> Reported-by: Jan Stancek <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2017-08-08	KVM: X86: implement the logic for spinlock optimization	Longpeng(Mike)	5	-4/+21
	get_cpl requires vcpu_load, so we must cache the result (whether the vcpu was preempted when its cpl=0) in kvm_vcpu_arch. Signed-off-by: Longpeng(Mike) <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-08	KVM: add spinlock optimization framework	Longpeng(Mike)	4	-3/+8
	If a vcpu exits due to request a user mode spinlock, then the spinlock-holder may be preempted in user mode or kernel mode. (Note that not all architectures trap spin loops in user mode, only AMD x86 and ARM/ARM64 currently do). But if a vcpu exits in kernel mode, then the holder must be preempted in kernel mode, so we should choose a vcpu in kernel mode as a more likely candidate for the lock holder. This introduces kvm_arch_vcpu_in_kernel() to decide whether the vcpu is in kernel-mode when it's preempted. kvm_vcpu_on_spin's new argument says the same of the spinning VCPU. Signed-off-by: Longpeng(Mike) <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-07	KVM: x86: use general helpers for some cpuid manipulation	Radim Krčmář	4	-20/+14
	Add guest_cpuid_clear() and use it instead of kvm_find_cpuid_entry(). Also replace some uses of kvm_find_cpuid_entry() with guest_cpuid_has(). Signed-off-by: Radim Krčmář <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-07	KVM: x86: generalize guest_cpuid_has_ helpers	Radim Krčmář	6	-150/+95
	This patch turns guest_cpuid_has_XYZ(cpuid) into guest_cpuid_has(cpuid, X86_FEATURE_XYZ), which gets rid of many very similar helpers. When seeing a X86_FEATURE_*, we can know which cpuid it belongs to, but this information isn't in common code, so we recreate it for KVM. Add some BUILD_BUG_ONs to make sure that it runs nicely. Signed-off-by: Radim Krčmář <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-07	KVM: x86: X86_FEATURE_NRIPS is not scattered anymore	Radim Krčmář	1	-13/+1
	bit(X86_FEATURE_NRIPS) is 3 since 2ccd71f1b278 ("x86/cpufeature: Move some of the scattered feature bits to x86_capability"), so we can simplify the code. Signed-off-by: Radim Krčmář <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-07	KVM: nVMX: Emulate EPTP switching for the L1 hypervisor	Bandan Das	2	-6/+124
	When L2 uses vmfunc, L0 utilizes the associated vmexit to emulate a switching of the ept pointer by reloading the guest MMU. Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Bandan Das <[email protected]> Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-08-07	KVM: nVMX: Enable VMFUNC for the L1 hypervisor	Bandan Das	1	-2/+51
	Expose VMFUNC in MSRs and VMCS fields. No actual VMFUNCs are enabled. Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Bandan Das <[email protected]> Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-08-07	KVM: vmx: Enable VMFUNCs	Bandan Das	2	-1/+24
	Enable VMFUNC in the secondary execution controls. This simplifies the changes necessary to expose it to nested hypervisors. VMFUNCs still cause #UD when invoked. Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Bandan Das <[email protected]> Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-08-07	KVM: nVMX: get rid of nested_release_page*	David Hildenbrand	1	-26/+15
	Let's also just use the underlying functions directly here. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> [Rebased on top of 9f744c597460 ("KVM: nVMX: do not pin the VMCS12")] Signed-off-by: Radim Krčmář <[email protected]>
2017-08-07	KVM: nVMX: get rid of nested_get_page()	David Hildenbrand	1	-31/+26
	nested_get_page() just sounds confusing. All we want is a page from G1. This is even unrelated to nested. Let's introduce kvm_vcpu_gpa_to_page() so we don't get too lengthy lines. Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Radim Krčmář <[email protected]> [Squash pasto fix from Wanpeng Li. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-07	KVM: nVMX: INVPCID support	Paolo Bonzini	1	-9/+25
	Expose the "Enable INVPCID" secondary execution control to the guest and properly reflect the exit reason. In addition, before this patch the guest was always running with INVPCID enabled, causing pcid.flat's "Test on INVPCID when disabled" test to fail. Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-07	KVM: hyperv: support HV_X64_MSR_TSC_FREQUENCY and HV_X64_MSR_APIC_FREQUENCY	Ladi Prosek	4	-2/+10
	It has been experimentally confirmed that supporting these two MSRs is one of the necessary conditions for nested Hyper-V to use the TSC page. Modern Windows guests are noticeably slower when they fall back to reading timestamps from the HV_X64_MSR_TIME_REF_COUNT MSR instead of using the TSC page. The newly supported MSRs are advertised with the AccessFrequencyRegs partition privilege flag and CPUID.40000003H:EDX[8] "Support for determining timer frequencies is available" (both outside of the scope of this KVM patch). Reviewed-by: Radim Krčmář <[email protected]> Signed-off-by: Ladi Prosek <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-04	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	4	-84/+189
	Pull KVM fixes from Radim Krčmář: "ARM: - Yet another race with VM destruction plugged - A set of small vgic fixes x86: - Preserve pending INIT - RCU fixes in paravirtual async pf, VM teardown, and VMXOFF emulation - nVMX interrupt injection and dirty tracking fixes - initialize to make UBSAN happy" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit" KVM: nVMX: mark vmcs12 pages dirty on L2 exit kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown KVM: nVMX: do not pin the VMCS12 KVM: avoid using rcu_dereference_protected KVM: X86: init irq->level in kvm_pv_kick_cpu_op KVM: X86: Fix loss of pending INIT due to race KVM: async_pf: make rcu irq exit if not triggered from idle task KVM: nVMX: fixes to nested virt interrupt injection KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 KVM: arm/arm64: Handle hva aging while destroying the vm KVM: arm/arm64: PMU: Fix overflow interrupt injection KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capability
2017-08-04	Merge branch 'x86-urgent-for-linus' of ↵	Linus Torvalds	1	-16/+11
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "The recent irq core changes unearthed API abuse in the HPET code, which manifested itself in a suspend/resume regression. The fix replaces the cruft with the proper function calls and cures the regression" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hpet: Cure interface abuse in the resume path
2017-08-04	crypto: algapi - make crypto_xor() take separate dst and src arguments	Ard Biesheuvel	4	-8/+5
	There are quite a number of occurrences in the kernel of the pattern if (dst != src) memcpy(dst, src, walk.total % AES_BLOCK_SIZE); crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE); or crypto_xor(keystream, src, nbytes); memcpy(dst, keystream, nbytes); where crypto_xor() is preceded or followed by a memcpy() invocation that is only there because crypto_xor() uses its output parameter as one of the inputs. To avoid having to add new instances of this pattern in the arm64 code, which will be refactored to implement non-SIMD fallbacks, add an alternative implementation called crypto_xor_cpy(), taking separate input and output arguments. This removes the need for the separate memcpy(). Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2017-08-03	treewide: Consolidate Apple DMI checks	Lukas Wunner	4	-2/+15
	We're about to amend ACPI bus scan with DMI checks whether we're running on a Mac to support Apple device properties in AML. The DMI checks are performed for every single device, adding overhead for everything x86 that isn't Apple, which is the majority. Rafael and Andy therefore request to perform the DMI match only once and cache the result. Outside of ACPI various other Apple DMI checks exist and it seems reasonable to use the cached value there as well. Rafael, Andy and Darren suggest performing the DMI check in arch code and making it available with a header in include/linux/platform_data/x86/. To this end, add early_platform_quirks() to arch/x86/kernel/quirks.c to perform the DMI check and invoke it from setup_arch(). Switch over all existing Apple DMI checks, thereby fixing two deficiencies: * They are now #defined to false on non-x86 arches and can thus be optimized away if they're located in cross-arch code. * Some of them only match "Apple Inc." but not "Apple Computer, Inc.", which is used by BIOSes released between January 2006 (when the first x86 Macs started shipping) and January 2007 (when the company name changed upon introduction of the iPhone). Suggested-by: Andy Shevchenko <[email protected]> Suggested-by: Rafael J. Wysocki <[email protected]> Suggested-by: Darren Hart <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Acked-by: Mika Westerberg <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2017-08-03	Merge branches 'pm-cpufreq-x86', 'pm-cpufreq-docs' and 'intel_pstate'	Rafael J. Wysocki	1	-14/+26
	* pm-cpufreq-x86: cpufreq: x86: Make scaling_cur_freq behave more as expected * pm-cpufreq-docs: cpufreq: docs: Add missing cpuinfo_cur_freq description * intel_pstate: cpufreq: intel_pstate: Drop ->get from intel_pstate structure
2017-08-03	KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"	Wanpeng Li	1	-2/+9
	------------[ cut here ]------------ WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7 RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] Call Trace: vmx_check_nested_events+0x131/0x1f0 [kvm_intel] ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm] ? vmx_vcpu_load+0x1be/0x220 [kvm_intel] ? kvm_arch_vcpu_load+0x62/0x230 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x8f/0x750 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which means that tells the kernel to not make use of any IOAPICs that may be present in the system. Actually external_intr variable in nested_vmx_vmexit() is the req_int_win variable passed from vcpu_enter_guest() which means that the L0's userspace requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) && L0's userspace reqeusts an irq window) is true, so there is no interrupt which L1 requires to inject to L2, we should not attempt to emualte "Acknowledge interrupt on exit" for the irq window requirement in this scenario. This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit" if there is no L1 requirement to inject an interrupt to L2. Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> [Added code comment to make it obvious that the behavior is not correct. We should do a userspace exit with open interrupt window instead of the nested VM exit. This patch still improves the behavior, so it was accepted as a (temporary) workaround.] Signed-off-by: Radim Krčmář <[email protected]>
2017-08-02	KVM: nVMX: mark vmcs12 pages dirty on L2 exit	David Matlack	1	-10/+43
	The host physical addresses of L1's Virtual APIC Page and Posted Interrupt descriptor are loaded into the VMCS02. The CPU may write to these pages via their host physical address while L2 is running, bypassing address-translation-based dirty tracking (e.g. EPT write protection). Mark them dirty on every exit from L2 to prevent them from getting out of sync with dirty tracking. Also mark the virtual APIC page and the posted interrupt descriptor dirty when KVM is virtualizing posted interrupt processing. Signed-off-by: David Matlack <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-08-02	kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown	David Matlack	1	-5/+11
	According to the Intel SDM, software cannot rely on the current VMCS to be coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12 flushes. 24.11.1 Software Use of Virtual-Machine Control Structures ... If a logical processor leaves VMX operation, any VMCSs active on that logical processor may be corrupted (see below). To prevent such corruption of a VMCS that may be used either after a return to VMX operation or on another logical processor, software should execute VMCLEAR for that VMCS before executing the VMXOFF instruction or removing power from the processor (e.g., as part of a transition to the S3 and S4 power states). ... This fixes a "suspicious rcu_dereference_check() usage!" warning during kvm_vm_release() because nested_release_vmcs12() calls kvm_vcpu_write_guest_page() without holding kvm->srcu. Signed-off-by: David Matlack <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-08-02	KVM: nVMX: do not pin the VMCS12	Paolo Bonzini	1	-17/+7
	Since the current implementation of VMCS12 does a memcpy in and out of guest memory, we do not need current_vmcs12 and current_vmcs12_page anymore. current_vmptr is enough to read and write the VMCS12. And David Matlack noted: This patch also fixes dirty tracking (memslot->dirty_bitmap) of the VMCS12 page by using kvm_write_guest. nested_release_page() only marks the struct page dirty. Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> [Added David Matlack's note and nested_release_page_clean() fix.] Signed-off-by: Radim Krčmář <[email protected]>
2017-08-02	KVM: X86: init irq->level in kvm_pv_kick_cpu_op	Longpeng(Mike)	1	-0/+1
	'lapic_irq' is a local variable and its 'level' field isn't initialized, so 'level' is random, it doesn't matter but makes UBSAN unhappy: UBSAN: Undefined behaviour in .../lapic.c:... load of value 10 is not a valid value for type '_Bool' ... Call Trace: [<ffffffff81f030b6>] dump_stack+0x1e/0x20 [<ffffffff81f03173>] ubsan_epilogue+0x12/0x55 [<ffffffff81f03b96>] __ubsan_handle_load_invalid_value+0x118/0x162 [<ffffffffa1575173>] kvm_apic_set_irq+0xc3/0xf0 [kvm] [<ffffffffa1575b20>] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm] [<ffffffffa15858ea>] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm] [<ffffffffa1517f4e>] kvm_emulate_hypercall+0x62e/0x760 [kvm] [<ffffffffa113141a>] handle_vmcall+0x1a/0x30 [kvm_intel] [<ffffffffa114e592>] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel] ... Signed-off-by: Longpeng(Mike) <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-08-02	KVM: X86: Fix loss of pending INIT due to race	Wanpeng Li	1	-8/+11
	When SMP VM start, AP may lost INIT because of receiving INIT between kvm_vcpu_ioctl_x86_get/set_vcpu_events. vcpu 0 vcpu 1 kvm_vcpu_ioctl_x86_get_vcpu_events events->smi.latched_init = 0 send INIT to vcpu1 set vcpu1's pending_events kvm_vcpu_ioctl_x86_set_vcpu_events if (events->smi.latched_init == 0) clear INIT in pending_events This patch fixes it by just update SMM related flags if we are in SMM. Thanks Peng Hao for the report and original commit message. Reported-by: Peng Hao <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-08-01	x86/intel_rdt: Show bitmask of shareable resource with other executing units	Fenghua Yu	3	-0/+21
	CPUID.(EAX=0x10, ECX=res#):EBX[31:0] reports a bit mask for a resource. Each set bit within the length of the CBM indicates the corresponding unit of the resource allocation may be used by other entities in the platform (e.g. an integrated graphics engine or hardware units outside the processor core and have direct access to the resource). Each cleared bit within the length of the CBM indicates the corresponding allocation unit can be configured to implement a priority-based allocation scheme without interference with other hardware agents in the system. Bits outside the length of the CBM are reserved. More details on the bit mask are described in x86 Software Developer's Manual. The bitmask is shown in "info" directory for each resource. It's up to user to decide how to use the bitmask within a CBM in a partition to share or isolate a resource with other executing units. Suggested-by: Reinette Chatre <[email protected]> Signed-off-by: Fenghua Yu <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2017-08-01	x86/intel_rdt/mbm: Handle counter overflow	Vikas Shivappa	4	-6/+97
	Set up a delayed work queue for each domain that will read all the MBM counters of active RMIDs once per second to make sure that they don't wrap around between reads from users. [Tony: Added the initializations for the work structure and completed the patch] Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-29-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01	x86/intel_rdt/mbm: Add mbm counter initialization	Vikas Shivappa	4	-2/+23
	MBM counters are monotonically increasing counts representing the total memory bytes at a particular time. In order to calculate total_bytes for an rdtgroup, we store the value of the counter when we create an rdtgroup or when a new domain comes online. When the total_bytes(all memory controller bytes) or local_bytes(local memory controller bytes) file in "mon_data" is read it shows the total bytes for that rdtgroup since its creation. User can snapshot this at different time intervals to obtain bytes/second. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-28-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01	x86/intel_rdt/mbm: Basic counting of MBM events (total and local)	Tony Luck	4	-2/+86
	Check CPUID bits for whether each of the MBM events is supported. Allocate space for each RMID for each counter in each domain to save previous MSR counter value and running total of data. Create files in each of the monitor directories. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-27-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01	x86/intel_rdt/cqm: Add CPU hotplug support	Vikas Shivappa	3	-7/+90
	Resource groups have a per domain directory under "mon_data". Add or remove these directories as and when domains come online and go offline. Also update the per cpu rmids and cache upon onlining and offlining cpus. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-26-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01	x86/intel_rdt/cqm: Add sched_in support	Vikas Shivappa	2	-25/+29
	OS associates an RMID/CLOSid to a task by writing the per CPU IA32_PQR_ASSOC MSR when a task is scheduled in. The sched_in code will stay as no-op unless we are running on Intel SKU which supports either resource control or monitoring and we also enable them by mounting the resctrl fs. The per cpu CLOSid/RMID values are cached and the write is performed only when a task with a different CLOSid/RMID is scheduled in. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-25-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01	x86/intel_rdt: Introduce rdt_enable_key for scheduling	Vikas Shivappa	1	-2/+10
	Introduce the usage of rdt_enable_key in sched_in code as a preparation to add RDT monitoring support for sched_in. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-24-git-send-email-vikas.shivappa@linux.intel.com