path: root/arch/x86/include
Age  Commit message  (Author; files changed, lines -/+)
2017-08-11  x86: provide an init_mem_mapping hypervisor hook  (Juergen Gross; 1 file, -0/+10)
Provide a hook in hypervisor_x86 that is called after setting up the initial memory mapping. This is needed e.g. by Xen HVM guests to map the hypervisor shared info page. Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
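A minimal sketch of the shape of such a hook; the struct members and call site below are illustrative, based on the commit description rather than the exact diff:

    /* arch/x86/include/asm/hypervisor.h (sketch) */
    struct hypervisor_x86 {
            const char *name;
            u32 (*detect)(void);
            void (*init_platform)(void);
            /* New: invoked once the initial memory mapping is in place. */
            void (*init_mem_mapping)(void);
    };

    static inline void hypervisor_init_mem_mapping(void)
    {
            if (x86_hyper && x86_hyper->init_mem_mapping)
                    x86_hyper->init_mem_mapping();
    }

    /* arch/x86/mm/init.c (sketch): call this at the end of init_mem_mapping(). */

A guest such as Xen HVM can then point init_mem_mapping at a function that maps its shared info page.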
2017-08-11  x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag  (Borislav Petkov; 1 file, -1/+1)
"virtual_vmload_vmsave" is what is going to land in /proc/cpuinfo as of v4.13-rc4, which is clearly too long for a single feature bit. So rename it to what it is called in the processor manual: "v_vmsave_vmload" is a bit shorter, after all. We could go more aggressively here, but having it the same as in the processor manual is advantageous. Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Radim Krčmář <[email protected]> Cc: Janakarajan Natarajan <[email protected]> Cc: Jörg Rödel <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: kvm-ML <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  x86/hyper-v: Use hypercall for remote TLB flush  (Vitaly Kuznetsov; 2 files, -0/+10)
The Hyper-V host can suggest that guests use a hypercall for remote TLB flushes; this is supposed to be faster than IPIs. Implementation details: to issue the HvFlushVirtualAddress{Space,List} hypercalls we need to put the input somewhere in memory, and we don't want to allocate memory on each call, so we pre-allocate per-cpu memory areas on boot. pv_ops patching happens very early, so we need to separate hyperv_setup_mmu_ops() and hyper_alloc_mmu(). It is possible and easy to implement local TLB flushing too, and there is even a hint for that. However, I don't see room for optimization on the host side as both a hypercall and a native TLB flush will result in a vmexit. The hint is also not set on modern Hyper-V versions. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Stephen Hemminger <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Jork Loeser <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Simon Xiao <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps  (Andy Lutomirski; 1 file, -2/+3)
In ELF_CORE_COPY_REGS, we're copying from the current task, so accessing thread.fsbase and thread.gsbase makes no sense. Just read the values from the CPU registers. In practice, the old code would have been correct most of the time simply because thread.fsbase and thread.gsbase usually matched the CPU registers. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Chang Seok <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  hyper-v: Globalize vp_index  (Vitaly Kuznetsov; 1 file, -0/+24)
To support implementing remote TLB flushing on Hyper-V with a hypercall, we need to make vp_index available outside of the vmbus module. Rename and globalize. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Stephen Hemminger <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Jork Loeser <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Simon Xiao <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  x86/hyper-v: Implement rep hypercalls  (Vitaly Kuznetsov; 1 file, -0/+39)
Rep hypercalls are normal hypercalls which perform multiple actions at once. Hyper-V guarantees to return execution to the caller in no more than 50us, and the caller needs to use hypercall continuation to finish the remaining work. Touch the NMI watchdog between hypercall invocations. This is going to be used for the HvFlushVirtualAddressList hypercall for remote TLB flushing. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Stephen Hemminger <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Jork Loeser <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Simon Xiao <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
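A hedged sketch of the continuation loop described above; the constant names follow the TLFS-derived definitions in mshyperv.h, but treat the details as illustrative. Status bits 32-43 report how many reps completed, and the loop restarts from there until all reps are done:

    static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
                                          void *input, void *output)
    {
            u64 control = code;
            u64 status;
            u16 rep_comp;

            control |= (u64)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET;
            control |= (u64)rep_count << HV_HYPERCALL_REP_COMP_OFFSET;

            do {
                    status = hv_do_hypercall(control, input, output);
                    if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS)
                            return status;

                    /* 'Reps completed' lives in bits 32-43 of the status. */
                    rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >>
                            HV_HYPERCALL_REP_COMP_OFFSET;

                    /* Continue from the first rep that has not completed yet. */
                    control &= ~HV_HYPERCALL_REP_START_MASK;
                    control |= (u64)rep_comp << HV_HYPERCALL_REP_START_OFFSET;

                    touch_nmi_watchdog();
            } while (rep_comp < rep_count);

            return status;
    }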
2017-08-10  x86/hyper-v: Introduce fast hypercall implementation  (Vitaly Kuznetsov; 1 file, -0/+34)
Hyper-V supports 'fast' hypercalls when all parameters are passed through registers. Implement an inline version of the simplest of these calls: a hypercall with one 8-byte input and no output. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Stephen Hemminger <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Jork Loeser <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Simon Xiao <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
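A simplified 64-bit-only sketch of the idea: the control word (with the 'fast' bit set) goes in RCX, the single input in RDX, the status comes back in RAX, and the call goes through the hypercall page. Constraint details in the real header differ slightly:

    static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
    {
            u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;

            __asm__ __volatile__("call *%3"
                                 : "=a" (hv_status),
                                   "+c" (control), "+d" (input1)
                                 : "m" (hv_hypercall_pg)
                                 : "cc", "r8", "r9", "r10", "r11");
            return hv_status;
    }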
2017-08-10  x86/hyper-v: Make hv_do_hypercall() inline  (Vitaly Kuznetsov; 1 file, -0/+40)
We have only three call sites for hv_do_hypercall(), and we're going to change HVCALL_SIGNAL_EVENT to use a fast hypercall, so we can inline this function as an optimization. The Hyper-V Top Level Functional Specification states that the r9-r11 registers and flags may be clobbered by the hypervisor during a hypercall; with inlining this actually matters, so add the clobbers. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Stephen Hemminger <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Jork Loeser <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Simon Xiao <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  x86/hyper-v: Include hyperv/ only when CONFIG_HYPERV is set  (Vitaly Kuznetsov; 1 file, -1/+6)
Code in arch/x86/hyperv/ is only needed when CONFIG_HYPERV is set; the 'basic' support and detection lives in arch/x86/kernel/cpu/mshyperv.c, which is included when CONFIG_HYPERVISOR_GUEST is set. Signed-off-by: Vitaly Kuznetsov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Stephen Hemminger <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Jork Loeser <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Simon Xiao <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  kvm: nVMX: Add support for fast unprotection of nested guest page tables  (Paolo Bonzini; 1 file, -1/+0)
This is the same as commit 147277540bbc ("kvm: svm: Add support for additional SVM NPF error codes", 2016-11-23), but for Intel processors. In this case, the exit qualification field's bit 8 says whether the EPT violation occurred while translating the guest's final physical address or rather while translating the guest page tables. Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-10  x86/asm: Fix UNWIND_HINT_REGS macro for older binutils  (Josh Poimboeuf; 1 file, -4/+6)
Apparently the binutils 2.20 assembler can't handle the '&&' operator in the UNWIND_HINT_REGS macro. Rearrange the macro to do without it. This fixes the following error (the second line is emitted nine times):

    arch/x86/entry/entry_64.S: Assembler messages:
    arch/x86/entry/entry_64.S:521: Error: non-constant expression in ".if" statement

Reported-by: Andrew Morton <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 39358a033b2e ("objtool, x86: Add facility for asm code to provide unwind hints") Link: http://lkml.kernel.org/r/e2ad97c1ae49a484644b4aaa4dd3faa4d6d969b2.1502116651.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  x86/asm/32: Fix regs_get_register() on segment registers  (Andy Lutomirski; 1 file, -0/+11)
The segment register high words on x86_32 may contain garbage. Teach regs_get_register() to read them as u16 instead of unsigned long. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/0b76f6dbe477b7b1a81938fddcc3c483d48f0ff2.1502314765.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
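The fix boils down to special-casing the segment offsets on 32-bit; a sketch close to the described change (the real function also special-cases sp for kernel traps):

    static inline unsigned long regs_get_register(struct pt_regs *regs,
                                                  unsigned int offset)
    {
            if (unlikely(offset > MAX_REG_OFFSET))
                    return 0;
    #ifdef CONFIG_X86_32
            /*
             * The selector fields are 16-bit; the adjacent high word may
             * contain garbage, so read them as u16.
             */
            if (offset == offsetof(struct pt_regs, cs) ||
                offset == offsetof(struct pt_regs, ss) ||
                offset == offsetof(struct pt_regs, ds) ||
                offset == offsetof(struct pt_regs, es) ||
                offset == offsetof(struct pt_regs, fs) ||
                offset == offsetof(struct pt_regs, gs))
                    return *(u16 *)((unsigned long)regs + offset);
    #endif
            return *(unsigned long *)((unsigned long)regs + offset);
    }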
2017-08-10  Merge branch 'x86/urgent' into x86/asm, to pick up fixes  (Ingo Molnar; 10 files, -19/+25)
Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  Merge branch 'linus' into locking/core, to pick up fixes  (Ingo Molnar; 4 files, -1/+7)
Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  Merge branch 'linus' into sched/core, to pick up fixes  (Ingo Molnar; 4 files, -1/+7)
Signed-off-by: Ingo Molnar <[email protected]>
2017-08-10  perf/x86/amd/uncore: Rename cpufeatures macro for cache counters  (Janakarajan Natarajan; 1 file, -1/+1)
In Family 17h, L3 is the last level cache as opposed to L2 in previous families. Avoid this name confusion and rename X86_FEATURE_PERFCTR_L2 to X86_FEATURE_PERFCTR_LLC to indicate the performance counter on the last level of cache. Signed-off-by: Janakarajan Natarajan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suravee Suthikulpanit <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/016311029fdecdc3fdc13b7ed865c6cbf48b2f15.1497452002.git.Janakarajan.Natarajan@amd.com Signed-off-by: Ingo Molnar <[email protected]>
2017-08-08  KVM: X86: implement the logic for spinlock optimization  (Longpeng(Mike); 1 file, -0/+3)
get_cpl requires vcpu_load, so we must cache the result (whether the vcpu was preempted when its cpl=0) in kvm_vcpu_arch. Signed-off-by: Longpeng(Mike) <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-08-07  KVM: nVMX: Emulate EPTP switching for the L1 hypervisor  (Bandan Das; 1 file, -0/+6)
When L2 uses vmfunc, L0 utilizes the associated vmexit to emulate switching of the EPT pointer by reloading the guest MMU. Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Bandan Das <[email protected]> Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-08-07  KVM: vmx: Enable VMFUNCs  (Bandan Das; 1 file, -0/+3)
Enable VMFUNC in the secondary execution controls. This simplifies the changes necessary to expose it to nested hypervisors. VMFUNCs still cause #UD when invoked. Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Bandan Das <[email protected]> Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-08-03  treewide: Consolidate Apple DMI checks  (Lukas Wunner; 1 file, -0/+1)
We're about to amend ACPI bus scan with DMI checks whether we're running on a Mac to support Apple device properties in AML. The DMI checks are performed for every single device, adding overhead for everything x86 that isn't Apple, which is the majority. Rafael and Andy therefore request to perform the DMI match only once and cache the result. Outside of ACPI various other Apple DMI checks exist and it seems reasonable to use the cached value there as well. Rafael, Andy and Darren suggest performing the DMI check in arch code and making it available with a header in include/linux/platform_data/x86/. To this end, add early_platform_quirks() to arch/x86/kernel/quirks.c to perform the DMI check and invoke it from setup_arch(). Switch over all existing Apple DMI checks, thereby fixing two deficiencies:

* They are now #defined to false on non-x86 arches and can thus be optimized away if they're located in cross-arch code.
* Some of them only match "Apple Inc." but not "Apple Computer, Inc.", which is used by BIOSes released between January 2006 (when the first x86 Macs started shipping) and January 2007 (when the company name changed upon introduction of the iPhone).

Suggested-by: Andy Shevchenko <[email protected]> Suggested-by: Rafael J. Wysocki <[email protected]> Suggested-by: Darren Hart <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Acked-by: Mika Westerberg <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
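The cached check can be as simple as the following sketch; the vendor strings come from the commit message, while the exact file layout is illustrative:

    /* arch/x86/kernel/quirks.c (sketch) */
    bool x86_apple_machine;
    EXPORT_SYMBOL(x86_apple_machine);

    void __init early_platform_quirks(void)
    {
            x86_apple_machine = dmi_match(DMI_SYS_VENDOR, "Apple Inc.") ||
                                dmi_match(DMI_SYS_VENDOR, "Apple Computer, Inc.");
    }

    /* include/linux/platform_data/x86/apple.h (sketch) */
    #if defined(CONFIG_X86)
    extern bool x86_apple_machine;
    #else
    #define x86_apple_machine false   /* lets cross-arch callers optimize it away */
    #endif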
2017-08-01  x86/intel_rdt/cqm: Add sched_in support  (Vikas Shivappa; 1 file, -21/+29)
The OS associates an RMID/CLOSid with a task by writing the per-CPU IA32_PQR_ASSOC MSR when a task is scheduled in. The sched_in code stays a no-op unless we are running on an Intel SKU that supports either resource control or monitoring, and they are enabled by mounting the resctrl fs. The per-cpu CLOSid/RMID values are cached, and the write is performed only when a task with a different CLOSid/RMID is scheduled in. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-25-git-send-email-vikas.shivappa@linux.intel.com
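Conceptually the sched_in path caches the last written CLOSid/RMID per CPU and only touches the MSR on a change. A hedged sketch (field and key names approximate the resctrl code; RMID goes in the low 32 bits of IA32_PQR_ASSOC, CLOSid in the high 32 bits):

    static inline void intel_rdt_sched_in(void)
    {
            if (static_branch_likely(&rdt_enable_key)) {
                    struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
                    u32 closid = current->closid;
                    u32 rmid = current->rmid;

                    /* Write IA32_PQR_ASSOC only when the IDs actually change. */
                    if (closid != state->cur_closid || rmid != state->cur_rmid) {
                            state->cur_closid = closid;
                            state->cur_rmid = rmid;
                            wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
                    }
            }
    }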
2017-08-01  x86/intel_rdt: Introduce rdt_enable_key for scheduling  (Vikas Shivappa; 1 file, -2/+10)
Introduce the usage of rdt_enable_key in sched_in code as a preparation to add RDT monitoring support for sched_in. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-24-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01  x86/intel_rdt: Prepare to add RDT monitor cpus file support  (Vikas Shivappa; 1 file, -2/+2)
Separate the ctrl cpus file handling from the generic cpus file handling, and convert the per-cpu closid from a u32 to a struct, which will later be extended to hold the rmid as well. Also clean up the namespace a bit. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-17-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01  x86/intel_rdt: Change closid type from int to u32  (Vikas Shivappa; 1 file, -1/+1)
The OS associates a CLOSid (Class of Service id) with a task by writing the high 32 bits of the per-CPU IA32_PQR_ASSOC MSR when a task is scheduled in. CPUID.(EAX=10H, ECX=1):EDX[15:0] enumerates the max CLOSid supported, and it is zero-indexed. Hence change the type from int to u32. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1501017287-28083-15-git-send-email-vikas.shivappa@linux.intel.com
2017-08-01  x86/intel_rdt: Cleanup namespace to support RDT monitoring  (Vikas Shivappa; 1 file, -2/+2)
A few of the data structures have generic names although they are RDT-allocation specific. Rename them to be allocation specific to accommodate RDT monitoring, e.g. s/enabled/alloc_enabled/. No functional change. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2017-08-01  x86/intel_rdt: Change file names to accommodate RDT monitor code  (Vikas Shivappa; 3 files, -311/+72)
Because the "perf cqm" and resctrl code were added separately and are individually configurable, there ended up being separate context-switch code and also things in the global header which are not really needed there. Move only the scheduling-specific code and definitions to <asm/intel_rdt_sched.h> and put all the other declarations in a local intel_rdt.h. h/t to Reinette Chatre for pointing out that we should separate the public interfaces used by other parts of the kernel from private objects shared between the various files comprising RDT. No functional change. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2017-08-01  x86/intel_rdt: Introduce a common compile option for RDT  (Vikas Shivappa; 1 file, -2/+2)
We currently have CONFIG_RDT_A, which covers the RDT (Resource Director Technology) allocation-based resctrl filesystem interface. As a preparation to add support for RDT monitoring into the same resctrl filesystem, change the config option to CONFIG_RDT, which includes both the RDT allocation and monitoring code. No functional change. Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2017-08-01  x86/perf/cqm: Wipe out perf based cqm  (Vikas Shivappa; 1 file, -2/+0)
'perf cqm' never worked due to the incompatibility between the perf infrastructure and the cqm hardware support. The hardware uses RMIDs to track the llc occupancy of tasks, and these RMIDs are per package. This makes monitoring a hierarchy like cgroup along with monitoring of tasks separately difficult, and several patches sent to lkml to fix them were NACKed. Furthermore, the following issues in the current perf cqm make it almost unusable:

1. No support to monitor the same group of tasks for which we do allocation using resctrl.
2. It gives random and inaccurate data (mostly 0s) once we run out of RMIDs, due to issues in recycling.
3. Recycling results in inaccuracy of data because we cannot guarantee that the RMID was stolen from a task when it was not pulling data into cache or even when it pulled the least data. Also for monitoring llc_occupancy, if we stop using an RMID_x and then start using an RMID_y after we reclaim an RMID from an other event, we miss accounting all the occupancy that was tagged to RMID_x at a later perf_count.
4. Recycling code makes the monitoring code complex, including scheduling, because the event can lose its RMID at any time. Since MBM counters count bandwidth for a period of time by taking a snapshot of total bytes at two different times, recycling complicates the way we count MBM in a hierarchy. Also we need a spin lock while we do the processing to account for MBM counter overflow. We also currently use a spin lock in scheduling to prevent the RMID from being taken away.
5. Lack of support when we run different kinds of events like task, system-wide and cgroup events together. Data mostly prints 0s. This is also because we can have only one RMID tied to a cpu, as defined by the cqm hardware, but perf can tie multiple events during one sched_in.
6. No support for monitoring a group of tasks. There is partial support for cgroup, but it does not work once there is a hierarchy of cgroups or if we want to monitor a task in a cgroup and the cgroup itself.
7. No support for monitoring tasks for their lifetime without perf overhead.
8. It reported the aggregate cache occupancy or memory bandwidth over all sockets, but most cloud and VMM based use cases want to know the individual per-socket usage.

Signed-off-by: Vikas Shivappa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2017-07-30  acpi, x86/mm: Remove encryption mask from ACPI page protection type  (Tom Lendacky; 2 files, -5/+7)
The arch_apei_get_mem_attributes() function is used to set the page protection type for ACPI physical addresses. When SME is active, the associated protection type cannot have the encryption mask set since the ACPI tables live in un-encrypted memory - the kernel will see corrupted data. To fix this, create a new protection type, PAGE_KERNEL_NOENC, that is a 'no encryption' version of PAGE_KERNEL, and return that from arch_apei_get_mem_attributes(). Signed-off-by: Tom Lendacky <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: Dave Young <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/e1cb9395b2f061cd96f1e59c3cbbe5ff5d4ec26e.1501186516.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <[email protected]>
2017-07-30  x86/mm, kexec: Fix memory corruption with SME on successive kexecs  (Tom Lendacky; 1 file, -1/+2)
After issuing successive kexecs it was found that the SHA hash failed verification when booting the kexec'd kernel. When SME is enabled, the change from using pages that were marked encrypted to now being marked as not encrypted (through new identify mapped page tables) results in memory corruption if there are any cache entries for the previously encrypted pages. This is because separate cache entries can exist for the same physical location but tagged both with and without the encryption bit. To prevent this, issue a wbinvd if SME is active before copying the pages from the source location to the destination location to clear any possible cache entry conflicts. Signed-off-by: Tom Lendacky <[email protected]> Cc: <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: Dave Young <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/e7fb8610af3a93e8f8ae6f214cd9249adc0df2b4.1501186516.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <[email protected]>
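The fix itself is small; a hedged sketch of its placement in the kexec image-copy path (surrounding code elided):

    /* Before copying pages to their destination via the identity map: */
    if (sme_active())
            native_wbinvd();  /* drop stale cache lines tagged with the old C-bit state */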
2017-07-30  x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads  (Andy Lutomirski; 1 file, -5/+5)
Now that pt_regs properly defines segment fields as 16-bit on 32-bit CPUs, there's no need to mask off the high word. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2017-07-30  x86/asm/32: Make pt_regs's segment registers be 16 bits  (Andy Lutomirski; 1 file, -6/+26)
Many 32-bit x86 CPUs do 16-bit writes when storing segment registers to memory. This can cause the high word of regs->[cdefgs]s to occasionally contain garbage. Rather than making the entry code more complicated to fix up the garbage, just change pt_regs to reflect reality. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
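On 32-bit the struct now models reality, roughly (the __*h padding fields hold whatever the 16-bit store left behind):

    struct pt_regs {
            unsigned long bx, cx, dx, si, di, bp, ax;
            unsigned short ds, __dsh;   /* high word: may hold garbage */
            unsigned short es, __esh;
            unsigned short fs, __fsh;
            unsigned short gs, __gsh;
            unsigned long orig_ax, ip;
            unsigned short cs, __csh;
            unsigned long flags, sp;
            unsigned short ss, __ssh;
    };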
2017-07-27  x86/topology: Remove the unused parent_node() macro  (Dou Liyang; 1 file, -6/+0)
Commit: a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") ... removed the last user of parent_node(), so remove the macro. Reported-by: Michael Ellerman <[email protected]> Signed-off-by: Dou Liyang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-26  x86: irq: Define a global vector for nested posted interrupts  (Wincy Van; 4 files, -1/+7)
We are using the same vector for nested/non-nested posted interrupt delivery; this may cause interrupt latency in L1 since we can't kick the L2 vcpu out of VMX non-root mode. This patch introduces a new vector which is used only for nested posted interrupts, solving the problem above. Signed-off-by: Wincy Van <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-07-26  x86/kconfig: Consolidate unwinders into multiple choice selection  (Josh Poimboeuf; 1 file, -2/+2)
There are three mutually exclusive unwinders. Make that more obvious by combining them into a multiple-choice selection:

  CONFIG_FRAME_POINTER_UNWINDER
  CONFIG_ORC_UNWINDER
  CONFIG_GUESS_UNWINDER (if CONFIG_EXPERT=y)

Frame pointers are still the default (for now). The old CONFIG_FRAME_POINTER option is still used in some arch-independent places, so keep it around, but make it invisible to the user on x86 - it's now selected by CONFIG_FRAME_POINTER_UNWINDER=y. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/20170725135424.zukjmgpz3plf5pmt@treble Signed-off-by: Ingo Molnar <[email protected]>
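In Kconfig terms, the selection looks roughly like this sketch (dependency and select lines abbreviated from the commit description):

    choice
            prompt "Choose kernel unwinder"
            default FRAME_POINTER_UNWINDER

    config FRAME_POINTER_UNWINDER
            bool "Frame pointer unwinder"
            select FRAME_POINTER

    config ORC_UNWINDER
            bool "ORC unwinder"
            depends on X86_64
            select STACK_VALIDATION

    config GUESS_UNWINDER
            bool "Guess unwinder"
            depends on EXPERT

    endchoice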
2017-07-26  x86/unwind: Add the ORC unwinder  (Josh Poimboeuf; 4 files, -30/+103)
Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y. It plugs into the existing x86 unwinder framework. It relies on objtool to generate the needed .orc_unwind and .orc_unwind_ip sections. For more details on why ORC is used instead of DWARF, see Documentation/x86/orc-unwinder.txt - but the short version is that it's a simplified, fundamentally more robust debuginfo data structure, which also allows up to two orders of magnitude faster lookups than the DWARF unwinder - which matters to profiling workloads like perf. Thanks to Andy Lutomirski for the performance improvement ideas: splitting the ORC unwind table into two parallel arrays and creating a fast lookup table to search a subset of the unwind table. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com [ Extended the changelog. ] Signed-off-by: Ingo Molnar <[email protected]>
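The simplicity comes from the table format: each entry is a few bytes describing how to find the previous frame from any instruction address. A sketch of the entry format as described in the ORC documentation:

    struct orc_entry {
            s16             sp_offset;   /* offset from sp_reg to the previous SP */
            s16             bp_offset;
            unsigned        sp_reg:4;    /* which register holds the frame base */
            unsigned        bp_reg:4;
            unsigned        type:2;      /* call frame, regs, iret regs, ... */
    } __packed;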
2017-07-25  Merge branch 'WIP.locking/atomics' into locking/core  (Ingo Molnar; 4 files, -74/+151)
Merge two uncontroversial cleanups from this branch while the rest is being reworked. Signed-off-by: Ingo Molnar <[email protected]>
2017-07-25  x86/asm: Add suffix macro for GEN_*_RMWcc()  (Kees Cook; 1 file, -13/+24)
The coming x86 refcount protection needs to be able to add trailing instructions to the GEN_*_RMWcc() operations. This extracts the difference between the goto/non-goto cases so the helper macros can be defined outside the #ifdef cases. It additionally adds argument naming to the resulting asm for referencing from suffixed instructions, and adds clobbers for "cc" and "cx" so that suffixes can use _ASM_CX and retain any set flags. Signed-off-by: Kees Cook <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David S. Miller <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Elena Reshetova <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Greg KH <[email protected]> Cc: Hans Liljestrand <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jann Horn <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Serge E. Hallyn <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: linux-arch <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-25  x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID  (Andy Lutomirski; 3 files, -2/+21)
PCID is a "process context ID" -- it's what other architectures call an address space ID. Every non-global TLB entry is tagged with a PCID, only TLB entries that match the currently selected PCID are used, and we can switch PGDs without flushing the TLB. x86's PCID is 12 bits.

This is an unorthodox approach to using PCID. x86's PCID is far too short to uniquely identify a process, and we can't even really uniquely identify a running process because there are monster systems with over 4096 CPUs. To make matters worse, past attempts to use all 12 PCID bits have resulted in slowdowns instead of speedups.

This patch uses PCID differently. We use a PCID to identify a recently-used mm on a per-cpu basis. An mm has no fixed PCID binding at all; instead, we give it a fresh PCID each time it's loaded, except in cases where we want to preserve the TLB, in which case we reuse a recent value.

Here are some benchmark results, done on a Skylake laptop at 2.3 GHz (turbo off, intel_pstate requesting max performance) under KVM with the guest using idle=poll (to avoid artifacts when bouncing between CPUs). I haven't done any real statistics here -- I just ran them in a loop and picked the fastest results that didn't look like outliers. Unpatched means commit a4eb8b993554, so all the bookkeeping overhead is gone.

ping-pong between two mms on the same CPU using eventfd:

    patched:         1.22µs
    patched, nopcid: 1.33µs
    unpatched:       1.34µs

Same ping-pong, but now touch 512 pages (all zero-page to minimize cache misses) each iteration. dTLB misses are measured by dtlb_load_misses.miss_causes_a_walk:

    patched:         1.8µs,  11M dTLB misses
    patched, nopcid: 6.2µs, 207M dTLB misses
    unpatched:       6.1µs, 190M dTLB misses

Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Nadav Amit <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/9ee75f17a81770feed616358e6860d98a2a5b1e7.1500957502.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
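The per-CPU bookkeeping amounts to a tiny recently-used-mm cache. A simplified, hedged sketch of the idea (the real code in the TLB-flush machinery also tracks TLB generations and handles the non-PCID case):

    #define TLB_NR_DYN_ASIDS 6    /* small on purpose: the search below is linear */

    static void choose_new_asid(struct mm_struct *next,
                                u16 *new_asid, bool *need_flush)
    {
            u16 asid;

            for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
                    if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) ==
                        next->context.ctx_id) {
                            *new_asid = asid;     /* reuse: old TLB entries survive */
                            *need_flush = false;
                            return;
                    }
            }

            /* No match: take the next slot round-robin and flush it. */
            *new_asid = this_cpu_add_return(cpu_tlbstate.next_asid, 1) %
                        TLB_NR_DYN_ASIDS;
            *need_flush = true;
    }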
2017-07-24  x86/io: Make readq() / writeq() API consistent  (Andy Shevchenko; 1 file, -4/+6)
Despite the following commit: 93093d099e5d ("x86: provide readq()/writeq() on 32-bit too, complete") which says: ...Also, map all the APIs to the strongest ordering variant. It's way too easy to mess such details up in drivers and the difference between "memory" and "" constrained asm() constructs is in the noise range. ... we have for now only one user of this API (i.e. writeq_relaxed() in drivers/hwtracing/intel_th/sth.c) on x86, and it does care about the "relaxed" part of it. Moreover, 32-bit support had been removed from that header, though it appeared later in specific headers that emphasize its non-atomic context. The rest should keep in mind a consistent picture of the __raw_IO() vs. IO() vs. IO_relaxed() API. Signed-off-by: Andy Shevchenko <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mika Westerberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-24  x86/io: Remove xlate_dev_kmem_ptr() duplication  (Andy Shevchenko; 1 file, -5/+0)
The generic header defines xlate_dev_kmem_ptr(). Reuse it from the generic header and remove the x86 copy. Move the description to the generic header as well. Signed-off-by: Andy Shevchenko <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mika Westerberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-24  x86/io: Remove mem*io() duplications  (Andy Shevchenko; 1 file, -45/+0)
The generic header defines memset_io(), memcpy_fromio() and memcpy_toio(). Reuse them from the generic header and remove the x86 copies. Move the descriptions to the generic header as well. Signed-off-by: Andy Shevchenko <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mika Westerberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-24  x86/io: Include asm-generic/io.h to architectural code  (Andy Shevchenko; 1 file, -0/+3)
asm-generic/io.h defines a few helpers that are useful in drivers, such as writesb() and readsb(). Include it in asm/io.h in the architecture folder. Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Wolfram Sang <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mika Westerberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-24  x86/io: Define IO accessors by preprocessor  (Andy Shevchenko; 1 file, -0/+41)
As a preparation for using the generic IO accessor helpers, we need to define the architecture-dependent functions via the preprocessor to let the world know we have them. Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Wolfram Sang <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mika Westerberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
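asm-generic/io.h only supplies a fallback when the arch has not claimed a symbol, so the x86 header self-defines each accessor it implements. A representative excerpt of the pattern:

    /* Tell asm-generic/io.h these accessors are already implemented: */
    #define readb readb
    #define readw readw
    #define readl readl
    #define writeb writeb
    #define writew writew
    #define writel writel
    #define inb inb
    #define outb outb
    /* ...and so on for the remaining accessors. */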
2017-07-23  Merge 4.13-rc2 into char-misc-next  (Greg Kroah-Hartman; 6 files, -18/+18)
We want the char/misc driver fixes in here as well to handle future changes. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2017-07-21  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 2 files, -3/+3)
Pull x86 fixes from Ingo Molnar: "Half of the fixes are for various build time warnings triggered by randconfig builds. Most (but not all...) were harmless. There's also:

  - ACPI boundary condition fixes
  - UV platform fixes
  - defconfig updates
  - an AMD K6 CPU init fix
  - a %pOF printk format related preparatory change
  - .. and a warning fix related to the tlb/PCID changes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/devicetree: Convert to using %pOF instead of ->full_name
  x86/platform/uv/BAU: Disable BAU on single hub configurations
  x86/platform/intel-mid: Fix a format string overflow warning
  x86/platform: Add PCI dependency for PUNIT_ATOM_DEBUG
  x86/build: Silence the build with "make -s"
  x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
  x86/fpu/math-emu: Avoid bogus -Wint-in-bool-context warning
  x86/fpu/math-emu: Fix possible uninitialized variable use
  perf/x86: Shut up false-positive -Wmaybe-uninitialized warning
  x86/defconfig: Remove stale, old Kconfig options
  x86/ioapic: Pass the correct data to unmask_ioapic_irq()
  x86/acpi: Prevent out of bound access caused by broken ACPI tables
  x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNT
  x86/platform/uv/BAU: Fix congested_response_us not taking effect
  x86/cpu: Use indirect call to measure performance in init_amd_k6()
2017-07-21  Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -2/+2)
Pull core fixes from Ingo Molnar: "A fix to WARN_ON_ONCE() done by modules, plus a MAINTAINERS update"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debug: Fix WARN_ON_ONCE() for modules
  MAINTAINERS: Update the PTRACE entry
2017-07-21  x86/mm: Allow userspace to have mappings above 47-bit  (Kirill A. Shutemov; 1 file, -2/+2)
All bits and pieces are now in place, and we can allow userspace to have VMAs above 47 bits. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-21  x86/mm: Prepare to expose larger address space to userspace  (Kirill A. Shutemov; 2 files, -3/+6)
On x86, 5-level paging enables a 56-bit userspace virtual address space. Not all user space is ready to handle wide addresses. It's known that at least some JIT compilers use higher bits in pointers to encode their information. This collides with valid pointers under 5-level paging and leads to crashes. To mitigate this, we are not going to allocate virtual address space above 47 bits by default. But userspace can ask for an allocation from the full address space by specifying a hint address (with or without MAP_FIXED) above 47 bits. If a hint address is set above 47 bits but MAP_FIXED is not specified, we try to look for an unmapped area at the specified address. If it's already occupied, we look for an unmapped area in the *full* address space, rather than in the 47-bit window. A high hint address only affects the allocation in question, not any future mmap()s. Specifying a high hint address on an older kernel or on a machine without 5-level paging support is safe: the hint will be ignored and the kernel will fall back to allocation from the 47-bit address space. This approach helps to easily make an application's memory allocator aware of the large address space without manually tracking allocated virtual address space. The patch puts all the machinery in place, but does not yet allow userspace to have mappings above 47 bits -- TASK_SIZE_MAX has to be raised to get the effect. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
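In get_unmapped_area() terms, the policy reduces to something like this sketch, where DEFAULT_MAP_WINDOW is the 47-bit boundary (the real code spreads this across several search paths):

    /* Sketch: pick the search ceiling based on the caller's hint. */
    info.low_limit = mm->mmap_base;
    if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
            info.high_limit = TASK_SIZE_MAX;      /* caller opted in to the full space */
    else
            info.high_limit = DEFAULT_MAP_WINDOW; /* conservative default */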
2017-07-21  x86/mpx: Do not allow MPX if we have mappings above 47-bit  (Kirill A. Shutemov; 2 files, -0/+12)
MPX (without the MAWA extension) cannot handle addresses above 47 bits, so we need to make sure that MPX cannot be enabled if we already have a VMA above the boundary, and forbid creating such VMAs once MPX is enabled. The patch implements mpx_unmapped_area_check(), which is called from all variants of get_unmapped_area() to check whether the requested address fits MPX. On enabling MPX, we check if we already have any VMA above the 47-bit boundary and forbid the enabling if we do. As long as DEFAULT_MAP_WINDOW is equal to TASK_SIZE_MAX, the change is a nop. That will change when we allow userspace to have mappings above 47 bits. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ Readability edits. ] Signed-off-by: Ingo Molnar <[email protected]>
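The check itself is small; a sketch consistent with the description above (error handling simplified):

    /* Sketch: reject layouts MPX cannot represent. */
    static unsigned long mpx_unmapped_area_check(unsigned long addr,
                                                 unsigned long len,
                                                 unsigned long flags)
    {
            if (!kernel_managing_mpx_tables(current->mm))
                    return addr;                    /* MPX off: nothing to do */
            if (addr + len <= DEFAULT_MAP_WINDOW)
                    return addr;                    /* fits below 47 bits */
            if (flags & MAP_FIXED)
                    return -ERANGE;                 /* cannot honor the request */

            /* Otherwise: retry the search below the 47-bit boundary. */
            return 0;
    }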