path: root/arch/x86
Age  Commit message  [Author, Files, Lines]
2017-06-29  KVM: lapic: reorganize start_hv_timer  [Paolo Bonzini, 1 file, -13/+27]
There are many cases in which the hv timer must be canceled. Split out a new function to avoid duplication. Signed-off-by: Paolo Bonzini <[email protected]>
2017-06-28  arch: remove unused macro/function thread_saved_pc()  [Tobias Klauser, 2 files, -13/+0]
The only user of thread_saved_pc() in non-arch-specific code was removed in commit 8243d5597793 ("sched/core: Remove pointless printout in sched_show_task()"). Remove the implementations as well. Some architectures use thread_saved_pc() in their arch-specific code. Leave their thread_saved_pc() intact. Signed-off-by: Tobias Klauser <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-06-28  PCI: Work around poweroff & suspend-to-RAM issue on Macbook Pro 11  [Bjorn Helgaas, 1 file, -0/+32]
Neither soft poweroff (transition to ACPI power state S5) nor suspend-to-RAM (transition to state S3) works on the Macbook Pro 11,4 and 11,5. The problem is related to the [mem 0x7fa00000-0x7fbfffff] space. When we use that space, e.g., by assigning it to the 00:1c.0 Root Port, the ACPI Power Management 1 Control Register (PM1_CNT) at [io 0x1804] doesn't work anymore. Linux does a soft poweroff (transition to S5) by writing to PM1_CNT. The theory about why this doesn't work is:
 - The write to PM1_CNT causes an SMI
 - The BIOS SMI handler depends on something in [mem 0x7fa00000-0x7fbfffff]
 - When Linux assigns [mem 0x7fa00000-0x7fbfffff] to the 00:1c.0 Port, it covers up whatever the SMI handler uses, so the SMI handler no longer works correctly
Reserve the [mem 0x7fa00000-0x7fbfffff] space so we don't assign it to anything. This is voodoo programming, since we don't know what the real conflict is, but we've failed to find the root cause. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=103211 Tested-by: [email protected] Signed-off-by: Bjorn Helgaas <[email protected]> Cc: [email protected] Cc: Rafael J. Wysocki <[email protected]> Cc: Lukas Wunner <[email protected]> Cc: Chen Yu <[email protected]>
2017-06-28  kvm: nVMX: Check memory operand to INVVPID  [Jim Mattson, 1 file, -4/+18]
The memory operand fetched for INVVPID is 128 bits. Bits 63:16 are reserved and must be zero. Otherwise, the instruction fails with VMfail(Invalid operand to INVEPT/INVVPID). If the INVVPID_TYPE is 0 (individual address invalidation), then bits 127:64 must be in canonical form, or the instruction fails with VMfail(Invalid operand to INVEPT/INVVPID). Signed-off-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
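For illustration, a minimal sketch of the two checks. The struct layout follows the SDM description above; the helper names (invvpid_operand_valid, is_canonical_48) are hypothetical, not the kernel's actual symbols, and the real code first reads the operand from guest memory:

    /* Hypothetical sketch of the added validation, not the kernel code. */
    static bool is_canonical_48(u64 addr)
    {
    	/* bits 63:48 must be the sign extension of bit 47 */
    	return ((s64)(addr << 16) >> 16) == (s64)addr;
    }

    struct invvpid_operand {
    	u64 vpid;	/* bits 15:0 = VPID, bits 63:16 reserved */
    	u64 gla;	/* bits 127:64 = guest linear address */
    };

    static bool invvpid_operand_valid(const struct invvpid_operand *op, u32 type)
    {
    	if (op->vpid >> 16)		/* reserved bits must be zero */
    		return false;
    	if (type == 0 && !is_canonical_48(op->gla))
    		return false;		/* individual-address invalidation */
    	return true;
    }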
2017-06-28  x86/PCI: Select CONFIG_PCI_LOCKLESS_CONFIG  [Thomas Gleixner, 2 files, -2/+3]
All x86 PCI configuration space accessors have either their own serialization or can operate completely lockless (ECAM). Disable the global lock in the generic PCI configuration space accessors. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-06-28  x86/PCI/ce4100: Properly lock accessor functions  [Thomas Gleixner, 1 file, -39/+48]
x86 wants to get rid of the global pci_lock protecting the config space accessors so ECAM mode can operate completely lockless, but the CE4100 PCI code relies on that lock to protect the simulation registers. Restructure the code so it uses the x86 specific pci_config_lock to serialize the inner workings of the CE4100 PCI magic. That allows the global locking via pci_lock to be removed later. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-06-28  x86/PCI: Abort if legacy init fails  [Thomas Gleixner, 1 file, -8/+10]
If the legacy PCI init fails, then there are no PCI config space accessors available, but the code continues and tries to scan the busses, which fails due to the lack of config space accessors. Return right away if the last init fallback fails. Switch the few printks to pr_info while at it. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-06-28  x86/PCI: Remove duplicate defines  [Thomas Gleixner, 1 file, -7/+1]
For some historic reason these defines are duplicated and also available in arch/x86/include/asm/pci_x86.h. Remove them. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-06-28  locking/atomic/x86: Use 's64 *' for 'old' argument of atomic64_try_cmpxchg()  [Dmitry Vyukov, 2 files, -7/+7]
atomic64_try_cmpxchg() declares its 'old' argument as 'long *', which makes it impossible to use in portable code: if the caller passes 'long *', it becomes 32 bits on 32-bit arches; if the caller passes 's64 *', it does not compile on x86_64. Change the type of the 'old' argument to 's64 *' instead. Signed-off-by: Dmitry Vyukov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/fa6f77f2375150d26ea796a77e8b59195fd2ab13.1497690003.git.dvyukov@google.com Signed-off-by: Ingo Molnar <[email protected]>
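To make the portability problem concrete, a sketch of the prototype before and after (simplified; the kernel versions carry __always_inline and related annotations):

    /* before: the width of 'old' follows the architecture's long */
    static inline bool atomic64_try_cmpxchg(atomic64_t *v, long *old, long new);

    /* after: 'old' is always 64-bit, so portable callers can pass s64 * */
    static inline bool atomic64_try_cmpxchg(atomic64_t *v, s64 *old, long new);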
2017-06-28  locking/atomic/x86: Un-macro-ify atomic ops implementation  [Dmitry Vyukov, 3 files, -70/+147]
CPP turns perfectly readable code into a much harder to read syntactic soup. Ingo suggested to write them out as-is in C and ignore the higher linecount. Do this. (As a side effect, plain C functions will be easier to KASAN-instrument as well.) Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Dmitry Vyukov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/a35b983dd3be937a3cf63c4e2db487de2cdc7b8f.1497690003.git.dvyukov@google.com [ Beautified the C code some more and twiddled the changelog to mention the linecount increase and the KASAN benefit. ] Signed-off-by: Ingo Molnar <[email protected]>
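An abridged sketch of the transformation; the real patch covers many more operations, but the shape is:

    /* before: ops generated via CPP */
    #define ATOMIC_OP(op)						\
    static inline void atomic_##op(int i, atomic_t *v)			\
    {									\
    	asm volatile(LOCK_PREFIX #op"l %1,%0"				\
    			: "+m" (v->counter)				\
    			: "ir" (i)					\
    			: "memory");					\
    }
    ATOMIC_OP(and)

    /* after: the same op written out as plain C */
    static inline void atomic_and(int i, atomic_t *v)
    {
    	asm volatile(LOCK_PREFIX "andl %1,%0"
    			: "+m" (v->counter)
    			: "ir" (i)
    			: "memory");
    }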
2017-06-28  x86: remove arch specific dma_supported implementation  [Christoph Hellwig, 7 files, -11/+8]
And instead wire it up as a method for all the dma_map_ops instances. Note that this also means the arch-specific check will be applied fully instead of partially in the AMD IOMMU driver. Signed-off-by: Christoph Hellwig <[email protected]>
2017-06-28  x86: remove DMA_ERROR_CODE  [Christoph Hellwig, 1 file, -2/+0]
All dma_map_ops instances now handle their errors through ->mapping_error. Signed-off-by: Christoph Hellwig <[email protected]>
2017-06-28  x86/calgary: implement ->mapping_error  [Christoph Hellwig, 1 file, -8/+16]
DMA_ERROR_CODE is going to go away, so don't rely on it. Signed-off-by: Christoph Hellwig <[email protected]>
2017-06-28  x86/pci-nommu: implement ->mapping_error  [Christoph Hellwig, 1 file, -1/+9]
DMA_ERROR_CODE is going to go away, so don't rely on it. Signed-off-by: Christoph Hellwig <[email protected]>
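A sketch of the resulting pattern: a driver-private sentinel plus a ->mapping_error callback wired into the ops (other ops elided; details may differ from the actual patch):

    #define NOMMU_MAPPING_ERROR 0	/* driver-private sentinel, assumed */

    static int nommu_mapping_error(struct device *dev, dma_addr_t dma_addr)
    {
    	return dma_addr == NOMMU_MAPPING_ERROR;
    }

    const struct dma_map_ops nommu_dma_ops = {
    	/* .map_page, .map_sg, ... */
    	.mapping_error	= nommu_mapping_error,
    };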
2017-06-27  x86, libnvdimm, pmem: remove global pmem api  [Dan Williams, 1 file, -47/+0]
Now that all callers of the pmem api have been converted to dax helpers that call back to the pmem driver, we can remove include/linux/pmem.h and asm/pmem.h. Cc: <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Oliver O'Halloran <[email protected]> Cc: Ross Zwisler <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2017-06-27  x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm  [Dan Williams, 2 files, -5/+6]
Kill this globally defined wrapper and move to libnvdimm so that we can ultimately remove include/linux/pmem.h and asm/pmem.h. Cc: <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ross Zwisler <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2017-06-27  x86/insn: perf tools: Add new ptwrite instruction  [Adrian Hunter, 1 file, -1/+1]
Add ptwrite to the op code map and the perf tools new instructions test. To run the test:
  $ tools/perf/perf test "x86 ins"
  39: Test x86 instruction decoder - new instructions : Ok
Or to see the details:
  $ tools/perf/perf test -v "x86 ins" 2>&1 | grep ptwrite
For information about ptwrite, refer to the Intel SDM. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-27  KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit  [Ladi Prosek, 1 file, -0/+6]
enable_nmi_window is supposed to be a no-op if we know that we'll see a VM exit by the time the NMI window opens. This commit adds two more cases:
 * We intercept stgi so we don't need to singlestep on GIF=0.
 * We emulate nested vmexit so we don't need to singlestep when nested VM exit is required.
Signed-off-by: Ladi Prosek <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-06-27  KVM: SVM: don't NMI singlestep over event injection  [Ladi Prosek, 1 file, -0/+16]
Singlestepping is enabled by setting the TF flag, and care must be taken to not let the guest see (and reuse at an inconvenient time) the modified rflags value. One such case is event injection, as part of which flags are pushed on the stack and restored later on iret. This commit disables singlestepping when we're about to inject an event and forces an immediate exit for us to re-evaluate the NMI-related state. Suggested-by: Paolo Bonzini <[email protected]> Signed-off-by: Ladi Prosek <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-06-27  KVM: SVM: hide TF/RF flags used by NMI singlestep  [Ladi Prosek, 1 file, -1/+14]
These flags are used internally by SVM so it's cleaner to not leak them to callers of svm_get_rflags. This is similar to how the TF flag is handled on KVM_GUESTDBG_SINGLESTEP by kvm_get_rflags and kvm_set_rflags. Without this change, the flags may propagate from host VMCB to nested VMCB or vice versa while singlestepping over a nested VM enter/exit, and then get stuck in inappropriate places. Example: NMI singlestepping is enabled while running L1 guest. The instruction to step over is VMRUN and nested vmrun emulation stashes rflags to hsave->save.rflags. Then if singlestepping is disabled while still in L2, TF/RF will be cleared from the nested VMCB but the next nested VM exit will restore them from hsave->save.rflags and cause an unexpected DB exception. Signed-off-by: Ladi Prosek <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
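Roughly, the read side becomes the following (a simplified sketch; per the description, svm_set_rflags is adjusted symmetrically so guest writes cannot clobber the internal flags):

    static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
    {
    	unsigned long rflags = to_svm(vcpu)->vmcb->save.rflags;

    	/* hide the flags that NMI singlestep set behind the guest's back */
    	if (to_svm(vcpu)->nmi_singlestep)
    		rflags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
    	return rflags;
    }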
2017-06-27  KVM: nSVM: do not forward NMI window singlestep VM exits to L1  [Ladi Prosek, 1 file, -5/+40]
A nested hypervisor should not see singlestep VM exits if singlestepping was enabled internally by KVM. Windows is particularly sensitive to this and known to bluescreen on unexpected VM exits. Signed-off-by: Ladi Prosek <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-06-27  KVM: SVM: introduce disable_nmi_singlestep helper  [Ladi Prosek, 1 file, -4/+9]
Just moving the code to a new helper in preparation for following commits. Signed-off-by: Ladi Prosek <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
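The helper, roughly as the description implies (a sketch; the kernel version may differ in detail):

    static void disable_nmi_singlestep(struct vcpu_svm *svm)
    {
    	svm->nmi_singlestep = false;
    	/* drop the flags that were set to force the singlestep #DB */
    	svm->vmcb->save.rflags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
    }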
2017-06-27  x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems  [Yazen Ghannam, 1 file, -1/+2]
AMD systems support the Monitor/Mwait instructions and these can be used for ACPI C1 in the same way as on Intel systems. Three things are needed:
 1) This patch.
 2) A BIOS that declares a C1 state in _CST to use FFH, with correct values.
 3) CPUID_Fn00000005_EDX is non-zero on the system.
The BIOS on AMD systems has historically not defined a C1 state in _CST, so the acpi_idle driver uses HALT for ACPI C1. Currently released systems have CPUID_Fn00000005_EDX as reserved/RAZ. If a BIOS is released for these systems that requests a C1 state with FFH, the FFH implementation in Linux will fail since CPUID_Fn00000005_EDX is 0. The acpi_idle driver will then fall back to using HALT for ACPI C1. Future systems are expected to have non-zero CPUID_Fn00000005_EDX and BIOS support for using FFH for ACPI C1. Allow ffh_cstate_init() to succeed on AMD systems. Tested on Fam15h and Fam17h systems. Signed-off-by: Yazen Ghannam <[email protected]> Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
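The code change itself is small; a simplified sketch of the relaxed vendor check in ffh_cstate_init(), with surrounding details elided:

    static int __init ffh_cstate_init(void)
    {
    	struct cpuinfo_x86 *c = &boot_cpu_data;

    	/* previously Intel-only; now AMD is allowed as well */
    	if (c->x86_vendor != X86_VENDOR_INTEL &&
    	    c->x86_vendor != X86_VENDOR_AMD)
    		return -1;

    	cpu_cstate_entry = alloc_percpu(struct cstate_entry);
    	return 0;
    }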
2017-06-27  x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF  [Len Brown, 2 files, -0/+80]
The goal of this change is to give users a uniform and meaningful result when they read /sys/...cpufreq/scaling_cur_freq on modern x86 hardware, as compared to what they get today. Modern x86 processors include the hardware needed to accurately calculate frequency over an interval -- APERF, MPERF, and the TSC. Here we provide an x86 routine to make this calculation on supported hardware, and use it in preference to any driver-specific cpufreq_driver.get() routine. MHz is computed like so: MHz = base_MHz * delta_APERF / delta_MPERF. MHz is the average frequency of the busy processor over a measurement interval. The interval is defined to be the time between successive invocations of aperfmperf_khz_on_cpu(), which are expected to happen on-demand when users read the sysfs attribute cpufreq/scaling_cur_freq. As with previous methods of calculating MHz, idle time is excluded. base_MHz above is from the TSC calibration global "cpu_khz". This x86-native method of calculating MHz returns a meaningful result regardless of whether P-states are controlled by hardware or firmware, and whether or not the Linux cpufreq subsystem is installed. When this routine is invoked more frequently, the measurement interval becomes shorter. However, the code limits re-computation to 10ms intervals so that the average frequency remains meaningful. Discerning users are encouraged to take advantage of the turbostat(8) utility, which can gracefully handle concurrent measurement intervals of arbitrary length. Signed-off-by: Len Brown <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
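The core of the calculation, as a simplified sketch (the kernel version keeps per-CPU APERF/MPERF snapshots and enforces the 10ms minimum interval; the helper name busy_khz is made up for illustration):

    static u64 busy_khz(u64 aperf, u64 prev_aperf, u64 mperf, u64 prev_mperf)
    {
    	u64 delta_aperf = aperf - prev_aperf;
    	u64 delta_mperf = mperf - prev_mperf;

    	if (!delta_mperf)
    		return 0;
    	/* KHz = base KHz (from TSC calibration) * dAPERF / dMPERF */
    	return div64_u64((u64)cpu_khz * delta_aperf, delta_mperf);
    }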
2017-06-28  Merge back PM tools material for v4.13.  [Rafael J. Wysocki, 1 file, -6/+12]
2017-06-26  x86/mce: Always save severity in machine_check_poll()  [Yazen Ghannam, 1 file, -6/+1]
The MCE severity gives a hint as to how to handle the error. The notifier blocks can then use the severity to decide on an action. It's not necessary for machine_check_poll() to filter errors for the notifier chain, since each block will check its own set of conditions before handling an error. Also, there isn't any urgency for machine_check_poll() to make decisions based on severity like in do_machine_check(). If we can assume that a severity is set then we can use it in more notifier blocks. For example, the CEC block could check for a "KEEP" severity rather than checking bits in the status. This isn't possible now since the severity is not set except for "DEFERRED/UCNA" errors with a valid address. Save the severity since we have it, and let the notifier blocks decide if they want to do anything. Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-26  x86/microcode: Make a couple of symbols static  [Colin Ian King, 2 files, -2/+2]
The helper function __load_ucode_amd() and pointer intel_ucode_patch do not need to be in global scope, so make them static. Fixes those sparse warnings: "symbol '__load_ucode_amd' was not declared. Should it be static?" "symbol 'intel_ucode_patch' was not declared. Should it be static?" Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-26  x86/mm/hotplug: Fix BUG_ON() after hot-remove by not freeing PUD  [Jérôme Glisse, 1 file, -1/+7]
Since commit: af2cf278ef4f ("x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()") we no longer free PUDs so that we do not have to synchronize all PGDs on hot-remove/vfree(). But the new 5-level page table patchset reverted that for 4-level page tables, in the following commit: f2a6a7050109: ("x86: Convert the rest of the code to support p4d_t") This patch repairs the damage by disabling free_pud() when we are in the 4-level page table case, thus avoiding the BUG_ON() after hot-remove. Signed-off-by: Jérôme Glisse <[email protected]> [ Clarified the changelog and the code comments. ] Reviewed-by: Kirill A. Shutemov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
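A simplified sketch of the guard; the exact structure of free_pud_table() differs, but the point is that the free only happens with 5-level paging:

    static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
    {
    	/* ... return early unless every PUD entry is clear ... */

    	/* with 4-level paging, freeing the PUD would leave stale PGDs */
    	if (CONFIG_PGTABLE_LEVELS > 4) {
    		free_pagetable(p4d_page(*p4d), 0);
    		spin_lock(&init_mm.page_table_lock);
    		p4d_clear(p4d);
    		spin_unlock(&init_mm.page_table_lock);
    	}
    }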
2017-06-25  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds, 1 file, -2/+1]
Pull x86 fix from Thomas Gleixner: "A single fix to unbreak the vdso32 build for 64bit kernels caused by excess #includes in the mshyperv header"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mshyperv: Remove excess #includes from mshyperv.h
2017-06-25  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds, 1 file, -2/+2]
Pull perf fixes from Thomas Gleixner: "Three fixlets for perf:
 - Return the proper error code if aux buffers for an event are not supported.
 - Calculate the probe offset for inlined functions correctly
 - Update the Skylake DTLB load/store miss event so it can count 1G TLB entries as well"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf probe: Fix probe definition for inlined functions
  perf/x86/intel: Add 1G DTLB load/store miss support for SKL
  perf/aux: Correct return code of rb_alloc_aux() if !has_aux(ev)
2017-06-25  xen: allocate page for shared info page from low memory  [Juergen Gross, 2 files, -9/+24]
In an HVM guest the kernel allocates the page for mapping the shared info structure via extend_brk() today. This will lead to a drop in performance, as the underlying EPT entry will have to be split up into 4kB entries since the single shared info page is located in hypervisor memory. The issue has been detected by using the libmicro munmap test: unmapping 8kB of memory was faster by nearly a factor of two when no pv interfaces were active in the HVM guest. So instead of taking a page from memory which might be mapped via large EPT entries, use a page which is already mapped via a 4kB EPT entry: we can take a page from the first 1MB of memory, as the video memory at 640kB disallows using larger EPT entries. Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2017-06-25  x86/build: Specify stack alignment for clang  [Matthias Kaehlcke, 1 file, -5/+21]
For gcc, stack alignment is configured with -mpreferred-stack-boundary=N; clang has the option -mstack-alignment=N for that purpose. Use the same alignment as with gcc. If the alignment is not specified, clang assumes an alignment of 16 bytes, as required by the standard ABI. However, as mentioned in d9b0cde91c60 ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported"), the standard kernel entry on x86-64 leaves the stack on an 8-byte boundary; as a consequence clang will keep the stack misaligned. Signed-off-by: Matthias Kaehlcke <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2017-06-25  x86/build: Use __cc-option for boot code compiler options  [Matthias Kaehlcke, 1 file, -4/+5]
cc-option is used to enable compiler options for the boot code if they are available. The macro uses KBUILD_CFLAGS and KBUILD_CPPFLAGS for the check; however, these flags aren't used to build the boot code, and in consequence cc-option can yield wrong results. For example, -mpreferred-stack-boundary=2 is never set with a 64-bit compiler, since the setting is only valid for 16 and 32-bit binaries. This is also the case for 32-bit kernel builds, because the option -m32 is added to KBUILD_CFLAGS after the assignment of REALMODE_CFLAGS. Use __cc-option instead of cc-option for the boot mode options. The macro receives the compiler options as a parameter instead of using KBUILD_C*FLAGS; for the boot code we pass REALMODE_CFLAGS. Also use separate statements for the __cc-option checks instead of performing them in the initial assignment of REALMODE_CFLAGS, since the variable is an input of the macro. Signed-off-by: Matthias Kaehlcke <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2017-06-25  kbuild: remove cc-option-align  [Masahiro Yamada, 1 file, -4/+3]
Documentation/kbuild/makefiles.txt says the change for align options occurred at GCC 3.0, and Documentation/process/changes.rst says the minimal supported GCC version is 3.2, so it should be safe to hard-code -falign* options. Fix the only user arch/x86/Makefile_32.cpu and remove cc-option-align. Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Ingo Molnar <[email protected]>
2017-06-24  Merge branch 'linus' into sched/core, to pick up fixes  [Ingo Molnar, 3 files, -30/+34]
Signed-off-by: Ingo Molnar <[email protected]>
2017-06-24  x86/paravirt: Remove unnecessary return from void function  [Anton Vasilyev, 1 file, -1/+1]
Remove an unnecessary return statement from a void function. Found by the Linux Driver Verification project (linuxtesting.org). Signed-off-by: Anton Vasilyev <[email protected]> Cc: Alok Kataria <[email protected]> Cc: Chris Wright <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-24  x86/boot: Add missing strchr() declaration  [Tommy Nguyen, 1 file, -0/+1]
The Sparse static analyzer emits this warning: symbol 'strchr' was not declared. Should it be static? This patch adds the appropriate extern declaration to string.h to fix the warning. Signed-off-by: Tommy Nguyen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/20170623143601.GA20743@NoChina Signed-off-by: Ingo Molnar <[email protected]>
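Per the message, the added declaration is the standard strchr prototype:

    extern char *strchr(const char *s, int c);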
2017-06-24  x86/mshyperv: Remove excess #includes from mshyperv.h  [Thomas Gleixner, 1 file, -2/+1]
A recent commit included linux/slab.h in linux/irq.h. This breaks the build of vdso32 on a 64-bit kernel. The reason is that linux/irq.h gets included into the vdso code via linux/interrupt.h, which is included from asm/mshyperv.h. That makes the 32-bit vdso compile fail, because slab.h includes the pgtable headers for 64-bit on a 64-bit build. Neither linux/clocksource.h nor linux/interrupt.h is needed in the mshyperv.h header file itself - it has a dependency on <linux/atomic.h>. Remove the includes and unbreak the build. Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: [email protected] Fixes: dee863b571b0 ("hv: export current Hyper-V clocksource") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1706231038460.2647@nanos Signed-off-by: Ingo Molnar <[email protected]>
2017-06-24  x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap  [Michal Hocko, 1 file, -3/+0]
Since the following commit in 2008: cc503c1b43e0 ("x86: PIE executable randomization") we have had a heuristic that treats applications with RLIMIT_STACK configured to unlimited as legacy. This means: a) set the mmap_base to 1/3 of the address space + randomization and b) mmap from bottom to top. This makes some sense as it allows the stack to grow really large. On the other hand it reduces the address space usable for default mmaps (without an address hint) quite a lot. We have received a bug report that an SAP HANA workload has hit this limitation. We could argue that the user just got what he asked for when setting up the unlimited stack, but realistically a stack growing up to 1/6 of TASK_SIZE (allowed by mmap_base) is pretty much unlimited in real life. This would give mmap 20TB of additional address space, which is quite nice. Especially when it is much more likely to use that address space than the reserved stack. Digging into the history, the original implementation of the randomization: 8817210d4d96 ("[PATCH] x86_64: Flexmap for 32bit and randomized mappings for 64bit") didn't have this restriction. So let's try and remove this assumption - hopefully nothing breaks. Signed-off-by: Michal Hocko <[email protected]> Acked-by: Jiri Kosina <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Dave Jones <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ So I've applied this to tip:x86/mm with a wider Cc: list - if anyone objects to this change please holler. ] Signed-off-by: Ingo Molnar <[email protected]>
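A sketch of the heuristic with the removed check (simplified):

    static int mmap_is_legacy(void)
    {
    	if (current->personality & ADDR_COMPAT_LAYOUT)
    		return 1;

    	/* removed: an unlimited stack rlimit no longer forces the
    	 * legacy bottom-up layout
    	 *
    	 *	if (rlimit(RLIMIT_STACK) == RLIM_INFINITY)
    	 *		return 1;
    	 */

    	return sysctl_legacy_va_layout;
    }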
2017-06-24  x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"  [Len Brown, 1 file, -8/+2]
cpufreq_quick_get() allows cpufreq drivers to override cpu_khz that is otherwise reported in the x86 /proc/cpuinfo "cpu MHz" field. There are four problems with this scheme, any of them is sufficient justification to delete it:
 1. Depending on which cpufreq driver is loaded, the behavior of this field is different.
 2. Distros complain that they have to explain to users why and how this field changes. Distros have requested a constant.
 3. The two major providers of this information, acpi_cpufreq and intel_pstate, both "get it wrong" in different ways. acpi_cpufreq lies to the user by telling them that they are running at whatever frequency was last requested by software. intel_pstate lies to the user by telling them that they are running at the average frequency computed over an undefined measurement. But an average computed over an undefined interval, is itself, undefined...
 4. On modern processors, user space utilities, such as turbostat(1), are more accurate and more precise, while supporting concurrent measurement over arbitrary intervals.
Users who have been consulting /proc/cpuinfo to track changing CPU frequency will be disappointed that it no longer wiggles -- perhaps being unaware of the limitations of the information they have been consuming. Yes, they can change their scripts to look in sysfs cpufreq/scaling_cur_frequency. Here they will find the same data of dubious quality that has been removed from /proc/cpuinfo. The value in sysfs will be addressed in a subsequent patch to address issues 1-3, above. Issue 4 will remain -- users that really care about accurate frequency information should not be using either proc or sysfs kernel interfaces. They should be using turbostat(8), or a similar purpose-built analysis tool. Signed-off-by: Len Brown <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2017-06-23  x86/xen/efi: Initialize only the EFI struct members used by Xen  [Daniel Kiper, 1 file, -33/+12]
The current approach, which is the wholesale efi struct initialization from an 'efi_xen' local template, is not robust. Usually, if a new member is defined, it is properly initialized in drivers/firmware/efi/efi.c, but not in arch/x86/xen/efi.c. The effect is that the Xen initialization clears any fields the generic code might have set and the Xen code does not know about yet. I saw this happen a few times, so let's initialize only the EFI struct members used by Xen and maintain no local duplicate, to avoid such issues in the future. Signed-off-by: Daniel Kiper <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ Clarified the changelog. ] Signed-off-by: Ingo Molnar <[email protected]>
2017-06-22  x86/apic: Mark single target interrupts  [Thomas Gleixner, 1 file, -0/+7]
If the interrupt destination mode of the APIC is physical then the effective affinity is restricted to a single CPU. Mark the interrupt accordingly in the domain allocation code, so the core code can avoid pointless affinity setting attempts. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22  x86/apic: Implement effective irq mask update  [Thomas Gleixner, 3 files, -0/+8]
Add the effective irq mask update to the apic implementations and enable effective irq masks for x86. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22  x86/apic: Add irq_data argument to apic->cpu_mask_to_apicid()  [Thomas Gleixner, 5 files, -15/+32]
The decision to which CPUs an interrupt is effectively routed happens in the various apic->cpu_mask_to_apicid() implementations. To support effective affinity masks, this information needs to be updated in irq_data. Add a pointer to irq_data to the callbacks and feed it through the call chain. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22  x86/apic: Move cpumask and to core code  [Thomas Gleixner, 12 files, -42/+29]
All implementations of apic->cpu_mask_to_apicid_and() AND the two incoming cpumasks together to search for the target. Move that operation to the call site and rename the callback to cpu_mask_to_apicid(). Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
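A sketch of the resulting call-site pattern (function and variable names here are hypothetical; the irq_data argument arrives with the follow-up patch listed above):

    static int select_dest_apicid(const struct cpumask *dest,
    			      struct irq_data *irqd, unsigned int *apicid)
    {
    	/* AND with the online mask once, at the call site */
    	cpumask_and(vector_searchmask, dest, cpu_online_mask);
    	return apic->cpu_mask_to_apicid(vector_searchmask, irqd, apicid);
    }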
2017-06-22  x86/apic: Move online masking to core code  [Thomas Gleixner, 3 files, -35/+22]
All implementations of apic->cpu_mask_to_apicid_and() mask out the offline CPUs. The call site already has a mask available which has the offline CPUs removed. Use that and remove the extra bits. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22  x86/uv: Use default_cpu_mask_to_apicid_and()  [Thomas Gleixner, 1 file, -15/+4]
Same functionality, except for the extra bits ORed onto the apicid. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22  x86/apic: Move flat_cpu_mask_to_apicid_and() into C source  [Thomas Gleixner, 2 files, -22/+22]
No point in having inlines assigned to function pointers at multiple places. Just bloats the text. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22  x86/irq: Use irq_migrate_all_off_this_cpu()  [Thomas Gleixner, 2 files, -87/+3]
The generic migration code supports all the required features already. Remove the x86 specific implementation and use the generic one. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-06-22  x86/irq: Restructure fixup_irqs()  [Thomas Gleixner, 1 file, -26/+20]
Reorder fixup_irqs() so it matches the flow in the generic migration code. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected]