There are many cases in which the hv timer must be canceled. Split out
a new function to avoid duplication.
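For illustration, the split-out helper might look roughly like this (a
sketch based on the lapic timer bookkeeping; treat the details as
illustrative rather than the final patch):

  static void cancel_hv_timer(struct kvm_lapic *apic)
  {
          /* the vendor callback plus the bookkeeping were duplicated */
          kvm_x86_ops->cancel_hv_timer(apic->vcpu);
          apic->lapic_timer.hv_timer_in_use = false;
  }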
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d5597793 ("sched/core: Remove pointless printout in
sched_show_task()"). Remove the implementations as well.
Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.
Signed-off-by: Tobias Klauser <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Neither soft poweroff (transition to ACPI power state S5) nor
suspend-to-RAM (transition to state S3) works on the Macbook Pro 11,4 and
11,5.
The problem is related to the [mem 0x7fa00000-0x7fbfffff] space. When we
use that space, e.g., by assigning it to the 00:1c.0 Root Port, the ACPI
Power Management 1 Control Register (PM1_CNT) at [io 0x1804] doesn't work
anymore.
Linux does a soft poweroff (transition to S5) by writing to PM1_CNT. The
theory about why this doesn't work is:
- The write to PM1_CNT causes an SMI
- The BIOS SMI handler depends on something in
[mem 0x7fa00000-0x7fbfffff]
- When Linux assigns [mem 0x7fa00000-0x7fbfffff] to the 00:1c.0 Port, it
covers up whatever the SMI handler uses, so the SMI handler no longer
works correctly
Reserve the [mem 0x7fa00000-0x7fbfffff] space so we don't assign it to
anything.
This is voodoo programming, since we don't know what the real conflict is;
we've simply failed to find the root cause.
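One way to express such a reservation is a DMI-matched quirk; the
following is a hypothetical sketch (the quirk, resource, and table names
are made up for illustration):

  #include <linux/dmi.h>
  #include <linux/init.h>
  #include <linux/ioport.h>

  static struct resource mbp_smi_res = {
          .name  = "MacBook Pro SMI region",      /* hypothetical name */
          .start = 0x7fa00000,
          .end   = 0x7fbfffff,
          .flags = IORESOURCE_MEM | IORESOURCE_BUSY,
  };

  static const struct dmi_system_id mbp_dmi_table[] = {
          { .matches = { DMI_MATCH(DMI_PRODUCT_NAME, "MacBookPro11,4") } },
          { .matches = { DMI_MATCH(DMI_PRODUCT_NAME, "MacBookPro11,5") } },
          { }
  };

  static int __init mbp_reserve_quirk(void)
  {
          /* keep the range out of the PCI resource allocator's hands */
          if (dmi_check_system(mbp_dmi_table))
                  insert_resource(&iomem_resource, &mbp_smi_res);
          return 0;
  }
  arch_initcall(mbp_reserve_quirk);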
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=103211
Tested-by: [email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: [email protected]
Cc: Rafael J. Wysocki <[email protected]>
Cc: Lukas Wunner <[email protected]>
Cc: Chen Yu <[email protected]>
|
|
The memory operand fetched for INVVPID is 128 bits. Bits 63:16 are
reserved and must be zero. Otherwise, the instruction fails with
VMfail(Invalid operand to INVEPT/INVVPID). If the INVVPID_TYPE is 0
(individual address invalidation), then bits 127:64 must be in
canonical form, or the instruction fails with VMfail(Invalid operand
to INVEPT/INVVPID).
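A sketch of the operand layout and the two checks described above
(identifiers follow the nested VMX code of the time; treat this as
illustrative, not the exact patch):

  struct {
          u64 vpid;   /* bits 15:0 hold the VPID; bits 63:16 must be zero */
          u64 gla;    /* guest linear address, consulted for type 0 only */
  } operand;

  if (operand.vpid >> 16) {
          nested_vmx_failValid(vcpu,
                  VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
          return kvm_skip_emulated_instruction(vcpu);
  }
  if (type == VMX_VPID_EXTENT_INDIVIDUAL_ADDR &&
      is_noncanonical_address(operand.gla)) {
          nested_vmx_failValid(vcpu,
                  VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
          return kvm_skip_emulated_instruction(vcpu);
  }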
Signed-off-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
All x86 PCI configuration space accessors have either their own
serialization or can operate completely lockless (ECAM).
Disable the global lock in the generic PCI configuration space accessors.
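A sketch of how the generic accessors can make the locking conditional
(the Kconfig symbol name here is an assumption for illustration):

  /* drivers/pci/access.c (sketch) */
  #ifdef CONFIG_PCI_LOCKLESS_CONFIG
  # define pci_lock_config(f)    do { (void)(f); } while (0)
  # define pci_unlock_config(f)  do { (void)(f); } while (0)
  #else
  # define pci_lock_config(f)    raw_spin_lock_irqsave(&pci_lock, f)
  # define pci_unlock_config(f)  raw_spin_unlock_irqrestore(&pci_lock, f)
  #endif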
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
x86 wants to get rid of the global pci_lock protecting the config space
accessors so ECAM mode can operate completely lockless, but the CE4100 PCI
code relies on that to protect the simulation registers.
Restructure the code so it uses the x86-specific pci_config_lock to
serialize the inner workings of the CE4100 PCI magic. That allows removing
the global locking via pci_lock later.
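A sketch of the restructured accessor; ce4100_bus1_read() stands in for
the simulation register handling and is a hypothetical name:

  static int ce4100_conf_read(unsigned int seg, unsigned int bus,
                              unsigned int devfn, int reg, int len, u32 *value)
  {
          unsigned long flags;
          int ret;

          /* serialize the simulated registers without the global pci_lock */
          raw_spin_lock_irqsave(&pci_config_lock, flags);
          ret = ce4100_bus1_read(devfn, reg, len, value);
          raw_spin_unlock_irqrestore(&pci_config_lock, flags);
          return ret;
  }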
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
If the legacy PCI init fails, then there are no PCI config space accessors
available, but the code continues and tries to scan the busses, which fails
due to the lack of config space accessors.
Return right away if the last init fallback fails.
Switch the few printks to pr_info while at it.
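The early return amounts to something like this sketch (raw_pci_ops is
the x86 accessor pointer; the message text is illustrative):

  if (!raw_pci_ops) {
          pr_info("PCI: System does not support PCI\n");
          return -ENODEV;    /* don't scan busses without accessors */
  }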
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
For some historic reason these defines are duplicated and also available in
arch/x86/include/asm/pci_x86.h.
Remove them.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
atomic64_try_cmpxchg() declares its 'old' argument as 'long *', which
makes it impossible to use in portable code:
if the caller passes 'long *', it becomes 32 bits on 32-bit arches;
if the caller passes 's64 *', it does not compile on x86_64.
Change the type of the 'old' argument to 's64 *' instead.
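In signature form, the change is (sketch):

  /* before: 'long' is 32 bits on 32-bit arches, so this isn't portable */
  static __always_inline bool atomic64_try_cmpxchg(atomic64_t *v, long *old, long new);

  /* after: 's64' matches the 64-bit payload on every arch */
  static __always_inline bool atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new);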
Signed-off-by: Dmitry Vyukov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/fa6f77f2375150d26ea796a77e8b59195fd2ab13.1497690003.git.dvyukov@google.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
CPP turns perfectly readable code into a much harder to read syntactic soup.
Ingo suggested writing them out as-is in C and ignoring the higher line count.
Do this.
(As a side effect, plain C functions will be easier to KASAN-instrument as well.)
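For example, instead of generating the operations from a CPP template,
each one is spelled out (a sketch; the macro shown is illustrative, while
the written-out form follows the usual x86 atomic_add()):

  /* before (sketch): function bodies produced by a CPP template */
  #define ATOMIC_OP(op)                                           \
  static __always_inline void atomic_##op(int i, atomic_t *v)     \
  {                                                               \
          asm volatile(LOCK_PREFIX #op"l %1,%0"                   \
                       : "+m" (v->counter) : "ir" (i));           \
  }
  ATOMIC_OP(add)
  ATOMIC_OP(sub)

  /* after: the same function written out as plain C */
  static __always_inline void atomic_add(int i, atomic_t *v)
  {
          asm volatile(LOCK_PREFIX "addl %1,%0"
                       : "+m" (v->counter) : "ir" (i));
  }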
Suggested-by: Ingo Molnar <[email protected]>
Signed-off-by: Dmitry Vyukov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/a35b983dd3be937a3cf63c4e2db487de2cdc7b8f.1497690003.git.dvyukov@google.com
[ Beautified the C code some more and twiddled the changelog
to mention the linecount increase and the KASAN benefit. ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
And instead wire it up as a method for all the dma_map_ops instances.
Note that this also means the arch-specific check will be applied fully
instead of partially in the AMD iommu driver.
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
All dma_map_ops instances now handle their errors through
->mapping_error.
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
DMA_ERROR_CODE is going to go away, so don't rely on it.
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
DMA_ERROR_CODE is going to go away, so don't rely on it.
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Now that all callers of the pmem api have been converted to dax helpers that
call back to the pmem driver, we can remove include/linux/pmem.h and
asm/pmem.h.
Cc: <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Oliver O'Halloran <[email protected]>
Cc: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
Kill this globally defined wrapper and move to libnvdimm so that we can
ultimately remove include/linux/pmem.h and asm/pmem.h.
Cc: <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
Add ptwrite to the opcode map and the perf tools new instructions test.
To run the test:
$ tools/perf/perf test "x86 ins"
39: Test x86 instruction decoder - new instructions : Ok
Or to see the details:
$ tools/perf/perf test -v "x86 ins" 2>&1 | grep ptwrite
For information about ptwrite, refer to the Intel SDM.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
enable_nmi_window is supposed to be a no-op if we know that we'll see
a VM exit by the time the NMI window opens. This commit adds two more
cases:
* We intercept stgi so we don't need to singlestep on GIF=0.
* We emulate nested vmexit so we don't need to singlestep when nested
VM exit is required.
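In code, the two new early returns would look roughly like this (a
sketch; the field names follow svm.c of the time):

  static void enable_nmi_window(struct kvm_vcpu *vcpu)
  {
          struct vcpu_svm *svm = to_svm(vcpu);

          if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
                  return; /* the IRET intercept already guarantees a VM exit */

          if (!gif_set(svm))
                  return; /* STGI is intercepted: VM exit before GIF becomes 1 */

          if (svm->nested.exit_required)
                  return; /* the emulated nested VM exit happens first */

          /* ...otherwise fall back to RFLAGS.TF singlestepping */
  }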
Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Singlestepping is enabled by setting the TF flag and care must be
taken not to let the guest see (and reuse at an inconvenient time)
the modified rflags value. One such case is event injection, as part
of which flags are pushed on the stack and restored later on iret.
This commit disables singlestepping when we're about to inject an
event and forces an immediate exit for us to re-evaluate the NMI
related state.
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
These flags are used internally by SVM so it's cleaner to not leak
them to callers of svm_get_rflags. This is similar to how the TF
flag is handled on KVM_GUESTDBG_SINGLESTEP by kvm_get_rflags and
kvm_set_rflags.
Without this change, the flags may propagate from host VMCB to nested
VMCB or vice versa while singlestepping over a nested VM enter/exit,
and then get stuck in inappropriate places.
Example: NMI singlestepping is enabled while running L1 guest. The
instruction to step over is VMRUN and nested vmrun emulation stashes
rflags to hsave->save.rflags. Then if singlestepping is disabled
while still in L2, TF/RF will be cleared from the nested VMCB but the
next nested VM exit will restore them from hsave->save.rflags and
cause an unexpected DB exception.
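A sketch of the filtering, mirroring how kvm_get_rflags()/kvm_set_rflags()
hide TF for KVM_GUESTDBG_SINGLESTEP:

  static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
  {
          unsigned long rflags = to_svm(vcpu)->vmcb->save.rflags;

          if (to_svm(vcpu)->nmi_singlestep)
                  rflags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
          return rflags;
  }

  static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
  {
          if (to_svm(vcpu)->nmi_singlestep)
                  rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
          to_svm(vcpu)->vmcb->save.rflags = rflags;
  }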
Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
A nested hypervisor should not see singlestep VM exits if singlestepping
was enabled internally by KVM. Windows is particularly sensitive to this
and known to bluescreen on unexpected VM exits.
Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Just moving the code to a new helper in preparation for following
commits.
Signed-off-by: Ladi Prosek <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
AMD systems support the Monitor/Mwait instructions and these can be used
for ACPI C1 in the same way as on Intel systems.
Three things are needed:
1) This patch.
2) BIOS that declares a C1 state in _CST to use FFH, with correct values.
3) CPUID_Fn00000005_EDX is non-zero on the system.
The BIOS on AMD systems has historically not defined a C1 state in _CST,
so the acpi_idle driver uses HALT for ACPI C1.
Currently released systems have CPUID_Fn00000005_EDX as reserved/RAZ. If a
BIOS is released for these systems that requests a C1 state with FFH, the
FFH implementation in Linux will fail since CPUID_Fn00000005_EDX is 0. The
acpi_idle driver will then fall back to using HALT for ACPI C1.
Future systems are expected to have non-zero CPUID_Fn00000005_EDX and BIOS
support for using FFH for ACPI C1.
Allow ffh_cstate_init() to succeed on AMD systems.
Tested on Fam15h and Fam17h systems.
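The gist of the change in ffh_cstate_init() is the vendor check (a
sketch of the function):

  static int __init ffh_cstate_init(void)
  {
          struct cpuinfo_x86 *c = &boot_cpu_data;

          /* previously Intel-only; AMD can use MWAIT for ACPI C1 as well */
          if (c->x86_vendor != X86_VENDOR_INTEL &&
              c->x86_vendor != X86_VENDOR_AMD)
                  return -1;

          cpu_cstate_entry = alloc_percpu(struct cstate_entry);
          return 0;
  }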
Signed-off-by: Yazen Ghannam <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.
Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.
Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver-specific cpufreq_driver.get() routine.
MHz is computed like so:
MHz = base_MHz * delta_APERF / delta_MPERF
MHz is the average frequency of the busy processor
over a measurement interval. The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to
happen on demand when users read the sysfs attribute
cpufreq/scaling_cur_freq.
As with previous methods of calculating MHz,
idle time is excluded.
base_MHz above is from TSC calibration global "cpu_khz".
This x86 native method to calculate MHz returns a meaningful result
no matter if P-states are controlled by hardware or firmware
and/or if the Linux cpufreq sub-system is or is-not installed.
When this routine is invoked more frequently, the measurement
interval becomes shorter. However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.
Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.
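A sketch of the computation (the MSR names are real; the per-CPU
bookkeeping struct is illustrative):

  struct aperfmperf_sample {
          unsigned int khz;
          u64 aperf;
          u64 mperf;
  };
  static DEFINE_PER_CPU(struct aperfmperf_sample, samples);

  static void aperfmperf_snapshot_khz(void *dummy)
  {
          struct aperfmperf_sample *s = this_cpu_ptr(&samples);
          u64 aperf, mperf;

          rdmsrl(MSR_IA32_APERF, aperf);
          rdmsrl(MSR_IA32_MPERF, mperf);

          /* MHz = base_MHz * delta_APERF / delta_MPERF, kept in kHz;
           * callers rate-limit to >= 10ms so the deltas are non-zero */
          s->khz   = div64_u64(cpu_khz * (aperf - s->aperf),
                               mperf - s->mperf);
          s->aperf = aperf;
          s->mperf = mperf;
  }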
Signed-off-by: Len Brown <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
|
The MCE severity gives a hint as to how to handle the error. The
notifier blocks can then use the severity to decide on an action.
It's not necessary for machine_check_poll() to filter errors for
the notifier chain, since each block will check its own set of
conditions before handling an error.
Also, there isn't any urgency for machine_check_poll() to make decisions
based on severity like in do_machine_check().
If we can assume that a severity is set then we can use it in more
notifier blocks. For example, the CEC block could check for a "KEEP"
severity rather than checking bits in the status. This isn't possible
now since the severity is not set except for "DEFERRED/UCNA" errors with
a valid address.
Save the severity since we have it, and let the notifier blocks decide
if they want to do anything.
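Conceptually the change amounts to (a hypothetical sketch, assuming a
severity field on struct mce for the notifier blocks to consume):

  /* in machine_check_poll(), before notifying (sketch) */
  m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
  mce_log(&m);    /* notifier blocks can now look at m.severity */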
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The helper function __load_ucode_amd() and pointer intel_ucode_patch do
not need to be in global scope, so make them static.
Fixes those sparse warnings:
"symbol '__load_ucode_amd' was not declared. Should it be static?"
"symbol 'intel_ucode_patch' was not declared. Should it be static?"
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Since commit:
af2cf278ef4f ("x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()")
we no longer free PUDs so that we do not have to synchronize
all PGDs on hot-remove/vfree().
But the new 5-level page table patchset reverted that for 4-level
page tables, in the following commit:
f2a6a7050109 ("x86: Convert the rest of the code to support p4d_t")
This patch repairs the damage and disables free_pud() if we are in the
4-level page table case, thus avoiding a BUG_ON() after hot-remove.
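The guard amounts to something like this sketch in remove_p4d_table():

  /* With 4-level page tables the p4d level is folded into the pgd;
   * freeing the pud page would require synchronizing all PGDs again. */
  if (CONFIG_PGTABLE_LEVELS == 5)
          free_pud_table(pud_base, p4d);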
Signed-off-by: Jérôme Glisse <[email protected]>
[ Clarified the changelog and the code comments. ]
Reviewed-by: Kirill A. Shutemov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix to unbreak the vdso32 build for 64bit kernels caused by
excess #includes in the mshyperv header"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mshyperv: Remove excess #includes from mshyperv.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Three fixlets for perf:
- Return the proper error code if aux buffers for a event are not
supported.
- Calculate the probe offset for inlined functions correctly
- Update the Skylake DTLB load/store miss event so it can count 1G
TLB entries as well"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Fix probe definition for inlined functions
perf/x86/intel: Add 1G DTLB load/store miss support for SKL
perf/aux: Correct return code of rb_alloc_aux() if !has_aux(ev)
|
|
In a HVM guest the kernel today allocates the page for mapping the shared
info structure via extend_brk(). This will lead to a performance drop, as
the underlying EPT entry will have to be split up into 4kB entries because
the single shared info page is located in hypervisor memory.
The issue has been detected by using the libmicro munmap test:
unmapping 8kB of memory was faster by nearly a factor of two when no
pv interfaces were active in the HVM guest.
So instead of taking a page from memory which might be mapped via
large EPT entries, use a page which is already mapped via a 4kB EPT
entry: we can take a page from the first 1MB of memory, as the video
memory at 640kB disallows using larger EPT entries.
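A sketch of the mapping; the low-memory address is hypothetical, while
the hypercall is the standard way an HVM guest places its shared info
page:

  #define SHARED_INFO_PADDR 0x40000UL  /* hypothetical slot below 640kB */

  static void __init xen_hvm_init_shared_info(void)
  {
          struct xen_add_to_physmap xatp = {
                  .domid = DOMID_SELF,
                  .space = XENMAPSPACE_shared_info,
                  .gpfn  = SHARED_INFO_PADDR >> PAGE_SHIFT,
          };

          if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
                  BUG();
          HYPERVISOR_shared_info = __va(SHARED_INFO_PADDR);
  }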
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
|
|
For gcc, stack alignment is configured with -mpreferred-stack-boundary=N;
clang has the option -mstack-alignment=N for that purpose. Use the same
alignment as with gcc.
If the alignment is not specified clang assumes an alignment of
16 bytes, as required by the standard ABI. However as mentioned in
d9b0cde91c60 ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if
supported") the standard kernel entry on x86-64 leaves the stack
on an 8-byte boundary, as a consequence clang will keep the stack
misaligned.
Signed-off-by: Matthias Kaehlcke <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
cc-option is used to enable compiler options for the boot code if they
are available. The macro uses KBUILD_CFLAGS and KBUILD_CPPFLAGS for the
check; however, these flags aren't used to build the boot code, and in
consequence cc-option can yield wrong results. For example
-mpreferred-stack-boundary=2 is never set with a 64-bit compiler,
since the setting is only valid for 16 and 32-bit binaries. This
is also the case for 32-bit kernel builds, because the option -m32 is
added to KBUILD_CFLAGS after the assignment of REALMODE_CFLAGS.
Use __cc-option instead of cc-option for the boot mode options.
The macro receives the compiler options as parameter instead of using
KBUILD_C*FLAGS, for the boot code we pass REALMODE_CFLAGS.
Also use separate statements for the __cc-option checks instead
of performing them in the initial assignment of REALMODE_CFLAGS since
the variable is an input of the macro.
Signed-off-by: Matthias Kaehlcke <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
Documentation/kbuild/makefiles.txt says the change for align options
occurred at GCC 3.0, and Documentation/process/changes.rst says the
minimal supported GCC version is 3.2, so it should be safe to hard-code
-falign* options.
Fix the only user arch/x86/Makefile_32.cpu and remove cc-option-align.
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Remove an unnecessary return from a void function.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Chris Wright <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The Sparse static analyzer emits this warning:
symbol 'strchr' was not declared. Should it be static?
This patch adds the appropriate extern declaration to string.h
to fix the warning.
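The added declaration is a one-liner (sketch of the addition to
arch/x86/boot/string.h):

  char *strchr(const char *s, int c);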
Signed-off-by: Tommy Nguyen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/20170623143601.GA20743@NoChina
Signed-off-by: Ingo Molnar <[email protected]>
|
|
A recent commit included linux/slab.h in linux/irq.h. This breaks the build
of vdso32 on a 64-bit kernel.
The reason is that linux/irq.h gets included into the vdso code via
linux/interrupt.h which is included from asm/mshyperv.h. That makes the
32-bit vdso compile fail, because slab.h includes the pgtable headers for
64-bit on a 64-bit build.
Neither linux/clocksource.h nor linux/interrupt.h is needed in the
mshyperv.h header file itself - it only has a dependency on <linux/atomic.h>.
Remove the includes and unbreak the build.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: K. Y. Srinivasan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: [email protected]
Fixes: dee863b571b0 ("hv: export current Hyper-V clocksource")
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1706231038460.2647@nanos
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Since the following commit in 2008:
cc503c1b43e0 ("x86: PIE executable randomization")
we added a heuristic to treat applications with RLIMIT_STACK configured
to unlimited as legacy. This means:
a) set the mmap_base to 1/3 of address space + randomization and
b) mmap from bottom to top.
This makes some sense as it allows the stack to grow really large. On the
other hand it reduces the address space usable for default mmaps
(without address hint) quite a lot.
We have received a bug report that the SAP HANA workload has run into this
limitation.
We could argue that the user just got what he asked for when setting
up the unlimited stack, but realistically growing the stack up to 1/6 of
TASK_SIZE (allowed by mmap_base) is pretty much unlimited in real
life. This would give mmap 20TB of additional address space, which is
quite nice. Especially when it is much more likely that this address
space gets used than the reserved stack.
Digging into the history, the original implementation of the randomization:
8817210d4d96 ("[PATCH] x86_64: Flexmap for 32bit and randomized mappings for 64bit")
didn't have this restriction.
So let's try and remove this assumption - hopefully nothing breaks.
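The heuristic lives in mmap_is_legacy(); after the change it reduces to
something like this sketch:

  static int mmap_is_legacy(void)
  {
          if (current->personality & ADDR_COMPAT_LAYOUT)
                  return 1;

          /* removed: RLIMIT_STACK == RLIM_INFINITY no longer forces
           * the legacy bottom-up layout */

          return sysctl_legacy_va_layout;
  }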
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
[ So I've applied this to tip:x86/mm with a wider Cc: list - if anyone objects to this change please holler. ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
cpufreq_quick_get() allows cpufreq drivers to override cpu_khz,
which is otherwise reported in the x86 /proc/cpuinfo "cpu MHz" field.
There are four problems with this scheme,
any one of which is sufficient justification to delete it:
1. Depending on which cpufreq driver is loaded, the behavior
of this field is different.
2. Distros complain that they have to explain to users
why and how this field changes. Distros have requested a constant.
3. The two major providers of this information, acpi_cpufreq
and intel_pstate, both "get it wrong" in different ways.
acpi_cpufreq lies to the user by telling them that
they are running at whatever frequency was last
requested by software.
intel_pstate lies to the user by telling them that
they are running at the average frequency computed
over an undefined measurement interval. But an average computed
over an undefined interval is, itself, undefined...
4. On modern processors, user space utilities, such as
turbostat(8), are more accurate and more precise, while
supporting concurrent measurement over arbitrary intervals.
Users who have been consulting /proc/cpuinfo to
track changing CPU frequency will be disappointed that
it no longer wiggles -- perhaps being unaware of the
limitations of the information they have been consuming.
Yes, they can change their scripts to look in the sysfs file
cpufreq/scaling_cur_freq. There they will find the same
data of dubious quality that is being removed from /proc/cpuinfo.
The value in sysfs will be fixed by a subsequent patch,
which addresses issues 1-3 above.
Issue 4 will remain -- users that really care about
accurate frequency information should not be using either
proc or sysfs kernel interfaces.
They should be using turbostat(8), or a similar
purpose-built analysis tool.
Signed-off-by: Len Brown <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
The current approach, which is the wholesale efi struct initialization from
an 'efi_xen' local template, is not robust. Usually, if a new member is
defined, it is properly initialized in drivers/firmware/efi/efi.c, but not
in arch/x86/xen/efi.c.
The effect is that the Xen initialization clears any fields the generic code
might have set and the Xen code does not know about yet.
I saw this happen a few times, so let's initialize only the EFI struct members
used by Xen and maintain no local duplicate, to avoid such issues in the future.
Signed-off-by: Daniel Kiper <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
[ Clarified the changelog. ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If the interrupt destination mode of the APIC is physical then the
effective affinity is restricted to a single CPU.
Mark the interrupt accordingly in the domain allocation code, so the core
code can avoid pointless affinity setting attempts.
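A sketch of the marking in the vector domain allocation path (the
helper comes with the effective affinity infrastructure; the condition
is illustrative):

  /* irq_dest_mode == 0 means physical destination mode */
  if (!apic->irq_dest_mode)
          irqd_set_single_target(irqd);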
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Add the effective irq mask update to the apic implementations and enable
effective irq masks for x86.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The decision about which CPUs an interrupt is effectively routed to happens
in the various apic->cpu_mask_to_apicid() implementations.
To support effective affinity masks this information needs to be updated in
irq_data. Add a pointer to irq_data to the callbacks and feed it through
the call chain.
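In signature form (a sketch; the flat APIC variant is shown as one
example of the callback):

  /* before */
  int flat_cpu_mask_to_apicid(const struct cpumask *mask,
                              unsigned int *apicid);

  /* after: irq_data is fed through so the effective target can be
   * recorded at the point of decision */
  int flat_cpu_mask_to_apicid(const struct cpumask *mask,
                              struct irq_data *irqdata,
                              unsigned int *apicid);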
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
All implementations of apic->cpu_mask_to_apicid_and() AND the two incoming
cpumasks together to search for the target.
Move that operation to the call site and rename the callback to
cpu_mask_to_apicid().
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
All implementations of apic->cpu_mask_to_apicid_and() mask out the offline
cpus. The callsite already has a mask available, which has the offline CPUs
removed. Use that and remove the extra bits.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Same functionality, except for the extra bits ORed onto the apicid.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
No point in having inlines assigned to function pointers at multiple
places. Just bloats the text.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
The generic migration code supports all the required features
already. Remove the x86 specific implementation and use the generic one.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|
|
Reorder fixup_irqs() so it matches the flow in the generic migration code.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
|