Age | Commit message (Collapse) | Author | Files | Lines |
|
I added UOPS_RETIRED.ALL by mistake to the Skylake PEBS event list for
cycles:pp. But the event is not documented for Skylake, and has some
issues.
The recommended replacement for cycles:pp is to use
INST_RETIRED.ANY+pebs as a base, similar to what CPUs before Sandy
Bridge did. This new event is called INST_RETIRED.TOTAL_CYCLES_PS. The
event is not really new, but has been already used by perf before
Sandy Bridge for the original cycles:p
Note the SDM doesn't document that event either, but it's being
documented in the latest version of the event list on:
https://download.01.org/perfmon/SKL
This patch does:
- Remove UOPS_RETIRED.ALL from the Skylake PEBS event list
- Add INST_RETIRED.ANY to the Skylake PEBS event list, and an table entry to
allow cmask=16,inv=1 for cycles:pp
- We don't need an extra entry for the base INST_RETIRED event,
because it is already covered by the catch-all PEBS table entry.
- Switch Skylake to use the Core2 PEBS alias (which is
INST_RETIRED.TOTAL_CYCLES_PS)
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Normally we drop PEBS events with a zero status field. But when
there is only a single PEBS event active we can assume the
PEBS record is for that event. The PEBS buffer is always flushed
when PEBS events are disabled, so there is no risk of mishandling
state PEBS records this way.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The recent commit:
75f80859b130 ("perf/x86/intel/pebs: Robustify PEBS buffer drain")
causes lots of warnings on different CPUs before Skylake
when running PEBS intensive workloads.
They can have a zero status field in the PEBS record when
PEBS is racing with clearing of GLOBAl_STATUS.
This also can cause hangs (it seems there are still
problems with printk in NMI).
Disable the warning, but still ignore the record.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The CHECK_MEMBER_AT_END_OF(TYPE, MEMBER) checks whether MEMBER
is last member of TYPE by evaluating:
offsetof(TYPE::MEMBER) + sizeof(TYPE::MEMBER) == sizeof(TYPE)
and ensuring TYPE::MEMBER is the last member of the TYPE.
This condition breaks on structs that are padded to be
aligned. This patch ensures the TYPE alignment is taken
into account.
This bug was revealed after adding cacheline alignment into
struct sched_entity, which broke task_struct::thread check:
CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
There is no need to worry about module and __init text disappearing
case, because that ftrace has a module notifier that is called when
a module is being unloaded and before the text goes away and this
code grabs the ftrace_lock mutex and removes the module functions
from the ftrace list, such that it will no longer do any
modifications to that module's text, the update to make functions
be traced or not is done under the ftrace_lock mutex as well.
And by now, __init section codes should not been modified
by ftrace, because it is black listed in recordmcount.c and
ignored by ftrace.
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Thomas Gleixner <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Li Bin <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
|
|
The MMCFG PCI accessors weren't being setup for NumacConnect2
correctly due to over-early assignment; this would create the
potential for the wrong PCI domain to be accessed.
Fix this by using the correct arch-specific PCI init function.
Signed-off-by: Daniel J Blueman <[email protected]>
Acked-by: Steffen Persvold <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
This was meant to print base address and entry count; make it do so
again.
Fixes: 37868fe113ff "x86/ldt: Make modify_ldt synchronous"
Signed-off-by: Jan Beulich <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The related warning from gcc 6.0:
arch/x86/kernel/ptrace.c:127:18: warning: ‘arg_offs_table’ defined but not used [-Wunused-const-variable]
static const int arg_offs_table[] = {
^~~~~~~~~~~~~~
Signed-off-by: Chen Gang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
Conflicts:
drivers/gpu/drm/i915/intel_pm.c
|
|
The Linux kernel already has the concept of IRQ domain, wherein a
component can expose a set of IRQs which are managed by a particular
interrupt controller chip or other subsystem. The PCI driver exposes
the notion of an IRQ domain for Message-Signaled Interrupts (MSI) from
PCI Express devices. This patch exposes the functions which are
necessary for creating a MSI IRQ domain within a module.
[ tglx: Split it into x86 and core irq parts ]
Signed-off-by: Jake Oshins <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Adding the rtc platform device in non-privileged Xen PV guests causes
an IRQ conflict because these guests do not have legacy PIC and may
allocate irqs in the legacy range.
In a single VCPU Xen PV guest we should have:
/proc/interrupts:
CPU0
0: 4934 xen-percpu-virq timer0
1: 0 xen-percpu-ipi spinlock0
2: 0 xen-percpu-ipi resched0
3: 0 xen-percpu-ipi callfunc0
4: 0 xen-percpu-virq debug0
5: 0 xen-percpu-ipi callfuncsingle0
6: 0 xen-percpu-ipi irqwork0
7: 321 xen-dyn-event xenbus
8: 90 xen-dyn-event hvc_console
...
But hvc_console cannot get its interrupt because it is already in use
by rtc0 and the console does not work.
genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
We can avoid this problem by realizing that unprivileged PV guests (both
Xen and lguests) are not supposed to have rtc_cmos device and so
adding it is not necessary.
Privileged guests (i.e. Xen's dom0) do use it but they should not have
irq conflicts since they allocate irqs above legacy range (above
gsi_top, in fact).
Instead of explicitly testing whether the guest is privileged we can
extend pv_info structure to include information about guest's RTC
support.
Reported-and-tested-by: Sander Eikelenboom <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Boris Ostrovsky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # 4.2+
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.
The remaining ones need more careful inspection before a conversion can
happen. On the TODO.
Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: David Sterba <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matt Mackall <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Add an enum for the ->x86_capability array indices and cleanup
get_cpu_cap() by killing some redundant local vars.
Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Turn the CPUID leafs which are proper CPUID feature bit leafs into
separate ->x86_capability words.
Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Pull in upstream changes so we can apply depending patches.
|
|
Now, multiple CPUs can receive an external NMI simultaneously by
specifying the "apic_extnmi=all" command line parameter. When we take
a crash dump by using external NMI with this option, we fail to save
registers into the crash dump. This happens as follows:
CPU 0 CPU 1
================================ =============================
receive an external NMI
default_do_nmi() receive an external NMI
spin_lock(&nmi_reason_lock) default_do_nmi()
io_check_error() spin_lock(&nmi_reason_lock)
panic() busy loop
...
kdump_nmi_shootdown_cpus()
issue NMI IPI -----------> blocked until IRET
busy loop...
Here, since CPU 1 is in NMI context, an additional NMI from CPU 0
remains unhandled until CPU 1 IRETs. However, CPU 1 will never execute
IRET so the NMI is not handled and the callback function to save
registers is never called.
To solve this issue, we check if the IPI for crash dumping was issued
while waiting for nmi_reason_lock to be released, and if so, call its
callback function directly. If the IPI is not issued (e.g. kdump is
disabled), the actual behavior doesn't change.
Signed-off-by: Hidehiro Kawai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stefan Lippers-Hollmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/20151210065245.4587.39316.stgit@softrs
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
This patch introduces a command line parameter apic_extnmi:
apic_extnmi=( bsp|all|none )
The default value is "bsp" and this is the current behavior: only the
Boot-Strapping Processor receives an external NMI.
"all" allows external NMIs to be broadcast to all CPUs. This would
raise the success rate of panic on NMI when BSP hangs in NMI context
or the external NMI is swallowed by other NMI handlers on the BSP.
If you specify "none", no CPUs receive external NMIs. This is useful for
the dump capture kernel so that it cannot be shot down by accidentally
pressing the external NMI button (on platforms which have it) while
saving a crash dump.
Signed-off-by: Hidehiro Kawai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Bandan Das <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: "Maciej W. Rozycki" <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Ribalda Delgado <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/20151210014632.25437.43778.stgit@softrs
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.
For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:
CPU 0 CPU 1
=========================== ==========================
receive an unknown NMI
unknown_nmi_error()
panic() receive an unknown NMI
spin_trylock(&panic_lock) unknown_nmi_error()
crash_kexec() panic()
spin_trylock(&panic_lock)
panic_smp_self_stop()
infinite loop
kdump_nmi_shootdown_cpus()
issue NMI IPI -----------> blocked until IRET
infinite loop...
Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.
In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.
To save registers in this case, we need to:
a) Return from NMI handler instead of looping infinitely
or
b) Call the callback function directly from the infinite loop
Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices. So, we chose b).
This patch does the following:
1. Move the infinite looping of CPUs which haven't called panic() in NMI
context (actually done by panic_smp_self_stop()) outside of panic() to
enable us to refer pt_regs. Please note that panic_smp_self_stop() is
still used for normal context.
2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
registers and do some cleanups after setting waiting_for_crash_ipi which
is used for counting down the number of CPUs which handled the callback
Signed-off-by: Hidehiro Kawai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Aaron Tomlin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Young <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Gobinda Charan Maji <[email protected]>
Cc: HATAYAMA Daisuke <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Javi Merino <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: lkml <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Nicolas Iooss <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Stefan Lippers-Hollmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. Kernel stalls, as a result, after failing to acquire
panic_lock.
To avoid this problem, don't call panic() in NMI context if we've
already entered panic().
For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.
Signed-off-by: Hidehiro Kawai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Aaron Tomlin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Gobinda Charan Maji <[email protected]>
Cc: HATAYAMA Daisuke <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Javi Merino <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: lkml <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Nicolas Iooss <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Vivek Goyal <[email protected]>
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Pull in update changes so we can apply conflicting patches
|
|
Intel's MCA implementation broadcasts MCEs to all CPUs on the
node. This poses a problem for offlined CPUs which cannot
participate in the rendezvous process:
Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
Kernel Offset: disabled
Rebooting in 100 seconds..
More specifically, Linux does a soft offline of a CPU when
writing a 0 to /sys/devices/system/cpu/cpuX/online, which
doesn't prevent the #MC exception from being broadcasted to that
CPU.
Ensure that offline CPUs don't participate in the MCE rendezvous
and clear the RIP valid status bit so that a second MCE won't
cause a shutdown.
Without the patch, mce_start() will increment mce_callin and
wait for all CPUs. Offlined CPUs should avoid participating in
the rendezvous process altogether.
Signed-off-by: Ashok Raj <[email protected]>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/4933029991103ae44672c82b97a20035f5c1fe4f.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f97186859284.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This tree includes four core perf fixes for misc bugs, three fixes to
x86 PMU drivers, and two updates to old email addresses"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Do not send exit event twice
perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro
perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell
perf: Fix PERF_EVENT_IOC_PERIOD deadlock
treewide: Remove old email address
perf/x86: Fix LBR call stack save/restore
perf: Update email address in MAINTAINERS
perf/core: Robustify the perf_cgroup_from_task() RCU checks
perf/core: Fix RCU problem with cgroup context switching code
|
|
We've picked up a few conflicts and it would be nice
to resolve them before we move onwards.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thoma Gleixner:
"Another round of fixes for x86:
- Move the initialization of the microcode driver to late_initcall to
make sure everything that init function needs is available.
- Make sure that lockdep knows about interrupts being off in the
entry code before calling into c-code.
- Undo the cpu hotplug init delay regression.
- Use the proper conditionals in the mpx instruction decoder.
- Fixup restart_syscall for x32 tasks.
- Fix the hugepage regression on PAE kernels which was introduced
with the latest PAT changes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/signal: Fix restart_syscall number for x32 tasks
x86/mpx: Fix instruction decoder condition
x86/mm: Fix regression with huge pages on PAE
x86 smpboot: Re-enable init_udelay=0 by default on modern CPUs
x86/entry/64: Fix irqflag tracing wrt context tracking
x86/microcode: Initialize the driver late when facilities are up
|
|
Now that we have generic MSR trace points we can remove the old
hackish perf MSR read tracing code.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Fix a definition in the perf rapl driver. __initconst must
be applied to a const object, but to declare a const pointer
you need to use * const ..., not const ... *
This fixes a section attribute conflict with LTO builds.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
We need to add rest of the flags to the constraint mask
instead of another INTEL_ARCH_EVENT_MASK, fixing a typo.
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
There was a mistake in the Haswell constraints table.
Signed-off-by: Yuanfang Chen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
when memory hotplug enabled system is booted with less
than 4GB of RAM and then later more RAM is hotplugged
32-bit devices stop functioning with following error:
nommu_map_single: overflow 327b4f8c0+1522 of device mask ffffffff
the reason for this is that if x86_64 system were booted
with RAM less than 4GB, it doesn't enable SWIOTLB and
when memory is hotplugged beyond MAX_DMA32_PFN, devices
that expect 32-bit addresses can't handle 64-bit addresses.
Fix it by tracking max possible PFN when parsing
memory affinity structures from SRAT ACPI table and
enable SWIOTLB if there is hotpluggable memory
regions beyond MAX_DMA32_PFN.
It fixes KVM guests when they use emulated devices
(reproduces with ata_piix, e1000 and usb devices,
RHBZ: 1275941, 1275977, 1271527)
It also fixes the HyperV, VMWare with emulated devices
which are affected by this issue as well.
Signed-off-by: Igor Mammedov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
max_possible_pfn will be used for tracking max possible
PFN for memory that isn't present in E820 table and
could be hotplugged later.
By default max_possible_pfn is initialized with max_pfn,
but later it could be updated with highest PFN of
hotpluggable memory ranges declared in ACPI SRAT table
if any present.
Signed-off-by: Igor Mammedov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When restarting a syscall with regs->ax == -ERESTART_RESTARTBLOCK,
regs->ax is assigned to a restart_syscall number. For x32 tasks, this
syscall number must have __X32_SYSCALL_BIT set, otherwise it will be
an x86_64 syscall number instead of a valid x32 syscall number. This
issue has been there since the introduction of x32.
Reported-by: strace/tests/restart_syscall.test
Reported-and-tested-by: Elvira Khabirova <[email protected]>
Signed-off-by: Dmitry V. Levin <[email protected]>
Cc: Elvira Khabirova <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Calling set_memory_rw() and set_memory_ro() for every iteration of the
loop in klp_write_object_relocations() is messy, inefficient, and
error-prone.
Change all the read-only pages to read-write before the loop and convert
them back to read-only again afterwards.
Suggested-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
Makes it easier to handle init vs core cleanly, though the change is
fairly invasive across random architectures.
It simplifies the rbtree code immediately, however, while keeping the
core data together in the same cachline (now iff the rbtree code is
enabled).
Acked-by: Peter Zijlstra <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
as __initdata
'range_new' doesn't seem to be used after init. It is only passed
to memset(), sum_ranges(), memcmp() and x86_get_mtrr_mem_range(), the
latter of which also only passes it on to various *range*
library functions.
So mark it __initdata to free up an extra page after init.
Its contents are wiped at every call to mtrr_calc_range_state(),
so it being static is not about preserving state between calls,
but simply to avoid a 4k+ stack frame. While there, add a
comment explaining this and why it's safe.
We could also mark nr_range_new as __initdata, but since it's
just a single int and also doesn't carry state between calls (it
is unconditionally assigned to before it is read), we might as
well make it an ordinary automatic variable.
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If there are no persistent memory ranges present then don't bother
creating the platform device. Otherwise, it loads the full libnvdimm
sub-system only to discover no resources present.
Reported-by: Andy Lutomirski <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
Ingo noted that if we can guarantee _end is aligned to PAGE_SIZE
we can automatically avoid bugs along the lines of,
size = _end - _text >> PAGE_SHIFT
which is missing a call to PFN_ALIGN(). The EFI mixed mode
contains this bug, for example.
_text is already aligned to PAGE_SIZE through the use of
LOAD_PHYSICAL_ADDR, and the BSS and BRK sections are explicitly
aligned in the linker script, so it makes sense to align _end to
match.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Matt Fleming <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sai Praneeth Prakhya <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The cal_chipset_ops structures are never modified, so declare
them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jon D. Mason <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Muli Ben-Yehuda <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
These are clearly just used during init.
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Quentin Casasnovas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
commit f1ccd249319e allowed the cmdline "cpu_init_udelay=" to work
with all values, including the default of 10000.
But in setting the default of 10000, it over-rode the code that sets
the delay 0 on modern processors.
Also, tidy up use of INT/UINT.
Fixes: f1ccd249319e "x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior"
Reported-by: Shane <[email protected]>
Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/9082eb809ef40dad02db714759c7aaf618c518d4.1448232494.git.len.brown@intel.com
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
pte_update_defer can be removed as it is always set to the same
function as pte_update. So any usage of pte_update_defer() can be
replaced by pte_update().
pmd_update and pmd_update_defer are always set to paravirt_nop, so they
can just be nuked.
Signed-off-by: Juergen Gross <[email protected]>
Acked-by: Rusty Russell <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
x86_init_rdrand() was added with 2 goals:
1. Sanity check that the built-in-self-test circuit on the Digital
Random Number Generator (DRNG) is not complaining. As RDRAND
HW self-checks on every invocation, this goal is achieved
by simply invoking RDRAND and checking its return code.
2. Force a full re-seed of the random number generator.
This was done out of paranoia to benefit the most un-sophisticated
DRNG implementation conceivable in the architecture,
an implementation that does not exist, and unlikely ever will.
This worst-case full-re-seed is achieved by invoking
a 64-bit RDRAND 8192 times.
Unfortunately, this worst-case re-seed costs O(1,000us).
Magnifying this cost, it is done from identify_cpu(), which is the
synchronous critical path to bring a processor on-line -- repeated
for every logical processor in the system at boot and resume from S3.
As it is very expensive, and of highly dubious value, we delete the
worst-case re-seed from the kernel.
We keep the 1st goal -- sanity check the hardware, and mark it absent
if it complains.
This change reduces the cost of x86_init_rdrand() by a factor of 1,000x,
to O(1us) from O(1,000us).
Signed-off-by: Len Brown <[email protected]>
Link: http://lkml.kernel.org/r/058618cc56ec6611171427ad7205e37e377aa8d4.1439738240.git.len.brown@intel.com
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
When an anomaly is found while modifying function code, ftrace_bug() is
called which disables the function tracing infrastructure and reports
information about what failed. If the code that is to be replaced does not
match what is expected, then actual code is shown. Currently there is no
arch generic way to show what was expected.
Add a new variable pointer calld ftrace_expected that the arch code can set
to point to what it expected so that ftrace_bug() can report the actual text
as well as the text that was expected to be there.
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Saving and restoring lapic vectors in lapic_suspend() and
lapic_resume() is not consistent: the thmr vector saving is
guarded by a different config option than the restore part. The
cmci vector isn't handled at all.
Those inconsistencies are not very critical, as the missing cmci
vector will be set via mce resume handling, the wrong config
option used for restoring the thmr vector can't be configured
differently than the one which should be used.
Nevertheless correct the thmr vector restore and add cmci vector
handling.
Signed-off-by: Juergen Gross <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Minor code edits. ]
Signed-off-by: Ingo Molnar <[email protected]>
|