path: root/arch/x86
Age  Commit message  (Author, files changed, lines -removed/+added)
2013-06-20  x86, trace: Add irq vector tracepoints  (Seiji Aguchi, 19 files, -15/+419)
[Purpose of this patch]

As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.

http://www.spinics.net/lists/mm-commits/msg85707.html

<snip>
The current interrupt traces from irq_handler_entry and
irq_handler_exit provide when an interrupt is handled. They provide
good data about when the system has switched to kernel space and how
it affects the currently running processes.

There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers. Tracing such events
gives us information about IRQ interaction with other system events.

The trace also tells where the system is spending its time. We want to
know which cores are handling interrupts and how they are affecting
other processes in the system. Also, the trace provides information
about when the cores are idle and which interrupts are changing that
state.
<snip>

On the other hand, my use case is tracing just the local timer event
and getting the value of the instruction pointer.

I previously suggested adding an argument to the local timer event to
get the instruction pointer. But there is another way to get it, with
an external module such as systemtap. So I don't need to add any
argument to the irq vector tracepoints now.

[Patch Description]

Vaibhav's patch shared one tracepoint, irq_vector_entry/irq_vector_exit,
across all events. But the use case above is to trace a specific irq
vector rather than all events, and in that case we are concerned about
the overhead of the unwanted events.

So, add the following tracepoints instead of introducing
irq_vector_entry/exit, so that we can enable them independently.
   - local_timer_vector
   - reschedule_vector
   - call_function_vector
   - call_function_single_vector
   - irq_work_entry_vector
   - error_apic_vector
   - thermal_apic_vector
   - threshold_apic_vector
   - spurious_apic_vector
   - x86_platform_ipi_vector

Also, introduce logic that switches the IDT at enable/disable time so
that the time penalty is zero when the tracepoints are disabled.
Detailed explanations are as follows.
 - Create trace irq handlers with entering_irq()/exiting_irq().
 - Create a new IDT, trace_idt_table, at boot time by adding logic to
   _set_gate(). It is just a copy of the original IDT.
 - Register the new handlers for the tracepoints in the new IDT by
   introducing macros in alloc_intr_gate(), called when the irq_vector
   handlers are registered.
 - Add a check whether irq vector tracing is on/off to
   load_current_idt(). This has to be done below the debug check for
   these reasons:
   - Switching to the debug IDT may be kicked while tracing is enabled.
   - On the other hand, switching to the trace IDT is kicked only when
     debugging is disabled.

In addition, the new IDT is created only when CONFIG_TRACING is enabled
to avoid being used for other purposes.

Signed-off-by: Seiji Aguchi <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: Steven Rostedt <[email protected]>
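A minimal sketch of the IDT selection order described above (debug
first, trace only when debugging is off); the helper names follow the
commit text, but the exact bodies and signatures are assumptions:

    /*
     * Hedged sketch: pick the debug IDT first, the trace IDT only
     * when debugging is disabled, and the default IDT otherwise.
     */
    static inline void load_current_idt(void)
    {
            if (is_debug_idt_enabled())
                    load_debug_idt();
            else if (is_trace_idt_enabled())
                    load_trace_idt();
            else
                    load_idt((const struct desc_ptr *)&idt_descr);
    }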
2013-06-20  x86: Rename variables for debugging  (Seiji Aguchi, 4 files, -13/+54)
Rename the debugging variables to describe their meaning precisely.
Also, introduce a generic way to switch the IDT by checking the
current state, debug on/off.

Signed-off-by: Seiji Aguchi <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: Steven Rostedt <[email protected]>
2013-06-20  x86, trace: Introduce entering/exiting_irq()  (Seiji Aguchi, 7 files, -46/+109)
When implementing tracepoints in interrupt handlers, if the tracepoints
are simply added in the performance-sensitive path of the interrupt
handlers, they may cause a performance problem due to the time penalty.

To solve the problem, the idea is to prepare non-trace and trace irq
handlers and switch their IDTs at enable/disable time.

So, let's introduce entering_irq()/exiting_irq() for the pre/post-
processing of each irq handler.

A way to use them is as follows.

Non-trace irq handler:
smp_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	__smp_irq_handler();	/*
				 * common logic between non-trace and
				 * trace handlers in a vector.
				 */
	exiting_irq();		/* post-processing of this handler */
}

Trace irq handler:
smp_trace_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	trace_irq_entry();	/* tracepoint for irq entry */
	__smp_irq_handler();	/*
				 * common logic between non-trace and
				 * trace handlers in a vector.
				 */
	trace_irq_exit();	/* tracepoint for irq exit */
	exiting_irq();		/* post-processing of this handler */
}

If the tracepoints could be placed outside entering_irq()/exiting_irq()
as follows, it would look cleaner.

smp_trace_irq_handler()
{
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
}

But it doesn't work. The problem is the irq_enter/exit() calls: they
must be made before trace_irq_entry()/trace_irq_exit(), because
rcu_irq_enter() must be called before any tracepoints are used, as
tracepoints use RCU to synchronize.

As a possible alternative, we might be able to call irq_enter() first,
as follows, if irq_enter() could nest.

smp_trace_irq_handler()
{
	irq_enter();
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
	irq_exit();
}

But that doesn't work, either. If irq_enter() nested, it would pay a
time penalty for checking whether it was already called, and that
penalty is not wanted in performance-sensitive paths, even if it is
tiny.

Signed-off-by: Seiji Aguchi <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: Steven Rostedt <[email protected]>
2013-06-20  x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S  (H. Peter Anvin, 1 file, -17/+17)
There is no point in using "xorq" to clear a register... use "xorl" to
clear the bottom 32 bits, and the upper 32 bits get cleared by virtue
of zero extension.

Signed-off-by: H. Peter Anvin <[email protected]>
Cc: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
2013-06-20  Merge tag 'v3.10-rc6' into x86/cleanups  (H. Peter Anvin, 16 files, -212/+181)
Linux 3.10-rc6

We need a change that is in the mainline tree for further work.

Signed-off-by: H. Peter Anvin <[email protected]>
2013-06-20  x86, fpu: Use static_cpu_has_safe before alternatives  (Borislav Petkov, 1 file, -1/+1)
The call stack below shows how this happens: basically,
eager_fpu_init() calls __thread_fpu_begin(current), which then does
if (!use_eager_fpu()), which, in turn, uses static_cpu_has. And we're
executing before alternatives, so static_cpu_has doesn't work there
yet.

Use the safe variant in this path; it becomes optimal after
alternatives have run.

WARNING: at arch/x86/kernel/cpu/common.c:1368 warn_pre_alternatives+0x1e/0x20()
You're using static_cpu_has before alternatives have run!
Modules linked in:
Pid: 0, comm: swapper Not tainted 3.9.0-rc8+ #1
Call Trace:
 warn_slowpath_common
 warn_slowpath_fmt
 ? fpu_finit
 warn_pre_alternatives
 eager_fpu_init
 fpu_init
 cpu_init
 trap_init
 start_kernel
 ? repair_env_string
 x86_64_start_reservations
 x86_64_start_kernel

Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
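A sketch of the fixed path, assuming use_eager_fpu() is the one-line
wrapper the call chain above suggests (declaration details and the
feature name are assumptions):

    /*
     * Hedged sketch: the safe variant falls back to the dynamic
     * boot_cpu_has() until alternatives have patched it.
     */
    static __always_inline __pure bool use_eager_fpu(void)
    {
            return static_cpu_has_safe(X86_FEATURE_EAGER_FPU);
    }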
2013-06-20  x86: Add a static_cpu_has_safe variant  (Borislav Petkov, 2 files, -1/+91)
We want to use this in early code where alternatives might not have
run yet, and for that case we fall back to the dynamic boot_cpu_has.

For that, force a 5-byte jump since the compiler could be generating
differently sized jumps for each label.

Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-06-20  x86: Sanity-check static_cpu_has usage  (Borislav Petkov, 3 files, -2/+46)
static_cpu_has may be used only after alternatives have run. Before
that it always returns false if constant folding with
__builtin_constant_p() doesn't happen. And you don't want that.

This patch is the result of me debugging an issue where I
overzealously put static_cpu_has in code which executed before
alternatives had run, and had to spend some time scratching my head
and cursing at the monitor.

So add a jump to a warning which screams loudly when we use this
function too early. The alternatives patch that check away, in
conjunction with patching the rest of the kernel image.

[ hpa: factored this into its own configuration option. If we want to
  have an overarching option, it should be an option which selects
  other options, not as a group option in the source code. ]

Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
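A minimal sketch of the early-use warning this describes; the message
string matches the splat quoted in the fpu patch above, while the
exact function body is an assumption:

    /*
     * Hedged sketch: the warning jumped to when static_cpu_has() is
     * used too early; alternatives patch this jump away later.
     */
    void warn_pre_alternatives(void)
    {
            WARN(1, "You're using static_cpu_has before alternatives have run!\n");
    }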
2013-06-20  x86, cpu: Add a synthetic, always true, cpu feature  (Borislav Petkov, 2 files, -1/+3)
This will be used in alternatives later as an always-replace flag.

Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
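A sketch of how such a synthetic flag is typically wired up; the
feature name X86_FEATURE_ALWAYS and the call site are assumptions not
stated in the message above:

    /*
     * Hedged sketch: mark the synthetic, always-true feature on every
     * CPU during identification, so alternatives keyed on it always
     * replace.
     */
    static void identify_cpu_sketch(struct cpuinfo_x86 *c)
    {
            set_cpu_cap(c, X86_FEATURE_ALWAYS);
    }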
2013-06-20  KVM: MMU: retain more available bits on mmio spte  (Xiao Guangrong, 2 files, -3/+9)
Let the mmio spte use only bit 62 and bit 63 of the upper 32 bits;
bits 52 ~ 61 can then be used for other purposes.

Signed-off-by: Xiao Guangrong <[email protected]>
Reviewed-by: Gleb Natapov <[email protected]>
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
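Illustrative mask definitions for the bit split described above; the
names are hypothetical, not the ones used in the patch:

    /*
     * Hypothetical names, for illustration only: MMIO marking keeps
     * bits 62-63 of the upper half; bits 52-61 stay available.
     */
    #define SPTE_MMIO_HIGH_MASK   (3ULL << 62)       /* bits 62,63 */
    #define SPTE_AVAIL_MASK       (0x3ffULL << 52)   /* bits 52..61 */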
2013-06-20  Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -4/+4)
Pull scheduler fixes from Ingo Molnar:
 "Two smaller fixes - plus a context tracking tracing fix that is a
  bit bigger"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/context-tracking: Add preempt_schedule_context() for tracing
  sched: Fix clear NOHZ_BALANCE_KICK
  sched/x86: Construct all sibling maps if smt
2013-06-20  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -1/+1)
Pull perf fixes from Ingo Molnar:
 "Four fixes. The mmap ones are unfortunately larger than desired -
  fuzzing uncovered bugs that needed perf context life time management
  changes to fix properly"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
  perf: Fix mmap() accounting hole
  perf: Fix perf mmap bugs
  kprobes: Fix to free gone and unused optprobes
2013-06-20  Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -12/+0)
Pull cpu idle fixes from Thomas Gleixner:
 - Add a missing irq enable. Fallout of the idle conversion
 - Fix stackprotector wreckage caused by the idle conversion

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  idle: Enable interrupts in the weak arch_cpu_idle() implementation
  idle: Add the stack canary init to cpu_startup_entry()
2013-06-20  Merge branch 'perf/urgent' into perf/core  (Ingo Molnar, 1 file, -4/+10)
Merge in two hw_breakpoint fixes, before applying another 5.

Signed-off-by: Ingo Molnar <[email protected]>
2013-06-20  kprobes: Fix arch_prepare_kprobe to handle copy insn failures  (Masami Hiramatsu, 1 file, -4/+10)
Fix arch_prepare_kprobe() to correctly handle failures when copying an
instruction. This fix is related to the previous fix, 8101376, which
made __copy_instruction() return an error result on failure, but the
caller site was not updated to handle it. Thus, this is the other half
of the bugfix.

This fix is also related to the following bug report:

   https://bugzilla.redhat.com/show_bug.cgi?id=910649

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Tested-by: Jonathan Lebon <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/20130605031216.15285.2001.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <[email protected]>
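A sketch of the caller-side half of the fix described above; the exact
function shape and error code are assumptions:

    /*
     * Hedged sketch: propagate a failed instruction copy instead of
     * ignoring it.
     */
    static int arch_copy_kprobe(struct kprobe *p)
    {
            /* __copy_instruction() returns 0 on failure */
            if (!__copy_instruction(p->ainsn.insn, p->addr))
                    return -EINVAL;

            /* ... remaining setup unchanged ... */
            return 0;
    }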
2013-06-20  x86: Fix trigger_all_cpu_backtrace() implementation  (Michel Lespinasse, 3 files, -3/+7)
The following change fixes the x86 implementation of
trigger_all_cpu_backtrace(), which was previously (accidentally, as
far as I can tell) disabled to always return false, as on
architectures that do not implement this function.

trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h, should
call arch_trigger_all_cpu_backtrace() if available, or return false if
the underlying arch doesn't implement this function.

x86 did provide a suitable arch_trigger_all_cpu_backtrace()
implementation, but it wasn't actually being used because it was
declared in asm/nmi.h, which linux/nmi.h doesn't include. Also,
linux/nmi.h couldn't easily be fixed by including asm/nmi.h, because
that file is not available on all architectures.

I am proposing to fix this by moving the x86 definition of
arch_trigger_all_cpu_backtrace() to asm/irq.h.

Tested via:

   echo l > /proc/sysrq-trigger

Before the change, this uses a fallback implementation which shows
backtraces on active CPUs (using smp_call_function_interrupt()).
After the change, this shows NMI backtraces on all CPUs.

Signed-off-by: Michel Lespinasse <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
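A minimal sketch of the include/linux/nmi.h dispatch described above;
the exact fallback shape is an assumption based on the commit text:

    /*
     * Hedged sketch: use the arch hook when the arch defines it,
     * otherwise report that no backtrace was triggered.
     */
    #ifdef arch_trigger_all_cpu_backtrace
    static inline bool trigger_all_cpu_backtrace(void)
    {
            arch_trigger_all_cpu_backtrace();
            return true;
    }
    #else
    static inline bool trigger_all_cpu_backtrace(void)
    {
            return false;
    }
    #endif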
2013-06-20  x86/intel/cacheinfo: Shut up last long-standing warning  (Borislav Petkov, 1 file, -27/+25)
arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘init_intel_cacheinfo’:
arch/x86/kernel/cpu/intel_cacheinfo.c:642:28: warning: ‘this_leaf.size’ may be used uninitialized in this function [-Wmaybe-uninitialized]
arch/x86/kernel/cpu/intel_cacheinfo.c:643:29: warning: ‘this_leaf.eax.split.num_threads_sharing’ may be used uninitialized in this function [-Wmaybe-uninitialized]

This keeps on happening during randbuilds and the compiler is wrong
here: in the case where cpuid4_cache_lookup_regs() returns 0, both
this_leaf.size and this_leaf.eax get initialized. In the case where
the CPUID leaf doesn't contain valid cache info, we error out, which
init_intel_cacheinfo() handles correctly without touching the
above-mentioned fields.

So shut up the warning by clearing out the struct which we hand down.

While at it, reverse the error handling and gain one indentation
level.

Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
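A sketch of the silencing approach, assuming a zero initializer is
used (the patch may equally use memset; the wrapper function here is
hypothetical):

    static int cache_lookup_sketch(int index)
    {
            /*
             * Hedged sketch: zero the struct we hand down so the
             * compiler can see it is never read uninitialized.
             */
            struct _cpuid4_info_regs this_leaf = { 0 };

            return cpuid4_cache_lookup_regs(index, &this_leaf);
    }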
2013-06-19  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 8 files, -185/+78)
Conflicts:
	drivers/net/wireless/ath/ath9k/Kconfig
	drivers/net/xen-netback/netback.c
	net/batman-adv/bat_iv_ogm.c
	net/wireless/nl80211.c

The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.

The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().

Batman conflict resolution provided by Antonio Quartulli; basically
keep everything in both conflict hunks.

The nl80211 conflict is a little more involved. In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported. Meanwhile in 'net-next' the handlers were converted to
use pre and post doit handlers which use a flag to determine whether
to hold the RTNL mutex around the operation.

However, the dump handlers do not use this logic. Instead they have to
do the locking explicitly. There were apparent bugs in the conversion
of nl80211_dump_wiphy() in that we were not dropping the RTNL mutex in
all the return paths, and it seems we very much should be doing so. So
I fixed that whilst handling the overlapping changes.

To simplify the initial returns, I take the RTNL mutex after we try to
allocate 'tb'.

Signed-off-by: David S. Miller <[email protected]>
2013-06-19  x86: Fix section mismatch on load_ucode_ap  (Paul Gortmaker, 1 file, -2/+2)
We are in the process of removing all the __cpuinit annotations. While
working on making that change, an existing problem was made evident:

  WARNING: arch/x86/kernel/built-in.o(.text+0x198f2): Section mismatch
  in reference from the function cpu_init() to the function
  .init.text:load_ucode_ap()
  The function cpu_init() references the function __init load_ucode_ap().
  This is often because cpu_init lacks a __init annotation or the
  annotation of load_ucode_ap is wrong.

This now appears because in my working tree, cpu_init() is no longer
tagged as __cpuinit, and so the audit picks up the mismatch. The 2nd
hypothesis from the audit is the correct one, as there was an
incorrect __init tag on the prototype in the header (but __cpuinit was
used on the function itself).

The audit is telling us that the prototype's __init annotation took
effect and the function did land in the .init.text section. Checking
with objdump on a mainline tree that still has __cpuinit shows that
the __cpuinit on the function takes precedence over the __init on the
prototype, but that won't be true once we make __cpuinit a no-op.

Even though we are removing __cpuinit, we temporarily align both the
function and the prototype on __cpuinit so that the changeset can be
applied to stable trees if desired.

[ hpa: build fix only, no object code change ]

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: stable <[email protected]> # 3.9+
Signed-off-by: Paul Gortmaker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
2013-06-19  x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.  (Konrad Rzeszutek Wilk, 4 files, -3/+12)
By default this will be x86_acpi_suspend_lowlevel. This registration
allows us to register another callback when a different
platform-specific callback is needed.

Signed-off-by: Liang Tang <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Tested-by: Ben Guthro <[email protected]>
Acked-by: "H. Peter Anvin" <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
2013-06-19  x86/vdso: Convert use of typedef ctl_table to struct ctl_table  (Joe Perches, 1 file, -2/+2)
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <[email protected]>
Cc: Jiri Kosina <[email protected]>
Link: http://lkml.kernel.org/r/a756fa0060e8eea25e8c1863c2764e86c2823617.1371177118.git.joe@perches.com
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  x86: Remove weird PTR_ERR() in do_debug  (Rusty Russell, 1 file, -1/+1)
Commit 62edab905 changed the argument to notify_die() from dr6 to &dr6
but, weirdly, used PTR_ERR() to cast it to a long. Since dr6 is on the
stack, this is an abuse of PTR_ERR(). Cast to long, as per kernel
standard.

Signed-off-by: Rusty Russell <[email protected]>
Cc: K.Prasad <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
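A before/after sketch of the call in question; the surrounding
argument list is assumed from the do_debug() context:

    /* Before: PTR_ERR() abused to cast a stack address to long */
    notify_die(DIE_DEBUG, "debug", regs, PTR_ERR(&dr6), error_code, SIGTRAP);

    /* After: a plain cast, per kernel convention */
    notify_die(DIE_DEBUG, "debug", regs, (long)&dr6, error_code, SIGTRAP);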
2013-06-19  perf/x86/intel: Add mem-loads/stores support for Haswell  (Andi Kleen, 3 files, -7/+41)
mem-loads is basically the same as on Sandy Bridge, but we use a
separate string for changes later.

Haswell doesn't support the full precise store mode, so we emulate it
using the "DataLA" facility. This allows us to do everything, but for
data sources we can only detect an L1 hit or not.

There is no explicit enable bit anymore, so we have to tie it to a
perf-internal-only flag.

The address is supported for all memory-related PEBS events with
DataLA. Instead of logging it only for the load and store events, we
allow logging it for all of them (it will simply be 0 if the current
event does not support it).

Signed-off-by: Andi Kleen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  perf/x86/intel: Support Haswell/v4 LBR format  (Andi Kleen, 1 file, -5/+51)
Haswell has two additional flags in the LBR "from" field for TSX:
in_tx and abort_tx, implemented as a new "v4" version of the LBR
format.

Handle those flags and adjust the sign-extension code to still extend
correctly. The flags are exported in the LBR record similarly to the
existing misprediction flag.

Signed-off-by: Andi Kleen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  perf/x86/intel: Move NMI clearing to end of PMI handler  (Andi Kleen, 2 files, -8/+13)
This avoids some problems with spurious PMIs on Haswell. Haswell seems
to behave more like P4 in this regard. Do the same thing as the P4
perf handler by unmasking the NMI only at the end. Shouldn't make any
difference for earlier family 6 cores.

(Tested on Haswell, IvyBridge, Westmere, Saltwell (Atom).)

Signed-off-by: Andi Kleen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  perf/x86/intel: Add Haswell PEBS support  (Andi Kleen, 3 files, -2/+42)
Add simple PEBS support for Haswell. The constraints are similar to
SandyBridge with a few new events.

Reviewed-by: Stephane Eranian <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  perf/x86/intel: Add simple Haswell PMU support  (Andi Kleen, 3 files, -2/+91)
Similar to SandyBridge, but with a few new events and two new counter
bits.

There are some new counter flags that need to be prevented from being
set on fixed counters, but allowed to be set on generic counters.

Also, we add support for the counter 2 constraint to handle all raw
events.

(Contains fixes from Stephane Eranian.)

Reviewed-by: Stephane Eranian <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  perf/x86/intel: Add Haswell PEBS record support  (Andi Kleen, 2 files, -22/+91)
Add support for the Haswell extended (fmt2) PEBS format.

It has a superset of the nhm (fmt1) PEBS fields, but the record is
longer, so we need to adjust the code paths.

The main advantage is the new "EventingRip" support, which directly
gives the instruction, not the off-by-one instruction. So with
precise == 2 we use that directly and don't try to use LBRs and
basic-block walking. This lowers the overhead of using precise mode
significantly.

Some other features are added in later patches.

Reviewed-by: Stephane Eranian <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  x86/debug: Only print out DR registers if they are not power-on defaults  (Dave Jones, 2 files, -4/+16)
The DR registers are rarely useful when decoding oopses. With screen
real estate during oopses at a premium, we can save two lines by only
printing them out when they are set to something other than their
power-on state.

Signed-off-by: Dave Jones <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
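A sketch of the check this change implies; the specific power-on
default values (DR6_RESERVED, 0x400 for DR7) and the helper shape are
assumptions:

    static void show_debug_regs(unsigned long d0, unsigned long d1,
                                unsigned long d2, unsigned long d3,
                                unsigned long d6, unsigned long d7)
    {
            /*
             * Hedged sketch: skip the two printk lines when the DRs
             * still hold their power-on defaults.
             */
            if (d0 == 0 && d1 == 0 && d2 == 0 && d3 == 0 &&
                d6 == DR6_RESERVED && d7 == 0x400)
                    return;

            printk(KERN_DEFAULT "DR0: %016lx DR1: %016lx DR2: %016lx\n", d0, d1, d2);
            printk(KERN_DEFAULT "DR3: %016lx DR6: %016lx DR7: %016lx\n", d3, d6, d7);
    }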
2013-06-19  Merge tag 'ras_fixlet_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras  (Ingo Molnar, 2 files, -3/+3)
Pull "Fix typo in define" change from Borislav Petkov.

Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  x86/boot: Close opened file descriptor  (Jiri Slaby, 1 file, -0/+1)
During the build we open a file and read it, but do not close it. Fix
that by sticking fclose() in at the right place.

Signed-off-by: Jiri Slaby <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
2013-06-19  perf/x86/intel: Fix sparse warning  (Yan, Zheng, 1 file, -1/+1)
Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation  (Suravee Suthikulpanit, 3 files, -0/+548)
Implement a perf PMU to handle IOMMU performance counters and events.
The PMU only supports counting mode (e.g. perf stat). Since the
counters are shared across all cores, the PMU is implemented as
"system-wide" mode.

To invoke the AMD IOMMU PMU, issue a perf tool command such as:

   ./perf stat -a -e amd_iommu/<events>/ <command>

or:

   ./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>

For example:

   ./perf stat -a -e amd_iommu/mem_trans_total/ <command>

The resulting count will be how many IOMMU total peripheral memory
operations were performed during the command execution window.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  perf/x86: Only print PMU state when also WARN()'ing  (Dave Hansen, 1 file, -2/+6)
intel_pmu_handle_irq() has a warning in it if it does too many loops.
It is a WARN_ONCE(), but the perf_event_print_debug() call beneath it
is unconditional. For the first warning, you get a nice backtrace and
message, but subsequent ones just dump the PMU state with no leading
messages. I doubt this is what was intended.

This patch will only print the PMU state when paired with the
WARN_ON() text. It effectively open-codes WARN_ONCE()'s one-time-only
logic.

My suspicion is that the code really just wants to make sure we do not
sit in the loop and spit out a warning for every loop iteration after
the 100th. From what I've seen, this is very unlikely to happen since
we also clear the PMU state.

After this patch, instead of seeing the PMU state dumped each time,
you will just see:

[57494.894540] perf_event_intel: clearing PMU state on CPU#129
[57579.539668] perf_event_intel: clearing PMU state on CPU#10
[57587.137762] perf_event_intel: clearing PMU state on CPU#134
[57623.039912] perf_event_intel: clearing PMU state on CPU#114
[57644.559943] perf_event_intel: clearing PMU state on CPU#118
...

Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
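A sketch of the open-coded one-time logic described above, as it might
sit inside intel_pmu_handle_irq()'s loop guard (the warning string and
surrounding calls are assumptions):

    /* Hedged sketch of the loop guard inside intel_pmu_handle_irq(): */
    if (++loops > 100) {
            static bool warned;

            if (!warned) {
                    WARN(1, "perfevents: irq loop stuck!\n");
                    perf_event_print_debug();   /* PMU state, once only */
                    warned = true;
            }
            intel_pmu_reset();
            goto done;
    }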
2013-06-19  perf/x86: Reduce stack usage of x86_schedule_events()  (Andrew Hunter, 3 files, -18/+22)
x86_schedule_events() caches event constraints on the stack during
scheduling. Given the number of possible events, this is 512 bytes of
stack; since it can be invoked under schedule() under god-knows-what,
this is causing stack blowouts.

Trade some space usage for stack safety: add a place to cache the
constraint pointer to struct perf_event. For 8 bytes per event (1% of
its size) we can save the giant stack frame.

This shouldn't change any aspect of scheduling whatsoever and while in
theory the locality's a tiny bit worse, I doubt we'll see any
performance impact either.

Tested: `perf stat whatever` does not blow up and produces results
that aren't hugely obviously wrong. I'm not sure how to run
particularly good tests of perf code, but this should not produce any
functional change whatsoever.

Signed-off-by: Andrew Hunter <[email protected]>
Reviewed-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  Merge branch 'perf/urgent' into perf/core  (Ingo Molnar, 1 file, -1/+1)
Merge in the latest fixes, to avoid conflicts with ongoing work.

Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP  (Stephane Eranian, 1 file, -1/+1)
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP. For some
reason, the LDLAT extra reg definition for snb_ep showed up as a
duplicate in the snb table.

This patch moves the definition of LDLAT back into the snb_ep table.

Thanks to Don Zickus for tracking this one down.

Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/20130607212210.GA11849@quad
Signed-off-by: Ingo Molnar <[email protected]>
2013-06-19  x86: kvmclock: zero initialize pvclock shared memory area  (Igor Mammedov, 1 file, -0/+1)
The kernel might hang in pvclock_clocksource_read() due to
uninitialized memory possibly containing an odd version value in the
following cycle:

        do {
                version = __pvclock_read_cycles(src, &ret, &flags);
        } while ((src->version & 1) || version != src->version);

if the secondary kvmclock is accessed before it's registered with kvm.

Clear the garbage in the pvclock shared memory area right after it's
allocated to avoid this issue.

Ref: https://bugzilla.kernel.org/show_bug.cgi?id=59521
Signed-off-by: Igor Mammedov <[email protected]>
[See BZ for analysis. We may want a different fix for 3.11, but this
 is the safest for now - Paolo]
Cc: <[email protected]> # 3.8
Signed-off-by: Paolo Bonzini <[email protected]>
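A sketch of the fix in kvmclock_init(); the allocation call and
variable names are assumptions, the point being the memset right after
allocation:

    /*
     * Hedged sketch: zero the freshly allocated shared area before
     * any vcpu registers its clock with kvm.
     */
    mem = memblock_alloc(size, PAGE_SIZE);
    if (!mem)
            return;
    hv_clock = __va(mem);
    memset(hv_clock, 0, size);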
2013-06-18  x86: fix build error and kconfig for ia32_emulation and binfmt  (Randy Dunlap, 1 file, -0/+1)
Fix kconfig warning and build errors on x86_64 by selecting BINFMT_ELF
when COMPAT_BINFMT_ELF is being selected.

warning: (IA32_EMULATION) selects COMPAT_BINFMT_ELF which has unmet direct dependencies (COMPAT && BINFMT_ELF)

fs/built-in.o: In function `elf_core_dump':
compat_binfmt_elf.c:(.text+0x3e093): undefined reference to `elf_core_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3ebcd): undefined reference to `elf_core_extra_data_size'
compat_binfmt_elf.c:(.text+0x3eddd): undefined reference to `elf_core_write_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3f004): undefined reference to `elf_core_write_extra_data'

[ hpa: This was sent to me for -next but it is a low risk build fix ]

Signed-off-by: Randy Dunlap <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
2013-06-18  x86, mtrr: Fix original mtrr range get for mtrr_cleanup  (Yinghai Lu, 1 file, -4/+4)
Joshua reported: Commit cd7b304dfaf1 (x86, range: fix missing merge
during add range) broke mtrr cleanup on his setup in 3.9.5. The
corresponding commit in upstream is fbe06b7bae7c.

  *BAD* gran_size: 64K  chunk_size: 16M  num_reg: 6  lose cover RAM: -0G

  https://bugzilla.kernel.org/show_bug.cgi?id=59491

So it rejects the new var mtrr layout. It turns out we have a problem
with the initial mtrr range retrieval. The current sequence is:

   x86_get_mtrr_mem_range
	==> bunches of add_range_with_merge
	==> bunches of subtract_range
	==> clean_sort_range
   add_range_with_merge for [0,1M)
   sort_range()

add_range_with_merge could leave blank slots, so we can not just sort;
that would leave the final result with an extra blank slot at the
head.

So move that add_range_with_merge call for [0,1M); with that we can
avoid the extra clean_sort_range call.

Reported-by: Joshua Covington <[email protected]>
Tested-by: Joshua Covington <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: <[email protected]> v3.9
Signed-off-by: H. Peter Anvin <[email protected]>
2013-06-18  KVM: x86: remove vcpu's CPL check in host-invoked XCR set  (Zhanghaoyu (A), 1 file, -3/+2)
The __kvm_set_xcr function does the CPL check when setting xcr.
__kvm_set_xcr is called in two flows. One is invoked by the guest,
with the call stack shown below:

	handle_xsetbv (or xsetbv_interception)
	  kvm_set_xcr
	    __kvm_set_xcr

The other is invoked by the host, for example during system reset:

	kvm_arch_vcpu_ioctl
	  kvm_vcpu_ioctl_x86_set_xcrs
	    __kvm_set_xcr

The former does need the CPL check, but the latter does not.

Cc: [email protected]
Signed-off-by: Zhang Haoyu <[email protected]>
[Tweaks to commit message. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
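A sketch of the resulting split, assuming the CPL check moves into the
guest-invoked wrapper so the host ioctl path calling __kvm_set_xcr()
directly skips it:

    /*
     * Hedged sketch: only the guest-facing wrapper checks CPL; the
     * host-invoked flow calls __kvm_set_xcr() without it.
     */
    int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
    {
            if (kvm_x86_ops->get_cpl(vcpu) != 0 ||
                __kvm_set_xcr(vcpu, index, xcr)) {
                    kvm_inject_gp(vcpu, 0);
                    return 1;
            }
            return 0;
    }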
2013-06-17  Merge 3.10-rc6 into driver-core-next  (Greg Kroah-Hartman, 18 files, -214/+183)
We want these fixes here too.

Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-06-14  x86 thermal: Disable power limit notification interrupt by default  (Fenghua Yu, 1 file, -8/+26)
The package power limit notification interrupt is primarily for system
diagnosis, and should not be blindly enabled on every system by
default -- particularly since Linux does nothing in the handler except
count how many times it has been called...

Add a new kernel cmdline parameter "int_pln_enable" for situations
where users want to observe these events via existing system counters:

$ grep TRM /proc/interrupts

$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*

https://bugzilla.kernel.org/show_bug.cgi?id=36182

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Len Brown <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
2013-06-14  x86 thermal: Delete power-limit-notification console messages  (Fenghua Yu, 1 file, -9/+0)
Package power limits are common on some systems under some conditions
-- so printing console messages when limits are reached causes
unnecessary customer concern and support calls.

Note that even with these console messages gone, the events can still
be observed via system counters:

$ grep TRM /proc/interrupts

will show the total thermal interrupts, which includes both power
limit notifications and thermal throttling interrupts.

$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*

will show what caused those interrupts: core and package throttling
and power limit notifications.

https://bugzilla.kernel.org/show_bug.cgi?id=36182

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Len Brown <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
2013-06-14  x86: mm: Remove general hugetlb code from x86.  (Steve Capper, 2 files, -67/+3)
huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d have already
been copied over to mm. This patch removes the x86 copies of these
functions and activates the general ones by enabling:

   CONFIG_ARCH_WANT_GENERAL_HUGETLB

Signed-off-by: Steve Capper <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Andrew Morton <[email protected]>
2013-06-14  x86: mm: Remove x86 version of huge_pmd_share.  (Steve Capper, 2 files, -120/+3)
The huge_pmd_share code has been copied over to mm/hugetlb.c to make
it accessible to other architectures.

Remove the x86 copy of the huge_pmd_share code and enable the
ARCH_WANT_HUGE_PMD_SHARE config flag. That way we reference the
general one.

Signed-off-by: Steve Capper <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Andrew Morton <[email protected]>
2013-06-13  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 7 files, -185/+70)
Pull x86 fixes from Peter Anvin:
 "Another set of fixes. The biggest bit of this is yet another tweak
  to the UEFI anti-bricking code; apparently we finally got some
  feedback from Samsung as to what makes at least their systems fail.
  This set should actually fix the boot regressions that some other
  systems (e.g. SGI) have exhibited.

  Other than that, there is a patch to avoid a panic with particularly
  unhappy memory layouts and two minor protocol fixes which may or may
  not be manifest bugs"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix typo in kexec register clearing
  x86, relocs: Move __vvar_page from S_ABS to S_REL
  Modify UEFI anti-bricking code
  x86: Fix adjust_range_size_mask calling position
2013-06-13  Merge tag 'efi-urgent' into x86/urgent  (H. Peter Anvin, 4 files, -178/+65)
* More tweaking to the EFI variable anti-bricking algorithm. Quite a
  few users were reporting boot regressions in v3.9. This has now been
  fixed with a more accurate "minimum storage requirement to avoid
  bricking" value from Samsung (5K instead of 50%) and code to trigger
  garbage collection when we near our limit - Matthew Garrett.

Signed-off-by: H. Peter Anvin <[email protected]>
2013-06-13  crypto: aesni_intel - fix accessing of unaligned memory  (Jussi Kivilinna, 1 file, -16/+32)
The new XTS code for aesni_intel uses input buffers directly as memory
operands for pxor instructions, which causes a crash if those buffers
are not aligned to 16 bytes.

Patch changes the XTS code to handle unaligned memory correctly by
loading memory with movdqu instead.

Reported-by: Dave Jones <[email protected]>
Tested-by: Dave Jones <[email protected]>
Signed-off-by: Jussi Kivilinna <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
2013-06-13  x86, mcheck, therm_throt: Process package thresholds  (Srinivas Pandruvada, 2 files, -4/+66)
Added callback registration for package threshold reports. Also added
a callback to check whether rate control is implemented in the
callback or not. If no rate control is implemented, a default rate
control similar to the core threshold notification is applied,
delaying CHECK_INTERVAL (5 minutes) between reports.

Signed-off-by: Srinivas Pandruvada <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
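A sketch of the registration points this implies; the pointer names
follow the commit description while the exact types are assumptions:

    /*
     * Hedged sketch: a platform driver fills these in to receive
     * package threshold reports; the second hook reports whether the
     * callback does its own rate control (otherwise the default
     * CHECK_INTERVAL delay is applied).
     */
    int (*platform_thermal_package_notify)(__u64 msr_val);
    bool (*platform_thermal_package_rate_control)(void);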