aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-04-18perf: Allow building PMU drivers as modulesYan, Zheng2-0/+16
This patch adds support for building PMU driver as module. It exports the functions perf_pmu_{register,unregister}() and adds reference tracking for the PMU driver module. When the PMU driver is built as a module, each active event of the PMU holds a reference to the driver module. Signed-off-by: Yan, Zheng <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-04-18Merge branch 'perf/urgent' into perf/core, to pick up PMU driver fixes.Ingo Molnar124-1941/+2497
Signed-off-by: Ingo Molnar <[email protected]>
2014-04-18perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMUVenkatesh Srinivas1-3/+9
CPUs which should support the RAPL counters according to Family/Model/Stepping may still issue #GP when attempting to access the RAPL MSRs. This may happen when Linux is running under KVM and we are passing-through host F/M/S data, for example. Use rdmsrl_safe to first access the RAPL_POWER_UNIT MSR; if this fails, do not attempt to use this PMU. Signed-off-by: Venkatesh Srinivas <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] [ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ] Signed-off-by: Ingo Molnar <[email protected]>
2014-04-18Merge branch 'uprobes/core' of ↵Ingo Molnar3-226/+372
git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core Pull uprobes fixes and cleanups from Oleg Nesterov: "Any probed jmp/call can kill the application, see the changelog in 11/15." Signed-off-by: Ingo Molnar <[email protected]>
2014-04-17uprobes/x86: Emulate relative conditional "near" jmp'sOleg Nesterov1-0/+8
Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10 if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same condition. Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are correct no matter if this jmp is "near" or "short". Reported-by: Jonathan Lebon <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Jim Keniston <[email protected]>
2014-04-17uprobes/x86: Emulate relative conditional "short" jmp'sOleg Nesterov1-2/+55
Teach branch_emulate_op() to emulate the conditional "short" jmp's which check regs->flags. Note: this doesn't support jcxz/jcexz, loope/loopz, and loopne/loopnz. They all are rel8 and thus they can't trigger the problem, but perhaps we will add the support in future just for completeness. Reported-by: Jonathan Lebon <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Jim Keniston <[email protected]>
2014-04-17uprobes/x86: Emulate relative call'sOleg Nesterov2-10/+71
See the previous "Emulate unconditional relative jmp's" which explains why we can not execute "jmp" out-of-line, the same applies to "call". Emulating of rip-relative call is trivial, we only need to additionally push the ret-address. If this fails, we execute this instruction out of line and this should trigger the trap, the probed application should die or the same insn will be restarted if a signal handler expands the stack. We do not even need ->post_xol() for this case. But there is a corner (and almost theoretical) case: another thread can expand the stack right before we execute this insn out of line. In this case it hit the same problem we are trying to solve. So we simply turn the probed insn into "call 1f; 1:" and add ->post_xol() which restores ->sp and restarts. Many thanks to Jonathan who finally found the standalone reproducer, otherwise I would never resolve the "random SIGSEGV's under systemtap" bug-report. Now that the problem is clear we can write the simplified test-case: void probe_func(void), callee(void); int failed = 1; asm ( ".text\n" ".align 4096\n" ".globl probe_func\n" "probe_func:\n" "call callee\n" "ret" ); /* * This assumes that: * * - &probe_func = 0x401000 + a_bit, aligned = 0x402000 * * - xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000 * as xol_add_vma() asks; the 1st slot = 0x7fffffffe080 * * so we can target the non-canonical address from xol_vma using * the simple math below, 100 * 4096 is just the random offset */ asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1 + 100 * 4096\n"); void callee(void) { failed = 0; } int main(void) { probe_func(); return failed; } It SIGSEGV's if you probe "probe_func" (although this is not very reliable, randomize_va_space/etc can change the placement of xol area). Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8) differently. This patch relies on lib/insn.c and thus implements the intel's behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use this insn, so we postpone the fix until we decide what should we do; emulate or not, support or not, etc. Reported-by: Jonathan Lebon <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Jim Keniston <[email protected]>
2014-04-17uprobes/x86: Emulate nop's using ops->emulate()Oleg Nesterov1-19/+1
Finally we can kill the ugly (and very limited) code in __skip_sstep(). Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn. Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes "(rep;)+ nop;" at least, and (afaics) much more. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Jim Keniston <[email protected]>
2014-04-17uprobes/x86: Emulate unconditional relative jmp'sOleg Nesterov2-1/+45
Currently we always execute all insns out-of-line, including relative jmp's and call's. This assumes that even if regs->ip points to nowhere after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will update it correctly. However, this doesn't work if this regs->ip == xol_vaddr + insn_offset is not canonical. In this case CPU generates #GP and general_protection() kills the task which tries to execute this insn out-of-line. Now that we have uprobe_xol_ops we can teach uprobes to emulate these insns and solve the problem. This patch adds branch_xol_ops which has a single branch_emulate_op() hook, so far it can only handle rel8/32 relative jmp's. TODO: move ->fixup into the union along with rip_rela_target_address. Signed-off-by: Oleg Nesterov <[email protected]> Reported-by: Jonathan Lebon <[email protected]> Reviewed-by: Jim Keniston <[email protected]>
2014-04-17uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and ↵Oleg Nesterov1-22/+15
arch_uretprobe_hijack_return_addr() 1. Add the trivial sizeof_long() helper and change other callers of is_ia32_task() to use it. TODO: is_ia32_task() is not what we actually want, TS_COMPAT does not necessarily mean 32bit. Fortunately syscall-like insns can't be probed so it actually works, but it would be better to rename and use is_ia32_frame(). 2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr() and adjust_ret_addr() should be named "nleft". And in fact only the last copy_to_user() in arch_uretprobe_hijack_return_addr() actually needs to inspect the non-zero error code. TODO: adjust_ret_addr() should die. We can always calculate the value we need to write into *regs->sp, just UPROBE_FIX_CALL should record insn->length. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Jim Keniston <[email protected]>
2014-04-17uprobes/x86: Teach arch_uprobe_post_xol() to restart if possibleOleg Nesterov1-4/+16
SIGILL after the failed arch_uprobe_post_xol() should only be used as a last resort, we should try to restart the probed insn if possible. Currently only adjust_ret_addr() can fail, and this can only happen if another thread unmapped our stack after we executed "call" out-of-line. Most probably the application if buggy, but even in this case it can have a handler for SIGSEGV/etc. And in theory it can be even correct and do something non-trivial with its memory. Of course we can't restart unconditionally, so arch_uprobe_post_xol() does this only if ->post_xol() returns -ERESTART even if currently this is the only possible error. default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim pointed out it should not forget to pop off the return address pushed by this insn executed out-of-line. Note: this is not "perfect", we do not want the extra handler_chain() after restart, but I think this is the best solution we can realistically do without too much uglifications. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: Jim Keniston <[email protected]>
2014-04-17uprobes/x86: Send SIGILL if arch_uprobe_post_xol() failsOleg Nesterov2-5/+19
Currently the error from arch_uprobe_post_xol() is silently ignored. This doesn't look good and this can lead to the hard-to-debug problems. 1. Change handle_singlestep() to loudly complain and send SIGILL. Note: this only affects x86, ppc/arm can't fail. 2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and avoid TF games if it is going to return an error. This can help to to analyze the problem, if nothing else we should not report ->ip = xol_slot in the core-file. Note: this means that handle_riprel_post_xol() can be called twice, but this is fine because it is idempotent. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: Jim Keniston <[email protected]>
2014-04-17uprobes/x86: Conditionalize the usage of handle_riprel_insn()Oleg Nesterov1-4/+2
arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start, but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic. Move the callsite into "default" case and change the "0xff" case to fall-through. We are going to add the various hooks to handle the rip-relative jmp/call instructions (and more), we need this change to enforce the fact that the new code can not conflict with is_riprel_insn() logic which, after this change, can only be used by default_xol_ops. Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol() directly. This is fine unless another _xol_ops we may add later will need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup. In this case we can add uprobe_xol_ops->abort() hook, which (perhaps) we will need anyway in the long term. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Jim Keniston <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]>
2014-04-17uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->opsOleg Nesterov2-40/+74
Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops", move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default set of methods and change arch_uprobe_pre/post_xol() accordingly. This way we can add the new uprobe_xol_ops's to handle the insns which need the special processing (rip-relative jmp/call at least). Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Jim Keniston <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]>
2014-04-17uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooksOleg Nesterov1-16/+14
No functional changes. Preparation to simplify the review of the next change. Just reorder the code in arch_uprobe_pre/post_xol() functions so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end. Also change arch_uprobe_pre_xol() to use utask instead of autask, to make the code more symmetrical with arch_uprobe_post_xol(). Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: Jim Keniston <[email protected]> Acked-by: Srikar Dronamraju <[email protected]>
2014-04-17uprobes/x86: Gather "riprel" functions togetherOleg Nesterov1-65/+53
Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up to the closely related handle_riprel_insn(). This way it is simpler to read and understand this code, and this lessens the number of ifdef's. While at it, update the comment in handle_riprel_post_xol() as Jim suggested. TODO: rename them somehow to make the naming consistent. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: Jim Keniston <[email protected]>
2014-04-17uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove ↵Oleg Nesterov1-7/+3
"mm" arg Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if it is true insn_rip_relative() must return false. validate_insn_bits() passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative() checks insn->x86_64. Also, remove the no longer needed "struct mm_struct *mm" argument and the unnecessary "return" at the end. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: Jim Keniston <[email protected]> Acked-by: Srikar Dronamraju <[email protected]>
2014-04-17uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()Oleg Nesterov1-63/+47
No functional changes, preparation. Shift the code from prepare_fixups() to arch_uprobe_analyze_insn() with the following modifications: - Do not call insn_get_opcode() again, it was already called by validate_insn_bits(). - Move "case 0xea" up. This way "case 0xff" can fall through to default case. - change "case 0xff" to use the nested "switch (MODRM_REG)", this way the code looks a bit simpler. - Make the comments look consistent. While at it, kill the initialization of rip_rela_target_address and ->fixups, we can rely on kzalloc(). We will add the new members into arch_uprobe, it would be better to assume that everything is zero by default. TODO: cleanup/fix the mess in validate_insn_bits() paths: - validate_insn_64bits() and validate_insn_32bits() should be unified. - "ifdef" is not used consistently; if good_insns_64 depends on CONFIG_X86_64, then probably good_insns_32 should depend on CONFIG_X86_32/EMULATION - the usage of mm->context.ia32_compat looks wrong if the task is TIF_X32. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: Jim Keniston <[email protected]> Acked-by: Srikar Dronamraju <[email protected]>
2014-04-17uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep()Oleg Nesterov1-21/+2
UPROBE_COPY_INSN, UPROBE_SKIP_SSTEP, and uprobe->flags must die. This patch kills UPROBE_SKIP_SSTEP. I never understood why it was added; not only it doesn't help, it harms. It can only help to avoid arch_uprobe_skip_sstep() if it was already called before and failed. But this is ugly, if we want to know whether we can emulate this instruction or not we should do this analysis in arch_uprobe_analyze_insn(), not when we hit this probe for the first time. And in fact this logic is simply wrong. arch_uprobe_skip_sstep() can fail or not depending on the task/register state, if this insn can be emulated but, say, put_user() fails we need to xol it this time, but this doesn't mean we shouldn't try to emulate it when this or another thread hits this bp next time. And this is the actual reason for this change. We need to emulate the "call" insn, but push(return-address) can obviously fail. Per-arch notes: x86: __skip_sstep() can only emulate "rep;nop". With this change it will be called every time and most probably for no reason. This will be fixed by the next changes. We need to change this suboptimal code anyway. arm: Should not be affected. It has its own "bool simulate" flag checked in arch_uprobe_skip_sstep(). ppc: Looks like, it can emulate almost everything. Does it actually need to record the fact that emulate_step() failed? Hopefully not. But if yes, it can add the ppc- specific flag into arch_uprobe. TODO: rename arch_uprobe_skip_sstep() to arch_uprobe_emulate_insn(), Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: David A. Long <[email protected]> Reviewed-by: Jim Keniston <[email protected]> Acked-by: Srikar Dronamraju <[email protected]>
2014-04-17kprobes/x86: Fix page-fault handling logicMasami Hiramatsu1-9/+7
Current kprobes in-kernel page fault handler doesn't expect that its single-stepping can be interrupted by an NMI handler which may cause a page fault(e.g. perf with callback tracing). In that case, the page-fault handled by kprobes and it misunderstands the page-fault has been caused by the single-stepping code and tries to recover IP address to probed address. But the truth is the page-fault has been caused by the NMI handler, and do_page_fault failes to handle real page fault because the IP address is modified and causes Kernel BUGs like below. ---- [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40 To handle this correctly, I fixed the kprobes fault handler to ensure the faulted ip address is its own single-step buffer instead of checking current kprobe state. Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Sandeepa Prabhu <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-04-17Merge tag 'perf-core-for-mingo' of ↵Ingo Molnar18-76/+230
git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/core Pull perf/core improvements and fixes from Jiri Olsa: User visible changes: * Add --percentage option to control absolute/relative percentage output (Namhyung Kim) Plumbing changes: * Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by completion scripts (Ramkumar Ramachandra) Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2014-04-16Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds7-45/+52
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Various fixes: - reboot regression fix - build message spam fix - GPU quirk fix - 'make kvmconfig' fix plus the wire-up of the renameat2() system call on i386" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove the PCI reboot method from the default chain x86/build: Supress "Nothing to be done for ..." messages x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks x86/platform: Fix "make O=dir kvmconfig" i386: Wire up the renameat2() syscall
2014-04-16Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds11-61/+148
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Tooling fixes, plus a simple hardware-enablement patch for the Intel RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is still fine at this stage" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Instead of redirecting flex output, use -o perf tools: Fix double free in perf test 21 (code-reading.c) perf stat: Initialize statistics correctly perf bench: Set more defaults in the 'numa' suite perf bench: Fix segfault at the end of an 'all' execution perf bench: Update manpage to mention numa and futex perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi() perf probe: Fix to handle errors in line_range searching perf probe: Fix --line option behavior perf tools: Pick up libdw without explicit LIBDW_DIR MAINTAINERS: Change e-mail to kernel.org one perf callchains: Disable unwind libraries when libelf isn't found tools lib traceevent: Do not call warning() directly tools lib traceevent: Print event name when show warning if possible perf top: Fix documentation of invalid -s option perf/x86: Enable DRAM RAPL support on Intel Haswell
2014-04-16Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "ARM VIC (Vectored Irq Controller) irqchip driver fix" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: vic: Properly chain the cascaded IRQs
2014-04-16Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds3-21/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "liblockdep fixes and mutex debugging fixes" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/mutex: Fix debug_mutexes tools/liblockdep: Add proper versioning to the shared obj tools/liblockdep: Ignore asmlinkage and visible
2014-04-16Merge tag 'fbdev-fixes-3.15' of ↵Linus Torvalds8-32/+87
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux Pull fbdev fixes from Tomi Valkeinen: - fix build errors for bf54x-lq043fb and imxfb - fbcon fix for da8xx-fb - omapdss fixes for hdmi audio, irq handling and fclk calculation * tag 'fbdev-fixes-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: video: bf54x-lq043fb: fix build error OMAPDSS: Change struct reg_field to dispc_reg_field OMAPDSS: Take pixelclock unit change into account in hdmi_compute_acr() OMAPDSS: fix shared irq handlers video: imxfb: Select LCD_CLASS_DEVICE unconditionally OMAPDSS: fix rounding when calculating fclk rate video: da8xx-fb: Fix casting of info->pseudo_palette
2014-04-16Merge tag 'pinctrl-v3.15-2' of ↵Linus Torvalds10-1471/+1514
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pincontrol fixes from Linus Walleij: "A first set of pin control fixes for the v3.15 series: - Fix a couple of barnsjukdomar on the Rockchip driver. - Remove an idiotic debug print I happened to leave behind in the Nomadik driver. - Fixup the Qualcomm MSM interrupt handling code for the TLMM v2. - Three patches renaming the Broadcom Capri driver to BCM28155. This has been falling between the chairs for some time due to some cross-tree synchronization misunderstandings, now I'm fed up with this and just rename it in this -rc1 phase" * tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: fix typo in bindings documentation Update bcm_defconfig with new pinctrl CONFIG pinctrl: Rename Broadcom Capri pinctrl driver pinctrl: msm: Correct interrupt code for TLMM v2 pinctrl: nomadik: delete stray debug print pinctrl: rockchip: handle first half of rk3188-bank0 correctly pinctrl: rockchip: add return value to rockchip_set_mux pinctrl: rockchip: fix offset of mux registers for rk3188
2014-04-16Merge branch 'for-linus' of ↵Linus Torvalds14-36/+223
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 patches from Martin Schwidefsky: "An update to the oops output with additional information about the crash. The renameat2 system call is enabled. Two patches in regard to the PTR_ERR_OR_ZERO cleanup. And a bunch of bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO s390/sclp_vt220: Fix kernel panic due to early terminal input s390/compat: fix typo s390/uaccess: fix possible register corruption in strnlen_user_srst() s390: add 31 bit warning message s390: wire up sys_renameat2 s390: show_registers() should not map user space addresses to kernel symbols s390/mm: print control registers and page table walk on crash s390/smp: fix smp_stop_cpu() for !CONFIG_SMP s390: fix control register update
2014-04-16Merge tag 'please-pull-ia64-erratum' of ↵Linus Torvalds3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull itanium erratum fix from Tony Luck: "Small workaround for a rare, but annoying, erratum #237" * tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
2014-04-16[IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)Tony Luck3-3/+3
April 2014 Itanium processor specification update: http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html describes this erratum: ========================================================================= 237. Under a complex set of conditions, store to load forwarding for a sub 8-byte load may complete incorrectly Problem: A load instruction may complete incorrectly when a code sequence using 4-byte or smaller load and store operations to the same address is executed in combination with specific timing of all the following concurrent conditions: store to load forwarding, alignment checking enabled, a mis-predicted branch, and complex cache utilization activity. Implication: The affected sub 8-byte instruction may complete incorrectly resulting in unpredictable system behavior. There is an extremely low probability of exposure due to the significant number of complex microarchitectural concurrent conditions required to encounter the erratum. Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling Hyper-Threading will significantly reduce exposure to the conditions that contribute to encountering the erratum. Status: See the Summary Table of Changes for the affected steppings. ========================================================================= [Table of changes essentially lists all models from McKinley to Tukwila] The PSR.ac bit controls whether the processor will always generate an unaligned reference trap (0x5a00) for a misaligned data access (when PSR.ac=1) or if it will let the access succeed when running on a cpu that implements logic to handle some unaligned accesses. Way back in 2008 in commit b704882e70d87d7f56db5ff17e2253f3fa90e4f3 [IA64] Rationalize kernel mode alignment checking we made the decision to always enable strict checking. We were already doing so in trap/interrupt context because the common preamble code set this bit - but the rest of supervisor code (and by inheritance user code) ran with PSR.ac=0. We now reverse that decision and set PSR.ac=0 everywhere in the kernel (also inherited by user processes). This will avoid the erratum using the method described in the Itanium specification update. Net effect for users is that the processor will handle unaligned access when it can (typically with a tiny performance bubble in the pipeline ... but much less invasive than taking a trap and having the OS perform the access). Signed-off-by: Tony Luck <[email protected]>
2014-04-16perf sched: Introduce --list-cmds for use by scriptsRamkumar Ramachandra2-5/+7
Signed-off-by: Ramkumar Ramachandra <[email protected]> Acked-by: David Ahern <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf lock: Introduce --list-cmds for use by scriptsRamkumar Ramachandra2-5/+7
Signed-off-by: Ramkumar Ramachandra <[email protected]> Acked-by: David Ahern <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf mem: Introduce --list-cmds for use by scriptsRamkumar Ramachandra2-8/+9
Signed-off-by: Ramkumar Ramachandra <[email protected]> Acked-by: David Ahern <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf kmem: Introduce --list-cmds for use by scriptsRamkumar Ramachandra2-5/+7
Signed-off-by: Ramkumar Ramachandra <[email protected]> Acked-by: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf tools: Show absolute percentage by defaultNamhyung Kim1-1/+0
Now perf report will show absolute percentage on filter entries by default. Suggested-by: Jiri Olsa <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf ui/tui: Add 'F' hotkey to toggle percentage outputNamhyung Kim1-0/+4
Add 'F' hotkey to toggle relative and absolute percentage of filtered entries. Suggested-by: Jiri Olsa <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf tools: Add hist.percentage config optionNamhyung Kim4-0/+15
Add hist.percentage option for setting default value of the symbol_conf.filter_relative. It affects the output of various perf commands (like perf report, top and diff) only if filter(s) applied. An user can write .perfconfig file like below to show absolute percentage of filtered entries by default: $ cat ~/.perfconfig [hist] percentage = absolute And it can be changed through command line: $ perf report --percentage relative Signed-off-by: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf diff: Add --percentage optionNamhyung Kim2-11/+40
The --percentage option is for controlling overhead percentage displayed. It can only receive either of "relative" or "absolute" and affects -c delta output only. For more information, please see previous commit same thing done to "perf report". Signed-off-by: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf top: Add --percentage optionNamhyung Kim5-20/+37
The --percentage option is for controlling overhead percentage displayed. It can only receive either of "relative" or "absolute". Move the parser callback function into a common location since it's used by multiple commands now. For more information, please see previous commit same thing done to "perf report". Signed-off-by: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf report: Add --percentage optionNamhyung Kim9-46/+106
The --percentage option is for controlling overhead percentage displayed. It can only receive either of "relative" or "absolute". "relative" means it's relative to filtered entries only so that the sum of shown entries will be always 100%. "absolute" means it retains the original value before and after the filter is applied. $ perf report -s comm # Overhead Command # ........ ............ # 74.19% cc1 7.61% gcc 6.11% as 4.35% sh 4.14% make 1.13% fixdep ... $ perf report -s comm -c cc1,gcc --percentage absolute # Overhead Command # ........ ............ # 74.19% cc1 7.61% gcc $ perf report -s comm -c cc1,gcc --percentage relative # Overhead Command # ........ ............ # 90.69% cc1 9.31% gcc Note that it has zero effect if no filter was applied. Suggested-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16perf hists: Add support for showing relative percentageNamhyung Kim4-3/+26
When filtering by thread, dso or symbol on TUI it also update total period so that the output shows different result than no filter - the percentage changed to relative to filtered entries only. Sometimes this is not desired since users might expect same results with filter. So new filtered_* fields to hists->stats to count them separately. They'll be controlled/used by user later. Signed-off-by: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Olsa <[email protected]>
2014-04-16x86: Remove the PCI reboot method from the default chainIngo Molnar2-42/+44
Steve reported a reboot hang and bisected it back to this commit: a4f1987e4c54 x86, reboot: Add EFI and CF9 reboot methods into the default list He heroically tested all reboot methods and found the following: reboot=t # triple fault ok reboot=k # keyboard ctrl FAIL reboot=b # BIOS ok reboot=a # ACPI FAIL reboot=e # EFI FAIL [system has no EFI] reboot=p # PCI 0xcf9 FAIL And I think it's pretty obvious that we should only try PCI 0xcf9 as a last resort - if at all. The other observation is that (on this box) we should never try the PCI reboot method, but close with either the 'triple fault' or the 'BIOS' (terminal!) reboot methods. Thirdly, CF9_COND is a total misnomer - it should be something like CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ... So this patch fixes the worst problems: - it orders the actual reboot logic to follow the reboot ordering pattern - it was in a pretty random order before for no good reason. - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and BOOT_CF9_SAFE flags to make the code more obvious. - it tries the BIOS reboot method before the PCI reboot method. (Since 'BIOS' is a terminal reboot method resulting in a hang if it does not work, this is essentially equivalent to removing the PCI reboot method from the default reboot chain.) - just for the miraculous possibility of terminal (resulting in hang) reboot methods of triple fault or BIOS returning without having done their job, there's an ordering between them as well. Reported-and-bisected-and-tested-by: Steven Rostedt <[email protected]> Cc: Li Aubrey <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Garrett <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds63-279/+454
Pull networking fixes from David Miller: 1) Fix BPF filter validation of netlink attribute accesses, from Mathias Kruase. 2) Netfilter conntrack generation seqcount not initialized properly, from Andrey Vagin. 3) Fix comparison mask computation on big-endian in nft_cmp_fast(), from Patrick McHardy. 4) Properly limit MTU over ipv6, from Eric Dumazet. 5) Fix seccomp system call argument population on 32-bit, from Daniel Borkmann. 6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead skb->mac_len needs to be used. From Vlad Yasevich. 7) We have several cases of using socket based communications to implement a tunnel. For example, some tunnels are encapsulations over UDP so we use an internal kernel UDP socket to do the transmits. These tunnels should behave just like other software devices and pass the packets on down to the next layer. Most importantly we want the top-level socket (eg TCP) that created the traffic to be charged for the SKB memory. However, once you get into the IP output path, we have code that assumed that whatever was attached to skb->sk is an IP socket. To keep the top-level socket being charged for the SKB memory, whilst satisfying the needs of the IP output path, we now pass in an explicit 'sk' argument. From Eric Dumazet. 8) ping_init_sock() leaks group info, from Xiaoming Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits) cxgb4: use the correct max size for firmware flash qlcnic: Fix MSI-X initialization code ip6_gre: don't allow to remove the fb_tunnel_dev ipv4: add a sock pointer to dst->output() path. ipv4: add a sock pointer to ip_queue_xmit() driver/net: cosa driver uses udelay incorrectly at86rf230: fix __at86rf230_read_subreg function at86rf230: remove check if AVDD settled net: cadence: Add architecture dependencies net: Start with correct mac_len in skb_network_protocol Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" cxgb4: Save the correct mac addr for hw-loopback connections in the L2T net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF qlcnic: Do not disable SR-IOV when VFs are assigned to VMs qlcnic: Fix QLogic application/driver interface for virtual NIC configuration qlcnic: Fix PVID configuration on eSwitch port. qlcnic: Fix max ring count calculation qlcnic: Fix to send INIT_NIC_FUNC as first mailbox. qlcnic: Fix panic due to uninitialzed delayed_work struct in use. ...
2014-04-15cxgb4: use the correct max size for firmware flashSteve Wise1-1/+1
The wrong max fw size was being used and causing false "too big" errors running ethtool -f. Signed-off-by: Steve Wise <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-15qlcnic: Fix MSI-X initialization codeAlexander Gordeev1-12/+16
Function qlcnic_setup_tss_rss_intr() might enter endless loop in case pci_enable_msix() contiguously returns a positive number of MSI-Xs that could have been allocated. Besides, the function contains 'err = -EIO;' assignment that never could be reached. This update fixes the aforementioned issues. Cc: Shahed Shaikh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Alexander Gordeev <[email protected]> Acked-by: Shahed Shaikh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-15ip6_gre: don't allow to remove the fb_tunnel_devNicolas Dichtel1-0/+10
It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but this is unsafe, the module always supposes that this device exists. For example, ip6gre_tunnel_lookup() may use it unconditionally. Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we let ip6gre_destroy_tunnels() do the job). Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6"). CC: Dmitry Kozlov <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-15ipv4: add a sock pointer to dst->output() path.Eric Dumazet19-46/+74
In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: 8f646c922d55 ("vxlan: keep original skb ownership") Reported-by: lucien xin <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-15Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/urgent Pull perf/urgent fixes from Jiri Olsa: * Instead of redirecting flex output, use -o (Cody P Schafer) * Fix double free in perf test 21 (Adrian Hunter) Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2014-04-15ipv4: add a sock pointer to ip_queue_xmit()Eric Dumazet10-13/+13
ip_queue_xmit() assumes the skb it has to transmit is attached to an inet socket. Commit 31c70d5956fc ("l2tp: keep original skb ownership") changed l2tp to not change skb ownership and thus broke this assumption. One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(), so that we do not assume skb->sk points to the socket used by l2tp tunnel. Fixes: 31c70d5956fc ("l2tp: keep original skb ownership") Reported-by: Zhan Jianyu <[email protected]> Tested-by: Zhan Jianyu <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-04-15irqchip: vic: Properly chain the cascaded IRQsLinus Walleij1-0/+6
We are flagging the parent IRQ as chained, then we must also make sure to call the chained_irq_[enter|exit] functions for things to work smoothly. Signed-off-by: Linus Walleij <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>