aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2019-06-10perf config: Bail out when a handler returns failure for a key-value pairArnaldo Carvalho de Melo1-2/+6
So perf_config() uses: int ret = 0; perf_config_set__for_each_entry(config_set, section, item) { ... ret = fn(); if (ret < 0) break; } return ret; Expecting that that break will imediatelly go to function exit to return that error value (ret). The problem is that perf_config_set__for_each_entry() expands into two nested for() loops, one traversing the sections in a config and the second the items in each of those sections, so we have to change that 'break' to a goto label right before that final 'return ret'. With that, for instance 'perf trace' now correctly bails out when a event that is requested to be added via its 'trace.add_events' ~/.perfconfig entry gets rejected by the kernel BPF verifier: # perf trace ls event syntax error: '/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o' \___ Kernel verifier blocks program loading (add -v to see detail) Run 'perf list' for a list of valid events Error: wrong config key-value pair trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o # While before it would continue and explode later, when trying to find maps that would have been in place had that augmented_raw_syscalls.o precompiled BPF proggie been accepted by the, humm, bast... rigorous kernel BPF verifier 8-) Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Song Liu <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Taeung Song <[email protected]> Cc: Yonghong Song <[email protected]> Fixes: 8a0a9c7e9146 ("perf config: Introduce new init() and exit()") Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441Thomas Gleixner1-5/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 of the license extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 315 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433Thomas Gleixner1-2/+1
Based on 1 normalized pattern(s): released under the gpl v2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 2 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 393Thomas Gleixner3-51/+3
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 of the license not later! this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 3 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Richard Fontana <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 305Thomas Gleixner3-6/+3
Based on 1 normalized pattern(s): licensed under the gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 6 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288Thomas Gleixner23-228/+23
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 263 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 251Thomas Gleixner7-14/+7
Based on 1 normalized pattern(s): released under the gpl v2 and only v2 not any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 12 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steve Winslow <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05perf db-export: Export IPC informationAdrian Hunter1-2/+6
Export cycle and instruction counts on samples and call-returns. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf thread-stack: Accumulate IPC informationAdrian Hunter2-0/+18
Cycle and instruction counts are added to the stack. The IPC of a function and all functions it calls, is also recorded. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf intel-pt: Accumulate cycle count from TSC/TMA/MTC packetsAdrian Hunter1-0/+51
When CYC packets are not available, it is still possible to count cycles using TSC/TMA/MTC timestamps. As the timestamp increments in TSC ticks, convert to CPU cycles using the current core-to-bus ratio. Do not accumulate cycles when control flow packet generation is not enabled, nor when time has been "lost", typically due to mwait, which is indicated by a TSC/TMA packet that is not part of PSB+. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf intel-pt: Re-factor TIP cases in intel_pt_walk_to_ipAdrian Hunter1-6/+17
To make it easier to add new code for different TIP cases, separate each case. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf intel-pt: Record when decoding PSB+ packetsAdrian Hunter1-10/+31
In preparation for using MTC packets to count cycles, record whether decoding is between a PSB and PSBEND packets. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf intel-pt: Add support for samples to contain IPC ratioAdrian Hunter1-0/+29
Copy the incremental instruction count and cycle count onto 'instructions' and 'branches' samples. Because Intel PT does not update the cycle count on every branch or instruction, the incremental values will often be zero. When there are values, they will be the number of instructions and number of cycles since the last update, and thus represent the average IPC since the last IPC value. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf tools: Add IPC information to perf_sampleAdrian Hunter1-0/+2
Add counts of instructions and cycles, in order to represent instructions-per-cycle (IPC). Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf intel-pt: Accumulate cycle count from CYC packetsAdrian Hunter2-1/+14
In preparation for providing instructions-per-cycle (IPC) information, accumulate cycle count from CYC packets. Although CYC packets are optional (requires config term 'cyc' to enable cycle-accurate mode when recording), the simplest way to count cycles is with CYC packets. The first complication is that cycles must be counted only when also counting instructions. That means when control flow packet generation is enabled i.e. between TIP.PGE and TIP.PGD packets. Also, sampling the cycle count follows the same rules as sampling the timestamp, that is, not before the instruction to which the decoder is walking is reached. In addition, the cycle count is not accurate for any but the first branch of a TNT packet. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf intel-pt: Factor out intel_pt_update_sample_timeAdrian Hunter1-8/+10
To eliminate some duplication and make the code more understandable, factor out intel_pt_update_sample_time. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf record: Allow mixing --user-regs with --call-graph=dwarfAlexey Budankov2-1/+12
When DWARF stacks were requested and at the same time that the user specifies a register set using the --user-regs option the full register context was being captured on samples: $ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- stack_test2.g.O3 188143843893585 0x6b48 [0x4f8]: PERF_RECORD_SAMPLE(IP, 0x4002): 23828/23828: 0x401236 period: 1363819 addr: 0x7ffedbdd51ac ... FP chain: nr:0 ... user regs: mask 0xff0fff ABI 64-bit .... AX 0x53b .... BX 0x7ffedbdd3cc0 .... CX 0xffffffff .... DX 0x33d3a .... SI 0x7f09b74c38d0 .... DI 0x0 .... BP 0x401260 .... SP 0x7ffedbdd3cc0 .... IP 0x401236 .... FLAGS 0x20a .... CS 0x33 .... SS 0x2b .... R8 0x7f09b74c3800 .... R9 0x7f09b74c2da0 .... R10 0xfffffffffffff3ce .... R11 0x246 .... R12 0x401070 .... R13 0x7ffedbdd5db0 .... R14 0x0 .... R15 0x0 ... ustack: size 1024, offset 0xe0 . data_src: 0x5080021 ... thread: stack_test2.g.O:23828 ...... dso: /root/abudanko/stacks/stack_test2.g.O3 I.e. the --user-regs=IP,SP,BP was being ignored, being overridden by the needs of --call-graph=dwarf. After applying the change in this patch the sample data contains the user specified register, but making sure that at least the minimal set of register needed for DWARF unwinding (DWARF_MINIMAL_REGS) is requested. The user is warned that DWARF unwinding may not work if extra registers end up being needed. -g call-graph dwarf,K full_regs --user-regs=user_regs user_regs -g call-graph dwarf,K --user-regs=user_regs user_regs + DWARF_MINIMAL_REGS $ perf record -g --call-graph dwarf,1024 --user-regs=BP -- ls WARNING: The use of --call-graph=dwarf may require all the user registers, specifying a subset with --user-regs may render DWARF unwinding unreliable, so the minimal registers set (IP, SP) is explicitly forced. arch COPYING Documentation include Kbuild lbuild MAINTAINERS modules.builtin Module.symvers perf.data.old scripts System.map virt block CREDITS drivers init Kconfig lib Makefile modules.builtin.modinfo net README security tools vmlinux certs crypto fs ipc kernel LICENSES mm modules.order perf.data samples sound usr vmlinux.o [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ] 188368474305373 0x5e40 [0x470]: PERF_RECORD_SAMPLE(IP, 0x4002): 23839/23839: 0x401236 period: 1260507 addr: 0x7ffd3d85e96c ... FP chain: nr:0 ... user regs: mask 0x1c0 ABI 64-bit .... BP 0x401260 .... SP 0x7ffd3d85cc20 .... IP 0x401236 ... ustack: size 1024, offset 0x58 . data_src: 0x5080021 Committer notes: Detected build failures on arches where PERF_REGS_ is not available, such as debian:experimental-x-{mips,mips64,mipsel}, fedora 24 and 30 for ARC uClibc and glibc, reported to Alexey that provided a patch moving the DWARF_MINIMAL_REGS from evsel.c to util/perf_regs.h, where it is guarded by an HAVE_PERF_REGS_SUPPORT ifdef. Committer testing: # perf record --user-regs=bp,ax -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.955 MB perf.data (1773 samples) ] # perf script -F+uregs | grep AX: | head -5 perf 1719 [000] 181.272398: 1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 perf 1719 [000] 181.272402: 1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 perf 1719 [000] 181.272403: 8 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 perf 1719 [000] 181.272405: 181 cycles: ffffffffba06a7c6 native_write_msr+0x6 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 perf 1719 [000] 181.272406: 4405 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 # perf record --call-graph=dwarf --user-regs=bp,ax -a sleep 1 WARNING: The use of --call-graph=dwarf may require all the user registers, specifying a subset with --user-regs may render DWARF unwinding unreliable, so the minimal registers set (IP, SP) is explicitly forced. [ perf record: Woken up 55 times to write data ] [ perf record: Captured and wrote 24.184 MB perf.data (2841 samples) ] [root@quaco ~]# perf script --hide-call-graph -F+uregs | grep AX: | head -5 perf 1729 [000] 211.268006: 1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db perf 1729 [000] 211.268014: 1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db perf 1729 [000] 211.268017: 5 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db perf 1729 [000] 211.268020: 48 cycles: ffffffffba06a7c6 native_write_msr+0x6 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db perf 1729 [000] 211.268024: 490 cycles: ffffffffba00e471 intel_bts_enable_local+0x21 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db # Signed-off-by: Alexey Budankov <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-05perf symbols: Remove unused variable 'err'Leo Yan1-2/+1
Variable 'err' is defined but never used in function symsrc__init(), remove it and directly return -1 at the end of the function. Signed-off-by: Leo Yan <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-06-03Merge tag 'v5.2-rc3' into perf/core, to pick up fixesIngo Molnar11-159/+11
Signed-off-by: Ingo Molnar <[email protected]>
2019-06-02Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds4-12/+53
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "On the kernel side there's a bunch of ring-buffer ordering fixes for a reproducible bug, plus a PEBS constraints regression fix. Plus tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools headers UAPI: Sync kvm.h headers with the kernel sources perf record: Fix s390 missing module symbol and warning for non-root users perf machine: Read also the end of the kernel perf test vmlinux-kallsyms: Ignore aliases to _etext when searching on kallsyms perf session: Add missing swap ops for namespace events perf namespace: Protect reading thread's namespace tools headers UAPI: Sync drm/drm.h with the kernel tools headers UAPI: Sync drm/i915_drm.h with the kernel tools headers UAPI: Sync linux/fs.h with the kernel tools headers UAPI: Sync linux/sched.h with the kernel tools arch x86: Sync asm/cpufeatures.h with the with the kernel tools include UAPI: Update copy of files related to new fspick, fsmount, fsconfig, fsopen, move_mount and open_tree syscalls perf arm64: Fix mksyscalltbl when system kernel headers are ahead of the kernel perf data: Fix 'strncat may truncate' build failure with recent gcc perf/ring-buffer: Use regular variables for nesting perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data perf/ring_buffer: Add ordering to rb->nest increment perf/ring_buffer: Fix exposing a temporarily decreased data_head perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157Thomas Gleixner1-11/+1
Based on 3 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [graeme] [gregory] [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema] [hk] [hemahk]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1105 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Richard Fontana <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156Thomas Gleixner10-148/+10
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1334 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Richard Fontana <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-05-28perf intel-pt: Rationalize intel_pt_sync_switch()'s use of next_tidAdrian Hunter1-2/+2
Returning 1 from intel_pt_sync_switch() causes the current tid to be set. That negates the need to keep next_tid anymore. Rationalize the code to that effect. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf intel-pt: Improve sync_switch by processing PERF_RECORD_SWITCH* in eventsAdrian Hunter1-1/+39
sync_switch is a facility to synchronize decoding more closely with the point in the kernel when the context actually switched. Improve it by processing "context switch in" events. Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf python: Remove -fstack-protector-strong if clang doesn't have itArnaldo Carvalho de Melo1-0/+2
Some distros put -fstack-protector-strong in the compiler flags to be used to build python extensions, but then, the clang version in that distro doesn't know about that, only gcc does. Check if that is the case and remove it from the set of options used to build the python binding with clang. Case at hand: oraclelinux:7 $ head -2 /etc/os-release NAME="Oracle Linux Server" VERSION="7.6" $ grep stack-protector /usr/lib64/python2.7/_sysconfigdata.py | head -1 | cut -c-120 'CFLAGS': '-fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --para $ gcc version 4.8.5 20150623 (Red Hat 4.8.5-36.0.1) (GCC) clang version 3.4.2 (tags/RELEASE_34/dot2-final) clang: error: unknown argument: '-fstack-protector-strong' clang: error: unknown argument: '-fstack-protector-strong' error: command 'clang' failed with exit status 1 cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory make[2]: *** [/tmp/build/perf/python/perf.so] Error 1 Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf machine: Return NULL instead of null-terminating /proc/version arrayDonald Yandt1-2/+2
Return NULL instead of null-terminating version char array when fgets fails due to end-of-file or error. Signed-off-by: Donald Yandt <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Yanmin Zhang <[email protected]> Fixes: 30ba5b0e66c8 ("perf machine: Null-terminate version char array upon fgets(/proc/version) error") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf version: Append 12 git SHA chars to the version stringArnaldo Carvalho de Melo1-1/+1
Bumping it from just 4: Before: $ perf -v perf version 5.2.rc1.g80978f $ After: $ perf -v perf version 5.2.rc1.g80978fc864c5 $ Requested-by: Ingo Molnar <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf script: Remove superfluous BPF event titlesJiri Olsa1-2/+2
There's no need to display "ksymbol event with" text for the PERF_RECORD_KSYMBOL event and "bpf event with" test for the PERF_RECORD_BPF_EVENT event. Remove it so it also goes along with other side-band events display. Before: # perf script --show-bpf-events ... swapper 0 [000] 0.000000: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0ef971d len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174 swapper 0 [000] 0.000000: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 36 After: # perf script --show-bpf-events ... swapper 0 [000] 0.000000: PERF_RECORD_KSYMBOL addr ffffffffc0ef971d len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174 swapper 0 [000] 0.000000: PERF_RECORD_BPF_EVENT type 1, flags 0, id 36 Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stanislav Fomichev <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf tests: Add map_groups__merge_in testJiri Olsa2-1/+3
Add map_groups__merge_in test to test the map_groups__merge_in function usage - merging kcore maps into existing eBPF maps. Committer testing: # perf test merge 59: map_groups__merge_in : Ok # perf test -v merge 59: map_groups__merge_in : --- start --- test child forked, pid 8349 test child finished with 0 ---- end ---- map_groups__merge_in: Ok # Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stanislav Fomichev <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf script: Pad DSO name for --call-traceJiri Olsa2-0/+7
Pad the DSO name in --call-trace so we don't have the indent screwed by different DSO name lengths, as now for kernel there's also BPF code displayed. # perf-with-kcore record pt -e intel_pt//ku -- sleep 1 # perf-core/perf-with-kcore script pt --call-trace Before: sleep 3660 [16] 57036.806464404: ([kernel.kallsyms]) kretprobe_perf_func sleep 3660 [16] 57036.806464404: ([kernel.kallsyms]) trace_call_bpf sleep 3660 [16] 57036.806464404: ([kernel.kallsyms]) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806464404: ([kernel.kallsyms]) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806464725: (bpf_prog_da4fe6b3d2c29b25_trace_return) bpf_get_current_pid_tgid sleep 3660 [16] 57036.806464725: (bpf_prog_da4fe6b3d2c29b25_trace_return) bpf_ktime_get_ns sleep 3660 [16] 57036.806464725: ([kernel.kallsyms]) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806464725: ([kernel.kallsyms]) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806465045: (bpf_prog_da4fe6b3d2c29b25_trace_return) __htab_map_lookup_elem sleep 3660 [16] 57036.806465366: ([kernel.kallsyms]) memcmp sleep 3660 [16] 57036.806465687: (bpf_prog_da4fe6b3d2c29b25_trace_return) bpf_probe_read sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]) probe_kernel_read sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]) __check_object_size sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]) check_stack_object sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]) copy_user_enhanced_fast_string sleep 3660 [16] 57036.806465687: (bpf_prog_da4fe6b3d2c29b25_trace_return) bpf_probe_read sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]) probe_kernel_read sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]) __check_object_size sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]) check_stack_object sleep 3660 [16] 57036.806465687: ([kernel.kallsyms]) copy_user_enhanced_fast_string sleep 3660 [16] 57036.806466008: (bpf_prog_da4fe6b3d2c29b25_trace_return) bpf_get_current_uid_gid sleep 3660 [16] 57036.806466008: ([kernel.kallsyms]) from_kgid sleep 3660 [16] 57036.806466008: ([kernel.kallsyms]) from_kuid sleep 3660 [16] 57036.806466008: (bpf_prog_da4fe6b3d2c29b25_trace_return) bpf_perf_event_output sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]) perf_event_output sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]) perf_prepare_sample sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]) perf_misc_flags sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806466328: ([kernel.kallsyms]) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806466328: ([kvm]) kvm_is_in_guest sleep 3660 [16] 57036.806466649: ([kernel.kallsyms]) __perf_event_header__init_id.isra.0 sleep 3660 [16] 57036.806466649: ([kernel.kallsyms]) perf_output_begin After: sleep 3660 [16] 57036.806464404: ([kernel.kallsyms] ) kretprobe_perf_func sleep 3660 [16] 57036.806464404: ([kernel.kallsyms] ) trace_call_bpf sleep 3660 [16] 57036.806464404: ([kernel.kallsyms] ) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806464404: ([kernel.kallsyms] ) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806464725: (bpf_prog_da4fe6b3d2c29b25_trace_return ) bpf_get_current_pid_tgid sleep 3660 [16] 57036.806464725: (bpf_prog_da4fe6b3d2c29b25_trace_return ) bpf_ktime_get_ns sleep 3660 [16] 57036.806464725: ([kernel.kallsyms] ) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806464725: ([kernel.kallsyms] ) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806465045: (bpf_prog_da4fe6b3d2c29b25_trace_return ) __htab_map_lookup_elem sleep 3660 [16] 57036.806465366: ([kernel.kallsyms] ) memcmp sleep 3660 [16] 57036.806465687: (bpf_prog_da4fe6b3d2c29b25_trace_return ) bpf_probe_read sleep 3660 [16] 57036.806465687: ([kernel.kallsyms] ) probe_kernel_read sleep 3660 [16] 57036.806465687: ([kernel.kallsyms] ) __check_object_size sleep 3660 [16] 57036.806465687: ([kernel.kallsyms] ) check_stack_object sleep 3660 [16] 57036.806465687: ([kernel.kallsyms] ) copy_user_enhanced_fast_string sleep 3660 [16] 57036.806465687: (bpf_prog_da4fe6b3d2c29b25_trace_return ) bpf_probe_read sleep 3660 [16] 57036.806465687: ([kernel.kallsyms] ) probe_kernel_read sleep 3660 [16] 57036.806465687: ([kernel.kallsyms] ) __check_object_size sleep 3660 [16] 57036.806465687: ([kernel.kallsyms] ) check_stack_object sleep 3660 [16] 57036.806465687: ([kernel.kallsyms] ) copy_user_enhanced_fast_string sleep 3660 [16] 57036.806466008: (bpf_prog_da4fe6b3d2c29b25_trace_return ) bpf_get_current_uid_gid sleep 3660 [16] 57036.806466008: ([kernel.kallsyms] ) from_kgid sleep 3660 [16] 57036.806466008: ([kernel.kallsyms] ) from_kuid sleep 3660 [16] 57036.806466008: (bpf_prog_da4fe6b3d2c29b25_trace_return ) bpf_perf_event_output sleep 3660 [16] 57036.806466328: ([kernel.kallsyms] ) perf_event_output sleep 3660 [16] 57036.806466328: ([kernel.kallsyms] ) perf_prepare_sample sleep 3660 [16] 57036.806466328: ([kernel.kallsyms] ) perf_misc_flags sleep 3660 [16] 57036.806466328: ([kernel.kallsyms] ) __x86_indirect_thunk_rax sleep 3660 [16] 57036.806466328: ([kernel.kallsyms] ) __x86_indirect_thunk_rax Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stanislav Fomichev <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf dso: Add BPF DSO read and size hooksJiri Olsa1-1/+48
Add BPF related code into DSO reading paths to return size (bpf_size) and read the BPF code (bpf_read). Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stanislav Fomichev <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Use uintptr_t when casting from u64 to u8 pointers ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf dso: Simplify dso_cache__read functionJiri Olsa1-11/+6
There's no need for the while loop now, also we can connect two (ret > 0) condition legs together. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stanislav Fomichev <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf dso: Separate generic code in dso_cache__readJiri Olsa1-21/+27
Move the file specific code in the dso_cache__read function to a separate file_read function. I'll add BPF specific code in the following patches. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stanislav Fomichev <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf dso: Separate generic code in dso__data_file_size()Jiri Olsa1-7/+12
Moving file specific code in dso__data_file_size function into separate file_size function. I'll add bpf specific code in following patches. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stanislav Fomichev <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf tools: Remove const from thread read accessorsNamhyung Kim3-9/+9
The namespaces and comm fields of a thread are protected by rwsem and require write access for it. So it ended up using a cast to remove the const qualifier. Let's get rid of the const then. Signed-off-by: Namhyung Kim <[email protected]> Suggested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Krister Johansen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf tools: Preserve eBPF maps when loading kcoreJiri Olsa1-4/+93
We need to preserve eBPF maps even if they are covered by kcore, because we need to access eBPF dso for source data. Add the map_groups__merge_in function to do that. It merges a map into map_groups by splitting the new map within the existing map regions. Suggested-by: Adrian Hunter <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stanislav Fomichev <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf machine: Keep zero in pgoff BPF mapJiri Olsa1-2/+2
With pgoff set to zero, the map__map_ip function will return BPF addresses based from 0, which is what we need when we read the data from a BPF DSO. Adding BPF symbols with mapped IP addresses as well. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stanislav Fomichev <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf auxtrace: Fix itrace defaults for perf scriptAdrian Hunter1-1/+2
Commit 4eb068157121 ("perf script: Make itrace script default to all calls") does not work for the case when '--itrace' only is used, because default_no_sample is not being passed. Example: Before: $ perf record -e intel_pt/cyc/u ls $ perf script --itrace > cmp1.txt $ perf script --itrace=cepwx > cmp2.txt $ diff -sq cmp1.txt cmp2.txt Files cmp1.txt and cmp2.txt differ After: $ perf script --itrace > cmp1.txt $ perf script --itrace=cepwx > cmp2.txt $ diff -sq cmp1.txt cmp2.txt Files cmp1.txt and cmp2.txt are identical Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Fixes: 4eb068157121 ("perf script: Make itrace script default to all calls") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf intel-pt: Fix itrace defaults for perf scriptAdrian Hunter1-1/+2
Commit 4eb068157121 ("perf script: Make itrace script default to all calls") does not work because 'use_browser' is being used to determine whether to default to periodic sampling (i.e. better for perf report). The result is that nothing but CBR events display for perf script when no --itrace option is specified. Fix by using 'default_no_sample' and 'inject' instead. Example: Before: $ perf record -e intel_pt/cyc/u ls $ perf script > cmp1.txt $ perf script --itrace=cepwx > cmp2.txt $ diff -sq cmp1.txt cmp2.txt Files cmp1.txt and cmp2.txt differ After: $ perf script > cmp1.txt $ perf script --itrace=cepwx > cmp2.txt $ diff -sq cmp1.txt cmp2.txt Files cmp1.txt and cmp2.txt are identical Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] # v4.20+ Fixes: 90e457f7be08 ("perf tools: Add Intel PT support") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf machine: Read also the end of the kernelJiri Olsa1-9/+18
We mark the end of kernel based on the first module, but that could cover some bpf program maps. Reading _etext symbol if it's present to get precise kernel map end. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stanislav Fomichev <[email protected]> Cc: Thomas Richter <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf session: Add missing swap ops for namespace eventsNamhyung Kim1-0/+21
In case it's recorded in a different arch. Signed-off-by: Namhyung Kim <[email protected]> Cc: Hari Bathini <[email protected]> <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Krister Johansen <[email protected]> Fixes: f3b3614a284d ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf namespace: Protect reading thread's namespaceNamhyung Kim1-2/+13
It seems that the current code lacks holding the namespace lock in thread__namespaces(). Otherwise it can see inconsistent results. Signed-off-by: Namhyung Kim <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Krister Johansen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-28perf data: Fix 'strncat may truncate' build failure with recent gccShawn Landden1-1/+1
This strncat() is safe because the buffer was allocated with zalloc(), however gcc doesn't know that. Since the string always has 4 non-null bytes, just use memcpy() here. CC /home/shawn/linux/tools/perf/util/data-convert-bt.o In file included from /usr/include/string.h:494, from /home/shawn/linux/tools/lib/traceevent/event-parse.h:27, from util/data-convert-bt.c:22: In function ‘strncat’, inlined from ‘string_set_value’ at util/data-convert-bt.c:274:4: /usr/include/powerpc64le-linux-gnu/bits/string_fortified.h:136:10: error: ‘__builtin_strncat’ output may be truncated copying 4 bytes from a string of length 4 [-Werror=stringop-truncation] 136 | return __builtin___strncat_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Shawn Landden <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> LPU-Reference: [email protected] Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-25perf-probe: Add user memory access attribute supportMasami Hiramatsu5-7/+33
Add user memory access attribute for kprobe event arguments. If a given 'local variable' is in user-space, User can specify memory access method by '@user' suffix. This is not only for string but also for data structure. If we access a field of data structure in user memory from kernel on some arch, it will fail. e.g. perf probe -a "sched_setscheduler param->sched_priority" This will fail to access the "param->sched_priority" because the param is __user pointer. Instead, we can now specify @user suffix for such argument. perf probe -a "sched_setscheduler param->sched_priority@user" Note that kernel memory access with "@user" must always fail on any arch. Link: http://lkml.kernel.org/r/155789874562.26965.10836126971405890891.stgit@devnote2 Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-05-16perf stat: Support 'percore' event qualifierJin Yao2-7/+44
With this patch, we can use the 'percore' event qualifier in perf-stat. root@skl:/tmp# perf stat -e cpu/event=0,umask=0x3,percore=1/,cpu/event=0,umask=0x3/ -a -A -I1000 1.000773050 S0-C0 98,352,832 cpu/event=0,umask=0x3,percore=1/ (50.01%) 1.000773050 S0-C1 103,763,057 cpu/event=0,umask=0x3,percore=1/ (50.02%) 1.000773050 S0-C2 196,776,995 cpu/event=0,umask=0x3,percore=1/ (50.02%) 1.000773050 S0-C3 176,493,779 cpu/event=0,umask=0x3,percore=1/ (50.02%) 1.000773050 CPU0 47,699,641 cpu/event=0,umask=0x3/ (50.02%) 1.000773050 CPU1 49,052,451 cpu/event=0,umask=0x3/ (49.98%) 1.000773050 CPU2 102,771,422 cpu/event=0,umask=0x3/ (49.98%) 1.000773050 CPU3 100,784,662 cpu/event=0,umask=0x3/ (49.98%) 1.000773050 CPU4 43,171,342 cpu/event=0,umask=0x3/ (49.98%) 1.000773050 CPU5 54,152,158 cpu/event=0,umask=0x3/ (49.98%) 1.000773050 CPU6 93,618,410 cpu/event=0,umask=0x3/ (49.98%) 1.000773050 CPU7 74,477,589 cpu/event=0,umask=0x3/ (49.99%) In this example, we count the event 'ref-cycles' per-core and per-CPU in one perf stat command-line. From the output, we can see: S0-C0 = CPU0 + CPU4 S0-C1 = CPU1 + CPU5 S0-C2 = CPU2 + CPU6 S0-C3 = CPU3 + CPU7 So the result is expected (tiny difference is ignored). Note that, the 'percore' event qualifier needs to use with option '-A'. Signed-off-by: Jin Yao <[email protected]> Tested-by: Ravi Bangoria <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-16perf stat: Factor out aggregate counts printingJin Yao1-25/+39
Move the aggregate counts printing to a new function print_counter_aggrdata, which will be used in following patches. Signed-off-by: Jin Yao <[email protected]> Tested-by: Ravi Bangoria <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-16perf tools: Add a 'percore' event qualifierJin Yao5-0/+34
Add a 'percore' event qualifier, like cpu/event=0,umask=0x3,percore=1/, that sums up the event counts for both hardware threads in a core. We can already do this with --per-core, but it's often useful to do this together with other metrics that are collected per hardware thread. So we need to support this per-core counting on a event level. This can be implemented in only the user tool, no kernel support needed. v4: --- 1. Add Arnaldo's patch which updates the documentation for this new qualifier. 2. Rebase to latest perf/core branch v3: --- Simplify the code according to Jiri's comments. Before: "return term->val.percore ? true : false;" Now: "return term->val.percore;" v2: --- Change the qualifier name from 'coresum' to 'percore' according to comments from Jiri and Andi. Signed-off-by: Jin Yao <[email protected]> Tested-by: Ravi Bangoria <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-16perf intel-pt: Fix sample timestamp wrt non-taken branchesAdrian Hunter1-1/+4
The sample timestamp is updated to ensure that the timestamp represents the time of the sample and not a branch that the decoder is still walking towards. The sample timestamp is updated when the decoder returns, but the decoder does not return for non-taken branches. Update the sample timestamp then also. Note that commit 3f04d98e972b5 ("perf intel-pt: Improve sample timestamp") was also a stable fix and appears, for example, in v4.4 stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample timestamp"). Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] # v4.4+ Fixes: 3f04d98e972b ("perf intel-pt: Improve sample timestamp") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-16perf intel-pt: Fix improved sample timestampAdrian Hunter1-3/+10
The decoder uses its current timestamp in samples. Usually that is a timestamp that has already passed, but in some cases it is a timestamp for a branch that the decoder is walking towards, and consequently hasn't reached. The intel_pt_sample_time() function decides which is which, but was not handling TNT packets exactly correctly. In the case of TNT, the timestamp applies to the first branch, so the decoder must first walk to that branch. That means intel_pt_sample_time() should return true for TNT, and this patch makes that change. However, if the first branch is a non-taken branch (i.e. a 'N'), then intel_pt_sample_time() needs to return false for subsequent taken branches in the same TNT packet. To handle that, introduce a new state INTEL_PT_STATE_TNT_CONT to distinguish the cases. Note that commit 3f04d98e972b5 ("perf intel-pt: Improve sample timestamp") was also a stable fix and appears, for example, in v4.4 stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample timestamp"). Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] # v4.4+ Fixes: 3f04d98e972b5 ("perf intel-pt: Improve sample timestamp") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-16perf intel-pt: Fix instructions sampling rateAdrian Hunter1-3/+10
The timestamp used to determine if an instruction sample is made, is an estimate based on the number of instructions since the last known timestamp. A consequence is that it might go backwards, which results in extra samples. Change it so that a sample is only made when the timestamp goes forwards. Note this does not affect a sampling period of 0 or sampling periods specified as a count of instructions. Example: Before: $ perf script --itrace=i10us ls 13812 [003] 2167315.222583: 3270 instructions:u: 7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 30902 instructions:u: 7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 10 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 8 instructions:u: 7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 14 instructions:u: 7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 6 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 14 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 4 instructions:u: 7fac71e2dab2 _dl_cache_libcmp+0xd2 (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222728: 16423 instructions:u: 7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222734: 12731 instructions:u: 7fac71e27938 _dl_name_match_p+0x68 (/lib/x86_64-linux-gnu/ld-2.28.so) ... After: $ perf script --itrace=i10us ls 13812 [003] 2167315.222583: 3270 instructions:u: 7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 30902 instructions:u: 7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222728: 16479 instructions:u: 7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so) ... Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Fixes: f4aa081949e7b ("perf tools: Add Intel PT decoder") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>