aboutsummaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2019-05-15perf parse-regs: Improve error output when faced with unknown register nameArnaldo Carvalho de Melo1-2/+1
Add quotes around the register name and suggest using 'perf record -I?' to get the list of available registers. Before: # perf record -Idi,xmm20,xmm1 Warning: unknown register xmm20, check man page Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use -I ? to list register names # # perf record -Idi,xmm20,xmm1 Warning: unknown register "xmm20", check man page or run "perf record -I?" Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use -I ? to list register names # Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-15perf record: Fix suggestion to get list of registers usable with --user-regs ↵Arnaldo Carvalho de Melo1-2/+2
and --intr-regs $ perf record -h -I Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use -I ? to list register names $ m $ perf record -I ? Workload failed: No such file or directory $ After: $ perf record -h -I Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use '-I?' to list register names $ $ perf record -I? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use '-I?' to list register names $ Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Fixes: bcc84ec65ad1 ("perf record: Add ability to name registers to record") Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-15perf tools: Speed up report for perf compiled with linwunwindJiri Olsa3-7/+12
When compiled with libunwind, perf does some preparatory work when processing side-band events. This is not needed when report actually don't unwind dwarf callchains, so it's disabled with dwarf_callchain_users bool. However we could move that check to higher level and shield more unwanted code for normal report processing, giving us following speed up on kernel build profile: Before: $ perf record make -j40 ... $ ll ../../perf.data -rw-------. 1 jolsa jolsa 461783932 Apr 26 09:11 perf.data $ perf stat -e cycles:u,instructions:u perf report -i perf.data > out Performance counter stats for 'perf report -i perf.data': 78,669,920,155 cycles:u 99,076,431,951 instructions:u # 1.26 insn per cycle 55.382823668 seconds time elapsed 27.512341000 seconds user 27.712871000 seconds sys After: $ perf stat -e cycles:u,instructions:u perf report -i perf.data > out Performance counter stats for 'perf report -i perf.data': 59,626,798,904 cycles:u 88,583,575,849 instructions:u # 1.49 insn per cycle 21.296935559 seconds time elapsed 20.010191000 seconds user 1.202935000 seconds sys The speed is higher with profile having many side-band events, because these trigger libunwind preparatory code. This does not apply for perf compiled with libdw for dwarf unwind, only for build with libunwind. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-15csky: Add support for libdwMao Han7-1/+237
This patch add support for DWARF register mappings and libdw registers initialization, which is used by perf callchain analyzing when --call-graph=dwarf is given. Here is the elfutils csky backend patch set: https://sourceware.org/ml/elfutils-devel/2019-q2/msg00007.html Signed-off-by: Mao Han <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Guo Ren <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-15perf test: Fix spelling mistake "leadking" -> "leaking"Colin Ian King1-2/+2
There are a couple of spelling mistakes in test assert messages. Fix them. Signed-off-by: Colin King <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-15perf annotate: Remove hist__account_cycles() from callbackJin Yao3-9/+8
The hist__account_cycles() function is executed when the hist_iter__branch_callback() is called. But it looks it's not necessary. In hist__account_cycles, it already walks on all branch entries. This patch moves the hist__account_cycles out of callback, now the data processing is much faster than before. Previous code has an issue that the ch[offset].num++ (in __symbol__account_cycles) is executed repeatedly since hist__account_cycles is called in each hist_iter__branch_callback, so the counting of ch[offset].num is not correct (too big). With this patch, the issue is fixed. And we don't need the code of "ch->reset >= ch->num / 2" to check if there are too many overlaps (in annotation__count_and_fill), otherwise some data would be hidden. Now, we can try, for example: perf record -b ... perf annotate or perf report -s symbol The before/after output should be no change. v3: --- Fix the crash in stdio mode. Like previous code, it needs the checking of ui__has_annotation() before hist__account_cycles() v2: --- 1. Cover the similar perf report 2. Remove the checking code "ch->reset >= ch->num / 2" Signed-off-by: Jin Yao <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds98-6896/+11123
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel changes were: - add support for Intel's "adaptive PEBS v4" - which embedds LBS data in PEBS records and can thus batch up and reduce the IRQ (NMI) rate significantly - reducing overhead and making call-graph profiling less intrusive. - add Intel CPU core and uncore support updates for Tremont, Icelake, - extend the x86 PMU constraints scheduler with 'constraint ranges' to better support Icelake hw constraints, - make x86 call-chain support work better with CONFIG_FRAME_POINTER=y - misc other changes Tooling changes: - updates to the main tools: 'perf record', 'perf trace', 'perf stat' - updated Intel and S/390 vendor events - libtraceevent updates - misc other updates and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER watchdog: Fix typo in comment perf/x86/intel: Add Tremont core PMU support perf/x86/intel/uncore: Add Intel Icelake uncore support perf/x86/msr: Add Icelake support perf/x86/intel/rapl: Add Icelake support perf/x86/intel/cstate: Add Icelake support perf/x86/intel: Add Icelake support perf/x86: Support constraint ranges perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them perf/x86/intel: Support adaptive PEBS v4 perf/x86/intel/ds: Extract code of event update in short period perf/x86/intel: Extract memory code PEBS parser for reuse perf/x86: Support outputting XMM registers perf/x86/intel: Force resched when TFA sysctl is modified perf/core: Add perf_pmu_resched() as global function perf/headers: Fix stale comment for struct perf_addr_filter perf/core: Make perf_swevent_init_cpu() static perf/x86: Add sanity checks to x86_schedule_events() perf/x86: Optimize x86_schedule_events() ...
2019-05-02perf tools: Remove needless asm/unistd.h include fixing build in some placesArnaldo Carvalho de Melo1-1/+0
We were including sys/syscall.h and asm/unistd.h, since sys/syscall.h includes asm/unistd.h, sometimes this leads to the redefinition of defines, breaking the build. Noticed on ARC with uCLibc. Cc: Adrian Hunter <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Rich Felker <[email protected]> Cc: Vineet Gupta <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-02tools build: Add -ldl to the disassembler-four-args feature testArnaldo Carvalho de Melo1-1/+1
Thomas Backlund reported that the perf build was failing on the Mageia 7 distro, that is because it uses: cat /tmp/build/perf/feature/test-disassembler-four-args.make.output /usr/bin/ld: /usr/lib64/libbfd.a(plugin.o): in function `try_load_plugin': /home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:243: undefined reference to `dlopen' /usr/bin/ld: /home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:271: undefined reference to `dlsym' /usr/bin/ld: /home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:256: undefined reference to `dlclose' /usr/bin/ld: /home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:246: undefined reference to `dlerror' as we allow dynamic linking and loading Mageia 7 uses these linker flags: $ rpm --eval %ldflags  -Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-O1 -Wl,--build-id -Wl,--enable-new-dtags So add -ldl to this feature LDFLAGS. Reported-by: Thomas Backlund <[email protected]> Tested-by: Thomas Backlund <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Song Liu <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-02perf cs-etm: Always allocate memory for cs_etm_queue::prev_packetLeo Yan1-5/+3
Robert Walker reported a segmentation fault is observed when process CoreSight trace data; this issue can be easily reproduced by the command 'perf report --itrace=i1000i' for decoding tracing data. If neither the 'b' flag (synthesize branches events) nor 'l' flag (synthesize last branch entries) are specified to option '--itrace', cs_etm_queue::prev_packet will not been initialised. After merging the code to support exception packets and sample flags, there introduced a number of uses of cs_etm_queue::prev_packet without checking whether it is valid, for these cases any accessing to uninitialised prev_packet will cause crash. As cs_etm_queue::prev_packet is used more widely now and it's already hard to follow which functions have been called in a context where the validity of cs_etm_queue::prev_packet has been checked, this patch always allocates memory for cs_etm_queue::prev_packet. Reported-by: Robert Walker <[email protected]> Suggested-by: Robert Walker <[email protected]> Signed-off-by: Leo Yan <[email protected]> Tested-by: Robert Walker <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki K Poulouse <[email protected]> Cc: [email protected] Fixes: 7100b12cf474 ("perf cs-etm: Generate branch sample for exception packet") Fixes: 24fff5eb2b93 ("perf cs-etm: Avoid stale branch samples when flush packet") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-02perf cs-etm: Don't check cs_etm_queue::prev_packet validityLeo Yan1-5/+1
Since cs_etm_queue::prev_packet is allocated for all cases, it will never be NULL pointer; now validity checking prev_packet is pointless, remove all of them. Signed-off-by: Leo Yan <[email protected]> Tested-by: Robert Walker <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki K Poulouse <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-02perf report: Report OOM in status line in the GTK UIThomas Richter1-3/+5
An -ENOMEM error is not reported in the GTK GUI. Instead this error message pops up on the screen: [root@m35lp76 perf]# ./perf report -i perf.data.error68-1 Processing events... [974K/3M] Error:failed to process sample 0xf4198 [0x8]: failed to process type: 68 However when I use the same perf.data file with --stdio it works: [root@m35lp76 perf]# ./perf report -i perf.data.error68-1 --stdio \ | head -12 # Total Lost Samples: 0 # # Samples: 76K of event 'cycles' # Event count (approx.): 99056160000 # # Overhead Command Shared Object Symbol # ........ ............... ................. ......... # 8.81% find [kernel.kallsyms] [k] ftrace_likely_update 8.74% swapper [kernel.kallsyms] [k] ftrace_likely_update 8.34% sshd [kernel.kallsyms] [k] ftrace_likely_update 2.19% kworker/u512:1- [kernel.kallsyms] [k] ftrace_likely_update The sample precentage is a bit low..... The GUI always fails in the FINISHED_ROUND event (68) and does not indicate the reason why. When happened is the following. Perf report calls a lot of functions and down deep when a FINISHED_ROUND event is processed, these functions are called: perf_session__process_event() + perf_session__process_user_event() + process_finished_round() + ordered_events__flush() + __ordered_events__flush() + do_flush() + ordered_events__deliver_event() + perf_session__deliver_event() + machine__deliver_event() + perf_evlist__deliver_event() + process_sample_event() + hist_entry_iter_add() --> only called in GUI case!!! + hist_iter__report__callback() + symbol__inc_addr_sample() Now this functions runs out of memory and returns -ENOMEM. This is reported all the way up until function perf_session__process_event() returns to its caller, where -ENOMEM is changed to -EINVAL and processing stops: if ((skip = perf_session__process_event(session, event, head)) < 0) { pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n", head, event->header.size, event->header.type); err = -EINVAL; goto out_err; } This occurred in the FINISHED_ROUND event when it has to process some 10000 entries and ran out of memory. This patch indicates the root cause and displays it in the status line of ther perf report GUI. Output before (on GUI status line): 0xf4198 [0x8]: failed to process type: 68 Output after: 0xf4198 [0x8]: failed to process type: 68 [not enough memory] Committer notes: the 'skip' variable needs to be initialized to -EINVAL, so that when the size is less than sizeof(struct perf_event_attr) we avoid this valid compiler warning: util/session.c: In function ‘perf_session__process_events’: util/session.c:1936:7: error: ‘skip’ may be used uninitialized in this function [-Werror=maybe-uninitialized] err = skip; ~~~~^~~~~~ util/session.c:1874:6: note: ‘skip’ was declared here s64 skip; ^~~~ cc1: all warnings being treated as errors Signed-off-by: Thomas Richter <[email protected]> Reviewed-by: Hendrik Brueckner <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Martin Schwidefsky <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-02perf bench numa: Add define for RUSAGE_THREAD if not presentArnaldo Carvalho de Melo1-0/+4
While cross building perf to the ARC architecture on a fedora 30 host, we were failing with: CC /tmp/build/perf/bench/numa.o bench/numa.c: In function ‘worker_thread’: bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’? getrusage(RUSAGE_THREAD, &rusage); ^~~~~~~~~~~~~ SIGEV_THREAD bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in [perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1 arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 [perfbuilder@60d5802468f6 perf]$ Trying to reproduce a report by Vineet, I noticed that, with just cross-built zlib and numactl libraries, I ended up with the above failure. So, since RUSAGE_THREAD is available as a define, check for that and numactl libraries, I ended up with the above failure. So, since RUSAGE_THREAD is available as a define in the system headers, check if it is defined in the 'perf bench numa' sources and define it if not. Now it builds and I have to figure out if the problem reported by Vineet only takes place if we have libelf or some other library available. Cc: Arnd Bergmann <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Cc: Namhyung Kim <[email protected]> Cc: Vineet Gupta <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-02perf annotate: Fix build on 32 bit for BPF annotationThadeu Lima de Souza Cascardo1-4/+4
Commit 6987561c9e86 ("perf annotate: Enable annotation of BPF programs") adds support for BPF programs annotations but the new code does not build on 32-bit. Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]> Acked-by: Song Liu <[email protected]> Fixes: 6987561c9e86 ("perf annotate: Enable annotation of BPF programs") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-05-02perf bpf: Return value with unlocking in perf_env__find_btf()Bo YU1-1/+1
In perf_env__find_btf(), we're returning without unlocking "env->bpf_progs.lock". There may be cause lockdep issue. Detected by CoversityScan, CID# 1444762:(program hangs(LOCK)) Signed-off-by: Bo YU <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: 2db7b1e0bd49d: (perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf()) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-17perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf()Jiri Olsa1-1/+3
We don't return NULL when we don't find the bpf_prog_info_node, fix that. Signed-off-by: Jiri Olsa <[email protected]> Reported-by: Song Liu <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Fixes: 3792cb2ff43b ("perf bpf: Save BTF in a rbtree in perf_env") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-17perf tools: Fix map reference countingJiri Olsa1-3/+1
By calling maps__insert() we assume to get 2 references on the map, which we relese within maps__remove call. However if there's already same map name, we currently don't bump the reference and can crash, like: Program received signal SIGABRT, Aborted. 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 #1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6 #2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6 #3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6 #4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131 #5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148 #6 map__put (map=0x1224df0) at util/map.c:299 #7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953 #8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959 #9 0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65 #10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728 #11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741 #12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362 #13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243 #14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322 #15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8, ... Add the map to the list and getting the reference event if we find the map with same name. Signed-off-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Eric Saint-Etienne <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Fixes: 1e6285699b30 ("perf symbols: Fix slowness due to -ffunction-section") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-17perf evlist: Fix side band thread drainingJiri Olsa1-5/+9
Current perf_evlist__poll_thread() code could finish without draining the data. Adding the logic that makes sure we won't finish before the drain. Signed-off-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Fixes: 657ee5531903 ("perf evlist: Introduce side band thread") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-17perf tools: Check maps for bpf programsSong Liu2-1/+19
As reported by Jiri Olsa in: "[BUG] perf: intel_pt won't display kernel function" https://lore.kernel.org/lkml/20190403143738.GB32001@krava Recent changes to support PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT broke --kallsyms option. This is because it broke test __map__is_kmodule. This patch fixes this by adding check for bpf program, so that these maps are not mistaken as kernel modules. Signed-off-by: Song Liu <[email protected]> Reported-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Yonghong Song <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Fixes: 76193a94522f ("perf, bpf: Introduce PERF_RECORD_KSYMBOL") Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-17perf bpf: Return NULL when RB tree lookup fails in ↵Jiri Olsa1-1/+3
perf_env__find_bpf_prog_info() We currently don't return NULL in case we don't find the bpf_prog_info_node, fixing that. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Fixes: e4378f0cb90b ("perf bpf: Save bpf_prog_info in a rbtree in perf_env") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-16perf top: Always sample time to satisfy needs of use of ordered queuingJiri Olsa1-0/+1
Bastian reported broken 'perf top -p PID' command, it won't display any data. The problem is that for -p option we monitor single thread, so we don't enable time in samples, because it's not needed. However since commit 16c66bc167cc we use ordered queues to stash data plus later commits added logic for dropping samples in case there's big load and we don't keep up. All this needs timestamp for sample. Enabling it unconditionally for perf top. Reported-by: Bastian Beischer <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: bastian beischer <[email protected]> Fixes: 16c66bc167cc ("perf top: Add processing thread") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-16perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user)Mao Han1-6/+6
On 32-bits platform with more than 32 registers, the 64 bits mask is truncate to the lower 32 bits and the return value of hweight_long will always smaller than 32. When kernel outputs more than 32 registers, but the user perf program only counts 32, there will be a data mismatch result to overflow check fail. Signed-off-by: Mao Han <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Fixes: 6a21c0b5c2ab ("perf tools: Add core support for sampling intr machine state regs") Fixes: d03f2170546d ("perf tools: Expand perf_event__synthesize_sample()") Fixes: 0f6a30150ca2 ("perf tools: Support user regs and stack in sample parsing") Link: http://lkml.kernel.org/r/29ad7947dc8fd1ff0abd2093a72cc27a2446be9f.1554883878.git.han_mao@c-sky.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-16perf stat: Disable DIR_FORMAT feature for 'perf stat record'Jiri Olsa1-0/+1
Arnaldo reported assertion in perf stat record: assertion failed at util/header.c:875 There's no support for this in the 'perf state record' command, disable the feature for that case. Reported-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Fixes: 258031c017c3 ("perf header: Add DIR_FORMAT feature to describe directory data") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-16perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_viewAdrian Hunter1-1/+1
Fix following error using calls_view: Query failed: ambiguous column name: parent_id Unable to execute statement Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Fixes: 8ce9a7251d11 ("perf scripts python: export-to-sqlite.py: Export calls parent_id") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-16perf header: Fix lock/unlock imbalances when processing BPF/BTF infoGustavo A. R. Silva1-9/+13
Fix lock/unlock imbalances by refactoring the code a bit and adding calls to up_write() before return. Signed-off-by: Gustavo A. R. Silva <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Addresses-Coverity-ID: 1444315 ("Missing unlock") Addresses-Coverity-ID: 1444316 ("Missing unlock") Fixes: a70a1123174a ("perf bpf: Save BTF information as headers to perf.data") Fixes: 606f972b1361 ("perf bpf: Save bpf_prog_info information as headers to perf.data") Link: http://lkml.kernel.org/r/20190408173355.GA10501@embeddedor [ Simplified the exit path to have just one up_write() + return ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update Silvermont to v14Andi Kleen3-5/+22
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update GoldmontPlus to v1.01Andi Kleen3-37/+51
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update Goldmont to v13Andi Kleen4-1127/+127
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update Bonnell to V4Andi Kleen2-2/+2
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update KnightsLanding events to v9Andi Kleen4-477/+474
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update Haswell events to v28Andi Kleen4-200/+213
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update IvyBridge events to v21Andi Kleen2-9/+5
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update SandyBridge events to v16Andi Kleen7-1295/+1301
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update JakeTown events to v20Andi Kleen2-11/+7
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update IvyTown events to v20Andi Kleen1-4/+0
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update HaswellX events to v20Andi Kleen3-178/+177
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update BroadwellX events to v14Andi Kleen4-189/+186
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update SkylakeX events to v1.12Andi Kleen5-1047/+899
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update Skylake events to v42Andi Kleen4-168/+3163
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update Broadwell-DE events to v7Andi Kleen2-7/+3
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update Broadwell events to v23Andi Kleen5-1676/+1685
Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf vendor events intel: Update metrics from TMAM 3.5Andi Kleen11-385/+2321
Update all the Intel JSON metrics from Ahmad Yasin's TMAM 3.5 for Intel big core from Sandy Bridge to Cascade Lake. This has many improvements and new metircs - New TopDownL1_SMT group that provides a per SMT thread version of --topdown that does not require -a anymore. The drawback is increased multiplexing though since L1 TopDown does not fit into 4 generic counters anymore. - Added SMT aware versions of other metrics - Split SMT aware metrics into separate metrics to avoid unnecessary event collections - New metrics for better branch analysis: Estimated Branch_Mispredict_Costs, Instructions per taken Branch, Branch Instructions per Taken Branch, etc. - Instruction mix metrics: Instructions per load, Instructions per store, Instructions per Branch, Instructions per Call - New Cache metrics: Bandwidth to L1/L2/L3 caches. L1/L2/L3 misses per kilo instructions. memory level parallelism - New memory controller metrics: Normalized memory bandwidth in interval mode, Average memory latency, Average number of parallel read requests, - 3DXP persistent memory metrics for Cascade Lake: 3dxp read latency, 3dxp read/write bandwidth - Some other useful metrics like Instruction Level Parallelism, - Various other improvements. Not all metrics are available on all CPUs. Skylake has best coverage. Signed-off-by: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf record: Implement --mmap-flush=<number> optionAlexey Budankov7-12/+89
Implement a --mmap-flush option that specifies minimal number of bytes that is extracted from mmaped kernel buffer to store into a trace. The default option value is 1 byte what means every time trace writing thread finds some new data in the mmaped buffer the data is extracted, possibly compressed and written to a trace. $ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc $ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc The option is independent from -z setting, doesn't vary with compression level and can serve two purposes. The first purpose is to increase the compression ratio of a trace data. Larger data chunks are compressed more effectively so the implemented option allows specifying data chunk size to compress. Also at some cases executing more write syscalls with smaller data size can take longer than executing less write syscalls with bigger data size due to syscall overhead so extracting bigger data chunks specified by the option value could additionally decrease runtime overhead. The second purpose is to avoid self monitoring live-lock issue in system wide (-a) profiling mode. Profiling in system wide mode with compression (-a -z) can additionally induce data into the kernel buffers along with the data from monitored processes. If performance data rate and volume from the monitored processes is high then trace streaming and compression activity in the tool is also high. High tool process activity can lead to subtle live-lock effect when compression of single new byte from some of mmaped kernel buffer leads to generation of the next single byte at some mmaped buffer. So perf tool process ends up in endless self monitoring. Implemented synch parameter is the mean to force data move independently from the specified flush threshold value. Despite the provided flush value the tool needs capability to unconditionally drain memory buffers, at least in the end of the collection. Committer testing: Running with the default value, i.e. as soon as there is something to read go on consuming, we first write the synthesized events, small chunks of about 128 bytes: # perf trace -m 2048 --call-graph dwarf -e write -- perf record <SNIP> 101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120 __libc_write (/usr/lib64/libpthread-2.28.so) ion (/home/acme/bin/perf) record__write (inlined) process_synthesized_event (/home/acme/bin/perf) perf_tool__process_synth_event (inlined) perf_event__synthesize_mmap_events (/home/acme/bin/perf) Then we move to reading the mmap buffers consuming the events put there by the kernel perf infrastructure: 107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336 __libc_write (/usr/lib64/libpthread-2.28.so) ion (/home/acme/bin/perf) record__write (inlined) record__pushfn (/home/acme/bin/perf) perf_mmap__push (/home/acme/bin/perf) record__mmap_read_evlist (inlined) record__mmap_read_all (inlined) __cmd_record (inlined) cmd_record (/home/acme/bin/perf) 12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984 <SNIP same backtrace as in the 107.561 timestamp> 12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816 <SNIP same backtrace as in the 107.561 timestamp> 12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832 <SNIP same backtrace as in the 107.561 timestamp> If we limit it to write only when more than 16MB are available for reading, it throttles that to a quarter of the --mmap-pages set for 'perf record', which by default get to 528384 bytes, found out using 'record -v': mmap flush: 132096 mmap size 528384B With that in place all the writes coming from record__mmap_read_evlist(), i.e. from the mmap buffers setup by the kernel perf infrastructure were at least 132096 bytes long. Trying with a bigger mmap size: perf trace -e write perf record -v -m 2048 --mmap-flush 16M 74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888 74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256 74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232 74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032 74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520 74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688 74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760 Was again limited to a quarter of the mmap size: mmap flush: 2098176 mmap size 8392704B A warning about that would be good to have but can be added later, something like: "max flush is a quarter of the mmap size, if wanting to bump the mmap flush further, bump the mmap size as well using -m/--mmap-pages" Also rename the 'sync' parameters to 'synch' to keep tools/perf building with older glibcs: cc1: warnings being treated as errors builtin-record.c: In function 'record__mmap_read_evlist': builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration /usr/include/unistd.h:933: warning: shadowed declaration is here builtin-record.c: In function 'record__mmap_read_all': builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration /usr/include/unistd.h:933: warning: shadowed declaration is here Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01tools build: Implement libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD definesAlexey Budankov3-0/+25
Implement libzstd feature check, NO_LIBZSTD and LIBZSTD_DIR defines to override Zstd library sources or disable the feature from the command line: $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd/sources/ clean all $ make -C tools/perf NO_LIBZSTD=1 clean all Auto detection feature status is reported just before compilation starts. If your system has some version of the zstd library preinstalled then the build system finds and uses it during the build. If you still prefer to compile with some other version of zstd library you have capability to refer the compilation to that version using LIBZSTD_DIR define. Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf tools, tools lib traceevent: Rename "pevent" member of struct tep_event ↵Tzvetomir Stoyanov6-7/+7
to "tep" The member "pevent" of the struct tep_event is renamed to "tep". This makes the struct consistent with the chosen naming convention: tep (trace event parser), instead of the old pevent. Signed-off-by: Tzvetomir Stoyanov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/linux-trace-devel/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01tools tools, tools lib traceevent: Make traceevent APIs more consistentTzvetomir Stoyanov2-3/+3
Rename some traceevent APIs for consistency: tep_pid_is_registered() to tep_is_pid_registered() tep_file_bigendian() to tep_is_file_bigendian() to make the names and return values consistent with other tep_is_... APIs tep_data_lat_fmt() to tep_data_latency_format() to make the name more descriptive tep_host_bigendian() to tep_is_bigendian() tep_set_host_bigendian() to tep_set_local_bigendian() tep_is_host_bigendian() to tep_is_local_bigendian() "host" can be confused with VMs, and "local" is about the local machine. All tep_is_..._bigendian(struct tep_handle *tep) APIs return the saved data in the tep handle, while tep_is_bigendian() returns the running machine's endianness. All tep_is_... functions are modified to return bool value, instead of int. Signed-off-by: Tzvetomir Stoyanov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] [ Removed some extra parenthesis around return statements ] Signed-off-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf list: Output tool eventsAndi Kleen3-2/+25
Add support in 'perf list' to output tool internal events, currently only 'duration_time'. Committer testing: $ perf list dur* List of pre-defined events (to be used in -e): duration_time [Tool event] Metric Groups: $ perf list sw List of pre-defined events (to be used in -e): alignment-faults [Software event] bpf-output [Software event] context-switches OR cs [Software event] cpu-clock [Software event] cpu-migrations OR migrations [Software event] dummy [Software event] emulation-faults [Software event] major-faults [Software event] minor-faults [Software event] page-faults OR faults [Software event] task-clock [Software event] duration_time [Tool event] $ perf list | grep duration duration_time [Tool event] [L1D miss outstandings duration in cycles] page walk duration are excluded in Skylake] load. EPT page walk duration are excluded in Skylake] page walk duration are excluded in Skylake] store. EPT page walk duration are excluded in Skylake] (instruction fetch) request. EPT page walk duration are excluded in instruction fetch request. EPT page walk duration are excluded in $ Signed-off-by: Andi Kleen <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf evsel: Support printing evsel name for 'duration_time'Andi Kleen1-1/+10
Implement printing the correct name for duration_time Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf stat: Implement duration_time as a proper eventAndi Kleen6-13/+86
The perf metric expression use 'duration_time' internally to normalize events. Normal 'perf stat' without -x also prints the duration time. But when using -x, the interval is not output anywhere, which is inconvenient for any post processing which often wants to normalize values to time. So implement 'duration_time' as a proper perf event that can be specified explicitely with -e. The previous implementation of 'duration_time' only worked for metric processing. This adds the concept of a tool event that is handled by the tool. On the kernel level it is still mapped to the dummy software event, but the values are not read anymore, but instead computed by the tool. Add proper plumbing to handle this in the event parser, and display it in 'perf stat'. We don't want 'duration_time' to be added up, so it's only printed for the first CPU. % perf stat -e duration_time,cycles true Performance counter stats for 'true': 555,476 ns duration_time 771,958 cycles 0.000555476 seconds time elapsed 0.000644000 seconds user 0.000000000 seconds sys Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-04-01perf stat: Revert checks for duration_timeAndi Kleen1-18/+0
This reverts e864c5ca145e ("perf stat: Hide internal duration_time counter") but doing it manually since the code has now moved to a different file. The next patch will properly implement duration_time as a full event, so no need to hide it anymore. Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>