aboutsummaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2021-02-18perf intel-pt: Retain the last PIP packet payload as isAdrian Hunter5-18/+16
Retain the PIP packet payload as is, instead of just the CR3, because it contains also the VMX NR flag which is needed to track VM-Entry. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf intel_pt: Add vmlaunch and vmresume as branchesAdrian Hunter3-0/+17
In preparation to support Intel PT decoding of virtual machine traces, add vmlaunch and vmresume as branch instructions. Note, sample flags will show "VMentry" even if the VM-Entry fails. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf script: Add branch types for VM-Entry and VM-ExitAdrian Hunter3-1/+9
In preparation to support Intel PT decoding of virtual machine traces, add branch types for VM-Entry and VM-Exit. Note they are both treated as "calls" because the VM-Exit transfers control to a different address. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf auxtrace: Automatically group aux-output eventsAdrian Hunter3-0/+23
aux-output events need to have an AUX area event as the group leader. However, grouping events does not allow the AUX area event to be given an address filter because the --filter option must come after the event, which conflicts with the grouping syntax. To allow filtering in that case, automatically create a group since that is the requirement anyway. Example: (requires Intel Tremont) perf record -c 500 -e 'intel_pt//u' --filter 'filter main @ /bin/ls' -e 'cycles/aux-output/pp' ls Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf test: Fix unaligned access in sample parsing testNamhyung Kim1-1/+1
The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 #1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 #2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 #3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 #4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 #5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 #6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 #7 0x56153264e796 in run_builtin perf.c:312:11 #8 0x56153264cf03 in handle_internal_command perf.c:364:8 #9 0x56153264e47d in run_argv perf.c:408:2 #10 0x56153264c9a9 in main perf.c:538:3 #11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) #12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd8542d ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processingKan Liang4-14/+43
For X86, the var2_w field of PERF_SAMPLE_WEIGHT_STRUCT stands for the instruction latency. Current perf forces the var2_w to the data->ins_lat in the generic code. It works well for now because X86 is the only architecture that supports the PERF_SAMPLE_WEIGHT_STRUCT, but it may bring problems once other architectures support the sample type. For example, the var2_w may be used to capture something else on PowerPC. Create two architecture specific functions to parse and synthesize the weight related samples. Move the X86 specific codes to the X86 version functions. Other architectures can implement their own functions later separately. Signed-off-by: Kan Liang <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf intel-pt: Add PSB eventsAdrian Hunter7-53/+251
Emitting a PSB+ can cause a CPU a slight delay. When doing timing analysis of code with Intel PT, it is useful to know if a timing bubble was caused by Intel PT or not. Add reporting of PSB events via perf script. PSB events are printed with the existing itrace 'p' option which also prints power and frequency changes. The PSB event contains the trace offset at which the PSB occurs, to allow easy reference back to the PSB+ packets. The PSB event timestamp is always the timestamp from the PSB+ TSC packet, and the ip is always the address from the PSB+ FUP packet. The code changes are non-trivial because the decoder must walk to the PSB+ FUP address before outputting the PSB event. Example: $ perf record -e intel_pt/cyc,psb_period=0/u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.046 MB perf.data ] $ perf script --itrace=p --ns perf 17981 [006] 25617.510820383: psb: psb offs: 0 0 [unknown] ([unknown]) perf 17981 [006] 25617.510820383: cbr: cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) uname 17981 [006] 25617.510889753: psb: psb offs: 0xb50 7f78c12a212e __GI___tunables_init+0xee (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510899162: psb: psb offs: 0x12d0 7f78c128af1c dl_main+0x93c (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510939242: psb: psb offs: 0x1a50 7f78c128eefc _dl_map_object_from_fd+0x13c (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510981274: psb: psb offs: 0x21c8 7f78c1296307 _dl_relocate_object+0x927 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510993034: psb: psb offs: 0x2948 7f78c12940e4 _dl_lookup_symbol_x+0x14 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511003871: psb: psb offs: 0x30c8 7f78c12937b3 do_lookup_x+0x2f3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511019854: psb: psb offs: 0x3850 7f78c1295eed _dl_relocate_object+0x50d (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511029015: psb: psb offs: 0x4390 7f78c12a855a strcmp+0xf6a (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511064876: psb: psb offs: 0x4b10 0 [unknown] ([unknown]) uname 17981 [006] 25617.511080762: psb: psb offs: 0x5290 7f78c11db53d _dl_addr+0x13d (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511086035: psb: psb offs: 0x5a08 7f78c11db538 _dl_addr+0x138 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511091381: psb: psb offs: 0x6190 7f78c11db534 _dl_addr+0x134 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511096681: psb: psb offs: 0x6910 7f78c11db4c3 _dl_addr+0xc3 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511119520: psb: psb offs: 0x7090 7f78c10ada5e _nl_intern_locale_data+0x12e (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511126584: psb: psb offs: 0x7818 7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511132775: psb: psb offs: 0x8358 7f78c10c20c0 getenv+0xa0 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511134598: psb: psb offs: 0x8ad0 7f78c10ada09 _nl_intern_locale_data+0xd9 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511135685: psb: psb offs: 0x9258 7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511138322: psb: psb offs: 0x99d0 7f78c11fffd9 __strncmp_avx2+0x39 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511158907: psb: psb offs: 0xa150 0 [unknown] ([unknown]) Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf intel-pt: Fix IPC with CYC thresholdAdrian Hunter3-0/+41
The code assumed every CYC-eligible packet has a CYC packet, which is not the case when CYC thresholds are used. Fix by checking if a CYC packet is actually present in that case. Fixes: 5b1dc0fd1da06 ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf intel-pt: Fix premature IPCAdrian Hunter3-11/+17
The code assumed a change in cycle count means accurate IPC. That is not correct, for example when sampling both branches and instructions, or at a FUP packet (which is not CYC-eligible) address. Fix by using an explicit flag to indicate when IPC can be sampled. Fixes: 5b1dc0fd1da06 ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf intel-pt: Fix missing CYC processing in PSBAdrian Hunter1-0/+3
Add missing CYC packet processing when walking through PSB+. This improves the accuracy of timestamps that follow PSB+, until the next MTC. Fixes: 3d49807870f08 ("perf tools: Add new Intel PT packet definitions") Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf unwind: Set userdata for all __report_module() pathsDave Rigby1-3/+8
When locating the DWARF module for a given address, __find_debuginfo() requires a 'struct dso' passed via the userdata argument. However, this field is only set in __report_module() if the module is found in via dwfl_addrmodule(), not if it is found later via dwfl_report_elf(). Set userdata irrespective of how the DWARF module was found, as long as we found a module. Fixes: bf53fc6b5f41 ("perf unwind: Fix separate debug info files when using elfutils' libdw's unwinder") Signed-off-by: Dave Rigby <[email protected]> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211801 Acked-by: Jan Kratochvil <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/linux-perf-users/[email protected]/ Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf record: Fix continue profiling after draining the bufferYang Jihong3-1/+13
Commit da231338ec9c0987 ("perf record: Use an eventfd to wakeup when done") uses eventfd() to solve a rare race where the setting and checking of 'done' which add done_fd to pollfd. When draining buffer, revents of done_fd is 0 and evlist__filter_pollfd function returns a non-zero value. As a result, perf record does not stop profiling. The following simple scenarios can trigger this condition: # sleep 10 & # perf record -p $! After the sleep process exits, perf record should stop profiling and exit. However, perf record keeps running. If pollfd revents contains only POLLERR or POLLHUP, perf record indicates that buffer is draining and need to stop profiling. Use fdarray_flag__nonfilterable() to set done eventfd to nonfilterable objects, so that evlist__filter_pollfd() does not filter and check done eventfd. Fixes: da231338ec9c0987 ("perf record: Use an eventfd to wakeup when done") Signed-off-by: Yang Jihong <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf tools: Simplify the calculation of variablesJiapeng Chong1-1/+1
Fix the following coccicheck warnings: ./tools/perf/util/header.c:3809:18-20: WARNING !A || A && B is equivalent to !A || B. Reported-by: Abaci Robot <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: John Fastabend <[email protected]> Cc: KP Singh <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/1612497255-87189-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf vendor events arm64: Add JSON metrics for imx8mp DDR PerfJoakim Zhang2-0/+503
Add JSON metrics for imx8mp DDR Perf. Signed-off-by: Joakim Zhang <[email protected]> Reviewed-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Fabio Estevam <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sascha Hauer <[email protected]> <[email protected]> Cc: Shawn Guo <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf vendor events arm64: Add JSON metrics for imx8mq DDR PerfJoakim Zhang2-0/+55
Add JSON metrics for imx8mq DDR Perf. Signed-off-by: Joakim Zhang <[email protected]> Reviewed-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Fabio Estevam <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sascha Hauer <[email protected]> <[email protected]> Cc: Shawn Guo <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf vendor events arm64: Add JSON metrics for imx8mn DDR PerfJoakim Zhang2-0/+55
Add JSON metrics for imx8mn DDR Perf. Signed-off-by: Joakim Zhang <[email protected]> Reviewed-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Fabio Estevam <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sascha Hauer <[email protected]> <[email protected]> Cc: Shawn Guo <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-18perf vendor events arm64: Fix indentation of brackets in imx8mm metricsJoakim Zhang1-2/+2
Fix indentation of brackets in imx8mm metrics. Signed-off-by: Joakim Zhang <[email protected]> Reviewed-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Fabio Estevam <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sascha Hauer <[email protected]> Cc: Shawn Guo <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-17perf annotate: Do not jump after 'k' is pressedMartin Liška1-1/+1
Do not jump when 'k' is pressed, the cursor show stay where it is. Right now, it jumps to the currently selected hot instruction. Signed-off-by: Martin Liška <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-17perf metricgroup: Remove unneeded semicolonYang Li1-1/+1
Eliminate the following coccicheck warning: ./tools/perf/util/metricgroup.c:382:3-4: Unneeded semicolon Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: yang li <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-17perf tools: Add OCaml demanglingFabian Hemmer11-11/+156
Detect symbols generated by the OCaml compiler based on their prefix. Demangle OCaml symbols, returning a newly allocated string (like the existing Java demangling functionality). Move a helper function (hex) from tests/code-reading.c to util/string.c To test: echo 'Printf.printf "%d\n" (Random.int 42)' > test.ml perf record ocamlopt.opt test.ml perf report -d ocamlopt.opt Signed-off-by: Fabian Hemmer <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> LPU-Reference: 20210203211537.b25ytjb6dq5jfbwx@nyu Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-17perf symbols: Resolve symbols against debug file firstJiri Slaby1-1/+15
With LTO, there are symbols like these: /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug 10305: 0000000000955fa4 0 NOTYPE LOCAL DEFAULT 29 Predicate.cpp.2bc410e7 This comes from a runtime/debug split done by the standard way: objcopy --only-keep-debug $runtime $debug objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line --strip-all $runtime perf currently cannot resolve such symbols (relicts of LTO), as section 29 exists only in the debug file (29 is .debug_info). And perf resolves symbols only against runtime file. This results in all symbols from such a library being unresolved: 0.38% main2 libantlr4-runtime.so.4.8 [.] 0x00000000000671e0 So try resolving against the debug file first. And only if it fails (the section has NOBITS set), try runtime file. We can do this, as "objcopy --only-keep-debug" per documentation preserves all sections, but clears data of some of them (the runtime ones) and marks them as NOBITS. The correct result is now: 0.38% main2 libantlr4-runtime.so.4.8 [.] antlr4::IntStream::~IntStream Note that these LTO symbols are properly skipped anyway as they belong neither to *text* nor to *data* (is_label && !elf_sec__filter(&shdr, secstrs) is true). Signed-off-by: Jiri Slaby <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-1/+0
Daniel Borkmann says: ==================== pull-request: bpf-next 2021-02-16 The following pull-request contains BPF updates for your *net-next* tree. There's a small merge conflict between 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.") from net-next tree and 9cacf81f8161 ("bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE") from bpf-next tree. Resolve as follows: [...] lock_sock(sk); err = tcp_zerocopy_receive(sk, &zc, &tss); err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname, &zc, &len, err); release_sock(sk); [...] We've added 116 non-merge commits during the last 27 day(s) which contain a total of 156 files changed, 5662 insertions(+), 1489 deletions(-). The main changes are: 1) Adds support of pointers to types with known size among global function args to overcome the limit on max # of allowed args, from Dmitrii Banshchikov. 2) Add bpf_iter for task_vma which can be used to generate information similar to /proc/pid/maps, from Song Liu. 3) Enable bpf_{g,s}etsockopt() from all sock_addr related program hooks. Allow rewriting bind user ports from BPF side below the ip_unprivileged_port_start range, both from Stanislav Fomichev. 4) Prevent recursion on fentry/fexit & sleepable programs and allow map-in-map as well as per-cpu maps for the latter, from Alexei Starovoitov. 5) Add selftest script to run BPF CI locally. Also enable BPF ringbuffer for sleepable programs, both from KP Singh. 6) Extend verifier to enable variable offset read/write access to the BPF program stack, from Andrei Matei. 7) Improve tc & XDP MTU handling and add a new bpf_check_mtu() helper to query device MTU from programs, from Jesper Dangaard Brouer. 8) Allow bpf_get_socket_cookie() helper also be called from [sleepable] BPF tracing programs, from Florent Revest. 9) Extend x86 JIT to pad JMPs with NOPs for helping image to converge when otherwise too many passes are required, from Gary Lin. 10) Verifier fixes on atomics with BPF_FETCH as well as function-by-function verification both related to zero-extension handling, from Ilya Leoshkevich. 11) Better kernel build integration of resolve_btfids tool, from Jiri Olsa. 12) Batch of AF_XDP selftest cleanups and small performance improvement for libbpf's xsk map redirect for newer kernels, from Björn Töpel. 13) Follow-up BPF doc and verifier improvements around atomics with BPF_FETCH, from Brendan Jackman. 14) Permit zero-sized data sections e.g. if ELF .rodata section contains read-only data from local variables, from Yonghong Song. 15) veth driver skb bulk-allocation for ndo_xdp_xmit, from Lorenzo Bianconi. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-02-16Merge branch 'perf/urgent' into perf/coreArnaldo Carvalho de Melo4-10/+24
To get some fixes that didn't made into 5.11. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-16perf arm-spe: Set sample's data source fieldLeo Yan1-9/+60
The sample structure contains the field 'data_src' which is used to tell the data operation attributions, e.g. operation type is loading or storing, cache level, it's snooping or remote accessing, etc. At the end, the 'data_src' will be parsed by perf mem/c2c tools to display human readable strings. This patch is to fill the 'data_src' field in the synthesized samples base on different types. Currently perf tool can display statistics for L1/L2/L3 caches but it doesn't support the 'last level cache'. To fit to current implementation, 'data_src' field uses L3 cache for last level cache. Before this commit, perf mem report looks like this: # Samples: 75K of event 'l1d-miss' # Total weight : 75951 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access # ........ ....... ............ ............. ...................... ............. ...................... ........... ..... .......... # 81.56% 61945 0 N/A [.] 0x00000000000009d8 serial_c [.] 0000000000000000 [unknown] N/A N/A 18.44% 14003 0 N/A [.] 0x0000000000000828 serial_c [.] 0000000000000000 [unknown] N/A N/A Now on a system with Arm SPE, addresses and access types are displayed: # Samples: 75K of event 'l1d-miss' # Total weight : 75951 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access # ........ ....... ............ ............. ...................... ............. ...................... ........... ..... .......... # 0.43% 324 0 L1 miss [.] 0x00000000000009d8 serial_c [.] 0x0000ffff80794e00 anon N/A Walker hit 0.42% 322 0 L1 miss [.] 0x00000000000009d8 serial_c [.] 0x0000ffff80794580 anon N/A Walker hit Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Andre Przywara <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wei Li <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-16perf arm-spe: Synthesize memory eventLeo Yan1-0/+30
The memory event can deliver two benefits: - The first benefit is the memory event can give out global view for memory accessing, rather than organizing events with scatter mode (e.g. uses separate event for L1 cache, last level cache, etc) which which can only display a event for single memory type, memory events include all memory accessing so it can display the data accessing cross memory levels in the same view; - The second benefit is the sample generation might introduce a big overhead and need to wait for long time for Perf reporting, we can specify itrace option '--itrace=M' to filter out other events and only output memory events, this can significantly reduce the overhead caused by generating samples. This patch is to enable memory event for Arm SPE. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Andre Przywara <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wei Li <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-16perf arm-spe: Fill address info for samplesLeo Yan1-20/+30
To properly handle memory and branch samples, this patch divides into two functions for generating samples: arm_spe__synth_mem_sample() is for synthesizing memory and TLB samples; arm_spe__synth_branch_sample() is to synthesize branch samples. Arm SPE backend decoder has passed virtual and physical address through packets, the address info is stored into the synthesize samples in the function arm_spe__synth_mem_sample(). Committer notes: Fixed this: 36 46.77 fedora:27 : FAIL clang version 5.0.2 (tags/RELEASE_502/final) util/arm-spe.c:269:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers] struct perf_sample sample = { 0 }; ^ util/arm-spe.c:288:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers] struct perf_sample sample = { 0 }; By using = { .ip = 0, }; Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Andre Przywara <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wei Li <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-12perf probe: Fix kretprobe issue caused by GCC bugJianlin Lv1-0/+10
Perf failed to add a kretprobe event with debuginfo of vmlinux which is compiled by gcc with -fpatchable-function-entry option enabled. The same issue with kernel module. Issue: # perf probe -v 'kernel_clone%return $retval' ...... Writing event: r:probe/kernel_clone__return _text+599624 $retval Failed to write event: Invalid argument Error: Failed to add events. Reason: Invalid argument (Code: -22) # cat /sys/kernel/debug/tracing/error_log [156.75] trace_kprobe: error: Retprobe address must be an function entry Command: r:probe/kernel_clone__return _text+599624 $retval ^ # llvm-dwarfdump vmlinux |grep -A 10 -w 0x00df2c2b 0x00df2c2b: DW_TAG_subprogram DW_AT_external (true) DW_AT_name ("kernel_clone") DW_AT_decl_file ("/home/code/linux-next/kernel/fork.c") DW_AT_decl_line (2423) DW_AT_decl_column (0x07) DW_AT_prototyped (true) DW_AT_type (0x00dcd492 "pid_t") DW_AT_low_pc (0xffff800010092648) DW_AT_high_pc (0xffff800010092b9c) DW_AT_frame_base (DW_OP_call_frame_cfa) # cat /proc/kallsyms |grep kernel_clone ffff800010092640 T kernel_clone # readelf -s vmlinux |grep -i kernel_clone 183173: ffff800010092640 1372 FUNC GLOBAL DEFAULT 2 kernel_clone # objdump -d vmlinux |grep -A 10 -w \<kernel_clone\>: ffff800010092640 <kernel_clone>: ffff800010092640: d503201f nop ffff800010092644: d503201f nop ffff800010092648: d503233f paciasp ffff80001009264c: a9b87bfd stp x29, x30, [sp, #-128]! ffff800010092650: 910003fd mov x29, sp ffff800010092654: a90153f3 stp x19, x20, [sp, #16] The entry address of kernel_clone converted by debuginfo is _text+599624 (0x92648), which is consistent with the value of DW_AT_low_pc attribute. But the symbolic address of kernel_clone from /proc/kallsyms is ffff800010092640. This issue is found on arm64, -fpatchable-function-entry=2 is enabled when CONFIG_DYNAMIC_FTRACE_WITH_REGS=y; Just as objdump displayed the assembler contents of kernel_clone, GCC generate 2 NOPs at the beginning of each function. kprobe_on_func_entry detects that (_text+599624) is not the entry address of the function, which leads to the failure of adding kretprobe event. kprobe_on_func_entry ->_kprobe_addr ->kallsyms_lookup_size_offset ->arch_kprobe_on_func_entry // FALSE The cause of the issue is that the first instruction in the compile unit indicated by DW_AT_low_pc does not include NOPs. This issue exists in all gcc versions that support -fpatchable-function-entry option. I have reported it to the GCC community: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776 Currently arm64 and PA-RISC may enable fpatchable-function-entry option. The kernel compiled with clang does not have this issue. FIX: This GCC issue only cause the registration failure of the kretprobe event which doesn't need debuginfo. So, stop using debuginfo for retprobe. map will be used to query the probe function address. Signed-off-by: Jianlin Lv <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: [email protected] Cc: Frank Ch. Eigler <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sumanth Korikkar <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-12perf symbols: Fix return value when loading PE DSONicholas Fraser1-1/+3
The first time dso__load() was called on a PE file it always returned -1 error. This caused the first call to map__find_symbol() to always fail on a PE file so the first sample from each PE file always had symbol <unknown>. Subsequent samples succeed however because the DSO is already loaded. This fixes dso__load() to return 0 when successfully loading a DSO with libbfd. Fixes: eac9a4342e5447ca ("perf symbols: Try reading the symbol table with libbfd") Signed-off-by: Nicholas Fraser <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Frank Ch. Eigler <[email protected]> Cc: Huw Davies <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Remi Bernon <[email protected]> Cc: Song Liu <[email protected]> Cc: Tommi Rantala <[email protected]> Cc: Ulrich Czekalla <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-12perf symbols: Make dso__load_bfd_symbols() load PE files from debug cache onlyNicholas Fraser1-7/+1
dso__load_bfd_symbols() attempts to load a DSO at its original path, then closes it and loads the file in the debug cache. This is incorrect. It should ignore the original file and work with only the debug cache. The original file may have changed or may not even exist, for example if the debug cache has been transferred to another machine via "perf archive". This fix makes it only load the file in the debug cache. Further notes from Nicholas: dso__load_bfd_symbols() is called in a loop from dso__load() for a variety of paths. These are generated by the various DSO_BINARY_TYPEs in the binary_type_symtab list at the top of util/symbol.c. In each case the debugfile passed to dso__load_bfd_symbols() is the path to try. One of those iterations (the first one I believe) passes the original path as the debugfile. If the file still exists at the original path, this is the one that ends up being used in case the debugcache was deleted or the PE file doesn't have a build-id. A later iteration (BUILD_ID_CACHE) passes debugfile as the file in the debugcache if it has a build-id. Even if the file was previously loaded at its original path, (if I understand correctly) this load will override it so the debugcache file ends up being used. Committer notes: So if it fails to find in the cache, it will eventually hope for the best and look at the path in the local filesystem, which in many cases is enough. At some point we need to switch from this "hope for the best" approach to one that warns the user that there is no guarantee, if no buildid is present, that just by looking at the pathname the symbolisation will work. Signed-off-by: Nicholas Fraser <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Frank Ch. Eigler <[email protected]> Cc: Huw Davies <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Remi Bernon <[email protected]> Cc: Song Liu <[email protected]> Cc: Tommi Rantala <[email protected]> Cc: Ulrich Czekalla <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-12perf arm-spe: Store operation type in packetLeo Yan2-0/+12
This patch is to store operation type in packet structure. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Andre Przywara <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wei Li <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-12perf arm-spe: Store memory address in packetLeo Yan2-0/+6
This patch is to store virtual and physical memory addresses in packet, which will be used for memory samples. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Andre Przywara <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wei Li <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-12perf arm-spe: Enable sample type PERF_SAMPLE_DATA_SRCLeo Yan1-1/+1
This patch is to enable sample type PERF_SAMPLE_DATA_SRC for Arm SPE in the perf data, when output the tracing data, it tells tools that it contains data source in the memory event. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Andre Przywara <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wei Li <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-12perf env: Remove unneeded internal/cpumap inclusionsIan Rogers9-9/+0
Minor cleanup. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-12perf tools: Remove unused xyarray.c as it was moved to tools/lib/perfIan Rogers1-33/+0
Migrated to libperf in: 4b247fa7314ce482 ("libperf: Adopt xyarray class from perf") Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf symbols: Use (long) for iterator for bfd symbolsDmitry Safonov1-2/+1
GCC (GCC) 8.4.0 20200304 fails to build perf with: : util/symbol.c: In function 'dso__load_bfd_symbols': : util/symbol.c:1626:16: error: comparison of integer expressions of different signednes : for (i = 0; i < symbols_count; ++i) { : ^ : util/symbol.c:1632:16: error: comparison of integer expressions of different signednes : while (i + 1 < symbols_count && : ^ : util/symbol.c:1637:13: error: comparison of integer expressions of different signednes : if (i + 1 < symbols_count && : ^ : cc1: all warnings being treated as errors It's unlikely that the symtable will be that big, but the fix is an oneliner and as perf has CORE_CFLAGS += -Wextra, which makes build to fail together with CORE_CFLAGS += -Werror Fixes: eac9a4342e54 ("perf symbols: Try reading the symbol table with libbfd") Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Jacek Caban <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Remi Bernon <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf annotate: Fix jump parsing for C++ code.Martin Liška2-0/+9
Considering the following testcase: int foo(int a, int b) { for (unsigned i = 0; i < 1000000000; i++) a += b; return a; } int main() { foo (3, 4); return 0; } 'perf annotate' displays: 86.52 │40055e: → ja 40056c <foo(int, int)+0x26> 13.37 │400560: mov -0x18(%rbp),%eax │400563: add %eax,-0x14(%rbp) │400566: addl $0x1,-0x4(%rbp) 0.11 │40056a: → jmp 400557 <foo(int, int)+0x11> │40056c: mov -0x14(%rbp),%eax │40056f: pop %rbp and the 'ja 40056c' does not link to the location in the function. It's caused by fact that comma is wrongly parsed, it's part of function signature. With my patch I see: 86.52 │ ┌──ja 26 13.37 │ │ mov -0x18(%rbp),%eax │ │ add %eax,-0x14(%rbp) │ │ addl $0x1,-0x4(%rbp) 0.11 │ │↑ jmp 11 │26:└─→mov -0x14(%rbp),%eax and 'o' output prints: 86.52 │4005┌── ↓ ja 40056c <foo(int, int)+0x26> 13.37 │4005│0: mov -0x18(%rbp),%eax │4005│3: add %eax,-0x14(%rbp) │4005│6: addl $0x1,-0x4(%rbp) 0.11 │4005│a: ↑ jmp 400557 <foo(int, int)+0x11> │4005└─→ mov -0x14(%rbp),%eax On the contrary, compiling the very same file with gcc -x c, the parsing is fine because function arguments are not displayed: jmp 400543 <foo+0x1d> Committer testing: Before: $ cat cpp_args_annotate.c int foo(int a, int b) { for (unsigned i = 0; i < 1000000000; i++) a += b; return a; } int main() { foo (3, 4); return 0; } $ gcc --version |& head -1 gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9) $ gcc -g cpp_args_annotate.c -o cpp_args_annotate $ perf record ./cpp_args_annotate [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.275 MB perf.data (7188 samples) ] $ perf annotate --stdio2 foo Samples: 7K of event 'cycles:u', 4000 Hz, Event count (approx.): 7468429289, [percent: local period] foo() /home/acme/c/cpp_args_annotate Percent 0000000000401106 <foo>: foo(): int foo(int a, int b) { push %rbp mov %rsp,%rbp mov %edi,-0x14(%rbp) mov %esi,-0x18(%rbp) for (unsigned i = 0; i < 1000000000; i++) movl $0x0,-0x4(%rbp) ↓ jmp 1d a += b; 13.45 13: mov -0x18(%rbp),%eax add %eax,-0x14(%rbp) for (unsigned i = 0; i < 1000000000; i++) addl $0x1,-0x4(%rbp) 0.09 1d: cmpl $0x3b9ac9ff,-0x4(%rbp) 86.46 ↑ jbe 13 return a; mov -0x14(%rbp),%eax } pop %rbp ← retq $ I.e. works for C, now lets switch to C++: $ g++ -g cpp_args_annotate.c -o cpp_args_annotate $ perf record ./cpp_args_annotate [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.268 MB perf.data (6976 samples) ] $ perf annotate --stdio2 foo Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period] foo() /home/acme/c/cpp_args_annotate Percent 0000000000401106 <foo(int, int)>: foo(int, int): int foo(int a, int b) { push %rbp mov %rsp,%rbp mov %edi,-0x14(%rbp) mov %esi,-0x18(%rbp) for (unsigned i = 0; i < 1000000000; i++) movl $0x0,-0x4(%rbp) cmpl $0x3b9ac9ff,-0x4(%rbp) 86.53 → ja 40112c <foo(int, int)+0x26> a += b; 13.32 mov -0x18(%rbp),%eax 0.00 add %eax,-0x14(%rbp) for (unsigned i = 0; i < 1000000000; i++) addl $0x1,-0x4(%rbp) 0.15 → jmp 401117 <foo(int, int)+0x11> return a; mov -0x14(%rbp),%eax } pop %rbp ← retq $ Reproduced. Now with this patch: Reusing the C++ built binary, as we can see here: $ readelf -wi cpp_args_annotate | grep producer <c> DW_AT_producer : (indirect string, offset: 0x2e): GNU C++14 10.2.1 20201125 (Red Hat 10.2.1-9) -mtune=generic -march=x86-64 -g $ And furthermore: $ file cpp_args_annotate cpp_args_annotate: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4fe3cab260204765605ec630d0dc7a7e93c361a9, for GNU/Linux 3.2.0, with debug_info, not stripped $ perf buildid-list -i cpp_args_annotate 4fe3cab260204765605ec630d0dc7a7e93c361a9 $ perf buildid-list | grep cpp_args_annotate 4fe3cab260204765605ec630d0dc7a7e93c361a9 /home/acme/c/cpp_args_annotate $ It now works: $ perf annotate --stdio2 foo Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period] foo() /home/acme/c/cpp_args_annotate Percent 0000000000401106 <foo(int, int)>: foo(int, int): int foo(int a, int b) { push %rbp mov %rsp,%rbp mov %edi,-0x14(%rbp) mov %esi,-0x18(%rbp) for (unsigned i = 0; i < 1000000000; i++) movl $0x0,-0x4(%rbp) 11: cmpl $0x3b9ac9ff,-0x4(%rbp) 86.53 ↓ ja 26 a += b; 13.32 mov -0x18(%rbp),%eax 0.00 add %eax,-0x14(%rbp) for (unsigned i = 0; i < 1000000000; i++) addl $0x1,-0x4(%rbp) 0.15 ↑ jmp 11 return a; 26: mov -0x14(%rbp),%eax } pop %rbp ← retq $ Signed-off-by: Martin Liška <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Slaby <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf tools: Replace lkml.org links with loreKees Cook2-2/+2
As started by commit 05a5f51ca566 ("Documentation: Replace lkml.org links with lore"), replace lkml.org links with lore to better use a single source that's more likely to stay available long-term. Signed-off-by: Kees Kook <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf tests: Add daemon 'lock' testJiri Olsa1-0/+38
Add a test for the perf daemon 'lock' command ensuring only one instance of daemon can run over one base directory. Committer testing: [root@five ~]# perf test -v daemon 76: daemon operations : --- start --- test child forked, pid 793255 test daemon list test daemon reconfig test daemon stop test daemon signal signal 12 sent to session 'test [793506]' signal 12 sent to session 'test [793506]' test daemon ping test daemon lock test child finished with 0 ---- end ---- daemon operations: Ok [root@five ~]# Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf tests: Add daemon 'ping' command testJiri Olsa1-0/+40
Add a test for the perf daemon 'ping' command. The tests verifies the ping command gets proper answer from sessions. Committer testing: [root@five ~]# perf test daemon 76: daemon operations : Ok [root@five ~]# perf test -v daemon 76: daemon operations : --- start --- test child forked, pid 792143 test daemon list test daemon reconfig test daemon stop test daemon signal signal 12 sent to session 'test [792415]' signal 12 sent to session 'test [792415]' test daemon ping test child finished with 0 ---- end ---- daemon operations: Ok [root@five ~]# Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf tests: Add daemon 'signal' command testJiri Olsa1-0/+40
Add a test for the perf daemon 'signal' command. The test sends a signal to configured sessions and verifies the perf data files were generated accordingly. Committer testing: [root@five ~]# perf test daemon 76: daemon operations : Ok [root@five ~]# perf test -v daemon 76: daemon operations : --- start --- test child forked, pid 790017 test daemon list test daemon reconfig test daemon stop test daemon signal signal 12 sent to session 'test [790268]' signal 12 sent to session 'test [790268]' test child finished with 0 ---- end ---- daemon operations: Ok [root@five ~]# Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf tests: Add daemon 'stop' command testJiri Olsa1-0/+54
Add a test for the perf daemon 'stop' command. The test stops the daemon and verifies all the configured sessions are properly terminated. Committer testing: [root@five ~]# time perf test daemon 76: daemon operations : Ok [root@five ~]# time perf test -v daemon 76: daemon operations : --- start --- test child forked, pid 788560 test daemon list test daemon reconfig test daemon stop test child finished with 0 ---- end ---- daemon operations: Ok # Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf tests: Add daemon reconfig testJiri Olsa1-0/+119
Add a test for daemon reconfiguration. The test changes the configuration file and checks that the session is changed properly. Committer testing: [root@five ~]# perf test daemon 76: daemon operations : Ok [root@five ~]# time perf test daemon 76: daemon operations : Ok real 0m6.055s user 0m0.174s sys 0m0.147s [root@five ~]# time perf test -v daemon 76: daemon operations : --- start --- test child forked, pid 786863 test daemon list test daemon reconfig test child finished with 0 ---- end ---- daemon operations: Ok real 0m6.127s user 0m0.222s sys 0m0.165s [root@five ~]# Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf tests: Add daemon 'list' command testJiri Olsa1-0/+184
Add test for basic perf daemon listing via the CSV output mode (-x option). Check that the configured sessions display expected values. Committer testing: [root@five ~]# perf test daemon 76: daemon operations : Ok [root@five ~]# [root@five ~]# perf test -v daemon 76: daemon operations : --- start --- test child forked, pid 785037 test daemon list test child finished with 0 ---- end ---- daemon operations: Ok [root@five ~]# Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add examples to man pageJiri Olsa1-0/+98
Add usage examples to the man page. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add up time for daemon/session listJiri Olsa1-0/+20
Display up time for both daemon and sessions. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Starting the daemon: # perf daemon start Get the details with up time: # perf daemon -v [778315:daemon] base: /opt/perfdata output: /opt/perfdata/output lock: /opt/perfdata/lock up: 15 minutes [778316:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a base: /opt/perfdata/session-cycles output: /opt/perfdata/session-cycles/output control: /opt/perfdata/session-cycles/control ack: /opt/perfdata/session-cycles/ack up: 10 minutes [778317:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a base: /opt/perfdata/session-sched output: /opt/perfdata/session-sched/output control: /opt/perfdata/session-sched/control ack: /opt/perfdata/session-sched/ack up: 2 minutes Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Use control to stop sessionJiri Olsa1-10/+46
Use the 'stop' control command to stop perf record session. If that fails, fall back to current SIGTERM/SIGKILL pair. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add 'ping' commandJiri Olsa2-0/+157
Add a 'ping' command to verify that the 'perf record' session is up and operational. It's used in the following patches via test code to make sure 'perf record' is ready to receive signals. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start Ping all sessions: # perf daemon ping OK cycles OK sched Ping specific session: # perf daemon ping --session sched OK sched Committer notes: Fixed up bug pointed by clang: Buggy: if (!pollfd.revents & POLLIN) Correct code: if (!(pollfd.revents & POLLIN)) clang warning: builtin-daemon.c:560:6: error: logical not is only applied to the left hand side of this bitwise operator [-Werror,-Wlogical-not-parentheses] if (!pollfd.revents & POLLIN) { ^ ~ builtin-daemon.c:560:6: note: add parentheses after the '!' to evaluate the bitwise operator first Also use designated initialized with pollfd, i.e.: struct pollfd pollfd = { .events = POLLIN, }; Instead of: struct pollfd pollfd = { 0, }; To get past: builtin-daemon.c:510:30: error: missing field 'events' initializer [-Werror,-Wmissing-field-initializers] struct pollfd pollfd = { 0, }; ^ 1 error generated. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Set control fifo for sessionJiri Olsa2-1/+28
Setup control fifos for session and add --control option to session arguments. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Starting the daemon: # perf daemon start Use can list control fifos with (control and ack files): # perf daemon -v [776459:daemon] base: /opt/perfdata output: /opt/perfdata/output lock: /opt/perfdata/lock [776460:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a base: /opt/perfdata/session-cycles output: /opt/perfdata/session-cycles/output control: /opt/perfdata/session-cycles/control ack: /opt/perfdata/session-cycles/ack [776461:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a base: /opt/perfdata/session-sched output: /opt/perfdata/session-sched/output control: /opt/perfdata/session-sched/control ack: /opt/perfdata/session-sched/ack Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Allow only one daemon over base directoryJiri Olsa2-1/+77
Add 'lock' file under daemon base and flock it, so only one perf daemon can run on top of it. Each daemon tries to create and lock BASE/lock file, if it's successful we are sure we're the only daemon running over the BASE. Once daemon is finished, file descriptor to lock file is closed and lock is released. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Starting the daemon: # perf daemon start And try once more: # perf daemon start failed: another perf daemon (pid 775594) owns /opt/perfdata will end up with an error, because there's already one running on top of /opt/perfdata. Committer notes: Provide lockf(F_TLOCK) when not available, i.e. transform: lockf(fd, F_TLOCK, 0); into: flock(fd, LOCK_EX | LOCK_NB); Which should be equivalent. Noticed when cross building to some odd Android NDK. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add 'stop' commandJiri Olsa2-0/+35
Add 'perf daemon stop' command to stop daemon process and all running sessions. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start Stop the daemon # perf daemon stop Daemon is not running, nothing to connect to: # perf daemon connect error: Connection refused Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>