aboutsummaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2023-04-04perf list: Use relative path for tracepoint scanNamhyung Kim1-7/+31
Committer notes: Added missing #include <unistd.h> for the close() prototype to fix this on Alma Linux 8: 1 21.54 almalinux:8 : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-16) (GCC) util/print-events.c: In function 'print_tracepoint_events': util/print-events.c:103:4: error: implicit declaration of function 'close'; did you mean 'clone'? [-Werror=implicit-function-declaration] close(evt_fd); ^~~~~ clone Also use the newly added scandirat feature test to check if that function is available, providing a HAVE_SCANDIRAT_SUPPORT conditional warning to the user if it isn't available. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04tools build: Add a feature test for scandirat(), that is not implemented so ↵Arnaldo Carvalho de Melo1-0/+4
far in musl and uclibc We use it just when listing tracepoint events, and for root, so just emit a warning about it to get users to ask the library maintainers to implement it, as suggested in this systemd ticket: https://github.com/systemd/casync/issues/129 Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/ZCwv4z5Dh%[email protected]/ Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf intel-pt: Fix CYC timestamps after standalone CBRAdrian Hunter1-0/+2
After a standalone CBR (not associated with TSC), update the cycles reference timestamp and reset the cycle count, so that CYC timestamps are calculated relative to that point with the new frequency. Fixes: cc33618619cefc6d ("perf tools: Add Intel PT support for decoding CYC packets") Signed-off-by: Adrian Hunter <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf auxtrace: Fix address filter entire kernel sizeAdrian Hunter1-1/+4
kallsyms is not completely in address order. In find_entire_kern_cb(), calculate the kernel end from the maximum address not the last symbol. Example: Before: $ sudo cat /proc/kallsyms | grep ' [twTw] ' | tail -1 ffffffffc00b8bd0 t bpf_prog_6deef7357e7b4530 [bpf] $ sudo cat /proc/kallsyms | grep ' [twTw] ' | sort | tail -1 ffffffffc15e0cc0 t iwl_mvm_exit [iwlmvm] $ perf.d093603a05aa record -v --kcore -e intel_pt// --filter 'filter *' -- uname |& grep filter Address filter: filter 0xffffffff93200000/0x2ceba000 After: $ perf.8fb0f7a01f8e record -v --kcore -e intel_pt// --filter 'filter *' -- uname |& grep filter Address filter: filter 0xffffffff93200000/0x2e3e2000 Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters") Signed-off-by: Adrian Hunter <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/storeRob Herring2-0/+12
Arm SPEv1.3 adds new load/store operation subclasses for Memory Tagging Extension (MTE) and memory operations (MOPS). The memory operations are memcpy and memset. Add support for decoding these new subclasses in the raw decoding. Reviewed-by: Leo Yan <[email protected] Signed-off-by: Rob Herring <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packetMike Leach2-18/+235
When using dynamically assigned CoreSight trace IDs the drivers can output the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet. Update cs-etm decoder to handle this packet by setting the CPU/Trace ID mapping. Reviewed-by: James Clark <[email protected]> Signed-off-by: Mike Leach <[email protected]> Acked-by: Suzuki Poulouse <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Darren Hart <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf cs-etm: Update record event to use new Trace ID protocolMike Leach1-10/+17
Trace IDs are now dynamically allocated. Previously used the static association algorithm that is no longer used. The 'cpu * 2 + seed' was outdated and broken for systems with high core counts (>46). as it did not scale and was broken for larger core counts. Trace ID will now be sent in PERF_RECORD_AUX_OUTPUT_HW_ID record. Legacy ID algorithm renamed and retained for limited backward compatibility use. Reviewed-by: James Clark <[email protected]> Signed-off-by: Mike Leach <[email protected]> Acked-by: Suzuki Poulouse <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Darren Hart <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf cs-etm: Move mapping of Trace ID and cpu into helper functionMike Leach3-35/+74
The information to associate Trace ID and CPU will be changing. Drivers will start outputting this as a hardware ID packet in the data file which if present will be used in preference to the AUXINFO values. To prepare for this we provide a helper functions to do the individual ID mapping, and one to extract the IDs from the completed metadata blocks. Reviewed-by: James Clark <[email protected]> Signed-off-by: Mike Leach <[email protected]> Acked-by: Suzuki Poulouse <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Darren Hart <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf lock contention: Show detail failure reason for BPFNamhyung Kim4-11/+42
It can fail to collect lock stat from BPF for various reasons. For example, I've got a report that sometimes time calculation seems wrong in case of contended spinlocks. I suspect the time delta went negative for some reason. Count them separately and show in the output like below: $ sudo perf lock contention -abE5 sleep 10 contended total wait max wait avg wait type caller 13 785.61 us 79.36 us 60.43 us spinlock remove_wait_queue+0x14 10 469.02 us 87.51 us 46.90 us spinlock prepare_to_wait+0x27 9 289.09 us 69.08 us 32.12 us spinlock finish_wait+0x36 114 251.05 us 8.56 us 2.20 us spinlock try_to_wake_up+0x1f5 132 188.63 us 5.01 us 1.43 us spinlock __wake_up_common_lock+0x62 === output for debug === bad: 1, total: 279 bad rate: 0.36 % histogram of failure reasons task: 1 stack: 0 time: 0 Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf lock contention: Fix debug stat if no contentionNamhyung Kim1-2/+2
It should not divide if the total number is 0. Otherwise it'd show NaN in the bad rate output. Also add a whitespace in the "output for debug" message. $ sudo perf lock contention -abv true Looking at the vmlinux_path (8 entries long) symsrc__init: cannot get elf header. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols contended total wait max wait avg wait type caller === output for debug=== bad: 0, total: 0 bad rate: -nan % <------------------------- (here) histogram of events caused bad sequence acquire: 0 acquired: 0 contended: 0 release: 0 Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events intel: Update ivybridge and ivytownIan Rogers3-2/+18
Update to versions 24 and 23 respectively. Adds the event BR_MISP_EXEC.INDIRECT. Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf bench numa: Fix type of loop iterator in do_work, it should be 'long'Andreas Herrmann1-1/+1
'j' is of type int and start/end are of type 'long'. Thus 'j' might become negative and cause segfault in access_data(). Fix it by using 'long' for 'j' as well. Reviewed-by: James Clark <[email protected]> Signed-off-by: Andreas Herrmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf symbol: Remove unused branch_callstackAdrian Hunter1-1/+0
branch_callstack was added by commit 8b7bad58efb7 ("perf callchain: Support handling complete branch stacks as histograms") but never used. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf top: Add --branch-history optionAdrian Hunter2-0/+21
Add --branch-history option, to act the same as that option does for perf report. Example: $ cat tcallf.c volatile a = 10000, b = 100000, c; __attribute__((noinline)) f2() { c = a / b; } __attribute__((noinline)) f1() { f2(); f2(); } main() { while (1) f1(); } $ gcc -w -g -o tcallf tcallf.c $ ./tcallf & [1] 29409 $ perf top -e cycles:u -t $(pidof tcallf) --stdio --no-children --branch-history PerfTop: 3819 irqs/sec kernel: 0.0% exact: 0.0% lost: 0/0 drop: 0/0 [4000Hz cycles:u], (target_tid: 29409) -------------------------------------------------------------------------------------------------------------------- 49.01% tcallf.c:5 [.] f2 tcallf | |--24.91%--f2 tcallf.c:4 | | | |--17.14%--f1 tcallf.c:11 (cycles:1) | | f1 tcallf.c:11 | | f2 tcallf.c:6 (cycles:3) | | f2 tcallf.c:4 | | f1 tcallf.c:10 (cycles:2) | | f1 tcallf.c:9 | | main tcallf.c:16 (cycles:1) | | main tcallf.c:16 | | main tcallf.c:16 (cycles:1) | | main tcallf.c:16 | | f1 tcallf.c:12 (cycles:1) | | f1 tcallf.c:12 | | f2 tcallf.c:6 (cycles:3) | | f2 tcallf.c:4 | | f1 tcallf.c:11 (cycles:1 iter:1 avg_cycles:12) | | f1 tcallf.c:11 | | f2 tcallf.c:6 (cycles:3 iter:1 avg_cycles:12) | | f2 tcallf.c:4 | | f1 tcallf.c:10 (cycles:2 iter:1 avg_cycles:12) | | | --7.78%--f1 tcallf.c:10 (cycles:2) | f1 tcallf.c:9 | main tcallf.c:16 (cycles:1) | main tcallf.c:16 | main tcallf.c:16 (cycles:1) | main tcallf.c:16 | f1 tcallf.c:12 (cycles:1) | f1 tcallf.c:12 | f2 tcallf.c:6 (cycles:3) | f2 tcallf.c:4 | f1 tcallf.c:11 (cycles:1) | f1 tcallf.c:11 | f2 tcallf.c:6 (cycles:3) | f2 tcallf.c:4 | f1 tcallf.c:10 (cycles:2 iter:1 avg_cycles:12) | f1 tcallf.c:9 | main tcallf.c:16 (cycles:1 iter:1 avg_cycles:12) | main tcallf.c:16 | main tcallf.c:16 (cycles:1 iter:1 avg_cycles:12) ... $ pkill tcallf [1]+ Terminated ./tcallf Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf build: Conditionally define NDEBUGIan Rogers1-0/+1
When a build is done without DEBUG=1 then define NDEBUG. This will compile out asserts and other debug code. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf block-range: Move debug code behind ifndef NDEBUGIan Rogers1-5/+1
Make good on a comment and avoid a unused-but-set-variable warning. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf bench: Avoid NDEBUG warningIan Rogers1-2/+6
With NDEBUG set the asserts are compiled out. This yields "unused-but-set-variable" variables. Move these variables behind NDEBUG to avoid the warning. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events: Update Alderlake for E-Core TMA v2.3Ian Rogers2-104/+148
https://github.com/intel/perfmon/pull/65 Generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py The PR notes state: - E-Core TMA version 2.3. - FP_UOPS changed to FPDIV_Uops - Added BR_MISP breakdown stats - Frontend_Bandwidth/Latency changed to Fetch_Bandwidth/Latency - Load_Store_Bound changed to Memory_Bound - Icache changed to ICache_Misses - ITLB changed to ITLB_Misses - Store_Fwd changed to Store_Fwd_Blk Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf symbol: Add command line support for addr2line pathIan Rogers10-14/+62
Allow addr2line to be set either on the command line or via the perfconfig file. This doesn't currently work with llvm-addr2line as the addr2line code emits two things: 1) the address to decode, 2) a bogus ',' value. The expectation is the bogus value will generate: ?? ??:0 that terminates the addr2line reading. However, the output from llvm-addr2line is a single line with just the input ',' locking up the addr2line reading that is expecting a second line. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andres Freund <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf annotate: Allow objdump to be set in perfconfigIan Rogers2-1/+10
Allow the setting of the objdump command in the perfconfig. Update man page for this new option. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andres Freund <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf annotate: Own objdump_path and disassembler_style stringsIan Rogers7-13/+56
Make struct annotation_options own the strings objdump_path and disassembler_style, freeing them on exit. Add missing strdup for disassembler_style when read from a config file. Committer notes: Converted free(obj->member) to zfree(&obj->member) in annotation_options__exit() Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andres Freund <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf annotate: Add init/exit to annotation_options remove defaultIan Rogers7-18/+37
The annotation__default_options global variable was used to initialize annotation_options. Switch to the init/exit pattern as later changes will give ownership over strings and this will be necessary to avoid memory leaks. Committer note: Fix the GTK2=1 build, hist_entry__gtk_annotate() needs to receive a 'struct annotation_options' pointer. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andres Freund <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf report: Additional config warningsIan Rogers1-0/+5
If the default_sort_order isn't correctly strdup-ed warn and return an error. Debug warn if no option is matched. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andres Freund <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf annotate: Delete session for debug buildsIan Rogers1-10/+6
Use the debug build indicator as the guide to free the session. This implements a behavior described in a comment, which is consequentially removed. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andres Freund <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf tools: Avoid warning in do_realloc_array_as_needed()Adrian Hunter1-1/+2
do_realloc_array_as_needed() used memcpy() of zero size with a NULL pointer. Check the size first to avoid sanitize warning. Discovered using EXTRA_CFLAGS="-fsanitize=undefined -fsanitize=address". Reported-by: kernel test robot <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/oe-lkp/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf symbols: Fix unaligned access in get_x86_64_plt_disp()Adrian Hunter1-1/+4
Use memcpy() to avoid unaligned access. Discovered using EXTRA_CFLAGS="-fsanitize=undefined -fsanitize=address". Fixes: ce4c8e7966f317ef ("perf symbols: Get symbols for .plt.got for x86-64") Reported-by: kernel test robot <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/oe-lkp/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf symbols: Fix use-after-free in get_plt_got_name()Adrian Hunter1-1/+4
Fix use-after-free in get_plt_got_name(). Discovered using EXTRA_CFLAGS="-fsanitize=undefined -fsanitize=address". Fixes: ce4c8e7966f317ef ("perf symbols: Get symbols for .plt.got for x86-64") Reported-by: kernel test robot <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/oe-lkp/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events power9: Remove UTF-8 characters from JSON filesKajol Jain2-3/+3
Commit 3c22ba5243040c13 ("perf vendor events powerpc: Update POWER9 events") added and updated power9 PMU JSON events. However some of the JSON events which are part of other.json and pipeline.json files, contains UTF-8 characters in their brief description. Having UTF-8 character could breaks the perf build on some distros. Fix this issue by removing the UTF-8 characters from other.json and pipeline.json files. Result without the fix: [command]# file -i pmu-events/arch/powerpc/power9/* pmu-events/arch/powerpc/power9/cache.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/floating-point.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/frontend.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/marked.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/memory.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/metrics.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/nest_metrics.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/other.json: application/json; charset=utf-8 pmu-events/arch/powerpc/power9/pipeline.json: application/json; charset=utf-8 pmu-events/arch/powerpc/power9/pmc.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/translation.json: application/json; charset=us-ascii [command]# Result with the fix: [command]# file -i pmu-events/arch/powerpc/power9/* pmu-events/arch/powerpc/power9/cache.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/floating-point.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/frontend.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/marked.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/memory.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/metrics.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/nest_metrics.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/other.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/pipeline.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/pmc.json: application/json; charset=us-ascii pmu-events/arch/powerpc/power9/translation.json: application/json; charset=us-ascii [command]# Fixes: 3c22ba5243040c13 ("perf vendor events powerpc: Update POWER9 events") Reported-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Kajol Jain <[email protected]> Acked-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Disha Goel <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf ftrace: Make system wide the default target for latency subcommandYang Jihong1-2/+4
If no target is specified for 'latency' subcommand, the execution fails because - 1 (invalid value) is written to set_ftrace_pid tracefs file. Make system wide the default target, which is the same as the default behavior of 'trace' subcommand. Before the fix: # perf ftrace latency -T schedule failed to set ftrace pid After the fix: # perf ftrace latency -T schedule ^C# DURATION | COUNT | GRAPH | 0 - 1 us | 0 | | 1 - 2 us | 0 | | 2 - 4 us | 0 | | 4 - 8 us | 2828 | #### | 8 - 16 us | 23953 | ######################################## | 16 - 32 us | 408 | | 32 - 64 us | 318 | | 64 - 128 us | 4 | | 128 - 256 us | 3 | | 256 - 512 us | 0 | | 512 - 1024 us | 1 | | 1 - 2 ms | 4 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 4 | | 256 - 512 ms | 2 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | Fixes: 53be50282269b46c ("perf ftrace: Add 'latency' subcommand") Signed-off-by: Yang Jihong <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf bench syscall: Add fork syscall benchmarkTiezhu Yang3-0/+37
This is a follow up patch for the execve bench which is actually fork + execve, it makes sense to add the fork syscall benchmark to compare the execve part precisely. Some archs have no __NR_fork definition which is used only as a check condition to call test_fork(), let us just define it as -1 to avoid build error. Suggested-by: Namhyung Kim <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf stat: Suppress warning when using cpum_cf events on s390Thomas Richter2-0/+24
Running command perf stat -vv -e cpu_cycles -C0 -- true displays this warning: Attempting to add event pmu 'cpum_cf' with 'cpu_cycles,' that may result in non-fatal errors Make the PMU cpum_cf selectable and avoid this warning. While at it also fix this warning for PMUs pai_crypto and pai_ext. Output before: # ./perf stat -vv -e cpu_cycles -C0 -- true Using CPUID IBM,3931,704,A01,3.7,002f Attempting to add event pmu 'cpum_cf' with 'cpu_cycles,' that may result in non-fatal errors After aliases, add event pmu 'cpum_cf' with 'event,' that may result in non-fatal errors cpu_cycles -> cpum_cf/event=0/ Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 10 size 128 config 0x1001 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 cpu_cycles: 0: 290434 2479172 2479172: cpu_cycles: 290434 2479172 2479172 Performance counter stats for 'CPU(s) 0': 290,434 cpu_cycles 0.002465617 seconds time elapsed # Now the warning "Attempting to add event pmu 'cpum_cf' ..." does not show up anymore. Output after: # ./perf stat -vv -e cpu_cycles -C0 -- true Using CPUID IBM,3931,704,A01,3.7,002f After aliases, add event pmu 'cpum_cf' with 'event,' that may result in non-fatal errors cpu_cycles -> cpum_cf/event=0/ Control descriptor is not initialized .... Performance counter stats for 'CPU(s) 0': 357,023 cpu_cycles 0.002454995 seconds time elapsed # Signed-off-by: Thomas Richter <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf tests record_offcpu.sh: Fix redirection of stderr to stdinPatrice Duroux1-1/+1
It's not 2&>1, the correct is 2>&1 Fixes: ade1d0307b2fb3d9 ("perf offcpu: Update offcpu test for child process") Signed-off-by: Patrice Duroux <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events intel: Update metrics to detect pmem at runtimeIan Rogers3-15/+15
By detecting whether nvdimms are installed at runtime the number of events can be reduced if it isn't. These changes come from this PR: https://github.com/intel/perfmon/pull/63 Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Dan Williams <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf metrics: Add has_pmem literalIan Rogers1-0/+19
Add literal so that if nvdimms aren't installed we can record fewer events. The file detection mechanism was suggested by Dan Williams <[email protected]> in: https://lore.kernel.org/linux-perf-users/[email protected]/ Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Dan Williams <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events intel: Sandybridge v19 eventsIan Rogers2-1/+9
Adds BR_MISP_EXEC.INDIRECT event. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Dan Williams <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events intel: Jaketown v23 eventsIan Rogers2-1/+9
Adds BR_MISP_EXEC.INDIRECT event. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Dan Williams <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events intel: Haswellx v27 eventsIan Rogers5-13/+21
Updates descriptions and encodings. Adds BR_MISP_EXEC.INDIRECT events. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Dan Williams <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events intel: Haswell v33 eventsIan Rogers4-39/+47
Updates descriptions and encodings. Adds BR_MISP_EXEC.INDIRECT events. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Dan Williams <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events intel: Broadwellx v20 eventsIan Rogers9-469/+403
Updates descriptions and encodings. Adds BR_MISP_EXEC.INDIRECT events. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Dan Williams <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events intel: Broadwellde v9 eventsIan Rogers10-183/+495
Updates descriptions and encodings. Adds BR_MISP_EXEC.INDIRECT events. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Dan Williams <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events intel: Broadwell v27 eventsIan Rogers7-290/+305
Description updates and formatting changes. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Dan Williams <[email protected]> Cc: Edward Baker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf lock contention: Fix msan issue in lock_contention_read()Namhyung Kim1-1/+1
I got a report of a msan failure like below: $ sudo perf lock con -ab -- sleep 1 ... ==224416==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x5651160d6c96 in lock_contention_read util/bpf_lock_contention.c:290:8 #1 0x565115f90870 in __cmd_contention builtin-lock.c:1919:3 #2 0x565115f90870 in cmd_lock builtin-lock.c:2385:8 #3 0x565115f03a83 in run_builtin perf.c:330:11 #4 0x565115f03756 in handle_internal_command perf.c:384:8 #5 0x565115f02d53 in run_argv perf.c:428:2 #6 0x565115f02d53 in main perf.c:562:3 #7 0x7f43553bc632 in __libc_start_main #8 0x565115e865a9 in _start It was because the 'key' variable is not initialized. Actually it'd be set by bpf_map_get_next_key() but msan didn't seem to understand it. Let's make msan happy by initializing the variable. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf vendor events s390: Remove UTF-8 characters from JSON fileThomas Richter1-5/+5
Commit 7f76b31130680fb3 ("perf list: Add IBM z16 event description for s390") contains the verbal description for z16 extended counter set. However some entries of the public description contain UTF-8 characters which breaks the build on some distros. Fix this and remove the UTF-8 characters. Fixes: 7f76b31130680fb3 ("perf list: Add IBM z16 event description for s390") Reported-by: Arnaldo Carvalho de Melo <[email protected]> Suggested-by: Heiko Carstens <[email protected]> Signed-off-by: Thomas Richter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: https://lore.kernel.org/r/ZBwkl77/I31AQk12@osiris Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf hist: Improve srcfile sort key performance (really)Namhyung Kim1-6/+1
The earlier commit f0cdde28fecc0d7f ("perf hist: Improve srcfile sort key performance") updated the srcfile logic but missed to change the ->cmp() callback which is called for every sample. It should use the same logic like in the srcline to speed up the processing because it'd return the same information repeatedly for the same address. The real processing will be done in sort__srcfile_collapse(). Fixes: f0cdde28fecc0d7f ("perf hist: Improve srcfile sort key performance") Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf test: Fix wrong size expectation for 'Setup struct perf_event_attr'Thomas Richter3-3/+3
The test case "perf test 'Setup struct perf_event_attr'" is failing. On s390 this output is observed: # ./perf test -Fvvvv 17 17: Setup struct perf_event_attr : --- start --- running './tests/attr/test-stat-C0' Using CPUID IBM,8561,703,T01,3.6,002f ..... Event event:base-stat fd = 1 group_fd = -1 flags = 0|8 cpu = * type = 0 size = 128 <<<--- wrong, specified in file base-stat config = 0 sample_period = 0 sample_type = 65536 ... 'PERF_TEST_ATTR=/tmp/tmpgw574wvg ./perf stat -o \ /tmp/tmpgw574wvg/perf.data -e cycles -C 0 kill >/dev/null \ 2>&1 ret '1', expected '1' loading result events Event event-0-0-4 fd = 4 group_fd = -1 cpu = 0 pid = -1 flags = 8 type = 0 size = 136 <<<--- actual size used in system call ..... compare matching [event-0-0-4] to [event:base-stat] [cpu] 0 * [flags] 8 0|8 [type] 0 0 [size] 136 128 ->FAIL match: [event-0-0-4] matches [] expected size=136, got 128 FAILED './tests/attr/test-stat-C0' - match failure This mismatch is caused by commit 09519ec3b19e ("perf: Add perf_event_attr::config3") which enlarges the structure perf_event_attr by 8 bytes. Fix this by adjusting the expected value of size. Output after: # ./perf test -Fvvvv 17 17: Setup struct perf_event_attr : --- start --- running './tests/attr/test-stat-C0' Using CPUID IBM,8561,703,T01,3.6,002f ... matched compare matching [event-0-0-4] to [event:base-stat] [cpu] 0 * [flags] 8 0|8 [type] 0 0 [size] 136 136 .... ->OK match: [event-0-0-4] matches ['event:base-stat'] matched Fixes: 09519ec3b19e4144 ("perf: Add perf_event_attr::config3") Signed-off-by: Thomas Richter <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf build: Add warning for when vmlinux.h generation failsIan Rogers1-1/+5
The warning advises on the NO_BPF_SKEL=1 option. Suggested-by: Stephen Rothwell <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf report: Append inlines to non-DWARF callchainsArtem Savkov1-0/+5
Append information about inlined functions to FP and LBR callchains from DWARF debuginfo when available. Do so by calling append_inlines() from add_callchain_ip(). Testing it: Frame-pointer mode recorded with 'perf record --call-graph=fp --freq=max -- ./a.out' #include <stdio.h> #include <stdint.h> static __attribute__((noinline)) uint32_t func5(uint32_t i) { return i + 10; } static uint32_t func4(uint32_t i) { return func5(i + 5); } static inline uint32_t func3(uint32_t i) { return func4(i + 4); } static __attribute__((noinline)) uint32_t func2(uint32_t i) { return func3(i + 3); } static uint32_t func1(uint32_t i) { return func2(i + 2); } __attribute__((noinline)) uint64_t entry(void) { uint64_t ret = 0; uint32_t i = 0; for (i = 0; i < 1000000; i++) { ret += func1(i); ret -= func2(i); ret += func3(i); ret += func4(i); ret -= func5(i); } return ret; } int main(int argc, char **argv) { printf("%s\n", __func__); return entry(); } ====== Here is the output I get with '--call-graph callee --no-children' ====== # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 250 of event 'cycles:u' # Event count (approx.): 26819859 # # Overhead Command Shared Object Symbol # ........ ....... .................... ..................................... # 43.58% a.out a.out [.] func5 | |--28.93%--entry | main | __libc_start_call_main | --14.65%--func4 (inlined) | |--10.45%--entry | main | __libc_start_call_main | --4.20%--func3 (inlined) entry main __libc_start_call_main 38.80% a.out a.out [.] entry | |--23.27%--func4 (inlined) | | | |--20.28%--func3 (inlined) | | func2 | | main | | __libc_start_call_main | | | --2.99%--entry | main | __libc_start_call_main | |--8.17%--func5 | main | __libc_start_call_main | |--3.89%--func1 (inlined) | entry | main | __libc_start_call_main | --3.48%--entry main __libc_start_call_main 13.07% a.out a.out [.] func2 | ---func5 main __libc_start_call_main 1.54% a.out [unknown] [k] 0xffffffff81e011b7 1.16% a.out [unknown] [k] 0xffffffff81e00193 | --0.57%--__mmap64 (inlined) __mmap64 (inlined) 0.34% a.out ld-linux-x86-64.so.2 [.] __tunable_get_val 0.34% a.out ld-linux-x86-64.so.2 [.] strcmp 0.32% a.out libc.so.6 [.] strchr 0.31% a.out ld-linux-x86-64.so.2 [.] _dl_relocate_object 0.22% a.out ld-linux-x86-64.so.2 [.] _dl_init_paths 0.18% a.out ld-linux-x86-64.so.2 [.] get_common_cache_info.constprop.0 0.14% a.out ld-linux-x86-64.so.2 [.] __GI___tunables_init # # (Tip: Show individual samples with: perf script) # ====== It does not seem to be out of order, or at least it is consistent with what I get with dwarf unwinders. Committer notes: Adrian Hunter pointed out that this breaks --branch-history, so don't do it for branches, see the second Link below. Suggested-by: Andrii Nakryiko <[email protected]> Signed-off-by: <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Milian Wolff <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-03-23sh: remove sh5/sh64 last fragmentsRandy Dunlap1-2/+0
A previous patch removed most of the sh5 (sh64) support from the kernel tree. Now remove the last stragglers. Fixes: 37744feebc08 ("sh: remove sh5 support") Signed-off-by: Randy Dunlap <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: John Paul Adrian Glaubitz <[email protected]> Cc: [email protected] Acked-by: John Paul Adrian Glaubitz <[email protected]> Reviewed-by: John Paul Adrian Glaubitz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
2023-03-21perf tools: Add support for perf_event_attr::config3Rob Herring6-1/+24
perf_event_attr has gained a new field, config3, so add support for it extending the existing configN support. Signed-off-by: Rob Herring <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-03-21perf vendor events arm64: Add N1 metricsJames Clark23-2/+685
Generated from the telemetry solution repo[1] with this command: ./generate.py <linux-repo>/tools/perf/ --telemetry-files \ ../../data/pmu/cpu/neoverse/neoverse-n1.json Since this data source now includes the SPE events for N1, it has diverged from A76 which means the folder has to be split. The new data also uses more fine grained grouping, but this will be consistent for all future products. Long PublicDescriptions are now included even for common events because this can include product specific details. For non verbose mode the common BriefDescriptions remain the same. [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution Signed-off-by: James Clark <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>