aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2022-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-3/+7
include/linux/bpf.h 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record") f71b2f64177a ("bpf: Refactor map->off_arr handling") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-16perf cpumap: Tidy libperf includesIan Rogers4-4/+3
Use public API when possible, don't include internal API in header files in evsel.h. Fix any related breakages. Committer note: There was one missing case, when building for arm64: arch/arm64/util/pmu.c: In function 'pmu_events_table__find': arch/arm64/util/pmu.c:18:30: error: invalid use of undefined type 'struct perf_cpu_map' 18 | if (pmu->cpus->nr != cpu__max_cpu().cpu) | ^~ Fix it by adding one more exception, including <internal/cpumap.h> Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf thread_map: Reduce exposure of libperf internal APIIan Rogers6-7/+7
Remove unnecessary include of internal threadmap.h and refcount.h in thread_map.h. Switch to using public APIs when possible or including the internal header file in the C file. Fix a transitive dependency in openat-syscall.c broken by the clean up. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf expr: Tidy hashmap dependencyIan Rogers7-18/+4
hashmap.h comes from libbpf but isn't installed with its headers. Always use the header file of the code in util. Change the hashmap.h dependency in expr.h to a forward declaration, add the necessary header file includes in the C files. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf build: Install libsymbol locally when buildingIan Rogers1-5/+0
The perf build currently has a '-Itools/lib' on the CC command line. This causes issues as the libapi, libsubcmd, libtraceevent, libbpf and libsymbol headers are all found via this path, making it impossible to override include behavior. Change the libsymbol build mirroring the libbpf, libsubcmd, libapi, libperf and libtraceevent build, so that it is installed in a directory along with its headers. A later change will modify the include behavior. Don't build kallsyms.o as part of util as this will lead to duplicate definitions. Add kallsym's directory to the MANIFEST rather than individual files, so that the Build and Makefile are added to a source tar ball. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Add print_aggr_cgroup() for --for-each-cgroup and --topdownNamhyung Kim1-1/+40
Normally, --for-each-cgroup only works with AGGR_GLOBAL. However the --topdown on some cpu (e.g. Intel Skylake) converts it to the AGGR_CORE internally. To support those machines, add print_aggr_cgroup and handle the events like in print_cgroup_events(). $ perf stat -a --for-each-cgroup system.slice,user.slice --topdown sleep 1 nmi_watchdog enabled with topdown. May give wrong results. Disable with echo 0 > /proc/sys/kernel/nmi_watchdog Performance counter stats for 'system wide': retiring bad speculation frontend bound backend bound S0-D0-C0 2 system.slice 49.0% -46.6% 31.4% S0-D0-C1 2 system.slice 55.5% 8.0% 45.5% -9.0% S0-D0-C2 2 system.slice 87.8% 22.1% 30.3% -40.3% S0-D0-C3 2 system.slice 53.3% -11.9% 45.2% 13.4% S0-D0-C0 2 user.slice 123.5% 4.0% 48.5% -75.9% S0-D0-C1 2 user.slice 19.9% 6.5% 89.9% -16.3% S0-D0-C2 2 user.slice 29.9% 7.9% 71.3% -9.1 S0-D0-C3 2 user.slice 28.0% 7.2% 43.3% 21.5% 1.004136937 seconds time elapsed Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Support --for-each-cgroup and --metric-onlyNamhyung Kim1-11/+47
When we have events for each cgroup, the metric should be printed for each cgroup separately. Add print_cgroup_counter() to handle that situation properly. Also change print_metric_headers() not to print duplicate headers by checking cgroups. $ perf stat -a --for-each-cgroup system.slice,user.slice --metric-only sleep 1 Performance counter stats for 'system wide': GHz insn per cycle branch-misses of all branches system.slice 3.792 0.61 3.24% user.slice 3.661 2.32 0.37% 1.016111516 seconds time elapsed Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Factor out print_metric_{begin,end}()Namhyung Kim1-22/+34
For the metric-only case, add new functions to handle the start and the end of each metric display. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Factor out prefix displayNamhyung Kim1-28/+15
The prefix is needed for interval mode to print timestamp at the beginning of each line. But the it's tricky for the metric only mode since it doesn't print every evsel and combines the metrics into a single line. So it needed to pass 'first' argument to print_counter_aggrdata() to determine if the current event is being printed at first. This makes the code hard to read. Let's move the logic out of the function and do it in the outer print loop. This would enable further cleanups later. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Move condition to print_footer()Namhyung Kim1-2/+4
Likewise, I think it'd better to have the control inside the function, and keep the higher level function clearer. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Rework header displayNamhyung Kim1-79/+106
There are print_header() and print_interval() to print header lines before actual counter values. Also print_metric_headers() needs to be called for the metric-only case. Let's move all these logics to a single place including num_print_iv to refresh the headers for interval mode. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Remove impossible conditionNamhyung Kim1-3/+0
The print would run only if metric_only is not set, but it's already in a block that says it's in metric_only case. And there's no place to change the setting. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Cleanup interval print alignmentNamhyung Kim1-74/+91
Instead of using magic values, define symbolic constants and use them. Also add aggr_header_std[] array to simplify aggr_mode handling. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Factor out prepare_interval()Namhyung Kim1-15/+24
This logic does not print the time directly, but it just puts the timestamp in the buffer as a prefix. To reduce the confusion, factor out the code into a separate function. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Split print_metric_headers() functionNamhyung Kim1-15/+37
The print_metric_headers() shows metric headers a little bit for each mode. Split it out to make the code clearer. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Align cgroup namesNamhyung Kim1-1/+1
We don't know how long cgroup name is, but at least we can align short ones like below. $ perf stat -a --for-each-cgroup system.slice,user.slice true Performance counter stats for 'system wide': 0.13 msec cpu-clock system.slice # 0.010 CPUs utilized 4 context-switches system.slice # 31.989 K/sec 1 cpu-migrations system.slice # 7.997 K/sec 0 page-faults system.slice # 0.000 /sec 450,673 cycles system.slice # 3.604 GHz (92.41%) 161,216 instructions system.slice # 0.36 insn per cycle (92.41%) 32,678 branches system.slice # 261.332 M/sec (92.41%) 2,628 branch-misses system.slice # 8.04% of all branches (92.41%) 14.29 msec cpu-clock user.slice # 1.163 CPUs utilized 35 context-switches user.slice # 2.449 K/sec 12 cpu-migrations user.slice # 839.691 /sec 57 page-faults user.slice # 3.989 K/sec 49,683,026 cycles user.slice # 3.477 GHz (99.38%) 110,790,266 instructions user.slice # 2.23 insn per cycle (99.38%) 24,552,255 branches user.slice # 1.718 G/sec (99.38%) 127,779 branch-misses user.slice # 0.52% of all branches (99.38%) 0.012289431 seconds time elapsed Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Add before_metric argumentNamhyung Kim1-40/+42
Unfortunately, event running time, percentage and noise data are printed in different positions in normal output than CSV/JSON. I think it's better to put such details in where it actually prints. So add before_metric argument to print_noise() and print_running() and call them twice before and after the metric. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Handle bad events in abs_printout()Namhyung Kim1-41/+27
In the printout() function, it checks if the event is bad (i.e. not counted or not supported) and print the result. But it does the same what abs_printout() is doing. So add an argument to indicate the value is ok or not and use the same function in both cases. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Factor out print_counter_value() functionNamhyung Kim1-28/+53
And split it for each output mode like others. I believe it makes the code simpler and more intuitive. Now abs_printout() becomes just to call sub-functions. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Split aggr_printout() functionNamhyung Kim1-99/+121
The aggr_printout() function is to print aggr_id and count (nr). Split it for each output mode to simplify the code. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Split print_cgroup() functionNamhyung Kim1-2/+19
Likewise, split print_cgroup() for each output mode. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Split print_noise_pct() functionNamhyung Kim1-4/+23
Likewise, split print_noise_pct() for each output mode. Although it's a tiny function, more logic will be added soon so it'd be better split it and treat it in the same way. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf stat: Split print_running() functionNamhyung Kim1-10/+27
To make the code more obvious and hopefully simpler, factor out the code for each output mode - stdio, CSV, JSON. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-15perf pmu: Restructure print_pmu_events() to avoid memory allocationsIan Rogers1-98/+110
Previously print_pmu_events() would compute the values to be printed, place them in struct sevent, sort them and then print them. Modify the code so that struct sevent holds just the PMU and event, sort these and then in the main print loop calculate aliases for names, etc. This avoids memory allocations for copied values as they are computed then printed. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-15perf list: Simplify symbol event printingIan Rogers1-58/+21
The current code computes an array of symbol names then sorts and prints them. Use a strlist to create a list of names that is sorted and then print it. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-15perf list: Simplify cache event printingIan Rogers1-103/+27
The current code computes an array of cache names then sorts and prints them. Use a strlist to create a list of names that is sorted. Keep the hybrid names, it is unclear how to generalize it, but drop the computation of evt_pmus that is never used. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kang Minchul <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] [ Fixed up clash with cf9f67b36303de65 ("perf print-events: Remove redundant comparison with zero")] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-15perf list: Generalize limiting to a PMU nameIan Rogers2-4/+3
Deprecate the --cputype option and add a --unit option where '--unit cpu_atom' behaves like '--cputype atom'. The --unit option can be used with arbitrary PMUs, for example: ``` $ perf list --unit msr pmu List of pre-defined events (to be used in -e or -M): msr/aperf/ [Kernel PMU event] msr/cpu_thermal_margin/ [Kernel PMU event] msr/mperf/ [Kernel PMU event] msr/pperf/ [Kernel PMU event] msr/smi/ [Kernel PMU event] msr/tsc/ [Kernel PMU event] ``` Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-15perf tracepoint: Sort events in iteratorIan Rogers1-71/+37
In print_tracepoint_events() use tracing_events__scandir_alphasort() and scandir alphasort so that the subsystem and events are sorted and don't need a secondary qsort. Locally this results in the following change: ... ext4:ext4_zero_range [Tracepoint event] - fib6:fib6_table_lookup [Tracepoint event] fib:fib_table_lookup [Tracepoint event] + fib6:fib6_table_lookup [Tracepoint event] filelock:break_lease_block [Tracepoint event] ... ie fib6 now is after fib and not before it. This is more consistent with how numbers are more generally sorted, such as: ... syscalls:sys_enter_renameat [Tracepoint event] syscalls:sys_enter_renameat2 [Tracepoint event] ... and so an improvement over the qsort approach. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-15perf pmu: Add data structure documentationIan Rogers2-6/+132
Add documentation to 'struct perf_pmu' and the associated structs of 'perf_pmu_alias' and 'perf_pmu_format'. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-15perf pmu: Remove mostly unused 'struct perf_pmu' 'is_hybrid' memberIan Rogers5-16/+6
Replace usage with perf_pmu__is_hybrid(). Suggested-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Add missing separator in the CSV headerNamhyung Kim1-2/+2
It should have a comma after 'cpus' for socket and die aggregation mode. The output of the following command shows the issue. $ sudo perf stat -a --per-socket -x, --metric-only -I1 true Before: +--- here V time,socket,cpusGhz,insn per cycle,branch-misses of all branches, 0.000908461,S0,8,0.950,1.65,1.21, After: time,socket,cpus,GHz,insn per cycle,branch-misses of all branches, 0.000683094,S0,8,0.593,2.00,0.60, Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Fix summary output in CSV with --metric-onlyNamhyung Kim1-3/+8
It should not print "summary" for each event when --metric-only is set. Before: $ sudo perf stat -a --per-socket --summary -x, --metric-only true time,socket,cpusGhz,insn per cycle,branch-misses of all branches, 0.000709079,S0,8,0.893,2.40,0.45, S0,8, summary, summary, summary, summary, summary,0.893, summary,2.40, summary, summary,0.45, After: $ sudo perf stat -a --per-socket --summary -x, --metric-only true time,socket,cpusGHz,insn per cycle,branch-misses of all branches, 0.000882297,S0,8,0.598,1.64,0.64, summary,S0,8,0.598,1.64,0.64, Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo2-3/+7
To pick up fixes that went thru perf/urgent. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Consolidate condition to print metricsNamhyung Kim1-3/+1
The pm variable holds an appropriate function to print metrics for CSV anf JSON already. So we can combine the if statement to simplify the code a little bit. This also matches to the above condition for non-CSV and non-JSON case. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Fix condition in print_interval()Namhyung Kim1-2/+2
The num_print_interval and config->interval_clear should be checked together like other places like later in the function. Otherwise, the --interval-clear option could print the headers for the CSV or JSON output unnecessarily. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Add header for interval in JSON outputNamhyung Kim1-0/+4
It missed to print a matching header line for intervals. Before: # perf stat -a -e cycles,instructions --metric-only -j -I 500 {"unit" : "insn per cycle"} {"interval" : 0.500544283}{"metric-value" : "1.96"} ^C After: # perf stat -a -e cycles,instructions --metric-only -j -I 500 {"unit" : "sec"}{"unit" : "insn per cycle"} {"interval" : 0.500515681}{"metric-value" : "2.31"} ^C Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Do not indent headers for JSONNamhyung Kim1-1/+1
Currently --metric-only with --json indents header lines. This is not needed for JSON. $ perf stat -aA --metric-only -j true {"unit" : "GHz"}{"unit" : "insn per cycle"}{"unit" : "branch-misses of all branches"} {"cpu" : "0", {"metric-value" : "0.101"}{"metric-value" : "0.86"}{"metric-value" : "1.91"} {"cpu" : "1", {"metric-value" : "0.102"}{"metric-value" : "0.87"}{"metric-value" : "2.02"} {"cpu" : "2", {"metric-value" : "0.085"}{"metric-value" : "1.02"}{"metric-value" : "1.69"} ... Note that the other lines are broken JSON, but it will be handled later. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Fix --metric-only --json outputNamhyung Kim1-19/+3
Currently it prints all metric headers for JSON output. But actually it skips some metrics with valid_only_metric(). So the output looks like: $ perf stat --metric-only --json true {"unit" : "CPUs utilized", "unit" : "/sec", "unit" : "/sec", "unit" : "/sec", "unit" : "GHz", "unit" : "insn per cycle", "unit" : "/sec", "unit" : "branch-misses of all branches"} {"metric-value" : "3.861"}{"metric-value" : "0.79"}{"metric-value" : "3.04"} As you can see there are 8 units in the header but only 3 metric-values are there. It should skip the unused headers as well. Also each unit should be printed as a separate object like metric values. With this patch: $ perf stat --metric-only --json true {"unit" : "GHz"}{"unit" : "insn per cycle"}{"unit" : "branch-misses of all branches"} {"metric-value" : "4.166"}{"metric-value" : "0.73"}{"metric-value" : "2.96"} Fixes: df936cadfb58ba93 ("perf stat: Add JSON output option") Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Claire Jensen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Move common code in print_metric_headers()Namhyung Kim1-5/+8
The struct perf_stat_output_ctx is set in a loop with the same values. Move the code out of the loop and keep the loop minimal. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Clear screen only if output file is a ttyNamhyung Kim1-1/+1
The --interval-clear option makes perf stat to clear the terminal at each interval. But it doesn't need to clear the screen when it saves to a file. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-14perf stat: Increase metric length to align outputsNamhyung Kim1-1/+1
When perf stat is called with very detailed events, the output doesn't align well like below: $ sudo perf stat -a -ddd sleep 1 Performance counter stats for 'system wide': 8,020.23 msec cpu-clock # 7.997 CPUs utilized 3,970 context-switches # 494.998 /sec 169 cpu-migrations # 21.072 /sec 586 page-faults # 73.065 /sec 649,568,060 cycles # 0.081 GHz (30.42%) 304,044,345 instructions # 0.47 insn per cycle (38.40%) 60,313,022 branches # 7.520 M/sec (38.89%) 2,766,919 branch-misses # 4.59% of all branches (39.26%) 74,422,951 L1-dcache-loads # 9.279 M/sec (39.39%) 8,025,568 L1-dcache-load-misses # 10.78% of all L1-dcache accesses (39.22%) 3,314,995 LLC-loads # 413.329 K/sec (30.83%) 1,225,619 LLC-load-misses # 36.97% of all LL-cache accesses (30.45%) <not supported> L1-icache-loads 20,420,493 L1-icache-load-misses # 0.00% of all L1-icache accesses (30.29%) 58,017,947 dTLB-loads # 7.234 M/sec (30.37%) 704,677 dTLB-load-misses # 1.21% of all dTLB cache accesses (30.27%) 234,225 iTLB-loads # 29.204 K/sec (30.29%) 417,166 iTLB-load-misses # 178.10% of all iTLB cache accesses (30.32%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 1.002947355 seconds time elapsed Increase the METRIC_LEN by 3 so that it can align properly. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-11libbpf: Hashmap.h update to fix build issues using LLVM14Eduard Zingerman1-1/+2
A fix for the LLVM compilation error while building bpftool. Replaces the expression: _Static_assert((p) == NULL || ...) by expression: _Static_assert((__builtin_constant_p((p)) ? (p) == NULL : 0) || ...) When "p" is not a constant the former is not considered to be a constant expression by LLVM 14. The error was introduced in the following patch-set: [1]. The error was reported here: [2]. [1] https://lore.kernel.org/bpf/[email protected]/ [2] https://lore.kernel.org/all/[email protected]/ Reported-by: kernel test robot <[email protected]> Fixes: c302378bc157 ("libbpf: Hashmap interface update to allow both long and void* keys/values") Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-11-10perf print-events: Remove redundant comparison with zeroKang Minchul1-4/+2
Since variable npmus is unsigned int, comparing with 0 is unnecessary. Signed-off-by: Kang Minchul <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-10perf data: Add tracepoint fields when converting to JSONDmitrii Dolgov1-0/+20
When converting recorded data into JSON format, perf data omits probe variables. Add them to the output in the format "field name": "field value" using tep_print_field: $ perf data convert --to-json output.json // output.json { "linux-perf-json-version": 1, "headers": { ... }, "samples": [ { "timestamp": 29182079082999, "pid": 309194, [...] "__probe_ip": "0x93ee35", "query_string_string": "select 2;", "nxids": "0" } ] } Signed-off-by: Dmitrii Dolgov <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-09libbpf: Hashmap interface update to allow both long and void* keys/valuesEduard Zingerman8-82/+96
An update for libbpf's hashmap interface from void* -> void* to a polymorphic one, allowing both long and void* keys and values. This simplifies many use cases in libbpf as hashmaps there are mostly integer to integer. Perf copies hashmap implementation from libbpf and has to be updated as well. Changes to libbpf, selftests/bpf and perf are packed as a single commit to avoid compilation issues with any future bisect. Polymorphic interface is acheived by hiding hashmap interface functions behind auxiliary macros that take care of necessary type casts, for example: #define hashmap_cast_ptr(p) \ ({ \ _Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\ #p " pointee should be a long-sized integer or a pointer"); \ (long *)(p); \ }) bool hashmap_find(const struct hashmap *map, long key, long *value); #define hashmap__find(map, key, value) \ hashmap_find((map), (long)(key), hashmap_cast_ptr(value)) - hashmap__find macro casts key and value parameters to long and long* respectively - hashmap_cast_ptr ensures that value pointer points to a memory of appropriate size. This hack was suggested by Andrii Nakryiko in [1]. This is a follow up for [2]. [1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/ [2] https://lore.kernel.org/bpf/[email protected]/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607d Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-11-08perf probe: Fix to get the DW_AT_decl_file and DW_AT_call_file as unsinged dataMasami Hiramatsu (Google)1-17/+4
DWARF version 5 standard Sec 2.14 says that Any debugging information entry representing the declaration of an object, module, subprogram or type may have DW_AT_decl_file, DW_AT_decl_line and DW_AT_decl_column attributes, each of whose value is an unsigned integer constant. So it should be an unsigned integer data. Also, even though the standard doesn't clearly say the DW_AT_call_file is signed or unsigned, the elfutils (eu-readelf) interprets it as unsigned integer data and it is natural to handle it as unsigned integer data as same as DW_AT_decl_file. This changes the DW_AT_call_file as unsigned integer data too. Fixes: 3f4460a28fb2f73d ("perf probe: Filter out redundant inline-instances") Signed-off-by: Masami Hiramatsu <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: Steven Rostedt (VMware) <[email protected]> Link: https://lore.kernel.org/r/166761727445.480106.3738447577082071942.stgit@devnote3 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-08perf test: Fix skipping branch stack sampling testJames Clark1-1/+3
Commit f4a2aade6809c657 ("perf tests powerpc: Fix branch stack sampling test to include sanity check for branch filter") added a skip if certain branch options aren't available. But the change added both -b (--branch-any) and --branch-filter options at the same time, which will always result in a failure on any platform because the arguments can't be used together. Fix this by removing -b (--branch-any) and leaving --branch-filter which already specifies 'any'. Also add warning messages to the test and perf tool. Output on x86 before this fix: $ sudo ./perf test branch 108: Check branch stack sampling : Skip After: $ sudo ./perf test branch 108: Check branch stack sampling : Ok Fixes: f4a2aade6809c657 ("perf tests powerpc: Fix branch stack sampling test to include sanity check for branch filter") Signed-off-by: James Clark <[email protected]> Tested-by: Athira Jajeev <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-08perf stat: Fix printing os->prefix in CSV metrics outputAthira Rajeev1-1/+1
'perf stat' with CSV output option prints an extra empty string as first field in metrics output line. Sample output below: # ./perf stat -x, --per-socket -a -C 1 ls S0,1,1.78,msec,cpu-clock,1785146,100.00,0.973,CPUs utilized S0,1,26,,context-switches,1781750,100.00,0.015,M/sec S0,1,1,,cpu-migrations,1780526,100.00,0.561,K/sec S0,1,1,,page-faults,1779060,100.00,0.561,K/sec S0,1,875807,,cycles,1769826,100.00,0.491,GHz S0,1,85281,,stalled-cycles-frontend,1767512,100.00,9.74,frontend cycles idle S0,1,576839,,stalled-cycles-backend,1766260,100.00,65.86,backend cycles idle S0,1,288430,,instructions,1762246,100.00,0.33,insn per cycle ====> ,S0,1,,,,,,,2.00,stalled cycles per insn The above command line uses field separator as "," via "-x," option and per-socket option displays socket value as first field. But here the last line for "stalled cycles per insn" has "," in the beginning. Sample output using interval mode: # ./perf stat -I 1000 -x, --per-socket -a -C 1 ls 0.001813453,S0,1,1.87,msec,cpu-clock,1872052,100.00,0.002,CPUs utilized 0.001813453,S0,1,2,,context-switches,1868028,100.00,1.070,K/sec ------ 0.001813453,S0,1,85379,,instructions,1856754,100.00,0.32,insn per cycle ====> 0.001813453,,S0,1,,,,,,,1.34,stalled cycles per insn Above result also has an extra CSV separator after the timestamp. Patch addresses extra field separator in the beginning of the metric output line. The counter stats are displayed by function "perf_stat__print_shadow_stats" in code "util/stat-shadow.c". While printing the stats info for "stalled cycles per insn", function "new_line_csv" is used as new_line callback. The new_line_csv function has check for "os->prefix" and if prefix is not null, it will be printed along with cvs separator. Snippet from "new_line_csv": if (os->prefix) fprintf(os->fh, "%s%s", os->prefix, config->csv_sep); Here os->prefix gets printed followed by "," which is the cvs separator. The os->prefix is used in interval mode option ( -I ), to print time stamp on every new line. But prefix is already set to contain CSV separator when used in interval mode for CSV option. Reference: Function "static void print_interval" Snippet: sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep); Also if prefix is not assigned (if not used with -I option), it gets set to empty string. Reference: function printout() in util/stat-display.c Snippet: .prefix = prefix ? prefix : "", Since prefix already set to contain cvs_sep in interval option, patch removes printing config->csv_sep in new_line_csv function to avoid printing extra field. After the patch: # ./perf stat -x, --per-socket -a -C 1 ls S0,1,2.04,msec,cpu-clock,2045202,100.00,1.013,CPUs utilized S0,1,2,,context-switches,2041444,100.00,979.289,/sec S0,1,0,,cpu-migrations,2040820,100.00,0.000,/sec S0,1,2,,page-faults,2040288,100.00,979.289,/sec S0,1,254589,,cycles,2036066,100.00,0.125,GHz S0,1,82481,,stalled-cycles-frontend,2032420,100.00,32.40,frontend cycles idle S0,1,113170,,stalled-cycles-backend,2031722,100.00,44.45,backend cycles idle S0,1,88766,,instructions,2030942,100.00,0.35,insn per cycle S0,1,,,,,,,1.27,stalled cycles per insn Fixes: 92a61f6412d3a09d ("perf stat: Implement CSV metrics output") Reported-by: Disha Goel <[email protected]> Reviewed-By: Kajol Jain <[email protected]> Signed-off-by: Athira Jajeev <[email protected]> Tested-by: Disha Goel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nageswara R Sastry <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-08perf stat: Fix crash with --per-node --metric-only in CSV modeNamhyung Kim1-1/+3
The following command will get segfault due to missing aggr_header_csv for AGGR_NODE: $ sudo perf stat -a --per-node -x, --metric-only true Committer testing: Before this patch: # perf stat -a --per-node -x, --metric-only true Segmentation fault (core dumped) # After: # gdb perf -bash: gdb: command not found # perf stat -a --per-node -x, --metric-only true node,Ghz,frontend cycles idle,backend cycles idle,insn per cycle,branch-misses of all branches, N0,32,0.335,2.10,0.65,0.69,0.03,1.92, # Fixes: 86895b480a2f10c7 ("perf stat: Add --per-node agregation support") Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-04perf bpf: Rename perf_include_dir to libbpf_include_dirArnaldo Carvalho de Melo2-5/+5
As this is where we expect to find bpf/bpf_helpers.h, etc. This needs more work to make it follow LIBBPF_DYNAMIC=1 usage, i.e. when not using the system libbpf it should use the headers in the in-kernel sources libbpf in tools/lib/bpf. We need to do that anyway to avoid this mixup system libbpf and in-kernel files, so we'll get this sorted out that way. And this also may become moot as we move to using BPF skels for this feature. Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>