aboutsummaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2022-11-24perf list: Support newlines in wordwrapIan Rogers1-4/+7
Rather than a newline starting from column 0, record a newline was seen and then add the newline and space before the next word. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf symbol: correction while adjusting symbolAjay Kaher1-1/+1
perf doesn't provide proper symbol information for specially crafted .debug files. Sometimes .debug file may not have similar program header as runtime ELF file. For example if we generate .debug file using objcopy --only-keep-debug resulting file will not contain .text, .data and other runtime sections. That means corresponding program headers will have zero FileSiz and modified Offset. Example: program header of text section of libxxx.so: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align LOAD 0x00000000003d3000 0x00000000003d3000 0x00000000003d3000 0x000000000055ae80 0x000000000055ae80 R E 0x1000 Same program header after executing: objcopy --only-keep-debug libxxx.so libxxx.so.debug LOAD 0x0000000000001000 0x00000000003d3000 0x00000000003d3000 0x0000000000000000 0x000000000055ae80 R E 0x1000 Offset and FileSiz have been changed. Following formula will not provide correct value, if program header taken from .debug file (syms_ss): sym.st_value -= phdr.p_vaddr - phdr.p_offset; Correct program header information is located inside runtime ELF file (runtime_ss). Fixes: 2d86612aacb7805f ("perf symbol: Correct address for bss symbols") Signed-off-by: Ajay Kaher <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Makhalov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Srivatsa S. Bhat <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Vasavi Sirnapalli <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf vendor events intel: Update events and metrics for alderlakeZhengjun Xing11-2691/+1525
Update JSON events and metrics for alderlake to perf. Based on ADL JSON event list v1.16: https://github.com/intel/perfmon/tree/main/ADL/events Generate the event list and metrics with the converter scripts: https://github.com/intel/perfmon/pull/32 Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Xing Zhengjun <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf vendor events intel: Add metrics for Alderlake-NZhengjun Xing1-0/+583
Add JSON metrics for Alderlake-N to perf. It only included E-core metrics. E-core metrics based on E-core TMA v2.2 (E-core_TMA_Metrics.csv) It is downloaded from: https://github.com/intel/perfmon/ Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Xing Zhengjun <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf vendor events intel: Add uncore event list for Alderlake-NZhengjun Xing2-0/+208
Add JSON uncore events for Alderlake-N Based on JSON list v1.16: https://github.com/intel/perfmon/tree/main/ADL/events/ Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Xing Zhengjun <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf vendor events intel: Add core event list for Alderlake-NZhengjun Xing8-1/+1075
Alderlake-N only has E-core, it has been moved to non-hybrid code path on the kernel side, so add the cpuid for Alderlake-N separately. Add core event list for Alderlake-N, it is based on the ADL gracemont v1.16 JSON file. https://github.com/intel/perfmon/tree/main/ADL/events/ Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Xing Zhengjun <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Tidy up JSON metric-only output when no metricsNamhyung Kim1-10/+17
It printed empty strings for each metric. I guess it's needed for CSV output to match the column number. We could just ignore the empty metrics in JSON but it ended up with a broken JSON object with a trailing comma. So I added a dummy '"metric-value" : "none"' part. To do that, it needs to pass struct outstate to print_metric_end() to check if any metric value is printed or not. Before: # perf stat -aj --metric-only --per-socket --for-each-cgroup system.slice true {"socket" : "S0", "cpu-count" : 8, "cgroup" : "system.slice", "" : "", "" : "", "" : "", "" : "", "" : "", "" : "", "" : "", "" : ""} After: # perf stat -aj --metric-only --per-socket --for-each-cgroup system.slice true {"socket" : "S0", "cpu-count" : 8, "cgroup" : "system.slice", "metric-value" : "none"} Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Rename "aggregate-number" to "cpu-count" in JSONNamhyung Kim1-4/+4
As the JSON output has been broken for a little while, I guess there are not many users. Let's rename the field to more intuitive one. :) Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Fix JSON output in metric-only modeNamhyung Kim1-18/+24
It generated a broken JSON output when aggregation mode or cgroup is used with --metric-only option. Also get rid of the header line and make the output single line for each entry. It needs to know whether the current metric is the first one or not. So add 'first' field in the outstate and mark it false after printing. Before: # perf stat -a -j --metric-only true {"unit" : "GHz"}{"unit" : "insn per cycle"}{"unit" : "branch-misses of all branches"} {{"metric-value" : "0.797"}{"metric-value" : "1.65"}{"metric-value" : "0.89"} ^ # perf stat -a -j --metric-only --per-socket true {"unit" : "GHz"}{"unit" : "insn per cycle"}{"unit" : "branch-misses of all branches"} {"socket" : "S0", "aggregate-number" : 8, {"metric-value" : "0.295"}{"metric-value" : "1.88"}{"metric-value" : "0.64"} ^ After: # perf stat -a -j --metric-only true {"GHz" : "0.990", "insn per cycle" : "2.06", "branch-misses of all branches" : "0.59"} # perf stat -a -j --metric-only --per-socket true {"socket" : "S0", "aggregate-number" : 8, "GHz" : "0.439", "insn per cycle" : "2.14", "branch-misses of all branches" : "0.51"} Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Pass through 'struct outstate'Namhyung Kim4-63/+50
Now most of the print functions take a pointer to the struct outstate. We have one in the evlist__print_counters() and pass it through the child functions. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Do not pass runtime_stat to printout()Namhyung Kim1-5/+4
It always passes a pointer to rt_stat as it's the only one. Let's not pass it and directly refer it in the printout(). Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Pass struct outstate to printout()Namhyung Kim1-20/+18
The printout() takes a lot of arguments and sets an outstate with the value. Instead, we can fill the outstate first and then pass it to reduce the number of arguments. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Pass 'struct outstate' to print_metric_begin()Namhyung Kim1-22/+28
It passes prefix and cgroup pointers but the outstate already has them. Let's pass the outstate pointer instead. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Use 'struct outstate' in evlist__print_counters()Namhyung Kim1-11/+14
This is a preparation for the later cleanup. No functional changes intended. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Pass const char *prefix to display routinesNamhyung Kim2-10/+10
This is a minor cleanup and preparation for the later change. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Remove metric_only argument in print_counter_aggrdata()Namhyung Kim1-11/+6
It already passes the stat_config argument, then it can find the value in the config. No need to pass it separately. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Remove prefix argument in print_metric_headers()Namhyung Kim1-16/+10
It always passes a whitespace to the function, thus we can just add it to the function body. Furthermore, it's only used in the normal output mode. Well, actually CSV used it but it doesn't need to since we don't care about the indentation or alignment in the CSV output. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Use scnprintf() in prepare_interval()Namhyung Kim1-10/+10
It should not use sprintf() anymore. Let's pass the buffer size and use the safer scnprintf() instead. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Do not align time prefix in CSV outputNamhyung Kim1-3/+6
We don't care about the alignment in the CSV output as it's intended for machine processing. Let's get rid of it to make the output more compact. Before: # perf stat -a --summary -I 1 -x, true 0.001149309,219.20,msec,cpu-clock,219322251,100.00,219.200,CPUs utilized 0.001149309,144,,context-switches,219241902,100.00,656.935,/sec 0.001149309,38,,cpu-migrations,219173705,100.00,173.358,/sec 0.001149309,61,,page-faults,219093635,100.00,278.285,/sec 0.001149309,10679310,,cycles,218746228,100.00,0.049,GHz 0.001149309,6288296,,instructions,218589869,100.00,0.59,insn per cycle 0.001149309,1386904,,branches,218428851,100.00,6.327,M/sec 0.001149309,56863,,branch-misses,218219951,100.00,4.10,of all branches summary,219.20,msec,cpu-clock,219322251,100.00,20.025,CPUs utilized summary,144,,context-switches,219241902,100.00,656.935,/sec summary,38,,cpu-migrations,219173705,100.00,173.358,/sec summary,61,,page-faults,219093635,100.00,278.285,/sec summary,10679310,,cycles,218746228,100.00,0.049,GHz summary,6288296,,instructions,218589869,100.00,0.59,insn per cycle summary,1386904,,branches,218428851,100.00,6.327,M/sec summary,56863,,branch-misses,218219951,100.00,4.10,of all branches After: 0.001148449,224.75,msec,cpu-clock,224870589,100.00,224.747,CPUs utilized 0.001148449,176,,context-switches,224775564,100.00,783.103,/sec 0.001148449,38,,cpu-migrations,224707428,100.00,169.079,/sec 0.001148449,61,,page-faults,224629326,100.00,271.416,/sec 0.001148449,12172071,,cycles,224266368,100.00,0.054,GHz 0.001148449,6901907,,instructions,224108764,100.00,0.57,insn per cycle 0.001148449,1515655,,branches,223946693,100.00,6.744,M/sec 0.001148449,70027,,branch-misses,223735385,100.00,4.62,of all branches summary,224.75,msec,cpu-clock,224870589,100.00,21.066,CPUs utilized summary,176,,context-switches,224775564,100.00,783.103,/sec summary,38,,cpu-migrations,224707428,100.00,169.079,/sec summary,61,,page-faults,224629326,100.00,271.416,/sec summary,12172071,,cycles,224266368,100.00,0.054,GHz summary,6901907,,instructions,224108764,100.00,0.57,insn per cycle summary,1515655,,branches,223946693,100.00,6.744,M/sec summary,70027,,branch-misses,223735385,100.00,4.62,of all branches Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Move summary prefix printing logic in CSV outputNamhyung Kim1-7/+7
It matches to the prefix (interval timestamp), so better to have them together. No functional change intended. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-24perf stat: Fix cgroup display in JSON outputNamhyung Kim1-1/+1
It missed the 'else' keyword after checking json output mode. Fixes: 41cb875242e71bf1 ("perf stat: Split print_cgroup() function") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf lock contention: Do not use BPF task local storageNamhyung Kim2-12/+23
It caused some troubles when a lock inside kmalloc is contended because task local storage would allocate memory using kmalloc. It'd create a recusion and even crash in my system. There could be a couple of workarounds but I think the simplest one is to use a pre-allocated hash map. We could fix the task local storage to use the safe BPF allocator, but it takes time so let's change this until it happens actually. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Blake Jones <[email protected]> Cc: Chris Li <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Fix record test on KVM guestsMichael Petlan1-1/+1
Using precise flag with br_inst_retired.near_call causes the test fail on KVM guests, even when the guests have PMU forwarding enabled and the event itself is supported. Remove the precise flag in order to make the test work on KVM guests. Signed-off-by: Michael Petlan <[email protected]> Acked-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf inject: Set PERF_RECORD_MISC_BUILD_ID_SIZENamhyung Kim1-1/+2
With perf inject -b, it synthesizes build-id event for DSOs. But it missed to set the size and resulted in having trailing zeros. As perf record sets the size in write_build_id(), let's set the size here as well. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Skip watchpoint tests if no watchpoints availableNaveen N. Rao1-5/+7
On IBM Power9, perf watchpoint tests fail since no hardware breakpoints are available. Detect this by checking the error returned by perf_event_open() and skip the tests in that case. Reported-by: Disha Goel <[email protected]> Signed-off-by: Naveen N. Rao <[email protected]> Acked-by: Ian Rogers <[email protected]> Reviewed-by: Kajol Jain<[email protected]> Tested-by: Kajol Jain<[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected]
2022-11-23perf trace: Remove unused bpf map 'syscalls'Leo Yan2-118/+0
augmented_raw_syscalls.c defines the bpf map 'syscalls' which is initialized by perf tool in user space to indicate which system calls are enabled for tracing, on the other flip eBPF program relies on the map to filter out the trace events which are not enabled. The map also includes a field 'string_args_len[6]' which presents the string length if the corresponding argument is a string type. Now the map 'syscalls' is not used, bpf program doesn't use it as filter anymore, this is replaced by using the function bpf_tail_call() and PROG_ARRAY syscalls map. And we don't need to explicitly set the string length anymore, bpf_probe_read_str() is smart to copy the string and return string length. Therefore, it's safe to remove the bpf map 'syscalls'. To consolidate the code, this patch removes the definition of map 'syscalls' from augmented_raw_syscalls.c and drops code for using the map in the perf trace. Note, since function trace__set_ev_qualifier_bpf_filter() is removed, calling trace__init_syscall_bpf_progs() from it is also removed. We don't need to worry it because trace__init_syscall_bpf_progs() is still invoked from trace__init_syscalls_bpf_prog_array_maps() for initialization the system call's bpf program callback. After: # perf trace -e examples/bpf/augmented_raw_syscalls.c,open* --max-events 10 perf stat --quiet sleep 0.001 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libelf.so.1", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libdw.so.1", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libunwind.so.8", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libunwind-aarch64.so.8", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libcrypto.so.3", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libslang.so.2", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libperl.so.5.34", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 # perf trace -e examples/bpf/augmented_raw_syscalls.c --max-events 10 perf stat --quiet sleep 0.001 ... [continued]: execve()) = 0 brk(NULL) = 0xaaaab1d28000 faccessat(-100, "/etc/ld.so.preload", 4) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 close(3</usr/lib/aarch64-linux-gnu/libcrypto.so.3>) = 0 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3</usr/lib/aarch64-linux-gnu/libcrypto.so.3>, 0xfffff33f70d0, 832) = 832 munmap(0xffffb5519000, 28672) = 0 munmap(0xffffb55b7000, 32880) = 0 mprotect(0xffffb55a6000, 61440, PROT_NONE) = 0 Signed-off-by: Leo Yan <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf augmented_raw_syscalls: Remove unused variable 'syscall'Leo Yan1-1/+0
The local variable 'syscall' is not used anymore, remove it. Signed-off-by: Leo Yan <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf trace: Handle failure when trace point folder is missedLeo Yan1-7/+10
On Arm64 a case is perf tools fails to find the corresponding trace point folder for system calls listed in the table 'syscalltbl_arm64', e.g. the generated system call table contains "lookup_dcookie" but we cannot find out the matched trace point folder for it. We need to figure out if there have any issue for the generated system call table, on the other hand, we need to handle the case when trace point folder is missed under sysfs, this patch sets the flag syscall::nonexistent as true and returns the error from trace__read_syscall_info(). Another problem is for trace__syscall_info(), it returns two different values if a system call doesn't exist: at the first time calling trace__syscall_info() it returns NULL when the system call doesn't exist, later if call trace__syscall_info() again for the same missed system call, it returns pointer of syscall. trace__syscall_info() checks the condition 'syscalls.table[id].name == NULL', but the name will be assigned in the first invoking even the system call is not found. So checking system call's name in trace__syscall_info() is not the right thing to do, this patch simply checks flag syscall::nonexistent to make decision if a system call exists or not, finally trace__syscall_info() returns the consistent result (NULL) if a system call doesn't existed. Fixes: b8b1033fcaa091d8 ("perf trace: Mark syscall ids that are not allocated to avoid unnecessary error messages") Signed-off-by: Leo Yan <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf trace: Return error if a system call doesn't existLeo Yan1-2/+2
When a system call is not detected, the reason is either because the system call ID is out of scope or failure to find the corresponding path in the sysfs, trace__read_syscall_info() returns zero. Finally, without returning an error value it introduces confusion for the caller. This patch lets the function trace__read_syscall_info() to return -EEXIST when a system call doesn't exist. Fixes: b8b1033fcaa091d8 ("perf trace: Mark syscall ids that are not allocated to avoid unnecessary error messages") Signed-off-by: Leo Yan <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf trace: Use macro RAW_SYSCALL_ARGS_NUM to replace numberLeo Yan1-4/+7
This patch defines a macro RAW_SYSCALL_ARGS_NUM to replace the open coded number '6'. Signed-off-by: Leo Yan <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf list: Add JSON output optionIan Rogers2-67/+245
Output events and metrics in a JSON format by overriding the print callbacks. Currently other command line options aren't supported and metrics are repeated once per metric group. Committer testing: $ perf list cache List of pre-defined events (to be used in -e or -M): L1-dcache-load-misses [Hardware cache event] L1-dcache-loads [Hardware cache event] L1-dcache-prefetches [Hardware cache event] L1-icache-load-misses [Hardware cache event] L1-icache-loads [Hardware cache event] branch-load-misses [Hardware cache event] branch-loads [Hardware cache event] dTLB-load-misses [Hardware cache event] dTLB-loads [Hardware cache event] iTLB-load-misses [Hardware cache event] iTLB-loads [Hardware cache event] $ perf list --json cache [ { "Unit": "cache", "EventName": "L1-dcache-load-misses", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "L1-dcache-loads", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "L1-dcache-prefetches", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "L1-icache-load-misses", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "L1-icache-loads", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "branch-load-misses", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "branch-loads", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "dTLB-load-misses", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "dTLB-loads", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "iTLB-load-misses", "EventType": "Hardware cache event" }, { "Unit": "cache", "EventName": "iTLB-loads", "EventType": "Hardware cache event" } ] $ Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf list: Reorganize to use callbacks to allow honouring command line optionsIan Rogers7-512/+624
Rather than controlling the list output with passed flags, add callbacks that are called when an event or metric are encountered. State is passed to the callback so that command line options can be respected, alternatively the callbacks can be changed. Fix a few bugs: - wordwrap to columns metric descriptions and expressions; - remove unnecessary whitespace after PMU event names; - the metric filter is a glob but matched using strstr which will always fail, switch to using a proper globmatch, - the detail flag gives details for extra kernel PMU events like branch-instructions. In metricgroup.c switch from struct mep being a rbtree of metricgroups containing a list of metrics, to the tree directly containing all the metrics. In general the alias for a name is passed to the print routine rather than being contained in the name with OR. Committer notes: Check the asprint() return to address this on fedora 36: util/print-events.c: In function ‘print_sdt_events’: util/print-events.c:183:33: error: ignoring return value of ‘asprintf’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result] 183 | asprintf(&evt_name, "%s@%s(%.12s)", sdt_name->s, path, bid); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors $ gcc --version | head -1 gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2) $ Fix ps.pmu_glob setting when dealing with *:* events, it was being left with a freed pointer that then at the end of cmd_list() would be double freed. Check if pmu_name is NULL in default_print_event() before calling strglobmatch(pmu_name, ...) to avoid a segfault. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Xin Gao <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf build: Fix LIBTRACEEVENT_DYNAMICIan Rogers2-4/+24
The tools/lib includes fixes break LIBTRACEVENT_DYNAMIC as the makefile erroneously had dependencies on building libtraceevent even when not linking with it. This change fixes the issues with LIBTRACEEVENT_DYNAMIC by making the built files optional. Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Replace data symbol test workload with datasymNamhyung Kim1-28/+1
So that it can get rid of requirement of a compiler. $ sudo ./perf test -v 109 109: Test data symbol : --- start --- test child forked, pid 844526 Recording workload... [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.354 MB /tmp/__perf_test.perf.data.GFeZO (4847 samples) ] Cleaning up files... test child finished with 0 ---- end ---- Test data symbol: Ok Signed-off-by: Namhyung Kim <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Add 'datasym' test workloadNamhyung Kim4-0/+28
The datasym workload is to check if perf mem command gets the data addresses precisely. This is needed for data symbol test. $ perf test -w datasym I had to keep the buf1 in the data section, otherwise it could end up in the BSS and was mmaped as a separate //anon region, then it was not symbolized at all. It needs to be fixed separately. Committer notes: Add a -U _FORTIFY_SOURCE to the datasym CFLAGS, as the main perf flags set it and it requires building with optimization, and this new test has a -O0. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Replace brstack test workloadNamhyung Kim1-54/+12
So that it can get rid of requirement of a compiler. Also rename the symbols to match with the perf test workload. Signed-off-by: Namhyung Kim <[email protected]> Tested-by: James Clark <[email protected]> Acked-by: German Gomez <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Add 'brstack' test workloadNamhyung Kim4-0/+44
The brstack is to run different kinds of branches repeatedly. This is necessary for brstack test case to verify if it has correct branch info. $ perf test -w brstack I renamed the internal functions to have brstack_ prefix as it's too generic name. Add a -U_FORTIFY_SOURCE to the brstack CFLAGS, as the main perf flags set it and it requires building with optimization, and this new test has a -O0. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Replace arm spe fork test workload with sqrtloopNamhyung Kim1-43/+1
So that it can get rid of requirement of a compiler. I've also removed killall as it'll kill perf process now and run the test workload for 10 sec instead. Signed-off-by: Namhyung Kim <[email protected]> Tested-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Add 'sqrtloop' test workloadNamhyung Kim4-0/+49
The sqrtloop creates a child process to run an infinite loop calling sqrt() with rand(). This is needed for ARM SPE fork test. $ perf test -w sqrtloop It can take an optional argument to specify how long it will run in seconds (default: 1). Committer notes: Explicitely ignored the sqrt() return to fix the build on systems where the compiler complains it isn't being used. And added a sqrtloop specific CFLAGS to disable optimizations to make this a bit more robust wrt dead code elimination. Doing that a -U_FORTIFY_SOURCE needs to be added, as -O0 is incompatible with it. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Replace arm callgraph fp test workload with leafloopNamhyung Kim1-31/+3
So that it can get rid of requirement of a compiler. Reviewed-by: Leo Yan <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-23perf test: Add 'leafloop' test workloadNamhyung Kim4-0/+39
The leafloop workload is to run an infinite loop in the test_leaf function. This is needed for the ARM fp callgraph test to verify if it gets the correct callchains. $ perf test -w leafloop Committer notes: Add a: -U_FORTIFY_SOURCE to the leafloop CFLAGS as the main perf flags set it and it requires building with optimization, and this new test has a -O0. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-20perf test: Replace record test workload with thloopNamhyung Kim1-56/+3
So that it can get rid of requirements for a compiler. $ sudo ./perf test -v 92 92: perf record tests : --- start --- test child forked, pid 740204 Basic --per-thread mode test Basic --per-thread mode test [Success] Register capture test Register capture test [Success] Basic --system-wide mode test Basic --system-wide mode test [Success] Basic target workload test Basic target workload test [Success] test child finished with 0 ---- end ---- perf record tests: Ok Signed-off-by: Namhyung Kim <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-20perf test: Add 'thloop' test workloadNamhyung Kim4-0/+56
The thloop is similar to noploop but runs in two threads. This is needed to verify perf record --per-thread to handle multi-threaded programs properly. $ perf test -w thloop It also takes an optional argument to specify runtime in seconds (default: 1). Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-20perf test: Replace pipe test workload with noploopNamhyung Kim1-45/+10
So that it can get rid of requirement of a compiler. Also define and use more local symbols to ease future changes. $ sudo ./perf test -v pipe 87: perf pipe recording and injection test : --- start --- test child forked, pid 748003 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] 748014 748014 -1 |perf [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] 99.83% perf perf [.] noploop [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] 99.85% perf perf [.] noploop [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.160 MB /tmp/perf.data.2XYPdw (4007 samples) ] 99.83% perf perf [.] noploop test child finished with 0 ---- end ---- perf pipe recording and injection test: Ok Signed-off-by: Namhyung Kim <[email protected]> Tested-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-20perf test: Add -w/--workload optionNamhyung Kim5-0/+83
The -w/--workload option is to run a simple workload used by testing. This adds a basic framework to run the workloads and 'noploop' workload as an example. $ perf test -w noploop The noploop does a loop doing nothing (NOP) for a second by default. It can have an optional argument to specify the time in seconds. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhengjun Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-4/+12
include/linux/bpf.h 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record") f71b2f64177a ("bpf: Refactor map->off_arr handling") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-16perf build: Use tools/lib headers from install pathIan Rogers2-4/+14
Switch -I from tools/lib to the install path for the tools/lib libraries. Add the include_headers build targets to prepare target, as well as pmu-events.c compilation that dependes on libperf. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf cpumap: Tidy libperf includesIan Rogers6-5/+5
Use public API when possible, don't include internal API in header files in evsel.h. Fix any related breakages. Committer note: There was one missing case, when building for arm64: arch/arm64/util/pmu.c: In function 'pmu_events_table__find': arch/arm64/util/pmu.c:18:30: error: invalid use of undefined type 'struct perf_cpu_map' 18 | if (pmu->cpus->nr != cpu__max_cpu().cpu) | ^~ Fix it by adding one more exception, including <internal/cpumap.h> Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf thread_map: Reduce exposure of libperf internal APIIan Rogers10-9/+12
Remove unnecessary include of internal threadmap.h and refcount.h in thread_map.h. Switch to using public APIs when possible or including the internal header file in the C file. Fix a transitive dependency in openat-syscall.c broken by the clean up. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf expr: Tidy hashmap dependencyIan Rogers9-18/+6
hashmap.h comes from libbpf but isn't installed with its headers. Always use the header file of the code in util. Change the hashmap.h dependency in expr.h to a forward declaration, add the necessary header file includes in the C files. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>