aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2020-05-05perf evsel: Rename perf_evsel__store_ids() to evsel__store_id()Arnaldo Carvalho de Melo2-2/+2
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__env() to evsel__env()Arnaldo Carvalho de Melo4-4/+4
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__group_idx() to evsel__group_idx()Arnaldo Carvalho de Melo2-7/+6
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__fallback() to evsel__fallback()Arnaldo Carvalho de Melo3-5/+3
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__has*() to evsel__has*()Arnaldo Carvalho de Melo4-6/+6
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__{prev,next}() to evsel__{prev,next}()Arnaldo Carvalho de Melo2-3/+3
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__parse_sample*() to evsel__parse_sample*()Arnaldo Carvalho de Melo4-13/+11
As these are 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename *perf_evsel__read*() to *evsel__read()Arnaldo Carvalho de Melo3-24/+17
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Ditch perf_evsel__cmp(), not used for quite a whileArnaldo Carvalho de Melo1-6/+0
In 4c358d5cf361 ("perf stat: Replace transaction event possition check with id check") all its uses were removed, so ditch it. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__is_*() to evsel__is*()Arnaldo Carvalho de Melo14-74/+71
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf pmu: Add perf_pmu__find_by_type helperStephane Eranian2-0/+12
This is used by libpfm4 during event parsing to locate the pmu for an event. Signed-off-by: Stephane Eranian <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiwei Sun <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Cc: yuzhoujian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf mem2node: Avoid double free related to reallocIan Rogers1-1/+2
Realloc of size zero is a free not an error, avoid this causing a double free. Caught by clang's address sanitizer: ==2634==ERROR: AddressSanitizer: attempting double-free on 0x6020000015f0 in thread T0: #0 0x5649659297fd in free llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3 #1 0x5649659e9251 in __zfree tools/lib/zalloc.c:13:2 #2 0x564965c0f92c in mem2node__exit tools/perf/util/mem2node.c:114:2 #3 0x564965a08b4c in perf_c2c__report tools/perf/builtin-c2c.c:2867:2 #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10 #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11 #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8 #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2 #8 0x564965942e41 in main tools/perf/perf.c:538:3 0x6020000015f0 is located 0 bytes inside of 1-byte region [0x6020000015f0,0x6020000015f1) freed by thread T0 here: #0 0x564965929da3 in realloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3 #1 0x564965c0f55e in mem2node__init tools/perf/util/mem2node.c:97:16 #2 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8 #3 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10 #4 0x564965944348 in run_builtin tools/perf/perf.c:312:11 #5 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8 #6 0x5649659440c4 in run_argv tools/perf/perf.c:408:2 #7 0x564965942e41 in main tools/perf/perf.c:538:3 previously allocated by thread T0 here: #0 0x564965929c42 in calloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3 #1 0x5649659e9220 in zalloc tools/lib/zalloc.c:8:9 #2 0x564965c0f32d in mem2node__init tools/perf/util/mem2node.c:61:12 #3 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8 #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10 #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11 #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8 #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2 #8 0x564965942e41 in main tools/perf/perf.c:538:3 v2: add a WARN_ON_ONCE when the free condition arises. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__{str,int}val() and other tracepoint field ↵Arnaldo Carvalho de Melo3-18/+12
metehods to to evsel__*() As those are not 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__open_per_*() to evsel__open_per_*()Arnaldo Carvalho de Melo3-12/+6
As those are not 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__*filter*() to evsel__*filter*()Arnaldo Carvalho de Melo5-20/+18
As those are not 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename *perf_evsel__*set_sample_*() to *evsel__*set_sample_*()Arnaldo Carvalho de Melo6-60/+57
As they are not 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__group_desc() to evsel__group_desc()Arnaldo Carvalho de Melo4-5/+5
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename *perf_evsel__*name() to *evsel__*name()Arnaldo Carvalho de Melo14-70/+59
As they are 'struct evsel' methods or related routines, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename __perf_evsel__sample_size() to __evsel__sample_size()Arnaldo Carvalho de Melo3-5/+5
As it is a 'struct evsel' related method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__calc_id_pos() to evsel__calc_id_pos()Arnaldo Carvalho de Melo3-6/+6
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__config*() to evsel__config*()Arnaldo Carvalho de Melo4-24/+19
As they are all 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__exit() to evsel__exit()Arnaldo Carvalho de Melo3-4/+4
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__is_aux_event() to evsel__is_aux_event()Arnaldo Carvalho de Melo4-6/+6
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__find_pmu() to evsel__find_pmu()Arnaldo Carvalho de Melo3-4/+4
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__compute_deltas() to evsel__compute_deltas()Arnaldo Carvalho de Melo3-7/+7
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename perf_evsel__nr_cpus() to evsel__nr_cpus()Arnaldo Carvalho de Melo3-8/+8
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Rename 'struct perf_evsel__sb_cb_t' to 'struct evsel__sb_cb_t'Arnaldo Carvalho de Melo3-7/+7
As the "perf_" prefix should be restricted to functions and types in tools/lib/perf/, aka libperf, this way we reduce a bit the confusion for types only in libperf or the ones in the more contained tools/perf/ project. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Song Liu <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf intel-pt: Add support for synthesizing branch stacks for regular eventsAdrian Hunter1-7/+66
Use the new thread_stack__br_sample_late() function to create a thread stack for regular events. Example: # perf record --kcore --aux-sample -e '{intel_pt//,cycles:ppp}' -c 10000 uname Linux [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.743 MB perf.data ] # perf report --itrace=Le --stdio | head -30 | tail -18 # Samples: 11K of event 'cycles:ppp' # Event count (approx.): 11648 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... ............................ ............................ .................. # 5.49% uname libc-2.30.so [.] _dl_addr [.] _dl_addr - 2.41% uname ld-2.30.so [.] _dl_relocate_object [.] _dl_relocate_object - 2.31% uname ld-2.30.so [.] do_lookup_x [.] do_lookup_x - 2.17% uname [kernel.kallsyms] [k] unmap_page_range [k] unmap_page_range - 2.05% uname ld-2.30.so [k] _dl_start [k] _dl_start - 1.97% uname ld-2.30.so [.] _dl_lookup_symbol_x [.] _dl_lookup_symbol_x - 1.94% uname [kernel.kallsyms] [k] filemap_map_pages [k] filemap_map_pages - 1.60% uname [kernel.kallsyms] [k] __handle_mm_fault [k] __handle_mm_fault - 1.44% uname [kernel.kallsyms] [k] page_add_file_rmap [k] page_add_file_rmap - 1.12% uname [kernel.kallsyms] [k] vma_interval_tree_insert [k] vma_interval_tree_insert - 0.94% uname [kernel.kallsyms] [k] perf_iterate_ctx [k] perf_iterate_ctx - Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf thread-stack: Add thread_stack__br_sample_late()Adrian Hunter2-0/+107
Add a thread stack function to create a branch stack for hardware events where the sample records get created some time after the event occurred. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evsel: Add support for synthesized branch stack sample typeAdrian Hunter2-1/+11
Allow for a synthesized branch stack to be added to samples. As with synthesized call chains, the sample type cannot be changed because it is needed to continue to parse events. So add and use helper function evsel__has_br_stack() to indicate a branch stack, whether original or synthesized. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf auxtrace: Add option to synthesize branch stack for regular eventsAdrian Hunter3-2/+9
There is an existing option to synthesize branch stacks for synthesized events. Add a new option to synthesize branch stacks for regular events. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf intel-pt: Change branch stack support to use thread-stacksAdrian Hunter1-102/+39
Change Intel PT's branch stack support to use thread stacks. The advantages of using branch stack support from the thread-stack are: 1. the branches are accumulated separately for each thread 2. the branch stack is cleared only in between continuous traces This helps pave the way for adding branch stacks to regular events, not just synthesized events as at present. While the 2 approaches are not identical, in simple cases the results can be identical e.g. Before: # perf record --kcore -e intel_pt// uname # perf script --itrace=i10usl -F+brstacksym,+addr,+flags > cmp1.txt After: # perf script --itrace=i10usl -F+brstacksym,+addr,+flags > cmp2.txt # diff -s cmp1.txt cmp2.txt Files cmp1.txt and cmp2.txt are identical Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf intel-pt: Consolidate thread-stack use conditionAdrian Hunter1-2/+6
The components of the condition do not change, so consolidate them in one variable. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf thread-stack: Add branch stack supportAdrian Hunter4-14/+108
Intel PT already has support for creating branch stacks for each context (per-cpu or per-thread). In the more common per-cpu case, the branch stack is not separated for different threads, instead being cleared in between each sample. That approach will not work very well for adding branch stacks to regular events. The branch stacks really need to be accumulated separately for each thread. As a start to accomplishing that, this patch adds support for putting branch stack support into the thread-stack. The advantages are: 1. the branches are accumulated separately for each thread 2. the branch stack is cleared only in between continuous traces This helps pave the way for adding branch stacks to regular events, not just synthesized events as at present. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf tools: Simplify checking if SMT is active.Konstantin Khlebnikov1-0/+4
SMT now could be disabled via "/sys/devices/system/cpu/smt/control". Status is shown in "/sys/devices/system/cpu/smt/active" simply as "0" / "1". If this knob isn't here then fallback to checking topology as before. Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Dmitry Monakhov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/158817741394.748034.9273604089138009552.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf tools: Fix reading new topology attribute "core_cpus"Konstantin Khlebnikov1-3/+3
Check if access("devices/system/cpu/cpu%d/topology/core_cpus", F_OK) fails, which will happen unless the current directory is "/sys". Simply try to read this file first. Fixes: 0ccdb8407a46 ("perf tools: Apply new CPU topology sysfs attributes") Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Dmitry Monakhov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/158817718710.747528.11009278875028211991.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf parse-events: Fix another memory leaks found on parse_events()Ian Rogers1-0/+1
Fix another memory leak found by applying LLVM's libfuzzer on parse_events(). Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf parse-events: Fix memory leaks found on parse_eventsIan Rogers1-1/+1
free_list_evsel() deals with tools/perf/ evsels, not with libperf perf_evsels, use the right destructor and avoid a leak, as evsel__delete() will delete something perf_evsel__delete() doesn't. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf parse-events: Fix memory leaks found on parse_eventsIan Rogers1-0/+1
Fix a memory leak found by applying LLVM's libfuzzer on parse_events(). Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] [ split from a larger patch, use zfree() ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evlist: Allow reusing the side band thread for more purposesArnaldo Carvalho de Melo2-0/+24
I.e. so far we had just one event in that side band thread, a dummy one with attr.bpf_event set, so that 'perf record' can go ahead and ask the kernel for further information about BPF programs being loaded. Allow for more than one event to be there, so that we can use it as well for the upcoming --switch-output-event feature. Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Song Liu <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf evlist: Move the sideband thread routines to separate objectArnaldo Carvalho de Melo3-117/+126
To avoid dragging more stuff into the perf python binding in the following csets. Reported-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf parse-events: Add parse_events_option() variant that creates evlistArnaldo Carvalho de Melo2-0/+24
For the upcoming --switch-output-event option we want to create the side band event, populate it with the specified events and then, if it is present multiple times, go on adding to it, then, if the BPF tracking is required, use the first event to set its attr.bpf_event to get those PERF_RECORD_BPF_EVENT metadata events too. Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Song Liu <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf bpf: Decouple creating the evlist from adding the SB eventArnaldo Carvalho de Melo4-24/+9
Renaming bpf_event__add_sb_event() to evlist__add_sb_event() and requiring that the evlist be allocated beforehand. This will allow using the same side band thread and evlist to be used for multiple purposes in addition to react to PERF_RECORD_BPF_EVENT soon after they are generated. Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Song Liu <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf top: Move sb_evlist to 'struct perf_top'Arnaldo Carvalho de Melo1-1/+1
Where state related to a 'perf top' session is grouped. Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Song Liu <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-05perf tools: Move routines that probe for perf API features to separate fileArnaldo Carvalho de Melo8-158/+183
Trying to disentangle this a bit further, unfortunately it uses parse_events(), its interesting to have it separated anyway, so do it. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-30perf tools: Enable Hz/hz prinitg for --metric-only optionKajol Jain1-2/+0
Commit 54b5091606c18 ("perf stat: Implement --metric-only mode") added function 'valid_only_metric()' which drops "Hz" or "hz", if it is part of "ScaleUnit". This patch enable it since hv_24x7 supports couple of frequency events. Signed-off-by: Kajol Jain <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anju T Sudhakar <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jin Yao <[email protected]> Cc: Joe Mario <[email protected]> Cc: Kan Liang <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mamatha Inamdar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-30perf metricgroups: Enhance JSON/metric infrastructure to handle "?"Kajol Jain6-23/+67
Patch enhances current metric infrastructure to handle "?" in the metric expression. The "?" can be use for parameters whose value not known while creating metric events and which can be replace later at runtime to the proper value. It also add flexibility to create multiple events out of single metric event added in JSON file. Patch adds function 'arch_get_runtimeparam' which is a arch specific function, returns the count of metric events need to be created. By default it return 1. This infrastructure needed for hv_24x7 socket/chip level events. "hv_24x7" chip level events needs specific chip-id to which the data is requested. Function 'arch_get_runtimeparam' implemented in header.c which extract number of sockets from sysfs file "sockets" under "/sys/devices/hv_24x7/interface/". With this patch basically we are trying to create as many metric events as define by runtime_param. For that one loop is added in function 'metricgroup__add_metric', which create multiple events at run time depend on return value of 'arch_get_runtimeparam' and merge that event in 'group_list'. To achieve that we are actually passing this parameter value as part of `expr__find_other` function and changing "?" present in metric expression with this value. As in our JSON file, there gonna be single metric event, and out of which we are creating multiple events. To understand which data count belongs to which parameter value, we also printing param value in generic_metric function. For example, command:# ./perf stat -M PowerBUS_Frequency -C 0 -I 1000 1.000101867 9,356,933 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0 1.000101867 9,366,134 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1 2.000314878 9,365,868 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0 2.000314878 9,366,092 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1 So, here _0 and _1 after PowerBUS_Frequency specify parameter value. Signed-off-by: Kajol Jain <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anju T Sudhakar <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jin Yao <[email protected]> Cc: Joe Mario <[email protected]> Cc: Kan Liang <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mamatha Inamdar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-30perf tools: Remove unneeded semicolonsZou Wei4-4/+4
Fixes coccicheck warnings: tools/perf/builtin-diff.c:1565:2-3: Unneeded semicolon tools/perf/builtin-lock.c:778:2-3: Unneeded semicolon tools/perf/builtin-mem.c:126:2-3: Unneeded semicolon tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c:555:2-3: Unneeded semicolon tools/perf/util/ordered-events.c:317:2-3: Unneeded semicolon tools/perf/util/synthetic-events.c:1131:2-3: Unneeded semicolon tools/perf/util/trace-event-read.c:78:2-3: Unneeded semicolon Reported-by: Hulk Robot <[email protected]> Signed-off-by: Zou Wei <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-30perf synthetic events: Remove use of sscanf from /proc readingIan Rogers1-52/+105
The synthesize benchmark, run on a single process and thread, shows perf_event__synthesize_mmap_events as the hottest function with fgets and sscanf taking the majority of execution time. fscanf performs similarly well. Replace the scanf call with manual reading of each field of the /proc/pid/maps line, and remove some unnecessary buffering. This change also addresses potential, but unlikely, buffer overruns for the string values read by scanf. Performance before is: $ sudo perf bench internals synthesize -m 16 -M 16 -s -t \# Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 102.810 usec (+- 0.027 usec) Average num. events: 17.000 (+- 0.000) Average time per event 6.048 usec Average data synthesis took: 106.325 usec (+- 0.018 usec) Average num. events: 89.000 (+- 0.000) Average time per event 1.195 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 16 Average synthesis took: 68103.100 usec (+- 441.234 usec) Average num. events: 30703.000 (+- 0.730) Average time per event 2.218 usec And after is: $ sudo perf bench internals synthesize -m 16 -M 16 -s -t \# Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 50.388 usec (+- 0.031 usec) Average num. events: 17.000 (+- 0.000) Average time per event 2.964 usec Average data synthesis took: 52.693 usec (+- 0.020 usec) Average num. events: 89.000 (+- 0.000) Average time per event 0.592 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 16 Average synthesis took: 45022.400 usec (+- 552.740 usec) Average num. events: 30624.200 (+- 10.037) Average time per event 1.470 usec On a Intel Xeon 6154 compiling with Debian gcc 9.2.1. Committer testing: On a AMD Ryzen 5 3600X 6-Core Processor: Before: # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt # Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 267.491 usec (+- 0.176 usec) Average num. events: 56.000 (+- 0.000) Average time per event 4.777 usec Average data synthesis took: 277.257 usec (+- 0.169 usec) Average num. events: 287.000 (+- 0.000) Average time per event 0.966 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 12 Average synthesis took: 81599.500 usec (+- 346.315 usec) Average num. events: 36096.100 (+- 2.523) Average time per event 2.261 usec # After: # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt # Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 110.125 usec (+- 0.080 usec) Average num. events: 56.000 (+- 0.000) Average time per event 1.967 usec Average data synthesis took: 118.518 usec (+- 0.057 usec) Average num. events: 287.000 (+- 0.000) Average time per event 0.413 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 12 Average synthesis took: 43490.700 usec (+- 284.527 usec) Average num. events: 37028.500 (+- 0.563) Average time per event 1.175 usec # Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrey Zhizhikin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-23perf record: Add num-synthesize-threads optionStephane Eranian1-0/+1
To control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. Mimic perf top way of handling the option. If not specified will default to 1 thread, i.e. default behavior before this option. On a desktop computer the processing of /proc/PID/task/PID/maps isn't slow enough to warrant parallel processing and the thread creation has some cost - hence the default of 1. On a loaded server with >100 cores it is possible to see synthesis times in the order of seconds and in this case having the option is desirable. As the processing is a synchronization point, it is legitimate to worry if Amdahl's law will apply to this patch. Profiling with this patch in place: https://lore.kernel.org/lkml/[email protected]/ shows: ... - 32.59% __perf_event__synthesize_threads - 32.54% __event__synthesize_thread + 22.13% perf_event__synthesize_mmap_events + 6.68% perf_event__get_comm_ids.constprop.0 + 1.49% process_synthesized_event + 1.29% __GI___readdir64 + 0.60% __opendir ... That is the processing is 1.49% of execution time and there is plenty to make parallel. This is shown in the benchmark in this patch: https://lore.kernel.org/lkml/[email protected]/ Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 1 Average synthesis took: 127729.000 usec (+- 3372.880 usec) Average num. events: 21548.600 (+- 0.306) Average time per event 5.927 usec Number of synthesis threads: 2 Average synthesis took: 88863.500 usec (+- 385.168 usec) Average num. events: 21552.800 (+- 0.327) Average time per event 4.123 usec Number of synthesis threads: 3 Average synthesis took: 83257.400 usec (+- 348.617 usec) Average num. events: 21553.200 (+- 0.327) Average time per event 3.863 usec Number of synthesis threads: 4 Average synthesis took: 75093.000 usec (+- 422.978 usec) Average num. events: 21554.200 (+- 0.200) Average time per event 3.484 usec Number of synthesis threads: 5 Average synthesis took: 64896.600 usec (+- 353.348 usec) Average num. events: 21558.000 (+- 0.000) Average time per event 3.010 usec Number of synthesis threads: 6 Average synthesis took: 59210.200 usec (+- 342.890 usec) Average num. events: 21560.000 (+- 0.000) Average time per event 2.746 usec Number of synthesis threads: 7 Average synthesis took: 54093.900 usec (+- 306.247 usec) Average num. events: 21562.000 (+- 0.000) Average time per event 2.509 usec Number of synthesis threads: 8 Average synthesis took: 48938.700 usec (+- 341.732 usec) Average num. events: 21564.000 (+- 0.000) Average time per event 2.269 usec Where average time per synthesized event goes from 5.927 usec with 1 thread to 2.269 usec with 8. This isn't a linear speed up as not all of synthesize code has been made parallel. If the synthesis time was about 10 seconds then using 8 threads may bring this down to less than 4. Signed-off-by: Stephane Eranian <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tony Jones <[email protected]> Cc: yuzhoujian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>