aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2021-07-30Revert "perf map: Fix dso->nsinfo refcounting"Arnaldo Carvalho de Melo1-2/+0
This makes 'perf top' abort in some cases, and the right fix will involve surgery that is too much to do at this stage, so revert for now and fix it in the next merge window. This reverts commit 2d6b74baa7147251c30a46c4996e8cc224aa2dc5. Cc: Riccardo Mancini <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Krister Johansen <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-27perf pmu: Fix alias matchingJohn Garry1-9/+24
Commit c47a5599eda324ba ("perf tools: Fix pattern matching for same substring in different PMU type"), may have fixed some alias matching, but has broken some others. Firstly it cannot handle the simple scenario of PMU name in form pmu_name{digits} - it can only handle pmu_name_{digits}. Secondly it cannot handle more complex matching in the case where we have multiple tokens. In this scenario, the code failed to realise that we may examine multiple substrings in the PMU name. Fix in two ways: - Change perf_pmu__valid_suffix() to accept a PMU name without '_' in the suffix - Only pay attention to perf_pmu__valid_suffix() for the final token Also add const qualifiers as necessary to avoid casting. Fixes: c47a5599eda324ba ("perf tools: Fix pattern matching for same substring in different PMU type") Signed-off-by: John Garry <[email protected]> Tested-by: Jin Yao <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-27perf cs-etm: Split --dump-raw-trace by AUX recordsJames Clark1-2/+18
Currently --dump-raw-trace skips queueing and splitting buffers because of an early exit condition in cs_etm__process_auxtrace_info(). Once that is removed we can print the split data by using the queues and searching for split buffers with the same reference as the one that is currently being processed. This keeps the same behaviour of dumping in file order when an AUXTRACE event appears, rather than moving trace dump to where AUX records are in the file. There will be a newline and size printout for each fragment. For example this buffer is comprised of two AUX records, but was printed as one: 0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0 offset: 0 ref: 0x491a4dfc52fc0e6e idx: 0 t . ... CoreSight ETM Trace data: size 160 bytes Idx:0; ID:10; I_ASYNC : Alignment Synchronisation. Idx:12; ID:10; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 } Idx:17; ID:10; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000; Idx:80; ID:10; I_ASYNC : Alignment Synchronisation. Idx:92; ID:10; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 } Idx:97; ID:10; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4; But is now printed as two fragments: 0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0 offset: 0 ref: 0x491a4dfc52fc0e6e idx: 0 t . ... CoreSight ETM Trace data: size 80 bytes Idx:0; ID:10; I_ASYNC : Alignment Synchronisation. Idx:12; ID:10; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 } Idx:17; ID:10; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000; . ... CoreSight ETM Trace data: size 80 bytes Idx:80; ID:10; I_ASYNC : Alignment Synchronisation. Idx:92; ID:10; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 } Idx:97; ID:10; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4; Decoding errors that appeared in problematic files are now not present, for example: Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC] ... PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822 Signed-off-by: James Clark <[email protected]> Reviewed-by: Mathieu Poirier <[email protected]> Tested-by: Leo Yan <[email protected]> Cc: Al Grant <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-18perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernelYang Jihong6-42/+38
The "address" member of "struct probe_trace_point" uses long data type. If kernel is 64-bit and perf program is 32-bit, size of "address" variable is 32 bits. As a result, upper 32 bits of address read from kernel are truncated, an error occurs during address comparison in kprobe_warn_out_range(). Before: # perf probe -a schedule schedule is out of .text, skip it. Error: Failed to add events. Solution: Change data type of "address" variable to u64 and change corresponding address printing and value assignment. After: # perf.new.new probe -a schedule Added new event: probe:schedule (on schedule) You can now use it in all perf tools, such as: perf record -e probe:schedule -aR sleep 1 # perf probe -l probe:schedule (on schedule@kernel/sched/core.c) # perf record -e probe:schedule -aR sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.156 MB perf.data (1366 samples) ] # perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'probe:schedule' # Event count (approx.): 1366 # # Overhead Command Shared Object Symbol # ........ ............... ................. ............ # 6.22% migration/0 [kernel.kallsyms] [k] schedule 6.22% migration/1 [kernel.kallsyms] [k] schedule 6.22% migration/2 [kernel.kallsyms] [k] schedule 6.22% migration/3 [kernel.kallsyms] [k] schedule 6.15% migration/10 [kernel.kallsyms] [k] schedule 6.15% migration/11 [kernel.kallsyms] [k] schedule 6.15% migration/12 [kernel.kallsyms] [k] schedule 6.15% migration/13 [kernel.kallsyms] [k] schedule 6.15% migration/14 [kernel.kallsyms] [k] schedule 6.15% migration/15 [kernel.kallsyms] [k] schedule 6.15% migration/4 [kernel.kallsyms] [k] schedule 6.15% migration/5 [kernel.kallsyms] [k] schedule 6.15% migration/6 [kernel.kallsyms] [k] schedule 6.15% migration/7 [kernel.kallsyms] [k] schedule 6.15% migration/8 [kernel.kallsyms] [k] schedule 6.15% migration/9 [kernel.kallsyms] [k] schedule 0.22% rcu_sched [kernel.kallsyms] [k] schedule ... # # (Cannot load tips.txt file, please install perf!) # Signed-off-by: Yang Jihong <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Frank Ch. Eigler <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jianlin Lv <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Huafei <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Srikar Dronamraju <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-18perf data: Close all files in close_dir()Riccardo Mancini1-1/+1
When using 'perf report' in directory mode, the first file is not closed on exit, causing a memory leak. The problem is caused by the iterating variable never reaching 0. Fixes: 145520631130bd64 ("perf data: Add perf_data__(create_dir|close_dir) functions") Signed-off-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zhen Lei <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-18perf probe-file: Delete namelist in del_events() on the error pathRiccardo Mancini1-2/+2
ASan reports some memory leaks when running: # perf test "42: BPF filter" This second leak is caused by a strlist not being dellocated on error inside probe_file__del_events. This patch adds a goto label before the deallocation and makes the error path jump to it. Signed-off-by: Riccardo Mancini <[email protected]> Fixes: e7895e422e4da63d ("perf probe: Split del_perf_probe_events()") Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/174963c587ae77fa108af794669998e4ae558338.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-15perf lzma: Close lzma stream on exitRiccardo Mancini1-3/+5
ASan reports memory leaks when running: # perf test "88: Check open filename arg using perf trace + vfs_getname" One of these is caused by the lzma stream never being closed inside lzma_decompress_to_file(). This patch adds the missing lzma_end(). Signed-off-by: Riccardo Mancini <[email protected]> Fixes: 80a32e5b498a7547 ("perf tools: Add lzma decompression support for kernel module") Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/aaf50bdce7afe996cfc06e1bbb36e4a2a9b9db93.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-15perf session: Cleanup trace_eventRiccardo Mancini1-0/+1
ASan reports several memory leaks when running: # perf test "82: Use vfs_getname probe to get syscall args filenames" many of which are related to session->tevent. This patch will solve this problem, then next patch will fix the remaining memory leaks in 'perf script'. This bug is due to a missing deallocation of the trace_event data strutures. This patch adds the missing trace_event__cleanup() in perf_session__delete(). Signed-off-by: Riccardo Mancini <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/fa2a3f221d90e47ce4e5b7e2d6e64c3509ddc96a.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-15perf report: Free generated help strings for sort optionRiccardo Mancini2-2/+2
ASan reports the memory leak of the strings allocated by sort_help() when running perf report. This patch changes the returned pointer to char* (instead of const char*), saves it in a temporary variable, and finally deallocates it at function exit. Signed-off-by: Riccardo Mancini <[email protected]> Fixes: 702fb9b415e7c99b ("perf report: Show all sort keys in help output") Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/a38b13f02812a8a6759200b9063c6191337f44d4.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-15perf env: Fix memory leak of cpu_pmu_capsRiccardo Mancini1-0/+1
ASan reports memory leaks while running: # perf test "83: Zstd perf.data compression/decompression" The first of the leaks is caused by env->cpu_pmu_caps not being freed. This patch adds the missing (z)free inside perf_env__exit. Signed-off-by: Riccardo Mancini <[email protected]> Fixes: 6f91ea283a1ed23e ("perf header: Support CPU PMU capabilities") Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/6ba036a8220156ec1f3d6be3e5d25920f6145028.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-15perf dso: Fix memory leak in dso__new_map()Riccardo Mancini1-1/+3
ASan reports a memory leak when running: # perf test "65: maps__merge_in". The causes of the leaks are two, this patch addresses only the first one, which is related to dso__new_map(). The bug is that dso__new_map() creates a new dso but never decreases the refcount it gets from creating it. This patch adds the missing dso__put(). Signed-off-by: Riccardo Mancini <[email protected]> Fixes: d3a7c489c7fd2463 ("perf tools: Reference count struct dso") Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/60bfe0cd06e89e2ca33646eb8468d7f5de2ee597.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-15perf env: Fix sibling_dies memory leakRiccardo Mancini1-0/+1
ASan reports a memory leak in perf_env while running: # perf test "41: Session topology" Caused by sibling_dies not being freed. This patch adds the required free. Fixes: acae8b36cded0ee6 ("perf header: Add die information in CPU topology") Signed-off-by: Riccardo Mancini <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/2140d0b57656e4eb9021ca9772250c24c032924b.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-15perf probe: Fix dso->nsinfo refcountingRiccardo Mancini1-1/+3
ASan reports a memory leak of nsinfo during the execution of: # perf test "31: Lookup mmap thread". The leak is caused by a refcounted variable being replaced without dropping the refcount. This patch makes sure that the refcnt of nsinfo is decreased whenever a refcounted variable is replaced with a new value. Signed-off-by: Riccardo Mancini <[email protected]> Fixes: 544abd44c7064c8a ("perf probe: Allow placing uprobes in alternate namespaces.") Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Krister Johansen <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/55223bc8821b34ccb01f92ef1401c02b6a32e61f.1626343282.git.rickyman7@gmail.com [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-15perf map: Fix dso->nsinfo refcountingRiccardo Mancini1-0/+2
ASan reports a memory leak of nsinfo during the execution of # perf test "31: Lookup mmap thread" The leak is caused by a refcounted variable being replaced without dropping the refcount. This patch makes sure that the refcnt of nsinfo is decreased whenever a refcounted variable is replaced with a new value. Signed-off-by: Riccardo Mancini <[email protected]> Fixes: bf2e710b3cb8445c ("perf maps: Lookup maps in both intitial mountns and inner mountns.") Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Krister Johansen <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/55223bc8821b34ccb01f92ef1401c02b6a32e61f.1626343282.git.rickyman7@gmail.com [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-14perf cs-etm: Split Coresight decode by aux recordsJames Clark1-1/+167
Populate the auxtrace queues using AUX records rather than whole auxtrace buffers so that the decoder is reset between each aux record. This is similar to the auxtrace_queues__process_index() -> auxtrace_queues__add_indexed_event() flow where perf_session__peek_event() is used to read AUXTRACE events out of random positions in the file based on the auxtrace index. But now we loop over all PERF_RECORD_AUX events instead of AUXTRACE buffers. For each PERF_RECORD_AUX event, we find the corresponding AUXTRACE buffer using the index, and add a fragment of that buffer to the auxtrace queues. No other changes to decoding were made, apart from populating the auxtrace queues. The result of decoding is identical to before, except in cases where decoding failed completely, due to not resetting the decoder. The reason for this change is because AUX records are emitted any time tracing is disabled, for example when the process is scheduled out. Because ETM was disabled and enabled again, the decoder also needs to be reset to force the search for a sync packet. Otherwise there would be fatal decoding errors. Testing ======= Testing was done with the following script, to diff the decoding results between the patched and un-patched versions of perf: #!/bin/bash set -ex $1 script -i $3 $4 > split.script $2 script -i $3 $4 > default.script diff split.script default.script | head -n 20 And it was run like this, with various itrace options depending on the quantity of synthesised events: compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns No changes in output were observed in the following scenarios: * Simple per-cpu perf record -e cs_etm/@tmc_etr0/u top * Per-thread, single thread perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C * Per-thread multiple threads (but only one thread collected data): perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597 * Per-thread multiple threads (both threads collected data): perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597 * Per-cpu explicit threads: perf record -e cs_etm/@tmc_etr0/u --pid 853,854 * System-wide (per-cpu): perf record -e cs_etm/@tmc_etr0/u -a * No data collected (no aux buffers) Can happen with any command when run for a short period * Containing truncated records Can happen with any command * Containing aux records with 0 size Can happen with any command * Snapshot mode (various files with and without buffer wrap) perf record -e cs_etm/@tmc_etr0/u -a --snapshot Some differences were observed in the following scenario: * Snapshot mode (with duplicate buffers) perf record -e cs_etm/@tmc_etr0/u -a --snapshot Fewer samples are generated in snapshot mode if duplicate buffers were gathered because buffers with the same offset are now only added once. This gives different, but more correct results and no duplicate data is decoded any more. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mathieu Poirier <[email protected]> Tested-by: Leo Yan <[email protected]> Cc: Al Grant <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-14libperf: Fix build error with LIBPFM4=1Heiko Carstens1-1/+1
Fix build error with LIBPFM4=1: CC util/pfm.o util/pfm.c: In function ‘parse_libpfm_events_option’: util/pfm.c:102:30: error: ‘struct evsel’ has no member named ‘leader’ 102 | evsel->leader = grp_leader; | ^~ Committer notes: There is this entry in 'make -C tools/perf build-test' to test the build with libpfm: $ grep libpfm tools/perf/tests/make make_with_libpfm4 := LIBPFM4=1 run += make_with_libpfm4 $ But the test machine lacked libpfm-devel, now its installed and further cases like this shouldn't happen. Committer testing: Before this patch this fails, after applying it: $ make -C tools/perf build-test make: Entering directory '/var/home/acme/git/perf/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . make_static: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1 -j24 DESTDIR=/tmp/tmp.KzFSfvGRQa <SNIP> make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1 make_with_libpfm4_O: make LIBPFM4=1 make_install_prefix_O: make install prefix=/tmp/krava make_no_auxtrace_O: make NO_AUXTRACE=1 <SNIP> $ rpm -q libpfm-devel libpfm-devel-4.11.0-4.fc34.x86_64 $ FIXME: This shows a need for 'build-test' to bail out when a build option is specified that has no required library devel files installed. Fixes: fba7c86601e2e42d ("libperf: Move 'leader' from tools/perf to perf_evsel::leader") Signed-off-by: Heiko Carstens <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-14perf stat: Merge uncore events by default for hybrid platformJin Yao1-1/+13
On a hybrid platform, by default 'perf stat' aggregates and reports the event counts per PMU. For example, # perf stat -e cycles -a true Performance counter stats for 'system wide': 1,400,445 cpu_core/cycles/ 680,881 cpu_atom/cycles/ 0.001770773 seconds time elapsed But for uncore events that's not a suitable method. Uncore has nothing to do with hybrid. So for uncore events, we aggregate event counts from all PMUs and report the counts without PMUs. Before: # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true Performance counter stats for 'system wide': 2,058 uncore_arb_0/event=0x81,umask=0x1/ 2,028 uncore_arb_1/event=0x81,umask=0x1/ 0 uncore_arb_0/event=0x84,umask=0x1/ 0 uncore_arb_1/event=0x84,umask=0x1/ 0.000614498 seconds time elapsed After: # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true Performance counter stats for 'system wide': 3,996 arb/event=0x81,umask=0x1/ 0 arb/event=0x84,umask=0x1/ 0.000630046 seconds time elapsed Of course, we also keep the '--no-merge' working for uncore events. # perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ --no-merge true Performance counter stats for 'system wide': 1,952 uncore_arb_0/event=0x81,umask=0x1/ 1,921 uncore_arb_1/event=0x81,umask=0x1/ 0 uncore_arb_0/event=0x84,umask=0x1/ 0 uncore_arb_1/event=0x84,umask=0x1/ 0.000575536 seconds time elapsed Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-14perf pmu: Skip invalid hybrid pmuJin Yao1-1/+8
On hybrid platform, such as Alderlake, if atom CPUs are offlined, the kernel still exports the sysfs path '/sys/devices/cpu_atom/' for 'cpu_atom' pmu but the file '/sys/devices/cpu_atom/cpus' is empty, which indicates this is an invalid pmu. Need to check and skip the invalid hybrid pmu. Before: # perf list ... branch-instructions OR cpu_atom/branch-instructions/ [Kernel PMU event] branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event] branch-misses OR cpu_atom/branch-misses/ [Kernel PMU event] branch-misses OR cpu_core/branch-misses/ [Kernel PMU event] bus-cycles OR cpu_atom/bus-cycles/ [Kernel PMU event] bus-cycles OR cpu_core/bus-cycles/ [Kernel PMU event] ... The cpu_atom events are still displayed even if atom CPUs are offlined. After: # perf list ... branch-instructions OR cpu_core/branch-instructions/ [Kernel PMU event] branch-misses OR cpu_core/branch-misses/ [Kernel PMU event] bus-cycles OR cpu_core/bus-cycles/ [Kernel PMU event] ... Now only cpu_core events are displayed. Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-09perf tools: Fix pattern matching for same substring in different PMU typeJin Yao3-2/+37
Some different PMU types may have the same substring. For example, on Icelake server we have PMU types "uncore_imc" and "uncore_imc_free_running". Both PMU types have the substring "uncore_imc". But the parser wrongly thinks they are the same PMU type. We enable an imc event, perf stat -e uncore_imc/event=0xe3/ -a -- sleep 1 Perf actually expands the event to: uncore_imc_0/event=0xe3/ uncore_imc_1/event=0xe3/ uncore_imc_2/event=0xe3/ uncore_imc_3/event=0xe3/ uncore_imc_4/event=0xe3/ uncore_imc_5/event=0xe3/ uncore_imc_6/event=0xe3/ uncore_imc_7/event=0xe3/ uncore_imc_free_running_0/event=0xe3/ uncore_imc_free_running_1/event=0xe3/ uncore_imc_free_running_3/event=0xe3/ uncore_imc_free_running_4/event=0xe3/ That's because the "uncore_imc_free_running" matches the pattern "uncore_imc*". Now we check that the last characters of PMU name is '_<digit>'. For example, for pattern "uncore_imc*", "uncore_imc_0" is parsed ok, but "uncore_imc_free_running_0" fails. Fixes: b2b9d3a3f0211c5d ("perf pmu: Support wildcards on pmu name in dynamic pmu events") Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Kan Liang <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Agustin Vega-Frias <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-09libperf: Adopt evlist__set_leader() from tools/perf as perf_evlist__set_leader()Jiri Olsa3-20/+2
Move the implementation of evlist__set_leader() to a new libperf perf_evlist__set_leader() function with the same functionality make it a libperf exported API. Signed-off-by: Jiri Olsa <[email protected]> Requested-by: Shunsuke Nakamura <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-09libperf: Move 'nr_groups' from tools/perf to evlist::nr_groupsJiri Olsa5-7/+6
Move evsel::nr_groups to perf_evsel::nr_groups, so we can move the group interface to libperf. Signed-off-by: Jiri Olsa <[email protected]> Requested-by: Shunsuke Nakamura <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-09libperf: Move 'leader' from tools/perf to perf_evsel::leaderJiri Olsa12-38/+61
Move evsel::leader to perf_evsel::leader, so we can move the group interface to libperf. Also add several evsel helpers to ease up the transition: struct evsel *evsel__leader(struct evsel *evsel); - get leader evsel bool evsel__has_leader(struct evsel *evsel, struct evsel *leader); - true if evsel has leader as leader bool evsel__is_leader(struct evsel *evsel); - true if evsel is itw own leader void evsel__set_leader(struct evsel *evsel, struct evsel *leader); - set leader for evsel Committer notes: Fix this when building with 'make BUILD_BPF_SKEL=1' tools/perf/util/bpf_counter.c - if (evsel->leader->core.nr_members > 1) { + if (evsel->core.leader->nr_members > 1) { Signed-off-by: Jiri Olsa <[email protected]> Requested-by: Shunsuke Nakamura <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-09libperf: Move 'idx' from tools/perf to perf_evsel::idxJiri Olsa10-35/+31
Move evsel::idx to perf_evsel::idx, so we can move the group interface to libperf. Committer notes: Fixup evsel->idx usage in tools/perf/util/bpf_counter_cgroup.c, that appeared in my tree in my local tree. Also fixed up these: $ find tools/perf/ -name "*.[ch]" | xargs grep 'evsel->idx' tools/perf/ui/gtk/annotate.c: evsel->idx + i); tools/perf/ui/gtk/annotate.c: evsel->idx); $ That running 'make -C tools/perf build-test' caught. Signed-off-by: Jiri Olsa <[email protected]> Requested-by: Shunsuke Nakamura <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-07perf intel-pt: Add a config for max loops without consuming a packetAdrian Hunter3-4/+15
The Intel PT decoder limits the number of unconditional branches (e.g. jmps) decoded without consuming any trace packets. Generally, a loop needs a conditional branch which generates a TNT packet, whereas a "ret" instruction will generate a TIP or TNT packet. So exceeding the limit is assumed to be a never-ending loop, which can happen if there has been a decoding error putting the decoder at the wrong place in the code. Up until now, the limit of 10000 has been enough but some analytic purposes have been reported to exceed that. Increase the limit to 100000, and make it configurable via perf config intel-pt.max-loops. Also amend the "Never-ending loop" message to mention the configuration entry. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-07perf stat: Disable the NMI watchdog message on hybridJin Yao1-3/+6
If we run a single workload that only runs on big core, there is always a ugly message about disabling the NMI watchdog because the atom is not counted. Before: # ./perf stat true Performance counter stats for 'true': 0.43 msec task-clock # 0.396 CPUs utilized 0 context-switches # 0.000 /sec 0 cpu-migrations # 0.000 /sec 45 page-faults # 103.918 K/sec 639,634 cpu_core/cycles/ # 1.477 G/sec <not counted> cpu_atom/cycles/ (0.00%) 643,498 cpu_core/instructions/ # 1.486 G/sec <not counted> cpu_atom/instructions/ (0.00%) 123,715 cpu_core/branches/ # 285.694 M/sec <not counted> cpu_atom/branches/ (0.00%) 4,094 cpu_core/branch-misses/ # 9.454 M/sec <not counted> cpu_atom/branch-misses/ (0.00%) 0.001092407 seconds time elapsed 0.001144000 seconds user 0.000000000 seconds sys Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true Performance counter stats for 'true': <not counted> cpu_atom/cycles/ (0.00%) <not counted> msr/tsc/ (0.00%) 0.001904106 seconds time elapsed 0.001947000 seconds user 0.000000000 seconds sys Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog The events in group usually have to be from the same PMU. Try reorganizing the group. Now we disable the NMI watchdog message on hybrid, otherwise there are too many false positives. After: # ./perf stat true Performance counter stats for 'true': 0.79 msec task-clock # 0.419 CPUs utilized 0 context-switches # 0.000 /sec 0 cpu-migrations # 0.000 /sec 48 page-faults # 60.889 K/sec 777,692 cpu_core/cycles/ # 986.519 M/sec <not counted> cpu_atom/cycles/ (0.00%) 669,147 cpu_core/instructions/ # 848.828 M/sec <not counted> cpu_atom/instructions/ (0.00%) 128,635 cpu_core/branches/ # 163.176 M/sec <not counted> cpu_atom/branches/ (0.00%) 4,089 cpu_core/branch-misses/ # 5.187 M/sec <not counted> cpu_atom/branch-misses/ (0.00%) 0.001880649 seconds time elapsed 0.001935000 seconds user 0.000000000 seconds sys # ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true Performance counter stats for 'true': <not counted> cpu_atom/cycles/ (0.00%) <not counted> msr/tsc/ (0.00%) 0.000963319 seconds time elapsed 0.000999000 seconds user 0.000000000 seconds sys Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-07perf script python: Fix buffer size to report iregs in perf scriptKajol Jain1-5/+12
Commit 48a1f565261d2ab1 ("perf script python: Add more PMU fields to event handler dict") added functionality to report fields like weight, iregs, uregs etc via perf report. That commit predefined buffer size to 512 bytes to print those fields. But in PowerPC, since we added extended regs support in: 068aeea3773a6f4c ("perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs") d735599a069f6936 ("powerpc/perf: Add extended regs support for power10 platform") Now iregs can carry more bytes of data and this predefined buffer size can result to data loss in perf script output. This patch resolves this issue by making the buffer size dynamic, based on the number of registers needed to print. It also changes the regs_map() return type from int to void, as it is not being used by the set_regs_in_dict(), its only caller. Fixes: 068aeea3773a6f4c ("perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs") Signed-off-by: Kajol Jain <[email protected]> Tested-by: Nageswara R Sastry <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-07perf top: Fix overflow in elf_sec__is_text()Riccardo Mancini1-3/+14
ASan reports a heap-buffer-overflow in elf_sec__is_text when using perf-top. The bug is caused by the fact that secstrs is built from runtime_ss, while shdr is built from syms_ss if shdr.sh_type != SHT_NOBITS. Therefore, they point to two different ELF files. This patch renames secstrs to secstrs_run and adds secstrs_sym, so that the correct secstrs is chosen depending on shdr.sh_type. $ ASAN_OPTIONS=abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1 ./perf top ================================================================= ==363148==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61300009add6 at pc 0x00000049875c bp 0x7f4f56446440 sp 0x7f4f56445bf0 READ of size 1 at 0x61300009add6 thread T6 #0 0x49875b in StrstrCheck(void*, char*, char const*, char const*) (/home/user/linux/tools/perf/perf+0x49875b) #1 0x4d13a2 in strstr (/home/user/linux/tools/perf/perf+0x4d13a2) #2 0xacae36 in elf_sec__is_text /home/user/linux/tools/perf/util/symbol-elf.c:176:9 #3 0xac3ec9 in elf_sec__filter /home/user/linux/tools/perf/util/symbol-elf.c:187:9 #4 0xac2c3d in dso__load_sym /home/user/linux/tools/perf/util/symbol-elf.c:1254:20 #5 0x883981 in dso__load /home/user/linux/tools/perf/util/symbol.c:1897:9 #6 0x8e6248 in map__load /home/user/linux/tools/perf/util/map.c:332:7 #7 0x8e66e5 in map__find_symbol /home/user/linux/tools/perf/util/map.c:366:6 #8 0x7f8278 in machine__resolve /home/user/linux/tools/perf/util/event.c:707:13 #9 0x5f3d1a in perf_event__process_sample /home/user/linux/tools/perf/builtin-top.c:773:6 #10 0x5f30e4 in deliver_event /home/user/linux/tools/perf/builtin-top.c:1197:3 #11 0x908a72 in do_flush /home/user/linux/tools/perf/util/ordered-events.c:244:9 #12 0x905fae in __ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:323:8 #13 0x9058db in ordered_events__flush /home/user/linux/tools/perf/util/ordered-events.c:341:9 #14 0x5f19b1 in process_thread /home/user/linux/tools/perf/builtin-top.c:1109:7 #15 0x7f4f6a21a298 in start_thread /usr/src/debug/glibc-2.33-16.fc34.x86_64/nptl/pthread_create.c:481:8 #16 0x7f4f697d0352 in clone ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 0x61300009add6 is located 10 bytes to the right of 332-byte region [0x61300009ac80,0x61300009adcc) allocated by thread T6 here: #0 0x4f3f7f in malloc (/home/user/linux/tools/perf/perf+0x4f3f7f) #1 0x7f4f6a0a88d9 (/lib64/libelf.so.1+0xa8d9) Thread T6 created by T0 here: #0 0x464856 in pthread_create (/home/user/linux/tools/perf/perf+0x464856) #1 0x5f06e0 in __cmd_top /home/user/linux/tools/perf/builtin-top.c:1309:6 #2 0x5ef19f in cmd_top /home/user/linux/tools/perf/builtin-top.c:1762:11 #3 0x7b28c0 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #4 0x7b119f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #5 0x7b2423 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #6 0x7b0c19 in main /home/user/linux/tools/perf/perf.c:539:3 #7 0x7f4f696f7b74 in __libc_start_main /usr/src/debug/glibc-2.33-16.fc34.x86_64/csu/../csu/libc-start.c:332:16 SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/user/linux/tools/perf/perf+0x49875b) in StrstrCheck(void*, char*, char const*, char const*) Shadow bytes around the buggy address: 0x0c268000b560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c268000b570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c268000b580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c268000b590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c268000b5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0c268000b5b0: 00 00 00 00 00 00 00 00 00 04[fa]fa fa fa fa fa 0x0c268000b5c0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 0x0c268000b5d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c268000b5e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c268000b5f0: 07 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c268000b600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==363148==ABORTING Suggested-by: Jiri Slaby <[email protected]> Signed-off-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Remi Bernon <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-07perf symbol-elf: Decode dynsym even if symtab existsMasami Hiramatsu1-28/+54
In Fedora34, libc-2.33.so has both .dynsym and .symtab sections and most of (not all) symbols moved to .dynsym. In this case, perf only decode the symbols in .symtab, and perf probe can not list up the functions in the library. To fix this issue, decode both .symtab and .dynsym sections. Without this fix, ----- $ ./perf probe -x /usr/lib64/libc-2.33.so -F @plt @plt calloc@plt free@plt malloc@plt memalign@plt realloc@plt ----- With this fix. ----- $ ./perf probe -x /usr/lib64/libc-2.33.so -F @plt @plt a64l abort abs accept accept4 access acct addmntent ----- Reported-by: Thomas Richter <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Acked-by: Thomas Richter <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Stefan Liebler <[email protected]> Cc: Sven Schnelle <[email protected]> Link: http://lore.kernel.org/lkml/162532652681.393143.10163733179955267999.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-07perf probe: Fix debuginfo__new() to enable build-id based debuginfoMasami Hiramatsu1-0/+5
Fix debuginfo__new() to set the build-id to dso before dso__read_binary_type_filename() so that it can find DSO_BINARY_TYPE__BUILDID_DEBUGINFO debuginfo correctly. However, this may not change the result, because elfutils (libdwfl) has its own debuginfo finder. With/without this patch, the perf probe correctly find the debuginfo file. This is just a failsafe and keep code's sanity (if you use dso__read_binary_type_filename(), you must set the build-id to the dso.) Reported-by: Thomas Richter <[email protected]> Acked-by: Thomas Richter <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Stefan Liebler <[email protected]> Cc: Sven Schnelle <[email protected]> Link: http://lore.kernel.org/lkml/162532651863.393143.11692691321219235810.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-05perf stat: Enable BPF counter with --for-each-cgroupNamhyung Kim6-0/+507
Recently bperf was added to use BPF to count perf events for various purposes. This is an extension for the approach and targetting to cgroup usages. Unlike the other bperf, it doesn't share the events with other processes but it'd reduce unnecessary events (and the overhead of multiplexing) for each monitored cgroup within the perf session. When --for-each-cgroup is used with --bpf-counters, it will open cgroup-switches event per cpu internally and attach the new BPF program to read given perf_events and to aggregate the results for cgroups. It's only called when task is switched to a task in a different cgroup. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf session: Add missing evlist__delete when deleting a sessionRiccardo Mancini1-1/+4
ASan reports a memory leak caused by evlist not being deleted on exit in perf-report, perf-script and perf-data. The problem is caused by evlist->session not being deleted, which is allocated in perf_session__read_header, called in perf_session__new if perf_data is in read mode. In case of write mode, the session->evlist is filled by the caller. This patch solves the problem by calling evlist__delete in perf_session__delete if perf_data is in read mode. Changes in v2: - call evlist__delete from within perf_session__delete v1: https://lore.kernel.org/lkml/[email protected]/ ASan report follows: $ ./perf script report flamegraph ================================================================= ==227640==ERROR: LeakSanitizer: detected memory leaks <SNIP unrelated> Indirect leak of 2704 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0x7f999e in evlist__new /home/user/linux/tools/perf/util/evlist.c:77:26 #3 0x8ad938 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3797:20 #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #11 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 568 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0x80ce88 in evsel__new_idx /home/user/linux/tools/perf/util/evsel.c:268:24 #3 0x8aed93 in evsel__new /home/user/linux/tools/perf/util/evsel.h:210:9 #4 0x8ae07e in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3853:11 #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #12 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 264 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0xbe3e70 in xyarray__new /home/user/linux/tools/lib/perf/xyarray.c:10:23 #3 0xbd7754 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:361:21 #4 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7 #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #12 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 32 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0xbd77e0 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:365:14 #3 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7 #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #11 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 7 byte(s) in 1 object(s) allocated from: #0 0x4b8207 in strdup (/home/user/linux/tools/perf/perf+0x4b8207) #1 0x8b4459 in evlist__set_event_name /home/user/linux/tools/perf/util/header.c:2292:16 #2 0x89d862 in process_event_desc /home/user/linux/tools/perf/util/header.c:2313:3 #3 0x8af319 in perf_file_section__process /home/user/linux/tools/perf/util/header.c:3651:9 #4 0x8aa6e9 in perf_header__process_sections /home/user/linux/tools/perf/util/header.c:3427:9 #5 0x8ae3e7 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3886:2 #6 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #7 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #8 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #9 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #10 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #11 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #12 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #13 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) SUMMARY: AddressSanitizer: 3728 byte(s) leaked in 7 allocation(s). Signed-off-by: Riccardo Mancini <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf dlfilter: Add object_code() to perf_dlfilter_fnsAdrian Hunter2-1/+37
Add a function, for use by dlfilters, to read object code. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf dlfilter: Add attr() to perf_dlfilter_fnsAdrian Hunter2-1/+14
Add a function, for use by dlfilters, to return the perf_event_attr structure. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf dlfilter: Add srcline() to perf_dlfilter_fnsAdrian Hunter2-1/+31
Add a function, for use by dlfilters, to return source code file name and line number. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf dlfilter: Add insn() to perf_dlfilter_fnsAdrian Hunter2-1/+35
Add a function, for use by dlfilters, to return instruction bytes. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf dlfilter: Add resolve_address() to perf_dlfilter_fnsAdrian Hunter2-1/+35
Add a function, for use by dlfilters, to resolve addresses from branch stacks or callchains. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf script: Add option to pass arguments to dlfiltersAdrian Hunter3-9/+46
Add option --dlarg to pass arguments to dlfilters. The --dlarg option can be repeated to pass more than 1 argument. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf script: Add option to list dlfiltersAdrian Hunter3-1/+124
Add option --list-dlfilters to list dlfilters in the current directory or the exec-path e.g. ~/libexec/perf-core/dlfilters. Use with option -v (must come before option --list-dlfilters) to show long descriptions. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf script: Add dlfilter__filter_event_early()Adrian Hunter3-4/+32
filter_event_early() can be more than 30% faster than filter_event() because it is called before internal filtering. In other respects it is the same as filter_event(), except that it will be passed events that have yet to be filtered out. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf script: Add API for filtering via dynamically loaded shared objectAdrian Hunter4-0/+528
In some cases, users want to filter very large amounts of data (e.g. from AUX area tracing like Intel PT) looking for something specific. While scripting such as Python can be used, Python is 10 to 20 times slower than C. So define a C API so that custom filters can be written and loaded. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf llvm: Return -ENOMEM when asprintf() failsArnaldo Carvalho de Melo1-0/+2
Zhihao sent a patch but it made llvm__compile_bpf() return what asprintf() returns on error, which is just -1, but since this function returns -errno, fix it by returning -ENOMEM for this case instead. Fixes: cb76371441d098 ("perf llvm: Allow passing options to llc ...") Fixes: 5eab5a7ee032ac ("perf llvm: Display eBPF compiling command ...") Reported-by: Hulk Robot <[email protected]> Reported-by: Zhihao Cheng <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Yu Kuai <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()James Clark1-1/+5
Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and non-timeless mode starts decoding on the fist PERF_RECORD_AUX record. This can cause the "data has no samples!" error if the first PERF_RECORD_AUX record comes before the first (or any relevant) PERF_RECORD_MMAP2 record because the mmaps are required by the decoder to access the binary data. This change pushes the start of non-timeless decoding to the very end of parsing the file. The PERF_RECORD_EXIT event can't be used because it might not exist in system-wide or snapshot modes. I have not been able to find the exact cause for the events to be intermittently in the wrong order in the basic scenario: perf record -e cs_etm/@tmc_etr0/u top But it can be made to happen every time with the --delay option. This is because "enable_on_exec" is disabled, which causes tracing to start before the process to be launched is exec'd. For example: perf record -e cs_etm/@tmc_etr0/u --delay=1 top perf report -D | grep 'AUX\|MAP' 0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 [] 0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 [] 0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 [] 4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top 4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so Another scenario in which decoding from the first aux record fails is a workload that forks. Although the aux record comes after 'bash', it comes before 'top', which is what we are interested in. For example: perf record -e cs_etm/@tmc_etr0/u -- bash -c top perf report -D | grep 'AUX\|MAP' 4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash 4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so 4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso] 0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 [] ... 0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 [] 4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top 4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so 4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso] A third scenario is when the majority of time is spent in a shared library that is not loaded at startup. For example a dynamically loaded plugin. Testing ======= Testing was done by checking if any samples that are present in the old output are missing from the new output. Timestamps must be stripped out with awk because now they are set to the last AUX sample, rather than the first: ./perf script $4 | awk '!($4="")' > new.script ./perf-default script $4 | awk '!($4="")' > default.script comm -13 <(sort -u new.script) <(sort -u default.script) Testing showed that the new output is a superset of the old. When lines appear in the comm output, it is not because they are missing but because [unknown] is now resolved to sensible locations. For example last putp branch here now resolves to libtinfo, so it's not missing from the output, but is actually improved: Old: top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps) top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps) top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown]) New: top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps) top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps) top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9) In the following two modes, decoding now works and the "data has no samples!" error is not displayed any more: perf record -e cs_etm/@tmc_etr0/u -- bash -c top perf record -e cs_etm/@tmc_etr0/u --delay=1 top In snapshot mode, there is also an improvement to decoding. Previously samples for the 'kill' process that was used to send SIGUSR2 were completely missing, because the process hadn't started yet. But now there are additional samples present: perf record -e cs_etm/@tmc_etr0/u --snapshot -a perf script stress 19380 [003] 161627.938153: 1000000 instructions:uH: aaaabb612fb4 [unknown] (/usr/bin/stress) kill 19644 [000] 161627.938153: 1000000 instructions:uH: ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so) stress 19380 [003] 161627.938153: 1000000 instructions:uH: ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so) Also tested was the round trip of 'perf inject' followed by 'perf report' which has the same differences and improvements. Signed-off-by: James Clark <[email protected]> Reviewed-by: Leo Yan <[email protected]> Tested-by: Leo Yan <[email protected]> Acked-by: Mathieu Poirier <[email protected]> Cc: Al Grant <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Don't wait for PERF_RECORD_EXIT eventLeo Yan1-5/+1
When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last perf event) for processing trace data, which is needless and even might cause logic error, e.g. it might fail to correlate perf events with Arm SPE events correctly. So this patch removes the condition checking for PERF_RECORD_EXIT event. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: [email protected] Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Bail out if the trace is later than perf eventLeo Yan1-3/+34
It's possible that record in Arm SPE trace is later than perf event and vice versa. This asks to correlate the perf events and Arm SPE synthesized events to be processed in the manner of correct timing. To achieve the time ordering, this patch reverses the flow, it firstly calls arm_spe_sample() and then calls arm_spe_decode(). By comparing the timestamp value and detect the perf event is coming earlier than Arm SPE trace data, it bails out from the decoding loop, the last record is pushed into auxtrace stack and is deferred to generate sample. To track the timestamp, everytime it updates timestamp for the latest record. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Assign kernel time to synthesized eventLeo Yan1-1/+1
In current code, it assigns the arch timer counter to the synthesized samples Arm SPE trace, thus the samples don't contain the kernel time but only contain the raw counter value. To fix the issue, this patch converts the timer counter to kernel time and assigns it to sample timestamp. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Convert event kernel time to counter valueLeo Yan1-1/+1
When handle a perf event, Arm SPE decoder needs to decide if this perf event is earlier or later than the samples from Arm SPE trace data; to do comparision, it needs to use the same unit for the time. This patch converts the event kernel time to arch timer's counter value, thus it can be used to compare with counter value contained in Arm SPE Timestamp packet. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Save clock parameters from TIME_CONV eventLeo Yan1-0/+26
During the recording phase, "perf record" tool synthesizes event PERF_RECORD_TIME_CONV for the hardware clock parameters and saves the event into the data file. Afterwards, when processing the data file, the event TIME_CONV will be processed at the very early time and is stored into session context. This patch extracts these parameters from the session context and saves into the structure "spe->tc" with the type perf_tsc_conversion, so that the parameters are ready for conversion between clock counter and time stamp. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf bpf_counter: Move common functions to bpf_counter.hNamhyung Kim2-52/+52
Some helper functions will be used for cgroup counting too. Move them to a header file for sharing. Committer notes: Fix the build on older systems with: - struct bpf_map_info map_info = {0}; + struct bpf_map_info map_info = { .id = 0, }; This wasn't breaking the build in such systems as bpf_counter.c isn't built due to: tools/perf/util/Build: perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o The bpf_counter.h file on the other hand is included from places that are built everywhere. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf tools: Add cgroup_is_v2() helperNamhyung Kim2-0/+21
The cgroup_is_v2() is to check if the given subsystem is mounted on cgroup v2 or not. It'll be used by BPF cgroup code later. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf tools: Add read_cgroup_id() functionNamhyung Kim2-0/+35
The read_cgroup_id() is to read a cgroup id from a file handle using name_to_handle_at(2) for the given cgroup. It'll be used by bperf cgroup stat later. Committer notes: -int read_cgroup_id(struct cgroup *cgrp) +static inline int read_cgroup_id(struct cgroup *cgrp __maybe_unused) To fix the build when HAVE_FILE_HANDLE is not defined. Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>