aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2021-05-25perf scripting python: Add context switchAdrian Hunter1-0/+45
Add context_switch to general python scripting. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-25perf scripting python: Add cpumodeAdrian Hunter1-0/+3
Add cpumode to python scripting. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-25perf scripting python: Add IPCAdrian Hunter1-0/+8
Add IPC to python scripting. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-25perf scripting python: Add sample flagsAdrian Hunter1-0/+26
Add sample flags to python scripting. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-25perf script: Factor out perf_sample__sprintf_flags()Adrian Hunter1-0/+3
Factor out perf_sample__sprintf_flags() so it can be reused. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-25perf scripting python: Add 'addr_location' for 'addr'Adrian Hunter6-20/+29
If sample addr correlates to a symbol, add "addr_dso", "addr_symbol", and "addr_symoff" to python scripting. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-25perf scripting python: Factor out set_sym_in_dict()Adrian Hunter1-8/+17
Factor out set_sym_in_dict() so it can be reused. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-25perf scripting python: Fix tuple_set_u64()Adrian Hunter1-65/+81
tuple_set_u64() produces a signed value instead of an unsigned value. That works for database export but not other cases. Rename to tuple_set_d64() for database export and fix tuple_set_u64(). Fixes: df919b400ad3f ("perf scripting python: Extend interface to export data in a database-friendly way") Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-05-25perf auxtrace: Make perf_event__process_auxtrace*() callableArnaldo Carvalho de Melo1-3/+20
As we'll use it in the upcoming python interfaces and when built with: make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 +NO_LIBZSTD=1 NO_LIBCAP=1 NO_SYSCALL_TABLE=1 make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 +NO_SYSCALL_TABLE=1 BUILD: Doing 'make -j24' parallel build <SNIP> CC /tmp/tmp.rGrdpQlTCr/builtin-daemon.o In file included from util/events_stats.h:8, from util/evlist.h:12, from builtin-script.c:18: builtin-script.c: In function ‘process_auxtrace_error’: util/auxtrace.h:708:57: error: called object is not a function or function pointer 708 | #define perf_event__process_auxtrace_error 0 | ^ builtin-script.c:2443:16: note: in expansion of macro ‘perf_event__process_auxtrace_error’ 2443 | return perf_event__process_auxtrace_error(session, event); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ MKDIR /tmp/tmp.rGrdpQlTCr/tests/ MKDIR /tmp/tmp.rGrdpQlTCr/bench/ CC /tmp/tmp.rGrdpQlTCr/tests/builtin-test.o CC /tmp/tmp.rGrdpQlTCr/bench/sched-messaging.o builtin-script.c:2444:1: error: control reaches end of non-void function [-Werror=return-type] 2444 | } | ^ To: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-25perf script: Find script file relative to exec pathAdrian Hunter4-0/+5
Allow perf script to find a script in the exec path. Example: Before: $ perf record -a -e intel_pt/branch=0/ sleep 0.1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.954 MB perf.data ] $ perf script intel-pt-events.py 2>&1 | head -3 Error: Couldn't find script `intel-pt-events.py' See perf script -l for available scripts. $ perf script -s intel-pt-events.py 2>&1 | head -3 Can't open python script "intel-pt-events.py": No such file or directory $ perf script ~/libexec/perf-core/scripts/python/intel-pt-events.py 2>&1 | head -3 Error: Couldn't find script `/home/ahunter/libexec/perf-core/scripts/python/intel-pt-events.py' See perf script -l for available scripts. $ After: $ perf script intel-pt-events.py 2>&1 | head -3 Intel PT Power Events and PTWRITE perf 8123/8123 [000] 551.230753986 cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) perf 8123/8123 [001] 551.230808216 cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) $ perf script -s intel-pt-events.py 2>&1 | head -3 Intel PT Power Events and PTWRITE perf 8123/8123 [000] 551.230753986 cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) perf 8123/8123 [001] 551.230808216 cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) $ perf script ~/libexec/perf-core/scripts/python/intel-pt-events.py 2>&1 | head -3 Intel PT Power Events and PTWRITE perf 8123/8123 [000] 551.230753986 cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) perf 8123/8123 [001] 551.230808216 cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) $ Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-25Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo7-10/+32
To pick up fixes from perf/urgent. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-21perf stat: Skip evlist__[enable|disable] when all events uses BPFSong Liu1-3/+0
When all events of a perf-stat session use BPF, it is not necessary to call evlist__enable() and evlist__disable(). Skip them when all_counters_use_bpf is true. Signed-off-by: Song Liu <[email protected]> Reported-by: Jiri Olsa <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-21perf script: Add missing PERF_IP_FLAG_CHARS for VM-Entry and VM-ExitAdrian Hunter1-1/+1
Add 'g' (guest) for VM-Entry and 'h' (host) for VM-Exit. Fixes: c025d46cd932c ("perf script: Add branch types for VM-Entry and VM-Exit") Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-21perf parse-events: Check if the software events array slots are populatedArnaldo Carvalho de Melo1-2/+7
To avoid a NULL pointer dereference when the kernel supports the new feature but the tooling still hasn't an entry for it. This happened with the recently added PERF_COUNT_SW_CGROUP_SWITCHES software event. Reported-by: Thomas Richter <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Sumanth Korikkar <[email protected]> Link: https://lore.kernel.org/linux-perf-users/[email protected]/ Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-19perf tools: Add 'cgroup-switches' software eventNamhyung Kim2-0/+5
It counts how often cgroups are changed actually during the context switches. # perf stat -a -e context-switches,cgroup-switches -a sleep 1 Performance counter stats for 'system wide': 11,267 context-switches 10,950 cgroup-switches 1.015634369 seconds time elapsed Committer notes: The kernel patches landed in v5.13, but this entry wasn't filled in perf's parse-events tables, which was leading to a segfault when running 'perf list' on a kernel with that feature, as reported by Thomas Richter. Also removed the part touching tools/include/uapi/linux/perf_event.h as it was updated in the usual sync with the kernel UAPI headers, in a previous, already upstream, patch. Signed-off-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Richter <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-19perf intel-pt: Remove redundant setting of ptq->insn_lenAdrian Hunter1-1/+0
Remove redundant "ptq->insn_len = 0" statement. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-19perf intel-pt: Fix sample instruction bytesAdrian Hunter1-1/+4
The decoder reports the current instruction if it was decoded. In some cases the current instruction is not decoded, in which case the instruction bytes length must be set to zero. Ensure that is always done. Note perf script can anyway get the instruction bytes for any samples where they are not present. Also note, that there is a redundant "ptq->insn_len = 0" statement which is not removed until a subsequent patch in order to make this patch apply cleanly to stable branches. Example: A machne that supports TSX is required. It will have flag "rtm". Kernel parameter tsx=on may be required. # for w in `cat /proc/cpuinfo | grep -m1 flags `;do echo $w | grep rtm ; done rtm Test program: #include <stdio.h> #include <immintrin.h> int main() { int x = 0; if (_xbegin() == _XBEGIN_STARTED) { x = 1; _xabort(1); } else { printf("x = %d\n", x); } return 0; } Compile with -mrtm i.e. gcc -Wall -Wextra -mrtm xabort.c -o xabort Record: perf record -e intel_pt/cyc/u --filter 'filter main @ ./xabort' ./xabort Before: # perf script --itrace=xe -F+flags,+insn,-period --xed --ns xabort 1478 [007] 92161.431348581: transactions: x 400b81 main+0x14 (/root/xabort) mov $0xffffffff, %eax xabort 1478 [007] 92161.431348624: transactions: tx abrt 400b93 main+0x26 (/root/xabort) mov $0xffffffff, %eax After: # perf script --itrace=xe -F+flags,+insn,-period --xed --ns xabort 1478 [007] 92161.431348581: transactions: x 400b81 main+0x14 (/root/xabort) xbegin 0x6 xabort 1478 [007] 92161.431348624: transactions: tx abrt 400b93 main+0x26 (/root/xabort) xabort $0x1 Fixes: faaa87680b25d ("perf intel-pt/bts: Report instruction bytes and length in sample") Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-19perf intel-pt: Fix transaction abort handlingAdrian Hunter1-1/+5
When adding support for power events, some handling of FUP packets was unified. That resulted in breaking reporting of TSX aborts, by not considering the associated TIP packet. Fix that. Example: A machine that supports TSX is required. It will have flag "rtm". Kernel parameter tsx=on may be required. # for w in `cat /proc/cpuinfo | grep -m1 flags `;do echo $w | grep rtm ; done rtm Test program: #include <stdio.h> #include <immintrin.h> int main() { int x = 0; if (_xbegin() == _XBEGIN_STARTED) { x = 1; _xabort(1); } else { printf("x = %d\n", x); } return 0; } Compile with -mrtm i.e. gcc -Wall -Wextra -mrtm xabort.c -o xabort Record: perf record -e intel_pt/cyc/u --filter 'filter main @ ./xabort' ./xabort Before: # perf script --itrace=be -F+flags,+addr,-period,-event --ns xabort 1478 [007] 92161.431348552: tr strt 0 [unknown] ([unknown]) => 400b6d main+0x0 (/root/xabort) xabort 1478 [007] 92161.431348624: jmp 400b96 main+0x29 (/root/xabort) => 400bae main+0x41 (/root/xabort) xabort 1478 [007] 92161.431348624: return 400bb4 main+0x47 (/root/xabort) => 400b87 main+0x1a (/root/xabort) xabort 1478 [007] 92161.431348637: jcc 400b8a main+0x1d (/root/xabort) => 400b98 main+0x2b (/root/xabort) xabort 1478 [007] 92161.431348644: tr end call 400ba9 main+0x3c (/root/xabort) => 40f690 printf+0x0 (/root/xabort) xabort 1478 [007] 92161.431360859: tr strt 0 [unknown] ([unknown]) => 400bae main+0x41 (/root/xabort) xabort 1478 [007] 92161.431360882: tr end return 400bb4 main+0x47 (/root/xabort) => 401139 __libc_start_main+0x309 (/root/xabort) After: # perf script --itrace=be -F+flags,+addr,-period,-event --ns xabort 1478 [007] 92161.431348552: tr strt 0 [unknown] ([unknown]) => 400b6d main+0x0 (/root/xabort) xabort 1478 [007] 92161.431348624: tx abrt 400b93 main+0x26 (/root/xabort) => 400b87 main+0x1a (/root/xabort) xabort 1478 [007] 92161.431348637: jcc 400b8a main+0x1d (/root/xabort) => 400b98 main+0x2b (/root/xabort) xabort 1478 [007] 92161.431348644: tr end call 400ba9 main+0x3c (/root/xabort) => 40f690 printf+0x0 (/root/xabort) xabort 1478 [007] 92161.431360859: tr strt 0 [unknown] ([unknown]) => 400bae main+0x41 (/root/xabort) xabort 1478 [007] 92161.431360882: tr end return 400bb4 main+0x47 (/root/xabort) => 401139 __libc_start_main+0x309 (/root/xabort) Fixes: a472e65fc490a ("perf intel-pt: Add decoder support for ptwrite and power event packets") Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-19perf test: Fix libpfm4 support (63) test error for nested event groupsThomas Richter1-1/+10
Compiling perf with make LIBPFM4=1 includes libpfm support and enables test case 63 'Test libpfm4 support'. This test reports an error on all platforms for subtest 63.2 'test groups of --pfm-events'. The reported error message is 'nested event groups not supported' # ./perf test -F 63 63: Test libpfm4 support : 63.1: test of individual --pfm-events : Error: failed to parse event stereolab : event not found Error: failed to parse event stereolab,instructions : event not found Error: failed to parse event instructions,stereolab : event not found Ok 63.2: test groups of --pfm-events : Error: nested event groups not supported <------ Error message here Error: failed to parse event {stereolab} : event not found Error: failed to parse event {instructions,cycles},{instructions,stereolab} :\ event not found Ok # This patch addresses the error message 'nested event groups not supported'. The root cause is function parse_libpfm_events_option() which parses the event string '{},{instructions}' and can not handle a leading empty group notation '{},...'. The code detects the first (empty) group indicator '{' but does not terminate group processing on the following group closing character '}'. So when the second group indicator '{' is detected, the code assumes a nested group and returns an error. With the error message fixed, also change the expected event number to one for the test case to succeed. While at it also fix a memory leak. In good case the function does not free the duplicated string given as first parameter. Output after: # ./perf test -F 63 63: Test libpfm4 support : 63.1: test of individual --pfm-events : Error: failed to parse event stereolab : event not found Error: failed to parse event stereolab,instructions : event not found Error: failed to parse event instructions,stereolab : event not found Ok 63.2: test groups of --pfm-events : Error: failed to parse event {stereolab} : event not found Error: failed to parse event {instructions,cycles},{instructions,stereolab} : \ event not found Ok # Error message 'nested event groups not supported' is gone. Signed-off-by: Thomas Richter <[email protected]> Acked-By: Ian Rogers <[email protected]> Acked-by: Sumanth Korikkar <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-17perf cs-etm: Prevent and warn on underflows during timestamp calculation.James Clark1-11/+34
When a zero timestamp is encountered, warn once. This is to make hardware or configuration issues visible. Also suggest that the issue can be worked around with the --itrace=Z option. When an underflow with a non-zero timestamp occurs, warn every time. This is an unexpected scenario, and with increasing timestamps, it's unlikely that it would occur more than once, therefore it should be ok to warn every time. Only try to calculate the timestamp by subtracting the instruction count if neither of the above cases are true. This makes attempting to decode files with zero timestamps in non-timeless mode more consistent. Currently it can half work if the timestamp wraps around and becomes non-zero, although the behavior is undefined and unpredictable. Signed-off-by: James Clark <[email protected]> Reviewed-by: Leo Yan <[email protected]> Cc: Al Grant <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-17perf cs-etm: Start reading 'Z' --itrace optionJames Clark1-0/+4
Recently the 'Z' --itrace option was added to override detection of timeless decoding. This is also useful in Coresight to work around issues with invalid timestamps on some hardware. When the 'Z' option is provided, the existing timeless decoding mode will be used, even if timestamps were recorded. Signed-off-by: James Clark <[email protected]> Reviewed-by: Leo Yan <[email protected]> Cc: Al Grant <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-17perf cs-etm: Move synth_opts initialisationJames Clark1-8/+8
Move initialisation of synth_opts earlier in the function so that synth_opts can be used at an earlier stage in a later commit. Signed-off-by: James Clark <[email protected]> Reviewed-by: Leo Yan <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-17perf header: Support HYBRID_CPU_PMU_CAPS featureJin Yao4-19/+159
Perf has supported the CPU_PMU_CAPS feature to display a list of CPU PMU capabilities. But on a hybrid platform, it may have several CPU PMUs (such as "cpu_core" and "cpu_atom"). The CPU_PMU_CAPS feature is hard to extend to support multiple CPU PMUs well if it needs to be compatible for the case of old perf data file + new perf tool. So for better compatibility we now create a new feature HYBRID_CPU_PMU_CAPS in the header. For the perf.data generated on hybrid platform, root@otcpl-adl-s-2:~# perf report --header-only -I # cpu_core pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid # cpu_atom pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA For the perf.data generated on non-hybrid platform root@kbl-ppc:~# perf report --header-only -I # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA HYBRID_TOPOLOGY HYBRID_CPU_PMU_CAPS Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-17perf header: Support HYBRID_TOPOLOGY featureJin Yao7-0/+205
It is useful to let the user know about the hybrid topology. Add the HYBRID_TOPOLOGY feature in header to indicate the core CPUs and the atom CPUs. With this patch a perf.data generated on a hybrid platform reports the hybrid CPU list: root@otcpl-adl-s-2:~# perf report --header-only -I ... # hybrid cpu system: # cpu_core cpu list : 0-15 # cpu_atom cpu list : 16-23 For a perf.data generated on a non-hybrid platform, reports a message that HYBRID_TOPOLOGY is missing: root@kbl-ppc:~# perf report --header-only -I ... # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA HYBRID_TOPOLOGY Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf cs-etm: Set time on synthesised samples to preserve orderingJames Clark1-2/+13
The following attribute is set when synthesising samples in timed decoding mode: attr.sample_type |= PERF_SAMPLE_TIME; This results in new samples that appear to have timestamps but because we don't assign any timestamps to the samples, when the resulting inject file is opened again, the synthesised samples will be on the wrong side of the MMAP or COMM events. For example, this results in the samples being associated with the perf binary, rather than the target of the record: perf record -e cs_etm/@tmc_etr0/u top perf inject -i perf.data -o perf.inject --itrace=i100il perf report -i perf.inject Where 'Command' == perf should show as 'top': # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... ...................... ...................... .................. # 31.08% perf [unknown] [.] 0x000000000040c3f8 [.] 0x000000000040c3e8 - If the perf.data file is opened directly with perf, without the inject step, then this already works correctly because the events are synthesised after the COMM and MMAP events and no second sorting happens. Re-sorting only happens when opening the perf.inject file for the second time so timestamps are needed. Using the timestamp from the AUX record mirrors the current behaviour when opening directly with perf, because the events are generated on the call to cs_etm__process_queues(). The ETM trace could optionally contain time stamps, but there is no way to correlate this with the kernel time. So, the best available time value is that of the AUX_RECORD header. This patch uses the timestamp from the header for all the samples. The ordering of the samples are implicit in the trace and thus is fine with respect to relative ordering. Reviewed-by: Leo Yan <[email protected]> Reviewed-by: Mathieu Poirier <[email protected]> Co-developed-by: Al Grant <[email protected]> Signed-off-by: Al Grant <[email protected]> Signed-off-by: James Clark <[email protected]> Acked-by: Suzuki K Poulos <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf cs-etm: Refactor timestamp variable namesJames Clark3-33/+31
Remove ambiguity in variable names relating to timestamps. A later commit will save the sample kernel timestamp in one of the etm structs, so name all elements appropriately to avoid confusion. This is also removes some ambiguity arising from the fact that the --timestamp argument to perf record refers to sample kernel timestamps, and the /timestamp/ event modifier refers to CS timestamps, so the term is overloaded. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mathieu Poirier <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf stat: Use aggregated counts directlyNamhyung Kim2-16/+4
The ps->res_stats is for repeated runs, so the interval code should not touch it. Actually the aggregated counts are available in the counter->counts->aggr, so we can (and should) use it directly IMHO. No functional change intended. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jin Yao <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf intel-pt: Parse VM Time Correlation options and set up decodingAdrian Hunter1-1/+97
Add parsing and validation of VM Time Correlation options, and pass parameters to the decoder. Also update the Intel PT documentation accordingly. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf intel-pt: Add VM Time Correlation to decoderAdrian Hunter3-0/+696
VM Time Correlation means determining if each TSC packet belongs to a VM Guest or the Host. When the trace is "in context" that is indicated by the NR flag in the PIP packet. However, when tracing kernel-only, userspace only, or using address filters, the trace can be "out of context" in which case timing packets are produced but not PIP packets. Nevertheless, it is very unlikely the VM Guest timestamps will be in the same range as the Host timestamps. Host time ranges are established by a starting side-band event timestamp, and subsequently by the buffer timestamp, written when the buffer is copied to the perf.data file. This patch supports updating the VM Guest timestamp packets, assuming an unchanging (during perf record) VMX TSC Offset and no VMX TSC scaling. Furthermore, it is possible to determine what the VMX TSC Offset is, although not necessarily at the start. The dry-run option lets that information be determined so that the user can pass it to a subsequent run. For more detail, refer to the example in the Intel PT documentation in a subsequent patch. VM Time Correlation is also performed on the TSC value in PEBs-via-PT records. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf intel-pt: Better 7-byte timestamp wraparound logicAdrian Hunter1-3/+9
A timestamp should not go backwards. If it does it is assumed that the 7-byte TSC packet value has wrapped. Improve that logic so that it will not allow the timestamp to go past the buffer timestamp (which is recorded when the buffer is copied out) Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf intel-pt: Pass the first timestamp to the decoderAdrian Hunter3-0/+31
VM Time Correlation will use time ranges to determine whether a TSC packet belongs to the Host or Guest. To start, the first non-zero timestamp is needed. Pass that to the decoder. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf intel-pt: Add a tree for VMCS informationAdrian Hunter3-0/+79
Even when VMX TSC Offset is not changing (during perf record), different virtual machines can have different TSC Offsets. There is a Virtual Machine Control Structure (VMCS) for each virtual CPU, the address of which is reported to Intel PT in the VMCS packet. We do not know which VMCS belongs to which virtual machine, so use a tree to keep track of VMCS information. Then the decoder will be able to use the current VMCS value to look up the current TSC Offset. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf intel-pt: Let overlap detection handle VM timestampsAdrian Hunter3-6/+19
Intel PT timestamps are affected by virtualization. While TSC packets can still be considered to be unique, the TSC values need not be in order any more. Adjust the algorithm accordingly. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf auxtrace: Allow buffers to be mapped read / writeAdrian Hunter2-3/+8
To support in-place update, allow buffers to be mapped read / write. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf inject: Add --vm-time-correlation optionAdrian Hunter1-0/+6
Intel PT timestamps are affected by virtualization. Add a new option that will allow the Intel PT decoder to correlate the timestamps and translate the virtual machine timestamps to host timestamps. The advantages of making this a separate step, rather than a part of normal decoding are that it is simpler to implement, and it needs to be done only once. This patch adds only the option. Later patches add Intel PT support. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf inject: Add facility to do in place updateAdrian Hunter4-2/+13
When there is a need to modify only timestamps, it is much simpler and quicker to do it to the existing file rather than re-write all the contents. In preparation for that, add the ability to modify the input file in place. In practice that just means making the file descriptor and mmaps writable. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf intel-pt: Support Z itrace option for timeless decodingAdrian Hunter1-1/+1
Correlating virtual machine TSC packets is not supported at present, so instead support the Z itrace option. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf intel-pt: Move synth_opts initialization earlierAdrian Hunter1-15/+14
Move synth_opts initialization earlier, so it can be used earlier. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-12perf auxtrace: Add Z itrace option for timeless decodingAdrian Hunter2-0/+5
Issues correlating timestamps can be avoided with timeless decoding. Add an option for that, so that timeless decoding can be used even when timestamps are present. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-10perf tools: Fix dynamic libbpf linkJiri Olsa1-0/+7
Justin reported broken build with LIBBPF_DYNAMIC=1. When linking libbpf dynamically we need to use perf's hashmap object, because it's not exported in libbpf.so (only in libbpf.a). Following build is now passing: $ make LIBBPF_DYNAMIC=1 BUILD: Doing 'make -j8' parallel build ... $ ldd perf | grep libbpf libbpf.so.0 => /lib64/libbpf.so.0 (0x00007fa7630db000) Fixes: eee19501926d ("perf tools: Grab a copy of libbpf's hashmap") Reported-by: Justin M. Forbes <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-10perf session: Fix swapping of cpu_map and stat_config recordsDmitry Koshelev1-2/+2
'data' field in perf_record_cpu_map_data struct is 16-bit wide and so should be swapped using bswap_16(). 'nr' field in perf_record_stat_config struct should be swapped before being used for size calculation. Signed-off-by: Dmitry Koshelev <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-10perf record: Disallow -c and -F option at the same timeNamhyung Kim1-1/+7
It's confusing which one is effective when the both options are given. The current code happens to use -c in this case but users might not be aware of it. We can change it to complain about that instead of relying on the implicit priority. Before: $ perf record -c 111111 -F 99 true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.031 MB perf.data (8 samples) ] $ perf evlist -F cycles: sample_period=111111 $ After: $ perf record -c 111111 -F 99 true cannot set frequency and period at the same time $ So this change can break existing usages, but I think it's rare to have both options and it'd be better changing them. Suggested-by: Alexey Alexandrov <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-05-01Merge tag 'perf-tools-for-v5.13-2021-04-29' of ↵Linus Torvalds90-300/+2588
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tool updates from Arnaldo Carvalho de Melo: "perf stat: - Add support for hybrid PMUs to support systems such as Intel Alderlake and its BIG/little core/atom cpus. - Introduce 'bperf' to share hardware PMCs with BPF. - New --iostat option to collect and present IO stats on Intel hardware. This functionality is based on recently introduced sysfs attributes for Intel® Xeon® Scalable processor family (code name Skylake-SP) in commit bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping") It is intended to provide four I/O performance metrics in MB per each PCIe root port: - Inbound Read: I/O devices below root port read from the host memory - Inbound Write: I/O devices below root port write to the host memory - Outbound Read: CPU reads from I/O devices below root port - Outbound Write: CPU writes to I/O devices below root port - Align CSV output for summary. - Clarify --null use cases: Assess raw overhead of 'perf stat' or measure just wall clock time. - Improve readability of shadow stats. perf record: - Change the COMM when starting tha workload so that --exclude-perf doesn't seem to be not honoured. - Improve 'Workload failed' message printing events + what was exec'ed. - Fix cross-arch support for TIME_CONV. perf report: - Add option to disable raw event ordering. - Dump the contents of PERF_RECORD_TIME_CONV in 'perf report -D'. - Improvements to --stat output, that shows information about PERF_RECORD_ events. - Preserve identifier id in OCaml demangler. perf annotate: - Show full source location with 'l' hotkey in the 'perf annotate' TUI. - Add line number like in TUI and source location at EOL to the 'perf annotate' --stdio mode. - Add --demangle and --demangle-kernel to 'perf annotate'. - Allow configuring annotate.demangle{,_kernel} in 'perf config'. - Fix sample events lost in stdio mode. perf data: - Allow converting a perf.data file to JSON. libperf: - Add support for user space counter access. - Update topdown documentation to permit rdpmc calls. perf test: - Add 'perf test' for 'perf stat' CSV output. - Add 'perf test' entries to test the hybrid PMU support. - Cleanup 'perf test daemon' if its 'perf test' is interrupted. - Handle metric reuse in pmu-events parsing 'perf test' entry. - Add test for PE executable support. - Add timeout for wait for daemon start in its 'perf test' entries. Build: - Enable libtraceevent dynamic linking. - Improve feature detection output. - Fix caching of feature checks caching. - First round of updates for tools copies of kernel headers. - Enable warnings when compiling BPF programs. Vendor specific events: - Intel: - Add missing skylake & icelake model numbers. - arm64: - Add Hisi hip08 L1, L2 and L3 metrics. - Add Fujitsu A64FX PMU events. - PowerPC: - Initial JSON/events list for power10 platform. - Remove unsupported power9 metrics. - AMD: - Add Zen3 events. - Fix broken L2 Cache Hits from L2 HWPF metric. - Use lowercases for all the eventcodes and umasks. Hardware tracing: - arm64: - Update CoreSight ETM metadata format. - Fix bitmap for CS-ETM option. - Support PID tracing in config. - Detect pid in VMID for kernel running at EL2. Arch specific updates: - MIPS: - Support MIPS unwinding and dwarf-regs. - Generate mips syscalls_n64.c syscall table. - PowerPC: - Add support for PERF_SAMPLE_WEIGH_STRUCT on PowerPC. - Support pipeline stage cycles for powerpc. libbeauty: - Fix fsconfig generator" * tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (132 commits) perf build: Defer printing detected features to the end of all feature checks tools build: Allow deferring printing the results of feature detection perf build: Regenerate the FEATURE_DUMP file after extra feature checks perf session: Dump PERF_RECORD_TIME_CONV event perf session: Add swap operation for event TIME_CONV perf jit: Let convert_timestamp() to be backwards-compatible perf tools: Change fields type in perf_record_time_conv perf tools: Enable libtraceevent dynamic linking perf Documentation: Document intel-hybrid support perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid perf tests: Support 'Convert perf time to TSC' test for hybrid perf tests: Support 'Session topology' test for hybrid perf tests: Support 'Parse and process metrics' test for hybrid perf tests: Support 'Track with sched_switch' test for hybrid perf tests: Skip 'Setup struct perf_event_attr' test for hybrid perf tests: Add hybrid cases for 'Roundtrip evsel->name' test perf tests: Add hybrid cases for 'Parse event definition strings' test perf record: Uniquify hybrid event name perf stat: Warn group events from different hybrid PMU perf stat: Filter out unmatched aggregation for hybrid event ...
2021-04-29perf session: Dump PERF_RECORD_TIME_CONV eventLeo Yan3-1/+46
Now perf tool uses the common stub function process_event_op2_stub() for dumping TIME_CONV event, thus it doesn't output the clock parameters contained in the event. This patch adds the callback function for dumping the hardware clock parameters in TIME_CONV event. Before: # perf report -D 0x978 [0x38]: event: 79 . . ... raw event: size 56 bytes . 0000: 4f 00 00 00 00 00 38 00 15 00 00 00 00 00 00 00 O.....8......... . 0010: 00 00 40 01 00 00 00 00 86 89 0b bf df ff ff ff ..@........<BF><DF><FF><FF><FF> . 0020: d1 c1 b2 39 03 00 00 00 ff ff ff ff ff ff ff 00 <D1><C1><B2>9....<FF><FF><FF><FF><FF><FF><FF>. . 0030: 01 01 00 00 00 00 00 00 ........ 0 0 0x978 [0x38]: PERF_RECORD_TIME_CONV : unhandled! [...] After: # perf report -D 0x978 [0x38]: event: 79 . . ... raw event: size 56 bytes . 0000: 4f 00 00 00 00 00 38 00 15 00 00 00 00 00 00 00 O.....8......... . 0010: 00 00 40 01 00 00 00 00 86 89 0b bf df ff ff ff ..@........<BF><DF><FF><FF><FF> . 0020: d1 c1 b2 39 03 00 00 00 ff ff ff ff ff ff ff 00 <D1><C1><B2>9....<FF><FF><FF><FF><FF><FF><FF>. . 0030: 01 01 00 00 00 00 00 00 ........ 0 0 0x978 [0x38]: PERF_RECORD_TIME_CONV ... Time Shift 21 ... Time Muliplier 20971520 ... Time Zero 18446743935180835206 ... Time Cycles 13852918225 ... Time Mask 0xffffffffffffff ... Cap Time Zero 1 ... Cap Time Short 1 : unhandled! [...] Signed-off-by: Leo Yan <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steve MacLean <[email protected]> Cc: Yonatan Goldschmidt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf session: Add swap operation for event TIME_CONVLeo Yan1-1/+14
Since commit d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV"), the event PERF_RECORD_TIME_CONV has extended the data structure for clock parameters. To be backwards-compatible, this patch adds a dedicated swap operation for the event PERF_RECORD_TIME_CONV, based on checking if the event contains field "time_cycles", it can support both for the old and new event formats. Fixes: d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV") Signed-off-by: Leo Yan <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steve MacLean <[email protected]> Cc: Yonatan Goldschmidt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf jit: Let convert_timestamp() to be backwards-compatibleLeo Yan1-10/+20
Commit d110162cafc80dad ("perf tsc: Support cap_user_time_short for event TIME_CONV") supports the extended parameters for event TIME_CONV, but it broke the backwards compatibility, so any perf data file with old event format fails to convert timestamp. This patch introduces a helper event_contains() to check if an event contains a specific member or not. For the backwards-compatibility, if the event size confirms the extended parameters are supported in the event TIME_CONV, then copies these parameters. Committer notes: To make this compiler backwards compatible add this patch: - struct perf_tsc_conversion tc = { 0 }; + struct perf_tsc_conversion tc = { .time_shift = 0, }; Fixes: d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV") Signed-off-by: Leo Yan <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steve MacLean <[email protected]> Cc: Yonatan Goldschmidt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf stat: Warn group events from different hybrid PMUJin Yao5-0/+58
If a group has events which are from different hybrid PMUs, shows a warning: "WARNING: events in group from different hybrid PMUs!" This is to remind the user not to put the core event and atom event into one group. Next, just disable grouping. # perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -a -- sleep 1 WARNING: events in group from different hybrid PMUs! WARNING: grouped events cpus do not match, disabling group: anon group { cpu_core/cycles/, cpu_atom/cycles/ } Performance counter stats for 'system wide': 5,438,125 cpu_core/cycles/ 3,914,586 cpu_atom/cycles/ 1.004250966 seconds time elapsed Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf stat: Filter out unmatched aggregation for hybrid eventJin Yao1-0/+3
perf-stat has supported some aggregation modes, such as --per-core, --per-socket and etc. While for hybrid event, it may only available on part of cpus. So for --per-core, we need to filter out the unavailable cores, for --per-socket, filter out the unavailable sockets, and so on. Before: # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 2 479,530 cpu_core/cycles/ S0-D0-C4 2 175,007 cpu_core/cycles/ S0-D0-C8 2 166,240 cpu_core/cycles/ S0-D0-C12 2 704,673 cpu_core/cycles/ S0-D0-C16 2 865,835 cpu_core/cycles/ S0-D0-C20 2 2,958,461 cpu_core/cycles/ S0-D0-C24 2 163,988 cpu_core/cycles/ S0-D0-C28 2 164,729 cpu_core/cycles/ S0-D0-C32 0 <not counted> cpu_core/cycles/ S0-D0-C33 0 <not counted> cpu_core/cycles/ S0-D0-C34 0 <not counted> cpu_core/cycles/ S0-D0-C35 0 <not counted> cpu_core/cycles/ S0-D0-C36 0 <not counted> cpu_core/cycles/ S0-D0-C37 0 <not counted> cpu_core/cycles/ S0-D0-C38 0 <not counted> cpu_core/cycles/ S0-D0-C39 0 <not counted> cpu_core/cycles/ 1.003597211 seconds time elapsed After: # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 2 210,428 cpu_core/cycles/ S0-D0-C4 2 444,830 cpu_core/cycles/ S0-D0-C8 2 435,241 cpu_core/cycles/ S0-D0-C12 2 423,976 cpu_core/cycles/ S0-D0-C16 2 859,350 cpu_core/cycles/ S0-D0-C20 2 1,559,589 cpu_core/cycles/ S0-D0-C24 2 163,924 cpu_core/cycles/ S0-D0-C28 2 376,610 cpu_core/cycles/ 1.003621290 seconds time elapsed Signed-off-by: Jin Yao <[email protected]> Co-developed-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf record: Create two hybrid 'cycles' events by defaultJin Yao6-5/+62
When evlist is empty, for example no '-e' specified in perf record, one default 'cycles' event is added to evlist. While on hybrid platform, it needs to create two default 'cycles' events. One is for cpu_core, the other is for cpu_atom. This patch actually calls evsel__new_cycles() two times to create two 'cycles' events. # ./perf record -vv -a -- sleep 1 ... ------------------------------------------------------------ perf_event_attr: size 120 config 0x400000000 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 freq 1 precise_ip 3 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 6 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid -1 cpu 8 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 9 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 10 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 11 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 12 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 13 group_fd -1 flags 0x8 = 19 sys_perf_event_open: pid -1 cpu 14 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 21 ------------------------------------------------------------ perf_event_attr: size 120 config 0x800000000 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 freq 1 precise_ip 3 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid -1 cpu 17 group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid -1 cpu 18 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid -1 cpu 19 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid -1 cpu 20 group_fd -1 flags 0x8 = 26 sys_perf_event_open: pid -1 cpu 21 group_fd -1 flags 0x8 = 27 sys_perf_event_open: pid -1 cpu 22 group_fd -1 flags 0x8 = 28 sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 29 ------------------------------------------------------------ We have to create evlist-hybrid.c otherwise due to the symbol dependency the perf test python would be failed. Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf parse-events: Support event inside hybrid pmuJin Yao1-0/+63
On hybrid platform, user may want to enable events on one pmu. Following syntax are supported: cpu_core/<event>/ cpu_atom/<event>/ But the syntax doesn't work for cache event. Before: # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1 event syntax error: 'cpu_core/LLC-loads/' \___ unknown term 'LLC-loads' for pmu 'cpu_core' Cache events are a bit complex. We can't create aliases for them. We use another solution. For example, if we use "cpu_core/LLC-loads/", in parse_events_add_pmu(), term->config is "LLC-loads". Then we create a new parser to scan "LLC-loads". The parse_events_add_cache() would be called during parsing. The parse_state->hybrid_pmu_name is used to identify the pmu where the event should be enabled on. After: # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1 Performance counter stats for 'system wide': 24,593 cpu_core/LLC-loads/ 1.003911601 seconds time elapsed Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>