Age | Commit message (Collapse) | Author | Files | Lines |
|
Commit b91e5492f9d7ca89 ("perf record: Add a dummy event on hybrid
systems to collect metadata records") adds a dummy event on hybrid
systems to fix the symbol "unknown" issue when the workload is created
in a P-core but runs on an E-core. The added dummy event will cause
"perf script -F iregs" to fail. Dummy events do not have "iregs"
attribute set, so when we do evsel__check_attr, the "iregs" attribute
check will fail, so the issue happened.
The following commit [1] has fixed a similar issue by skipping the attr
check for the dummy event because it does not have any samples anyway. It
works okay for the normal mode, but the issue still happened when running
the test in the pipe mode. In the pipe mode, it calls process_attr() which
still checks the attr for the dummy event. This commit fixed the issue by
skipping the attr check for the dummy event in the API evsel__check_attr,
Otherwise, we have to patch everywhere when evsel__check_attr() is called.
Before:
#./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
Samples for 'dummy:HG' event do not have IREGS attribute set. Cannot print 'iregs' field.
0x120 [0x90]: failed to process type: 64
#
After:
# ./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
ABI:2 CX:0x55b8efa87000 DX:0x55b8efa7e000 DI:0xffffba5e625efbb0 R8:0xffff90e51f8ae100
ABI:2 CX:0x7f1dae1e4000 DX:0xd0 DI:0xffff90e18c675ac0 R8:0x71
ABI:2 CX:0xcc0 DX:0x1 DI:0xffff90e199880240 R8:0x0
ABI:2 CX:0xffff90e180dd7500 DX:0xffff90e180dd7500 DI:0xffff90e180043500 R8:0x1
ABI:2 CX:0x50 DX:0xffff90e18c583bd0 DI:0xffff90e1998803c0 R8:0x58
#
[1]https://lore.kernel.org/lkml/[email protected]/
Fixes: b91e5492f9d7ca89 ("perf record: Add a dummy event on hybrid systems to collect metadata records")
Suggested-by: Namhyung Kim <[email protected]>
Signed-off-by: Xing Zhengjun <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Hongtao Yu reported problem when displaying uregs in perf script
for system wide perf.data:
# perf script -F uregs | head -10
Samples for 'dummy:HG' event do not have UREGS attribute set. Cannot print 'uregs' field.
The problem is the extra dummy event added for system wide,
which does not have proper sample_type setup.
Skipping attr check completely for dummy event as suggested
by Namhyung, because it does not have any samples anyway.
Reported-by: Hongtao Yu <[email protected]>
Suggested-by: Namhyung Kim <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Delete the repeated word "from" in code.
Signed-off-by: shaomin Deng <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add fields machine_pid and vcpu. These are displayed only if machine_pid is
non-zero.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When reviewing the results of perf inject, it is useful to be able to see
the events in the order they appear in the file.
So add --dump-unsorted-raw-trace option to do an unsorted dump.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When the -D option is used, the details of thread-map, cpu-map and
event-update events are not currently dumped. Add prints so that they are.
Example:
# perf record --kcore sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.021 MB perf.data (7 samples) ]
# perf script -D | grep 'THREAD\|CPU'
0 0x4950 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 35116
0 0x4978 [0x20]: PERF_RECORD_CPU_MAP: 0-7
# perf script -D | grep -A4 'UPDATE'
0 0x4920 [0x30]: PERF_RECORD_EVENT_UPDATE
... id: 147
... 0-7
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add an option to indicate that guest code can be found in the hypervisor
process.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It can be convenient to put a string value into a ptwrite payload as
a quick and easy way to identify what is being printed.
To make that useful, if the Intel ptwrite payload value contains only
printable ASCII characters padded with NULLs, then print it also as a
string.
Using the example program from the "Emulated PTWRITE" section of
tools/perf/Documentation/perf-intel-pt.txt:
$ echo -n "Hello" | od -t x8
0000000 0000006f6c6c6548
0000005
$ perf record -e intel_pt//u ./eg_ptw 0x0000006f6c6c6548
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data ]
$ perf script --itrace=ew
eg_ptw 35563 [005] 18256.087338: ptwrite: IP: 0 payload: 0x6f6c6c6548 Hello 55e764db5196 perf_emulate_ptwrite+0x16 (/home/user/eg_ptw)
$
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If use command 'perf script -F,+data_src' to dump memory samples with
Arm SPE trace data, it reports error:
# perf script -F,+data_src
Samples for 'dummy:u' event do not have DATA_SRC attribute set. Cannot print 'data_src' field.
This is because the 'dummy:u' event is absent DATA_SRC bit in its sample
type, so if a file contains AUX area tracing data then always allow
field 'data_src' to be selected as an option for perf script.
Fixes: e55ed3423c1bb29f ("perf arm-spe: Synthesize memory event")
Signed-off-by: Leo Yan <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We support short command 'rec*' for 'record' and 'rep*' for 'report' in
lots of sub-commands, but the matching is not quite strict currnetly.
It may be puzzling sometime, like we mis-type a 'recport' to report but
it will perform 'record' in fact without any message.
To fix this, add a check to ensure that the short cmd is valid prefix
of the real command.
Committer testing:
[root@quaco ~]# perf c2c re sleep 1
Usage: perf c2c {record|report}
-v, --verbose be more verbose (show counter open errors, etc)
# perf c2c rec sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.038 MB perf.data (16 samples) ]
# perf c2c recport sleep 1
Usage: perf c2c {record|report}
-v, --verbose be more verbose (show counter open errors, etc)
# perf c2c record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.038 MB perf.data (15 samples) ]
# perf c2c records sleep 1
Usage: perf c2c {record|report}
-v, --verbose be more verbose (show counter open errors, etc)
#
Signed-off-by: Wei Li <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Hanjun Guo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Rui Xiang <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When analyzing with 'perf script', it's useful to understand the
captured instruction and the next sequential instruction.
To calculate the address of the next sequential instruction, the length
of the captured instruction is required.
For example, you can’t know the next sequential instruction after an
unconditional branch unless you calculate that based on its length.
For branch stacks, 'perf script' only prints the instruction bytes with
'brstackinsn', but lacks the instruction length.
Add 'brstackinsnlen' to print the instruction length.
$ perf script -F ip,brstackinsn,brstackinsnlen --xed
7fa555be8f75
_start:
00007fa555be8090 mov %rsp, %rdi ilen: 3
00007fa555be8093 callq 0x7fa555be8ea0 ilen: 5 # PRED 102 cycles [102] 0.02 IPC
_dl_start+38:
00007fa555be8ec6 movq %rdx,0x227853(%rip) ilen: 7
00007fa555be8ecd leaq 0x227f94(%rip),%rdx ilen: 7
Signed-off-by: Kan Liang <[email protected]>
Cc: Ahmad Yasin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Added the new field to tools/perf/Documentation/perf-script.txt ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up fixes that went thru perf/urgent.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The type info is saved when using '-j save_type'. Output this in 'perf
script' so it can be accessed by other tools or for debugging.
It's appended to the end of the list of fields so any existing tools
that split on / and access fields via an index are not affected. Also
output '-' instead of 'N/A' when the branch type isn't saved because /
is used as a field separator.
Entries before this change look like this:
0xaaaadb350838/0xaaaadb3507a4/P/-/-/0
And afterwards like this:
0xaaaadb350838/0xaaaadb3507a4/P/-/-/0/CALL
or this if no type info is saved:
0x7fb57586df6b/0x7fb5758731f0/P/-/-/143/-
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Remove duplicate code so that future changes to flags are always made to
all 3 printing variations.
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In SPE traces the 'weight' field can't be printed in 'perf script'
because the 'dummy:u' event doesn't have the WEIGHT attribute set.
Use evsel__do_check_stype(..) to check this field, as it's done with
other fields such as "phys_addr".
Before:
$ perf record -e arm_spe_0// -- sleep 1
$ perf script -F event,ip,weight
Samples for 'dummy:u' event do not have WEIGHT attribute set. Cannot print 'weight' field.
After:
$ perf script -F event,ip,weight
l1d-access: 12 ffffaf629d4cb320
tlb-access: 12 ffffaf629d4cb320
memory: 12 ffffaf629d4cb320
Fixes: b0fde9c6e291e528 ("perf arm-spe: Add SPE total latency as PERF_SAMPLE_WEIGHT")
Signed-off-by: German Gomez <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Amend the display to include D and t flags in the same way as the x flag.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Similar to other Intel PT synth events, display changes to the interrupt
flag represented by the MODE.Exec packet.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
synthesized event
Similar to other Intel PT synth events, display Event Trace events recorded
by CFE / EVD packets.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Perf script was failed to print the phys_addr for SPE profiling.
One 'dummy' event is added by SPE profiling but it doesn't have PHYS_ADDR
attribute set, perf script then exits with error.
Now referring to 'addr', use evsel__do_check_stype() to check the type.
Before:
# perf record -e arm_spe_0/branch_filter=0,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,\
store_filter=0,min_latency=0,event_filter=2/ -p 4064384 -- sleep 3
# perf script -F pid,tid,addr,phys_addr
Samples for 'dummy:u' event do not have PHYS_ADDR attribute set. Cannot print 'phys_addr' field.
After:
# perf record -e arm_spe_0/branch_filter=0,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,\
store_filter=0,min_latency=0,event_filter=2/ -p 4064384 -- sleep 3
# perf script -F pid,tid,addr,phys_addr
4064384/4064384 ffff802f921be0d0 2f921be0d0
4064384/4064384 ffff802f921be0d0 2f921be0d0
Reviewed-by: German Gomez <[email protected]>
Signed-off-by: Yao Jin <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Hanjun Guo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Wei Li <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
A common problem is confusing CPU map indices with the CPU, by wrapping
the CPU with a struct then this is avoided. This approach is similar to
atomic_t.
Committer notes:
To make it build with BUILD_BPF_SKEL=1 these files needed the
conversions to 'struct perf_cpu' usage:
tools/perf/util/bpf_counter.c
tools/perf/util/bpf_counter_cgroup.c
tools/perf/util/bpf_ftrace.c
Also perf_env__get_cpu() was removed back in "perf cpumap: Switch
cpu_map__build_map to cpu function".
Additionally these needed to be fixed for the ARM builds to complete:
tools/perf/arch/arm/util/cs-etm.c
tools/perf/arch/arm64/util/pmu.c
Suggested-by: John Garry <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Vineet Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf_counts are accessed by the densely packed index.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Vineet Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use perf_cpu_map__for_each_cpu() to help with readability.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Vineet Singh <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up fixes.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
CPU filtering was not being applied to a script's switch events.
Fixes: 5bf83c29a0ad2e78 ("perf script: Add scripting operation process_switch()")
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Enable dwarf_callchain_users on arm64 which will be needed to do a
DWARF unwind in order to get the caller of the leaf frame.
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Alexandre Truong <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: John Garry <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: German Gomez <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Refactoring script__setup_sample_type() by using callchain_param_setup()
to replace the duplicate code for callchain parameter setting up.
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Alexandre Truong <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: John Garry <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: German Gomez <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When reading a perf.data file with register values, there is a mismatch
between the names and the values of the registers because the tool is
built using only the register names from the local architecture.
Reading a perf.data file that was recorded on ARM64, gives the following
erroneous output on an X86 machine:
# perf report -i perf_arm64.data -D
[...]
24661932634451 0x698 [0x21d0]: PERF_RECORD_SAMPLE(IP, 0x1): 43239/43239: 0xffffc5be8f100f98 period: 1 addr: 0
... user regs: mask 0x1ffffffff ABI 64-bit
.... AX 0x0000ffffd1515817
.... BX 0x0000ffffd1515480
.... CX 0x0000aaaadabf6c80
.... DX 0x000000000000002e
.... SI 0x0000000040100401
.... DI 0x0040600200000080
.... BP 0x0000ffffd1510e10
.... SP 0x0000000000000000
.... IP 0x00000000000000dd
.... FLAGS 0x0000ffffd1510cd0
.... CS 0x0000000000000000
.... SS 0x0000000000000030
.... DS 0x0000ffffa569a208
.... ES 0x0000000000000000
.... FS 0x0000000000000000
.... GS 0x0000000000000000
.... R8 0x0000aaaad3de9650
.... R9 0x0000ffffa57397f0
.... R10 0x0000000000000001
.... R11 0x0000ffffa57fd000
.... R12 0x0000ffffd1515817
.... R13 0x0000ffffd1515480
.... R14 0x0000aaaadabf6c80
.... R15 0x0000000000000000
.... unknown 0x0000000000000001
.... unknown 0x0000000000000000
.... unknown 0x0000000000000000
.... unknown 0x0000000000000000
.... unknown 0x0000000000000000
.... unknown 0x0000ffffd1510d90
.... unknown 0x0000ffffa5739b90
.... unknown 0x0000ffffd1510d80
.... XMM0 0x0000ffffa57392c8
... thread: perf-exec:43239
...... dso: [kernel.kallsyms]
As can be seen, the register names correspond to X86 registers, even
though the perf.data file was recorded on an ARM64 system. After this
patch, the output of the command displays the correct register names:
# perf report -i perf_arm64.data -D
[...]
24661932634451 0x698 [0x21d0]: PERF_RECORD_SAMPLE(IP, 0x1): 43239/43239: 0xffffc5be8f100f98 period: 1 addr: 0
... user regs: mask 0x1ffffffff ABI 64-bit
.... x0 0x0000ffffd1515817
.... x1 0x0000ffffd1515480
.... x2 0x0000aaaadabf6c80
.... x3 0x000000000000002e
.... x4 0x0000000040100401
.... x5 0x0040600200000080
.... x6 0x0000ffffd1510e10
.... x7 0x0000000000000000
.... x8 0x00000000000000dd
.... x9 0x0000ffffd1510cd0
.... x10 0x0000000000000000
.... x11 0x0000000000000030
.... x12 0x0000ffffa569a208
.... x13 0x0000000000000000
.... x14 0x0000000000000000
.... x15 0x0000000000000000
.... x16 0x0000aaaad3de9650
.... x17 0x0000ffffa57397f0
.... x18 0x0000000000000001
.... x19 0x0000ffffa57fd000
.... x20 0x0000ffffd1515817
.... x21 0x0000ffffd1515480
.... x22 0x0000aaaadabf6c80
.... x23 0x0000000000000000
.... x24 0x0000000000000001
.... x25 0x0000000000000000
.... x26 0x0000000000000000
.... x27 0x0000000000000000
.... x28 0x0000000000000000
.... x29 0x0000ffffd1510d90
.... lr 0x0000ffffa5739b90
.... sp 0x0000ffffd1510d80
.... pc 0x0000ffffa57392c8
... thread: perf-exec:43239
...... dso: [kernel.kallsyms]
Tester comments:
Athira reports:
"Looks good to me. Tested this patchset in powerpc by capturing regs in
powerpc and doing perf report to read the data from x86."
Reported-by: Alexandre Truong <[email protected]>
Reviewed-by: Athira Jajeev <[email protected]>
Signed-off-by: German Gomez <[email protected]>
Tested-by: Athira Jajeev <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Only perf report checked the validity of these arguments so apply the
same check to all tools that read them for consistency.
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Denis Nikitin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up fixes.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
-F weight in perf script is broken.
# ./perf mem record
# ./perf script -F weight
Samples for 'dummy:HG' event do not have WEIGHT attribute set. Cannot
print 'weight' field.
The sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the
PERF_SAMPLE_WEIGHT sample type. They share the same space, weight. The
lower 32 bits are exactly the same for both sample type. The higher 32
bits may be different for different architecture. For a new kernel on
x86, the PERF_SAMPLE_WEIGHT_STRUCT is used. For an old kernel or other
ARCHs, the PERF_SAMPLE_WEIGHT is used.
With -F weight, current perf script will only check the input string
"weight" with the PERF_SAMPLE_WEIGHT sample type. Because the commit
ea8d0ed6eae3 ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT") didn't
update the PERF_SAMPLE_WEIGHT_STRUCT sample type for perf script. For a
new kernel on x86, the check fails.
Use PERF_SAMPLE_WEIGHT_TYPE, which supports both sample types, to
replace PERF_SAMPLE_WEIGHT
Fixes: ea8d0ed6eae37b01 ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT")
Reported-by: Joe Mario <[email protected]>
Reviewed-by: Kajol Jain <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Tested-by: Joe Mario <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Joe Mario <[email protected]>
Cc: Andi Kleen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When perf.data is not written cleanly, we would like to process existing
data as much as possible (please see f_header.data.size == 0 condition
in perf_session__read_header). However, perf.data with partial data may
crash perf. Specifically, we see crash in 'perf script' for NULL
session->header.env.arch.
Fix this by checking session->header.env.arch before using it to determine
native_arch. Also split the if condition so it is easier to read.
Committer notes:
If it is a pipe, we already assume is a native arch, so no need to check
session->header.env.arch.
Signed-off-by: Song Liu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The instruction latency information can be recorded on
some platforms, e.g., the Intel Sapphire Rapids server. With both memory
latency (weight) and the new instruction latency information, users can
easily locate the expensive load instructions, and also understand the time
spent in different stages. The users can optimize their applications in
different pipeline stages.
Add a new field "ins_lat" to filter the instruction latency information,
which is available with sample type PERF_SAMPLE_WEIGHT_STRUCT.
Signed-off-by: Kan Liang <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joe Mario <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
set_print_ip_opts() was not being called when type != attr->type
because there is not a one-to-one relationship between output types
and attr->type. That resulted in ip not printing.
The attr_type() function is removed, and the match of attr->type to
output type is corrected.
Example on ADL using taskset to select an atom cpu:
# perf record -e cpu_atom/cpu-cycles/ taskset 0x1000 uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (7 samples) ]
Before:
# perf script | head
taskset 428 [-01] 10394.179041: 1 cpu_atom/cpu-cycles/:
taskset 428 [-01] 10394.179043: 1 cpu_atom/cpu-cycles/:
taskset 428 [-01] 10394.179044: 11 cpu_atom/cpu-cycles/:
taskset 428 [-01] 10394.179045: 407 cpu_atom/cpu-cycles/:
taskset 428 [-01] 10394.179046: 16789 cpu_atom/cpu-cycles/:
taskset 428 [-01] 10394.179052: 676300 cpu_atom/cpu-cycles/:
uname 428 [-01] 10394.179278: 4079859 cpu_atom/cpu-cycles/:
After:
# perf script | head
taskset 428 10394.179041: 1 cpu_atom/cpu-cycles/: ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
taskset 428 10394.179043: 1 cpu_atom/cpu-cycles/: ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
taskset 428 10394.179044: 11 cpu_atom/cpu-cycles/: ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
taskset 428 10394.179045: 407 cpu_atom/cpu-cycles/: ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
taskset 428 10394.179046: 16789 cpu_atom/cpu-cycles/: ffffffff95a0bb97 __intel_pmu_enable_all.constprop.48+0x47 ([kernel.kallsyms])
taskset 428 10394.179052: 676300 cpu_atom/cpu-cycles/: 7f829ef73800 cfree+0x0 (/lib/libc-2.32.so)
uname 428 10394.179278: 4079859 cpu_atom/cpu-cycles/: ffffffff95bae912 vma_interval_tree_remove+0x1f2 ([kernel.kallsyms])
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf_events may sometimes throttle an event due to creating too many
samples during a given timer tick.
As of now, the perf tool will not report on throttling, which means this
is a silent error.
Implement a callback for the throttle and unthrottle events within the
Python scripting engine, which can allow scripts to detect and report
when events may have been lost due to throttling.
The simplest script to report throttle events is:
def throttle(*args):
print("throttle" + repr(args))
def unthrottle(*args):
print("unthrottle" + repr(args))
Signed-off-by: Stephen Brennan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
machine_resolve() may have already been called. Test for that to avoid
calling it again unnecessarily.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https //lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The repipe argument is only used by perf inject and the all others
passes 'false'. Let's remove it from the function signature and add
__perf_session__new() to be called from perf inject directly.
This is a preparation of the change the pipe input/output.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
[ Fixed up some trivial conflicts as this patchset fell thru the cracks ;-( ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
ASan reports several memory leaks while running:
# perf test "82: Use vfs_getname probe to get syscall args filenames"
Two of these are caused by some refcounts not being decreased on
perf-script exit, namely script.threads and script.cpus.
This patch adds the missing __put calls in a new perf_script__exit
function, which is called at the end of cmd_script.
This patch concludes the fixes of all remaining memory leaks in perf
test "82: Use vfs_getname probe to get syscall args filenames".
Signed-off-by: Riccardo Mancini <[email protected]>
Fixes: cfc8874a48599249 ("perf script: Process cpu/threads maps")
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/5ee73b19791c6fa9d24c4d57f4ac1a23609400d7.1626343282.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
ASan reports several memory leak while running:
# perf test "82: Use vfs_getname probe to get syscall args filenames"
One of the leaks is caused by zstd data not being released on exit in
perf-script.
This patch adds the missing zstd_fini().
Signed-off-by: Riccardo Mancini <[email protected]>
Fixes: b13b04d9382113f7 ("perf script: Initialize zstd_data")
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/39388e8cc2f85ca219ea18697a17b7bd8f74b693.1626343282.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Move evsel::leader to perf_evsel::leader, so we can move the group
interface to libperf.
Also add several evsel helpers to ease up the transition:
struct evsel *evsel__leader(struct evsel *evsel);
- get leader evsel
bool evsel__has_leader(struct evsel *evsel, struct evsel *leader);
- true if evsel has leader as leader
bool evsel__is_leader(struct evsel *evsel);
- true if evsel is itw own leader
void evsel__set_leader(struct evsel *evsel, struct evsel *leader);
- set leader for evsel
Committer notes:
Fix this when building with 'make BUILD_BPF_SKEL=1'
tools/perf/util/bpf_counter.c
- if (evsel->leader->core.nr_members > 1) {
+ if (evsel->core.leader->nr_members > 1) {
Signed-off-by: Jiri Olsa <[email protected]>
Requested-by: Shunsuke Nakamura <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add option --dlarg to pass arguments to dlfilters. The --dlarg option can
be repeated to pass more than 1 argument.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add option --list-dlfilters to list dlfilters in the current directory or
the exec-path e.g. ~/libexec/perf-core/dlfilters. Use with option -v (must
come before option --list-dlfilters) to show long descriptions.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
filter_event_early() can be more than 30% faster than filter_event()
because it is called before internal filtering. In other respects it
is the same as filter_event(), except that it will be passed events
that have yet to be filtered out.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In some cases, users want to filter very large amounts of data (e.g.
from AUX area tracing like Intel PT) looking for something specific.
While scripting such as Python can be used, Python is 10 to 20 times
slower than C. So define a C API so that custom filters can be written
and loaded.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Share the addr_location of 'addr' so that it need not be resolved more than
once.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To make it possible to use filtering with scripts, move filtering before
scripting.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Generally, it should be more efficient if filter_cpu() comes before
machine__resolve() because filter_cpu() is much less code than
machine__resolve().
Example:
$ perf record --sample-cpu -- make -C tools/perf >/dev/null
Before:
$ perf stat -- perf script -C 0 >/dev/null
Performance counter stats for 'perf script -C 0':
116.94 msec task-clock # 0.992 CPUs utilized
2 context-switches # 17.103 /sec
0 cpu-migrations # 0.000 /sec
8,187 page-faults # 70.011 K/sec
478,351,812 cycles # 4.091 GHz
564,785,464 instructions # 1.18 insn per cycle
114,341,105 branches # 977.789 M/sec
2,615,495 branch-misses # 2.29% of all branches
0.117840576 seconds time elapsed
0.085040000 seconds user
0.032396000 seconds sys
After:
$ perf stat -- perf script -C 0 >/dev/null
Performance counter stats for 'perf script -C 0':
107.45 msec task-clock # 0.992 CPUs utilized
3 context-switches # 27.919 /sec
0 cpu-migrations # 0.000 /sec
7,964 page-faults # 74.117 K/sec
438,417,260 cycles # 4.080 GHz
522,571,855 instructions # 1.19 insn per cycle
105,187,488 branches # 978.921 M/sec
2,356,261 branch-misses # 2.24% of all branches
0.108282546 seconds time elapsed
0.095935000 seconds user
0.011991000 seconds sys
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Factor out script_fetch_insn() so it can be reused.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This is preparation for allowing a script to set the itrace options
for the session if they have not already been set.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add auxtrace_error to general python scripting.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Factor out perf_sample__sprintf_flags() so it can be reused.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|