|
Add drop count to 'perf top' headers:
# perf top --stdio
PerfTop: 3549 irqs/sec kernel:51.8% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles:ppp], (all, 8 CPUs)
# perf top
Samples: 0 of event 'cycles:ppp', 4000 Hz, Event count (approx.): 0 lost: 0/0 drop: 0/0
The format is: <current period drop>/<total drop>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use condition variable logic to synchronize the reading and processing
threads. Currently this is done by having a mutex around the rotation
code.
Use a POSIX condition variable to sync both threads after the queues
rotation:
Process thread:
- Detects data
- Switches queues
- Sets rotate variable
- Waits in pthread_cond_wait()
Read thread:
- Detects rotate is set
- Kicks the process thread with a pthread_cond_signal()
After this the rotation is safely completed and both threads can continue
with the new queues.
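A minimal sketch of that handshake using plain pthreads; the names
(rotate_mutex, rotate_cond, rotate) are illustrative, not the actual perf
ones:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t rotate_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  rotate_cond  = PTHREAD_COND_INITIALIZER;
static bool rotate;

/* Process thread: after detecting data and switching queues,
 * flag the rotation and wait until the read thread acknowledges it. */
static void process_thread_rotate(void)
{
	pthread_mutex_lock(&rotate_mutex);
	/* ... switch the ordered-events queues here ... */
	rotate = true;
	while (rotate)
		pthread_cond_wait(&rotate_cond, &rotate_mutex);
	pthread_mutex_unlock(&rotate_mutex);
}

/* Read thread: when it sees the rotate flag, kick the process thread. */
static void read_thread_poll(void)
{
	pthread_mutex_lock(&rotate_mutex);
	if (rotate) {
		rotate = false;
		pthread_cond_signal(&rotate_cond);
	}
	pthread_mutex_unlock(&rotate_mutex);
}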
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a new thread that takes care of creating the histograms, alleviating
the main reader thread so that it can keep the perf mmaps served in time,
reducing the possibility of losing events.
The 'perf top' command now spawns 2 extra threads; the data processing
is the following:
1) The main thread reads the data from the mmaps and queues it to the
ordered events object;
2) The processing thread takes the data from the ordered events
object and creates the initial histogram;
3) The GUI thread periodically sorts the initial histogram and
presents it.
Passing the data between threads 1 and 2 is done by having 2 ordered
events queues. One is always being filled by thread 1 while the other is
flushed out in thread 2 (see the sketch below).
Passing the data between threads 2 and 3 stays the same as it was
initially between threads 1 and 3.
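A minimal sketch of the double buffering between threads 1 and 2; the
type and names here are illustrative only, not the actual perf structures:

/* Two ordered-events queues: thread 1 always fills queues[in] while
 * thread 2 flushes queues[in ^ 1]; rotation swaps the roles. */
struct ordered_queue {
	int nr_events;	/* queued events would live here */
};

static struct ordered_queue queues[2];
static int in;	/* index of the queue thread 1 is currently filling */

static void rotate_queues(void)
{
	in ^= 1;	/* thread 2 now flushes the queue that was just filled */
}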
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a 'lost count' to 'perf top' headers:
# perf top --stdio
PerfTop: 3850 irqs/sec kernel:49.0% exact: 100.0% lost: 0/0 [4000Hz cycles:ppp], (all, 8 CPUs)
# perf top
Samples: 0 of event 'cycles:ppp', 4000 Hz, Event count (approx.): 0 lost: 0/0
The format is: <current period lost>/<total lost>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We will need it in the following patch, where we can't use the
container_of() trick to get the higher level object.
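For reference, the container_of() trick mentioned above recovers the
enclosing object from a pointer to one of its members; a minimal generic
C sketch, not the perf code:

#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct outer {
	int x;
	struct inner { int y; } in;
};

/* Given a pointer to ->in, recover the enclosing struct outer. */
static struct outer *outer_of(struct inner *i)
{
	return container_of(i, struct outer, in);
}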
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Decide to use the progress bar one level higher; we will need this in the
following patch.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When looking at PT or brstackinsn traces with 'perf script' it can be
very useful to see the source code. This adds a simple facility to print
the source lines with 'perf script', if the information is available
through DWARF:
% perf record ...
% perf script -F insn,ip,sym,srccode
...
4004c6 main
5 for (i = 0; i < 10000000; i++)
4004cd main
5 for (i = 0; i < 10000000; i++)
4004c6 main
5 for (i = 0; i < 10000000; i++)
4004cd main
5 for (i = 0; i < 10000000; i++)
4004cd main
5 for (i = 0; i < 10000000; i++)
4004cd main
5 for (i = 0; i < 10000000; i++)
4004cd main
5 for (i = 0; i < 10000000; i++)
4004cd main
5 for (i = 0; i < 10000000; i++)
4004b3 main
6 v++;
% perf record -b ...
% perf script -F insn,ip,sym,srccode,brstackinsn
...
main+22:
0000000000400543 insn: e8 ca ff ff ff # PRED
|18 f1();
f1:
0000000000400512 insn: 55
|10 {
0000000000400513 insn: 48 89 e5
0000000000400516 insn: b8 00 00 00 00
|11 f2();
000000000040051b insn: e8 d6 ff ff ff # PRED
f2:
00000000004004f6 insn: 55
|5 {
00000000004004f7 insn: 48 89 e5
00000000004004fa insn: 8b 05 2c 0b 20 00
|6 c = a / b;
0000000000400500 insn: 8b 0d 2a 0b 20 00
0000000000400506 insn: 99
0000000000400507 insn: f7 f9
0000000000400509 insn: 89 05 29 0b 20 00
000000000040050f insn: 90
|7 }
0000000000400510 insn: 5d
0000000000400511 insn: c3 # PRED
f1+14:
0000000000400520 insn: b8 00 00 00 00
|12 f2();
0000000000400525 insn: e8 cc ff ff ff # PRED
f2:
00000000004004f6 insn: 55
|5 {
00000000004004f7 insn: 48 89 e5
00000000004004fa insn: 8b 05 2c 0b 20 00
|6 c = a / b;
This is not supported for callchains currently; that would need some
layout changes there.
Committer notes:
Fixed the build on Alpine Linux (3.4 .. 3.8) by addressing this
warning:
In file included from util/srccode.c:19:0:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
#warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
^~~~~~~
cc1: all warnings being treated as errors
Signed-off-by: Andi Kleen <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The default timeout of 500ms for parsing /proc/<pid>/maps files is too
short for profiling many of our services.
This can be overridden by passing --proc-map-timeout to the relevant
command but it'd be nice to globally increase our default value.
This patch permits setting a different default with the
core.proc-map-timeout config file parameter.
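For instance, assuming the usual gitconfig-like ~/.perfconfig syntax, an
entry along these lines raises the default to 5 seconds (the value is in
milliseconds):

[core]
	proc-map-timeout = 5000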
Signed-off-by: Mark Drayton <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Go over the tools/ files that are maintained in Arnaldo's tree and
fix common typos: half of them were in comments, the other half
in JSON files.
No change in functionality intended.
Committer notes:
This was split from a larger patch as there is code that is,
additionally, maintained outside the kernel tree, so to ease
cherry-picking and/or backporting, it was split into multiple patches.
These are just typos in comments, with no need to backport, reducing the
possibility of backporting artifacts.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Go over the tools/ files that are maintained in Arnaldo's tree and
fix common typos: half of them were in comments, the other half
in JSON files.
No change in functionality intended.
Committer notes:
This was split from a larger patch as there is code that is,
additionally, maintained outside the kernel tree, so to ease cherry
picking and/or backporting, it was split into multiple patches.
This one has information that is presented to the user, albeit in debug
mode.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch adds support for generating instruction samples from trace of
AArch32 programs using the A32 and T32 instruction sets.
T32 has a variable 2- or 4-byte instruction size, so the conversion
between addresses and instruction counts requires extra information from
the trace decoder, which in turn requires version 0.10.0 of OpenCSD. A
check for the OpenCSD library version has been added to the OpenCSD
feature check.
Signed-off-by: Robert Walker <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In order to make libtraceevent into a proper library, its API should be
straightforward. The __tep_data2host*() functions are no longer going to
be available as a libtraceevent API; tep_read_number() should be used
instead. This patch replaces __tep_data2host*() usage with
tep_read_number() in perf.
Signed-off-by: Tzvetomir Stoyanov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In order to make libtraceevent into a proper library, variables, data
structures and functions require a unique prefix to prevent name space
conflicts.
This renames 'struct tep_event_format' to 'struct tep_event', which
describes more closely the purpose of the struct.
Signed-off-by: Tzvetomir Stoyanov <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
[ Fixup conflict with 6e33c250a88f ("tools lib traceevent: Fix compile warnings in tools/lib/traceevent/event-parse.c") ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Support displaying the average IPC and IPC coverage per symbol in 'perf
report' --tui and --stdio modes.
For example,
$ perf record -b ...
$ perf report -s symbol
Overhead Symbol IPC [IPC Coverage]
39.60% [.] __random 2.30 [ 54.8%]
18.02% [.] main 0.43 [ 54.3%]
14.21% [.] compute_flag 2.29 [100.0%]
14.16% [.] rand 0.36 [100.0%]
7.06% [.] __random_r 2.57 [ 70.5%]
6.85% [.] rand@plt 0.00 [ 0.0%]
Jiri Olsa <[email protected]> provided the patch to support the --stdio
mode. I merged Jiri's code in this patch.
$ perf report -s symbol --stdio
# Overhead Symbol IPC [IPC Coverage]
# ........ ........................... ....................
#
39.60% [.] __random 2.30 [ 54.8%]
18.02% [.] main 0.43 [ 54.3%]
14.21% [.] compute_flag 2.29 [100.0%]
14.16% [.] rand 0.36 [100.0%]
7.06% [.] __random_r 2.57 [ 70.5%]
6.85% [.] rand@plt 0.00 [ 0.0%]
0.02% [k] run_timer_softirq 1.60 [ 57.2%]
The columns "IPC" and "[IPC Coverage]" are automatically enabled when
the sort-key "symbol" is specified. If the perf.data file doesn't
contain timed LBR information, columns are filled with "-".
For example,
# Overhead Symbol IPC [IPC Coverage]
# ........ ........................... ....................
#
46.57% [.] main - -
17.60% [.] rand - -
15.84% [.] __random_r - -
11.90% [.] __random - -
6.50% [.] compute_flag - -
1.59% [.] rand@plt - -
0.00% [.] _dl_relocate_object - -
0.00% [k] tlb_flush_mmu - -
0.00% [k] perf_event_mmap - -
0.00% [k] native_sched_clock - -
0.00% [k] intel_pmu_handle_irq_v4 - -
0.00% [k] native_write_msr - -
v3:
---
Removed the sort key 'ipc' from the command line. The columns "IPC"
and "[IPC Coverage]" are automatically enabled when "symbol"
is specified.
v2:
---
Merge in Jiri's patch to support stdio mode
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We often use symbol__annotate2() to annotate a specified symbol.
Annotating may take some time, so in order to avoid annotating the
same symbol repeatedly, this patch creates a new flag to indicate that
the symbol has already been annotated.
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add support to 'perf report' annotate view or 'perf annotate --stdio2'
to aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.
For example:
$ perf annotate --stdio2
Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)
Disassembly of section .text:
000000000003aac0 <random@@GLIBC_2.2.5>:
8.32 3.28 sub $0x18,%rsp
3.28 mov $0x1,%esi
3.28 xor %eax,%eax
3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
11.57 3.28 1 ↓ je 20
lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne 29
↓ jmp 43
11.57 1.10 20: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
0.00 1.10 1 ↓ je 43
29: lea __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
sub $0x80,%rsp
→ callq __lll_lock_wait_private
add $0x80,%rsp
0.00 3.00 43: lea __ctype_b@GLIBC_2.2.5+0x38,%rdi
3.00 lea 0xc(%rsp),%rsi
8.49 3.00 1 → callq __random_r
7.91 1.94 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
0.00 1.94 1 ↓ je 68
lock decl __abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne 70
↓ jmp 8a
0.00 2.00 68: decl __abort_msg@@GLIBC_PRIVATE+0x8a0
21.56 2.00 1 ↓ je 8a
70: lea __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
sub $0x80,%rsp
→ callq __lll_unlock_wake_private
add $0x80,%rsp
21.56 2.90 8a: movslq 0xc(%rsp),%rax
2.90 add $0x18,%rsp
9.03 2.90 1 ← retq
It shows for this symbol the average IPC is 2.30 and the IPC coverage is
54.8%.
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Multi AIO trace writing allows caching more kernel data into userspace
memory, postponing trace writing for the sake of an overall profiling
data throughput increase. It can be seen as an extension of the kernel
data buffer into userspace memory.
With an --aio option value different from 0 (the default value is 1) the
tool has the capability to cache more and more data into user space,
delegating the spill to AIO.
That avoids suspending at record__aio_sync() between calls of
record__mmap_read_evlist() and increases profiling data throughput at the
cost of userspace memory.
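For example, assuming the --aio=<n> spelling of the option, something
like this would cache up to 4 in-flight buffers per mmap:
# perf record --aio=4 -a -- sleep 5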
Signed-off-by: Alexey Budankov <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The trace file offset is read once before the mmaps iterating loop and
written back after all performance data is enqueued for aio writing.
The trace file offset is incremented linearly after every successful aio
write operation.
record__aio_sync() blocks till completion of the started AIO operation
and then proceeds.
record__aio_mmap_read_sync() implements a barrier for all incomplete
aio write requests.
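A rough sketch of the offset bookkeeping described above, using plain
POSIX AIO; the names and structure are illustrative, not the actual perf
code:

#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

static off_t trace_offset;	/* read once before the mmaps iterating loop */

/* Queue one map's data for asynchronous writing at the current offset. */
static int queue_aio_write(struct aiocb *cb, int fd, void *buf, size_t size)
{
	memset(cb, 0, sizeof(*cb));
	cb->aio_fildes = fd;
	cb->aio_buf    = buf;
	cb->aio_nbytes = size;
	cb->aio_offset = trace_offset;

	if (aio_write(cb))
		return -1;

	trace_offset += size;	/* incremented linearly after each successful enqueue */
	return 0;
}

/* record__aio_sync()-like barrier: wait for one in-flight request. */
static void wait_aio(struct aiocb *cb)
{
	const struct aiocb *list[1] = { cb };

	while (aio_error(cb) == EINPROGRESS)
		aio_suspend(list, 1, NULL);
	aio_return(cb);
}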
Signed-off-by: Alexey Budankov <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The map->data buffer is used to preserve map->base profiling data for
writing to disk. AIO map->cblock is used to queue corresponding
map->data buffer for asynchronous writing.
Signed-off-by: Alexey Budankov <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use the ERR_CAST() inlined function instead of ERR_PTR(PTR_ERR(...)). This
makes it more readable and also fixes this warning detected by
err_cast.cocci:
tools/perf/util/bpf-loader.c:1606:11-18: WARNING: ERR_CAST can be used with op
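The transformation is of this shape (a sketch, not the exact perf hunk):

/* before: round-trips the encoded error through a long */
return ERR_PTR(PTR_ERR(obj));
/* after: casts the error pointer directly, same value */
return ERR_CAST(obj);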
Signed-off-by: Wen Yang <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wen Yang <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Branch stacks do not necessarily have the same cpumode as the 'ip'. Use
the fallback functions in those cases.
This patch depends on patch "perf tools: Add fallback functions for cases
where cpumode is insufficient".
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: [email protected] # 4.19
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
thread__resolve() is used in the sample_addr_correlates_sym() cases
where 'addr' is a destination of a branch which does not necessarily
have the same cpumode as the 'ip'. Use the fallback function in that
case.
This patch depends on patch "perf tools: Add fallback functions for
cases where cpumode is insufficient".
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: [email protected] # 4.19
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For branch stacks or branch samples, the sample cpumode might not be
correct because it applies only to the sample 'ip' and not necessarily to
'addr' or to the branch stack addresses. Add fallback functions that can
be used to deal with those cases.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: [email protected] # 4.19
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Some architectures have a single address space for kernel and user
addresses, which makes it possible to determine if an address is in
kernel space or user space. Some don't, e.g.: sparc.
Cache that info in perf_env so that, for instance, code needing to fall
back failed symbol lookups in the kernel space on single address space
arches can look them up in userspace.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: [email protected] # 4.19
Link: http://lkml.kernel.org/r/[email protected]
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We'll set a new machine field based on env->arch, which for live mode,
like with 'perf top', means we need to use uname() to figure out the name
of the arch. Fix perf_env__arch() to consider both (env == NULL) and
(env->arch == NULL) as a local operation.
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected] # 4.19
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
A double pointer is used in map__find() where a single pointer is enough
because the function doesn't affect the rbtree and the rbtree is locked.
Signed-off-by: Eric Saint-Etienne <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eric Saint-Etienne <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When using the -x option, perf stat prints CSV-style output with one
event per line. For each event, it prints the count, the unit, the
event name, the cgroup, and a bunch of other event specific fields (such
as insn per cycle).
When you use CSV-style mode, you expect a normalized output where each
event is printed with the same number of fields regardless of what it is
so it can easily be imported into a spreadsheet or parsed.
For instance, if an event does not have a unit, then print an empty
field for it.
Although this approach was implemented for the unit, it was not for the
cgroup.
When mixing cgroup and non-cgroup events, the non-cgroup events would
not show an empty field; instead the next field was printed, making the
columns not line up correctly.
This patch fixes the cgroup output issues by forcing an empty field
for non-cgroup events as soon as one event has a cgroup.
Before:
<not counted> @ @cycles @foo @ 0 @100.00@@
2531614 @ @cycles @[email protected]@ @
foo cgroup lines up with time_running!
After:
<not counted> @ @cycles @foo @0 @100.00@@
2594834 @ @cycles @ @5287372 @100.00@@
Fields line up.
Signed-off-by: Stephane Eranian <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Commit 0aa802a79469 ("perf stat: Get rid of extra clock display
function") introduced scale and unit for clock events. Thus,
perf_stat__update_shadow_stats() now saves scaled values of clock events
in msecs, instead of original nsecs. But while calculating values of
shadow stats we still consider clock event values in nsecs. This results
in wrong shadow stat values. E.g.:
# ./perf stat -e task-clock,cycles ls
<SNIP>
2.60 msec task-clock:u # 0.877 CPUs utilized
2,430,564 cycles:u # 1215282.000 GHz
Fix this by saving original nsec values for clock events in
perf_stat__update_shadow_stats(). After patch:
# ./perf stat -e task-clock,cycles ls
<SNIP>
3.14 msec task-clock:u # 0.839 CPUs utilized
3,094,528 cycles:u # 0.985 GHz
Suggested-by: Jiri Olsa <[email protected]>
Reported-by: Anton Blanchard <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: [email protected]
Fixes: 0aa802a79469 ("perf stat: Get rid of extra clock display function")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The weak functions, strcmp_cpuid_str() and get_cpuid_str(), are defined
in pmu.c.
Most of the cpuid related functions, including *_cpuid_str()'s
declaration and platform specific definition, are in header.c/h.
To make the declaration and definition of all cpuid related functions in
a consistent place, move the weak functions to header.c.
There is no functional change.
Suggested-by: Jiri Olsa <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Perf can take minutes to parse an image when -ffunction-sections is used.
This is especially true with the kernel image when it is compiled this
way, which is the arm64 default since the patchset "Enable deadcode
elimination at link time".
Perf organizes maps using an rbtree. Whenever perf finds a new symbol, it
first searches this rbtree for the map it belongs to, by strcmp()'ing
section names. When it finds the map with the right name, it uses it to
add the symbol. With a usual image there aren't that many maps, but when
using -ffunction-sections there's basically one map per function. With
the kernel image that's north of 40,000 maps. For most symbols perf has
to walk the entire rbtree to eventually create a new map and add it.
Consequently perf spends most of the time browsing an rbtree that keeps
getting larger.
This performance fix introduces a secondary rbtree that indexes maps
based on the section name.
Signed-off-by: Eric Saint-Etienne <[email protected]>
Reviewed-by: Dave Kleikamp <[email protected]>
Reviewed-by: David Aldridge <[email protected]>
Reviewed-by: Rob Gardner <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The perf tools cannot find the proper event list for the Cascadelake
server, because the Cascadelake server and the Skylake server have the
same CPU model number, which is what the perf tools use to find the
event list.
The stepping for the Skylake server is up to 4.
The stepping for the Cascadelake server starts from 5.
The stepping can be used to distinguish between them.
The stepping is added in get_cpuid_str().
The stepping information for the Skylake server is updated in mapfile.csv.
An x86 specific strcmp_cpuid_cmp() function is added to handle the two
CPUID formats in mapfile.csv, "vendor-family-model-stepping" and
"vendor-family-model":
- If a cpuid-regular-expression from the mapfile.csv uses the new
stepping format, a cpuid-string generated on the machine must include
the stepping. Otherwise, it is a mismatch.
- If the cpuid-regular-expression uses the old non-stepping format,
the stepping in the cpuid-string will be ignored.
Scripts using the environment string "PERF_CPUID" without a stepping on
Skylake servers will be broken. If so, users must fix their scripts.
Committer notes:
Fixed this build error on centos:6 and debian:7:
arch/x86/util/header.c: In function 'is_full_cpuid':
arch/x86/util/header.c:82:39: error: declaration of 'cpuid' shadows a global declaration [-Werror=shadow]
arch/x86/util/header.c:12:1: error: shadowed declaration is here [-Werror=shadow]
arch/x86/util/header.c: In function 'strcmp_cpuid_str':
arch/x86/util/header.c:98:56: error: declaration of 'cpuid' shadows a global declaration [-Werror=shadow]
arch/x86/util/header.c:12:1: error: shadowed declaration is here [-Werror=shadow]
cc1: all warnings being treated as errors
Signed-off-by: Kan Liang <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We already have a function to check if a given event is either
SW_CPU_CLOCK or SW_TASK_CLOCK. Utilize it.
Signed-off-by: Ravi Bangoria <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Depending on which functions are inlined in util/pmu.c, the snprintf()
calls in perf_pmu__parse_{scale,unit,per_pkg,snapshot}() might trigger a
warning:
util/pmu.c: In function 'pmu_aliases':
util/pmu.c:178:31: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
snprintf(path, PATH_MAX, "%s/%s.unit", dir, name);
^~
I found this when trying to build perf from Linux 3.16 with gcc 8.
However I can reproduce the problem in mainline if I force
__perf_pmu__new_alias() to be inlined.
Suppress this by using scnprintf() as has been done elsewhere in perf.
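The change is of this shape, sketched from the call quoted in the
warning above:

/* before: gcc 8 may warn that the formatted name can be truncated */
snprintf(path, PATH_MAX, "%s/%s.unit", dir, name);
/* after: same bounded formatting, but scnprintf() returns the number of
 * characters actually written and does not trip -Wformat-truncation */
scnprintf(path, PATH_MAX, "%s/%s.unit", dir, name);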
Signed-off-by: Ben Hutchings <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To better reflect that this is a tracepoint filter, as opposed, for
instance, to map-based BPF filters.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When reporting on the 'record' server we try to retrieve/use the mnt
namespace of the profiled tasks. We use the following API, with a cookie
to hold the namespace to return to, roughly:
nsinfo__mountns_enter(struct nsinfo *nsi, struct nscookie *nc)
setns(newns, 0);
...
new ns related open..
...
nsinfo__mountns_exit(struct nscookie *nc)
setns(nc->oldns)
Once finished we setns to old namespace, which also sets the current
working directory (cwd) to "/", trashing the cwd we had.
This is mostly fine, because we use absolute paths almost everywhere,
but it screws up 'perf diff':
# perf diff
failed to open perf.data: No such file or directory (try 'perf record' first)
...
Add the current working directory to the cookie and restore it in the
nsinfo__mountns_exit() call.
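An illustrative sketch of that fix; the struct layout and helper names
here are approximations, not the actual perf code:

#define _GNU_SOURCE
#include <sched.h>	/* setns() */
#include <stdlib.h>	/* free() */
#include <unistd.h>	/* chdir(), get_current_dir_name() */

struct nscookie_sketch {
	int oldns;	/* fd for the original mount namespace */
	char *oldcwd;	/* cwd saved before setns() trashes it */
};

static void mountns_enter(struct nscookie_sketch *nc, int newns_fd)
{
	nc->oldcwd = get_current_dir_name();
	/* ... open /proc/self/ns/mnt into nc->oldns, then: */
	setns(newns_fd, 0);
}

static void mountns_exit(struct nscookie_sketch *nc)
{
	setns(nc->oldns, 0);		/* back to the original mount namespace */
	if (nc->oldcwd)
		chdir(nc->oldcwd);	/* restore the cwd we had */
	free(nc->oldcwd);
	nc->oldcwd = NULL;
}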
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Krister Johansen <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 843ff37bb59e ("perf symbols: Find symbols in different mount namespace")
Link: http://lkml.kernel.org/r/[email protected]
[ No need to check for NULL args for free(), use zfree() for struct members ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
As the namespace support code will use this, which is not available in
some non _GNU_SOURCE libraries such as Android's bionic used in my
container build tests (r12b and r15c at the moment).
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Adam reported a 'perf record' crash for a simple session like:
$ perf record -e cpu-clock ls
with following backtrace:
Program received signal SIGSEGV, Segmentation fault.
3543 ev = event_update_event__new(size + 1, PERF_EVENT_UPDATE__UNIT, evsel->id[0]);
(gdb) bt
#0 perf_event__synthesize_event_update_unit
#1 0x000000000051e469 in perf_event__synthesize_extra_attr
#2 0x00000000004445cb in record__synthesize
#3 0x0000000000444bc5 in __cmd_record
...
We synthesize an update event that needs to touch the evsel id array,
which is not defined at that time. Fix this by forcing the id allocation
for events with their unit defined. The attr tests are updated to reflect
the possible read_format ID bit.
Reported-by: Yongxin Liu <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Adam Lee <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201477
Fixes: bfd8f72c2778 ("perf record: Synthesize unit/scale/... in event update")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent improvements and fixes from Arnaldo Carvalho de Melo:
Intel PT SQL viewer: (Adrian Hunter)
- Fall back to /usr/local/lib/libxed.so
- Add Selected branches report
- Add help window
- Fix table find when table re-ordered
Intel PT debug log (Adrian Hunter)
- Add more event information
- Add MTC and CYC timestamps
perf record: (Andi Kleen)
- Support weak groups, just like with 'perf stat'
perf trace: (Arnaldo Carvalho de Melo)
- Start augmenting raw_syscalls:{sys_enter,sys_exit}: goal is to have a
generic, arch independent eBPF kernel component that is programmed with
syscall table details, what to copy, how many bytes, pid, arg filters from the
userspace via eBPF maps by the 'perf trace' tool that continues to use all its
argument beautifiers, just taking advantage of the extra pointer contents.
JVMTI: (Gustavo Romero)
- Fix undefined symbol scnprintf in libperf-jvmti.so
perf top: (Jin Yao)
- Display the LBR stats in callchain entries
perf stat: (Thomas Richter)
- Handle different PMU names with common prefix
arm64: (Will Deacon)
- Fix arm64 tools build failure wrt smp_load_{acquire,release}.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Andi reported the following malfunction:
# perf record -e '{ref-cycles,cycles}:S' -a sleep 1
# perf script
non matching sample_id_all
That's because we disable the sample_id_all bit for non-sampling group
members. We can't do that, because it needs to be the same over the
whole event list. This patch keeps it untouched again.
Reported-by: Andi Kleen <[email protected]>
Tested-by: Andi Kleen <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Fixes: e9add8bac6c6 ("perf evsel: Disable write_backward for leader sampling group events")
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
One cause of decoding errors is un-synchronized side-band data.
Timestamps are needed to debug such cases. TSC packet timestamps are
already logged. Also log MTC and CYC timestamps.
Signed-off-by: Adrian Hunter <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
More event information is useful for debugging, especially MMAP events.
Signed-off-by: Adrian Hunter <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
On s390 the CPU Measurement Facility for counters now supports
2 PMUs named cpum_cf (CPU Measurement Facility for counters) and
cpum_cf_diag (CPU Measurement Facility for diagnostic counters)
for one and the same CPU.
Running this command:
[root@s35lp76 perf]# ./perf stat -e tx_c_tend \
-- ~/mytests/cf-tx-events 1
Measuring transactions
TX_C_TABORT_NO_SPECIAL: 0 expected:0
TX_C_TABORT_SPECIAL: 0 expected:0
TX_C_TEND: 1 expected:1
TX_NC_TABORT: 11 expected:11
TX_NC_TEND: 1 expected:1
Performance counter stats for '/root/mytests/cf-tx-events 1':
2 tx_c_tend
0.002120091 seconds time elapsed
0.000121000 seconds user
0.002127000 seconds sys
[root@s35lp76 perf]#
displays output which is unexpected (and wrong):
2 tx_c_tend
The test program definitely triggers only one transaction, as shown
in line 'TX_C_TEND: 1 expected:1'.
This is caused by the following call sequence:
pmu_lookup() scans and installs a PMU.
+--> pmu_aliases() parses all aliases in directory
.../<pmu-name>/events/* which are file names.
+--> pmu_aliases_parse() Read each file in directory and create
an new alias entry. This is done with
+--> perf_pmu__new_alias() and
+--> __perf_pmu__new_alias() which also check for
identical alias names.
After pmu_aliases() returns, a complete list of event names
for this pmu has been created. Now function
pmu_add_cpu_aliases() is called to add the events listed in the json
files to the alias list of the cpu.
+--> perf_pmu__find_map() Returns a pointer to the json events.
Now function pmu_add_cpu_aliases() scans through all events listed
in the JSON files for this CPU.
Each json event pmu name is compared with the current PMU being
built up and if they mismatch, the json event is added to the
current PMUs alias list.
To avoid duplicate entries the following comparison is done:
if (!is_arm_pmu_core(name)) {
pname = pe->pmu ? pe->pmu : "cpu";
if (strncmp(pname, name, strlen(pname)))
continue;
}
The culprit is the strncmp() function.
Using current s390 PMU naming, the first PMU is 'cpum_cf'
and a long list of events is added, among them 'tx_c_tend'
When the second PMU named 'cpum_cf_diag' is added, only one event
named 'CF_DIAG' is added by the pmu_aliases() function.
Now function pmu_add_cpu_aliases() is invoked for PMU 'cpum_cf_diag'.
Since the CPUID string is the same for both PMUs, json file events
for the PMU named 'cpum_cf' are added to the PMU 'cpum_cf_diag'.
This happens because the strncmp() actually compares:
strncmp("cpum_cf", "cpum_cf_diag", 6);
The first parameter is the pmu name taken from the event in
the json file. The second parameter is the pmu name of the PMU
currently being built.
They are different, but the length of the compare only tests the
common prefix, so strncmp() returns 0 (a match) when it should not.
Now all events for PMU cpum_cf are added to the alias list for pmu
cpum_cf_diag.
Later on, in function parse_events_add_pmu(), the event 'tx_c_tend' is
searched for in all available PMUs and found twice, adding it two
times to the evsel_list global variable, which is the root
of all events. This results in a counter value of 2 instead
of 1.
Output with this patch:
[root@s35lp76 perf]# ./perf stat -e tx_c_tend \
-- ~/mytests/cf-tx-events 1
Measuring transactions
TX_C_TABORT_NO_SPECIAL: 0 expected:0
TX_C_TABORT_SPECIAL: 0 expected:0
TX_C_TEND: 1 expected:1
TX_NC_TABORT: 11 expected:11
TX_NC_TEND: 1 expected:1
Performance counter stats for '/root/mytests/cf-tx-events 1':
1 tx_c_tend
0.001815365 seconds time elapsed
0.000123000 seconds user
0.001756000 seconds sys
[root@s35lp76 perf]#
Signed-off-by: Thomas Richter <[email protected]>
Reviewed-by: Hendrik Brueckner <[email protected]>
Reviewed-by: Sebastien Boisvert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: [email protected]
Fixes: 292c34c10249 ("perf pmu: Fix core PMU alias list for X86 platform")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
- Move the function from builtin-stat to evlist for reuse
- Rename to evlist to match purpose better
- Pass the evlist as first argument.
- No functional changes
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates and fixes from Ingo Molnar:
"These are almost all tooling updates: 'perf top', 'perf trace' and
'perf script' fixes and updates, an UAPI header sync with the merge
window versions, license marker updates, much improved Sparc support
from David Miller, and a number of fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
perf intel-pt/bts: Calculate cpumode for synthesized samples
perf intel-pt: Insert callchain context into synthesized callchains
perf tools: Don't clone maps from parent when synthesizing forks
perf top: Start display thread earlier
tools headers uapi: Update linux/if_link.h header copy
tools headers uapi: Update linux/netlink.h header copy
tools headers: Sync the various kvm.h header copies
tools include uapi: Update linux/mmap.h copy
perf trace beauty: Use the mmap flags table generated from headers
perf beauty: Wire up the mmap flags table generator to the Makefile
perf beauty: Add a generator for MAP_ mmap's flag constants
tools include uapi: Update asound.h copy
tools arch uapi: Update asm-generic/unistd.h and arm64 unistd.h copies
tools include uapi: Update linux/fs.h copy
perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}
perf cs-etm: Correct CPU mode for samples
perf unwind: Take pgoff into account when reporting elf to libdwfl
perf top: Do not use overwrite mode by default
perf top: Allow disabling the overwrite mode
perf trace: Beautify mount's first pathname arg
...
|
|
In the absence of a fallback, samples must provide a correct cpumode for
the 'ip'. Do that, now that there is no fallback.
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: [email protected] # 4.19
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In the absence of a fallback, callchains must also encode the callchain
context. Do that, now that there is no fallback.
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: [email protected] # 4.19
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When synthesizing FORK events, we are trying to create thread objects
for the already running tasks on the machine.
Normally, for a kernel FORK event, we want to clone the parent's maps
because that is what the kernel just did.
But when synthesizing, this should not be done. If we do, we end up
with overlapping maps as we process the synthesized MMAP2 events that
get delivered shortly thereafter.
Use the FORK event misc flags in an internal way to signal this
situation, so we can elide the map clone when appropriate.
Signed-off-by: David S. Miller <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joe Mario <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Added comment about flag use in machine__process_fork_event(),
use ternary op in thread__clone_map_groups() as suggested by Jiri ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When processing using 'perf report -g caller', which is the default, we
ended up reversing the callchain entries received from the kernel, but
simply reversing them throws away the information that tells that from a
point onwards the addresses are for userspace, kernel, guest kernel,
guest user, or hypervisor.
The idea is that if we are walking backwards, for each cluster of
non-cpumode entries we have to first scan backwards for the next cpumode
entry and use that for the cluster.
This seems silly and more expensive than it needs to be, but it is enough
for an initial fix.
The code here is really complicated because it is intimately intertwined
with the lbr and branch handling, as well as with this callchain order;
further fixes will be needed to properly take the cpumode into account in
those cases.
Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
end of most callchains shows up at the top of the histogram because
every callchain contains it and with ORDER_CALLER it is the first entry.
Signed-off-by: David S. Miller <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Souvik Banerjee <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected] # 4.19
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since commit edeb0c90df35 ("perf tools: Stop fallbacking to kallsyms for
vdso symbols lookup"), the kernel address cannot be properly parsed to
kernel symbol with the command 'perf script -k vmlinux'. The reason is
that CoreSight samples always set the CPU mode to PERF_RECORD_MISC_USER,
thus it fails to find the corresponding map/dso in the flow below:
process_sample_event()
`-> machine__resolve()
`-> thread__find_map(thread, sample->cpumode, sample->ip, al);
In this flow it needs to pass the argument 'sample->cpumode' to tell
what the CPU mode is; before, it always passed PERF_RECORD_MISC_USER,
without any failure until the commit edeb0c90df35 ("perf tools: Stop
fallbacking to kallsyms for vdso symbols lookup") was merged. The reason
is that even with the wrong CPU mode the function thread__find_map()
first fails to find the map, but then falls back to the kernel map for
vdso symbols lookup. The latest code has removed that fallback, thus if
the CPU mode is PERF_RECORD_MISC_USER it cannot find a map anymore for a
kernel address.
This patch corrects the samples' CPU mode setting: it creates a new
helper function cs_etm__cpu_mode() to tell what the CPU mode is based on
the address, using the info from the machine structure. It also goes a
bit further and checks not only for kernel and user mode, but also for
host/guest and hypervisor mode. Finally, this patch uses the function for
instruction and branch samples and also applies it in cs_etm__mem_access()
as a minor polishing.
Signed-off-by: Leo Yan <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: David Miller <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # v4.19
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
libdwfl parses an ELF file itself and creates mappings for the
individual sections. perf on the other hand sees raw mmap events which
represent individual sections. When we encounter an address pointing
into a mapping with pgoff != 0, we must take that into account and
report the file at the non-offset base address.
This fixes unwinding with libdwfl in some cases. E.g. for a file like:
```
using namespace std;
mutex g_mutex;
double worker()
{
lock_guard<mutex> guard(g_mutex);
uniform_real_distribution<double> uniform(-1E5, 1E5);
default_random_engine engine;
double s = 0;
for (int i = 0; i < 1000; ++i) {
s += norm(complex<double>(uniform(engine), uniform(engine)));
}
cout << s << endl;
return s;
}
int main()
{
vector<std::future<double>> results;
for (int i = 0; i < 10000; ++i) {
results.push_back(async(launch::async, worker));
}
return 0;
}
```
Compile it with `g++ -g -O2 -lpthread cpp-locking.cpp -o cpp-locking`,
then record it with `perf record --call-graph dwarf -e
sched:sched_switch`.
When you analyze it with `perf script` and libunwind, you should see:
```
cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
7f38e463fa6d std::basic_streambuf<char, std::char_traits<char> >::sputn(char const*, long)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> >::_M_put(char const*, long)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::__write<char>(std::ostreambuf_iterator<char, std::char_traits<char> >, char const*, int)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<c>
7f38e464bd70 std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, double) const+0x90 (inl>
7f38e464bd70 std::ostream& std::ostream::_M_insert<double>(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined)
563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking)
563b9cb506fb double std::__invoke_impl<double, double (*)()>(std::__invoke_other, double (*&&)())+0x2b (inlined)
563b9cb506fb std::__invoke_result<double (*)()>::type std::__invoke<double (*)()>(double (*&&)())+0x2b (inlined)
563b9cb506fb decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<double (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x2b (inlined)
563b9cb506fb std::thread::_Invoker<std::tuple<double (*)()> >::operator()()+0x2b (inlined)
563b9cb506fb std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<double>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<double (*)()> >, dou>
563b9cb506fb std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_>
563b9cb507e8 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const+0x28 (inlined)
563b9cb507e8 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)+0x28 (/ssd/milian/>
7f38e46d24fe __pthread_once_slow+0xbe (/usr/lib/libpthread-2.28.so)
563b9cb51149 __gthread_once+0xe9 (inlined)
563b9cb51149 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)>
563b9cb51149 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool)+0xe9 (inlined)
563b9cb51149 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double (*)()> >&&)::{lambda()#1}::op>
563b9cb51149 void std::__invoke_impl<void, std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double>
563b9cb51149 std::__invoke_result<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double (*)()> >>
563b9cb51149 decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_>
563b9cb51149 std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<dou>
563b9cb51149 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread>
7f38e45f0062 execute_native_thread_routine+0x12 (/usr/lib/libstdc++.so.6.0.25)
7f38e46caa9c start_thread+0xfc (/usr/lib/libpthread-2.28.so)
7f38e42ccb22 __GI___clone+0x42 (inlined)
```
Before this patch, using libdwfl, you would see:
```
cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
a041161e77950c5c [unknown] ([unknown])
```
With this patch applied, we get a bit further in unwinding:
```
cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
7f38e463fa6d std::basic_streambuf<char, std::char_traits<char> >::sputn(char const*, long)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> >::_M_put(char const*, long)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::__write<char>(std::ostreambuf_iterator<char, std::char_traits<char> >, char const*, int)+0x1cd (inlined)
7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<c>
7f38e464bd70 std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, double) const+0x90 (inl>
7f38e464bd70 std::ostream& std::ostream::_M_insert<double>(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined)
563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking)
6eab825c1ee3e4ff [unknown] ([unknown])
```
Note that the backtrace is still stopping too early, when compared to
the nice results obtained via libunwind. It's unclear so far what the
reason for that is.
Committer note:
Further comment by Milian on the thread started on the Link: tag below:
---
The remaining issue is due to a bug in elfutils:
https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html
With both patches applied, libunwind and elfutils produce the same output for
the above scenario.
---
Signed-off-by: Milian Wolff <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|