Age | Commit message (Collapse) | Author | Files | Lines |
|
There's no need to call reset_dimensions within __setup_output_field
function. It's already called in its caller setup_sorting right before
perf_hpp__init, which will be changed in following patch to respect
taken dimension.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Sorting on 'symbol' gives to broad a resolution as it can cover a range
of IP address. Use the iaddr instead to get proper sorting on IP
addresses. Need to use the 'mem_sort' feature of perf record.
New sort option is: symbol_iaddr, header label is 'Code Symbol'.
$ perf mem report --stdio -F +symbol_iaddr
# Overhead Samples Code Symbol Local Weight
# ........ ............ ........................ ............
#
54.08% 1 [k] nmi_handle 192
4.51% 1 [k] finish_task_switch 16
3.66% 1 [.] malloc 13
3.10% 1 [.] __strcoll_l 11
Signed-off-by: Don Zickus <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch enable perf report to sort by processor socket:
$ perf report --stdio --sort socket,comm,dso,symbol
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 686 of event 'cycles'
# Event count (approx.): 349215462
#
# Overhead SOCKET Command Shared Object Symbol
# ........ ...... ....... ................ ............................
#
97.05% 000 test test [.] plusB_c
0.98% 000 test test [.] plusA_c
0.93% 001 perf [kernel.vmlinux] [k] smp_call_function_single
0.19% 001 perf [kernel.vmlinux] [k] page_fault
0.19% 001 swapper [kernel.vmlinux] [k] pm_qos_request
0.16% 000 test [kernel.vmlinux] [k] add_mm_counter_fast
Signed-off-by: Kan Liang <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Fix col calc, un-allcapsify col header & read the topology when not using perf.data ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When profiling the kernel with the 'srcfile' sort key it's common to
"get stuck" in include. For example a lot of code uses current or other
inlines, so they get accounted to some random include file. This is not
very useful as a high level categorization.
For example just profiling the idle loop usually shows mostly inlines,
so you never see the actual cpuidle file.
This patch changes the 'srcfile' sort key to always unwind the inline
stack using BFD/DWARF. So we always account to the base function that
called the inline.
In a few cases include is still shown (for example for MSR accesses),
but that is because they get inlining expanded as part of assigning to a
global function pointer. For the majority it works fine though.
v2: Use simpler while loop. Add maximum iteration count.
Signed-off-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Handle the SRCLINE_UNKNOWN case correctly when processing "srcfile".
Commiter note:
We can't just free it, as it was't allocated via malloc, its a guard
variable.
Reported-by: Namhyung Kim <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In some cases it's useful to characterize samples by file. This is
useful to get a higher level categorization, for example to map cost to
subsystems.
Add a srcfile sort key to perf report. It builds on top of the existing
srcline support.
Commiter notes:
E.g.:
# perf record -F 10000 usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data (13 samples) ]
[root@zoo ~]# perf report -s srcfile --stdio
# Total Lost Samples: 0
#
# Samples: 13 of event 'cycles'
# Event count (approx.): 869878
#
# Overhead Source File
# ........ ...........
60.99% .
20.62% paravirt.h
14.23% rmap.c
4.04% signal.c
0.11% msr.h
#
The first line is collecting all the files for which srcfiles couldn't somehow
get resolved to:
# perf report -s srcfile,dso --stdio
# Total Lost Samples: 0
#
# Samples: 13 of event 'cycles'
# Event count (approx.): 869878
#
# Overhead Source File Shared Object
# ........ ........... ................
40.97% . ld-2.20.so
20.62% paravirt.h [kernel.vmlinux]
20.02% . libc-2.20.so
14.23% rmap.c [kernel.vmlinux]
4.04% signal.c [kernel.vmlinux]
0.11% msr.h [kernel.vmlinux]
#
XXX: Investigate why that is not resolving on Fedora 21, Andi says he hasn't
seen this on Fedora 22.
Signed-off-by: Andi Kleen <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Added column length update, from 0e65bdb3f90f ('perf hists: Update the column width for the "srcline" sort key') ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Display the cycles by default in branch sort mode.
To make enough room for the new column I removed dso_to. It is usually
redundant with dso_from.
Signed-off-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
cycles is a new branch_info field available on some CPUs that indicates
the time deltas between branches in the LBR.
Add a sort key and output code for the cycles to allow to display the
basic block cycles individually in perf report.
We also pass in the cycles for weight when LBRs are processed, which
allows to get global and local weight, to get an estimate of the total
cost.
And also print the cycles information for perf report -D. I also added
printing for the previously missing LBR flags (mispredict etc.)
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When using a map file from a JIT, due to memory reuse, we can obtain
multiple symbols with the same start address but a different length.
The symbols__find does check for the end so not doing it in
sort__sym_cmp was causing the hist_entry in the annotate part of a
report to match to the wrong entry, causing a fatal error.
Signed-off-by: Yannick Brosseau <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently the se_cmp and se_collapse use pointer comparison,
which is ok for for testing equality of strings. It's not ok
as comparing function for rbtree insertion, because it gives
different results based on current pointer values.
We saw test 32 (hists cumulation test) failing based on different
environment setup. Having all sort functions straightened fix the
test for us.
Reported-by: Jan Stancek <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jan Stancek <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently, the perf diff only works with same binaries. That's because
it compares the symbol start address. It doesn't work if the perf.data
comes from different binaries. This patch matches the symbol names.
Actually, perf diff once intended to compare the symbol names. The
commit as below can look for a pair by name.
604c5c92972d (perf diff: Change the default sort order to "dso,symbol")
However, at that time, perf diff used a global list of dsos. That means
the binaries which has same name can only be loaded once. That's a
problem for comparing different binaries.
For example, we have an old binary and an updated binary. They very
likely have same name and most of the functions, so only dsos from old
binary will be loaded. When processing the data from updated binary,
perf still use the symbol information from old binary. That's wrong.
Then the commit as below used IP to replace symbol name.
9c443dfdd31e ("perf diff: Fix support for all --sort combinations")
>From that time, perf diff starts to compare the symbol address.
The global dsos is discarded from a patch in 2010.
a1645ce12adb ("perf: 'perf kvm' tool for monitoring guest performance
from host")
However, at that time, perf diff already compared by address. So perf
diff cannot work for different binaries as well.
This patch actually rolls back the perf diff to original design. The
document is also changed, so everybody knows the original design is to
compare the symbol names.
Here are some examples:
The only difference between example_v1.c and example_v2.c is the
location of f2 and f3. There is no change in behavior, but the previous
perf diff display the wrong differential profile.
example_v1.c
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}
noinline void f1(void)
{
f2();
f3();
}
int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}
example_v2.c
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}
noinline void f1(void)
{
f2();
f3();
}
int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}
[lk@localhost perf_diff]$ gcc example_v1.c -o example
[lk@localhost perf_diff]$ perf record -o example_v1.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.813 MB example_v1.data (~35522 samples) ]
[lk@localhost perf_diff]$ gcc example_v2.c -o example
[lk@localhost perf_diff]$ perf record -o example_v2.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.824 MB example_v2.data (~36015 samples) ]
Old perf diff result:
[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
Event 'cycles'
Baseline Delta Shared Object Symbol
........ ....... ................ ...............................
[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% example [.] f2
53.71% -7.55% example [.] f3
+53.81% example [.] f3
0.02% example [.] main
New perf diff result:
[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% -0.08% example [.] f2
53.71% +0.11% example [.] f3
0.02% example [.] main
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently ->cmp, ->collapse and ->sort callbacks doesn't pass
corresponding fmt. But it'll be needed by upcoming changes in
perf diff command.
Suggested-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ fix build by passing perf_hpp_fmt pointer to hist_entry__cmp_ methods ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When the source line is not found fall back to sym + offset. This is
generally much more useful than a raw address.
For this we need to pass in the symbol from the caller.
For some callers it's awkward to compute, so we stay at the old
behaviour.
Signed-off-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Problem introduced in:
commit 5b5916696051 "perf report: Honor column width setting"
Where the left justification signal was after the width, which ended up,
when the width was, say, 11, always printing:
%11.11-s
Instead of src:line left justified and limited to 11 chars.
Resulting in a like:
70.93% %11.11-s [.] f2 tcall
When it should instead be:
70.93% tcall.c:5 [.] f2 tcall
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F dso_from
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F dso_to
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F symbol_from
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F symbol_to
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F mispredict
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F in_tx
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F abort
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Not all tools need a hists instance per perf_evsel, so lets pave the way
to remove evsel->hists while leaving a way to access the hists from a
specially allocated evsel, one that comes with space at the end where
lives the evsel.
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jean Pihet <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Adding support to add field(s) to default sort order via using the '+'
prefix, like for report:
$ perf report
Samples: 2K of event 'cycles', Event count (approx.): 882172583
Overhead Command Shared Object Symbol
7.39% swapper [kernel.kallsyms] [k] intel_idle
1.97% firefox libpthread-2.17.so [.] pthread_mutex_lock
1.39% firefox [snd_hda_intel] [k] azx_get_position
1.11% firefox libpthread-2.17.so [.] pthread_mutex_unlock
$ perf report -s +cpu
Samples: 2K of event 'cycles', Event count (approx.): 882172583
Overhead Command Shared Object Symbol CPU
2.89% swapper [kernel.kallsyms] [k] intel_idle 000
2.61% swapper [kernel.kallsyms] [k] intel_idle 002
1.20% swapper [kernel.kallsyms] [k] intel_idle 001
0.82% firefox libpthread-2.17.so [.] pthread_mutex_lock 002
Works in general for commands using --sort option.
v2 with changes suggested:
- Use dynamic memory instead static buffer
- Fix error message typo
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jean Pihet <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Adding support to add field(s) to default field order via using the '+'
prefix, like for report:
$ perf report
Samples: 10 of event 'cycles', Event count (approx.): 4463799
Overhead Command Shared Object Symbol
32.40% ls [kernel.kallsyms] [k] filemap_fault
28.19% ls [kernel.kallsyms] [k] get_page_from_freelist
23.38% ls [kernel.kallsyms] [k] enqueue_entity
15.04% ls [kernel.kallsyms] [k] mmap_region
$ perf report -F +period,sample
Samples: 10 of event 'cycles', Event count (approx.): 4463799
Overhead Period Samples Command Shared Object Symbol
32.40% 1446493 1 ls [kernel.kallsyms] [k] filemap_fault
28.19% 1258486 1 ls [kernel.kallsyms] [k] get_page_from_freelist
23.38% 1043754 1 ls [kernel.kallsyms] [k] enqueue_entity
15.04% 671160 1 ls [kernel.kallsyms] [k] mmap_region
Works in general for commands using --field option.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jean Pihet <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It makes the code a bit simpler and easier to debug IMHO.
I guess it can also remove similar code in perf diff, but let's keep
it for a future work. :)
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Set column width and do not change it if user gives -w/--column-widths
option. It'll truncate longer symbols than the width if exists.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Save column length in the hpp format and pass it to print functions.
This is a preparation for users to control column width in the output.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now perf left-aligns column headers but the contents does not. It
should have same alignment. This requires a change in pid sort key - it
consists of two part (pid and comm). As length of comm can be vary it'd
be better to change the order of them.
Thanks to Jiri Olsa for pointing this out.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Looks better and avoids it moving to the end of the screen as the column
width changes over time in 'perf top'.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In perf's 'mem-mode', one can get access to a whole bunch of details specific to a
particular sample instruction. A bunch of those details relate to the data
address.
One interesting thing you can do with data addresses is to convert them into a unique
cacheline they belong too. Organizing these data cachelines into similar groups and sorting
them can reveal cache contention.
This patch creates an alogorithm based on various sample details that can help group
entries together into data cachelines and allows 'perf report' to sort on it.
The algorithm relies on having proper mmap2 support in the kernel to help determine
if the memory map the data address belongs to is private to a pid or globally shared.
The alogortithm is as follows:
o group cpumodes together
o group entries with discovered maps together
o sort on major, minor, inode and inode generation numbers
o if userspace anon, then sort on pid
o sort on cachelines based on data addresses
The 'dcacheline' sort option in 'perf report' only works in 'mem-mode'.
Sample output:
#
# Samples: 206 of event 'cpu/mem-loads/pp'
# Total weight : 2534
# Sort order : dcacheline,pid
#
# Overhead Samples Data Cacheline Command: Pid
# ........ ............ ...................................................................... ..................
#
13.22% 1 [k] 0xffff88042f08ebc0 swapper: 0
9.27% 1 [k] 0xffff88082e8cea80 swapper: 0
3.59% 2 [k] 0xffffffff819ba180 swapper: 0
0.32% 1 [k] arch_trigger_all_cpu_backtrace_handler_na.23901+0xffffffffffffffe0 swapper: 0
0.32% 1 [k] timekeeper_seq+0xfffffffffffffff8 swapper: 0
Note: Added a '+1' to symlen size in hists__calc_col_len to prevent the next column
from prematurely tabbing over and mis-aligning. Not sure what the problem is.
Signed-off-by: Don Zickus <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
After output/sort fields refactoring, it's expensive
to check the elide bool in its current location inside
the 'struct sort_entry'.
The perf_hpp__should_skip function gets highly noticable in
workloads with high number of output/sort fields, like for:
$ perf report -i perf-test.data -F overhead,sample,period,comm,pid,dso,symbol,cpu --stdio
Performance report:
9.70% perf [.] perf_hpp__should_skip
Moving the elide bool into the 'struct perf_hpp_fmt', which
makes the perf_hpp__should_skip just single struct read.
Got speedup of around 22% for my test perf.data workload.
The change should not harm any other workload types.
Performance counter stats for (10 runs):
before:
358,319,732,626 cycles ( +- 0.55% )
467,129,581,515 instructions # 1.30 insns per cycle ( +- 0.00% )
150.943975206 seconds time elapsed ( +- 0.62% )
now:
278,785,972,990 cycles ( +- 0.12% )
370,146,797,640 instructions # 1.33 insns per cycle ( +- 0.00% )
116.416670507 seconds time elapsed ( +- 0.31% )
Acked-by: Namhyung Kim <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
There's no need to setup elide of sort_dso sort entry again
with symbol_conf.dso_list list.
The only difference were list names of memory mode data,
which does not make much sense to me.
Acked-by: Namhyung Kim <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
When reset_output_field() is called, also reset field/sort order to
NULL so that it can have the default values. It's needed for testing.
Signed-off-by: Namhyung Kim <[email protected]>
CC: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
Print accumulated stat of a hist entry if requested.
To do that, add new HPP_PERCENT_ACC_FNS macro and generate a
perf_hpp_fmt using it. The __hpp__sort_acc() function sorts entries
by accumulated period value. When accumulated periods of two entries
are same (i.e. single path callchain) put the caller above since
accumulation tends to put callers on higher position for obvious
reason.
Also add "overhead_children" output field to be selected by user.
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arun Sharma <[email protected]>
Tested-by: Rodrigo Campos <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
The reset_output_field() function is for clearing output field
settings and will be used for test code in later patch.
Signed-off-by: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
Now we moved to the perf_hpp_[_sort]_list so no need to keep the old
hist_entry__sort_list and sort__first_dimension. Also the
hist_entry__sort_snprintf() can be gone as hist_entry__snprintf()
provides the functionality.
Signed-off-by: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
Some fields missed to set default column length so it broke align in
--stdio output. Add perf_hpp__reset_width() to set it to a sane
default value.
Note that this change will ignore -w/--column-widths option for now.
Before:
$ perf report -F cpu,comm,overhead --stdio
...
# CPU Command Overhead
# ............... ........
#
0 firefox 2.65%
0 kworker/0:0 1.45%
0 swapper 5.52%
0 synergys 0.92%
1 firefox 4.54%
After:
# CPU Command Overhead
# ... ............... ........
#
0 firefox 2.65%
0 kworker/0:0 1.45%
0 swapper 5.52%
0 synergys 0.92%
1 firefox 4.54%
Signed-off-by: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
When it converted sort entries to hpp formats, it missed se->elide
handling, so add it for compatibility.
Signed-off-by: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
Currently, what the sort_entry does is just identifying hist entries
so that they can be grouped properly. However, with -F option
support, it indeed needs to sort entries appropriately to be shown to
users. So add ->sort() member to do it.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
The -F/--fields option is to allow user setup output field in any
order. It can receive any sort keys and following (hpp) fields:
overhead, overhead_sys, overhead_us, sample and period
If guest profiling is enabled, overhead_guest_{sys,us} will be
available too.
The output fields also affect sort order unless you give -s/--sort
option. And any keys specified on -s option, will also be added to
the output field list automatically.
$ perf report -F sym,sample,overhead
...
# Symbol Samples Overhead
# .......................... ............ ........
#
[.] __cxa_atexit 2 2.50%
[.] __libc_csu_init 4 5.00%
[.] __new_exitfn 3 3.75%
[.] _dl_check_map_versions 1 1.25%
[.] _dl_name_match_p 4 5.00%
[.] _dl_setup_hash 1 1.25%
[.] _dl_sysdep_start 1 1.25%
[.] _init 5 6.25%
[.] _setjmp 6 7.50%
[.] a 8 10.00%
[.] b 8 10.00%
[.] brk 1 1.25%
[.] c 8 10.00%
Note that, the example output above is captured after applying next
patch which fixes sort/comparing behavior.
Requested-by: Ingo Molnar <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
The perf uses different default sort orders for different use-cases,
and this was scattered throughout the code. Add get_default_sort_
order() function to handle this and change initial value of sort_order
to NULL to distinguish it from user-given one.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
Add overhead{,_sys,_us,_guest_sys,_guest_us}, sample and period sort
keys so that they can be selected with --sort/-s option.
$ perf report -s period,comm --stdio
...
# Overhead Period Command
# ........ ............ ...............
#
47.06% 152 swapper
13.93% 45 qemu-system-arm
12.38% 40 synergys
3.72% 12 firefox
2.48% 8 xchat
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
This is a preparation of consolidating management of output field and
sort keys.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
|
|
The commit 09600e0f9ebb ("perf tools: Compare dso's also when comparing
symbols") added a comparison of dso when comparing symbol.
But if the sort key already has dso, it doesn't need to do it again
since entries have a different dso already filtered out.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rodrigo Campos <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If a hist entry doesn't have symbol information, compare it with its
address. Currently it only compares its level or whether it's NULL.
This can lead to an undesired result like an overhead exceeds 100%
especially when callchain accumulation is enabled by later patch.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rodrigo Campos <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If given sort keys are all elided there'll be no output except for the
overhead column - actually the TUI shows a noisy output. In this case
it'd be better to show up the sort keys rather than elide.
Before:
$ perf report -s comm -c perf
(...)
# Overhead
# ........
#
100.00%
After:
$ perf report -s comm -c perf
(...)
# Overhead Command
# ........ .......
#
100.00% perf
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Us curly braces around multi-line statements, as requested by Ingo Molnar ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
At insert time, a hist entry should reference comm at the time otherwise
it'll get the last comm anyway.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
[ Fixed up const pointer issues ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now that comm strings are allocated only once and refcounted to be shared
among threads, these can now be safely compared by addresses. This
should remove most hists collapses on post processing.
Signed-off-by: Frederic Weisbecker <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
As the thread comm is going to be implemented by way of a more
complicated data structure than just a pointer to a string from the
thread struct, convert the readers of comm to use an accessor instead of
accessing it directly.
The accessor will be later overriden to support an enhanced comm
implementation.
Signed-off-by: Frederic Weisbecker <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
[ Rename thread__comm_curr() to thread__comm_str() ]
Signed-off-by: Namhyung Kim <[email protected]>
[ Fixed up some minor const pointer issues ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
As suggested by tglx, 'self' should be replaced by something that is
more useful.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|