Age | Commit message (Collapse) | Author | Files | Lines |
|
Linus reported that sometimes 'perf report -s symbol' exits without any
message on TUI. David and Jiri found that it's because it failed to add
a hist entry due to an invalid symbol length.
It turns out that sorting by symbol (address) was broken since it only
compares symbol addresses. The symbol address is a relative address
within a dso thus just checking its address can result in merging
unrelated symbols together. Fix it by checking dso before comparing
symbol address.
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently the srcline sort key compares ip rather than srcline info. I
guess this was due to a performance reason to run external addr2line
utility. Now we have implemented the functionality inside, use the
srcline info when comparing hist entries.
Also constantly print "??:0" string for unknown srcline rather than
printing ip.
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This is a preparation of next change. No functional changes are
intended.
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
No need to call addr2line since they don't have such information.
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently external addr2line tool is used for srcline sort key and
annotate with srcline info. Separate the common code to prepare
upcoming enhancements.
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In the hist_entry__srcline_snprintf(), path and self->srcline are
pointing the same memory region, but they are doubly allocated.
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add support for recording and displaying the transaction flags.
They are essentially a new sort key. Also display them
in a nice way to the user.
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Extend the perf branch sorting code to support sorting by in_tx
or abort_tx qualifiers. Also print out those qualifiers.
This also fixes up some of the existing sort key documentation.
We do not support no_tx here, because it's simply not showing
the in_tx flag.
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This is a partial revert of Namhyung's patch
afab87b91f3f331d55664172dad8e476e6ffca9d
perf sort: Separate out memory-specific sort keys
He wrote
For global/local weights, I'm not entirely sure to place them into the
memory dimension. But it's the only user at this time.
Well TSX is another (in fact the original) user of the flags, and it
needs them to be common. So move local/global weight back to the common
sort keys.
Signed-off-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For example, in an application with an expensive function implemented
with deeply nested recursive calls, the default call-graph presentation
is dominated by the different callchains within that function. By
ignoring these callees, we can collect the callchains leading into the
function and compactly identify what to blame for expensive calls.
For example, in this report the callers of garbage_collect() are
scattered across the tree:
$ perf report -d ruby 2>- | grep -m10 ^[^#]*[a-z]
22.03% ruby [.] gc_mark
--- gc_mark
|--59.40%-- mark_keyvalue
| st_foreach
| gc_mark_children
| |--99.75%-- rb_gc_mark
| | rb_vm_mark
| | gc_mark_children
| | gc_marks
| | |--99.00%-- garbage_collect
If we ignore the callees of garbage_collect(), its callers are coalesced:
$ perf report --ignore-callees garbage_collect -d ruby 2>- | grep -m10 ^[^#]*[a-z]
72.92% ruby [.] garbage_collect
--- garbage_collect
vm_xmalloc
|--47.08%-- ruby_xmalloc
| st_insert2
| rb_hash_aset
| |--98.45%-- features_index_add
| | rb_provide_feature
| | rb_require_safe
| | vm_call_method
Signed-off-by: Greg Price <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Cc: Fengguang Wu <[email protected]>
[ remove spaces at beginning of line, reported by Fengguang Wu ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
As evident from 'machine__process_fork_event()' and
'machine__process_exit_event()' the 'pid' member of struct thread is
actually the tid.
Rename 'pid' to 'tid' in struct thread accordingly.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: David Ahern <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The sort__has_sym variable is set only if a symbol-related sort key was
added. Since branch stack and memory sort dimensions are separated, it
doesn't need to be checked from common dimension.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The same code was duplicate to places, factor them out to common
sort__setup_elide().
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since they're used only for perf mem, separate out them to a different
dimension so that normal user cannot access them by any chance.
For global/local weights, I'm not entirely sure to place them into the
memory dimension. But it's the only user at this time.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Let's remove duplicate code.
Suggested-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It's used for determining current sort mode which can be one of
NORMAL, BRANCH and new MEMORY.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When -v option is given, the symbol sort key prints its address also but
it wasn't properly aligned since hists__calc_col_len() misses the
additional part. Also it missed 2 spaces for 0x prefix when printing.
$ perf report --stdio -v -s sym
# Samples: 133 of event 'cycles'
# Event count (approx.): 50536717
#
# Overhead Symbol
# ........ ..............................
#
12.20% 0xffffffff81384c50 v [k] intel_idle
7.62% 0xffffffff8170976a v [k] ftrace_caller
7.02% 0x2d986d B [.] 0x00000000002d986d
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The symbol addresses in a dso have relative offsets from the start of a
mapping. So in order to ouput correct offset value from @ip, one of
them should be converted.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch adds the sorting and histogram support
functions to enable profiling of memory accesses.
The following sorting orders are added:
- symbol_daddr: data address symbol (or raw address)
- dso_daddr: data address shared object
- locked: access uses locked transaction
- tlb : TLB access
- mem : memory level of the access (L1, L2, L3, RAM, ...)
- snoop: access snoop mode
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ committer note: changed to cope with fc5871ed, the move of methods to
machine.[ch], and the rename of dsrc to data_src, to match the change
made in the PERF_SAMPLE_DSRC in a previous patch. ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf record has a new option -W that enables weightened sampling.
Add sorting support in top/report for the average weight per sample and the
total weight sum. This allows to both compare relative cost per event
and the total cost over the measurement period.
Add the necessary glue to perf report, record and the library.
v2: Merge with new hist refactoring.
v3: Fix manpage. Remove value check.
Rename global_weight to weight and weight to local_weight.
v4: Readd sort keys to manpage
v5: Move weight to end
v6: Move weight to template
v7: Rename weight key.
Original patch from Andi modified by Stephane Eranian <[email protected]>
to include ONLY the weight supporting code and apply to pristine 3.8.0-rc4.
Signed-off-by: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ committer note: changed to cope with fc5871ed and the hists_link perf test entry ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When setup_sorting() is called, 'str' is passed to strtok_r() but it's
not checked to have a valid pointer. As strtok_r() accepts NULL pointer
on a first argument and use the third argument in that case, it can
cause a trouble since our third argument, tmp, is not initialized.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently the setup_sorting() is called for parsing sort keys and exits
if it failed to add the sort key. As it's included in libperf it'd be
better returning an error code rather than exiting application inside of
the library.
Signed-off-by: Namhyung Kim <[email protected]>
Suggested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Current _sort__sym_cmp() function is used for comparing symbols between
two hist entries on symbol, symbol_from and symbol_to sort keys. Those
functions pass addresses of symbols but it's meaningless since it gets
over-written inside of the _sort__sym_cmp function to a start address of
the symbol. So just get rid of them.
This might cause a difference than prior output for branch stacks since
it seems not using start address of the symbol but branch address.
However AFAICS it'd be same as it gets overwritten anyway.
Also remove redundant part of code in sort__sym_cmp().
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
cppcheck message:
[tools/perf/util/sort.c:277]: (error) Mismatching allocation and deallocation: fp
Also fix descriptor leak on error and always initialize the "fp" variable.
Signed-off-by: Thomas Jarosch <[email protected]>
Link: http://lkml.kernel.org/r/1359112354.yZcisNZ4k0@storm
Link: http://lkml.kernel.org/r/2266358.qvDXKLvJ67@storm
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Current perf report gets segmentation fault when a branch stack specific
sort key is provided by --sort option to a perf.data file which contains
no branch infomation. It's because those sort keys reference branch
info of a hist entry unconditionally. Maybe we can change it checks
whether such branch info is valid or not. But if the branch stacks are
not recorded, it'd be nop. Thus it'd be better to make those keys are
unselectable.
This patch separates those keys to a different dimension array, so that
if user passes such a key to a file which has no branch stack will get
following message rather than a segfault.
Error: Invalid --sort key: `symbol_from'
Signed-off-by: Namhyung Kim <[email protected]>
Reported-by: Stefan Beller <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It doesn't need to compare to every sort key names since the index
already has the required information.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since cpu number is a natural number, it'd be more appropriate
aligning it to right.
Before:
# Overhead CPU Command: Pid Shared Object
# ........ ... ................. .....................
#
8.91% 8 gnome-shell: 1497 perf-1497.map
8.90% 7 gnome-shell: 1497 perf-1497.map
8.86% 9 gnome-shell: 1497 perf-1497.map
8.83% 6 gnome-shell: 1497 perf-1497.map
8.81% 10 gnome-shell: 1497 perf-1497.map
7.44% 5 gnome-shell: 1497 perf-1497.map
6.20% 3 gnome-shell: 1497 perf-1497.map
5.10% 0 gnome-shell: 1497 perf-1497.map
After:
# Overhead CPU Command: Pid Shared Object
# ........ ... ................. .....................
#
8.91% 8 gnome-shell: 1497 perf-1497.map
8.90% 7 gnome-shell: 1497 perf-1497.map
8.86% 9 gnome-shell: 1497 perf-1497.map
8.83% 6 gnome-shell: 1497 perf-1497.map
8.81% 10 gnome-shell: 1497 perf-1497.map
7.44% 5 gnome-shell: 1497 perf-1497.map
6.20% 3 gnome-shell: 1497 perf-1497.map
5.10% 0 gnome-shell: 1497 perf-1497.map
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The "pid" sort key prints "Command: Pid" output but it's misaligned.
It's because of the offset of 6 was added to the column length during
the calculation in order to reserve an space for Pid part but it isn't
honored when printed. The output before this patch was like this:
# Overhead Command: Pid Shared Object
# ........ ............. .................
#
99.70% noploop:17814 noploop
0.29% noploop:17814 [kernel.kallsyms]
0.01% noploop:17814 ld-2.15.so
Fix it by subtracting 6 for printing comm part.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Some functions have set __maybe_unused on its arguments that are used
actually. Remove them.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Some functions are misplaced along with other entries. Move them to a
right place so that it can be found together with related functions.
No functional change intended.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We already check that sym_l and sum_r are non-NULLs, no need to do it
twice.
Signed-off-by: Sasha Levin <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When using the srcline sort key with perf report, I see many lines of
warning related to JIT samples like below:
addr2line: '/tmp/perf-1397.map': No such file
Since it's not a ELF binary and doesn't provide such information, just
use the raw ip address.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Irina Tirdea <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The srcline sort key is for grouping samples based on their source file
and line number. It use addr2line tool to get the information but it
requires dso name. It caused a segfault when a sample does not have the
name by dereferencing a NULL pointer. Fix it by using raw ip addresses
for those samples.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The sort__has_sym variable is for checking whether the sort_list
includes 'symbol' as a sort key. It will be used for later patch.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf defines both __used and __unused variables to use for marking
unused variables. The variable __used is defined to
__attribute__((__unused__)), which contradicts the kernel definition to
__attribute__((__used__)) for new gcc versions. On Android, __used is
also defined in system headers and this leads to warnings like: warning:
'__used__' attribute ignored
__unused is not defined in the kernel and is not a standard definition.
If __unused is included everywhere instead of __used, this leads to
conflicts with glibc headers, since glibc has a variables with this name
in its headers.
The best approach is to use __maybe_unused, the definition used in the
kernel for __attribute__((unused)). In this way there is only one
definition in perf sources (instead of 2 definitions that point to the
same thing: __used and __unused) and it works on both Linux and Android.
This patch simply replaces all instances of __used and __unused with
__maybe_unused.
Signed-off-by: Irina Tirdea <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ committer note: fixed up conflict with a116e05 in builtin-sched.c ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The repsep_snprintf function was still using standalone field_sep, which
not even set anymore.
Replacing it with 'symbol_conf.field_sep'.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Using addr2line for now, requires debuginfo, needs more work to support
detached debuginfo, aka foo-debuginfo packages.
Example:
[root@sandy ~]# perf record -a sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.555 MB perf.data (~24236 samples) ]
[root@sandy ~]# perf report -s dso,srcline 2>&1 | grep -v ^# | head -5
22.41% [kernel.kallsyms] /home/git/linux/drivers/idle/intel_idle.c:280
4.79% [kernel.kallsyms] /home/git/linux/drivers/cpuidle/cpuidle.c:148
4.78% [kernel.kallsyms] /home/git/linux/arch/x86/include/asm/atomic64_64.h:121
4.49% [kernel.kallsyms] /home/git/linux/kernel/sched/core.c:1690
4.30% [kernel.kallsyms] /home/git/linux/include/linux/seqlock.h:90
[root@sandy ~]#
[root@sandy ~]# perf top -U -s dso,symbol,srcline
Samples: 1K of event 'cycles', Event count (approx.): 589617389
18.66% [kernel] [k] copy_user_generic_unrolled /home/git/linux/arch/x86/lib/copy_user_64.S:143
7.83% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:39
6.59% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:38
3.66% [kernel] [k] page_fault /home/git/linux/arch/x86/kernel/entry_64.S:1379
3.25% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:40
3.12% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:37
2.74% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:36
2.39% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:43
2.12% [kernel] [k] ioread32 /home/git/linux/lib/iomap.c:90
1.51% [kernel] [k] copy_user_generic_unrolled /home/git/linux/arch/x86/lib/copy_user_64.S:144
1.19% [kernel] [k] copy_user_generic_unrolled /home/git/linux/arch/x86/lib/copy_user_64.S:154
Suggested-by: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events changes for v3.4 from Ingo Molnar:
- New "hardware based branch profiling" feature both on the kernel and
the tooling side, on CPUs that support it. (modern x86 Intel CPUs
with the 'LBR' hardware feature currently.)
This new feature is basically a sophisticated 'magnifying glass' for
branch execution - something that is pretty difficult to extract from
regular, function histogram centric profiles.
The simplest mode is activated via 'perf record -b', and the result
looks like this in perf report:
$ perf record -b any_call,u -e cycles:u branchy
$ perf report -b --sort=symbol
52.34% [.] main [.] f1
24.04% [.] f1 [.] f3
23.60% [.] f1 [.] f2
0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
0.01% [k] _IO_vfprintf_internal [k] strchrnul
0.01% [k] __printf [k] _IO_vfprintf_internal
0.01% [k] main [k] __printf
This output shows from/to branch columns and shows the highest
percentage (from,to) jump combinations - i.e. the most likely taken
branches in the system. "branches" can also include function calls
and any other synchronous and asynchronous transitions of the
instruction pointer that are not 'next instruction' - such as system
calls, traps, interrupts, etc.
This feature comes with (hopefully intuitive) flat ascii and TUI
support in perf report.
- Various 'perf annotate' visual improvements for us assembly junkies.
It will now recognize function calls in the TUI and by hitting enter
you can follow the call (recursively) and back, amongst other
improvements.
- Multiple threads/processes recording support in perf record, perf
stat, perf top - which is activated via a comma-list of PIDs:
perf top -p 21483,21485
perf stat -p 21483,21485 -ddd
perf record -p 21483,21485
- Support for per UID views, via the --uid paramter to perf top, perf
report, etc. For example 'perf top --uid mingo' will only show the
tasks that I am running, excluding other users, root, etc.
- Jump label restructurings and improvements - this includes the
factoring out of the (hopefully much clearer) include/linux/static_key.h
generic facility:
struct static_key key = STATIC_KEY_INIT_FALSE;
...
if (static_key_false(&key))
do unlikely code
else
do likely code
...
static_key_slow_inc();
...
static_key_slow_inc();
...
The static_key_false() branch will be generated into the code with as
little impact to the likely code path as possible. the
static_key_slow_*() APIs flip the branch via live kernel code patching.
This facility can now be used more widely within the kernel to
micro-optimize hot branches whose likelihood matches the static-key
usage and fast/slow cost patterns.
- SW function tracer improvements: perf support and filtering support.
- Various hardenings of the perf.data ABI, to make older perf.data's
smoother on newer tool versions, to make new features integrate more
smoothly, to support cross-endian recording/analyzing workflows
better, etc.
- Restructuring of the kprobes code, the splitting out of 'optprobes',
and a corner case bugfix.
- Allow the tracing of kernel console output (printk).
- Improvements/fixes to user-space RDPMC support, allowing user-space
self-profiling code to extract PMU counts without performing any
system calls, while playing nice with the kernel side.
- 'perf bench' improvements
- ... and lots of internal restructurings, cleanups and fixes that made
these features possible. And, as usual this list is incomplete as
there were also lots of other improvements
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
perf report: Fix annotate double quit issue in branch view mode
perf report: Remove duplicate annotate choice in branch view mode
perf/x86: Prettify pmu config literals
perf report: Enable TUI in branch view mode
perf report: Auto-detect branch stack sampling mode
perf record: Add HEADER_BRANCH_STACK tag
perf record: Provide default branch stack sampling mode option
perf tools: Make perf able to read files from older ABIs
perf tools: Fix ABI compatibility bug in print_event_desc()
perf tools: Enable reading of perf.data files from different ABI rev
perf: Add ABI reference sizes
perf report: Add support for taken branch sampling
perf record: Add support for sampling taken branch
perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
x86/kprobes: Split out optprobe related code to kprobes-opt.c
x86/kprobes: Fix a bug which can modify kernel code permanently
x86/kprobes: Fix instruction recovery on optimized path
perf: Add callback to flush branch_stack on context switch
perf: Disable PERF_SAMPLE_BRANCH_* when not supported
perf/x86: Add LBR software filter support for Intel CPUs
...
|
|
I have a workload where perf top scribbles over the stack and we SEGV.
What makes it interesting is that an snprintf is causing this.
The workload is a c++ gem that has method names over 3000 characters
long, but snprintf is designed to avoid overrunning buffers. So what
went wrong?
The problem is we assume snprintf returns the number of characters
written:
ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", self->level);
...
ret += repsep_snprintf(bf + ret, size - ret, "%s", self->ms.sym->name);
Unfortunately this is not how snprintf works. snprintf returns the
number of characters that would have been written if there was enough
space. In the above case, if the first snprintf returns a value larger
than size, we pass a negative size into the second snprintf and happily
scribble over the stack. If you have 3000 character c++ methods thats a
lot of stack to trample.
This patch fixes repsep_snprintf by clamping the value at size - 1 which
is the maximum snprintf can write before adding the NULL terminator.
I get the sinking feeling that there are a lot of other uses of snprintf
that have this same bug, we should audit them all.
Cc: David Ahern <[email protected]>
Cc: Eric B Munson <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yanmin Zhang <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/20120307114249.44275ca3@kryten
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch enhances perf report to auto-detect when the
perf.data file contains samples with branch stacks. That way it
is not necessary to use the -b option.
To force branch view mode to off, simply use --no-branch-stack.
Signed-off-by: Stephane Eranian <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch adds:
- ability to parse samples with PERF_SAMPLE_BRANCH_STACK
- sort on branches (dso_from, symbol_from, dso_to, symbol_to, mispredict)
- build histograms on branches
Signed-off-by: Roberto Agostino Vitillo <[email protected]>
Signed-off-by: Stephane Eranian <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
for threads
And also no leed to show the [.] (level: k, . for userspace) when
showing just one DSO.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
I took a profile that suggested 60% of total CPU time was in the
hypervisor:
...
60.20% [H] 0x33d43c
4.43% [k] ._spin_lock_irqsave
1.07% [k] ._spin_lock
Using perf stat to get the user/kernel/hypervisor breakdown contradicted
this.
The problem is we merge all unresolved samples into the one unknown
bucket. If add a comparison by sample type to sort__sym_cmp we get the
real picture:
...
57.11% [.] 0x80fbf63c
4.43% [k] ._spin_lock_irqsave
1.07% [k] ._spin_lock
0.65% [H] 0x33d43c
So it was almost all userspace, not hypervisor as the initial profile
suggested.
I found another issue while adding this. Symbol sorting sometimes shows
multiple entries for the unknown bucket:
...
16.65% [.] 0x6cd3a8
7.25% [.] 0x422460
5.37% [.] yylex
4.79% [.] malloc
4.78% [.] _int_malloc
4.03% [.] _int_free
3.95% [.] hash_source_code_string
2.82% [.] 0x532908
2.64% [.] 0x36b538
0.94% [H] 0x8000000000e132a4
0.82% [H] 0x800000000000e8b0
This happens because we aren't consistent with our sorting. On
one hand we check to see if both symbols match and for two unresolved
samples sym is NULL so we match:
if (left->ms.sym == right->ms.sym)
return 0;
On the other hand we use sample IP for unresolved samples when
comparing against a symbol:
ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
ip_r = right->ms.sym ? right->ms.sym->start : right->ip;
This means unresolved samples end up spread across the rbtree and we
can't merge them all.
If we use cmp_null all unresolved samples will end up in the one bucket
and the output makes more sense:
...
39.12% [.] 0x36b538
5.37% [.] yylex
4.79% [.] malloc
4.78% [.] _int_malloc
4.03% [.] _int_free
3.95% [.] hash_source_code_string
2.26% [H] 0x800000000000e8b0
Acked-by: Eric B Munson <[email protected]>
Cc: Eric B Munson <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ian Munsie <[email protected]>
Link: http://lkml.kernel.org/r/20110831115145.4f598ab2@kryten
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
So that the parent sort dimension can be registered twice: once
if we add it as an explicit sort dimension (-s parent) and twice
if we request a parent filter (-p foo).
We'll have only one parent sort dimension in the end but this
allows to override the default parent filter with we gave in "-p"
option. The goal of this is to prepare to allow the use of
"-s parent" and "-p foo" at the same time, ie: sort by filtered
parent.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Sam Liao <[email protected]>
|
|
These don't need to be globally visible.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Sam Liao <[email protected]>
|
|
In the event that a DSO has not been identified, just print out [unknown]
instead of the instruction pointer as we previously were doing, which is pretty
meaningless for a shared object (at least to the users perspective).
The IP we print out is fairly meaningless in general anyway - it's just one
(the first) of the many addresses that were lumped together as unidentified,
and could span many shared objects and symbols. In reality if we see this
[unknown] output then the report -D output is going to be more useful anyway as
we can see all the different address that it represents.
If we are printing the symbols we are still going to see this IP in that column
anyway since they shouldn't resolve either.
This patch also changes the symbol address printouts so that they print out 0x
before the address, are left aligned, and changes the %L format string (which
relies on a glibc bug) to %ll.
Before:
74.11% :3259 4a6c [k] 4a6c
After:
74.11% :3259 [unknown] [k] 0x4a6c
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ian Munsie <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
By using BITS_PER_LONG/4 as the width specifier.
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
They were globals, and since we support multiple hists and sessions
at the same time, it doesn't make sense to calculate those values
considereing all symbols in all sessions.
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In a shared multi-core environment, users want to analyze why their
program was slow. In particular, if the code ran slower only on certain
CPUs due to interference from other programs or kernel threads, the user
should be able to notice that.
Sample usage:
perf record -f -a -- sleep 3
perf report --sort cpu,comm
Workload:
program is running on 16 CPUs
Experiencing interference from an antagonist only on 4 CPUs.
Samples: 106218177676 cycles
Overhead CPU Command
........ ... ...............
6.25% 2 program
6.24% 6 program
6.24% 11 program
6.24% 5 program
6.24% 9 program
6.24% 10 program
6.23% 15 program
6.23% 7 program
6.23% 3 program
6.23% 14 program
6.22% 1 program
6.20% 13 program
3.17% 12 program
3.15% 8 program
3.14% 0 program
3.13% 4 program
3.11% 4 antagonist
3.11% 0 antagonist
3.10% 8 antagonist
3.07% 12 antagonist
Cc: David S. Miller <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Arun Sharma <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these
tools is to set something that has an enum type, that is builtin
compatible with unsigned int.
Several string constifications were done to make OPT_STRING require a
const char * type.
Cc: Frédéric Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|