Age | Commit message (Collapse) | Author | Files | Lines |
|
And add the annotation output knobs to all the tools that have
integrated annotation (top, report).
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The goal of this patch is to include more information about the host
environment into the perf.data so it is more self-descriptive. Overtime,
profiles are captured on various machines and it becomes hard to track
what was recorded, on what machine and when.
This patch provides a way to solve this by extending the perf.data file
with basic information about the host machine. To add those extensions,
we leverage the feature bits capabilities of the perf.data format. The
change is backward compatible with existing perf.data files.
We define the following useful new extensions:
- HEADER_HOSTNAME: the hostname
- HEADER_OSRELEASE: the kernel release number
- HEADER_ARCH: the hw architecture
- HEADER_CPUDESC: generic CPU description
- HEADER_NRCPUS: number of online/avail cpus
- HEADER_CMDLINE: perf command line
- HEADER_VERSION: perf version
- HEADER_TOPOLOGY: cpu topology
- HEADER_EVENT_DESC: full event description (attrs)
- HEADER_CPUID: easy-to-parse low level CPU identication
The small granularity for the entries is to make it easier to extend
without breaking backward compatiblity. Many entries are provided as
ASCII strings.
Perf report/script have been modified to print the basic information as
easy-to-parse ASCII strings. Extended information about CPU and NUMA
topology may be requested with the -I option.
Thanks to David Ahern for reviewing and testing the many versions of
this patch.
$ perf report --stdio
# ========
# captured on : Mon Sep 26 15:22:14 2011
# hostname : quad
# os release : 3.1.0-rc4-tip
# perf version : 3.1.0-rc4
# arch : x86_64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
# cpuid : GenuineIntel,6,15,11
# total memory : 8105360 kB
# cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
...
$ perf report --stdio -I
# ========
# captured on : Mon Sep 26 15:22:14 2011
# hostname : quad
# os release : 3.1.0-rc4-tip
# perf version : 3.1.0-rc4
# arch : x86_64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
# cpuid : GenuineIntel,6,15,11
# total memory : 8105360 kB
# cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
# sibling cores : 0-3
# sibling threads : 0
# sibling threads : 1
# sibling threads : 2
# sibling threads : 3
# node0 meminfo : total = 8320608 kB, free = 7571024 kB
# node0 cpu list : 0-3
# ========
#
...
Reviewed-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Andi Kleen <[email protected]>
Link: http://lkml.kernel.org/r/20110930134040.GA5575@quad
Signed-off-by: Stephane Eranian <[email protected]>
[ committer notes: Use --show-info in the tools as was in the docs, rename
perf_header_fprintf_info to perf_file_section__fprintf_info, fixup
conflict with f69b64f7 "perf: Support setting the disassembler style" ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This allows passing a timer to be run periodically, which will update
the hists tree that then gers refreshed on the screen, just like the
Live mode (symbol entries, annotation) we already have in 'perf top
--tui'.
Will be used by the new hist_entry/hists based 'top' tool.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Just like --show-nr-samples, to help in diagnosing problems in the
tools.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
So that we can reuse hists__fprintf for in the new perf top tool.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In the past we tried to avoid printing the name of the event when just
one event was found in the perf.data file, after some refactorings it
ended up not printing the event name if just one hist_entry was found in
one of the events.
Fix it by always printing the name of the event, even if just one is
found.
Reported-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add -M option to report/annotate to pass directly to objdump. This
allows to use -M intel for intel style disassembler syntax, which is
useful for people who are very used to the Intel syntax.
Link: http://lkml.kernel.org/r/[email protected]
[committer note: Add missing Documentation bits, fixup conflicts with 3e6a2a7]
Cc: Frederic Weisbecker <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
So that we get a proper warning in the TUI in cases like:
$ perf report --stdio -g fractal,0.5,caller --sort pid
Selected -g but no callchain data. Did you call 'perf record' without -g?
$
The --stdio case is ok because it uses fprintf, ui__warning is needed to
figure out if --stdio or --tui is being used.
Cc: Arun Sharma <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sam Liao <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add an option to perf report/annotate/script to specify which
CPUs to operate on. This enables us to take a single system wide
profile and analyse each CPU (or group of CPUs) in isolation.
This was useful when profiling a multiprocess workload where the
bottleneck was on one CPU but this was hidden in the overall
profile. Per process and per thread breakdowns didn't help
because multiple processes were running on each CPU and no
single process consumed an entire CPU.
The patch converts the list of CPUs returned by cpu_map__new
into a bitmap for fast lookup. I wanted to use -C to be
consistent with perf top/record/stat, but unfortunately perf
report already uses -C <comms>.
v2: Incorporate suggestions from David Ahern:
- Added -c to perf script
- Check that SAMPLE_CPU is set when -c is used
- Update documentation
v3: Create perf_session__cpu_bitmap()
Signed-off-by: Anton Blanchard <[email protected]>
Acked-by: David Ahern <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Link: http://lkml.kernel.org/r/20110704215750.11647eb9@kryten
Signed-off-by: Ingo Molnar <[email protected]>
|
|
We don't need to display the parent field if the parent
sorting machinery is only used for parent filtering
(as in "-p foo").
However if parent filtering is used in combination with
explicit parent sorting ( -s parent), we want to
display it.
Result with:
perf report -p kernel_thread -s parent
Before:
# Overhead Parent symbol
# ........ .............
#
0.07%
|
--- ioread8
ata_sff_check_status
ata_sff_tf_load
ata_sff_qc_issue
ata_bmdma_qc_issue
ata_qc_issue
ata_scsi_translate
ata_scsi_queuecmd
scsi_dispatch_cmd
scsi_request_fn
__blk_run_queue
__make_request
generic_make_request
submit_bio
submit_bh
journal_submit_commit_record
jbd2_journal_commit_transaction
kjournald2
kthread
kernel_thread_helpe
After:
# Overhead Parent symbol
# ........ .............
#
0.07% kernel_thread_helper
|
--- ioread8
ata_sff_check_status
ata_sff_tf_load
ata_sff_qc_issue
ata_bmdma_qc_issue
ata_qc_issue
ata_scsi_translate
ata_scsi_queuecmd
scsi_dispatch_cmd
scsi_request_fn
__blk_run_queue
__make_request
generic_make_request
submit_bio
submit_bh
journal_submit_commit_record
jbd2_journal_commit_transaction
kjournald2
kthread
kernel_thread_helper
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Sam Liao <[email protected]>
|
|
Add "caller/callee" option to support inverted butterfly report,
in the inverted report (with caller option), the call graph start
from the callee's ancestor. Users can use such view to catch system's
performance bottleneck from a sysprof like view. Using this option
with specified sort order like pid gives us high level view of call
graph statistics.
Also add "-G" alias for inverted call graph.
Signed-off-by: Sam Liao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: David Ahern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Suggested-by: Ingo Molnar <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Perf uses /proc/modules to figure out where kernel modules are loaded.
With the advent of kptr_restrict, non root users get zeroes for all module
start addresses.
So check if kptr_restrict is non zero and don't generate the syntethic
PERF_RECORD_MMAP events for them.
Warn the user about it in perf record and in perf report.
In perf report the reference relocation symbol being zero means that
kptr_restrict was set, thus /proc/kallsyms has only zeroed addresses, so don't
use it to fixup symbol addresses when using a valid kallsyms (in the buildid
cache) or vmlinux (in the vmlinux path) build-id located automatically or
specified by the user.
Provide an explanation about it in 'perf report' if kernel samples were taken,
checking if a suitable vmlinux or kallsyms was found/specified.
Restricted /proc/kallsyms don't go to the buildid cache anymore.
Example:
[acme@emilia ~]$ perf record -F 100000 sleep 1
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check
/proc/sys/kernel/kptr_restrict.
Samples in kernel functions may not be resolved if a suitable vmlinux file is
not found in the buildid cache or in the vmlinux path.
Samples in kernel modules won't be resolved at all.
If some relocation was applied (e.g. kexec) symbols may be misresolved even
with a suitable vmlinux or kallsyms file.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.005 MB perf.data (~231 samples) ]
[acme@emilia ~]$
[acme@emilia ~]$ perf report --stdio
Kernel address maps (/proc/{kallsyms,modules}) were restricted,
check /proc/sys/kernel/kptr_restrict before running 'perf record'.
If some relocation was applied (e.g. kexec) symbols may be misresolved.
Samples in kernel modules can't be resolved as well.
# Events: 13 cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. .....................
#
20.24% sleep [kernel.kallsyms] [k] page_fault
20.04% sleep [kernel.kallsyms] [k] filemap_fault
19.78% sleep [kernel.kallsyms] [k] __lru_cache_add
19.69% sleep ld-2.12.so [.] memcpy
14.71% sleep [kernel.kallsyms] [k] dput
4.70% sleep [kernel.kallsyms] [k] flush_signal_handlers
0.73% sleep [kernel.kallsyms] [k] perf_event_comm
0.11% sleep [kernel.kallsyms] [k] native_write_msr_safe
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
[acme@emilia ~]$
This is because it found a suitable vmlinux (build-id checked) in
/lib/modules/2.6.39-rc7+/build/vmlinux (use -v in perf report to see the long
file name).
If we remove that file from the vmlinux path:
[root@emilia ~]# mv /lib/modules/2.6.39-rc7+/build/vmlinux \
/lib/modules/2.6.39-rc7+/build/vmlinux.OFF
[acme@emilia ~]$ perf report --stdio
[kernel.kallsyms] with build id 57298cdbe0131f6871667ec0eaab4804dcf6f562
not found, continuing without symbols
Kernel address maps (/proc/{kallsyms,modules}) were restricted, check
/proc/sys/kernel/kptr_restrict before running 'perf record'.
As no suitable kallsyms nor vmlinux was found, kernel samples can't be
resolved.
Samples in kernel modules can't be resolved as well.
# Events: 13 cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ......
#
80.31% sleep [kernel.kallsyms] [k] 0xffffffff8103425a
19.69% sleep ld-2.12.so [.] memcpy
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
[acme@emilia ~]$
Reported-by: Stephane Eranian <[email protected]>
Suggested-by: David Miller <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Miller <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Resolving the sample->id to an evsel since the most advanced tools,
report and annotate, and the others will too when they evolve to
properly support multi-event perf.data files.
Good also because it does an extra validation, checking that the ID is
valid when present. When that is not the case, the overhead is just a
branch + function call (perf_evlist__id2evsel).
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
So that we can reuse things like the id to attr lookup routine
(perf_evlist__id2evsel) that uses a hash table instead of the linear
lookup done in the older perf_header_attr routines, etc.
Also to make evsels/evlist more pervasive an API, simplyfing using the
emerging perf lib.
cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When multiple events were used in 'perf record', allow the user to
choose which one is wanted before showing the per event histograms.
Annotations will be performed on the chosen event.
Allow going back and forth from event to event quickly using just the
arrow keys and enter.
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: William Cohen <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
By creating an perf_evlist out of the attributes in the perf.data file
header, so that we can use evlists and evsels when reading recorded
sessions in addition to when we record sessions.
More work is needed to allow tools to allow the user to select which
events are wanted when browsing sessions, be it just one or a subset of
them, aggregated or showed at the same time but with different
indications on the UI to allow seeing workloads thru different views at
the same time.
But the overall goal/trend is to more uniformly use evsels and evlists.
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
[root@emilia ~]# perf report --stdio
The perf.data file has no samples!
[root@emilia ~]#
The TUI shows a popup warning message with the same message.
Reported-by: Ingo Molnar <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We only allocate it when in TUI mode. In --stdio mode unconditionally
initializing this area leads to memory corruption.
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since we'll need it when implementing the live annotate TUI browser.
This also simplifies things a bit by having the list head for the source
code to be in the dynamicly allocated part of struct annotation, that
way we don't have to pass it around, it can be found from the struct
symbol that is passed everywhere.
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The perf annotate tool continues aggregating everything on just one
histograms, but to support the top model add support for one histogram
perf evsel in the evlist.
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
They will be used by perf top, so that we have just one set of routines
to do annotation.
Rename "struct sym_priv" to "struct annotation", etc, to clarify this
code a bit.
Rename "struct sym_ext" to "struct source_line", to give it a meaningful
name, that clarifies that it is a the result of an addr2line call, that
is sorted by percentage one particular source code line appeared in the
annotation.
And since we're moving things around also rename 'sym_hist->ip' to
'sym_hist->addr' as we want to do data structure annotation at some
point.
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Because in tools like 'top' we don't want the pager.
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
And move the event_t methods to the perf_event__ too.
No code changes, just namespace consistency.
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Making the namespace more uniform.
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To make the callchain API naming more consistent.
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The callchains are fed with an array of a fixed size.
As a result we iterate over each callchains three times:
- 1st to resolve symbols
- 2nd to filter out context boundaries
- 3rd for the insertion into the tree
This also involves some pairs of memory allocation/deallocation
everytime we insert a callchain, for the filtered out array of
addresses and for the array of symbols that comes along.
Instead, feed the callchains through a linked list with persistent
allocations. It brings several pros like:
- Merge the 1st and 2nd iterations in one. That was possible before
but in a way that would involve allocating an array slightly taller
than necessary because we don't know in advance the number of context
boundaries to filter out.
- Much lesser allocations/deallocations. The linked list keeps
persistent empty entries for the next usages and is extendable at
will.
- Makes it easier for multiple sources of callchains to feed a
stacktrace together. This is deemed to pave the way for cfi based
callchains wherein traditional frame pointer based kernel
stacktraces will precede cfi based user ones, producing an overall
callchain which size is hardly predictable. This requirement
makes the static array obsolete and makes a linked list based
iterator a much more flexible fit.
Basic testing on a big perf file containing callchains (~ 176 MB)
has shown a throughput gain of about 11% with perf report.
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Using %L[uxd] has issues in some architectures, like on ppc64. Fix it
by making our 64 bit integers typedefs of stdint.h types and using
PRI[ux]64 like, for instance, git does.
Reported by Denis Kirjanov that provided a patch for one case, I went
and changed all cases.
Reported-by: Denis Kirjanov <[email protected]>
Tested-by: Denis Kirjanov <[email protected]>
LKML-Reference: <[email protected]>
Cc: Denis Kirjanov <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pingtian Han <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The symfs argument allows analysis of perf.data file using a locally accessible
filesystem tree with debug symbols - e.g., tree created during image builds,
sshfs mount, loop mounted KVM disk images, USB keys, initrds, etc. Anything
with an OS tree can be analyzed from anywhere without the need to populate a
local data store with build-ids.
Commiter notes:
o Fixed up symfs="/" variants handling.
o prefixed DSO__ORIG_GUEST_KMODULE case with symfs too, avoiding use of files
outside the symfs directory.
LKML-Reference: <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch changes perf report to ask for the ID info on all events be
default if recording from multiple CPUs.
Perf report, annotate and diff will now process the events in order if
the kernel is able to provide timestamps on all events. This ensures
that events such as COMM and MMAP which are necessary to correctly
interpret samples are processed prior to those samples so that they are
attributed correctly.
Before:
# perf record ./cachetest
# perf report
# Events: 6K cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ...............................
#
74.11% :3259 [unknown] [k] 0x4a6c
1.50% cachetest ld-2.11.2.so [.] 0x1777c
1.46% :3259 [kernel.kallsyms] [k] .perf_event_mmap_ctx
1.25% :3259 [kernel.kallsyms] [k] restore
0.74% :3259 [kernel.kallsyms] [k] ._raw_spin_lock
0.71% :3259 [kernel.kallsyms] [k] .filemap_fault
0.66% :3259 [kernel.kallsyms] [k] .memset
0.54% cachetest [kernel.kallsyms] [k] .sha_transform
0.54% :3259 [kernel.kallsyms] [k] .copy_4K_page
0.54% :3259 [kernel.kallsyms] [k] .find_get_page
0.52% :3259 [kernel.kallsyms] [k] .trace_hardirqs_off
0.50% :3259 [kernel.kallsyms] [k] .__do_fault
<SNIP>
After:
# perf report
# Events: 6K cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ...............................
#
44.28% cachetest cachetest [.] sumArrayNaive
22.53% cachetest cachetest [.] sumArrayOptimal
6.59% cachetest ld-2.11.2.so [.] 0x1777c
2.13% cachetest [unknown] [k] 0x340
1.46% cachetest [kernel.kallsyms] [k] .perf_event_mmap_ctx
1.25% cachetest [kernel.kallsyms] [k] restore
0.74% cachetest [kernel.kallsyms] [k] ._raw_spin_lock
0.71% cachetest [kernel.kallsyms] [k] .filemap_fault
0.66% cachetest [kernel.kallsyms] [k] .memset
0.54% cachetest [kernel.kallsyms] [k] .copy_4K_page
0.54% cachetest [kernel.kallsyms] [k] .find_get_page
0.54% cachetest [kernel.kallsyms] [k] .sha_transform
0.52% cachetest [kernel.kallsyms] [k] .trace_hardirqs_off
0.50% cachetest [kernel.kallsyms] [k] .__do_fault
<SNIP>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ian Munsie <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If we are running the new perf on an old kernel without support for
sample_id_all, we should fall back to the old unordered processing of
events. If we didn't than we would *always* process events without
timestamps out of order, whether or not we hit a reordering race. In
other words, instead of there being a chance of not attributing samples
correctly, we would guarantee that samples would not be attributed.
While processing all events without timestamps before events with
timestamps may seem like an intuitive solution, it falls down as
PERF_RECORD_EXIT events would also be processed before any samples.
Even with a workaround for that case, samples before/after an exec would
not be attributed correctly.
This patch allows commands to indicate whether they need to fall back to
unordered processing, so that commands that do not care about timestamps
on every event will not be affected. If we do fallback, this will print
out a warning if report -D was invoked.
This patch adds the test in perf_session__new so that we only need to
test once per session. Commands that do not use an event_ops (such as
record and top) can simply pass NULL in it's place.
Acked-by: Thomas Gleixner <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ian Munsie <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This is useful for analyzing a perf data file on a different system than
the one data was collected on and still include symbols from loaded
kernel modules in the output.
Commiter note: Updated the man page accordingly.
LKML-Reference: <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
At perf_session__process_event, so that we reduce the number of lines in eache
tool sample processing routine that now receives a sample_data pointer already
parsed.
This will also be useful in the next patch, where we'll allow sample the
identity fields in MMAP, FORK, EXIT, etc, when it will be possible to see (cpu,
timestamp) just after before every event.
Also validate callchains in perf_session__process_event, i.e. as early as
possible, and keep a counter of the number of events discarded due to invalid
callchains, warning the user about it if it happens.
There is an assumption that was kept that all events have the same sample_type,
that will be dealt with in the future, when this preexisting limitation will be
removed.
Tested-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Ian Munsie <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Ian Munsie <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Do that to start a consistant callchain API namespace.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Christoph Hellwig <[email protected]>
|
|
Relying just on ~/.perfconfig or rebuilding the tool disabling support
for the TUI is too cumbersome, so allow specifying which UI to use and
make the command line switch override whatever is in ~/.perfconfig.
Suggested-by: Christoph Hellwig <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When cmd_record exits the whole perf binary will exit right after,
so no need to traverse lots of complex data structures freeing them.
Sticked a comment for leak detectives and for a experiment with obstacks
to be performed so that we can speed up freeing that memory.
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Only in verbose mode so as not to bloat struct symbol too much.
The key used is '/', just like in vi, less, etc.
More work is needed to allocate space on the symbol in a more clear way.
This experiment shows how to do it for the hist_browser, in the main
window.
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Tidy-up patch to remove some code and struct perf_session data members
which are no longer needed due to the previous patch: "perf tools: Don't
abbreviate file paths relative to the cwd".
LKML-Reference: <new-submission>
Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Merge reason: Pick up the latest perf fixes.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Hists have their hits increased by the event period. And this
period based counting is the foundation of all the stats in
perf report.
But callchains still use the raw number of hits, without taking
the period into account. So when we compute the percentage,
absolute based percentages are totally broken, and relative ones
too in the first parent level. Because we pass the number of events
muliplied by their period as the total number of hits to the
callchain filtering, while callchains expect this number to be
the number of raw hits.
perf report -g graph was simply not working, showing no graph unless
the min percent was zero. And even there the percentage of the
branches was always 0. And may be fractal filtering was broken on
the first branch level too.
flat also was broken, but it was hidden because of other breakages.
Anyway fix this by counting using periods on callchains.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
|
|
Simplifying the tools that were using both in sequence and allowing
upcoming simplifications, such as Arun's patch to sort by cpus.
Cc: David S. Miller <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It is useful to know on which CPU a sample was captured on.
The information is captured with perf record -R but it was
not printed out by perf report -D. This patch adds this.
When -R is not used, cpu is set to -1to indicate that
the CPU is unknown (it is not captured).
Cc: David S. Miller <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The hists__tty_browse_tree function was created with the loop to print
all events, and its equivalent, hists__tui_browse_tree, was created in a
similar fashion, where it is possible to switch among the multiple
events, if present, using TAB to go the next event, and shift+TAB
(UNTAB) to go to the previous.
The report TUI now shows as the window title the name of the event and a
leak was fixed wrt pstacks.
Cc: Frédéric Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Using the same scheme as for git's/perf's pager setup, i.e. if one
doesn't want to, on a newt enabled perf binary, to disable the TUI for
'perf report', its just a matter of doing:
[root@doppio linux-2.6-tip]# printf "[tui]\n\nreport = off\n" >
/root/.perfconfig
[root@doppio linux-2.6-tip]# cat /root/.perfconfig
[tui]
report = off
[root@doppio linux-2.6-tip]#
System wide settings are also possible, by editing /etc/perfconfig, etc,
i.e. the git machinery for config files applies to perf as well, so when
in doubt where to put your settings, consult the git documentation, if
it fails, please let us know.
Suggested-by: Ingo Molnar <[email protected]>
Discussed-with: Stephane Eranian <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these
tools is to set something that has an enum type, that is builtin
compatible with unsigned int.
Several string constifications were done to make OPT_STRING require a
const char * type.
Cc: Frédéric Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Number of samples is meaningless after we switched to auto-freq, so
report the number of events, i.e. not the sum of the different periods,
but the number PERF_RECORD_SAMPLE emitted by the kernel.
While doing this I noticed that naming "count" to the sum of all the
event periods can be confusing, so rename it to .period, just like in
struct sample.data, so that we become more consistent.
This helps with the next step, that was to record in struct hist_entry
the number of sample events for each instance, we need that because we
use it to generate the number of events when applying filters to the
tree of hist entries like it is being done in the TUI report browser.
Suggested-by: Ingo Molnar <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The events_stats.total field is too generic, rename it to .total_period,
and also add a comment explaining that it is the sum of all the .period
fields in samples, that is needed because we use auto-freq to avoid
sampling artifacts.
Ditto for events_stats.lost, that is the sum of all lost_event.lost
fields, i.e. the number of events the kernel dropped.
Looking at the users, builtin-sched.c can make use of these fields and
stop doing it again.
Cc: Frédéric Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This is one more thing that started global but are more useful per hist
or per session.
Cc: Frédéric Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now we don't anymore use popen to run 'perf annotate' for the selected
symbol, instead we collect per address samplings when processing samples
in 'perf report' if we're using the newt browser, then we use this data
directly to do annotation.
Done this way we can actually traverse the objdump_line objects
directly, matching the addresses to the collected samples and colouring
them appropriately using lower level slang routines.
The new ui_browser class will be reused for the main, callchain aware,
histogram browser, when it will be made generic and don't assume that
the objects are always instances of the objdump_line class maintained
using list_heads.
Cc: Frédéric Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Those are really not specific to the newt code, can be used by other UI
frontends.
Cc: Frédéric Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|