Age | Commit message (Collapse) | Author | Files | Lines |
|
Provide the basic infrastructure to provide per task stats.
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The PERF_EVENT_READ implementation made me realize we don't
actually need the sample_type int the output sample, since
we already have that in the perf_counter_attr information.
Therefore, remove the PERF_EVENT_MISC_OVERFLOW bit and the
event->type overloading, and imply put counter overflow
samples in a PERF_EVENT_SAMPLE type.
This also fixes the issue that event->type was only 32-bit
and sample_type had 64 usable bits.
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Create a structured file format that includes the full
perf_counter_attr and all its relevant counter IDs so that
the reporting program has full information.
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Martin Schwidefsky reported "perf report" symbol resolution
problems on S390.
Since we only report MMAP, not MUNMAP, we have to deal with
overlapping maps.
We used to simply throw out the old map on the assumption whole
maps got unmapped. This obviously doesn't deal with partial
unmaps. However it appears some dynamic linkers do fancy
partial unmaps (s390), so do something more elaborate and
truncate the old maps, only removing them when they've been
fully covered.
This resolves (part of) the S390 symbol resolution problems.
Reported-by: Martin Schwidefsky <[email protected]>
Tested-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Print more symbol relocation related info under -vv.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Building perfcounter tools raises the following warnings:
builtin-record.c: In function ‘atexit_header’:
builtin-record.c:464: erreur: ignoring return value of ‘pwrite’, declared with attribute warn_unused_result
builtin-record.c: In function ‘__cmd_record’:
builtin-record.c:503: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
builtin-report.c: In function ‘__cmd_report’:
builtin-report.c:1403: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
This patch handles these IO return values.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
On 64-bit powerpc, __u64 is defined to be unsigned long rather than
unsigned long long. This causes compiler warnings every time we
print a __u64 value with %Lx.
Rather than changing __u64, we define our own u64 to be unsigned long
long on all architectures, and similarly s64 as signed long long.
For consistency we also define u32, s32, u16, s16, u8 and s8. These
definitions are put in a new header, types.h, because these definitions
are needed in util/string.h and util/symbol.h.
The main change here is the mechanical change of __[us]{64,32,16,8}
to remove the "__". The other changes are:
* Create types.h
* Include types.h in perf.h, util/string.h and util/symbol.h
* Add types.h to the LIB_H definition in Makefile
* Added (u64) casts in process_overflow_event() and print_sym_table()
to kill two remaining warnings.
Signed-off-by: Paul Mackerras <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add a data file header so we can transfer data between record and report.
LKML-Reference: <new-submission>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Update the tools to reflect the new callchain sampling format.
LKML-Reference: <new-submission>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Make it easier to use parent filtering - default to a filtered
output. Also add the parent column so that we get collapsing but
dont display it by default.
add --no-exclude-other to override this.
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Make use of the new ->data_tail mechanism to tell kernel-space
about user-space draining the data stream. Emit lost events
(and display them) if they happen.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Introduce isprint() to print out raw event dumps to ASCII, etc.
(This is an extension to upstream Git's ctype.c.)
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
[ removed openssl.h inclusion from util.h - it leaked ctype.h ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add boundary checks for call-chain events. In case of corrupted
entries we could crash otherwise.
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
features
Instead of the ambigious 'call' naming use the much more
specific 'parent' naming:
- rename --call <regex> to --parent <regex>
- rename --sort call to --sort parent
- rename [unmatched] to [other] - to signal that this is not
an error but the inverse set
Also add pagefaults to the default parent-symbol pattern too,
as it's a 'syscall overhead category' in a sense.
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The Git utils came with a ctype replacement that doesn't provide
isprint(). Add a replacement.
Solves a build bug on certain distros.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Implement sorting by callchain symbols, --sort <call>.
It will create a new column which will show a match to
--call $regex or "[unmatched]".
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Yong Wang reported the following compiler warning:
builtin-report.c: In function 'process_overflow_event':
builtin-report.c:984: error: cast to pointer from integer of different size
Which happens because we try to print ->ips[] out with a limited
format, losing the high 32 bits. Print it out using %016Lx instead.
Reported-by: Yong Wang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Take advantage of call-graph percounter sampling/recording to
display a non-trivial histogram: the true, collapsed/summarized
cost measurement, on a per system call total overhead basis:
aldebaran:~/linux/linux/tools/perf> ./perf record -g -a -f ~/hackbench 10
aldebaran:~/linux/linux/tools/perf> ./perf report -s symbol --syscalls | head -10
#
# (3536 samples)
#
# Overhead Symbol
# ........ ......
#
40.75% [k] sys_write
40.21% [k] sys_read
4.44% [k] do_nmi
...
This is done by accounting each (reliable) call-chain that chains back
to a given system call to that system call function.
[ So in the above example we can see that hackbench spends about 40% of
its total time somewhere in sys_write() and 40% somewhere in
sys_read(), the rest of the time is spent in user-space. The time
is not spent in sys_write() _itself_ but in one of its many child
functions. ]
Or, a recording of a (source files are already in the page-cache) kernel build:
$ perf record -g -m 512 -f -- make -j32 kernel
$ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi
4.14% [k] do_page_fault
1.20% [k] sys_write
1.10% [k] sys_open
0.63% [k] sys_exit_group
0.48% [k] smp_apic_timer_interrupt
0.37% [k] sys_read
0.37% [k] sys_execve
0.20% [k] sys_mmap
0.18% [k] sys_close
0.14% [k] sys_munmap
0.13% [k] sys_poll
0.09% [k] sys_newstat
0.07% [k] sys_clone
0.06% [k] sys_newfstat
0.05% [k] sys_access
0.05% [k] schedule
Shows the true total cost of each syscall variant that gets used
during a kernel build. This profile reveals it that pagefaults are
the costliest, followed by read()/write().
An interesting detail: timer interrupts cost 0.5% - or 0.5 seconds
per 100 seconds of kernel build-time. (this was done with HZ=1000)
The summary is done in 'perf report', i.e. in the post-processing
stage - so once we have a good call-graph recording, this type of
non-trivial high-level analysis becomes possible.
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Pekka Enberg <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add the first steps of call-graph profiling:
- add the -c (--call-graph) option to perf record
- parse the call-graph record and printout out under -D (--dump-trace)
The call-graph data is not put into the histogram yet, but it
can be seen that it's being processed correctly:
0x3ce0 [0x38]: event: 35
.
. ... raw event: size 56 bytes
. 0000: 23 00 00 00 05 00 38 00 d4 df 0e 81 ff ff ff ff #.....8........
. 0010: 60 0b 00 00 60 0b 00 00 03 00 00 00 01 00 02 00 `...`..........
. 0020: d4 df 0e 81 ff ff ff ff a0 61 ed 41 36 00 00 00 .........a.A6..
. 0030: 04 92 e6 41 36 00 00 00 .a.A6..
.
0x3ce0 [0x38]: PERF_EVENT (IP, 5): 2912: 0xffffffff810edfd4 period: 1
... chain: u:2, k:1, nr:3
..... 0: 0xffffffff810edfd4
..... 1: 0x3641ed61a0
..... 2: 0x3641e69204
... thread: perf:2912
...... dso: [kernel]
This shows a 3-entry call-graph: with 1 kernel-space and two user-space
entries
Cc: Frederic Weisbecker <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Print out events in hexa dump format, when -D is specified:
0x4868 [0x48]: event: 1
.
. ... raw event: size 72 bytes
. 0000: 01 00 00 00 00 00 48 00 d4 72 00 00 d4 72 00 00 ......H..r...r.
. 0010: 00 00 40 f2 3e 00 00 00 00 30 01 00 00 00 00 00 ..@.>....0.....
. 0020: 00 00 00 00 00 00 00 00 2f 75 73 72 2f 6c 69 62 ......../usr/li
. 0030: 36 34 2f 6c 69 62 65 6c 66 2d 30 2e 31 34 31 2e 64/libelf-0.141
. 0040: 73 6f 00 00 00 00 00 00 f-0.141
.
0x4868 [0x48]: PERF_EVENT_MMAP 29396: [0x3ef2400000(0x13000) @ (nil)]: /usr/lib64/libelf-0.141.so
This helps the debugging of mis-parsing of data files, and helps
the addition of new sample/trace formats.
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
A build error slipped in:
builtin-report.c: In function ‘hist_entry__fprintf’:
builtin-report.c:711: error: format ‘%12d’ expects type ‘int’, but argument 3 has type ‘uint64_t’
Because we got a bit sloppy with those types. uint64_t really sucks,
because there's no printf format for it. So standardize on __u64
instead - for all types that go to or come from the ABI (which is __u64),
or for values that need to be large enough even on 32-bit.
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When we use variable period sampling, add the period to the sample
data and use that to normalize the samples.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Create the counter in a disabled state and only enable it after we
mmap() the buffer, this allows us to see the first few samples (and
observe the frequency ramp).
Furthermore, print the period in the verbose report.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The rule is:
- high overhead: red
- mid overhead: green
- low overhead: normal (white/black)
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch adds support for profiling JIT generated code to 'perf
report'. A JIT compiler is required to generate a "/tmp/perf-$PID.map"
symbols map that is parsed when looking and displaying symbols.
Thanks to Peter Zijlstra for his help with this patch!
Example "perf report" output with the Jato JIT:
#
# (40311 samples)
#
# Overhead Command Shared Object Symbol
# ........ ................ ......................... ......
#
97.80% jato /tmp/perf-11915.map [.] Fibonacci.fib(I)I
0.56% jato 00000000b7fa023b 0x000000b7fa023b
0.45% jato /tmp/perf-11915.map [.] Fibonacci.main([Ljava/lang/String;)V
0.38% jato [kernel] [k] get_page_from_freelist
0.06% jato [kernel] [k] kunmap_atomic
0.05% jato ./jato [.] utf8Hash
0.04% jato ./jato [.] executeJava
0.04% jato ./jato [.] defineClass
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Before:
$ perf report
failed to open file: No such file or directory
After:
$ perf report
failed to open file: perf.data (try 'perf record' first)
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Several people have suggested that 'perf' has become a full-fledged
tool that should be moved out of Documentation/. Move it to the
(new) tools/ directory.
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|