Age | Commit message (Collapse) | Author | Files | Lines |
|
The output format for metrics has been reorganized, update documentation
to reflect the changes for it.
Signed-off-by: Leo Yan <[email protected]>
Cc: Al Grant <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joe Mario <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch enables perf-diff with "--stream" option.
"--stream": Enable hot streams comparison
Now let's see example.
perf record -b ... Generate perf.data.old with branch data
perf record -b ... Generate perf.data with branch data
perf diff --stream
[ Matched hot streams ]
hot chain pair 1:
cycles: 1, hits: 27.77% cycles: 1, hits: 9.24%
--------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44
hot chain pair 2:
cycles: 34, hits: 20.06% cycles: 27, hits: 16.98%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39
hot chain pair 3:
cycles: 9, hits: 4.48% cycles: 6, hits: 4.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
[ Hot streams in old perf data only ]
hot chain 1:
cycles: 18, hits: 6.75%
--------------------------
__random_r random_r.c:360
__random_r random_r.c:388
__random_r random_r.c:388
__random_r random_r.c:380
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
hot chain 2:
cycles: 29, hits: 2.78%
--------------------------
compute_flag div.c:22
main div.c:40
main div.c:40
main div.c:39
[ Hot streams in new perf data only ]
hot chain 1:
cycles: 4, hits: 4.54%
--------------------------
main div.c:42
compute_flag div.c:28
hot chain 2:
cycles: 5, hits: 3.51%
--------------------------
main div.c:39
main div.c:44
main div.c:42
compute_flag div.c:28
Signed-off-by: Jin Yao <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Document the higher level --insn-trace etc. perf script options.
Include the howto how to build xed into the manpage
Signed-off-by: Andi Kleen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Peter suggested that using the exclusive mode in perf could avoid some
problems with bad scheduling of groups. Exclusive is implemented in the
kernel, but wasn't exposed by the perf tool, so hard to use without
custom low level API users.
Add support for marking groups or events with :e for exclusive in the
perf tool. The implementation is basically the same as the existing
pinned attribute.
Committer testing:
# perf test "parse event"
6: Parse event definition strings : Ok
# perf test -v "parse event" |& grep :u*e
running test 56 'instructions:uep'
running test 57 '{cycles,cache-misses,branch-misses}:e'
#
#
# grep "model name" -m1 /proc/cpuinfo
model name : AMD Ryzen 9 3900X 12-Core Processor
#
# perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1
Performance counter stats for 'system wide':
<not counted> cycles (0.00%)
<not counted> cache-misses (0.00%)
<not counted> branch-misses (0.00%)
1.001269893 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
# echo 0 > /proc/sys/kernel/nmi_watchdog
# perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1
Performance counter stats for 'system wide':
1,298,663,141 cycles
30,962,215 cache-misses
5,325,150 branch-misses
1.001474934 seconds time elapsed
#
# The output for asking for precise events on AMD needs to improve, it
# supposedly works only for system wide or per CPU
#
# perf stat -a -e '{cycles,cache-misses,branch-misses}:uep' sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles).
/bin/dmesg | grep -i perf may provide additional information.
# perf stat -a -e '{cycles,cache-misses,branch-misses}:ue' sleep 1
Performance counter stats for 'system wide':
746,363,126 cycles
16,881,611 cache-misses
2,871,259 branch-misses
1.001636066 seconds time elapsed
#
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Like 'perf record', we can even more speedup build-id processing by just
using all DSOs. Then we don't need to look at all the sample events
anymore. The following patch will update 'perf bench' to show the result
of the --buildid-all option too.
Signed-off-by: Namhyung Kim <[email protected]>
Original-patch-by: Stephane Eranian <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The --for-each-cgroup option is a syntax sugar to monitor large number
of cgroups easily. Current command line requires to list all the events
and cgroups even if users want to monitor same events for each cgroup.
This patch addresses that usage by copying given events for each cgroup
on user's behalf.
For instance, if they want to monitor 6 events for 200 cgroups each they
should write 1200 event names (with -e) AND 1200 cgroup names (with -G)
on the command line. But with this change, they can just specify 6
events and 200 cgroups with a new option.
A simpler example below: It wants to measure 3 events for 2 cgroups ('A'
and 'B'). The result is that total 6 events are counted like below.
$ perf stat -a -e cpu-clock,cycles,instructions --for-each-cgroup A,B sleep 1
Performance counter stats for 'system wide':
988.18 msec cpu-clock A # 0.987 CPUs utilized
3,153,761,702 cycles A # 3.200 GHz (100.00%)
8,067,769,847 instructions A # 2.57 insn per cycle (100.00%)
982.71 msec cpu-clock B # 0.982 CPUs utilized
3,136,093,298 cycles B # 3.182 GHz (99.99%)
8,109,619,327 instructions B # 2.58 insn per cycle (99.99%)
1.001228054 seconds time elapsed
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf has many undocumented options, such as:-vv, --exec-path,
--html-path, -p, --paginate,--no-pager, --debugfs-dir, --buildid-dir,
--list-cmds, --list-opts.
Add entris for these options in perf.txt.
Signed-off-by: Zejiang Tang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add some documentation how to use the topdown metrics in ring 3.
Signed-off-by: Andi Kleen <[email protected]>
Co-developed-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Icelake has support for reporting per thread TopDown metrics.
These are reported differently than the previous TopDown support,
each metric is standalone, but scaled to pipeline "slots".
We don't need to do anything special for HyperThreading anymore.
Teach perf stat --topdown to handle these new metrics and
print them in the same way as the previous TopDown metrics.
The restrictions of only being able to report information per core is
gone.
Signed-off-by: Andi Kleen <[email protected]>
Co-developed-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Same as 'perf probe -F', this patch adds filter support for the ftrace
subcommand option '-F, --funcs <[FILTER]>'.
Here is an example that only lists functions which start with 'vfs_':
$ sudo perf ftrace -F vfs_*
vfs_fadvise
vfs_fallocate
vfs_truncate
vfs_open
vfs_setpos
vfs_llseek
vfs_readf
vfs_writef
...
Suggested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Changbin Du <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The documentation describes snapshot mode. Update it to include the new
snapshot control command.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Alexey Budankov <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
# perf annotate --stdio2 acpi_processor_ffh_cstate_enter > default
# perf config annotate.disassembler_style=intel
# perf config annotate.disassembler_style
annotate.disassembler_style=intel
# perf annotate --stdio2 acpi_processor_ffh_cstate_enter > intel
# diff -u default intel
--- default 2020-09-04 13:09:26.019205732 -0300
+++ intel 2020-09-04 13:09:52.823795081 -0300
@@ -1,42 +1,42 @@
Samples: 1K of event 'cycles', 4000 Hz, Event count (approx.): 990065316, [percent: local period]
acpi_processor_ffh_cstate_enter() /lib/modules/5.9.0-rc3/build/vmlinux
-Percent → callq __fentry__
- mov cpu_number,%edx
- mov %edx,%edx
- mov cpu_cstate_entry,%rax
- add -0x7dbe9700(,%rdx,8),%rax
- movzbl 0x9(%rdi),%edx
- mov 0x4(%rax,%rdx,8),%edi
- mov (%rax,%rdx,8),%esi
- → jmpq 137ccc6
- 2d: → jmpq 137ccd8
+Percent → call __fentry__
+ mov edx,DWORD PTR gs:[rip+0x7e541d74]
+ mov edx,edx
+ mov rax,QWORD PTR [rip+0x152b8fb]
+ add rax,QWORD PTR [rdx*8-0x7dbe9700]
+ movzx edx,BYTE PTR [rdi+0x9]
+ mov edi,DWORD PTR [rax+rdx*8+0x4]
+ mov esi,DWORD PTR [rax+rdx*8]
+ → jmp 137ccc6
+ 2d: → jmp 137ccd8
mfence
- mov %gs:0x17bc0,%rax
- clflush (%rax)
+ mov rax,QWORD PTR gs:0x17bc0
+ clflush BYTE PTR [rax]
mfence
- xor %edx,%edx
- mov %rdx,%rcx
- mov %gs:0x17bc0,%rax
- 0.00 monitor %rax,%ecx,%edx
- mov (%rax),%rax
- test $0x8,%al
+ xor edx,edx
+ mov rcx,rdx
+ mov rax,QWORD PTR gs:0x17bc0
+ 0.00 monitor
+ mov rax,QWORD PTR [rax]
+ test al,0x8
↓ jne 71
- ↓ jmpq 68
- verw 0x538b08(%rip) # ffffffff82008150 <ds.0>
- 68: mov %rsi,%rax
- mov %rdi,%rcx
-100.00 mwait %eax,%ecx
- 71: mov %gs:0x17bc0,%rax
- lock andb $0xdf,0x2(%rax)
- lock addl $0x0,-0x4(%rsp)
- mov (%rax),%rax
- test $0x8,%al
+ ↓ jmp 68
+ verw WORD PTR [rip+0x538b08] # ffffffff82008150 <ds.0>
+ 68: mov rax,rsi
+ mov rcx,rdi
+100.00 mwait
+ 71: mov rax,QWORD PTR gs:0x17bc0
+ lock and BYTE PTR [rax+0x2],0xdf
+ lock add DWORD PTR [rsp-0x4],0x0
+ mov rax,QWORD PTR [rax]
+ test al,0x8
↓ je 97
- andl $0x7fffffff,__preempt_count
- 97: ← retq
- mov %gs:0x17bc0,%rax
- lock orb $0x20,0x2(%rax)
- mov (%rax),%rax
- test $0x8,%al
+ and DWORD PTR gs:[rip+0x7e548509],0x7fffffff
+ 97: ret
+ mov rax,QWORD PTR gs:0x17bc0
+ lock or BYTE PTR [rax+0x2],0x20
+ mov rax,QWORD PTR [rax]
+ test al,0x8
↑ jne 71
- ↑ jmpq 2d
+ ↑ jmp 2d
#
Requested-by: Matt P. Dziubinski <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add 'snapshot' control command to create an AUX area tracing snapshot
the same as if sending SIGUSR2. The advantage of the FIFO is that access
is governed by access to the FIFO.
Example:
$ mkfifo perf.control
$ mkfifo perf.ack
$ cat perf.ack &
[1] 15235
$ sudo ~/bin/perf record --control fifo:perf.control,perf.ack -S -e intel_pt//u -- sleep 60 &
[2] 15243
$ ps -e | grep perf
15244 pts/1 00:00:00 perf
$ kill -USR2 15244
bash: kill: (15244) - Operation not permitted
$ echo snapshot > perf.control
ack
$
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Alexey Budankov <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Enable the --control option to accept file names as an alternative to
file descriptors.
Example:
$ mkfifo perf.control
$ mkfifo perf.ack
$ cat perf.ack &
[1] 6808
$ perf record --control fifo:perf.control,perf.ack -- sleep 300 &
[2] 6810
$ echo disable > perf.control
$ Events disabled
ack
$ echo enable > perf.control
$ Events enabled
ack
$ echo disable > perf.control
$ Events disabled
ack
$ kill %2
[ perf record: Woken up 4 times to write data ]
$ [ perf record: Captured and wrote 0.018 MB perf.data (7 samples) ]
[1]- Done cat perf.ack
[2]+ Terminated perf record --control fifo:perf.control,perf.ack -- sleep 300
$
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Alexey Budankov <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The --control option does not display well in man pages unless AsciiDoc
formatting is used.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Alexey Budankov <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Event modifiers are not mentioned in the perf record or perf stat
manpages. Add them to orient new users more effectively by pointing
them to the perf list manpage for details.
Fixes: 2055fdaf8703 ("perf list: Document precise event sampling for AMD IBS")
Signed-off-by: Kim Phillips <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tony Jones <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
There's a risk that outputting interval mode summaries by default breaks
CSV consumers. It already broke pmu-tools/toplev.
So now we turn off the summary by default but we create a new option
'--summary' to enable the summary. This is active even when not using
CSV mode.
Before:
root@kbl-ppc:~# perf stat -I1000 --interval-count 2
# time counts unit events
1.000265904 8,005.73 msec cpu-clock # 8.006 CPUs utilized
1.000265904 601 context-switches # 0.075 K/sec
1.000265904 10 cpu-migrations # 0.001 K/sec
1.000265904 0 page-faults # 0.000 K/sec
1.000265904 66,746,521 cycles # 0.008 GHz
1.000265904 71,874,398 instructions # 1.08 insn per cycle
1.000265904 13,356,781 branches # 1.668 M/sec
1.000265904 298,756 branch-misses # 2.24% of all branches
2.001857667 8,012.52 msec cpu-clock # 8.013 CPUs utilized
2.001857667 164 context-switches # 0.020 K/sec
2.001857667 10 cpu-migrations # 0.001 K/sec
2.001857667 2 page-faults # 0.000 K/sec
2.001857667 5,822,188 cycles # 0.001 GHz
2.001857667 2,186,170 instructions # 0.38 insn per cycle
2.001857667 442,378 branches # 0.055 M/sec
2.001857667 44,750 branch-misses # 10.12% of all branches
Performance counter stats for 'system wide':
16,018.25 msec cpu-clock # 7.993 CPUs utilized
765 context-switches # 0.048 K/sec
20 cpu-migrations # 0.001 K/sec
2 page-faults # 0.000 K/sec
72,568,709 cycles # 0.005 GHz
74,060,568 instructions # 1.02 insn per cycle
13,799,159 branches # 0.861 M/sec
343,506 branch-misses # 2.49% of all branches
2.004118489 seconds time elapsed
After:
root@kbl-ppc:~# perf stat -I1000 --interval-count 2
# time counts unit events
1.001336393 8,013.28 msec cpu-clock # 8.013 CPUs utilized
1.001336393 82 context-switches # 0.010 K/sec
1.001336393 8 cpu-migrations # 0.001 K/sec
1.001336393 0 page-faults # 0.000 K/sec
1.001336393 4,199,121 cycles # 0.001 GHz
1.001336393 1,373,991 instructions # 0.33 insn per cycle
1.001336393 270,681 branches # 0.034 M/sec
1.001336393 31,659 branch-misses # 11.70% of all branches
2.003905006 8,020.52 msec cpu-clock # 8.021 CPUs utilized
2.003905006 184 context-switches # 0.023 K/sec
2.003905006 8 cpu-migrations # 0.001 K/sec
2.003905006 2 page-faults # 0.000 K/sec
2.003905006 5,446,190 cycles # 0.001 GHz
2.003905006 2,312,547 instructions # 0.42 insn per cycle
2.003905006 451,691 branches # 0.056 M/sec
2.003905006 37,925 branch-misses # 8.40% of all branches
root@kbl-ppc:~# perf stat -I1000 --interval-count 2 --summary
# time counts unit events
1.001313128 8,013.20 msec cpu-clock # 8.013 CPUs utilized
1.001313128 83 context-switches # 0.010 K/sec
1.001313128 8 cpu-migrations # 0.001 K/sec
1.001313128 0 page-faults # 0.000 K/sec
1.001313128 4,470,950 cycles # 0.001 GHz
1.001313128 1,440,045 instructions # 0.32 insn per cycle
1.001313128 283,222 branches # 0.035 M/sec
1.001313128 33,576 branch-misses # 11.86% of all branches
2.003857385 8,020.34 msec cpu-clock # 8.020 CPUs utilized
2.003857385 154 context-switches # 0.019 K/sec
2.003857385 8 cpu-migrations # 0.001 K/sec
2.003857385 2 page-faults # 0.000 K/sec
2.003857385 4,515,676 cycles # 0.001 GHz
2.003857385 2,180,449 instructions # 0.48 insn per cycle
2.003857385 435,254 branches # 0.054 M/sec
2.003857385 31,179 branch-misses # 7.16% of all branches
Performance counter stats for 'system wide':
16,033.53 msec cpu-clock # 7.992 CPUs utilized
237 context-switches # 0.015 K/sec
16 cpu-migrations # 0.001 K/sec
2 page-faults # 0.000 K/sec
8,986,626 cycles # 0.001 GHz
3,620,494 instructions # 0.40 insn per cycle
718,476 branches # 0.045 M/sec
64,755 branch-misses # 9.01% of all branches
2.006124542 seconds time elapsed
Fixes: c7e5b328a8d4 ("perf stat: Report summary for interval mode")
Signed-off-by: Jin Yao <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This allows us to trace single thread instead of the whole process.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This adds an option '-D/--delay' to allow us to start tracing some times
later after workload is launched.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This is to have a consistent view of all graph tracer options.
The original option '--graph-depth' is marked as deprecated.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This adds an option '--graph-opts thresh' to setup trace duration
threshold for funcgraph tracer.
$ sudo ./perf ftrace -G '*' --graph-opts thresh=100
3) ! 184.060 us | } /* schedule */
3) ! 185.600 us | } /* exit_to_usermode_loop */
2) ! 225.989 us | } /* schedule_idle */
2) # 4140.051 us | } /* do_idle */
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Sometimes we want ftrace display more and longer information about the
trace.
$ sudo perf ftrace -G '*'
2) 0.979 us | mutex_unlock();
2) 1.540 us | __fsnotify_parent();
2) 0.433 us | fsnotify();
$ sudo perf ftrace -G '*' --graph-opts verbose
14160.770883 | 0) <...>-47814 | .... | 1.289 us | mutex_unlock();
14160.770886 | 0) <...>-47814 | .... | 1.624 us | __fsnotify_parent();
14160.770887 | 0) <...>-47814 | .... | 0.636 us | fsnotify();
14160.770888 | 0) <...>-47814 | .... | 0.328 us | __sb_end_write();
14160.770888 | 0) <...>-47814 | d... | 0.430 us | fpregs_assert_state_consistent();
14160.770889 | 0) <...>-47814 | d... | | do_syscall_64() {
14160.770889 | 0) <...>-47814 | .... | | __x64_sys_close() {
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This adds support to display irq context info for function tracer. To do
this, just specify a '--func-opts irq-info' option.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This adds an option '--graph-opts noirqs' to filter out functions executed
in irq context.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This adds an option '--graph-opts nosleep-time' which allow us only to
measure on-CPU time. This option is function_graph tracer only.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This adds support to display call trace for function tracer. To do this,
just specify a '--func-opts call-graph' option.
Example:
$ sudo perf ftrace -T vfs_read --func-opts call-graph
iio-sensor-prox-855 [003] 6168.369657: vfs_read <-ksys_read
iio-sensor-prox-855 [003] 6168.369677: <stack trace>
=> vfs_read
=> ksys_read
=> __x64_sys_read
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
...
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This adds an option '--inherit' to allow us trace children
processes spawned by our target.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This adds an option '-m/--buffer-size' to allow us set the size of per-cpu
tracing buffer.
Committer testing:
Before running with this option:
# find /sys/kernel/tracing/ -name buffer_size_kb | xargs cat
1408
1408
1408
1408
1408
1408
1408
1408
1408
#
Then, run:
# perf ftrace -m 2048K | head -10
2) | mutex_unlock() {
2) ==========> |
2) | smp_irq_work_interrupt() {
2) | irq_enter() {
2) 0.121 us | rcu_irq_enter();
2) 0.128 us | irqtime_account_irq();
2) 0.719 us | }
2) | __wake_up() {
2) | __wake_up_common_lock() {
2) 0.105 us | _raw_spin_lock_irqsave();
#
Now look at those tracefs knobs:
# find /sys/kernel/tracing/ -name buffer_size_kb | xargs cat
2048
2048
2048
2048
2048
2048
2048
2048
2048
#
This should be similar to the -m option in the other perf tools, such as
'perf record', 'perf trace', etc.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This adds an option '-F/--funcs' to list all available functions to
trace, which is read from tracing file 'available_filter_functions'.
$ sudo ./perf ftrace -F | head
trace_initcall_finish_cb
initcall_blacklisted
do_one_initcall
do_one_initcall
trace_initcall_start_cb
run_init_process
try_to_run_init_process
match_dev_by_label
match_dev_by_uuid
rootfs_init_fs_context
$
Committer notes:
This is the same command line option and for the same purpose as in
'perf probe'.
Signed-off-by: Changbin Du <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The '-g/-G' options have already implied function_graph tracer should be
used instead of function tracer. So we don't need extra option
'--tracer' in this case.
This patch changes the behavior as below:
- If '-g' or '-G' option is on, then function_graph tracer is used.
- If '-T' or '-N' option is on, then function tracer is used.
- The function_graph has priority over function tracer.
- The option '--tracer' only take effect if neither -g/-G nor -T/-N
is specified.
Here are some examples.
This will start tracing all functions using default tracer:
$ sudo perf ftrace
This will trace all functions using function graph tracer:
$ sudo perf ftrace -G '*'
This will trace function vfs_read using function graph tracer:
$ sudo perf ftrace -G vfs_read
This will trace function vfs_read using function tracer:
$ sudo perf ftrace -T vfs_read
Committer notes:
Using '-h -G' will tell what that option is about, so to further clarify
the above examples:
# perf ftrace -h -G
-G, --graph-funcs <func> Set graph filter on given functions
# perf ftrace -h -g
-g, --nograph-funcs <func> Set nograph filter on given functions
# perf ftrace -h -T
-T, --trace-funcs <func> trace given functions only
# perf ftrace -h -N
-N, --notrace-funcs <func> do not trace given functions
#
Signed-off-by: Changbin Du <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Adad support to convert and store time of day in CTF data conversion for
'perf data convert' subcommand.
The perf.data used for conversion needs to have clock data information -
must be recorded with -k/--clockid option).
New --tod option is added to 'perf data convert' subcommand to convert
data with timestamps converted to wall clock time.
Record data with clockid set:
# perf record -k CLOCK_MONOTONIC kill
kill: not enough arguments
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.033 MB perf.data (8 samples) ]
Convert data with TOD timestamps:
# perf data convert --tod --to-ctf ./ctf
[ perf data convert: Converted 'perf.data' into CTF data './ctf' ]
[ perf data convert: Converted and wrote 0.000 MB (8 samples) ]
Display data in perf script:
# perf script -F+tod --ns
perf 262150 2020-07-13 18:38:50.097678523 153633.958246159: 1 cycles: ...
perf 262150 2020-07-13 18:38:50.097682941 153633.958250577: 1 cycles: ...
perf 262150 2020-07-13 18:38:50.097684997 153633.958252633: 7 cycles: ...
...
Display data in babeltrace:
# babeltrace --clock-date ./ctf
[2020-07-13 18:38:50.097678523] (+?.?????????) cycles: { cpu_id = 0 }, { perf_ip = 0xFFF ...
[2020-07-13 18:38:50.097682941] (+0.000004418) cycles: { cpu_id = 0 }, { perf_ip = 0xFFF ...
[2020-07-13 18:38:50.097684997] (+0.000002056) cycles: { cpu_id = 0 }, { perf_ip = 0xFFF ...
...
It's available only for recording with clockid specified, because it's
the only case where we can get reference time to wallclock time. It's
can't do that with perf clock yet.
Error is display if you want to use --tod on data without clockid
specified:
# perf data convert --tod --to-ctf ./ctf
Can't provide --tod time, missing clock data. Please record with -k/--clockid option.
Failed to setup CTF writer.
Error during conversion setup.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Geneviève Bastien <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jeremie Galarneau <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a new CLOCK_DATA feature that stores reference times when
-k/--clockid option is specified.
It contains the clock id and its reference time together with wall clock
time taken at the 'same time', both values are in nanoseconds.
The format of data is as below:
struct {
u32 version; /* version = 1 */
u32 clockid;
u64 wall_clock_ns;
u64 clockid_time_ns;
};
This clock reference times will be used in following changes to display
wall clock for perf events.
It's available only for recording with clockid specified, because it's
the only case where we can get reference time to wallclock time. It's
can't do that with perf clock yet.
Committer testing:
$ perf record -h -k
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-k, --clockid <clockid>
clockid to use for events, see clock_gettime()
$ perf record -k monotonic sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
$ perf report --header-only | grep clockid -A1
# event : name = cycles:u, , id = { 88815, 88816, 88817, 88818, 88819, 88820, 88821, 88822 }, size = 120, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, read_format = ID, disabled = 1, inherit = 1, exclude_kernel = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, use_clockid = 1, ksymbol = 1, bpf_event = 1, clockid = 1
# CPU_TOPOLOGY info available, use -I to display
--
# clockid frequency: 1000 MHz
# cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
# clockid: monotonic (1)
# reference time: 2020-08-06 09:40:21.619290 = 1596717621.619290 (TOD) = 21931.077673635 (monotonic)
$
Original-patch-by: David Ahern <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Geneviève Bastien <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jeremie Galarneau <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
A single q option decodes ip from only FUP/TIP packets. Make it so that
repeating the q option (i.e. qq) decodes only PSB+, getting ip if there
is a FUP packet within PSB+ (i.e. between PSB and PSBEND).
Example:
$ perf record -e intel_pt//u grep -rI pudding drivers
[ perf record: Woken up 52 times to write data ]
[ perf record: Captured and wrote 57.870 MB perf.data ]
$ time perf script --itrace=bi | wc -l
58948289
real 1m23.863s
user 1m23.251s
sys 0m7.452s
$ time perf script --itrace=biq | wc -l
3385694
real 0m4.453s
user 0m4.455s
sys 0m0.328s
$ time perf script --itrace=biqq | wc -l
1883
real 0m0.047s
user 0m0.043s
sys 0m0.009s
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use the new itrace 'q' option to add support for a mode of decoding that
ignores TNT, does not walk object code, but gets the ip from FUP and TIP
packets.
Example:
$ perf record -e intel_pt//u grep -rI pudding drivers
[ perf record: Woken up 52 times to write data ]
[ perf record: Captured and wrote 57.870 MB perf.data ]
$ time perf script --itrace=bi | wc -l
58948289
real 1m23.863s
user 1m23.251s
sys 0m7.452s
$ time perf script --itrace=biq | wc -l
3385694
real 0m4.453s
user 0m4.455s
sys 0m0.328s
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The 'q' option is for modes of decoding that are quicker because they
skip or omit decoding some aspects of trace data.
If supported, the 'q' option may be repeated to increase the effect.
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Change the debug logging (when used with the --time option) to time
filter logged perf events, but allow that to be overridden by using
"d+a" instead of plain "d".
That can reduce the size of the log file.
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The "d" option may be followed by flags which affect what debug messages
will or will not be logged. Each flag must be preceded by either '+' or
'-'. The flags support by Intel PT are:
-a Suppress logging of perf events
Suppressing perf events is useful for decreasing the size of the log.
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Allow the 'd' option to be followed by flags which will affect what debug
messages will or will not be reported. Each flag must be preceded by either
'+' or '-'. The flags are:
a all perf events
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The itrace "e" option may be followed by flags which affect what errors
will or will not be reported. Each flag must be preceded by either '+' or '-'.
The flags supported by Intel PT are:
-o Suppress overflow errors
-l Suppress trace data lost errors
For example, for errors but not overflow or data lost errors:
--itrace=e-o-l
Suppressing those errors can be useful for testing and debugging because
they are not due to decoding.
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Allow the 'e' option to be followed by flags which will affect what errors
will or will not be reported. Each flag must be preceded by either '+' or
'-'. The flags are:
o overflow
l trace data lost
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Introduce --control fd:ctl-fd[,ack-fd] options to pass open file
descriptors numbers from command line.
Extend perf-record.txt file with --control fd:ctl-fd[,ack-fd] options
description.
Document possible usage model introduced by --control fd:ctl-fd[,ack-fd]
options by providing example bash shell script.
Signed-off-by: Alexey Budankov <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Extend -D,--delay option with -1 to start collection with events
disabled to be enabled later by 'enable' command provided via control
file descriptor.
Signed-off-by: Alexey Budankov <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Introduce --control fd:ctl-fd[,ack-fd] options to pass open file
descriptors numbers from command line. Extend perf-stat.txt file with
--control fd:ctl-fd[,ack-fd] options description. Document possible
usage model introduced by --control fd:ctl-fd[,ack-fd] options by
providing example bash shell script.
Signed-off-by: Alexey Budankov <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add support to specify raw event with 'r0<HEX>' syntax within pmu term
syntax like:
-e cpu/r0xdead/
It will be used to specify raw events in cases where they conflict with
real pmu terms, like 'read', which is valid raw event syntax, but also a
possible pmu term name as reported by Jin Yao.
Reported-by: Jin Yao <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Richter <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The usefulness of having a standard way of testing syscall performance
has come up from time to time[0]. Furthermore, some of our testing
machinery (such as 'mmtests') already makes use of a simplified version
of the microbenchmark. This patch mainly takes the same idea to measure
syscall throughput compatible with 'perf-bench' via getppid(2), yet
without any of the additional template stuff from Ingo's version (based
on numa.c). The code is identical to what mmtests uses.
[0] https://lore.kernel.org/lkml/[email protected]/
Committer notes:
Add mising stdlib.h and unistd.h to get the prototypes for exit() and
getppid().
Committer testing:
$ perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
syscall: System call benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
internals: Perf-internals benchmarks
all: All benchmarks
$
$ perf bench syscall
# List of available benchmarks for collection 'syscall':
basic: Benchmark for basic getppid(2) calls
all: Run all syscall benchmarks
$ perf bench syscall basic
# Running 'syscall/basic' benchmark:
# Executed 10000000 getppid() calls
Total time: 3.679 [sec]
0.367957 usecs/op
2717708 ops/sec
$ perf bench syscall all
# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
Total time: 3.644 [sec]
0.364456 usecs/op
2743815 ops/sec
$
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/20190308181747.l36zqz2avtivrr3c@linux-r8p5
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Extend -D,--delay option with -1 value to start monitoring with
events disabled to be enabled later by enable command provided
via control file descriptor.
Signed-off-by: Alexey Budankov <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Consistent with other new events, add an option to perf script to
display text poke events and ksymbol events. Both text poke events and
ksymbol events are displayed because some text pokes (e.g. ftrace
trampolines) have corresponding ksymbol events.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch is to add four options to synthesize events which are
described as below:
'f': synthesize first level cache events
'm': synthesize last level cache events
't': synthesize TLB events
'a': synthesize remote access events
This four options will be used by ARM SPE as their first consumer.
Signed-off-by: Tan Xiaojun <[email protected]>
Tested-by: James Clark <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: Al Grant <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: James Clark <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch links perf with the libpfm4 library if it is available and
LIBPFM4 is passed to the build. The libpfm4 library contains hardware
event tables for all processors supported by perf_events. It is a helper
library that helps convert from a symbolic event name to the event
encoding required by the underlying kernel interface. This library is
open-source and available from: http://perfmon2.sf.net.
With this patch, it is possible to specify full hardware events by name.
Hardware filters are also supported. Events must be specified via the
--pfm-events and not -e option. Both options are active at the same time
and it is possible to mix and match:
$ perf stat --pfm-events inst_retired:any_p:c=1:i -e cycles ....
One needs to explicitely ask for its inclusion by using the LIBPFM4 make
command line option, ie its opt-in rather than opt-out of feature
detection and build support.
Signed-off-by: Stephane Eranian <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Florian Fainelli <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Igor Lubashev <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Jiwei Sun <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: yuzhoujian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Context switch events are added automatically by Intel PT and Coresight.
Make it possible to suppress them. That is useful for tracing the
scheduler without the disturbance that the switch event processing
creates.
Example:
Prerequisites:
$ which perf
~/bin/perf
$ sudo setcap "cap_sys_rawio,cap_sys_admin,cap_sys_ptrace,cap_syslog,cap_ipc_lock=ep" ~/bin/perf
$ sudo chmod +r /proc/kcore
Before:
$ perf record --no-switch-events --kcore -a -e intel_pt//k -- sleep 0.001
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.938 MB perf.data ]
$ perf script -D | grep PERF_RECORD_SWITCH | wc -l
572
After:
$ perf record --no-switch-events --kcore -a -e intel_pt//k -- sleep 0.001
Warning:
Intel Processor Trace decoding will not be possible except for kernel tracing!
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.838 MB perf.data ]
$ perf script -D | grep PERF_RECORD_SWITCH | wc -l
0
$ sudo chmod go-r /proc/kcore
$ sudo setcap -r ~/bin/perf
Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|