Age | Commit message (Collapse) | Author | Files | Lines |
|
Its equivalent to using map->groups to obtain the machine struct.
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf report:
Jin Yao:
- Introduce --total-cycles, for basic block profiling, further using data
obtained from LBR, an example should suffice:
# perf record -b
^C[ perf record: Woken up 595 times to write data ]
[ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
# perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 6M of event 'cycles'
# Event count (approx.): 6299936
#
# Sampled Sampled Avg Avg
# Cycles% Cycles Cycles% Cycles [Program Block Range] Shared Object
# ....... ...... ....... ..... .................................... ................
#
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux]
0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux]
0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux]
0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux]
perf record:
Adrian Hunter:
- Allow storing perf.data in a directory together with a copy of /proc/kcore.
Jiwei Sun:
- Add support for limit perf output file size, i.e.:
# perf record --all-cpus -F 10000 --max-size=4M sleep 10h
[ perf record: perf size limit reached (4097 KB), stopping session ]
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 4.048 MB perf.data (54094 samples) ]
Terminated
# ls -lah perf.data
-rw-------. 1 root root 4.1M Nov 7 15:27 perf.data
#
perf stat:
Jiri Olsa:
- Add --per-node agregation support:
In live mode:
# perf stat -a -I 1000 -e cycles --per-node
# time node cpus counts unit events
1.000542550 N0 20 6,202,097 cycles
1.000542550 N1 20 639,559 cycles
2.002040063 N0 20 7,412,495 cycles
2.002040063 N1 20 2,185,577 cycles
3.003451699 N0 20 6,508,917 cycles
3.003451699 N1 20 765,607 cycles
...
Or in the record/report stat session:
# perf stat record -a -I 1000 -e cycles
# time counts unit events
1.000536937 10,008,468 cycles
2.002090152 9,578,539 cycles
3.003625233 7,647,869 cycles
4.005135036 7,032,086 cycles
^C 4.340902364 3,923,893 cycles
# perf stat report --per-node
# time node cpus counts unit events
1.000536937 N0 20 9,355,086 cycles
1.000536937 N1 20 653,382 cycles
2.002090152 N0 20 7,712,838 cycles
2.002090152 N1 20 1,865,701 cycles
...
perf probe:
Masami Hiramatsu:
Various fixes related to recent additions to the DWARF format:
- Fix to find range-only function instance
- Walk function lines in lexical blocks
- Fix to show function entry line as probe-able
- Fix wrong address verification
- Fix to probe a function which has no entry pc
- Fix to probe an inline function which has no entry pc
- Fix to list probe event with correct line number
- Fix to show inlined function callsite without entry_pc
- Fix to show ranges of variables in functions without entry_pc
- Return a better scope DIE if there is no best scope
- Skip end-of-sequence and non statement lines
- Filter out instances except for inlined subroutine and subprogram
- Fix to show calling lines of inlined functions
- Skip overlapped location on searching variables
perf inject:
Adrian Hunter:
- Do not strip evsels with --strip, as they are needed for create_gcov
(see the autofdo example in tools/perf/Documentation/intel-pt.txt).
Intel PT:
Adrian Hunter:
- Intel PT uses an auxtrace_cache to store the results of code-walking, to avoid
repeated decoding. Add an auxtrace_cache__remove to handle text poke events.
core:
Andi Kleen:
- Always preserve errno while cleaning up perf_event_open failures.
llvm:
Arnaldo Carvalho de Melo:
- No need to tell that the request for saving a .o file for BPF events, as
expressed in ~/.perfconfig was satisfied, make that a debug message.
perf vendor events:
Intel:
Haiyan Song:
- Update CascadelakeX events to v1.05.
- Update all the Intel JSON metrics from TMAM 3.6.
Treewide:
Ian Rogers:
- Improve error paths, plugging leaks found using LLVM tools
such as libFuzzer.
jevents:
Yunfeng Ye:
- Fix resource leak in process_mapfile() and main()
perf kvm:
Igor Lubashev:
- Use evlist layer api when possible.
libsubcmd:
James Clark:
- Move EXTRA_FLAGS to the end to allow overriding existing flags.
- Use -O0 with DEBUG=1
perf diff:
Jin Yao:
- Don't use hack to skip column length calculation
CoreSight ETM:
Leo yan:
- Fix definition of macro TO_CS_QUEUE_NR
ARM64:
John Garry:
- Do not try to include libelf header files when its feature detection
failed, fixing the cross build for ARM64.
perf tests:
Leo Yan:
- Fix out of bounds memory access in the backward ring buffer test.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Previous patch has implemented a new option "--total-cycles". But only
stdio mode is supported.
This patch supports the tui mode and support '--percent-limit'.
For example,
perf record -b ./div
perf report --total-cycles --percent-limit 1
# Samples: 2753248 of event 'cycles'
Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so
--------------------------------------------------
v7:
---
1. Since we have used use_browser in report__browse_block_hists
to support stdio mode, now we also add supporting for tui.
2. Move block tui browser code from ui/browsers/hists.c
to block-info.c.
v6:
---
Create report__tui_browse_block_hists in block-info.c
(codes are moved from builtin-report.c).
v5:
---
Fix a crash issue when running perf report without
'--total-cycles'. The issue is because the internal flag
is renamed from 'total_cycles' to 'total_cycles_mode' in
previous patch but this patch still uses 'total_cycles'
to check if the '--total-cycles' option is enabled, which
causes the code to be inconsistent.
v4:
---
Since the block collection is moved out of printing in
previous patch, this patch is updated accordingly for
tui supporting.
v3:
---
Minor change since the function name is changed:
block_total_cycles_percent -> block_info__total_cycles_percent
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We have already supported the '--total-cycles' option in previous patch.
It's also useful to show entries only above a threshold percent.
This patch enables '--percent-limit' for not showing entries
under that percent.
For example:
perf report --total-cycles --stdio --percent-limit 1
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 2753248
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ................................................................. ....................
#
26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so
Committer testing:
From second exapmple onwards slightly edited for brevity:
# perf report --total-cycles --percent-limit 2 --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 6M of event 'cycles'
# Event count (approx.): 6299936
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ...................................................................... ....................
#
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
#
# (Tip: Create an archive with symtabs to analyse on other machine: perf archive)
#
# perf report --total-cycles --percent-limit 1 --stdio
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
#
# perf report --total-cycles --percent-limit 0.7 --stdio
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux]
#
-------------------------------------------
It only shows the entries which 'Sampled Cycles%' > 1%.
v7:
---
No functional change. Only fix the conflict issue because
previous patches are changed.
v6:
---
No functional change. Only fix the conflict issue because
previous patches are changed.
v5:
---
No functional change. Only fix the conflict issue because
previous patches are changed.
v4:
---
No functional change. Only fix the build issue because
previous patches are changed.
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It would be useful to support sorting for all blocks by the sampled
cycles percent per block. This is useful to concentrate on the globally
hottest blocks.
This patch implements a new option "--total-cycles" which sorts all
blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent:
percent = block sampled cycles aggregation / total sampled cycles
Note that, this patch only supports "--stdio" mode.
For example,
# perf record -b ./div
# perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 2753248
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ................................................ .................
#
26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div
15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so
5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div
4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so
4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div
3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div
3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so
3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so
2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so
2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so
2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so
2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so
2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div
1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so
1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div
1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so
0.25% 182.5K 0.02% 1 [random_r.c:388 -> random_r.c:391] libc-2.27.so
0.00% 48 1.07% 48 [x86_pmu_enable+284 -> x86_pmu_enable+298] [kernel.kallsyms]
0.00% 74 1.64% 74 [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92] [kernel.kallsyms]
0.00% 73 1.62% 73 [vm_mmap+0 -> vm_mmap+48] [kernel.kallsyms]
0.00% 63 0.69% 31 [up_write+0 -> up_write+34] [kernel.kallsyms]
0.00% 13 0.29% 13 [setup_arg_pages+396 -> setup_arg_pages+413] [kernel.kallsyms]
0.00% 3 0.07% 3 [setup_arg_pages+418 -> setup_arg_pages+450] [kernel.kallsyms]
0.00% 616 6.84% 308 [security_mmap_file+0 -> security_mmap_file+72] [kernel.kallsyms]
0.00% 23 0.51% 23 [security_mmap_file+77 -> security_mmap_file+87] [kernel.kallsyms]
0.00% 4 0.02% 1 [sched_clock+0 -> sched_clock+4] [kernel.kallsyms]
0.00% 4 0.02% 1 [sched_clock+9 -> sched_clock+12] [kernel.kallsyms]
0.00% 1 0.02% 1 [rcu_nmi_exit+0 -> rcu_nmi_exit+9] [kernel.kallsyms]
Committer testing:
This should provide material for hours of endless joy, both from looking
for suspicious things in the implementation of this patch, such as the
top one:
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
As well from things that look legit:
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux]
:-)
Very short system wide taken branches session:
# perf record -h -b
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-b, --branch-any sample any taken branches
#
# perf record -b
^C[ perf record: Woken up 595 times to write data ]
[ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
#
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
#
# perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 6M of event 'cycles'
# Event count (approx.): 6299936
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ...................................................................... ....................
#
2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux]
1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so
0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux]
0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux]
0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux]
0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux]
0.30% 260.8K 0.07% 564 [clear_page_64.S:47 -> clear_page_64.S:50] [kernel.vmlinux]
0.28% 215.3K 0.05% 369 [traps.c:623 -> traps.c:628] [kernel.vmlinux]
0.23% 178.1K 0.04% 278 [entry_64.S:271 -> entry_64.S:275] [kernel.vmlinux]
0.20% 152.6K 0.09% 706 [paravirt.c:177 -> paravirt.c:179] [kernel.vmlinux]
0.20% 155.8K 0.05% 373 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux]
0.18% 136.6K 0.03% 222 [msr.h:105 -> msr.h:166] [kernel.vmlinux]
0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux]
0.16% 118.3K 0.01% 44 [entry_64.S:632 -> entry_64.S:657] [kernel.vmlinux]
0.14% 104.5K 0.00% 28 [rwsem.c:1541 -> rwsem.c:1544] [kernel.vmlinux]
0.13% 99.2K 0.01% 53 [spinlock.c:150 -> spinlock.c:152] [kernel.vmlinux]
0.13% 95.5K 0.00% 35 [swap.c:456 -> swap.c:471] [kernel.vmlinux]
0.12% 96.2K 0.05% 407 [copy_user_64.S:175 -> copy_user_64.S:209] [kernel.vmlinux]
0.11% 85.9K 0.00% 31 [swap.c:400 -> page-flags.h:188] [kernel.vmlinux]
0.10% 73.0K 0.01% 52 [paravirt.h:763 -> list.h:131] [kernel.vmlinux]
0.07% 56.2K 0.03% 214 [filemap.c:1524 -> filemap.c:1557] [kernel.vmlinux]
0.07% 54.2K 0.02% 145 [memory.c:1032 -> memory.c:1049] [kernel.vmlinux]
0.07% 50.3K 0.00% 39 [mmzone.c:49 -> mmzone.c:69] [kernel.vmlinux]
0.06% 48.3K 0.01% 40 [paravirt.h:768 -> page_alloc.c:3304] [kernel.vmlinux]
0.06% 46.7K 0.02% 155 [memory.c:1032 -> memory.c:1056] [kernel.vmlinux]
0.06% 46.9K 0.01% 103 [swap.c:867 -> swap.c:902] [kernel.vmlinux]
0.06% 47.8K 0.00% 34 [entry_64.S:1201 -> entry_64.S:1202] [kernel.vmlinux]
-----------------------------------------------------------
v7:
---
Use use_browser in report__browse_block_hists for supporting
stdio and potential tui mode.
v6:
---
Create report__browse_block_hists in block-info.c (codes are
moved from builtin-report.c). It's called from
perf_evlist__tty_browse_hists.
v5:
---
1. Move all block functions to block-info.c
2. Move the code of setting ms in block hist_entry to
other patch.
v4:
---
1. Use new option '--total-cycles' to replace
'-s total_cycles' in v3.
2. Move block info collection out of block info
printing.
v3:
---
1. Use common function block_info__process_sym to
process the blocks per symbol.
2. Remove the nasty hack for skipping calculation
of column length
3. Some minor cleanup
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch provides helper routines to support new columns for block
info output.
The new columns are:
Sampled Cycles%
Sampled Cycles
Avg Cycles%
Avg Cycles
[Program Block Range]
Shared Object
v5:
---
1. Move more block related functions from builtin-report.c to
block-info.c
2. Set ms (map+sym) in block hist_entry. Because this info
is needed for reporting the block range (i.e. source line)
Committer notes:
Remove unused set_fmt() function, some build were not completing with:
util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function]
static inline void set_fmt(struct block_fmt *block_fmt,
^
1 error generated.
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We can get the per sample cycles by hist__account_cycles(). It's also
useful to know the total cycles of all samples in order to get the
cycles coverage for a single program block in further. For example:
coverage = per block sampled cycles / total sampled cycles
This patch creates a new argument 'total_cycles' in hist__account_cycles(),
which will be added with the cycles of each sample.
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We have already implemented some block-info related functions.
Now it's time to do some cleanup, refactoring and move the
functions and structures to new block-info.h/block-info.c.
v4:
---
Move code for skipping column length calculation to patch:
'perf diff: Don't use hack to skip column length calculation'
v3:
---
1. Rename the patch title
2. Rename from block.h/block.c to block-info.h/block-info.c
3. Move more common part to block-info, such as
block_info__process_sym.
4. Remove the nasty hack for skipping calculation of column
length
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Previously we use a nasty hack to skip the hists__calc_col_len for block
since this function is not very suitable for block column length
calculation.
This patch removes the hack code and add a check at the entry of
hists__calc_col_len to skip for block case.
Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since debuginfo__find_probes() callback function can be called with the
location which already passed, the callback function must filter out
such overlapped locations.
add_probe_trace_event() has already done it by commit 1a375ae7659a
("perf probe: Skip same probe address for a given line"), but
add_available_vars() doesn't. Thus perf probe -v shows same address
repeatedly as below:
# perf probe -V vfs_read:18
Available variables at vfs_read:18
@<vfs_read+217>
char* buf
loff_t* pos
ssize_t ret
struct file* file
@<vfs_read+217>
char* buf
loff_t* pos
ssize_t ret
struct file* file
@<vfs_read+226>
char* buf
loff_t* pos
ssize_t ret
struct file* file
With this fix, perf probe -V shows it correctly:
# perf probe -V vfs_read:18
Available variables at vfs_read:18
@<vfs_read+217>
char* buf
loff_t* pos
ssize_t ret
struct file* file
@<vfs_read+226>
char* buf
loff_t* pos
ssize_t ret
struct file* file
Fixes: cf6eb489e5c0 ("perf probe: Show accessible local variables")
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157241938927.32002.4026859017790562751.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix to show calling lines of inlined functions (where an inline function
is called).
die_walk_lines() filtered out the lines inside inlined functions based
on the address. However this also filtered out the lines which call
those inlined functions from the target function.
To solve this issue, check the call_file and call_line attributes and do
not filter out if it matches to the line information.
Without this fix, perf probe -L doesn't show some lines correctly.
(don't see the lines after 17)
# perf probe -L vfs_read
<vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
1 {
2 ssize_t ret;
4 if (!(file->f_mode & FMODE_READ))
return -EBADF;
6 if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
8 if (unlikely(!access_ok(buf, count)))
return -EFAULT;
11 ret = rw_verify_area(READ, file, pos, count);
12 if (!ret) {
13 if (count > MAX_RW_COUNT)
count = MAX_RW_COUNT;
15 ret = __vfs_read(file, buf, count, pos);
16 if (ret > 0) {
fsnotify_access(file);
add_rchar(current, ret);
}
With this fix:
# perf probe -L vfs_read
<vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
1 {
2 ssize_t ret;
4 if (!(file->f_mode & FMODE_READ))
return -EBADF;
6 if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
8 if (unlikely(!access_ok(buf, count)))
return -EFAULT;
11 ret = rw_verify_area(READ, file, pos, count);
12 if (!ret) {
13 if (count > MAX_RW_COUNT)
count = MAX_RW_COUNT;
15 ret = __vfs_read(file, buf, count, pos);
16 if (ret > 0) {
17 fsnotify_access(file);
18 add_rchar(current, ret);
}
20 inc_syscr(current);
}
Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface")
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157241937995.32002.17899884017011512577.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Filter out instances except for inlined_subroutine and subprogram DIE in
die_walk_instances() and die_is_func_instance().
This fixes an issue that perf probe sets some probes on calling address
instead of a target function itself.
When perf probe walks on instances of an abstruct origin (a kind of
function prototype of inlined function), die_walk_instances() can also
pass a GNU_call_site (a GNU extension for call site) to callback. Since
it is not an inlined instance of target function, we have to filter out
when searching a probe point.
Without this patch, perf probe sets probes on call site address too.This
can happen on some function which is marked "inlined", but has actual
symbol. (I'm not sure why GCC mark it "inlined"):
# perf probe -D vfs_read
p:probe/vfs_read _text+2500017
p:probe/vfs_read_1 _text+2499468
p:probe/vfs_read_2 _text+2499563
p:probe/vfs_read_3 _text+2498876
p:probe/vfs_read_4 _text+2498512
p:probe/vfs_read_5 _text+2498627
With this patch:
Slightly different results, similar tho:
# perf probe -D vfs_read
p:probe/vfs_read _text+2498512
Committer testing:
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Before:
# perf probe -D vfs_read
p:probe/vfs_read _text+3131557
p:probe/vfs_read_1 _text+3130975
p:probe/vfs_read_2 _text+3131047
p:probe/vfs_read_3 _text+3130380
p:probe/vfs_read_4 _text+3130000
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
#
After:
# perf probe -D vfs_read
p:probe/vfs_read _text+3130000
#
Fixes: db0d2c6420ee ("perf probe: Search concrete out-of-line instances")
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157241937063.32002.11024544873990816590.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Skip end-of-sequence and non-statement lines while walking through lines
list.
The "end-of-sequence" line information means:
"the current address is that of the first byte after the
end of a sequence of target machine instructions."
(DWARF version 4 spec 6.2.2)
This actually means out of scope and we can not probe on it.
On the other hand, the statement lines (is_stmt) means:
"the current instruction is a recommended breakpoint location.
A recommended breakpoint location is intended to “represent”
a line, a statement and/or a semantically distinct subpart
of a statement."
(DWARF version 4 spec 6.2.2)
So, non-statement line info also should be skipped.
These can reduce unneeded probe points and also avoid an error.
E.g. without this patch:
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new events:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_3 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_4 (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask_4 -aR sleep 1
#
This puts 5 probes on one line, but acutally it's not inlined function.
This is because there are many non statement instructions at the
function prologue.
With this patch:
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
#
Now perf-probe skips unneeded addresses.
Committer testing:
Slightly different results, but similar:
Before:
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
#
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new events:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask_2 -aR sleep 1
#
After:
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
#
Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface")
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157241936090.32002.12156347518596111660.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Make find_best_scope() returns innermost DIE at given address if there
is no best matched scope DIE. Since Gcc sometimes generates intuitively
strange line info which is out of inlined function address range, we
need this fixup.
Without this, sometimes perf probe failed to probe on a line inside an
inlined function:
# perf probe -D ksys_open:3
Failed to find scope of probe point.
Error: Failed to add events.
With this fix, 'perf probe' can probe it:
# perf probe -D ksys_open:3
p:probe/ksys_open _text+25707308
p:probe/ksys_open_1 _text+25710596
p:probe/ksys_open_2 _text+25711114
p:probe/ksys_open_3 _text+25711343
p:probe/ksys_open_4 _text+25714058
p:probe/ksys_open_5 _text+2819653
p:probe/ksys_open_6 _text+2819701
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Tom Zanussi <[email protected]>
Link: http://lore.kernel.org/lkml/157291300887.19771.14936015360963292236.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix expand_tabs that copies the source lines '\0' and then appends
another '\0' at a potentially out of bounds address.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To reduce boilerplate in some places.
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Its sufficient to check if map->groups is NULL before using it to get
->machine value.
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a parse_events_term deep delete function so that owned strings and
arrays are freed.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Avoid a memory leak when the configuration fails.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Yyabort doesn't destruct inputs and so this must be done manually before
using yyabort.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If parsing fails then destructors are ran to clean the up the stack.
Rename the head union member to make the term and evlist use cases more
distinct, this simplifies matching the correct destructor.
Committer notes:
Jiri: "Nice did not know about this.. looks like it's been in bison for some time, right?"
Ian: "Looks like it wasn't in Bison 1 but in Bison 2, we're at Bison 3 and
Bison 2 is > 14 years old:
https://web.archive.org/web/20050924004158/http://www.gnu.org/software/bison/manual/html_mono/bison.html#Destructor-Decl"
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Make it easier to release memory associated with parse event terms by
duplicating the string for the config name and ensuring the val string
is a duplicate.
Currently the parser may memory leak terms and this is addressed in a
later patch.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Parse event error handling may overwrite one error string with another
creating memory leaks. Introduce a helper routine that warns about
multiple error messages as well as avoiding the memory leak.
A reproduction of this problem can be seen with:
perf stat -e c/c/
After this change this produces:
WARNING: multiple event parsing errors
event syntax error: 'c/c/'
\___ unknown term
valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Adding new --per-node option to aggregate counts per NUMA
nodes for system-wide mode measurements.
You can specify --per-node in live mode:
# perf stat -a -I 1000 -e cycles --per-node
# time node cpus counts unit events
1.000542550 N0 20 6,202,097 cycles
1.000542550 N1 20 639,559 cycles
2.002040063 N0 20 7,412,495 cycles
2.002040063 N1 20 2,185,577 cycles
3.003451699 N0 20 6,508,917 cycles
3.003451699 N1 20 765,607 cycles
...
Or in the record/report stat session:
# perf stat record -a -I 1000 -e cycles
# time counts unit events
1.000536937 10,008,468 cycles
2.002090152 9,578,539 cycles
3.003625233 7,647,869 cycles
4.005135036 7,032,086 cycles
^C 4.340902364 3,923,893 cycles
# perf stat report --per-node
# time node cpus counts unit events
1.000536937 N0 20 9,355,086 cycles
1.000536937 N1 20 653,382 cycles
2.002090152 N0 20 7,712,838 cycles
2.002090152 N1 20 1,865,701 cycles
3.003625233 N0 20 6,604,441 cycles
3.003625233 N1 20 1,043,428 cycles
4.005135036 N0 20 6,350,522 cycles
4.005135036 N1 20 681,564 cycles
4.340902364 N0 20 3,403,188 cycles
4.340902364 N1 20 520,705 cycles
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Joe Mario <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To speed up cpu to node lookup, add perf_env__numa_node(), that creates
cpu array on the first lookup, that holds numa nodes for each stored
cpu.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Joe Mario <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If event parsing fails the event list is leaked, instead splice the list
onto the out result and let the caller cleanup.
An example input for parse_events found by libFuzzer that reproduces
this memory leak is 'm{'.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To reduce boilerplate, providing a more compact form to iterate over the
maps in a map_group.
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To reduce boilerplate, provide a more compact form using an idiom
present in other trees of data structures.
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Just like free(), return NULL in that case, will simplify the
for_each_entry_safe() iterators.
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We were checking just if it was still on some rb tree, but that is not
the only way that this map can still have references, map->refcnt is
there exactly for this, use it.
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add functions to write into the dso file data cache, but not change the
file itself.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Refactor dso_cache__read() to separate populating the cache from copying
data from it. This is preparation for adding a cache "write" that will
update the data in the cache.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add auxtrace_cache__remove(). Intel PT uses an auxtrace_cache to store
the results of code-walking, so that the same block of instructions does
not have to be decoded repeatedly. However, when there are text poke
events, the associated cache entries need to be removed.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix to show ranges of variables (--range and --vars option) in functions
which DIE has only ranges but no entry_pc attribute.
Without this fix:
# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
(No matched variables)
With this fix:
# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
[VAL] int cpu @<clear_tasks_mm_cpumask+[0-35,317-317,2052-2059]>
Committer testing:
Before:
[root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
(No matched variables)
[root@quaco ~]#
After:
[root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
[VAL] int cpu @<clear_tasks_mm_cpumask+[0-23,23-105,105-106,106-106,1843-1850,1850-1862]>
[root@quaco ~]#
Using it:
[root@quaco ~]# perf probe clear_tasks_mm_cpumask cpu
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask with cpu)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
[root@quaco ~]# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c with cpu)
[root@quaco ~]#
[root@quaco ~]# perf trace -e probe:*cpumask
^C[root@quaco ~]#
Fixes: 349e8d261131 ("perf probe: Add --range option to show a variable's location range")
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157199323018.8075.8179744380479673672.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix 'perf probe --line' option to show inlined function callsite lines
even if the function DIE has only ranges.
Without this:
# perf probe -L amd_put_event_constraints
...
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
__amd_put_nb_event_constraints(cpuc, event);
5 }
With this patch:
# perf probe -L amd_put_event_constraints
...
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
4 __amd_put_nb_event_constraints(cpuc, event);
5 }
Committer testing:
Before:
[root@quaco ~]# perf probe -L amd_put_event_constraints
<amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0>
0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
struct perf_event *event)
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
__amd_put_nb_event_constraints(cpuc, event);
5 }
PMU_FORMAT_ATTR(event, "config:0-7,32-35");
PMU_FORMAT_ATTR(umask, "config:8-15" );
[root@quaco ~]#
After:
[root@quaco ~]# perf probe -L amd_put_event_constraints
<amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0>
0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
struct perf_event *event)
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
4 __amd_put_nb_event_constraints(cpuc, event);
5 }
PMU_FORMAT_ATTR(event, "config:0-7,32-35");
PMU_FORMAT_ATTR(umask, "config:8-15" );
[root@quaco ~]# perf probe amd_put_event_constraints:4
Added new event:
probe:amd_put_event_constraints (on amd_put_event_constraints:4)
You can now use it in all perf tools, such as:
perf record -e probe:amd_put_event_constraints -aR sleep 1
[root@quaco ~]#
[root@quaco ~]# perf probe -l
probe:amd_put_event_constraints (on amd_put_event_constraints:4@arch/x86/events/amd/core.c)
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
[root@quaco ~]#
Using it:
[root@quaco ~]# perf trace -e probe:*
^C[root@quaco ~]#
Ok, Intel system here... :-)
Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface")
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157199322107.8075.12659099000567865708.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since debuginfo__find_probe_point() uses dwarf_entrypc() for finding the
entry address of the function on which a probe is, it will fail when the
function DIE has only ranges attribute.
To fix this issue, use die_entrypc() instead of dwarf_entrypc().
Without this fix, perf probe -l shows incorrect offset:
# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579263632@work/linux/linux/kernel/cpu.c)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask+18446744071579263752@work/linux/linux/kernel/cpu.c)
With this:
# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@work/linux/linux/kernel/cpu.c)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:21@work/linux/linux/kernel/cpu.c)
Committer testing:
Before:
[root@quaco ~]# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579765152@kernel/cpu.c)
[root@quaco ~]#
After:
[root@quaco ~]# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
[root@quaco ~]#
Fixes: 1d46ea2a6a40 ("perf probe: Fix listing incorrect line number with inline function")
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157199321227.8075.14655572419136993015.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix perf probe to probe an inlne function which has no entry pc
or low pc but only has ranges attribute.
This seems very rare case, but I could find a few examples, as
same as probe_point_search_cb(), use die_entrypc() to get the
entry address in probe_point_inline_cb() too.
Without this patch:
# perf probe -D __amd_put_nb_event_constraints
Failed to get entry address of __amd_put_nb_event_constraints.
Probe point '__amd_put_nb_event_constraints' not found.
Error: Failed to add events.
With this patch:
# perf probe -D __amd_put_nb_event_constraints
p:probe/__amd_put_nb_event_constraints amd_put_event_constraints+43
Committer testing:
Before:
[root@quaco ~]# perf probe -D __amd_put_nb_event_constraints
Failed to get entry address of __amd_put_nb_event_constraints.
Probe point '__amd_put_nb_event_constraints' not found.
Error: Failed to add events.
[root@quaco ~]#
After:
[root@quaco ~]# perf probe -D __amd_put_nb_event_constraints
p:probe/__amd_put_nb_event_constraints _text+33789
[root@quaco ~]#
Fixes: 4ea42b181434 ("perf: Add perf probe subcommand, a kprobe-event setup helper")
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157199320336.8075.16189530425277588587.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix 'perf probe' to probe a function which has no entry pc or low pc but
only has ranges attribute.
probe_point_search_cb() uses dwarf_entrypc() to get the probe address,
but that doesn't work for the function DIE which has only ranges
attribute. Use die_entrypc() instead.
Without this fix:
# perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0
Probe point 'clear_tasks_mm_cpumask' not found.
Error: Failed to add events.
With this:
# perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0
p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0
Committer testing:
Before:
[root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
Probe point 'clear_tasks_mm_cpumask' not found.
Error: Failed to add events.
[root@quaco ~]#
After:
[root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
[root@quaco ~]#
Using it with 'perf trace':
[root@quaco ~]# perf trace -e probe:clear_tasks_mm_cpumask
Doesn't seem to be used in x86_64:
$ find . -name "*.c" | xargs grep clear_tasks_mm_cpumask
./kernel/cpu.c: * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU
./kernel/cpu.c:void clear_tasks_mm_cpumask(int cpu)
./arch/xtensa/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/csky/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/sh/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/arm/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/powerpc/mm/nohash/mmu_context.c: clear_tasks_mm_cpumask(cpu);
$ find . -name "*.h" | xargs grep clear_tasks_mm_cpumask
./include/linux/cpu.h:void clear_tasks_mm_cpumask(int cpu);
$ find . -name "*.S" | xargs grep clear_tasks_mm_cpumask
$
Fixes: e1ecbbc3fa83 ("perf probe: Fix to handle optimized not-inlined functions")
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157199319438.8075.4695576954550638618.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since there are some DIE which has only ranges instead of the
combination of entrypc/highpc, address verification must use
dwarf_haspc() instead of dwarf_entrypc/dwarf_highpc.
Also, the ranges only DIE will have a partial code in different section
(e.g. unlikely code will be in text.unlikely as "FUNC.cold" symbol). In
that case, we can not use dwarf_entrypc() or die_entrypc(), because the
offset from original DIE can be a minus value.
Instead, this simply gets the symbol and offset from symtab.
Without this patch;
# perf probe -D clear_tasks_mm_cpumask:1
Failed to get entry address of clear_tasks_mm_cpumask
Error: Failed to add events.
And with this patch:
# perf probe -D clear_tasks_mm_cpumask:1
p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0
p:probe/clear_tasks_mm_cpumask_1 clear_tasks_mm_cpumask+5
p:probe/clear_tasks_mm_cpumask_2 clear_tasks_mm_cpumask+8
p:probe/clear_tasks_mm_cpumask_3 clear_tasks_mm_cpumask+16
p:probe/clear_tasks_mm_cpumask_4 clear_tasks_mm_cpumask+82
Committer testing:
I managed to reproduce the above:
[root@quaco ~]# perf probe -D clear_tasks_mm_cpumask:1
p:probe/clear_tasks_mm_cpumask _text+919968
p:probe/clear_tasks_mm_cpumask_1 _text+919973
p:probe/clear_tasks_mm_cpumask_2 _text+919976
[root@quaco ~]#
But then when trying to actually put the probe in place, it fails if I
use :0 as the offset:
[root@quaco ~]# perf probe -L clear_tasks_mm_cpumask | head -5
<clear_tasks_mm_cpumask@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/kernel/cpu.c:0>
0 void clear_tasks_mm_cpumask(int cpu)
1 {
2 struct task_struct *p;
[root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
Probe point 'clear_tasks_mm_cpumask' not found.
Error: Failed to add events.
[root@quaco
The next patch is needed to fix this case.
Fixes: 576b523721b7 ("perf probe: Fix probing symbols with optimization suffix")
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157199318513.8075.10463906803299647907.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix die_walk_lines() to list the function entry line correctly. Since
the dwarf_entrypc() does not return the entry pc if the DIE has only
range attribute, __die_walk_funclines() fails to list the declaration
line (entry line) in that case.
To solve this issue, this introduces die_entrypc() which correctly
returns the entry PC (the first address range) even if the DIE has only
range attribute. With this fix die_walk_lines() shows the function entry
line is able to probe correctly.
Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157190837419.1859.4619125803596816752.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since some inlined functions are in lexical blocks of given function, we
have to recursively walk through the DIE tree. Without this fix,
perf-probe -L can miss the inlined functions which is in a lexical block
(like if (..) { func() } case.)
However, even though, to walk the lines in a given function, we don't
need to follow the children DIE of inlined functions because those do
not have any lines in the specified function.
We need to walk though whole trees only if we walk all lines in a given
file, because an inlined function can include another inlined function
in the same file.
Fixes: b0e9cb2802d4 ("perf probe: Fix to search nested inlined functions in CU")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157190836514.1859.15996864849678136353.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix die_is_func_instance() to find range-only function instance.
In some case, a function instance can be made without any low PC or
entry PC, but only with address ranges by optimization. (e.g. cold text
partially in "text.unlikely" section) To find such function instance, we
have to check the range attribute too.
Fixes: e1ecbbc3fa83 ("perf probe: Fix to handle optimized not-inlined functions")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/157190835669.1859.8368628035930950596.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use realloc() rather than malloc()+memcpy() to possibly avoid a memory
allocation when appending array elements.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Having a YYABORT in a macro makes it hard to free memory for components
of a rule. Separate the logic out.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In some weak fallback cases close can be called a lot with -1. Check for
this case and avoid calling close then.
This is mainly to shut up valgrind which complains about this case.
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In some cases when perf_event_open fails, it may do some closes to clean
up. In special cases these closes can fail too, which overwrites the
errno of the perf_event_open, which is then incorrectly reported.
Save/restore errno around closes.
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Macro TO_CS_QUEUE_NR definition has a typo, which uses 'trace_id_chan'
as its parameter, this doesn't match with its definition body which uses
'trace_chan_id'. So renames the parameter to 'trace_chan_id'.
It's luck to have a local variable 'trace_chan_id' in the function
cs_etm__setup_queue(), even we wrongly define the macro TO_CS_QUEUE_NR,
the local variable 'trace_chan_id' is used rather than the macro's
parameter 'trace_id_chan'; so the compiler doesn't complain for this
before.
After renaming the parameter, it leads to a compiling error due
cs_etm__setup_queue() has no variable 'trace_id_chan'. This patch uses
the variable 'trace_chan_id' for the macro so that fixes the compiling
error.
Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: coresight ml <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Its a bit annoying to have that message, better make it a debug one.
I.e. now this message will only appear when using '-v':
[root@quaco tracebuffer]# trace -e bristot.c
LLVM: dumping bristot.o
^C[root@quaco tracebuffer]#
Cc: Adrian Hunter <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Luis Cláudio Gonçalves <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a new 'perf record' option '--kcore' which will put a copy of
/proc/kcore, kallsyms and modules into a perf.data directory. Note, that
without the --kcore option, output goes to a file as previously. The
tools' -o and -i options work with either a file name or directory name.
Example:
$ sudo perf record --kcore uname
$ sudo tree perf.data
perf.data
├── kcore_dir
│ ├── kallsyms
│ ├── kcore
│ └── modules
└── data
$ sudo perf script -v
build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de
build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541
Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field.
Using CPUID GenuineIntel-6-8E-A
Using perf.data/kcore_dir/kcore for kernel data
Using perf.data/kcore_dir/kallsyms for symbols
perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux)
perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux)
perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux)
uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux)
uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux)
uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux)
Committer testing:
# rm -rf perf.data*
# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -l perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data
# perf record --kcore uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
ls[root@quaco ~]# ls -lad perf.data*
drwx------. 3 root root 4096 Oct 21 11:08 perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
# perf evlist -v -i perf.data/data
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|