Age | Commit message (Collapse) | Author | Files | Lines |
|
The commit 94dbfd6781a0e87b ("perf parse-events: Architecture specific
leader override") introduced a feature to reorder the slots event to
fulfill the restriction of the perf metrics topdown group. But the
feature doesn't work on the hybrid machine.
$ perf stat -e "{cpu_core/instructions/,cpu_core/slots/,cpu_core/topdown-retiring/}" -a sleep 1
Performance counter stats for 'system wide':
<not counted> cpu_core/instructions/
<not counted> cpu_core/slots/
<not supported> cpu_core/topdown-retiring/
1.002871801 seconds time elapsed
A hybrid platform has a different PMU name for the core PMUs, while
current perf hard code the PMU name "cpu".
Introduce a new function to check whether the system supports the perf
metrics feature. The result is cached for the future usage.
For X86, the core PMU name always has "cpu" prefix.
With the patch:
$ perf stat -e "{cpu_core/instructions/,cpu_core/slots/,cpu_core/topdown-retiring/}" -a sleep 1
Performance counter stats for 'system wide':
76,337,010 cpu_core/slots/
10,416,809 cpu_core/instructions/
11,692,372 cpu_core/topdown-retiring/
1.002805453 seconds time elapsed
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The evsel->name may have a different format for a topdown event, a pure
topdown name (e.g., topdown-fe-bound), or a PMU name + a topdown name
(e.g., cpu/topdown-fe-bound/). The cpu/topdown-fe-bound/ kind format
isn't supported by the arch_evlist__leader(). This format is a very
common format for a hybrid platform, which requires specifying the PMU
name for each event.
Without the patch,
$ perf stat -e '{instructions,slots,cpu/topdown-fe-bound/}' -a sleep 1
Performance counter stats for 'system wide':
<not counted> instructions
<not counted> slots
<not supported> cpu/topdown-fe-bound/
1.003482041 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
The events in group usually have to be from the same PMU. Try reorganizing the group.
With the patch,
$ perf stat -e '{instructions,slots,cpu/topdown-fe-bound/}' -a sleep 1
Performance counter stats for 'system wide':
157,383,996 slots
25,011,711 instructions
27,441,686 cpu/topdown-fe-bound/
1.003530890 seconds time elapsed
Fixes: bc355822f0d9623b ("perf parse-events: Move slots only with topdown")
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If any member in a group has a different cpu mask than the other
members, the current perf stat disables group. when the perf metrics
topdown events are part of the group, the below <not supported> error
will be triggered.
$ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1
WARNING: grouped events cpus do not match, disabling group:
anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ }
Performance counter stats for 'system wide':
141,465,174 slots
<not supported> topdown-retiring
1,605,330,334 uncore_imc_free_running_0/dclk/
The perf metrics topdown events must always be grouped with a slots
event as leader.
Factor out evsel__remove_from_group() to only remove the regular events
from the group.
Remove evsel__must_be_in_group(), since no one use it anymore.
With the patch, the topdown events aren't broken from the group for the
splitting.
$ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1
WARNING: grouped events cpus do not match, disabling group:
anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ }
Performance counter stats for 'system wide':
346,110,588 slots
124,608,256 topdown-retiring
1,606,869,976 uncore_imc_free_running_0/dclk/
1.003877592 seconds time elapsed
Fixes: a9a1790247bdcf3b ("perf stat: Ensure group is defined on top of the same cpu mask")
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The patch ("perf evlist: Keep topdown counters in weak group") fixes the
perf metrics topdown event issue when the topdown events are in a weak
group on a non-hybrid platform. However, it doesn't work for the hybrid
platform.
$./perf stat -e '{cpu_core/slots/,cpu_core/topdown-bad-spec/,
cpu_core/topdown-be-bound/,cpu_core/topdown-fe-bound/,
cpu_core/topdown-retiring/,cpu_core/branch-instructions/,
cpu_core/branch-misses/,cpu_core/bus-cycles/,cpu_core/cache-misses/,
cpu_core/cache-references/,cpu_core/cpu-cycles/,cpu_core/instructions/,
cpu_core/mem-loads/,cpu_core/mem-stores/,cpu_core/ref-cycles/,
cpu_core/cache-misses/,cpu_core/cache-references/}:W' -a sleep 1
Performance counter stats for 'system wide':
751,765,068 cpu_core/slots/ (84.07%)
<not supported> cpu_core/topdown-bad-spec/
<not supported> cpu_core/topdown-be-bound/
<not supported> cpu_core/topdown-fe-bound/
<not supported> cpu_core/topdown-retiring/
12,398,197 cpu_core/branch-instructions/ (84.07%)
1,054,218 cpu_core/branch-misses/ (84.24%)
539,764,637 cpu_core/bus-cycles/ (84.64%)
14,683 cpu_core/cache-misses/ (84.87%)
7,277,809 cpu_core/cache-references/ (77.30%)
222,299,439 cpu_core/cpu-cycles/ (77.28%)
63,661,714 cpu_core/instructions/ (84.85%)
0 cpu_core/mem-loads/ (77.29%)
12,271,725 cpu_core/mem-stores/ (77.30%)
542,241,102 cpu_core/ref-cycles/ (84.85%)
8,854 cpu_core/cache-misses/ (76.71%)
7,179,013 cpu_core/cache-references/ (76.31%)
1.003245250 seconds time elapsed
A hybrid platform has a different PMU name for the core PMUs, while
the current perf hard code the PMU name "cpu".
The evsel->pmu_name can be used to replace the "cpu" to fix the issue.
For a hybrid platform, the pmu_name must be non-NULL. Because there are
at least two core PMUs. The PMU has to be specified.
For a non-hybrid platform, the pmu_name may be NULL. Because there is
only one core PMU, "cpu". For a NULL pmu_name, we can safely assume that
it is a "cpu" PMU.
In case other PMUs also define the "slots" event, checking the PMU type
as well.
With the patch,
$ perf stat -e '{cpu_core/slots/,cpu_core/topdown-bad-spec/,
cpu_core/topdown-be-bound/,cpu_core/topdown-fe-bound/,
cpu_core/topdown-retiring/,cpu_core/branch-instructions/,
cpu_core/branch-misses/,cpu_core/bus-cycles/,cpu_core/cache-misses/,
cpu_core/cache-references/,cpu_core/cpu-cycles/,cpu_core/instructions/,
cpu_core/mem-loads/,cpu_core/mem-stores/,cpu_core/ref-cycles/,
cpu_core/cache-misses/,cpu_core/cache-references/}:W' -a sleep 1
Performance counter stats for 'system wide':
766,620,266 cpu_core/slots/ (84.06%)
73,172,129 cpu_core/topdown-bad-spec/ # 9.5% bad speculation (84.06%)
193,443,341 cpu_core/topdown-be-bound/ # 25.0% backend bound (84.06%)
403,940,929 cpu_core/topdown-fe-bound/ # 52.3% frontend bound (84.06%)
102,070,237 cpu_core/topdown-retiring/ # 13.2% retiring (84.06%)
12,364,429 cpu_core/branch-instructions/ (84.03%)
1,080,124 cpu_core/branch-misses/ (84.24%)
564,120,383 cpu_core/bus-cycles/ (84.65%)
36,979 cpu_core/cache-misses/ (84.86%)
7,298,094 cpu_core/cache-references/ (77.30%)
227,174,372 cpu_core/cpu-cycles/ (77.31%)
63,886,523 cpu_core/instructions/ (84.87%)
0 cpu_core/mem-loads/ (77.31%)
12,208,782 cpu_core/mem-stores/ (77.31%)
566,409,738 cpu_core/ref-cycles/ (84.87%)
23,118 cpu_core/cache-misses/ (76.71%)
7,212,602 cpu_core/cache-references/ (76.29%)
1.003228667 seconds time elapsed
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Stat events can come from disk and so need a degree of validation. They
contain a CPU which needs looking up via CPU map to access a counter.
Add the CPU to index translation, alongside validity checking.
Discussion thread:
https://lore.kernel.org/linux-perf-users/CAP-5=fWQR=sCuiSMktvUtcbOLidEpUJLCybVF6=BRvORcDOq+g@mail.gmail.com/
Fixes: 7ac0089d138f80dc ("perf evsel: Pass cpu not cpu map index to synthesize")
Reported-by: Michael Petlan <[email protected]>
Suggested-by: Michael Petlan <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Dave Marchevsky <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Lv Ruyi <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Quentin Monnet <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: Yonghong Song <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Avi Kivity reported a problem where the __weak
btf__load_from_kernel_by_id() in tools/perf/util/bpf-event.c was being
used and it called btf__get_from_id() in tools/lib/bpf/btf.c that in
turn called back to btf__load_from_kernel_by_id(), resulting in an
endless loop.
Fix this by adding a feature test to check if
btf__load_from_kernel_by_id() is available when building perf with
LIBBPF_DYNAMIC=1, and if not then provide the fallback to the old
btf__get_from_id(), that doesn't call back to btf__load_from_kernel_by_id()
since at that time it didn't exist at all.
Tested on Fedora 35 where we have libbpf-devel 0.4.0 with LIBBPF_DYNAMIC
where we don't have btf__load_from_kernel_by_id() and thus its feature
test fail, not defining HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID:
$ cat /tmp/build/perf-urgent/feature/test-libbpf-btf__load_from_kernel_by_id.make.output
test-libbpf-btf__load_from_kernel_by_id.c: In function ‘main’:
test-libbpf-btf__load_from_kernel_by_id.c:6:16: error: implicit declaration of function ‘btf__load_from_kernel_by_id’ [-Werror=implicit-function-declaration]
6 | return btf__load_from_kernel_by_id(20151128, NULL);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
$
$ nm /tmp/build/perf-urgent/perf | grep btf__load_from_kernel_by_id
00000000005ba180 T btf__load_from_kernel_by_id
$
$ objdump --disassemble=btf__load_from_kernel_by_id -S /tmp/build/perf-urgent/perf
/tmp/build/perf-urgent/perf: file format elf64-x86-64
<SNIP>
00000000005ba180 <btf__load_from_kernel_by_id>:
#include "record.h"
#include "util/synthetic-events.h"
#ifndef HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID
struct btf *btf__load_from_kernel_by_id(__u32 id)
{
5ba180: 55 push %rbp
5ba181: 48 89 e5 mov %rsp,%rbp
5ba184: 48 83 ec 10 sub $0x10,%rsp
5ba188: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
5ba18f: 00 00
5ba191: 48 89 45 f8 mov %rax,-0x8(%rbp)
5ba195: 31 c0 xor %eax,%eax
struct btf *btf;
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
int err = btf__get_from_id(id, &btf);
5ba197: 48 8d 75 f0 lea -0x10(%rbp),%rsi
5ba19b: e8 a0 57 e5 ff call 40f940 <btf__get_from_id@plt>
5ba1a0: 89 c2 mov %eax,%edx
#pragma GCC diagnostic pop
return err ? ERR_PTR(err) : btf;
5ba1a2: 48 98 cltq
5ba1a4: 85 d2 test %edx,%edx
5ba1a6: 48 0f 44 45 f0 cmove -0x10(%rbp),%rax
}
<SNIP>
Fixes: 218e7b775d368f38 ("perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions")
Reported-by: Avi Kivity <[email protected]>
Link: https://lore.kernel.org/linux-perf-users/[email protected]
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/linux-perf-users/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a basic stat test.
Add two tests of grouping behavior for topdown events. Topdown events
are special as they must be grouped with the slots event first.
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Florian Fischer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Shunsuke Nakamura <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
On Intel Icelake, topdown events must always be grouped with a slots
event as leader. When a metric is parsed a weak group is formed and
retried if perf_event_open fails. The retried events aren't grouped
breaking the slots leader requirement. This change modifies the weak
group "reset" behavior so that topdown events aren't broken from the
group for the retry.
$ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1
Performance counter stats for 'system wide':
47,867,188,483 slots (92.27%)
<not supported> topdown-bad-spec
<not supported> topdown-be-bound
<not supported> topdown-fe-bound
<not supported> topdown-retiring
2,173,346,937 branch-instructions (92.27%)
10,540,253 branch-misses # 0.48% of all branches (92.29%)
96,291,140 bus-cycles (92.29%)
6,214,202 cache-misses # 20.120 % of all cache refs (92.29%)
30,886,082 cache-references (76.91%)
11,773,726,641 cpu-cycles (84.62%)
11,807,585,307 instructions # 1.00 insn per cycle (92.31%)
0 mem-loads (92.32%)
2,212,928,573 mem-stores (84.69%)
10,024,403,118 ref-cycles (92.35%)
16,232,978 baclears.any (92.35%)
23,832,633 ARITH.DIVIDER_ACTIVE (84.59%)
0.981070734 seconds time elapsed
After:
$ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1
Performance counter stats for 'system wide':
31040189283 slots (92.27%)
8997514811 topdown-bad-spec # 28.2% bad speculation (92.27%)
10997536028 topdown-be-bound # 34.5% backend bound (92.27%)
4778060526 topdown-fe-bound # 15.0% frontend bound (92.27%)
7086628768 topdown-retiring # 22.2% retiring (92.27%)
1417611942 branch-instructions (92.26%)
5285529 branch-misses # 0.37% of all branches (92.28%)
62922469 bus-cycles (92.29%)
1440708 cache-misses # 8.292 % of all cache refs (92.30%)
17374098 cache-references (76.94%)
8040889520 cpu-cycles (84.63%)
7709992319 instructions # 0.96 insn per cycle (92.32%)
0 mem-loads (92.32%)
1515669558 mem-stores (84.68%)
6542411177 ref-cycles (92.35%)
4154149 baclears.any (92.35%)
20556152 ARITH.DIVIDER_ACTIVE (84.59%)
1.010799593 seconds time elapsed
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Florian Fischer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Shunsuke Nakamura <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
it is ASCII
It can be convenient to put a string value into a ptwrite payload as
a quick and easy way to identify what is being printed.
To make that useful, if the Intel ptwrite payload value contains only
printable ASCII characters padded with NULLs, then print it also as a
string.
Using the example program from the "Emulated PTWRITE" section of
tools/perf/Documentation/perf-intel-pt.txt:
$ echo -n "Hello" | od -t x8
0000000 0000006f6c6c6548
0000005
$ perf record -e intel_pt//u ./eg_ptw 0x0000006f6c6c6548
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data ]
$ perf script --itrace=ew intel-pt-events.py
Intel PT Branch Trace, Power Events, Event Trace and PTWRITE
Switch In 38524/38524 [001] 24166.044995916 0/0
eg_ptw 38524/38524 [001] 24166.045380004 ptwrite jmp IP: 0 payload: 0x6f6c6c6548 Hello 56532c7ce196 perf_emulate_ptwrite+0x16 (/home/ahunter/git/work/eg_ptw)
End
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It can be convenient to put a string value into a ptwrite payload as
a quick and easy way to identify what is being printed.
To make that useful, if the Intel ptwrite payload value contains only
printable ASCII characters padded with NULLs, then print it also as a
string.
Using the example program from the "Emulated PTWRITE" section of
tools/perf/Documentation/perf-intel-pt.txt:
$ echo -n "Hello" | od -t x8
0000000 0000006f6c6c6548
0000005
$ perf record -e intel_pt//u ./eg_ptw 0x0000006f6c6c6548
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data ]
$ perf script --itrace=ew
eg_ptw 35563 [005] 18256.087338: ptwrite: IP: 0 payload: 0x6f6c6c6548 Hello 55e764db5196 perf_emulate_ptwrite+0x16 (/home/user/eg_ptw)
$
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
ptwrite is an Intel x86 instruction that writes arbitrary values into an
Intel PT trace. It is not supported on all hardware, so provide an
alternative that makes use of TNT packets to convey the payload data.
TNT packets encode Taken/Not-taken conditional branch information, so
taking branches based on the payload value will encode the value into
the TNT packet. Refer to the changes to the documentation file
perf-intel-pt.txt in this patch for an example.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Cast pointers to unsigned long instead of to uint64_t to avoid this
problem on 32-bit arches:
31 6.89 debian:experimental-x-mips : FAIL gcc version 11.2.0 (Debian 11.2.0-18)
bench/breakpoint.c: In function 'breakpoint_setup':
bench/breakpoint.c:56:24: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
56 | attr.bp_addr = (uint64_t)addr;
| ^
cc1: all warnings being treated as errors
make[3]: *** [/git/perf-5.18.0-rc7/tools/build/Makefile.build:139: bench] Error 2
Fixes: 68a6772f11dbb1ed ("perf bench: Add breakpoint benchmarks")
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up fixes from perf/urgent.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
machines__find_host() does not exist. Remove declaration.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add 2 benchmarks:
1. Performance of thread creation/exiting in presence of breakpoints.
2. Performance of breakpoint modification in presence of threads.
The benchmarks capture use cases that we are interested in:
using inheritable breakpoints in large highly-threaded applications.
The benchmarks show significant slowdown imposed by breakpoints
(even when they don't fire).
Testing on Intel 8173M with 112 HW threads show:
perf bench --repeat=56 breakpoint thread --breakpoints=0 --parallelism=56 --threads=20
78.675000 usecs/op
perf bench --repeat=56 breakpoint thread --breakpoints=4 --parallelism=56 --threads=20
12967.135714 usecs/op
That's 165x slowdown due to presence of the breakpoints.
perf bench --repeat=20000 breakpoint enable --passive=0 --active=0
1.433250 usecs/op
perf bench --repeat=20000 breakpoint enable --passive=224 --active=0
585.318400 usecs/op
perf bench --repeat=20000 breakpoint enable --passive=0 --active=111
635.953000 usecs/op
That's 408x and 444x slowdown due to presence of threads.
Profiles show some overhead in toggle_bp_slot,
but also very high contention:
90.83% breakpoint-thre [kernel.kallsyms] [k] osq_lock
4.69% breakpoint-thre [kernel.kallsyms] [k] mutex_spin_on_owner
2.06% breakpoint-thre [kernel.kallsyms] [k] __reserve_bp_slot
2.04% breakpoint-thre [kernel.kallsyms] [k] toggle_bp_slot
79.01% breakpoint-enab [kernel.kallsyms] [k] smp_call_function_single
9.94% breakpoint-enab [kernel.kallsyms] [k] llist_add_batch
5.70% breakpoint-enab [kernel.kallsyms] [k] _raw_spin_lock_irq
1.84% breakpoint-enab [kernel.kallsyms] [k] event_function_call
1.12% breakpoint-enab [kernel.kallsyms] [k] send_call_function_single_ipi
0.37% breakpoint-enab [kernel.kallsyms] [k] generic_exec_single
0.24% breakpoint-enab [kernel.kallsyms] [k] __perf_event_disable
0.20% breakpoint-enab [kernel.kallsyms] [k] _perf_event_enable
0.18% breakpoint-enab [kernel.kallsyms] [k] toggle_bp_slot
Committer notes:
Fixup struct init for older compilers:
3 32.90 alpine:3.5 : FAIL clang version 3.8.1 (tags/RELEASE_381/final)
bench/breakpoint.c:49:34: error: missing field 'size' initializer [-Werror,-Wmissing-field-initializers]
struct perf_event_attr attr = {0};
^
1 error generated.
7 37.31 alpine:3.9 : FAIL gcc version 8.3.0 (Alpine 8.3.0)
bench/breakpoint.c:49:34: error: missing field 'size' initializer [-Werror,-Wmissing-field-initializers]
struct perf_event_attr attr = {0};
^
1 error generated.
Signed-off-by: Dmitriy Vyukov <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Like in 'perf report' and 'perf top', Add this option to limit the
number of functions it displays based on the overhead value in percent.
This affects only stdio and stdio2 output modes. Without this, it
shows very long disassembly lines for every function in the data
file. If users don't want this behavior, they can set a value in
percent to suppress that.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a flag needs_auxtrace_mmap to record whether an auxtrace mmap is
needed, in preparation for correctly determining whether or not an
auxtrace mmap is needed.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add evsel as a parameter to ->idx() in preparation for correctly
determining whether an auxtrace mmap is needed.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Remove ->idx() per_cpu parameter because it isn't needed.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The idx is with respect to evlist not evsel. That hasn't mattered because
they are the same at present. Prepare for that not being the case, which it
won't be when sideband tracking events are allowed on all CPUs even when
auxtrace is limited to selected CPUs.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
evlist__enable_event_idx() is used only by auxtrace. Move it to auxtrace.c
in preparation for making it even more auxtrace specific.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
evlist__enable_event_idx() is used only for auxtrace events which are never
system_wide. Simplify by using libperf enable event functions.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Tool events are added to the set of events for parsing so that having a
tool event in a metric doesn't inhibit event sharing of events between
metrics.
All tool events were added but this meant unused tool events would be
counted. Reduce this set of tool events to just those present in the
overall metric list.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Florian Fischer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Shunsuke Nakamura <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Previously duration_time was hard coded, which was ok until commit
b03b89b350034f22 ("perf stat: Add user_time and system_time events")
added additional tool events. Do for all tool events what was previously
done just for duration_time.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Florian Fischer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Shunsuke Nakamura <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Convert to and from a string. Fix evsel__tool_name() as array is
off-by-1. Support more than just duration_time as a metric-id.
Fixes: 75eafc970bd9d36d ("perf list: Print all available tool events")
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Florian Fischer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Shunsuke Nakamura <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Remove public definition of evsel__tool_names(). Not used outside
util/evsel.c.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Florian Fischer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Shunsuke Nakamura <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This reverts commit 60344f1a9a597f2e0efcd57df5dad0b42da15e21.
Hybrid metrics place a PMU at the end of the parse string. This is also
where tool events are placed. The behavior of the parse string isn't
clear and so revert the change for now.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Florian Fischer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Shunsuke Nakamura <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently the `perf test` always fails the coresight test like:
89: Check Arm CoreSight trace data recording and synthesized samples: FAILED!
That is because the test_arm_coresight.sh is attempting to SIGINT the
parent but is using $$ rather than $PPID and it sigint's itself when
run under the perf test framework.
Since this is done in a trap clause it ends up returning a non zero
return.
Since $PPID is a bash ism and not all distros are linking /bin/sh to
bash, the alternative parent pid lookups are uglier than just dropping
the kill, and its not strictly needed, lets pick the simple solution and
drop the sigint.
Fixes: 133fe2e617e48ca0 ("perf tests: Improve temp file cleanup in test_arm_coresight.sh")
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Jeremy Linton <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Jeremy Linton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
BUG_ON is a no-op if NDEBUG is defined, otherwise it is an assert.
Compiling with NDEBUG yields:
bench/numa.c: In function ‘bind_to_cpu’:
bench/numa.c:314:1: error: control reaches end of non-void function [-Werror=return-type]
314 | }
| ^
bench/numa.c: In function ‘bind_to_node’:
bench/numa.c:367:1: error: control reaches end of non-void function [-Werror=return-type]
367 | }
| ^
Add return statements to cover this case.
Reviewed-by: Athira Jajeev <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
As reported in:
https://lore.kernel.org/linux-perf-users/[email protected]/
the 'instructions:u' event may not be supported. Add a skip using 'perf
record'.
Switch some code away from pipe to make the failures clearer.
Reported-by: Thomas Richter <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Thomas Richter <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Switch some raw accesses to the cpu map to using the library API. This
can help with reference count checking. Some BPF cases switch from index
to CPU for consistency, this shouldn't matter as the CPU map is full.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: German Gomez <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: John Garry <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Events are generated for CascadeLake Server v1.15 with events from:
https://download.01.org/perfmon/CLX/
Using the scripts at:
https://github.com/intel/event-converter-for-linux-perf/
This change updates descriptions, adds INST_DECODED.DECODERS and
corrects a counter mask in UOPS_RETIRED.TOTAL_CYCLES.
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add JSON uncore events for Sapphirerapids to perf.
Based on JSON list v1.01:
https://download.01.org/perfmon/SPR/
Signed-off-by: Zhengjun Xing <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Update JSON core events for Sapphirerapids to perf.
Based on JSON list v1.01:
https://download.01.org/perfmon/SPR/
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Zhengjun Xing <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This fixes the issue where the build will fail if only the Python2
runtime is installed but the Python3 devtools are installed. Currently
the workaround is 'make PYTHON=python3'.
Fix it by autodetecting Python based on whether python[x]-config exists
rather than just python[x] because both are needed for the build. Then
-config is stripped to find the Python runtime.
Testing
=======
* Auto detect links with Python3 when the v3 devtools are installed
and only Python 2 runtime is installed
* Auto detect links with Python2 when both devtools are installed
* Sensible warning is printed if no Python devtools are installed
* 'make PYTHON=x' still automatically sets PYTHON_CONFIG=x-config
* 'make PYTHON=x' fails if x-config doesn't exist
* 'make PYTHON=python3' overrides Python2 devtools
* 'make PYTHON=python2' overrides Python3 devtools
* 'make PYTHON_CONFIG=x-config' works
* 'make PYTHON=x PYTHON_CONFIG=x' works
* 'make PYTHON=missing' reports an error
* 'make PYTHON_CONFIG=missing' reports an error
Fixes: 79373082fa9de8be ("perf python: Autodetect python3 binary")
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf_evlist's user_requested_cpus can contain CPUs not present in any
evsel's cpus, for example uncore counters. Avoid printing the prefix and
trailing \n until the first valid counter is encountered.
Reviewed-by: Adrian Hunter <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: German Gomez <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: John Garry <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
'struct perf_data' in util/data.h uses the "u64" data type, which is
defined in "linux/types.h".
If we only include util/data.h, the following compilation error occurs:
util/data.h:38:3: error: unknown type name ‘u64’
u64 version;
^~~
Solution: include "linux/types.h." to add the needed type definitions.
Fixes: 258031c017c353e8 ("perf header: Add DIR_FORMAT feature to describe directory data")
Signed-off-by: Yang Jihong <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up fixes from perf/urgent.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now the generic code can handle kallsyms fixup properly so no need to
keep the arch-functions anymore.
Fixes: 3cf6a32f3f2a4594 ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now arch-specific functions all do the same thing. When it fixes the
symbol address it needs to check the boundary between the kernel image
and modules. For the last symbol in the previous region, it cannot
know the exact size as it's discarded already. Thus it just uses a
small page size (4096) and rounds it up like the last symbol.
Fixes: 3cf6a32f3f2a4594 ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The symbol fixup is necessary for symbols in kallsyms since they don't
have size info. So we use the next symbol's address to calculate the
size. Now it's also used for user binaries because sometimes they miss
size for hand-written asm functions.
There's a arch-specific function to handle kallsyms differently but
currently it cannot distinguish kallsyms from others. Pass this
information explicitly to handle it properly. Note that those arch
functions will be moved to the generic function so I didn't added it to
the arch-functions.
Fixes: 3cf6a32f3f2a4594 ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Adds a perf_event_attr test for Arm SPE in which the presence of
physical addresses are checked when SPE unit is run with pa_enable=1.
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: Timothy Hayes <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: John Garry <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch corrects a bug whereby SPE collection is invoked with
pa_enable=1 but synthesized events fail to show physical addresses.
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: Timothy Hayes <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: John Garry <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch corrects a bug whereby synthesized events from SPE
samples are missing virtual addresses.
Fixes: 54f7815efef7fad9 ("perf arm-spe: Fill address info for samples")
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: Timothy Hayes <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: [email protected]
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: John Garry <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Cc: Song Liu <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Events are generated for Westmere EX v3 with events from:
https://download.01.org/perfmon/WSM-EX/
Using the scripts at:
https://github.com/intel/event-converter-for-linux-perf/
This change updates descriptions.
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Events are generated for Westmere EP-SP v3 with events from:
https://download.01.org/perfmon/WSM-EP-SP/
Using the scripts at:
https://github.com/intel/event-converter-for-linux-perf/
This change updates descriptions.
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Events are generated for Skylake Server v1.27 with
events from:
https://download.01.org/perfmon/SKX/
Using the scripts at:
https://github.com/intel/event-converter-for-linux-perf/
This change updates descriptions, adds INST_DECODED.DECODERS and
corrects a counter mask in UOPS_RETIRED.TOTAL_CYCLES.
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Events are generated for Skylake v53 with
events from:
https://download.01.org/perfmon/SKL/
Using the scripts at:
https://github.com/intel/event-converter-for-linux-perf/
This change updates descriptions, adds INST_DECODED.DECODERS and
corrects a counter mask in UOPS_RETIRED.TOTAL_CYCLES.
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Events are generated for Ivytown v21 with events from:
https://download.01.org/perfmon/IVT/
Using the scripts at:
https://github.com/intel/event-converter-for-linux-perf/
This change fixes a spelling mistake in a description.
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Events are generated for Icelake v1.13 with events from:
https://download.01.org/perfmon/ICL/
Using the scripts at:
https://github.com/intel/event-converter-for-linux-perf/
This change updates descriptions and adds INST_DECODED.DECODERS.
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|