Age | Commit message (Collapse) | Author | Files | Lines |
|
Add JSON uncore events for Alderlake to perf.
Based on JSON list v1.06:
https://download.01.org/perfmon/ADL/
Signed-off-by: Zhengjun Xing <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add JSON core events for Alderlake to perf.
It is a hybrid event list for both Atom and Core.
Based on JSON list v1.06:
https://download.01.org/perfmon/ADL/
Signed-off-by: Zhengjun Xing <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
net/batman-adv/hard-interface.c
commit 690bb6fb64f5 ("batman-adv: Request iflink once in batadv-on-batadv check")
commit 6ee3c393eeb7 ("batman-adv: Demote batadv-on-batadv skip error message")
https://lore.kernel.org/all/[email protected]/
net/smc/af_smc.c
commit 4d08b7b57ece ("net/smc: Fix cleanup when register ULP fails")
commit 462791bbfa35 ("net/smc: add sysctl interface for SMC")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This expands generic branch type classification by adding two more entries
there in i.e irq and exception return. Also updates the x86 implementation
to process X86_BR_IRET and X86_BR_IRQ records as appropriate. This changes
branch types reported to user space on x86 platform but it should not be a
problem. The possible scenarios and impacts are enumerated here.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add support for HiSilicon CPA PMU aliasing.
The kernel driver is in drivers/perf/hisilicon/hisi_uncore_cpa_pmu.c
Reviewed-by: John Garry <[email protected]>
Signed-off-by: Qi Liu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When recording SPE traces, the default sample_period is currently being
set to 1 in the perf_event_attr fields, instead of the value advertised
in '/sys/devices/arm_spe_0/caps/min_interval':
Before:
$ perf record -e arm_spe// -vv -- sleep 1
[...]
{ sample_period, sample_freq } 1
[...]
Use the value from the above sysfs location as a more sensible default
(it was already being read, but the value not being used)
After:
$ perf record -e arm_spe// -vv -- sleep 1
[...]
{ sample_period, sample_freq } 1024
[...]
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: German Gomez <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The option `--to-ctf` is only available when perf has libbabeltrace
support. Hence, on error, we shouldn't state that user must include
`--to-ctf` unless it's supported.
The only user-visible change for this commit is that when `perf` is not
configured to support libbabeltrace, the user is only prompted to
provide the `--to-json` option instead of bothe `--to-json` and
`--to-ctf`.
Signed-off-by: Mahmoud Mandour <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In SPE traces the 'weight' field can't be printed in 'perf script'
because the 'dummy:u' event doesn't have the WEIGHT attribute set.
Use evsel__do_check_stype(..) to check this field, as it's done with
other fields such as "phys_addr".
Before:
$ perf record -e arm_spe_0// -- sleep 1
$ perf script -F event,ip,weight
Samples for 'dummy:u' event do not have WEIGHT attribute set. Cannot print 'weight' field.
After:
$ perf script -F event,ip,weight
l1d-access: 12 ffffaf629d4cb320
tlb-access: 12 ffffaf629d4cb320
memory: 12 ffffaf629d4cb320
Fixes: b0fde9c6e291e528 ("perf arm-spe: Add SPE total latency as PERF_SAMPLE_WEIGHT")
Signed-off-by: German Gomez <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add proper return codes for all cases of data directory creation failure
and add error message output based on these codes.
Signed-off-by: Alexey Bayduraev <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When perf_data__create_dir() fails, it calls close_dir(), but
perf_session__delete() also calls close_dir() and since dir.version and
dir.nr were initialized by perf_data__create_dir(), a double free occurs.
This patch moves the initialization of dir.version and dir.nr after
successful initialization of dir.files, that prevents double freeing.
This behavior is already implemented in perf_data__open_dir().
Fixes: 145520631130bd64 ("perf data: Add perf_data__(create_dir|close_dir) functions")
Signed-off-by: Alexey Bayduraev <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now that all aliases are defined using SYM_FUNC_ALIAS(), remove the old
SYM_FUNC_{START,END}_ALIAS() macros.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Acked-by: Mark Brown <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
Currently aliasing an asm function requires adding START and END
annotations for each name, as per Documentation/asm-annotations.rst:
SYM_FUNC_START_ALIAS(__memset)
SYM_FUNC_START(memset)
... asm insns ...
SYM_FUNC_END(memset)
SYM_FUNC_END_ALIAS(__memset)
This is more painful than necessary to maintain, especially where a
function has many aliases, some of which we may wish to define
conditionally. For example, arm64's memcpy/memmove implementation (which
uses some arch-specific SYM_*() helpers) has:
SYM_FUNC_START_ALIAS(__memmove)
SYM_FUNC_START_ALIAS_WEAK_PI(memmove)
SYM_FUNC_START_ALIAS(__memcpy)
SYM_FUNC_START_WEAK_PI(memcpy)
... asm insns ...
SYM_FUNC_END_PI(memcpy)
EXPORT_SYMBOL(memcpy)
SYM_FUNC_END_ALIAS(__memcpy)
EXPORT_SYMBOL(__memcpy)
SYM_FUNC_END_ALIAS_PI(memmove)
EXPORT_SYMBOL(memmove)
SYM_FUNC_END_ALIAS(__memmove)
EXPORT_SYMBOL(__memmove)
SYM_FUNC_START(name)
It would be much nicer if we could define the aliases *after* the
standard function definition. This would avoid the need to specify each
symbol name twice, and would make it easier to spot the canonical
function definition.
This patch adds new macros to allow us to do so, which allows the above
example to be rewritten more succinctly as:
SYM_FUNC_START(__pi_memcpy)
... asm insns ...
SYM_FUNC_END(__pi_memcpy)
SYM_FUNC_ALIAS(__memcpy, __pi_memcpy)
EXPORT_SYMBOL(__memcpy)
SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy)
EXPORT_SYMBOL(memcpy)
SYM_FUNC_ALIAS(__pi_memmove, __pi_memcpy)
SYM_FUNC_ALIAS(__memmove, __pi_memmove)
EXPORT_SYMBOL(__memmove)
SYM_FUNC_ALIAS_WEAK(memmove, __memmove)
EXPORT_SYMBOL(memmove)
The reduction in duplication will also make it possible to replace some
uses of WEAK with more accurate Kconfig guards, e.g.
#ifndef CONFIG_KASAN
SYM_FUNC_ALIAS(memmove, __memmove)
EXPORT_SYMBOL(memmove)
#endif
... which should make it easier to ensure that symbols are neither used
nor overidden unexpectedly.
The existing SYM_FUNC_START_ALIAS() and SYM_FUNC_START_LOCAL_ALIAS() are
marked as deprecated, and will be removed once existing users are moved
over to the new scheme.
The tools/perf/ copy of linkage.h is updated to match. A subsequent
patch will depend upon this when updating the x86 asm annotations.
Signed-off-by: Mark Rutland <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Acked-by: Mark Brown <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
The 'perf record' and 'perf stat' commands have supported the option
'-C/--cpus' to count or collect only on the list of CPUs provided.
Commit 1d3351e631fc34d7 ("perf tools: Enable on a list of CPUs for
hybrid") add it to be supported for hybrid. For hybrid support, it
checks the cpu list are available on hybrid PMU. But when we test only
uncore events(or events not in cpu_core and cpu_atom), there is a bug:
Before:
# perf stat -C0 -e uncore_clock/clockticks/ sleep 1
failed to use cpu list 0
In this case, for uncore event, its pmu_name is not cpu_core or
cpu_atom, so in evlist__fix_hybrid_cpus, perf_pmu__find_hybrid_pmu
should return NULL,both events_nr and unmatched_count should be 0 ,then
the cpu list check function evlist__fix_hybrid_cpus return -1 and the
error "failed to use cpu list 0" will happen. Bypass "events_nr=0" case
then the issue is fixed.
After:
# perf stat -C0 -e uncore_clock/clockticks/ sleep 1
Performance counter stats for 'CPU(s) 0':
195,476,873 uncore_clock/clockticks/
1.004518677 seconds time elapsed
When testing with at least one core event and uncore events, it has no
issue.
# perf stat -C0 -e cpu_core/cpu-cycles/,uncore_clock/clockticks/ sleep 1
Performance counter stats for 'CPU(s) 0':
5,993,774 cpu_core/cpu-cycles/
301,025,912 uncore_clock/clockticks/
1.003964934 seconds time elapsed
Fixes: 1d3351e631fc34d7 ("perf tools: Enable on a list of CPUs for hybrid")
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Zhengjun Xing <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Skip the Sigtrap test for arm + arm64, same as was done for s390 in
commit a840974e96fd ("perf test: Test 73 Sig_trap fails on s390"). For
this, reuse BP_SIGNAL_IS_SUPPORTED - meaning that the arch can use BP to
generate signals - instead of BP_ACCOUNT_IS_SUPPORTED, which is
appropriate.
As described by Will at [0], in the test we get stuck in a loop of
handling the HW breakpoint exception and never making progress. GDB
handles this by stepping over the faulting instruction, but with perf
the kernel is expected to handle the step (which it doesn't for arm).
Dmitry made an attempt to get this work, also mentioned in the same
thread as [0], which was appreciated. But the best thing to do is skip
the test for now.
[0] https://lore.kernel.org/linux-perf-users/20220118124343.GC98966@leoy-ThinkPad-X240s/T/#m13b06c39d2a5100d340f009435df6f4d8ee57b5a
Fixes: 5504f67944484495 ("perf test sigtrap: Add basic stress test for sigtrap handling")
Signed-off-by: John Garry <[email protected]>
Tested-by: Leo Yan <[email protected]>
Acked-by: Marco Elver <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up fixes from perf/urgent that recently got merged.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This was detected by the gcc in Fedora Rawhide's gcc:
50 11.01 fedora:rawhide : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
1225 | *key_scan_pos += strlen(map_opt);
| ^~~~~~~~~~~~~~~
util/bpf-loader.c:1223:9: note: call to 'free' here
1223 | free(map_name);
| ^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
So do the calculations on the pointer before freeing it.
Fixes: 04f9bf2bac72480c ("perf bpf-loader: Add missing '*' for key_scan_pos")
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang ShaoBo <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The struct perf_event_attr is initialised differently in Arm64 when
recording in call-graph fp mode, so update the relevant tests, and add
two extra arm64-only tests.
Before:
$ perf test 17 -v
17: Setup struct perf_event_attr
[...]
running './tests/attr/test-record-graph-default'
expected sample_type=295, got 4391
expected sample_regs_user=0, got 1073741824
FAILED './tests/attr/test-record-graph-default' - match failure
test child finished with -1
---- end ----
After:
[...]
running './tests/attr/test-record-graph-default-aarch64'
test limitation 'aarch64'
running './tests/attr/test-record-graph-fp-aarch64'
test limitation 'aarch64'
running './tests/attr/test-record-graph-default'
test limitation '!aarch64'
excluded architecture list ['aarch64']
skipped [aarch64] './tests/attr/test-record-graph-default'
running './tests/attr/test-record-graph-fp'
test limitation '!aarch64'
excluded architecture list ['aarch64']
skipped [aarch64] './tests/attr/test-record-graph-fp'
[...]
Fixes: 7248e308a5758761 ("perf tools: Record ARM64 LR register automatically")
Signed-off-by: German Gomez <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Truong <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
'perf inject' with Coresight data generates files that cannot be opened
when only the last branch option is specified:
perf inject -i perf.data --itrace=l -o inject.data
perf script -i inject.data
0x33faa8 [0x8]: failed to process type: 9 [Bad address]
This is because cs_etm__synth_instruction_sample() is called even when
the sample type for instructions hasn't been setup. Last branch records
are attached to instruction samples so it doesn't make sense to generate
them when --itrace=i isn't specified anyway.
This change disables all calls of cs_etm__synth_instruction_sample()
unless --itrace=i is specified, resulting in a file with no samples if
only --itrace=l is provided, rather than a bad file.
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
sample_branches and sample_instructions are already saved in the
synth_opts struct. Other usages like synth_opts.last_branch don't save a
value, so make this more consistent by always going through synth_opts
and not saving duplicate values.
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The function trace__symbols_init() runs "perf-read-vdso32" and that ends up
with a SIGCHLD delivered to 'perf'. And this SIGCHLD make perf exit early.
'perf trace' should exit only if the SIGCHLD is from our workload process.
So let's use sigaction() instead of signal() to match such condition.
Committer notes:
Use memset to zero the 'struct sigaction' variable as the '= { 0 }'
method isn't accepted in many compiler versions, e.g.:
4 34.02 alpine:3.6 : FAIL clang version 4.0.0 (tags/RELEASE_400/final)
builtin-trace.c:4897:35: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
struct sigaction sigchld_act = { 0 };
^
{}
builtin-trace.c:4897:37: error: missing field 'sa_mask' initializer [-Werror,-Wmissing-field-initializers]
struct sigaction sigchld_act = { 0 };
^
2 errors generated.
6 32.60 alpine:3.8 : FAIL gcc version 6.4.0 (Alpine 6.4.0)
builtin-trace.c:4897:35: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
struct sigaction sigchld_act = { 0 };
^
{}
builtin-trace.c:4897:37: error: missing field 'sa_mask' initializer [-Werror,-Wmissing-field-initializers]
struct sigaction sigchld_act = { 0 };
^
2 errors generated.
7 34.82 alpine:3.9 : FAIL gcc version 8.3.0 (Alpine 8.3.0)
builtin-trace.c:4897:35: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
struct sigaction sigchld_act = { 0 };
^
{}
builtin-trace.c:4897:37: error: missing field 'sa_mask' initializer [-Werror,-Wmissing-field-initializers]
struct sigaction sigchld_act = { 0 };
^
2 errors generated.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
With the existing symbol_from/symbol_to, branches captured in the same
function would be collapsed into a single function if the latencies
associated with the each branch (cycles) were all the same. That is the
case on Intel Broadwell, for instance. Since Intel Skylake, the latency
is captured by hardware and therefore is used to disambiguate branches.
Add addr_from/addr_to sort dimensions to sort branches based on their
addresses and not the function there are in. The output is still the
function name but the offset within the function is provided to uniquely
identify each branch. These new sort dimensions also help with annotate
because they create different entries in the histogram which, in turn,
generates proper branch annotations.
Here is an example using AMD's branch sampling:
$ perf record -a -b -c 1000037 -e cpu/branch-brs/ test_prg
$ perf report
Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycle
99.65% test_prg test_prg [.] test_thread [.] test_thread -
0.02% test_prg [kernel.vmlinux] [k] asm_sysvec_apic_timer_interrupt [k] error_entry -
$ perf report -F overhead,comm,dso,addr_from,addr_to
Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
Overhead Command Shared Object Source Address Target Address
4.22% test_prg test_prg [.] test_thread+0x3c [.] test_thread+0x4
4.13% test_prg test_prg [.] test_thread+0x4 [.] test_thread+0x3a
4.09% test_prg test_prg [.] test_thread+0x3a [.] test_thread+0x6
4.08% test_prg test_prg [.] test_thread+0x2 [.] test_thread+0x3c
4.06% test_prg test_prg [.] test_thread+0x3e [.] test_thread+0x2
3.87% test_prg test_prg [.] test_thread+0x6 [.] test_thread+0x38
3.84% test_prg test_prg [.] test_thread [.] test_thread+0x3e
3.76% test_prg test_prg [.] test_thread+0x1e [.] test_thread
3.76% test_prg test_prg [.] test_thread+0x38 [.] test_thread+0x8
3.56% test_prg test_prg [.] test_thread+0x22 [.] test_thread+0x1e
3.54% test_prg test_prg [.] test_thread+0x8 [.] test_thread+0x36
3.47% test_prg test_prg [.] test_thread+0x1c [.] test_thread+0x22
3.45% test_prg test_prg [.] test_thread+0x36 [.] test_thread+0xa
3.28% test_prg test_prg [.] test_thread+0x24 [.] test_thread+0x1c
3.25% test_prg test_prg [.] test_thread+0xa [.] test_thread+0x34
3.24% test_prg test_prg [.] test_thread+0x1a [.] test_thread+0x24
3.20% test_prg test_prg [.] test_thread+0x34 [.] test_thread+0xc
3.04% test_prg test_prg [.] test_thread+0x26 [.] test_thread+0x1a
3.01% test_prg test_prg [.] test_thread+0xc [.] test_thread+0x32
2.98% test_prg test_prg [.] test_thread+0x18 [.] test_thread+0x26
2.94% test_prg test_prg [.] test_thread+0x32 [.] test_thread+0xe
2.76% test_prg test_prg [.] test_thread+0x28 [.] test_thread+0x18
2.73% test_prg test_prg [.] test_thread+0xe [.] test_thread+0x30
2.67% test_prg test_prg [.] test_thread+0x30 [.] test_thread+0x10
2.67% test_prg test_prg [.] test_thread+0x16 [.] test_thread+0x28
2.46% test_prg test_prg [.] test_thread+0x10 [.] test_thread+0x2e
2.44% test_prg test_prg [.] test_thread+0x2a [.] test_thread+0x16
2.38% test_prg test_prg [.] test_thread+0x14 [.] test_thread+0x2a
2.32% test_prg test_prg [.] test_thread+0x2e [.] test_thread+0x12
2.28% test_prg test_prg [.] test_thread+0x12 [.] test_thread+0x2c
2.16% test_prg test_prg [.] test_thread+0x2c [.] test_thread+0x14
0.02% test_prg [kernel.vmlinux] [k] asm_sysvec_apic_ti+0x5 [k] error_entry
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
There is a spelling mistake in a debug message. Fix it.
Signed-off-by: Colin Ian King <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Return the result from hist_entry_iter__add() directly instead of taking
this in another redundant variable.
Signed-off-by: tangmeng <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The variable 'err' in the perf_event__process_sample() is only used in
the only one judgment statement, it is not used in other places.
So, use the return value from hist_entry_iter__add() directly instead of
taking this in another redundant variable.
Signed-off-by: tangmeng <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When testing metric expressions we fake counter values from 1 going
upward. For some metrics this can yield negative values that are clipped
to zero, and then cause divide by zero failures.
Such clipping is questionable but may be a result of tools automatically
generating metrics. A workaround for this case is to try a second time
with counter values going in the opposite direction.
This case was seen in a metric like:
event1 / max(event2 - event3, 0)
But it may also happen in more sensible metrics like:
event1 / (event2 + event3 - 1 - event4)
Reviewed-by: John Garry <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now that a config flag for branch broadcast has been added, take it into
account when trying to deduce what the driver would have programmed the
TRCCONFIGR register to.
Reviewed-by: Leo Yan <[email protected]>
Reviewed-by: Mike Leach <[email protected]>
Reviewed-by: Suzuki Poulouse <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Some code in builtin-c2c.c calls bitmap_weight() to check if any bit of
a given bitmap is set.
It's better to use bitmap_empty() in that case because bitmap_empty()
stops traversing the bitmap as soon as it finds first set bit, while
bitmap_weight() counts all bits unconditionally.
Signed-off-by: Yury Norov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: David Laight <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Emil Renner Berthing <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matti Vaittinen <[email protected]>
Cc: Michał Mirosław <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Make the --tui command line flags dependent HAVE_SLANG_SUPPORT. This was
reported as confusing in:
https://lore.kernel.org/linux-perf-users/[email protected]/
Reported-by: xaizek <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: xaizek <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add documentation for Event Trace and TNT disable to the perf Intel PT man
page.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add sample flags to the PostgreSQL database definition and export.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add sample flags to the SQLite database definition and export.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently, the transaction flag (x) is kept separate from branch flags.
Instead of doing the same for the interrupt disabled flags (D and t), add
all flags so that new flags will not need to be handled separately in the
future.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add Event Trace to the intel-pt-events.py script. This shows how to unpack
the raw data from the new sample events in a Python script.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Amend the display to include D and t flags in the same way as the x flag.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Similar to other Intel PT synth events, display changes to the interrupt
flag represented by the MODE.Exec packet.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
synthesized event
Similar to other Intel PT synth events, display Event Trace events recorded
by CFE / EVD packets.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It is not possible to walk the executable code without TNT packets, so
force 'quick' mode when TNT is disabled, because 'quick' mode does not walk
the code.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Update sample flags to represent the state and changes to the interrupt
flag.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Synthesize an attribute event and sample events for changes to the
interrupt flag represented by the MODE.Exec packet.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Synthesize an attribute event and sample events for Intel PT Event Trace
events represented by CFE and EVD packets.
Committer notes:
Make 'struct perf_synth_intel_evd evd[]' evd[0] at the end of 'struct
perf_synth_intel_evt' as it is breaking the build with in many compilers
with (e.g. clang version 13.0.0 (Fedora 13.0.0-3.fc35)):
util/intel-pt.c:2213:31: error: field 'cfe' with variable sized type 'struct perf_synth_intel_evt' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
struct perf_synth_intel_evt cfe;
^
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The change to the MODE.Exec packet means processing must distinguish
between the old and new cases. Record the Event Trace capability flag to
make that possible.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add itrace option "I" to synthesize interrupt or similar (asynchronous)
events. This will be used for Intel PT Event Trace events.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Define 2 new flags to represent:
- when interrupts are disabled (D)
- when interrupt disabling toggles (t)
This gives 4 combinations:
no flag, interrupts enabled
t interrupts were enabled but become disabled
D interrupts are disabled
Dt interrupts were disabled but become enabled
Committer notes:
Those are control flow flags, as per 'tools/perf/Documentation/perf-intel-pt.txt:
<quote>
An interesting field that is not printed by default is 'flags' which can be
displayed as follows:
perf script --itrace=ibxwpe -F+flags
The flags are "bcrosyiABExgh" which stand for branch, call, return, conditional,
system, asynchronous, interrupt, transaction abort, trace begin, trace end,
in transaction, VM-entry, and VM-exit respectively.
</quote>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Similar to other Intel PT synth events, define a structure to hold
information about a change to the interrupt flag.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Similar to other Intel PT synth events, define structures to hold CFE
and EVD data.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which adds a bit to the existing
MODE.Exec packet to record the interrupt flag.
Previously, the MODE.Exec packet did not generate any events, so the
new processing required is practically the same as a new packet.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
processing
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which requires 2 new packets CFE
(Control Flow Event) and EVD (Event Data).
Each Event Trace event is represented by a CFE packet that is preceded
by zero or more EVD packets. It may be bound to a following FUP (Flow
Update) packet that provides the IP.
Event Trace exposes details about asynchronous events. The CFE packet
contains a type field to identify one of the following:
1 INTR interrupt, fault, exception, NMI
2 IRET interrupt return
3 SMI system management interrupt
4 RSM resume from system management mode
5 SIPI startup interprocessor interrupt
6 INIT INIT signal
7 VMENTRY VM-Entry
8 VMEXIT VM-Entry
9 VMEXIT_INTR VM-Exit due to interrupt
10 SHUTDOWN Shutdown
For more details, refer to the Intel SDM, Intel Processor Trace chapter.
Add processing to the decoder for the new packets.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Factor out clearing of FUP (Flow Update) event variables, to avoid code duplication.
Committer Notes:
From the Intel documentation:
<quote>
Flow Update Packets (FUP): FUPs provide the source IP addresses for
asynchronous events (interrupt and exceptions), as well as other cases
where the source address cannot be determined from the binary.
</quote>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Tidy up config bit constants to use #define.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|