Age | Commit message (Collapse) | Author | Files | Lines |
|
Documentation to make current functionality clearer.
Rename a variable called 'metric' to 'metric_name' as it can be
ambiguous as to whether a string is the name of a metric or the
expression.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Kilroy <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Denys Zagorui <[email protected]>
Cc: Fabian Hemmer <[email protected]>
Cc: Felix Fietkau <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jacob Keller <[email protected]>
Cc: Jiapeng Chong <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joakim Zhang <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nicholas Fraser <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: ShihCheng Tu <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Wan Jiabing <[email protected]>
Cc: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The runtime value is needed when recursively parsing metrics, currently
a value of 1 is passed which is incorrect.
Rather than add more arguments to the bison parser, add runtime to the
context.
Fix call sites not to pass a value. The runtime value is defaulted to 0,
which is arbitrary. In some places this replaces a value of 1, which was
also arbitrary.
This shouldn't affect anything other than PPC.
The use of 0 or 1 shouldn't matter as a proper runtime value would be
needed in a case that it did matter.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Kilroy <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Denys Zagorui <[email protected]>
Cc: Fabian Hemmer <[email protected]>
Cc: Felix Fietkau <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jacob Keller <[email protected]>
Cc: Jiapeng Chong <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joakim Zhang <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nicholas Fraser <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: ShihCheng Tu <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Wan Jiabing <[email protected]>
Cc: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Make lookup nature of data structures clearer through their type. Reduce
scope of architecture specific pmu_event tables by making them static.
Suggested-by: John Garry <[email protected]>
Reviewed-by: John Garry <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Kilroy <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Denys Zagorui <[email protected]>
Cc: Fabian Hemmer <[email protected]>
Cc: Felix Fietkau <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jacob Keller <[email protected]>
Cc: Jiapeng Chong <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joakim Zhang <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nicholas Fraser <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: ShihCheng Tu <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Wan Jiabing <[email protected]>
Cc: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Make lookup nature of data structures clearer through their type.
Reviewed-by: John Garry <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Kilroy <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Denys Zagorui <[email protected]>
Cc: Fabian Hemmer <[email protected]>
Cc: Felix Fietkau <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jacob Keller <[email protected]>
Cc: Jiapeng Chong <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joakim Zhang <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nicholas Fraser <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: ShihCheng Tu <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Wan Jiabing <[email protected]>
Cc: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The pmu_events_map is generated at compile time and used for lookup. For
testing purposes we need to swap the map being used.
Having the pmu_events_map be non-const is misleading as it may be an out
argument.
Make it const and update uses so they work on const too.
Reviewed-by: John Garry <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Kilroy <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Denys Zagorui <[email protected]>
Cc: Fabian Hemmer <[email protected]>
Cc: Felix Fietkau <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jacob Keller <[email protected]>
Cc: Jiapeng Chong <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joakim Zhang <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nicholas Fraser <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: ShihCheng Tu <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Wan Jiabing <[email protected]>
Cc: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add list_sort.[ch] from the main kernel tree. The linux/bug.h #include
is removed due to conflicting definitions. Add check-headers and modify
perf build accordingly.
MANIFEST and python-ext-sources fixes suggested by Arnaldo.
Suggested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Kilroy <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Denys Zagorui <[email protected]>
Cc: Fabian Hemmer <[email protected]>
Cc: Felix Fietkau <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jacob Keller <[email protected]>
Cc: Jiapeng Chong <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joakim Zhang <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nicholas Fraser <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: ShihCheng Tu <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Wan Jiabing <[email protected]>
Cc: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Going forward, future generation systems can have more hierarchy
within the node/package level but currently we don't have any data source
encoding field in perf, which can be used to represent this level of data.
Add a new field called 'mem_hops' in the perf_mem_data_src structure
which can be used to represent intra-node/package or inter-node/off-package
details. This field is of size 3 bits where PERF_MEM_HOPS_{NA, 0..6} value
can be used to present different hop levels data.
Also add corresponding macros to define mem_hop field values
and shift value.
Currently we define macro for HOPS_0 which corresponds
to data coming from another core but same node.
Add functionality to represent mem_hop field data in
perf_mem__lvl_scnprintf function with the help of added string
array called mem_hops.
For ex: Encodings for mem_hops fields with L2 cache:
L2 - local L2
L2 | REMOTE | HOPS_0 - remote core, same node L2
Since with the addition of HOPS field, now remote can be used to
denote cache access from the same node but different core, a check
is added in the c2c_decode_stats function to set mrem only when HOPS
is zero along with set remote field.
Signed-off-by: Kajol Jain <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
an extra line
Add a comment about PERF_MEM_LVL_* namespace being depricated
to some extent in favour of added PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_}
fields.
Remove an extra line present in perf_mem__lvl_scnprintf function.
Signed-off-by: Kajol Jain <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Print offset of PERF_RECORD_COMPRESSED record instead of zero for
decompressed records in raw trace dump (-D option of perf-report):
0x17cf08 [0x28]: event: 9
instead of:
0 [0x28]: event: 9
The fix is not critical, because currently file_pos for compressed
events is used in perf_session__process_event only to show offsets
in the raw dump.
This patch was separated from patchset:
https://lore.kernel.org/lkml/[email protected]/
and was already rewieved.
Reviewed-by: Riccardo Mancini <[email protected]>
Signed-off-by: Alexey Bayduraev <[email protected]>
Tested-by: Riccardo Mancini <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch adds a new function in util/mmap.c to duplicate a mmap_cpu_mask.
This new function will be used in patches in the workqueue patchkit.
Signed-off-by: Riccardo Mancini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/8943a548ef7a3dd3e015095afad7e9a8b2154c05.1629490974.git.rickyman7@gmail.com
[ bitmap_alloc() was renamed to bitmap_zalloc() ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up the fixes in perf/urgent that were just merged into upstream.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use get_unaligned() instead of memcpy() to access potentially unaligned
memory, which, when accessed through a pointer, leads to undefined
behavior. get_unaligned() describes much better what is happening there
anyway even if memcpy() does the job.
In addition, since perf tool builds with -Werror, it would fire with:
util/intel-pt-decoder/../../../arch/x86/lib/insn.c: In function '__insn_get_emulate_prefix':
tools/include/../include/asm-generic/unaligned.h:10:15: error: packed attribute is unnecessary [-Werror=packed]
10 | const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
because -Werror=packed would complain if the packed attribute would have
no effect on the layout of the structure.
In this case, that is intentional so disable the warning only for that
compilation unit.
That part is Reported-by: Stephen Rothwell <[email protected]>
No functional changes.
Fixes: 5ba1071f7554 ("x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses")
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Tested-by: Stephen Rothwell <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Invoking addr2line in a separate subprocess, one for each required
lookup, takes a terribly long time.
This patch introduces a long-running addr2line process for each DSO,
*DRAMATICALLY* speeding up runs of perf.
What used to take tens of minutes now takes tens of seconds.
Debian bug report about this issue:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911815
Signed-off-by: Tony Garnock-Jones <[email protected]>
Tested-by: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For a metric like:
CONST if expr else CONST
if the values of CONST are identical then expr doesn't need evaluating,
and events, in order to compute a result.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For a metric like:
EVENT1 if #smt_on else EVENT2
currently EVENT1 and EVENT2 will be measured and then when the metric is
reported EVENT1 or EVENT2 will be printed depending on the value from
smt_on() during the expr parsing. Computing both events is unnecessary and
can lead to multiplexing as discussed in this thread:
https://lore.kernel.org/lkml/[email protected]/
If the input is constant to certain operators like:
IDS1 if CONST else IDS2
then the result will be either IDS1 or IDS2 depending on CONST (which
may be evaluated from an entire expression), and so IDS1 or IDS2 may
be discarded avoiding events from being programmed.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When we're computing ID values, if we have constant values then compute
the constant result. For example:
1 + 2
Previously .val would be set to BOTTOM by union_expr, meaning that
all values are possible. With this change .val is set to 3.
Later changes will use the constant values to hopefully eliminate ID
values that don't need to be computed.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a new option to parsing that the set of IDs being used should be
computed, this means every action needs to handle the compute_ids and
regular case. This means actions yield a new ids type is a set of ids or
the value being computed. Use the compute_ids case to replace find IDs
parsing.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
A metric may be a constant value, for example, some SMT metrics are
constant 0 if #smt_on is 0. If we eliminate all the events then there is
no printing. Fix this by forcing metrics like this to have a
duration_time tool event, previously the metric would fail when parsing
the events with a parse error.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Reflow one __parse_events() call so that a ternary operation gets in a single line ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add utilities to new/free an ids hashmap, as well as to union. Add
testing of the union. Unioning hashmaps will be used when parsing the
metric, if a value is known then the hashmap is unnecessary, otherwise
we need to union together all the event ids to compute their values for
reporting.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
A later change will remove the notion of other, rename the function to
expr__find_ids as this is what it populates.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
No functional change, just modifying whitespace. This creates additional
space for adding logic to actions in later changes.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
No functional change, switch the operators to use macros so that
additional complexity for constants can be added in a later change.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
No functional change, so the type of expr remains <num>. A later patch
will change the computation to be an aggregate type and making this
change makes that later change smaller.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
No functional change. Inlining d_ratio makes it easier to special case
for constants in a later patch.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If during computing a metric an event (id) is missing the parsing
aborts. A later patch will make it so that events that aren't used in
the output are deliberately omitted, in which case we don't want the
abort. Modify the missing ID case to report NAN for these cases.
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
A later change to parsing the ids out (in expr__find_other) will
potentially drop hashmaps and so it is more convenient to move
expr_parse_ctx to have a hashmap pointer rather than a struct value.
As this pointer must be freed, rather than just going out of scope, add
expr__ctx_new and expr__ctx_free to manage expr_parse_ctx memory.
Adjust use of struct expr_parse_ctx accordingly.
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandeep Dasgupta <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For numeric terms, the config field may be NULL as it is not set from
the l+y parsing.
Fix by setting the term config from the term type name.
Also fix up the pmu-events test to set the alias strings to set the
period term properly, and fix up parse-events test to check the term
config string.
Signed-off-by: John Garry <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Shaokun Zhang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
libtraceevent has added more levels of debug printout and with changes
like:
https://lore.kernel.org/linux-trace-devel/[email protected]
previously generated output like "registering plugin" is no longer
displayed.
This change makes it so that if perf's verbose debug output is enabled
then the debug and info libtraceevent messages can be displayed.
The code is conditionally enabled based on the libtraceevent version as
discussed in the RFC:
https://lore.kernel.org/lkml/[email protected]/
v2. Is a rebase and handles the case of building without
LIBTRACEEVENT_DYNAMIC.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Tzvetomir Stoyanov (VMware) <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch adds basic arch initialization and instruction associate
support for the riscv64 CPU architecture.
Example output:
$ perf annotate --stdio2
Samples: 122K of event 'task-clock:u', 4000 Hz, Event count (approx.): 30637250000, [percent: local period]
strcmp() /usr/lib64/libc-2.32.so
Percent
Disassembly of section .text:
0000000000069a30 <strcmp>:
__GI_strcmp():
const unsigned char *s2 = (const unsigned char *) p2;
unsigned char c1, c2;
do
{
c1 = (unsigned char) *s1++;
37.30 lbu a5,0(a0)
c2 = (unsigned char) *s2++;
1.23 addi a1,a1,1
c1 = (unsigned char) *s1++;
18.68 addi a0,a0,1
c2 = (unsigned char) *s2++;
1.37 lbu a4,-1(a1)
if (c1 == '\0')
18.71 ↓ beqz a5,18
return c1 - c2;
}
Signed-off-by: William Cohen <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If there is no configuration file at first, the user can write any pair
of "key.subkey=value" to the newly created configuration file, while
value validation against a valid configurable key is *deferred* until
the next execution or the implied execution of "perf config ... ".
For example:
$ rm ~/.perfconfig
$ perf config call-graph.dump-size=65529
$ cat ~/.perfconfig
# this file is auto-generated.
[call-graph]
dump-size = 65529
$ perf config call-graph.dump-size=2048
callchain: Incorrect stack dump size (max 65528): 65529
Error: wrong config key-value pair call-graph.dump-size=65529
The user might expect that the second value 2048 is valid and can be
updated to the configuration file, but the error message is very
confusing because the first value 65529 is not reported as an error
during the last configuration.
It is recommended not to change the current behavior of delayed
validation (as more effort is needed), but to refine the original error
message to *clearly indicate* that the cause of the error is the
configuration file.
Signed-off-by: Like Xu <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Part of hardware cache events are only available on one CPU PMU.
For example, 'L1-dcache-load-misses' is only available on cpu_core.
perf list should clearly report this info.
root@otcpl-adl-s-2:~# ./perf list
Before:
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
node-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-store-misses [Hardware cache event]
node-stores [Hardware cache event]
After:
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
cpu_atom/L1-icache-loads/ [Hardware cache event]
cpu_core/L1-dcache-load-misses/ [Hardware cache event]
cpu_core/node-load-misses/ [Hardware cache event]
cpu_core/node-loads/ [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
Now we can clearly see 'L1-dcache-load-misses' is only available
on cpu_core.
If without pmu prefix, it indicates the event is available on both
cpu_core and cpu_atom.
Signed-off-by: Jin Yao <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Minor cleanup motivated by trying to separately fuzz test parse-events.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up fixes in the last pushed perf/urgent.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Perf code re-implements libbpf's btf__load_from_kernel_by_id() API as
a weak function, presumably to dynamically link against old version of
libbpf shared library. Unfortunately this causes compilation warning
when perf is compiled against libbpf v0.6+.
For now, just ignore deprecation warning, but there might be a better
solution, depending on perf's needs.
Signed-off-by: Andrii Nakryiko <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: [email protected]
LPU-Reference: [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It's later supposed to be either a correct address or NULL. Without the
initialization, it may contain an undefined value which results in the
following segmentation fault:
# perf top --sort comm -g --ignore-callees=do_idle
terminates with:
#0 0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6
#1 0x00007ffff55e3802 in strdup () from /lib64/libc.so.6
#2 0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489
#3 hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564
#4 0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420,
sample_self=sample_self@entry=true) at util/hist.c:657
#5 0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0,
sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288
#6 0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38)
at util/hist.c:1056
#7 iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056
#8 0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0)
at util/hist.c:1231
#9 0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0)
at builtin-top.c:842
#10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202
#11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244
#12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323
#13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339
#14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341
#15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339
#16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114
#17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0
#18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6
If you look at the frame #2, the code is:
488 if (he->srcline) {
489 he->srcline = strdup(he->srcline);
490 if (he->srcline == NULL)
491 goto err_rawdata;
492 }
If he->srcline is not NULL (it is not NULL if it is uninitialized rubbish),
it gets strdupped and strdupping a rubbish random string causes the problem.
Also, if you look at the commit 1fb7d06a509e, it adds the srcline property
into the struct, but not initializing it everywhere needed.
Committer notes:
Now I see, when using --ignore-callees=do_idle we end up here at line
2189 in add_callchain_ip():
2181 if (al.sym != NULL) {
2182 if (perf_hpp_list.parent && !*parent &&
2183 symbol__match_regex(al.sym, &parent_regex))
2184 *parent = al.sym;
2185 else if (have_ignore_callees && root_al &&
2186 symbol__match_regex(al.sym, &ignore_callees_regex)) {
2187 /* Treat this symbol as the root,
2188 forgetting its callees. */
2189 *root_al = al;
2190 callchain_cursor_reset(cursor);
2191 }
2192 }
And the al that doesn't have the ->srcline field initialized will be
copied to the root_al, so then, back to:
1211 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
1212 int max_stack_depth, void *arg)
1213 {
1214 int err, err2;
1215 struct map *alm = NULL;
1216
1217 if (al)
1218 alm = map__get(al->map);
1219
1220 err = sample__resolve_callchain(iter->sample, &callchain_cursor, &iter->parent,
1221 iter->evsel, al, max_stack_depth);
1222 if (err) {
1223 map__put(alm);
1224 return err;
1225 }
1226
1227 err = iter->ops->prepare_entry(iter, al);
1228 if (err)
1229 goto out;
1230
1231 err = iter->ops->add_single_entry(iter, al);
1232 if (err)
1233 goto out;
1234
That al at line 1221 is what hist_entry_iter__add() (called from
sample__resolve_callchain()) saw as 'root_al', and then:
iter->ops->add_single_entry(iter, al);
will go on with al->srcline with a bogus value, I'll add the above
sequence to the cset and apply, thanks!
Signed-off-by: Michael Petlan <[email protected]>
CC: Milian Wolff <[email protected]>
Cc: Jiri Olsa <[email protected]>
Fixes: 1fb7d06a509e ("perf report Use srcline from callchain for hist entries")
Link: https //lore.kernel.org/r/[email protected]
Reported-by: Juri Lelli <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add an option to control the synthesizing behavior.
--synth <no|all|task|mmap|cgroup>
Fine-tune event synthesis: default=all
This can be useful when we know it doesn't need some synthesis like
in a specific usecase and/or when using pipe:
$ perf record -a --all-cgroups --synth cgroup -o- sleep 1 | \
> perf report -i- -s cgroup
Committer notes:
Added a clarification to the man page entry for --synth that this is
about pre-existing threads.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https //lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
during record
Depending on the use case, it might require some kind of synthesizing
and some not. Make it controllable to turn off heavy operations like
MMAP for all tasks.
Currently all users are converted to enable all the synthesis by
default. It'll be updated in the later patch.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https //lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Enum forward declarations aren't allowed as the size can't be implied.
Switch to just using an int. This fixes a clang warning:
In file included from tools/perf/bench/evlist-open-close.c:13:
tools/perf/bench/../util/parse-events.h:185:6: error: redeclaration of already-defined enum 'perf_tool_event' is a GNU extension [-Werror,-Wgnu-redeclared-enum]
enum perf_tool_event;
^
tools/perf/bench/../util/evsel.h:28:6: note: previous definition is here
enum perf_tool_event {
^
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Perf code re-implements libbpf's btf__load_from_kernel_by_id() API as
a weak function, presumably to dynamically link against old version of
libbpf shared library. Unfortunately this causes compilation warning
when perf is compiled against libbpf v0.6+.
For now, just ignore deprecation warning, but there might be a better
solution, depending on perf's needs.
Signed-off-by: Andrii Nakryiko <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: [email protected]
LPU-Reference: [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
As a part of libbpf 1.0 plan[0], this patch deprecates use of
bpf_map__resize in favour of bpf_map__set_max_entries.
Reference: https://github.com/libbpf/libbpf/issues/304
[0]: https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#libbpfh-high-level-apis
Signed-off-by: Muhammad Falak R Wani <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Muhammad Falak R Wani <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Quentin Monnet <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yu Kuai <[email protected]>
Link: http //lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
AMD family 15h and above microarchs fuse a subset of cmp/test/ALU
instructions with branch instructions[1][2]. Add perf annotate
fused instruction support for these microarchs.
Before:
│ testb $0x80,0x51(%rax)
│ ┌──jne 5b3
0.78 │ │ mov %r13,%rdi
│ │→ callq mark_page_accessed
1.08 │5b3:└─→mov 0x8(%r13),%rax
After:
│ ┌──testb $0x80,0x51(%rax)
│ ├──jne 5b3
0.78 │ │ mov %r13,%rdi
│ │→ callq mark_page_accessed
1.08 │5b3:└─→mov 0x8(%r13),%rax
[1] https://bugzilla.kernel.org/attachment.cgi?id=298553
[2] https://bugzilla.kernel.org/attachment.cgi?id=298555
Committer testing:
On a:
$ grep -m1 "model name" /proc/cpuinfo
model name : AMD Ryzen 9 3900X 12-Core Processor
$
Samples: 44K of event 'cycles', 4000 Hz, Event count (approx.): 7533249650
_int_malloc /usr/lib64/libc-2.33.so [Percent: local period]
Percent│ ┌──test %eax,%eax
│ ├──jne 884
│ │↓ jmpq 943
│ │ nop
│878:│ add $0x10,%rdx
0.64 │ │ add %eax,%eax
0.57 │ │↓ je cc9
0.77 │884:└─→test %esi,%eax
│ ↑ je 878
│ mov 0x18(%rdx),%r15
Reported-by: Kim Phillips <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https //lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently perf saves a build-id with size but old versions assumes the
size of 20. In case the build-id is less than 20 (like for MD5), it'd
fill the rest with 0s.
I saw a problem when old version of perf record saved a binary in the
build-id cache and new version of perf reads the data. The symbols
should be read from the build-id cache (as the path no longer has the
same binary) but it failed due to mismatch in the build-id.
symsrc__init: build id mismatch for /home/namhyung/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf.
The build-id event in the data has 20 byte build-ids, but it saw a
different size (16) when it reads the build-id of the elf file in the
build-id cache.
$ readelf -n ~/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf
Displaying notes found in: .note.gnu.build-id
Owner Data size Description
GNU 0x00000010 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 53e4c2f42a4c61a2d632d92a72afa08f
Let's fix this by allowing trailing zeros if the size is different.
Fixes: 39be8d0115b321ed ("perf tools: Pass build_id object to dso__build_id_equal()")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
A config terms list was spliced twice, resulting in a never-ending loop
when the list was traversed. Fix by using list_splice_init() and copying
and freeing the lists as necessary.
This patch also depends on patch "perf tools: Factor out
copy_config_terms() and free_config_terms()"
Example on ADL:
Before:
# perf record -e '{intel_pt//,cycles/aux-sample-size=4096/pp}' uname &
# jobs
[1]+ Running perf record -e "{intel_pt//,cycles/aux-sample-size=4096/pp}" uname
# perf top -E 10
PerfTop: 4071 irqs/sec kernel: 6.9% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 24 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
97.60% perf [.] __evsel__get_config_term
0.25% [kernel] [k] kallsyms_expand_symbol.constprop.13
0.24% perf [.] kallsyms__parse
0.15% [kernel] [k] _raw_spin_lock
0.14% [kernel] [k] number
0.13% [kernel] [k] advance_transaction
0.08% [kernel] [k] format_decode
0.08% perf [.] map__process_kallsym_symbol
0.08% perf [.] rb_insert_color
0.08% [kernel] [k] vsnprintf
exiting.
# kill %1
After:
# perf record -e '{intel_pt//,cycles/aux-sample-size=4096/pp}' uname &
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.060 MB perf.data ]
# perf script | head
perf-exec 604 [001] 1827.312293: psb: psb offs: 0 ffffffffb8415e87 pt_config_start+0x37 ([kernel.kallsyms])
perf-exec 604 1827.312293: 1 branches: ffffffffb856a3bd event_sched_in.isra.133+0xfd ([kernel.kallsyms]) => ffffffffb856a9a0 perf_pmu_nop_void+0x0 ([kernel.kallsyms])
perf-exec 604 1827.312293: 1 branches: ffffffffb856b10e merge_sched_in+0x26e ([kernel.kallsyms]) => ffffffffb856a2c0 event_sched_in.isra.133+0x0 ([kernel.kallsyms])
perf-exec 604 1827.312293: 1 branches: ffffffffb856a45d event_sched_in.isra.133+0x19d ([kernel.kallsyms]) => ffffffffb8568b80 perf_event_set_state.part.61+0x0 ([kernel.kallsyms])
perf-exec 604 1827.312293: 1 branches: ffffffffb8568b86 perf_event_set_state.part.61+0x6 ([kernel.kallsyms]) => ffffffffb85662a0 perf_event_update_time+0x0 ([kernel.kallsyms])
perf-exec 604 1827.312293: 1 branches: ffffffffb856a35c event_sched_in.isra.133+0x9c ([kernel.kallsyms]) => ffffffffb8567610 perf_log_itrace_start+0x0 ([kernel.kallsyms])
perf-exec 604 1827.312293: 1 branches: ffffffffb856a377 event_sched_in.isra.133+0xb7 ([kernel.kallsyms]) => ffffffffb8403b40 x86_pmu_add+0x0 ([kernel.kallsyms])
perf-exec 604 1827.312293: 1 branches: ffffffffb8403b86 x86_pmu_add+0x46 ([kernel.kallsyms]) => ffffffffb8403940 collect_events+0x0 ([kernel.kallsyms])
perf-exec 604 1827.312293: 1 branches: ffffffffb8403a7b collect_events+0x13b ([kernel.kallsyms]) => ffffffffb8402cd0 collect_event+0x0 ([kernel.kallsyms])
Fixes: 30def61f64bac5 ("perf parse-events Create two hybrid cache events")
Fixes: 94da591b1c7913 ("perf parse-events Create two hybrid raw events")
Fixes: 9cbfa2f64c04d9 ("perf parse-events Create two hybrid hardware events")
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Link: https //lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Factor out copy_config_terms() and free_config_terms() so that they can
be reused.
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Link: https //lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Some fields are missing and text_poke is duplicated. Fix that up.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The btf__get_from_id() function was deprecated in favour of
btf__load_from_kernel_by_id(), but it is still avaiable, so use it to
provide a weak function btf__load_from_kernel_by_id() for older libbpf
when building perf with LIBBPF_DYNAMIC=1, i.e. using the system's libbpf
package.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
data
Perf records IBS (Instruction Based Sampling) extra sample data when
'perf record --raw-samples' is used with an IBS-compatible event, on a
machine that supports IBS. IBS support is indicated in
CPUID_Fn80000001_ECX bit #10.
Up until now, users have been able to see the extra sample data solely
in raw hex format using 'perf report --dump-raw-trace'. From there,
users could decode the data either manually, or by using an external
script.
Enable the built-in 'perf report --dump-raw-trace' to do the decoding of
the extra sample data bits, so manual or external script decoding isn't
necessary.
Example usage:
$ sudo perf record -c 10000001 -a --raw-samples -e ibs_fetch/rand_en=1/,ibs_op/cnt_ctl=1/ -C 0,1 taskset -c 0,1 7za b -mmt2 | perf report --dump-raw-trace
Stdout contains IBS Fetch samples, e.g.:
ibs_fetch_ctl: 02170007ffffffff MaxCnt 1048560 Cnt 1048560 Lat 7 En 1 Val 1 Comp 1 IcMiss 0 PhyAddrValid 1 L1TlbPgSz 4KB L1TlbMiss 0 L2TlbMiss 0 RandEn 1 L2Miss 0
IbsFetchLinAd: 000056016b2ead40
IbsFetchPhysAd: 000000115cedfd40
c_ibs_ext_ctl: 0000000000000000 IbsItlbRefillLat 0
..and IBS Op samples, e.g.:
ibs_op_ctl: 0000009e009e8968 MaxCnt 10000000 En 1 Val 1 CntCtl 1=uOps CurCnt 158
IbsOpRip: 000056016b2ea73d
ibs_op_data: 00000000000b0002 CompToRetCtr 2 TagToRetCtr 11 BrnRet 0 RipInvalid 0 BrnFuse 0 Microcode 0
ibs_op_data2: 0000000000000002 CacheHitSt 0=M-state RmtNode 0 DataSrc 2=Local node cache
ibs_op_data3: 0000000000c60002 LdOp 0 StOp 1 DcL1TlbMiss 0 DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 0 DcL2TlbHit2M 0 DcMiss 0 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0 DcMissNoMabAlloc 0 DcLinAddrValid 1 DcPhyAddrValid 1 DcL2TlbHit1G 0 L2Miss 0 SwPf 0 OpMemWidth 4 bytes OpDcMissOpenMemReqs 0 DcMissLat 0 TlbRefillLat 0
IbsDCLinAd: 00007f133c319ce0
IbsDCPhysAd: 0000000270485ce0
Committer notes:
Fixed up this:
util/amd-sample-raw.c: In function ‘evlist__amd_sample_raw’:
util/amd-sample-raw.c:125:42: error: ‘ bytes’ directive output may be truncated writing 6 bytes into a region of size between 4 and 7 [-Werror=format-truncation=]
125 | " OpMemWidth %2d bytes", 1 << (reg.op_mem_width - 1));
| ^~~~~~
In file included from /usr/include/stdio.h:866,
from util/amd-sample-raw.c:7:
/usr/include/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 21 and 24 bytes into a destination of size 21
71 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
72 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
73 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
As that %2d won't limit the number of chars to 2, just state that 2 is
the minimal width:
$ cat printf.c
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char bf[64];
int len = snprintf(bf, sizeof(bf), "%2d", atoi(argv[1]));
printf("strlen(%s): %u\n", bf, len);
return 0;
}
$ ./printf 1
strlen( 1): 2
$ ./printf 12
strlen(12): 2
$ ./printf 123
strlen(123): 3
$ ./printf 1234
strlen(1234): 4
$ ./printf 12345
strlen(12345): 5
$ ./printf 123456
strlen(123456): 6
$
And since we probably don't want that output to be truncated, just
assume the worst case, as the compiler did, and add a few more chars to
that buffer.
Also use sizeof(var) instead of sizeof(dup-of-wanted-format-string) to
avoid bugs when changing one but not the other.
I also had to change this:
-#include <asm/amd-ibs.h>
+#include "../../arch/x86/include/asm/amd-ibs.h"
To make it build on other architectures, just like intel-pt does.
Signed-off-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https //lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To be used by IBS raw data display: It needs the recorder's cpuid in
order to determine which errata workarounds to apply to the data, and
the pmu_mappings are needed in order to figure out which PMU sample
type is IBS Fetch vs. IBS Op.
When not available from perf.data, we assume local operation, and
retrieve cpuid and pmu mappings directly from the running system.
Signed-off-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https //lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Instead of using the file offset in the debug file.
This fixes a regression from 00a3423492bc90be ("perf symbols: Make
dso__load_bfd_symbols() load PE files from debug cache only"), causing
incorrect symbol resolution when debug file have been stripped from
non-debug sections (in which case its .text section is empty and doesn't
have any file position).
The debug files could also be created with a different file alignment,
and have different file positions from the mmap-ed binary, or have the
section reordered.
This instead looks for the file image base, using the corresponding bfd
*ABS* symbols. As PE symbols only have 4 bytes, it also needs to keep
.text section vma high bits.
Signed-off-by: Remi Bernon <[email protected]>
Fixes: 00a3423492bc90be ("perf symbols: Make dso__load_bfd_symbols() load PE files from debug cache only")
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Nicholas Fraser <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Merge more updates from Andrew Morton:
"147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f.
Subsystems affected by this patch series: mm (memory-hotplug, rmap,
ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
selftests, ipc, and scripts"
* emailed patches from Andrew Morton <[email protected]>: (94 commits)
scripts: check_extable: fix typo in user error message
mm/workingset: correct kernel-doc notations
ipc: replace costly bailout check in sysvipc_find_ipc()
selftests/memfd: remove unused variable
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
configs: remove the obsolete CONFIG_INPUT_POLLDEV
prctl: allow to setup brk for et_dyn executables
pid: cleanup the stale comment mentioning pidmap_init().
kernel/fork.c: unexport get_{mm,task}_exe_file
coredump: fix memleak in dump_vma_snapshot()
fs/coredump.c: log if a core dump is aborted due to changed file permissions
nilfs2: use refcount_dec_and_lock() to fix potential UAF
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
trap: cleanup trap_init()
init: move usermodehelper_enable() to populate_rootfs()
...
|