Age | Commit message (Collapse) | Author | Files | Lines |
|
Normally exception packets don't directly output a branch sample, but
if they're the last record in a buffer then they will. Because they
don't have addresses set we'll see the placeholder value
CS_ETM_INVAL_ADDR (0xdeadbeef) in the output.
Since commit 6035b6804bdf ("perf cs-etm: Support dummy address value for
CS_ETM_TRACE_ON packet") we've used 0 as an externally visible "not set"
address value. For consistency reasons and to not make exceptions look
like an error, change them to use 0 too.
This is particularly visible when doing userspace only tracing because
trace is disabled when jumping to the kernel, causing the flush and then
forcing the last exception packet to be emitted as a branch. With kernel
trace included, there is no flush so exception packets don't generate
samples until the next range packet and they'll pick up the correct
address.
Before:
$ perf record -e cs_etm//u -- stress -i 1 -t 1
$ perf script -F comm,ip,addr,flags
stress syscall ffffb7eedbc0 => deadbeefdeadbeef
stress syscall ffffb7f14a14 => deadbeefdeadbeef
stress syscall ffffb7eedbc0 => deadbeefdeadbeef
After:
stress syscall ffffb7eedbc0 => 0
stress syscall ffffb7f14a14 => 0
stress syscall ffffb7eedbc0 => 0
Reviewed-by: Mike Leach <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
instruction
Since the "ins.name" is not set while using raw instruction,
'perf annotate' with insn-stat gives wrong data:
Result from "./perf annotate --data-type --insn-stat":
Annotate Instruction stats
total 615, ok 419 (68.1%), bad 196 (31.9%)
Name : Good Bad
-----------------------------------------------------------
: 419 196
This patch sets "dl->ins.name" in arch specific function
"check_ppc_insn" while initialising "struct disasm_line".
Also update "ins_find" function to pass "struct disasm_line" as a
parameter so as to set its name field in arch specific call.
With the patch changes:
Annotate Instruction stats
total 609, ok 446 (73.2%), bad 163 (26.8%)
Name/opcode : Good Bad
-----------------------------------------------------------
58 : 323 80
32 : 49 43
34 : 33 11
OP_31_XOP_LDX : 8 20
40 : 23 0
OP_31_XOP_LWARX : 5 1
OP_31_XOP_LWZX : 2 3
OP_31_XOP_LDARX : 3 0
33 : 0 2
OP_31_XOP_LBZX : 0 1
OP_31_XOP_LWAX : 0 1
OP_31_XOP_LHZX : 0 1
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now perf uses the capstone library to disassemble the instructions in
x86. capstone is used (if available) for perf annotate to speed up.
Currently it only supports x86 architecture.
This patch includes changes to enable this in powerpc.
For now, only for data type sort keys, this method is used and only
binary code (raw instruction) is read. This is because powerpc approach
to understand instructions and reg fields uses raw instruction.
The "cs_disasm" is currently not enabled. While attempting to do
cs_disasm, observation is that some of the instructions were not
identified (ex: extswsli, maddld) and it had to fallback to use objdump.
Hence enabling "cs_disasm" is added in comment section as a TODO for
powerpc.
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
[ Use dso__nsinfo(dso) as required to match EXTRA_CFLAGS=-DREFCNT_CHECKING=1 build expectations ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
capstone_init is made availbale for all archs to use and updated to
enable support for CS_ARCH_PPC as well. Patch removes
open_capstone_handle and uses capstone_init in all the places.
Committer notes:
Avoid including capstone/capstone.h from print_insn.h to not break the
build in builtin-script.c due to the namespace clash with libbpf:
/usr/include/capstone/bpf.h:94:14: error: 'bpf_insn' defined as wrong kind of tag
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
symbol disassemble
symbol__disassemble_capstone in util/disasm.c calls function
open_capstone_handle to open/init the capstone.
We already have a capstone_init function in "util/print_insn.c". But
capstone_init is defined as a static function in util/print_insn.c.
Change this and also add the function in print_insn.h
The open_capstone_handle checks the disassembler_style option from
annotation_options to decide whether to set CS_OPT_SYNTAX_ATT.
Add that logic in capstone_init also and by default set it to true.
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add instruction tracking function "update_insn_state_powerpc" for
powerpc. Example sequence in powerpc:
ld r10,264(r3)
mr r31,r3
<<after some sequence>
ld r9,312(r31)
Consider ithe sample is pointing to: "ld r9,312(r31)".
Here the memory reference is hit at "312(r31)" where 312 is the offset
and r31 is the source register.
Previous instruction sequence shows that register state of r3 is moved
to r31.
So to identify the data type for r31 access, the previous instruction
("mr") needs to be tracked and the state type entry has to be updated.
Current instruction tracking support in perf tools infrastructure is
specific to x86. Patch adds this support for powerpc as well.
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
instruction tracking in powerpc
Data-type profiling has the concept of instruction tracking.
Example sequence in powerpc:
ld r10,264(r3)
mr r31,r3
<<after some sequence>
ld r9,312(r31)
or differently
lwz r10,264(r3)
add r31, r3, RB
lwz r9, 0(r31)
If a sample is hit at "lwz r9, 0(r31)", data type of r31 depends
on previous instruction sequence here. So to track the previous
instructions, patch adds changes to identify some of the arithmetic
instructions which are having opcode as 31.
Since memory instructions also has cases with opcode 31, use the bits
22:30 to filter the arithmetic instructions here.
Also there are instructions with just two operands like "addme", "addze".
This patch adds new instructions ops "arithmetic_ops" to handle this
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
powerpc
There are memory instructions in powerpc with opcode as 31.
Example: "ldx RT,RA,RB" , Its X form is as below:
______________________________________
| 31 | RT | RA | RB | 21 |/|
--------------------------------------
0 6 11 16 21 30 31
The opcode for "ldx" is 31. There are other instructions also with
opcode 31 which are memory insn like ldux, stbx, lwzx, lhaux
But all instructions with opcode 31 are not memory. Example is add
instruction: "add RT,RA,RB"
The value in bit 21-30 [ 21 for ldx ] is different for these
instructions. Patch uses this value to assign instruction ops for these
cases. The naming convention and value to identify these are picked from
defines in "arch/powerpc/include/asm/ppc-opcode.h"
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use the raw instruction code and macros to identify memory instructions,
extract register fields and also offset.
The implementation addresses the D-form, X-form, DS-form instructions.
Two main functions are added.
New parse function "load_store__parse" as instruction ops parser for
memory instructions.
Unlike other parsers (like mov__parse), this one fills in the
"multi_regs" field for source/target and new added "mem_ref" field. No
other fields are set because, here there is no need to parse the
disassembled code and arch specific macros will take care of extracting
offset and regs which is easier and will be precise.
In powerpc, all instructions with a primary opcode from 32 to 63
are memory instructions. Update "ins__find" function to have "raw_insn"
also as a parameter.
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
instruction on powerpc
Use the raw instruction code and macros to identify memory instructions,
extract register fields and also offset.
The implementation addresses the D-form, X-form, DS-form instructions.
Adds "mem_ref" field to check whether source/target has memory
reference.
Add function "get_powerpc_regs" which will set these fields: reg1, reg2,
offset depending of where it is source or target ops.
Update "parse" callback for "struct ins_ops" to also pass "struct
disasm_line" as argument. This is needed in parse functions where opcode
is used to determine whether to set multi_regs and other fields
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
using dso__data_read_offset utility
Add support to capture and parse raw instruction in powerpc.
Currently, the perf tool infrastructure uses two ways to disassemble
and understand the instruction. One is objdump and other option is
via libcapstone.
Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
with "objdump" while disassemble. Example from powerpc with this option
for an instruction address is:
Snippet from:
objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4: lwz r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset. Also to find whether there is a memory
reference in the operands, "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to tackle instructions of the
form "mov (%rax), %rcx".
In case of powerpc, not all instructions using "(" are the only memory
instructions. Example, above instruction can also be of extended form (X
form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
and extract the source/target registers, patch adds support to use raw
instruction for powerpc. Approach used is to read the raw instruction
directly from the DSO file using "dso__data_read_offset" utility which
is already implemented in perf infrastructure in "util/dso.c".
Example:
38 01 81 e8 ld r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:
| 58 | RT | RA | DS | |
-------------------------------------
0 6 11 16 30 31
Function "symbol__disassemble_dso" is updated to read raw instruction
directly from DSO using dso__data_read_offset utility. In case of
above example, this captures:
line: 38 01 81 e8
The above works well when 'perf report' is invoked with only sort keys
for data type ie type and typeoff.
Because there is no instruction level annotation needed if only data
type information is requested for.
For annotating sample, along with type and typeoff sort key, "sym" sort
key is also needed. And by default invoking just "perf report" uses sort
key "sym" that displays the symbol information.
With approach changes in powerpc which first reads DSO for raw
instruction, "perf annotate" and "perf report" + a key breaks since
it doesn't do the instruction level disassembly.
Snippet of result from 'perf report':
Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work /usr/bin/pmlogger [Percent: local period]
Percent│ ea230010
│ 3a550010
│ 3a600000
│ 38f60001
│ 39490008
│ 42400438
51.44 │ 81290008
│ 7d485378
Here, raw instruction is displayed in the output instead of human
readable annotated form.
One way to get the appropriate data is to specify "--objdump path", by
which code annotation will be done. But the default behaviour will be
changed. To fix this breakage, check if "sym" sort key is set. If so
fallback and use the libcapstone/objdump way of disassmbling the sample.
With the changes and "perf report"
Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work /usr/bin/pmlogger [Percent: local period]
Percent│ ld r17,16(r3)
│ addi r18,r21,16
│ li r19,0
│ 8b0: rldicl r10,r10,63,33
│ addi r10,r10,1
│ mtctr r10
│ ↓ b 8e4
│ 8c0: addi r7,r22,1
│ addi r10,r9,8
│ ↓ bdz d00
51.44 │ lwz r9,8(r9)
│ mr r8,r10
│ cmpw r20,r9
Committer notes:
Just add the extern for 'sort_order' in disasm.c so that we don't end up
breaking the build due to this type colision with capstone and libbpf:
In file included from /usr/include/capstone/capstone.h:325,
from /git/perf-6.10.0/tools/perf/util/print_insn.h:23,
from builtin-script.c:38:
/usr/include/capstone/bpf.h:94:14: error: 'bpf_insn' defined as wrong kind of tag
94 | typedef enum bpf_insn {
I reported this to the bpf mailing list, see one of the links below.
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Link: https://lore.kernel.org/bpf/ZqOltPk9VQGgJZAA@x1/T/#u
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently, the perf tool infrastructure uses the disasm_line__parse
function to parse disassembled line.
Example snippet from objdump:
objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4: lwz r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset.
In powerpc, the approach for data type profiling uses raw instruction
instead of result from objdump to identify the instruction category and
extract the source/target registers.
Example: 38 01 81 e8 ld r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. Add function
"disasm_line__parse_powerpc" to handle parsing of raw instruction.
Also update "struct disasm_line" to save the binary code/
With the change, function captures:
line -> "38 01 81 e8 ld r4,312(r1)"
raw instruction "38 01 81 e8"
Raw instruction is used later to extract the reg/offset fields. Macros
are added to extract opcode and register fields. "struct disasm_line"
is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw
code (raw).
Function "disasm_line__parse_powerpc fills the raw instruction hex value
and can use macros to get opcode. There is no changes in existing code
paths, which parses the disassembled code. The size of raw instruction
depends on architecture.
In case of powerpc, the parsing the disasm line needs to handle cases
for reading binary code directly from DSO as well as parsing the objdump
result. Hence adding the logic into separate function instead of
updating "disasm_line__parse". The architecture using the instruction
name and present approach is not altered. Since this approach targets
powerpc, the macro implementation is added for powerpc as of now.
Since the disasm_line__parse is used in other cases (perf annotate) and
not only data tye profiling, the powerpc callback includes changes to
work with binary code as well as mnemonic representation.
Also in case if the DSO read fails and libcapstone is not supported, the
approach fallback to use objdump as option. Hence as option, patch has
changes to ensure objdump option also works well.
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
[ Add check for strndup() result ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
TYPE_STATE_MAX_REGS is arch-dependent. Currently this is defined to be
16.
While checking if reg is valid using has_reg_type, max value is checked
using TYPE_STATE_MAX_REGS value.
Define this conditionally for powerpc.
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
specific instruction tracking
Add "update_insn_state" callback to "struct arch" to handle instruction
tracking. Currently updating instruction state is handled by static
function "update_insn_state_x86" which is defined in "annotate-data.c".
Make this as a callback for specific arch and move to archs specific
file "arch/x86/annotate/instructions.c" . This will help to add helper
function for other platforms in file:
"arch/<platform>/annotate/instructions.c" and make changes/updates
easier.
Define callback "update_insn_state" as part of "struct arch", also make
some of the debug functions non-static so that it can be referenced from
other places.
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Data type profiling uses instruction tracking by checking each
instruction and updating the register type state in some data
structures.
This is useful to find the data type in cases when the register state
gets transferred from one reg to another.
Example, in x86, "mov" instruction and in powerpc, "mr" instruction.
Currently these structures are defined in annotate-data.c and
instruction tracking is implemented only for x86.
Move these data structures to "annotate-data.h" header file so that
other arch implementations can use it in arch specific files as well.
Reviewed-by: Kajol Jain <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Akanksha J N <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
With 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions"),
when cpumode is 3 (macro PERF_RECORD_MISC_HYPERVISOR),
thread__find_map() could return with al->maps being NULL.
The path below could add a callchain_cursor_node with NULL ms.maps.
add_callchain_ip()
thread__find_symbol(.., &al)
thread__find_map(.., &al) // al->maps becomes NULL
ms.maps = maps__get(al.maps)
callchain_cursor_append(..., &ms, ...)
node->ms.maps = maps__get(ms->maps)
Then the path below would dereference NULL maps and get segfault.
fill_callchain_info()
maps__machine(node->ms.maps);
Fix it by checking if maps is NULL in fill_callchain_info().
Fixes: 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions")
Signed-off-by: Casey Chen <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Reviewed-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
Now that symsrc_filename is always accessed through an accessor, we also
need a free() function for it to avoid the following compilation error:
util/unwind-libunwind-local.c:416:12: error: lvalue required as unary
‘&’ operand
416 | zfree(&dso__symsrc_filename(dso));
Fixes: 1553419c3c10 ("perf dso: Fix address sanitizer build")
Signed-off-by: James Clark <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Tested-by: Leo Yan <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Cc: Yunseong Kim <[email protected]>
Cc: Athira Rajeev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
This is a bug found when implementing pretty-printing for the
landlock_add_rule system call, I decided to send this patch separately
because this is a serious bug that should be fixed fast.
I wrote a test program to do landlock_add_rule syscall in a loop,
yet perf trace -e landlock_add_rule freezes, giving no output.
This bug is introduced by the false understanding of the variable "key"
below:
```
for (key = 0; key < trace->sctbl->syscalls.nr_entries; ++key) {
struct syscall *sc = trace__syscall_info(trace, NULL, key);
...
}
```
The code above seems right at the beginning, but when looking at
syscalltbl.c, I found these lines:
```
for (i = 0; i <= syscalltbl_native_max_id; ++i)
if (syscalltbl_native[i])
++nr_entries;
entries = tbl->syscalls.entries = malloc(sizeof(struct syscall) * nr_entries);
...
for (i = 0, j = 0; i <= syscalltbl_native_max_id; ++i) {
if (syscalltbl_native[i]) {
entries[j].name = syscalltbl_native[i];
entries[j].id = i;
++j;
}
}
```
meaning the key is merely an index to traverse the syscall table,
instead of the actual syscall id for this particular syscall.
So if one uses key to do trace__syscall_info(trace, NULL, key), because
key only goes up to trace->sctbl->syscalls.nr_entries, for example, on
my X86_64 machine, this number is 373, it will end up neglecting all
the rest of the syscall, in my case, everything after `rseq`, because
the traversal will stop at 373, and `rseq` is the last syscall whose id
is lower than 373
in tools/perf/arch/x86/include/generated/asm/syscalls_64.c:
```
...
[334] = "rseq",
[424] = "pidfd_send_signal",
...
```
The reason why the key is scrambled but perf trace works well is that
key is used in trace__syscall_info(trace, NULL, key) to do
trace->syscalls.table[id], this makes sure that the struct syscall returned
actually has an id the same value as key, making the later bpf_prog
matching all correct.
After fixing this bug, I can do perf trace on 38 more syscalls, and
because more syscalls are visible, we get 8 more syscalls that can be
augmented.
before:
perf $ perf trace -vv --max-events=1 |& grep Reusing
Reusing "open" BPF sys_enter augmenter for "stat"
Reusing "open" BPF sys_enter augmenter for "lstat"
Reusing "open" BPF sys_enter augmenter for "access"
Reusing "connect" BPF sys_enter augmenter for "accept"
Reusing "sendto" BPF sys_enter augmenter for "recvfrom"
Reusing "connect" BPF sys_enter augmenter for "bind"
Reusing "connect" BPF sys_enter augmenter for "getsockname"
Reusing "connect" BPF sys_enter augmenter for "getpeername"
Reusing "open" BPF sys_enter augmenter for "execve"
Reusing "open" BPF sys_enter augmenter for "truncate"
Reusing "open" BPF sys_enter augmenter for "chdir"
Reusing "open" BPF sys_enter augmenter for "mkdir"
Reusing "open" BPF sys_enter augmenter for "rmdir"
Reusing "open" BPF sys_enter augmenter for "creat"
Reusing "open" BPF sys_enter augmenter for "link"
Reusing "open" BPF sys_enter augmenter for "unlink"
Reusing "open" BPF sys_enter augmenter for "symlink"
Reusing "open" BPF sys_enter augmenter for "readlink"
Reusing "open" BPF sys_enter augmenter for "chmod"
Reusing "open" BPF sys_enter augmenter for "chown"
Reusing "open" BPF sys_enter augmenter for "lchown"
Reusing "open" BPF sys_enter augmenter for "mknod"
Reusing "open" BPF sys_enter augmenter for "statfs"
Reusing "open" BPF sys_enter augmenter for "pivot_root"
Reusing "open" BPF sys_enter augmenter for "chroot"
Reusing "open" BPF sys_enter augmenter for "acct"
Reusing "open" BPF sys_enter augmenter for "swapon"
Reusing "open" BPF sys_enter augmenter for "swapoff"
Reusing "open" BPF sys_enter augmenter for "delete_module"
Reusing "open" BPF sys_enter augmenter for "setxattr"
Reusing "open" BPF sys_enter augmenter for "lsetxattr"
Reusing "openat" BPF sys_enter augmenter for "fsetxattr"
Reusing "open" BPF sys_enter augmenter for "getxattr"
Reusing "open" BPF sys_enter augmenter for "lgetxattr"
Reusing "openat" BPF sys_enter augmenter for "fgetxattr"
Reusing "open" BPF sys_enter augmenter for "listxattr"
Reusing "open" BPF sys_enter augmenter for "llistxattr"
Reusing "open" BPF sys_enter augmenter for "removexattr"
Reusing "open" BPF sys_enter augmenter for "lremovexattr"
Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr"
Reusing "open" BPF sys_enter augmenter for "mq_open"
Reusing "open" BPF sys_enter augmenter for "mq_unlink"
Reusing "fsetxattr" BPF sys_enter augmenter for "add_key"
Reusing "fremovexattr" BPF sys_enter augmenter for "request_key"
Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch"
Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat"
Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat"
Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat"
Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat"
Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat"
Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "linkat"
Reusing "open" BPF sys_enter augmenter for "symlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat"
Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat"
Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat"
Reusing "connect" BPF sys_enter augmenter for "accept4"
Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at"
Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2"
Reusing "open" BPF sys_enter augmenter for "memfd_create"
Reusing "fremovexattr" BPF sys_enter augmenter for "execveat"
Reusing "fremovexattr" BPF sys_enter augmenter for "statx"
after
perf $ perf trace -vv --max-events=1 |& grep Reusing
Reusing "open" BPF sys_enter augmenter for "stat"
Reusing "open" BPF sys_enter augmenter for "lstat"
Reusing "open" BPF sys_enter augmenter for "access"
Reusing "connect" BPF sys_enter augmenter for "accept"
Reusing "sendto" BPF sys_enter augmenter for "recvfrom"
Reusing "connect" BPF sys_enter augmenter for "bind"
Reusing "connect" BPF sys_enter augmenter for "getsockname"
Reusing "connect" BPF sys_enter augmenter for "getpeername"
Reusing "open" BPF sys_enter augmenter for "execve"
Reusing "open" BPF sys_enter augmenter for "truncate"
Reusing "open" BPF sys_enter augmenter for "chdir"
Reusing "open" BPF sys_enter augmenter for "mkdir"
Reusing "open" BPF sys_enter augmenter for "rmdir"
Reusing "open" BPF sys_enter augmenter for "creat"
Reusing "open" BPF sys_enter augmenter for "link"
Reusing "open" BPF sys_enter augmenter for "unlink"
Reusing "open" BPF sys_enter augmenter for "symlink"
Reusing "open" BPF sys_enter augmenter for "readlink"
Reusing "open" BPF sys_enter augmenter for "chmod"
Reusing "open" BPF sys_enter augmenter for "chown"
Reusing "open" BPF sys_enter augmenter for "lchown"
Reusing "open" BPF sys_enter augmenter for "mknod"
Reusing "open" BPF sys_enter augmenter for "statfs"
Reusing "open" BPF sys_enter augmenter for "pivot_root"
Reusing "open" BPF sys_enter augmenter for "chroot"
Reusing "open" BPF sys_enter augmenter for "acct"
Reusing "open" BPF sys_enter augmenter for "swapon"
Reusing "open" BPF sys_enter augmenter for "swapoff"
Reusing "open" BPF sys_enter augmenter for "delete_module"
Reusing "open" BPF sys_enter augmenter for "setxattr"
Reusing "open" BPF sys_enter augmenter for "lsetxattr"
Reusing "openat" BPF sys_enter augmenter for "fsetxattr"
Reusing "open" BPF sys_enter augmenter for "getxattr"
Reusing "open" BPF sys_enter augmenter for "lgetxattr"
Reusing "openat" BPF sys_enter augmenter for "fgetxattr"
Reusing "open" BPF sys_enter augmenter for "listxattr"
Reusing "open" BPF sys_enter augmenter for "llistxattr"
Reusing "open" BPF sys_enter augmenter for "removexattr"
Reusing "open" BPF sys_enter augmenter for "lremovexattr"
Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr"
Reusing "open" BPF sys_enter augmenter for "mq_open"
Reusing "open" BPF sys_enter augmenter for "mq_unlink"
Reusing "fsetxattr" BPF sys_enter augmenter for "add_key"
Reusing "fremovexattr" BPF sys_enter augmenter for "request_key"
Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch"
Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat"
Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat"
Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat"
Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat"
Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat"
Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "linkat"
Reusing "open" BPF sys_enter augmenter for "symlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat"
Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat"
Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat"
Reusing "connect" BPF sys_enter augmenter for "accept4"
Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at"
Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2"
Reusing "open" BPF sys_enter augmenter for "memfd_create"
Reusing "fremovexattr" BPF sys_enter augmenter for "execveat"
Reusing "fremovexattr" BPF sys_enter augmenter for "statx"
TL;DR:
These are the new syscalls that can be augmented
Reusing "openat" BPF sys_enter augmenter for "open_tree"
Reusing "openat" BPF sys_enter augmenter for "openat2"
Reusing "openat" BPF sys_enter augmenter for "mount_setattr"
Reusing "openat" BPF sys_enter augmenter for "move_mount"
Reusing "open" BPF sys_enter augmenter for "fsopen"
Reusing "openat" BPF sys_enter augmenter for "fspick"
Reusing "openat" BPF sys_enter augmenter for "faccessat2"
Reusing "openat" BPF sys_enter augmenter for "fchmodat2"
as for the perf trace output:
before
perf $ perf trace -e faccessat2 --max-events=1
[no output]
after
perf $ ./perf trace -e faccessat2 --max-events=1
0.000 ( 0.037 ms): waybar/958 faccessat2(dfd: 40, filename: "uevent") = 0
P.S. The reason why this bug was not found in the past five years is
probably because it only happens to the newer syscalls whose id is
greater, for instance, faccessat2 of id 439, which not a lot of people
care about when using perf trace.
[Arnaldo]: notes
That and the fact that the BPF code was hidden before having to use -e,
that got changed kinda recently when we switched to using BPF skels for
augmenting syscalls in 'perf trace':
⬢[acme@toolbox perf-tools-next]$ git log --oneline tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
a9f4c6c999008c92 perf trace: Collect sys_nanosleep first argument
29d16de26df17e94 perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h
5069211e2f0b47e7 perf trace: Use the right bpf_probe_read(_str) variant for reading user data
33b725ce7b988756 perf trace: Avoid compile error wrt redefining bool
7d9642311b6d9d31 perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(augmented_arg->value) is a power of two.
262b54b6c9396823 perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(saddr) is a power of two.
1836480429d173c0 perf bpf_skel augmented_raw_syscalls: Cap the socklen parameter using &= sizeof(saddr)
cd2cece61ac5f900 perf trace: Tidy comments related to BPF + syscall augmentation
5e6da6be3082f77b perf trace: Migrate BPF augmentation to use a skeleton
⬢[acme@toolbox perf-tools-next]$
⬢[acme@toolbox perf-tools-next]$ git show --oneline --pretty=reference 5e6da6be3082f77b | head -1
5e6da6be3082f77b (perf trace: Migrate BPF augmentation to use a skeleton, 2023-08-10)
⬢[acme@toolbox perf-tools-next]$
I.e. from August, 2023.
One had as well to ask for BUILD_BPF_SKEL=1, which now is default if all
it needs is available on the system.
I simplified the code to not expose the 'struct syscall' outside of
tools/perf/util/syscalltbl.c, instead providing a function to go from
the index to the syscall id:
int syscalltbl__id_at_idx(struct syscalltbl *tbl, int idx);
Signed-off-by: Howard Chu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/lkml/ZmhlAxbVcAKoPTg8@x1
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
Various files had been missed from having accessor functions added for
the sake of dso reference count checking. Add the function calls and
missing dso accessor functions.
Fixes: ee756ef7491e ("perf dso: Add reference count checking and accessor functions")
Signed-off-by: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Yunseong Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: John Garry <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
It is possible that memory events are not supported on all CPUs.
Prints a warning by dumping the enabled CPU maps in this case.
Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: John Garry <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
dsos__add would add at the end of the dso array possibly requiring a
later find to re-sort the array. Patterns of find then add were
becoming O(n*log n) due to the sorts. Change the add routine to be
O(n) rather than O(1) but to maintain the sorted-ness of the dsos
array so that later finds don't need the O(n*log n) sort.
Fixes: 3f4ac23a9908 ("perf dsos: Switch backing storage to array from rbtree/list")
Reported-by: Namhyung Kim <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Steinar Gunderson <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Matt Fleming <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
The array is sorted, so just move the elements and insert in order.
Fixes: 13ca628716c6 ("perf comm: Add reference count checking to 'struct comm_str'")
Reported-by: Matt Fleming <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Matt Fleming <[email protected]>
Cc: Steinar Gunderson <[email protected]>
Cc: Athira Rajeev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
./tools/perf/util/pmu.c:1776:49-50: Unneeded semicolon
Reported-by: Abaci Robot <[email protected]>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9443
Signed-off-by: Yang Li <[email protected]>
Reviewed-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
It didn't use the passed field separator (using -x option) when it
prints the metric headers and always put "," between the fields.
Before:
$ sudo ./perf stat -a -x : --per-core -M tma_core_bound --metric-only true
core,cpus,% tma_core_bound: <<<--- here: "core,cpus," but ":" expected
S0-D0-C0:2:10.5:
S0-D0-C1:2:14.8:
S0-D0-C2:2:9.9:
S0-D0-C3:2:13.2:
After:
$ sudo ./perf stat -a -x : --per-core -M tma_core_bound --metric-only true
core:cpus:% tma_core_bound:
S0-D0-C0:2:10.5:
S0-D0-C1:2:15.0:
S0-D0-C2:2:16.5:
S0-D0-C3:2:12.5:
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
The new --per-cluster option was added recently but it forgot to update
the aggr_header fields which are used for --metric-only option. And it
resulted in a segfault due to NULL string in fputs().
Fixes: cbc917a1b03b ("perf stat: Support per-cluster aggregation")
Reviewed-by: Yicong Yang <[email protected]>
Tested-by: Yicong Yang <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
Arm PMUs have a suffix, either a single decimal (armv8_pmuv3_0) or 3 hex
digits which (armv8_cortex_a53) which Perf assumes are both strippable
suffixes for the purposes of deduplication. S390 "cpum_cf" is a
similarly suffixed core PMU but is only two characters so is not treated
as strippable because the rules are a minimum of 3 hex characters or 1
decimal character.
There are two paths involved in listing PMU events:
* HW/cache event printing assumes core PMUs don't have suffixes so
doesn't try to strip.
* Sysfs PMU events share the printing function with uncore PMUs which
strips.
This results in slightly inconsistent Perf list behavior if a core PMU
has a suffix:
# perf list
...
armv8_pmuv3_0/branch-load-misses/
armv8_pmuv3/l3d_cache_wb/ [Kernel PMU event]
...
Fix it by partially reverting back to the old list behavior where
stripping was only done for uncore PMUs. For example commit 8d9f5146f5da
("perf pmus: Sort pmus by name then suffix") mentions that only PMUs
starting 'uncore_' are considered to have a potential suffix. This
change doesn't go back that far, but does only strip PMUs that are
!is_core. This keeps the desirable behavior where the many possibly
duplicated uncore PMUs aren't repeated, but it doesn't break listing for
core PMUs.
Searching for a PMU continues to use the new stripped comparison
functions, meaning that it's still possible to request an event by
specifying the common part of a PMU name, or even open events on
multiple similarly named PMUs. For example:
# perf stat -e armv8_cortex/inst_retired/
5777173628 armv8_cortex_a53/inst_retired/ (99.93%)
7469626951 armv8_cortex_a57/inst_retired/ (49.88%)
Fixes: 3241d46f5f54 ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu")
Suggested-by: Ian Rogers <[email protected]>
Signed-off-by: James Clark <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Commit b2b9d3a3f021 ("perf pmu: Support wildcards on pmu name in dynamic
pmu events") gives the following example for wildcarding a subset of
PMUs:
E.g., in a system with the following dynamic pmus:
mypmu_0
mypmu_1
mypmu_2
mypmu_4
perf stat -e mypmu_[01]/<config>/
Since commit f91fa2ae6360 ("perf pmu: Refactor perf_pmu__match()"), only
"*" has been supported, removing the ability to subset PMUs, even though
parse-events.l still supports ? and [] characters.
Fix it by using fnmatch() when any glob character is detected and add a
test which covers that and other scenarios of
perf_pmu__match_ignoring_suffix().
Fixes: f91fa2ae6360 ("perf pmu: Refactor perf_pmu__match()")
Signed-off-by: James Clark <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
It's possible to save pipe output of perf record into a file.
$ perf record -o- ... > pipe.data
And you can use the data same as the normal perf data.
$ perf report -i pipe.data
In that case, perf tools will treat the input as a pipe, but it can get
the total size of the input. This means it can show the progress bar
unlike the normal pipe input (which doesn't know the total size in
advance).
While at it, fix the string in __perf_session__process_dir_events().
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The python build now depends on libraries and doesn't use
python-ext-sources except for the util/python.c dependency. Switch to
just directly depending on that file and util/setup.py. This allows
the removal of python-ext-sources.
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
setup.py was building most perf sources causing setup.py to mimic the
Makefile logic as well as flex/bison code to be stubbed out, due to
complexity building. By using libraries fewer functions are stubbed
out, the build is faster and the Makefile logic is reused which should
simplify updating. The libraries are passed through LDFLAGS to avoid
complexity in python.
Force the -fPIC flag for libbpf.a to ensure it is suitable for linking
into the perf python module.
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Make the util directory into its own library. This is done to avoid
compiling code twice, once for the perf tool and once for the perf
python module. For convenience:
arch/common.c
scripts/perl/Perf-Trace-Util/Context.c
scripts/python/Perf-Trace-Util/Context.c
are made part of this library.
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Guilherme reported a crash in perf mem record. It's because the
perf_mem_event->name was NULL on his machine. It should just return
a NULL string when it has no format string in the name.
The backtrace at the crash is below:
Program received signal SIGSEGV, Segmentation fault.
__strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67
67 vmovdqu (%rdi), %ymm2
(gdb) bt
#0 __strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67
#1 0x00007ffff6c982de in __find_specmb (format=0x0) at printf-parse.h:82
#2 __printf_buffer (buf=buf@entry=0x7fffffffc760, format=format@entry=0x0, ap=ap@entry=0x7fffffffc880,
mode_flags=mode_flags@entry=0) at vfprintf-internal.c:649
#3 0x00007ffff6cb7840 in __vsnprintf_internal (string=<optimized out>, maxlen=<optimized out>, format=0x0,
args=0x7fffffffc880, mode_flags=mode_flags@entry=0) at vsnprintf.c:96
#4 0x00007ffff6cb787f in ___vsnprintf (string=<optimized out>, maxlen=<optimized out>, format=<optimized out>,
args=<optimized out>) at vsnprintf.c:103
#5 0x00005555557b9391 in scnprintf (buf=0x555555fe9320 <mem_loads_name> "", size=100, fmt=0x0)
at ../lib/vsprintf.c:21
#6 0x00005555557b74c3 in perf_pmu__mem_events_name (i=0, pmu=0x555556832180) at util/mem-events.c:106
#7 0x00005555557b7ab9 in perf_mem_events__record_args (rec_argv=0x55555684c000, argv_nr=0x7fffffffca20)
at util/mem-events.c:252
#8 0x00005555555e370d in __cmd_record (argc=3, argv=0x7fffffffd760, mem=0x7fffffffcd80) at builtin-mem.c:156
#9 0x00005555555e49c4 in cmd_mem (argc=4, argv=0x7fffffffd760) at builtin-mem.c:514
#10 0x000055555569716c in run_builtin (p=0x555555fcde80 <commands+672>, argc=8, argv=0x7fffffffd760) at perf.c:349
#11 0x0000555555697402 in handle_internal_command (argc=8, argv=0x7fffffffd760) at perf.c:402
#12 0x0000555555697560 in run_argv (argcp=0x7fffffffd59c, argv=0x7fffffffd590) at perf.c:446
#13 0x00005555556978a6 in main (argc=8, argv=0x7fffffffd760) at perf.c:562
Reported-by: Guilherme Amadio <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Closes: https://lore.kernel.org/linux-perf-users/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
A compiler warning on the second argument of bsearch() should not be
NULL, but there's a case we might pass it. Let's return early if we
don't have any DSOs to search in __dsos__find_by_longname_id().
util/dsos.c:184:8: runtime error: null pointer passed as argument 2, which is declared to never be null
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In dso__load(), it checks if the dso is a kernel module by looking the
symtab type. Actually dso has 'is_kmod' field to check that easily and
dso__set_module_info() set the symtab type and the is_kmod bit. So it
should have the same result to check the is_kmod bit.
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
It's expected that both hist entries are in the same hists when
comparing two. But the current code in the function checks one without
dso sort key and other with the key. This would make the condition true
in any case.
I guess the intention of the original commit was to add '!' for the
right side too. But as it should be the same, let's just remove it.
Fixes: 69849fc5d2119 ("perf hists: Move sort__has_dso into struct perf_hpp_list")
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In the previous loop, all the members in the aliases[j-1] have been freed
and set to NULL. But in this loop, the function pmu_alias_is_duplicate()
compares the aliases[j] with the aliases[j-1] that has already been
disposed, so the function will always return false and duplicate aliases
will never be discarded.
If we find duplicate aliases, it skips the zfree aliases[j], which is
accompanied by a memory leak.
We can use the next aliases[j+1] to theck for duplicate aliases to
fixes the aliases NULL pointer dereference, then goto zfree code snippet
to release it.
After patch testing:
$ perf list --unit=hisi_sicl,cpa pmu
uncore cpa:
cpa_p0_rd_dat_32b
[Number of read ops transmitted by the P0 port which size is 32 bytes.
Unit: hisi_sicl,cpa]
cpa_p0_rd_dat_64b
[Number of read ops transmitted by the P0 port which size is 64 bytes.
Unit: hisi_sicl,cpa]
Fixes: c3245d2093c1 ("perf pmu: Abstract alias/event struct")
Signed-off-by: Junhao He <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add malloc() failure handling in unread_unwind_spec_debug_frame().
This make caller find_proc_info() works well when the allocation failure.
Signed-off-by: Yunseong Kim <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Austin Kim <[email protected]>
Cc: [email protected]
Cc: Ze Gao <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This patch resolve following warning.
tools/perf/util/evsel.c:1620:9: error: result of comparison of constant
-1 with expression of type 'char' is always false
-Werror,-Wtautological-constant-out-of-range-compare
1620 | if (c == -1)
| ~ ^ ~~
Signed-off-by: Yunseong Kim <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Austin Kim <[email protected]>
Cc: [email protected]
Cc: Ze Gao <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
JMPABS is 64-bit absolute direct jump instruction, encoded with a mandatory
REX2 prefix. JMPABS is designed to be used in the procedure linkage table
(PLT) to replace indirect jumps, because it has better performance. In that
case the jump target will be amended at run time. To enable Intel PT to
follow the code, a TIP packet is always emitted when JMPABS is traced under
Intel PT.
Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.
Decode JMPABS as an indirect jump, because it has an associated TIP packet
the same as an indirect jump and the control flow should follow the TIP
packet payload, and not assume it is the same as the on-file object code
JMPABS target address.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chang S. Bae <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
pattern /tmp/perf-%d.map
commit 80d496be89ed ("perf report: Add support for profiling JIT
generated code") added support for profiling JIT generated code.
This patch handles dso's of form "/tmp/perf-$PID.map".
Some of the references doesn't check exactly for same pattern.
some uses "if (!strncmp(dso_name, "/tmp/perf-", 10))". Fix
this by using helper function perf_pid_map_tid and
is_perf_pid_map_name which looks for proper pattern of
form: "/tmp/perf-$PID.map" for these checks.
Signed-off-by: Athira Rajeev <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Perf test for perf probe of function from different CU fails
as below:
./perf test -vv "test perf probe of function from different CU"
116: test perf probe of function from different CU:
--- start ---
test child forked, pid 2679
Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.Msa7iy89bx/testfile
Error: Failed to add events.
--- Cleaning up ---
"foo" does not hit any event.
Error: Failed to delete events.
---- end(-1) ----
116: test perf probe of function from different CU : FAILED!
The test does below to probe function "foo" :
# gcc -g -Og -flto -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.c
-o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o
# gcc -g -Og -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.c
-o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o
# gcc -g -Og -o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o
# ./perf probe -x /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile foo
Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile
Error: Failed to add events.
Perf probe fails to find symbol foo in the executable placed in
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7
Simple reproduce:
# mktemp -d /tmp/perf-checkXXXXXXXXXX
/tmp/perf-checkcWpuLRQI8j
# gcc -g -o test test.c
# cp test /tmp/perf-checkcWpuLRQI8j/
# nm /tmp/perf-checkcWpuLRQI8j/test | grep foo
00000000100006bc T foo
# ./perf probe -x /tmp/perf-checkcWpuLRQI8j/test foo
Failed to find symbol foo in /tmp/perf-checkcWpuLRQI8j/test
Error: Failed to add events.
But it works with any files like /tmp/perf/test. Only for
patterns with "/tmp/perf-", this fails.
Further debugging, commit 80d496be89ed ("perf report: Add support
for profiling JIT generated code") added support for profiling JIT
generated code. This patch handles dso's of form
"/tmp/perf-$PID.map" .
The check used "if (strncmp(self->name, "/tmp/perf-", 10) == 0)"
to match "/tmp/perf-$PID.map". With this commit, any dso in
/tmp/perf- folder will be considered separately for processing
(not only JIT created map files ). Fix this by changing the
string pattern to check for "/tmp/perf-%d.map". Add a helper
function is_perf_pid_map_name to do this check. In "struct dso",
dso->long_name holds the long name of the dso file. Since the
/tmp/perf-$PID.map check uses the complete name, use dso___long_name for
the string name.
With the fix,
# ./perf test "test perf probe of function from different CU"
117: test perf probe of function from different CU : Ok
Fixes: 56cbeacf1435 ("perf probe: Add test for regression introduced by switch to die_get_decl_file()")
Signed-off-by: Athira Rajeev <[email protected]>
Reviewed-by: Chaitanya S Prakash <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The hard-coded metrics is wrongly calculated on the hybrid machine.
$ perf stat -e cycles,instructions -a sleep 1
Performance counter stats for 'system wide':
18,205,487 cpu_atom/cycles/
9,733,603 cpu_core/cycles/
9,423,111 cpu_atom/instructions/ # 0.52 insn per cycle
4,268,965 cpu_core/instructions/ # 0.23 insn per cycle
The insn per cycle for cpu_core should be 4,268,965 / 9,733,603 = 0.44.
When finding the metric events, the find_stat() doesn't take the PMU
type into account. The cpu_atom/cycles/ is wrongly used to calculate
the IPC of the cpu_core.
In the hard-coded metrics, the events from a different PMU are only
SW_CPU_CLOCK and SW_TASK_CLOCK. They both have the stat type,
STAT_NSECS. Except the SW CLOCK events, check the PMU type as well.
Fixes: 0a57b910807a ("perf stat: Use counts rather than saved_value")
Reported-by: Khalil, Amiri <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
So that it can skip events with no sample according to the config value.
This can omit the dummy event in the output of perf report --group.
An example output:
$ sudo perf mem record -a sleep 1
$ sudo perf report --group
Before)
#
# Samples: 232 of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u'
# Event count (approx.): 3089861
#
# Overhead Command Shared Object Symbol
# ........................ ........... ................. .....................................
#
9.29% 0.00% 0.00% swapper [kernel.kallsyms] [k] update_blocked_averages
5.26% 0.15% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_se
4.15% 0.00% 0.00% perf-exec [kernel.kallsyms] [k] slab_update_freelist.isra.0
3.87% 0.00% 0.00% perf-exec [kernel.kallsyms] [k] memcg_slab_post_alloc_hook
3.79% 0.17% 0.00% swapper [kernel.kallsyms] [k] enqueue_task_fair
3.63% 0.00% 0.00% sleep [kernel.kallsyms] [k] next_uptodate_page
2.86% 0.00% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_cfs_rq
2.78% 0.00% 0.00% swapper [kernel.kallsyms] [k] __schedule
2.34% 0.00% 0.00% swapper [kernel.kallsyms] [k] intel_idle
2.32% 0.97% 0.00% swapper [kernel.kallsyms] [k] psi_group_change
After)
#
# Samples: 232 of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P'
# Event count (approx.): 3089861
#
# Overhead Command Shared Object Symbol
# ................ ........... ................. .....................................
#
9.29% 0.00% swapper [kernel.kallsyms] [k] update_blocked_averages
5.26% 0.15% swapper [kernel.kallsyms] [k] __update_load_avg_se
4.15% 0.00% perf-exec [kernel.kallsyms] [k] slab_update_freelist.isra.0
3.87% 0.00% perf-exec [kernel.kallsyms] [k] memcg_slab_post_alloc_hook
3.79% 0.17% swapper [kernel.kallsyms] [k] enqueue_task_fair
3.63% 0.00% sleep [kernel.kallsyms] [k] next_uptodate_page
2.86% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_cfs_rq
2.78% 0.00% swapper [kernel.kallsyms] [k] __schedule
2.34% 0.00% swapper [kernel.kallsyms] [k] intel_idle
2.32% 0.97% swapper [kernel.kallsyms] [k] psi_group_change
Now it doesn't have a column for the dummy event.
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add the skip_empty flag to symbol_conf and set the value from the report
command to preserve the existing behavior. This makes the code simpler
and will be needed other code which is hard to add a new argument.
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Tool events unnecessarily open a dummy perf event which is useless
even with `perf record` which will still open a dummy event. Change
the behavior of tool events so:
- duration_time - call `rdclock` on open and then report the count as
a delta since the start in evsel__read_counter. This moves code out
of builtin-stat making it more general purpose.
- user_time/system_time - open the fd as either `/proc/pid/stat` or
`/proc/stat` for cases like system wide. evsel__read_counter will
read the appropriate field out of the procfs file. These values
were previously supplied by wait4, if the procfs read fails then
the wait4 values are used, assuming the process/thread terminated.
By reading user_time and system_time this way, interval mode, per
PID and per CPU can be supported although there are restrictions
given what the files provide (e.g. per PID can't be combined with
per CPU).
Opening any of the tool events for `perf record` is changed to return
invalid.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Weilin Wang <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: James Clark <[email protected]>
Cc: Dmitrii Dolgov <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Data may have lots of overlapping mmaps. The regular insert adds at
the end and relies on a later sort. For data with overlapping mappings
the sort will happen during a subsequent maps__find or
__maps__fixup_overlap_and_insert, there's never a period where the
inserted maps buffer up and a single sort happens. To avoid back to
back sorts, maintain the sort order when fixing up and
inserting. Previously the first_ending_after search was O(log n) where
n is the size of maps, and the insert was O(1) but because of the
continuous sorting was becoming O(n*log(n)). With maintaining sort
order, the insert now becomes O(n) for a memmove.
For a perf report on a perf.data file containing overlapping mappings
the time numbers are:
Before:
real 0m5.894s
user 0m5.650s
sys 0m0.231s
After:
real 0m0.675s
user 0m0.454s
sys 0m0.196s
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Steinar H . Gunderson <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When an 'after' map is generated the 'new' map must be before it so
terminate iterating and don't resort. If the entry 'pos' is entirely
overlapped by the 'new' mapping then don't remove and insert the
mapping, just replace - again to remove sorting.
For a perf report on a perf.data file containing overlapping mappings
the time numbers are:
Before:
real 0m9.856s
user 0m9.637s
sys 0m0.204s
After:
real 0m5.894s
user 0m5.650s
sys 0m0.231s
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Steinar H . Gunderson <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In the case 'before' and 'after' are broken out from pos,
maps_by_address may be changed by __maps__insert, as such it needs
re-reading.
Don't ignore the return value from __maps_insert.
Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Steinar H . Gunderson <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Compiling perf tool with 'DEBUG_PARSER=1' leads to errors:
$> make -C tools/perf PARSER_DEBUG=1 NO_LIBTRACEEVENT=1
...
CC util/expr-flex.o
CC util/expr.o
util/parse-events.c:33:12: error: redundant redeclaration of ‘parse_events_debug’ [-Werror=redundant-decls]
33 | extern int parse_events_debug;
| ^~~~~~~~~~~~~~~~~~
In file included from util/parse-events.c:18:
util/parse-events-bison.h:43:12: note: previous declaration of ‘parse_events_debug’ with type ‘int’
43 | extern int parse_events_debug;
| ^~~~~~~~~~~~~~~~~~
util/expr.c:27:12: error: redundant redeclaration of ‘expr_debug’ [-Werror=redundant-decls]
27 | extern int expr_debug;
| ^~~~~~~~~~
In file included from util/expr.c:11:
util/expr-bison.h:43:12: note: previous declaration of ‘expr_debug’ with type ‘int’
43 | extern int expr_debug;
| ^~~~~~~~~~
cc-1: all warnings being treated as errors
Remove extern declaration from the parse-envents.c file as there is a
conflict with the ones generated using bison and yacc tools from the file
parse-events.[ly].
Signed-off-by: Clément Le Goffic <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: John Garry <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
'hisi_ptt_queue' has been unused since the original
commit 5e91e57e6809 ("perf auxtrace arm64: Add support for parsing
HiSilicon PCIe Trace packet").
Remove it.
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|