While rendering the annotate browser from the perf report TUI, we keep track
of the total number of lines (asm + source) in annotation->nr_entries and
the total number of asm lines in annotation->nr_asm_entries. But we don't
reset them before starting. Thus, if the user annotates the same function
multiple times, we resume incrementing these fields from their old values.
This causes a segfault when the user tries to toggle source code after
annotating the same function multiple times. Fix it.
Signed-off-by: Ravi Bangoria <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
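The reset described above can be sketched roughly as follows. This is a minimal illustration, not the actual perf code: struct annotation_counts and count_entries() are hypothetical stand-ins for the two counters named in the message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two counters named in the commit message. */
struct annotation_counts {
	int nr_entries;      /* asm + source lines */
	int nr_asm_entries;  /* asm lines only */
};

/*
 * Recount lines for one annotate pass. is_asm[i] is non-zero when line i
 * is an asm line. The counters are zeroed first, so annotating the same
 * function again does not accumulate on top of the previous totals.
 */
static void count_entries(struct annotation_counts *c,
			  const int *is_asm, size_t nr_lines)
{
	c->nr_entries = 0;
	c->nr_asm_entries = 0;

	for (size_t i = 0; i < nr_lines; i++) {
		c->nr_entries++;
		if (is_asm[i])
			c->nr_asm_entries++;
	}
}
```

Without the two assignments at the top, a second call on the same function would double both totals, and any code that uses them as bounds would then index past the real line list.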
|
|
Align fields of struct annotate_args.
Signed-off-by: Ravi Bangoria <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We are allocating the disasm_line object in annotation_line__new() instead
of disasm_line__new(). Similarly, annotation_line__delete() is actually
freeing the disasm_line object as well. This complexity exists because of
privsize. But we don't need privsize anymore, so get rid of it and
simplify the disasm_line allocation and freeing code.
Signed-off-by: Ravi Bangoria <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
privsize is passed as 0 from all the symbol__annotate() callers.
Remove it from argument list.
Signed-off-by: Ravi Bangoria <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
strlist__add() may fail with -ENOMEM. Check it and give debugging hint
in advance.
Signed-off-by: He Zhe <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
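As a rough illustration of the pattern, the sketch below checks the result of an add operation and prints a hint when it fails. The list type and both functions are hypothetical; this is not perf's struct strlist or strlist__add().

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string list; not perf's struct strlist. */
struct str_list {
	char **items;
	size_t nr;
};

/* Append a copy of s. Returns 0 on success or -ENOMEM if allocation fails. */
static int str_list_add(struct str_list *list, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	char **grown;

	if (copy == NULL)
		return -ENOMEM;
	memcpy(copy, s, len);

	grown = realloc(list->items, (list->nr + 1) * sizeof(*grown));
	if (grown == NULL) {
		free(copy);
		return -ENOMEM;
	}
	grown[list->nr++] = copy;
	list->items = grown;
	return 0;
}

/* Caller side: check the result and say what failed before bailing out. */
static int add_or_report(struct str_list *list, const char *s)
{
	int ret = str_list_add(list, s);

	if (ret < 0)
		fprintf(stderr, "failed to add '%s' to list: %s\n",
			s, strerror(-ret));
	return ret;
}
```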
|
|
While documenting annotate.show_nr_samples config option, I found many
other config options missing in perf-config documentation. Add them.
Signed-off-by: Ravi Bangoria <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yisheng Xie <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The perf config documentation says the annotate options work only with the
TUI, which is wrong. Most of the TUI options are applicable to stdio2 as
well. So remove that generic line and add an individual line to each option
stating which browsers support that option. Also, the
annotate.show_nr_samples config is missing from the documentation. Describe it.
Signed-off-by: Ravi Bangoria <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yisheng Xie <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For all the perf-config options that can also be set from a command line
option, preference is given to the command line version in case of any
conflict. But it's the opposite in the case of perf annotate, i.e.
preference is given to the default option rather than the command line
option. Fix it.
Before:
$ ./perf config
annotate.show_nr_samples=false
$ ./perf annotate shash --show-nr-samples
Percent│
│24: mov -0xc(%rbp),%eax
49.19 │ imul $0x1003f,%eax,%ecx
│ mov -0x18(%rbp),%rax
After:
Samples│
│24: mov -0xc(%rbp),%eax
1 │ imul $0x1003f,%eax,%ecx
│ mov -0x18(%rbp),%rax
Signed-off-by: Ravi Bangoria <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yisheng Xie <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
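The intended precedence can be sketched as below. This is only an illustration under assumptions: the tri-state enum and resolve_option() are hypothetical and are not how perf stores its options.

```c
#include <stdbool.h>

/* Hypothetical tri-state: was the flag given on the command line? */
enum cli_flag { CLI_UNSET, CLI_FALSE, CLI_TRUE };

/*
 * Pick the effective value of one option. An explicit command line
 * setting always wins; the config file value is only a fallback.
 */
static bool resolve_option(enum cli_flag cli, bool config_value)
{
	if (cli != CLI_UNSET)
		return cli == CLI_TRUE;
	return config_value;
}
```

With annotate.show_nr_samples=false in the config and --show-nr-samples on the command line, this corresponds to resolve_option(CLI_TRUE, false), which yields true, matching the "After" output above.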
|
|
perf default config set by user in [annotate] section is totally ignored
by annotate code. Fix it.
Before:
$ ./perf config
annotate.hide_src_code=true
annotate.show_nr_jumps=true
annotate.show_nr_samples=true
$ ./perf annotate shash
│ unsigned h = 0;
│ movl $0x0,-0xc(%rbp)
│ while (*s)
│ ↓ jmp 44
│ h = 65599 * h + *s++;
11.33 │24: mov -0xc(%rbp),%eax
43.50 │ imul $0x1003f,%eax,%ecx
│ mov -0x18(%rbp),%rax
After:
│ movl $0x0,-0xc(%rbp)
│ ↓ jmp 44
1 │1 24: mov -0xc(%rbp),%eax
4 │ imul $0x1003f,%eax,%ecx
│ mov -0x18(%rbp),%rax
Note that we have removed show_nr_samples and show_total_period from
annotation_options because they are not used. Instead of them we use
symbol_conf.show_nr_samples and symbol_conf.show_total_period.
Committer testing:
Using 'perf annotate --stdio2' to use the TUI rendering but emitting the output to stdio:
# perf config
#
# perf config annotate.hide_src_code=true
# perf config
annotate.hide_src_code=true
#
# perf config annotate.show_nr_jumps=true
# perf config annotate.show_nr_samples=true
# perf config
annotate.hide_src_code=true
annotate.show_nr_jumps=true
annotate.show_nr_samples=true
#
#
Before:
# perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized
Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
Percent
00000000000609f0 <ObjectInstance::weak_pointer_was_finalized()@@Base>:
endbr64
cmpq $0x0,0x20(%rdi)
↓ je 10
xor %eax,%eax
← retq
xchg %ax,%ax
100.00 10: push %rbp
cmpq $0x0,0x18(%rdi)
mov %rdi,%rbp
↓ jne 20
1b: xor %eax,%eax
pop %rbp
← retq
nop
20: lea 0x18(%rdi),%rdi
→ callq JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
cmpq $0x0,0x18(%rbp)
↑ jne 1b
mov %rbp,%rdi
→ callq ObjectBase::jsobj_addr() const@plt
mov $0x1,%eax
pop %rbp
← retq
#
After:
# perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
Samples endbr64
cmpq $0x0,0x20(%rdi)
↓ je 10
xor %eax,%eax
← retq
xchg %ax,%ax
1 1 10: push %rbp
cmpq $0x0,0x18(%rdi)
mov %rdi,%rbp
↓ jne 20
1 1b: xor %eax,%eax
pop %rbp
← retq
nop
1 20: lea 0x18(%rdi),%rdi
→ callq JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
cmpq $0x0,0x18(%rbp)
↑ jne 1b
mov %rbp,%rdi
→ callq ObjectBase::jsobj_addr() const@plt
mov $0x1,%eax
pop %rbp
← retq
#
# perf config annotate.show_nr_jumps
annotate.show_nr_jumps=true
# perf config annotate.show_nr_jumps=false
# perf config annotate.show_nr_jumps
annotate.show_nr_jumps=false
#
# perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
Samples endbr64
cmpq $0x0,0x20(%rdi)
↓ je 10
xor %eax,%eax
← retq
xchg %ax,%ax
1 10: push %rbp
cmpq $0x0,0x18(%rdi)
mov %rdi,%rbp
↓ jne 20
1b: xor %eax,%eax
pop %rbp
← retq
nop
20: lea 0x18(%rdi),%rdi
→ callq JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
cmpq $0x0,0x18(%rbp)
↑ jne 1b
mov %rbp,%rdi
→ callq ObjectBase::jsobj_addr() const@plt
mov $0x1,%eax
pop %rbp
← retq
#
Signed-off-by: Ravi Bangoria <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yisheng Xie <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Introduce perf_config_u8() utility function to convert char * input into
u8 destination. We will utilize it in followup patch.
Signed-off-by: Ravi Bangoria <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yisheng Xie <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
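A string to u8 conversion of this kind might look like the sketch below. The name parse_u8() and its signature are assumptions made for illustration; the real perf_config_u8() may differ in arguments and error reporting.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a config value string into an 8-bit unsigned destination.
 * Base 0 lets strtoul() accept decimal, 0x-prefixed hex and
 * 0-prefixed octal. Returns 0 on success, -1 on empty input,
 * trailing garbage, or a value that does not fit in a uint8_t.
 */
static int parse_u8(const char *value, uint8_t *dest)
{
	char *end;
	unsigned long n;

	if (value == NULL || *value == '\0')
		return -1;

	errno = 0;
	n = strtoul(value, &end, 0);
	if (errno != 0 || *end != '\0' || n > UINT8_MAX)
		return -1;

	*dest = (uint8_t)n;
	return 0;
}
```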
|
|
perf annotate --show-nr-samples does not really show number of samples.
The reason is we have two separate variables for the same purpose.
One is in symbol_conf.show_nr_samples and another is
annotation_options.show_nr_samples.
We save the command line option in symbol_conf.show_nr_samples but use
annotation_options.show_nr_samples while rendering the tui/stdio2 browser.
We do copy symbol_conf.show_nr_samples to
annotation__default_options.show_nr_samples, but that is not really
effective as we don't use annotation__default_options once we copy the
default options to the dynamic variable annotate.opts in cmd_annotate().
Instead of all this complication, keep only one variable and use it
everywhere. symbol_conf.show_nr_samples is used by perf report/top as well.
So let's kill annotation_options.show_nr_samples.
On a side note, I've kept annotation_options.show_nr_samples definition
because it's still used by perf-config code. Follow up patch to fix
perf-config for annotate will remove annotation_options.show_nr_samples.
Signed-off-by: Ravi Bangoria <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yisheng Xie <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf annotate --show-total-period does not really show total period.
The reason is we have two separate variables for the same purpose.
One is in symbol_conf.show_total_period and another is
annotation_options.show_total_period.
We save the command line option in symbol_conf.show_total_period but use
annotation_options.show_total_period while rendering the tui/stdio2 browser.
We do copy symbol_conf.show_total_period to
annotation__default_options.show_total_period, but that is not really
effective as we don't use annotation__default_options once we copy the
default options to the dynamic variable annotate.opts in cmd_annotate().
Instead of all this complication, keep only one variable and use it
everywhere. symbol_conf.show_total_period is used by perf report/top as well.
So let's kill annotation_options.show_total_period.
On a side note, I've kept annotation_options.show_total_period
definition because it's still used by perf-config code. Follow up patch
to fix perf-config for annotate will remove
annotation_options.show_total_period.
Signed-off-by: Ravi Bangoria <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yisheng Xie <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The 'perf annotate' TUI browser provides a 'r' hot key to switch to a
script browser. But the annotate browser title bar becomes hidden while
switching back from script browser. Fix it.
Signed-off-by: Ravi Bangoria <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yisheng Xie <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Copy over powerpc syscall.tbl to grab changes from the below commits
fddb5d430ad9 ("open: introduce openat2(2) syscall")
9a2cef09c801 ("arch: wire up pidfd_getfd syscall")
Now 'perf trace' on powerpc will be able to map from those syscall
strings to the right syscall numbers, i.e.
perf trace -e pidfd*
Will include 'pidfd_getfd' as well as:
perf trace -e open*
Will cover all 'open' variants.
Reported-by: Stephen Rothwell <[email protected]>
Reviewed-by: Ravi Bangoria <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Sargun Dhillon <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
All ->read_finish() implementations are doing the same thing. Add a
helper function so that they can share the same implementation.
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
Tested-by: Leo Yan <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Wei Li <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In __cmd_record(), when receiving SIGINT (ctrl + c), a 'done' flag will
be set and the event list will be disabled by evlist__disable() once.
However, in auxtrace_record.read_finish(), the related events will be
enabled again. If they are continuous, the recording never seems to
end.
If the event is disabled, don't enable it again here.
Based-on-patch-by: Wei Li <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Tan Xiaojun <[email protected]>
Cc: [email protected] # 5.4+
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
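The check can be sketched as follows. This is a simplified illustration under assumptions: struct event, its fields and read_finish() are hypothetical and stand in for perf's evsel handling, and incrementing a counter stands in for actually re-enabling the event.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event; not perf's struct evsel. */
struct event {
	int  pmu_type;      /* which PMU the event belongs to */
	bool disabled;      /* set once the event list has been disabled */
	int  enable_calls;  /* counts re-enables, for illustration only */
};

/*
 * Shared "read finish" step: find the event for the given PMU and
 * re-enable it only if it has not been disabled in the meantime.
 * Returns 0 on success, -1 if no event of that PMU type exists.
 */
static int read_finish(struct event *events, size_t nr, int pmu_type)
{
	for (size_t i = 0; i < nr; i++) {
		struct event *ev = &events[i];

		if (ev->pmu_type != pmu_type)
			continue;
		if (ev->disabled)
			return 0;   /* recording is stopping: leave it off */
		ev->enable_calls++; /* stands in for re-enabling the event */
		return 0;
	}
	return -1;
}
```

Returning early for a disabled event is what lets a SIGINT actually end the recording instead of having the event switched back on after every read.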
|
|
In __cmd_record(), when receiving SIGINT (ctrl + c), a 'done' flag will
be set and the event list will be disabled by evlist__disable() once.
However, in auxtrace_record.read_finish(), the related events will be
enabled again. If they are continuous, the recording never seems to
end.
If the cs_etm event is disabled, we don't enable it again here.
Note: This patch is NOT tested since I don't have a machine with the
coresight feature, but the code seems buggy in the same way as arm-spe and
intel-pt.
Tester notes:
Thanks for looping, Adrian. Applied this patch and tested with
CoreSight on juno board, it works well.
Signed-off-by: Wei Li <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Tan Xiaojun <[email protected]>
Cc: [email protected] # 5.4+
Link: http://lore.kernel.org/lkml/[email protected]
[ahunter: removed redundant 'else' after 'return']
Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In __cmd_record(), when receiving SIGINT (ctrl + c), a 'done' flag will
be set and the event list will be disabled by evlist__disable() once.
However, in auxtrace_record.read_finish(), the related events will be
enabled again. If they are continuous, the recording never seems to
end.
If the intel_bts event is disabled, we don't enable it again here.
Note: This patch is NOT tested since I don't have a machine with the
intel_bts feature, but the code seems buggy in the same way as arm-spe and
intel-pt.
Signed-off-by: Wei Li <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Tan Xiaojun <[email protected]>
Cc: [email protected] # 5.4+
Link: http://lore.kernel.org/lkml/[email protected]
[ahunter: removed redundant 'else' after 'return']
Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In __cmd_record(), when receiving SIGINT (ctrl + c), a 'done' flag will
be set and the event list will be disabled by evlist__disable() once.
However, in auxtrace_record.read_finish(), the related events will be
enabled again. If they are continuous, the recording never seems to end.
If the intel_pt event is disabled, we don't enable it again here.
Before the patch:
huawei@huawei-2288H-V5:~/linux-5.5-rc4/tools/perf$ ./perf record -e \
intel_pt//u -p 46803
^C^C^C^C^C^C
After the patch:
huawei@huawei-2288H-V5:~/linux-5.5-rc4/tools/perf$ ./perf record -e \
intel_pt//u -p 48591
^C[ perf record: Woken up 0 times to write data ]
Warning:
AUX data lost 504 times out of 4816!
[ perf record: Captured and wrote 2024.405 MB perf.data ]
Signed-off-by: Wei Li <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Tan Xiaojun <[email protected]>
Cc: [email protected] # 5.4+
Link: http://lore.kernel.org/lkml/[email protected]
[ ahunter: removed redundant 'else' after 'return' ]
Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This test places a kprobe to function getname_flags() in the kernel
which has the following prototype:
struct filename *getname_flags(const char __user *filename, int flags, int *empty)
The 'filename' argument points to a filename located in user space memory.
Looking at commit 88903c464321c ("tracing/probe: Add ustring type for
user-space string") the kprobe should indicate that user space memory is
accessed.
Output before:
[root@m35lp76 perf]# ./perf test 66 67
66: Use vfs_getname probe to get syscall args filenames : FAILED!
67: Check open filename arg using perf trace + vfs_getname: FAILED!
[root@m35lp76 perf]#
Output after:
[root@m35lp76 perf]# ./perf test 66 67
66: Use vfs_getname probe to get syscall args filenames : Ok
67: Check open filename arg using perf trace + vfs_getname: Ok
[root@m35lp76 perf]#
Comments from Masami Hiramatsu:
This bug doesn't happen on x86 or other archs on which user address
space and kernel address space is the same. On some arches (ppc64 in
this case?) user address space is partially or completely the same as
kernel address space.
(Yes, they switch the world when running into the kernel) In this case,
we need to use different data access functions for each space.
That is why I introduced the "ustring" type for kprobe events.
As far as I can see, Thomas's patch is sane. Thomas, could you show us
your result on your test environment?
Comments from Thomas Richter:
Test results for s/390 included above.
Signed-off-by: Thomas Richter <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The bpf.h file needed gets installed in /usr/lib/include/perf/bpf/bpf.h,
and /usr/lib/include/perf/ is added to the include path passed to clang
to build the eBPF bytecode, so just remove "bpf/"; it's directly in the
path passed already. This was working by accident; fix it.
I.e. now this is back working:
# cat /home/acme/git/perf/tools/perf/examples/bpf/hello.c
#include <stdio.h>
int syscall_enter(openat)(void *args)
{
puts("Hello, world\n");
return 0;
}
license(GPL);
# perf trace -e /home/acme/git/perf/tools/perf/examples/bpf/hello.c
0.000 pickup/21493 __bpf_stdout__(Hello, world)
56.462 sh/13539 __bpf_stdout__(Hello, world)
56.536 sh/13539 __bpf_stdout__(Hello, world)
56.673 sh/13539 __bpf_stdout__(Hello, world)
56.781 sh/13539 __bpf_stdout__(Hello, world)
56.707 perf/13182 __bpf_stdout__(Hello, world)
56.849 perf/13182 __bpf_stdout__(Hello, world)
^C
#
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
kbuild
Before this patch:
# ./perf test 39 41
39: LLVM search and compile :
39.1: Basic BPF llvm compile : Ok
39.2: kbuild searching : FAILED!
39.3: Compile source for BPF prologue generation : Skip
39.4: Compile source for BPF relocation : Skip
41: BPF filter :
41.1: Basic BPF filtering : Ok
41.2: BPF pinning : Ok
41.3: BPF prologue generation : FAILED!
41.4: BPF relocation checker : Skip
#
Using 'perf test -v' for these tests shows that it is not finding
uapi/linux/fs.h, which turns out to be because we don't set up the right
header path. Fix it.
After this patch:
# perf test 39 41
39: LLVM search and compile :
39.1: Basic BPF llvm compile : Ok
39.2: kbuild searching : Ok
39.3: Compile source for BPF prologue generation : Ok
39.4: Compile source for BPF relocation : Ok
41: BPF filter :
41.1: Basic BPF filtering : Ok
41.2: BPF pinning : Ok
41.3: BPF prologue generation : Ok
41.4: BPF relocation checker : Ok
#
Longer description:
In llvm-utils.c we use some techniques to obtain the kbuild make
directives and that recently stopped working as now 'ar' gets called and
expects to find the dummy.o used to echo these variables:
$(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS)
Add the $(CC) line to satisfy that, making sure this works with all
kernels, i.e. preserving the temp directory and files in it used for
this technique we can see that it works everywhere:
# make -s -C /lib/modules/5.4.18-100.fc30.x86_64/build M=/tmp/tmp.qgaFHgxjZ4/ clean
# ls -la /tmp/tmp.qgaFHgxjZ4/
total 4
drwx------. 2 root root 80 Feb 14 09:42 .
drwxrwxrwt. 47 root root 1200 Feb 14 09:42 ..
-rw-r--r--. 1 root root 0 Feb 13 17:14 dummy.c
-rw-r--r--. 1 root root 121 Feb 13 17:14 Makefile
#
# cat /tmp/tmp.qgaFHgxjZ4/Makefile
obj-y := dummy.o
$(obj)/%.o: $(src)/%.c
@echo -n "$(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS)"
$(CC) -c -o $@ $<
#
Then build with an old kernel Makefile:
# make -s -C /lib/modules/5.4.18-100.fc30.x86_64/build M=/tmp/tmp.qgaFHgxjZ4/ dummy.o
-nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/9/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
#
# ls -la /tmp/tmp.qgaFHgxjZ4/
total 8
drwx------. 2 root root 100 Feb 14 09:43 .
drwxrwxrwt. 47 root root 1200 Feb 14 09:43 ..
-rw-r--r--. 1 root root 0 Feb 13 17:14 dummy.c
-rw-r--r--. 1 root root 936 Feb 14 09:43 dummy.o
-rw-r--r--. 1 root root 121 Feb 13 17:14 Makefile
#
And a new one:
# make -s -C /lib/modules/5.4.18-100.fc30.x86_64/build M=/tmp/tmp.qgaFHgxjZ4/ clean
# ls -la /tmp/tmp.qgaFHgxjZ4/
total 4
drwx------. 2 root root 80 Feb 14 09:43 .
drwxrwxrwt. 47 root root 1200 Feb 14 09:43 ..
-rw-r--r--. 1 root root 0 Feb 13 17:14 dummy.c
-rw-r--r--. 1 root root 121 Feb 13 17:14 Makefile
# make -s -C /lib/modules/5.6.0-rc1+/build M=/tmp/tmp.qgaFHgxjZ4/ dummy.o
-nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/9/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
#
# ls -la /tmp/tmp.qgaFHgxjZ4/
total 16
drwx------. 2 root root 160 Feb 14 09:44 .
drwxrwxrwt. 47 root root 1200 Feb 14 09:44 ..
-rw-r--r--. 1 root root 158 Feb 14 09:44 built-in.a
-rw-r--r--. 1 root root 149 Feb 14 09:44 .built-in.a.cmd
-rw-r--r--. 1 root root 0 Feb 13 17:14 dummy.c
-rw-r--r--. 1 root root 936 Feb 14 09:44 dummy.o
-rw-r--r--. 1 root root 121 Feb 13 17:14 Makefile
-rw-r--r--. 1 root root 0 Feb 14 09:44 modules.order
#
Reported-by: Thomas Richter <[email protected]>
Tested-by: Thomas Richter <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Zefan Li <[email protected]>
Link: https://www.spinics.net/lists/linux-perf-users/msg10600.html
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add an arm64 version of get_cpuid(), which is used for various annotation
and headers - for example, I now get the CPUID in "perf report --header",
as shown in this snippet:
# hostname : ubuntu
# os release : 5.5.0-rc1-dirty
# perf version : 5.5.rc1.gbf8a13dc9851
# arch : aarch64
# nrcpus online : 96
# nrcpus avail : 96
# cpuid : 0x00000000480fd010
Since much of the code to read the MIDR is already in get_cpuid_str(),
factor out this code.
Tester notes:
I tested this patch on my new ARM64 Kunpeng 920 server.
[root@node1 zsk]# ./perf --version
perf version 5.6.rc1.g2cdb955b7252
Both perf list and perf stat can work.
Signed-off-by: John Garry <[email protected]>
Tested-by: Shaokun Zhang <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
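As a rough sketch of turning a MIDR value into a CPUID, the function below parses the register as hex text and clears the variant (bits 23:20) and revision (bits 3:0) fields. The function name is hypothetical, and clearing those two fields is an assumption about what the shared helper does, chosen so that different steppings of one part map to the same identifier.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MIDR_REVISION_MASK  0xfULL          /* MIDR_EL1 bits [3:0]   */
#define MIDR_VARIANT_MASK   (0xfULL << 20)  /* MIDR_EL1 bits [23:20] */

/*
 * Parse a MIDR value given as hex text (with or without a 0x prefix,
 * trailing newline allowed) and clear the variant and revision fields.
 * Returns 0 on success, -1 if no hex digits could be parsed.
 */
static int midr_to_cpuid(const char *text, uint64_t *cpuid)
{
	char *end;
	unsigned long long midr;

	errno = 0;
	midr = strtoull(text, &end, 16);
	if (errno != 0 || end == text)
		return -1;

	*cpuid = midr & ~(MIDR_REVISION_MASK | MIDR_VARIANT_MASK);
	return 0;
}
```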
|
|
# perf trace -e syscalls:sys_enter_prctl --filter="option==SET_NAME"
0.000 Socket Thread/3860 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7fc50b9733e8)
0.053 SSL Cert #78/3860 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7fc50b9733e8)
^C #
If one uses '-v' with 'perf trace', we can see the filter it puts in
place:
New filter for syscalls:sys_enter_prctl: (option==0xf) && (common_pid != 3859 && common_pid != 2757)
We still need to allow using plain '-e prctl' and have this turn into
creating a 'syscalls:sys_enter_prctl' event so that the filter can be
applied only to it as right now '-e prctl' ends up using the
'raw_syscalls:sys_enter/sys_exit'.
The end goal is to have something like:
# perf trace -e prctl/option==SET_NAME/
And have that use tracepoint filters or eBPF ones.
Cc: Adrian Hunter <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
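The translation from a symbolic name to the number seen in the filter above can be sketched like this. The table and expand_filter() are hypothetical and only illustrate the idea; the values 15 and 16 are PR_SET_NAME and PR_GET_NAME from <linux/prctl.h>.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name/value pair used to expand symbolic filter constants. */
struct named_value {
	const char *name;
	unsigned long value;
};

/* Small illustrative table of prctl options. */
static const struct named_value prctl_options[] = {
	{ "SET_NAME", 15 },  /* PR_SET_NAME */
	{ "GET_NAME", 16 },  /* PR_GET_NAME */
};

/*
 * Write "(field==0x<value>)" into out for a known symbolic name.
 * Returns 0 on success, -1 for an unknown name or a too-small buffer.
 */
static int expand_filter(const char *field, const char *name,
			 char *out, size_t outsz)
{
	size_t nr = sizeof(prctl_options) / sizeof(prctl_options[0]);

	for (size_t i = 0; i < nr; i++) {
		int n;

		if (strcmp(prctl_options[i].name, name) != 0)
			continue;
		n = snprintf(out, outsz, "(%s==%#lx)", field,
			     prctl_options[i].value);
		return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
	}
	return -1;
}
```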
|
|
So that we can use it with strtoul, allowing string to number
conversions in filter expressions.
Cc: Adrian Hunter <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
So the kmaps pointer setup is centralized and we do not need to update
it in all those places (2 current places and few more missing) after
calling maps__insert().
Reported-by: Ravi Bangoria <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Tested-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The map__clone() function can be called on kernel maps as well, so it
needs to duplicate the whole kmap data.
Reported-by: Ravi Bangoria <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Tested-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
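The idea of cloning the extra kernel data along with the map can be sketched as below. All of the types and the is_kernel flag are hypothetical simplifications, assuming the kernel-only data is stored directly after the map in one allocation.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layouts, loosely modelled on the commit's wording. */
struct map {
	unsigned long start, end;
	bool is_kernel;  /* stands in for "the dso has kernel type" */
};

struct kmap {
	char name[32];   /* extra kernel-only data stored after the map */
};

/* A kernel map is one block: struct map followed by struct kmap. */
struct kernel_map {
	struct map  map;
	struct kmap kmap;
};

/* Size of the whole object, including the kmap part for kernel maps. */
static size_t map_alloc_size(const struct map *m)
{
	return m->is_kernel ? sizeof(struct kernel_map) : sizeof(struct map);
}

/* Duplicate a map; kernel maps carry their trailing kmap data along. */
static struct map *map_clone(const struct map *from)
{
	size_t size = map_alloc_size(from);
	struct map *copy = malloc(size);

	if (copy != NULL)
		memcpy(copy, from, size);
	return copy;
}
```

Copying only sizeof(struct map) bytes for a kernel map would leave the clone without its kmap part, so any later access to that data would read past the allocation.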
|
|
We add ksymbol map into machine->kmaps, so it needs to be created as
'struct kmap', which is dependent on its dso having kernel type.
Reported-by: Ravi Bangoria <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Tested-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/20200210200847.GA36715@krava
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We add kernel module map into machine->kmaps, so it needs to be created
as 'struct kmap', which is dependent on its dso having kernel type.
Reported-by: Ravi Bangoria <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Kim Phillips <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
to pick up openat2 and pidfd_getfd
fddb5d430ad9 ("open: introduce openat2(2) syscall")
9a2cef09c801 ("arch: wire up pidfd_getfd syscall")
We also need to grab a copy of uapi/linux/openat2.h since it is now
needed by fcntl.h, add it to tools/perf/check-headers.sh.
$ diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
--- tools/perf/arch/x86/entry/syscalls/syscall_64.tbl 2019-12-20 16:43:57.662429958 -0300
+++ arch/x86/entry/syscalls/syscall_64.tbl 2020-02-10 16:36:22.070012468 -0300
@@ -357,6 +357,8 @@
433 common fspick __x64_sys_fspick
434 common pidfd_open __x64_sys_pidfd_open
435 common clone3 __x64_sys_clone3/ptregs
+437 common openat2 __x64_sys_openat2
+438 common pidfd_getfd __x64_sys_pidfd_getfd
#
# x32-specific system call numbers start at 512 to avoid cache impact
$
Update tools/'s copy of that file:
$ cp arch/x86/entry/syscalls/syscall_64.tbl tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
See the result:
$ diff -u /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c.before /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c
--- /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c.before 2020-02-10 16:42:59.010636041 -0300
+++ /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c 2020-02-10 16:43:24.149958337 -0300
@@ -346,5 +346,7 @@
[433] = "fspick",
[434] = "pidfd_open",
[435] = "clone3",
+ [437] = "openat2",
+ [438] = "pidfd_getfd",
};
-#define SYSCALLTBL_x86_64_MAX_ID 435
+#define SYSCALLTBL_x86_64_MAX_ID 438
$
Now one can use:
perf trace -e openat2,pidfd_getfd
To get just those syscalls or use in things like:
perf trace -e open*
To get all the open variants (open, openat, openat2, etc.) or:
perf trace -e pidfd*
To get the pidfd syscalls.
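As a rough illustration of how an id-to-name table like the generated one above can serve glob selections such as 'open*', here is a minimal sketch. The table entries are copied from the diff; the names syscall_names and count_matching(), and the use of fnmatch(3), are assumptions made for this example and are not taken from perf's sources.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical excerpt of a generated table: the index is the syscall id. */
#define SKETCH_MAX_ID 438
static const char *syscall_names[SKETCH_MAX_ID + 1] = {
	[433] = "fspick",
	[434] = "pidfd_open",
	[435] = "clone3",
	[437] = "openat2",
	[438] = "pidfd_getfd",
};

/* Count table entries whose name matches a shell-style glob. */
static int count_matching(const char *glob)
{
	int id, n = 0;

	for (id = 0; id <= SKETCH_MAX_ID; id++) {
		/* Ids missing from this excerpt (e.g. 436) are NULL holes. */
		if (syscall_names[id] &&
		    fnmatch(glob, syscall_names[id], 0) == 0)
			n++;
	}
	return n;
}
```

With only this excerpt loaded, count_matching("pidfd*") sees pidfd_open and pidfd_getfd, while count_matching("open*") sees just openat2.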
Cc: Adrian Hunter <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Sargun Dhillon <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use the more optimized strlist implementation to do the idle function
lookup.
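The idea of replacing a linear scan with a faster string lookup can be sketched as below. This is not perf's strlist API; it is a stand-in using bsearch(3) over a sorted array, and the names idle_syms_sorted, cmp_name() and is_idle_sym() are hypothetical. The three symbols are the ones mentioned in the idle_symbols commit elsewhere in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Must stay sorted in strcmp() order for bsearch() to work. */
static const char *const idle_syms_sorted[] = {
	"acpi_idle_do_entry",
	"acpi_processor_ffh_cstate_enter",
	"idle_cpu",
};

/* bsearch() comparator: key is a C string, elem points at an array slot. */
static int cmp_name(const void *key, const void *elem)
{
	return strcmp((const char *)key, *(const char *const *)elem);
}

/* Logarithmic-time membership test instead of a strcmp() loop. */
static bool is_idle_sym(const char *name)
{
	size_t n = sizeof(idle_syms_sorted) / sizeof(idle_syms_sorted[0]);

	return bsearch(name, idle_syms_sorted, n,
		       sizeof(idle_syms_sorted[0]), cmp_name) != NULL;
}
```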
Signed-off-by: Kim Phillips <[email protected]>
Acked-by: Song Liu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The "acpi_idle_do_entry", "acpi_processor_ffh_cstate_enter", and
"idle_cpu" symbols appear in 'perf top' output, at least on AMD systems.
Add them to perf's idle_symbols list, so they don't dominate 'perf top'
output.
Signed-off-by: Kim Phillips <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Song Liu <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For data collected on machines with front end stalled cycles supported,
such as found on modern AMD CPU families, commit 146540fb545b ("perf
stat: Always separate stalled cycles per insn") introduces a new line in
CSV output with a leading comma that upsets some automated scripts.
Scripts have to use "-e ex_ret_instr" to work around this issue, after
upgrading to a version of perf with that commit.
We could add "if (have_frontend_stalled && !config->csv_sep)" to the not
(total && avg) else clause, to emphasize that CSV users are usually
scripts, and are written to do only what is needed, i.e., they wouldn't
typically invoke "perf stat" without specifying an explicit event list.
But - let alone CSV output - why should users now tolerate a constant
0-reporting extra line in regular terminal output?:
BEFORE:
$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1
Performance counter stats for 'system wide':
181,110,981 instructions # 0.58 insn per cycle
# 0.00 stalled cycles per insn
309,876,469 cycles
1.002202582 seconds time elapsed
The user would not like to see the now permanent:
"0.00 stalled cycles per insn"
line fixture, as it gives no useful information.
So this patch removes the printing of the zeroed stalled cycles line
altogether, almost reverting the very original commit fb4605ba47e7
("perf stat: Check for frontend stalled for metrics"), which seems like
it was written to normalize --metric-only column output of common Intel
machines at the time: modern Intel machines have ceased to support the
genericised frontend stalled metrics AFAICT.
AFTER:
$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1
Performance counter stats for 'system wide':
244,071,432 instructions # 0.69 insn per cycle
355,353,490 cycles
1.001862516 seconds time elapsed
Output behaviour when stalled cycles is indeed measured is not affected
(BEFORE == AFTER):
$ sudo perf stat --all-cpus -einstructions,cycles,stalled-cycles-frontend -- sleep 1
Performance counter stats for 'system wide':
247,227,799 instructions # 0.63 insn per cycle
# 0.26 stalled cycles per insn
394,745,636 cycles
63,194,485 stalled-cycles-frontend # 16.01% frontend cycles idle
1.002079770 seconds time elapsed
Fixes: 146540fb545b ("perf stat: Always separate stalled cycles per insn")
Signed-off-by: Kim Phillips <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Song Liu <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf maps:
Cengiz Can:
- Add missing unlock to maps__insert() error case.
srcline:
Changbin Du:
- Make perf able to build with latest libbfd.
perf parse:
Leo Yan:
- Keep copy of string in perf_evsel_config_term() to fix sink terms
processing in ARM CoreSight.
perf test:
Thomas Richter:
- Fix test case Merge cpu map, removing extra reference count drop that
causes a segfault on s/390.
perf probe:
Thomas Richter:
- Add ustring support for perf probe command
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
`tools/perf/util/map.c` has a function named `maps__insert` that
acquires a write lock if it's in a multithreaded context.
Even though this lock is released when function successfully completes,
there's a branch that is executed when `maps_by_name == NULL` that
returns from this function without releasing the write lock.
Added an `up_write` to release the lock when this happens.
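The shape of the fix can be sketched as follows. This is a toy model, not the code in `tools/perf/util/map.c`: `toy_rwsem` only records whether the lock is write-held, and `toy_maps_insert()` with its `by_name_ok` parameter are hypothetical names used to simulate the failing branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a rw semaphore: only tracks whether it is write-held. */
struct toy_rwsem {
	bool write_held;
};

static void toy_down_write(struct toy_rwsem *s)
{
	assert(!s->write_held);
	s->write_held = true;
}

static void toy_up_write(struct toy_rwsem *s)
{
	assert(s->write_held);
	s->write_held = false;
}

struct toy_maps {
	struct toy_rwsem lock;
	int nr_maps;
};

/* by_name_ok simulates whether the by-name array is usable. */
static int toy_maps_insert(struct toy_maps *maps, bool by_name_ok)
{
	toy_down_write(&maps->lock);

	if (!by_name_ok) {
		/* Early error return: release the lock before leaving. */
		toy_up_write(&maps->lock);
		return -1;
	}

	maps->nr_maps++;
	toy_up_write(&maps->lock);
	return 0;
}
```

Without the `toy_up_write()` in the error branch, the next insert would trip the assertion in `toy_down_write()`, which is the toy equivalent of a writer blocking forever.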
Fixes: a7c2b572e217 ("perf map_groups: Auto sort maps by name, if needed")
Signed-off-by: Cengiz Can <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Kernel commit 88903c464321 ("tracing/probe: Add ustring type for user-space string")
adds support for user-space strings when type 'ustring' is specified.
Here is an example using sysfs command line interface
for kprobes:
Function to probe:
struct filename *
getname_flags(const char __user *filename, int flags, int *empty)
Setup:
# cd /sys/kernel/debug/tracing/
# echo 'p:tmr1 getname_flags +0(%r2):ustring' > kprobe_events
# cat events/kprobes/tmr1/format | fgrep print
print fmt: "(%lx) arg1=\"%s\"", REC->__probe_ip, REC->arg1
# echo 1 > events/kprobes/tmr1/enable
# touch /tmp/111
# echo 0 > events/kprobes/tmr1/enable
# cat trace|fgrep /tmp/111
touch-5846 [005] d..2 255520.717960: tmr1:\
(getname_flags+0x0/0x400) arg1="/tmp/111"
Doing the same with the perf tool fails.
Using type 'string' succeeds:
# perf probe "vfs_getname=getname_flags:72 pathname=filename:string"
Added new event:
probe:vfs_getname (on getname_flags:72 with pathname=filename:string)
....
# perf probe -d probe:vfs_getname
Removed event: probe:vfs_getname
However using type 'ustring' fails (output before):
# perf probe "vfs_getname=getname_flags:72 pathname=filename:ustring"
Failed to write event: Invalid argument
Error: Failed to add events.
#
Fix this by adding type 'ustring' in function
convert_variable_type().
Using ustring succeeds (output after):
# ./perf probe "vfs_getname=getname_flags:72 pathname=filename:ustring"
Added new event:
probe:vfs_getname (on getname_flags:72 with pathname=filename:ustring)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_getname -aR sleep 1
#
Note: This issue also exists on x86, it is not s390 specific.
Signed-off-by: Thomas Richter <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
libbfd has changed the bfd_section_* macros to inline functions
bfd_section_<field> since 2019-09-18. See below two commits:
o http://www.sourceware.org/ml/gdb-cvs/2019-09/msg00064.html
o https://www.sourceware.org/ml/gdb-cvs/2019-09/msg00072.html
This fix makes perf able to build with both old and new libbfd.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Commit a2408a70368a ("perf evlist: Maintain evlist->all_cpus")
introduces a test case for cpumap merge operation, see functions
perf_cpu_map__merge() and test__cpu_map_merge().
The test case fails on s390 with this error message:
[root@m35lp76 perf]# ./perf test -Fvvvvv 52
52: Merge cpu map :
--- start ---
cpumask list: 1-2,4-5,7
perf: /root/linux/tools/include/linux/refcount.h:131:\
refcount_sub_and_test: Assertion `!(new > val)' failed.
Aborted (core dumped)
[root@m35lp76 perf]#
The root cause is in the function test__cpu_map_merge():
It creates two cpu_maps named 'a' and 'b':
struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
and creates a third map named 'c' which is the result of
the merge of maps a and b:
struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
After some verification of the merged cpu_map, all three
of them have their reference count reduced and are
freed:
perf_cpu_map__put(a); (1)
perf_cpu_map__put(b);
perf_cpu_map__put(c);
The release of perf_cpu_map__put(a) is wrong. The map
is already released and free'ed as part of the function
perf_cpu_map__merge(struct perf_cpu_map *orig,
| struct perf_cpu_map *other)
+--> perf_cpu_map__put(orig);
|
+--> cpu_map__delete(orig)
At the end perf_cpu_map__put() is called for map 'orig'
alias 'a' and since the reference count is 1, the map
is deleted, as can be seen by the following gdb trace:
(gdb) where
#0 tcache_put (tc_idx=0, chunk=0x156cc30) at malloc.c:2940
#1 _int_free (av=0x3fffd49ee80 <main_arena>, p=0x156cc30,
have_lock=<optimized out>) at malloc.c:4222
#2 0x00000000012d5e78 in cpu_map__delete (map=0x156cc40) at cpumap.c:31
#3 0x00000000012d5f7a in perf_cpu_map__put (map=0x156cc40) at cpumap.c:45
#4 0x00000000012d723a in perf_cpu_map__merge (orig=0x156cc40,
other=0x156cc60) at cpumap.c:343
#5 0x000000000110cdd0 in test__cpu_map_merge (
test=0x14ea6c8 <generic_tests+2856>, subtest=-1) at tests/cpumap.c:128
Thus the perf_cpu_map__put(a) (see (1) above) frees map 'a'
a second time and causes the failure. Fix this by removing that
function call.
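The ownership rule described above can be modelled with a small sketch. The toy_map type and the toy_map__*() helpers are hypothetical and only mimic the reference counting, not the real cpu map contents; the point is that a merge which consumes the reference on 'orig' leaves the caller with only 'b' and 'c' to put.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_map {
	int refcnt;
};

static int toy_live_maps;	/* number of maps currently allocated */

static struct toy_map *toy_map__new(void)
{
	struct toy_map *m = calloc(1, sizeof(*m));

	if (m) {
		m->refcnt = 1;
		toy_live_maps++;
	}
	return m;
}

static void toy_map__put(struct toy_map *m)
{
	if (m && --m->refcnt == 0) {
		toy_live_maps--;
		free(m);
	}
}

/* Like the merge described above: consumes the caller's reference on orig. */
static struct toy_map *toy_map__merge(struct toy_map *orig,
				      struct toy_map *other)
{
	struct toy_map *merged = toy_map__new();

	(void)other;		/* contents do not matter for this sketch */
	toy_map__put(orig);	/* orig's reference is dropped here */
	return merged;
}

/* Correct caller: put only 'b' and 'c', 'a' was released by the merge. */
static int toy_merge_test(void)
{
	struct toy_map *a = toy_map__new();
	struct toy_map *b = toy_map__new();
	struct toy_map *c = toy_map__merge(a, b);

	toy_map__put(b);
	toy_map__put(c);
	return toy_live_maps;	/* 0 when every map was freed exactly once */
}
```

Adding a toy_map__put(a) after the merge would touch freed memory, which is the double release the assertion in the failing test run reports.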
Output after:
[root@m35lp76 perf]# ./perf test -Fvvvvv 52
52: Merge cpu map :
--- start ---
cpumask list: 1-2,4-5,7
---- end ----
Merge cpu map: Ok
[root@m35lp76 perf]#
Signed-off-by: Thomas Richter <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf with CoreSight fails to record trace data with command:
perf record -e cs_etm/@tmc_etr0/u --per-thread ls
failed to set sink "" on event cs_etm/@tmc_etr0/u with 21 (Is a
directory)/perf/
The root cause of this failure is commit 1dc925568f01 ("perf
parse: Add a deep delete for parse event terms").
The log shows that cs_etm fails to parse the sink attribute; the cs_etm
event relies on the event configuration to pass the sink name, but the
event specific configuration data cannot be passed properly with this flow:
get_config_terms()
ADD_CONFIG_TERM(DRV_CFG, term->val.str);
__t->val.str = term->val.str;
`> __t->val.str is assigned to term->val.str;
parse_events_terms__purge()
parse_events_term__delete()
zfree(&term->val.str);
`> term->val.str is freed and assigned to NULL pointer;
cs_etm_set_sink_attr()
sink = __t->val.str;
`> sink string has been freed.
To fix this issue, in the function get_config_terms(), this patch
uses strdup() to allocate a duplicate of the string rather
than assigning the string pointer directly.
This patch adds a new field 'free_str' in the data structure
perf_evsel_config_term; 'free_str' is set to true when the union is used
as a string pointer; thus it can tell perf_evsel__free_config_terms() to
free the string.
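A minimal sketch of the ownership scheme described above, assuming a simplified term structure: toy_config_term, toy_term_set_str() and toy_term_release() are hypothetical names, and only the 'free_str' flag and the strdup() copy mirror what the message describes.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct toy_config_term {
	union {
		char *str;
		unsigned long num;
	} val;
	bool free_str;	/* true when val.str owns a heap string */
};

/* Store a private copy so the parser can free its own string later. */
static int toy_term_set_str(struct toy_config_term *t, const char *src)
{
	t->val.str = strdup(src);
	if (!t->val.str)
		return -1;
	t->free_str = true;
	return 0;
}

/* Free the copy only when the union is really used as a string. */
static void toy_term_release(struct toy_config_term *t)
{
	if (t->free_str) {
		free(t->val.str);
		t->val.str = NULL;
		t->free_str = false;
	}
}
```

Because the term holds its own copy, the sink name stays valid after the parser's string is freed, and the flag keeps the release path from calling free() on a union that holds a number.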
Fixes: 1dc925568f01 ("perf parse: Add a deep delete for parse event terms")
Suggested-by: Jiri Olsa <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
[ Use zfree() in perf_evsel__free_config_terms ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The struct perf_evsel_config_term::val is a union which contains fields
'callgraph', 'drv_cfg' and 'branch' as string pointers. This leads to
complex code logic for handling every type's string separately, and
makes it hard to release the strings in a general way.
This patch refactors the structure to add a common field 'str' in the
'val' union as string pointer and remove the other three fields
'callgraph', 'drv_cfg' and 'branch'. Without passing field name, the
patch simplifies the string handling with macro ADD_CONFIG_TERM_STR()
for string pointer assignment.
This patch fixes multiple warnings of line over 80 characters detected
by checkpatch tool.
Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Pull networking updates from David Miller:
1) Add WireGuard
2) Add HE and TWT support to ath11k driver, from John Crispin.
3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.
4) Add variable window congestion control to TIPC, from Jon Maloy.
5) Add BCM84881 PHY driver, from Russell King.
6) Start adding netlink support for ethtool operations, from Michal
Kubecek.
7) Add XDP drop and TX action support to ena driver, from Sameeh
Jubran.
8) Add new ipv4 route notifications so that mlxsw driver does not have
to handle identical routes itself. From Ido Schimmel.
9) Add BPF dynamic program extensions, from Alexei Starovoitov.
10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.
11) Add support for macsec HW offloading, from Antoine Tenart.
12) Add initial support for MPTCP protocol, from Christoph Paasch,
Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.
13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
Cherian, and others.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
udp: segment looped gso packets correctly
netem: change mailing list
qed: FW 8.42.2.0 debug features
qed: rt init valid initialization changed
qed: Debug feature: ilt and mdump
qed: FW 8.42.2.0 Add fw overlay feature
qed: FW 8.42.2.0 HSI changes
qed: FW 8.42.2.0 iscsi/fcoe changes
qed: Add abstraction for different hsi values per chip
qed: FW 8.42.2.0 Additional ll2 type
qed: Use dmae to write to widebus registers in fw_funcs
qed: FW 8.42.2.0 Parser offsets modified
qed: FW 8.42.2.0 Queue Manager changes
qed: FW 8.42.2.0 Expose new registers and change windows
qed: FW 8.42.2.0 Internal ram offsets modifications
MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
Documentation: net: octeontx2: Add RVU HW and drivers overview
octeontx2-pf: ethtool RSS config support
octeontx2-pf: Add basic ethtool support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Ftrace is one of the last W^X violators (after this only KLP is
left). These patches move it over to the generic text_poke()
interface and thereby get rid of this oddity. This requires a
surprising amount of surgery, by Peter Zijlstra.
- x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to
count certain types of events that have a special, quirky hw ABI
(by Kim Phillips)
- kprobes fixes by Masami Hiramatsu
Lots of tooling updates as well, the following subcommands were
updated: annotate/report/top, c2c, clang, record, report/top TUI,
sched timehist, tests; plus updates were done to the gtk ui, libperf,
headers and the parser"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
perf/x86/amd: Add support for Large Increment per Cycle Events
perf/x86/amd: Constrain Large Increment per Cycle events
perf/x86/intel/rapl: Add Comet Lake support
tracing: Initialize ret in syscall_enter_define_fields()
perf header: Use last modification time for timestamp
perf c2c: Fix return type for histogram sorting comparision functions
perf beauty sockaddr: Fix augmented syscall format warning
perf/ui/gtk: Fix gtk2 build
perf ui gtk: Add missing zalloc object
perf tools: Use %define api.pure full instead of %pure-parser
libperf: Setup initial evlist::all_cpus value
perf report: Fix no libunwind compiled warning break s390 issue
perf tools: Support --prefix/--prefix-strip
perf report: Clarify in help that --children is default
tools build: Fix test-clang.cpp with Clang 8+
perf clang: Fix build with Clang 9
kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
tools lib: Fix builds when glibc contains strlcpy()
perf report/top: Make 'e' visible in the help and make it toggle showing callchains
perf report/top: Do not offer annotation for symbols without samples
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The timekeeping and timers department provides:
- Time namespace support:
If a container migrates from one host to another then it expects
that clocks based on MONOTONIC and BOOTTIME are not subject to
disruption. Due to different boot time and non-suspended runtime
these clocks can differ significantly on two hosts, in the worst
case time goes backwards which is a violation of the POSIX
requirements.
The time namespace addresses this problem. It allows setting offsets
for clock MONOTONIC and BOOTTIME once after creation and before
tasks are associated with the namespace. These offsets are taken
into account by timers and timekeeping including the VDSO.
Offsets for wall clock based clocks (REALTIME/TAI) are not provided
by this mechanism. While in theory possible, the overhead and code
complexity would be immense and not justified by the esoteric
potential use cases which were discussed at Plumbers '18.
The overhead for tasks in the root namespace (ie where host time
offsets = 0) is in the noise and great effort was made to ensure
that especially in the VDSO. If time namespace is disabled in the
kernel configuration the code is compiled out.
Kudos to Andrei Vagin and Dmitry Safonov who implemented this
feature and kept on for more than a year addressing review
comments, finding better solutions. A pleasant experience.
- Overhaul of the alarmtimer device dependency handling to ensure
that the init/suspend/resume ordering is correct.
- A new clocksource/event driver for Microchip PIT64
- Suspend/resume support for the Hyper-V clocksource
- The usual pile of fixes, updates and improvements mostly in the
driver code"
* tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
alarmtimer: Use wakeup source from alarmtimer platform device
alarmtimer: Make alarmtimer platform device child of RTC device
alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
hrtimer: Add missing sparse annotation for __run_timer()
lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
clocksource/drivers/exynos_mct: Rename Exynos to lowercase
clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
clocksource/drivers/bcm2835_timer: Fix memory leak of timer
clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
...
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-01-22
The following pull-request contains BPF updates for your *net-next* tree.
We've added 92 non-merge commits during the last 16 day(s) which contain
a total of 320 files changed, 7532 insertions(+), 1448 deletions(-).
The main changes are:
1) function by function verification and program extensions from Alexei.
2) massive cleanup of selftests/bpf from Toke and Andrii.
3) batched bpf map operations from Brian and Yonghong.
4) tcp congestion control in bpf from Martin.
5) bulking for non-map xdp_redirect from Toke.
6) bpf_send_signal_thread helper from Yonghong.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Fix perf to include libbpf header files with the bpf/ prefix, to
be consistent with external users of the library.
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Using .st_ctime clobbers the timestamp information in perf report header
whenever any operation is done with the file. Even tar-ing and untar-ing
the perf.data file (which preserves the file last modification timestamp)
doesn't prevent that:
[Michael@Diego tmp]$ ls -l perf.data
-> -rw-------. 1 Michael Michael 169888 Dec 2 15:23 perf.data
[Michael@Diego tmp]$ perf report --header-only
# ========
-> # captured on : Mon Dec 2 15:23:42 2019
[...]
[Michael@Diego tmp]$ tar c perf.data | xz > perf.data.tar.xz
[Michael@Diego tmp]$ mkdir aaa
[Michael@Diego tmp]$ cd aaa
[Michael@Diego aaa]$ xzcat ../perf.data.tar.xz | tar x
[Michael@Diego aaa]$ ls -l -a
total 172
drwxrwxr-x. 2 Michael Michael 23 Jan 14 11:26 .
drwxrwxr-x. 6 Michael Michael 4096 Jan 14 11:26 ..
-> -rw-------. 1 Michael Michael 169888 Dec 2 15:23 perf.data
[Michael@Diego aaa]$ perf report --header-only
# ========
-> # captured on : Tue Jan 14 11:26:16 2020
[...]
When using .st_mtime instead, correct information is printed:
[Michael@Diego aaa]$ ~/acme/tools/perf/perf report --header-only
# ========
-> # captured on : Mon Dec 2 15:23:42 2019
[...]
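The difference between the two stat fields can be sketched as below. The helper name file_capture_time() is hypothetical and this is not the code in perf's header handling; it only shows reading st_mtime through stat(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Returns 0 and stores the last-modification time, or -1 if stat() fails. */
static int file_capture_time(const char *path, time_t *when)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	/*
	 * st_mtime follows changes to the file contents and can be restored
	 * by archivers; st_ctime also moves on metadata changes such as
	 * chmod or chown, so it does not survive copying a file around.
	 */
	*when = st.st_mtime;
	return 0;
}
```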
Signed-off-by: Michael Petlan <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
LPU-Reference: [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Commit 722ddfde366f ("perf tools: Fix time sorting") changed - correctly
so - hist_entry__sort to return int64. Unfortunately several of the
builtin-c2c.c comparison routines only happened to work due to the cast
caused by the wrong return type.
This causes meaningless ordering of both the cacheline list, and the
cacheline details page. E.g. a simple:
perf c2c record -a sleep 3
perf c2c report
will result in a cacheline table like
=================================================
Shared Data Cache Line Table
=================================================
#
# ------- Cacheline ---------- Total Tot - LLC Load Hitm - - Store Reference - - Load Dram - LLC Total - Core Load Hit - - LLC Load Hit -
# Index Address Node PA cnt records Hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2 Llc Rmt
# ..... .............. .... ...... ....... ...... ..... ..... ... .... ..... ...... ...... .... ...... ..... ..... ..... ... .... .......
0 0x7f0d27ffba00 N/A 0 52 0.12% 13 6 7 12 12 0 0 7 14 40 4 16 0 0 0
1 0x7f0d27ff61c0 N/A 0 6353 14.04% 1475 801 674 779 779 0 0 718 1392 5574 1299 1967 0 115 0
2 0x7f0d26d3ec80 N/A 0 71 0.15% 16 4 12 13 13 0 0 12 24 58 1 20 0 9 0
3 0x7f0d26d3ec00 N/A 0 98 0.22% 23 17 6 19 19 0 0 6 12 79 0 40 0 10 0
i.e. with the list not being ordered by Total Hitm.
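How a return type can hide such a bug is sketched below. The exact expressions in builtin-c2c.c are not shown in this message, so this assumes comparators that subtract unsigned 32-bit counters; the function names are hypothetical. The narrowing conversion in cmp_as_int() is implementation-defined in C and behaves as commented on GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Old shape: the unsigned wrap-around turns negative when narrowed to int. */
static int cmp_as_int(uint32_t left, uint32_t right)
{
	return left - right;
}

/* Same expression with a 64-bit return: the wrapped value stays positive. */
static int64_t cmp_as_int64_buggy(uint32_t left, uint32_t right)
{
	return left - right;
}

/* One way to get a correct sign: widen to a signed type before subtracting. */
static int64_t cmp_as_int64_fixed(uint32_t left, uint32_t right)
{
	return (int64_t)left - (int64_t)right;
}
```

For left = 1 and right = 2 the 32-bit difference wraps to 4294967295. Returned as int it reads as -1, returned as int64_t it stays a large positive number, so "less than" is reported as "greater than" and the sort order becomes meaningless.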
Fixes: 722ddfde366f ("perf tools: Fix time sorting")
Signed-off-by: Andres Freund <[email protected]>
Tested-by: Michael Petlan <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected] # v3.16+
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The sockaddr related examples given in
`tools/perf/examples/bpf/augmented_syscalls.c` almost always use `long`s
to represent most of their fields.
However, `size_t syscall_arg__scnprintf_sockaddr(..)` has a `scnprintf`
call that uses `"%#x"` as format string.
This throws a warning (whenever the syscall argument is `unsigned
long`).
Added `l` identifier to indicate that the `arg->value` is an unsigned
long.
Not sure about the complications of this with x86 though.
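The format change can be illustrated with a small sketch. `format_arg_value()` is a hypothetical helper built on standard `snprintf()`, not perf's `scnprintf()`, so its return value is the length the output needs rather than the number of bytes stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an unsigned long syscall argument as hex with a 0x prefix. */
static size_t format_arg_value(char *bf, size_t size, unsigned long value)
{
	/* "%#lx" matches unsigned long; plain "%#x" expects unsigned int. */
	int n = snprintf(bf, size, "%#lx", value);

	return n < 0 ? 0 : (size_t)n;
}
```

On targets where `long` is wider than `int`, passing an `unsigned long` to `"%#x"` is a type mismatch that compilers flag with `-Wformat`; the `l` length modifier makes the format agree with the argument type on both 32-bit and 64-bit builds.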
Signed-off-by: Cengiz Can <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Ravi Bangoria reported an issue when doing the gtk2 feature detection on
Fedora 31, where some types got deprecated:
/usr/include/gtk-2.0/gtk/gtktypeutils.h:236:1: error: ‘GTypeDebugFlags’ is deprecated [-Werror=deprecated-declarations]
236 | void gtk_type_init (GTypeDebugFlags debug_flags);
Fix this for perf by allowing the compile to pass with deprecated
symbols via the -Wno-deprecated-declarations compiler directive.
Reported-by: Ravi Bangoria <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jelle van der Waa <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When we moved zalloc.o to the library we missed gtk library which needs
it compiled in, otherwise the missing __zfree symbol will cause the
library to fail to load.
Add the zalloc object to the gtk library build.
Fixes: 7f7c536f23e6 ("tools lib: Adopt zalloc()/zfree() from tools/perf")
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jelle van der Waa <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|