Age | Commit message (Collapse) | Author | Files | Lines |
|
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and not relying on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ilkka Koskinen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and not relying on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ilkka Koskinen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Reduce scope of build_id__mark_dso_hit_ops() to the scope of function
perf_session__list_build_ids, its only use, and use perf_tool__init()
for the default values. Move perf_event__exit_del_thread() to event.[ch]
so it can be used in builtin-buildid-list.c.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ilkka Koskinen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Reduce the scope of the tool from global/static to just that of the
cmd_kmem function where the session is scoped. Use the perf_tool__init()
to initialize default values.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ilkka Koskinen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add init function that behaves like perf_tool__fill_defaults() but
assumes all values haven't been initialized.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ilkka Koskinen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The aim here is to eventually make perf_tool__fill_defaults() an init
function so that the tools struct is more const.
Create a tool.c to go along with tool.h. Move perf_tool__fill_defaults()
out of session.c into tool.c along with the default stub values. Add
perf_tool__compressed_is_stub() for a test in
perf_session__process_user_event().
perf_session__process_compressed_event() is only used from being default
initialized so migrate into tool.c.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ilkka Koskinen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The tool pointer (to a struct largely of function pointers) is passed
around but is unchanged except at initialization. Change parameter and
variable types to be const to lower the possibilities of what could
happen with a tool.
Reviewed-by: Adrian Hunter <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Adrian Hunter <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ilkka Koskinen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
struct s390_cpumsf_synth was likely cargo culted from other auxtrace
examples. It has no users, so remove.
Reviewed-by: Adrian Hunter <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ilkka Koskinen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add perf_session__deliver_synth_attr_event that synthesizes a
perf_record_header_attr event with one id. Remove use of
perf_event__synthesize_attr that necessitates the use of the dummy
tool in order to pass the session.
Reviewed-by: Adrian Hunter <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Adrian Hunter <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ilkka Koskinen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The processing of leader samples would turn an individual sample with
a group of read values into multiple samples. 'perf inject' would pass
through the additional samples increasing the output data file size:
$ perf record -g -e "{instructions,cycles}:S" -o perf.orig.data true
$ perf script -D -i perf.orig.data | sed -e 's/perf.orig.data/perf.data/g' > orig.txt
$ perf inject -i perf.orig.data -o perf.new.data
$ perf script -D -i perf.new.data | sed -e 's/perf.new.data/perf.data/g' > new.txt
$ diff -u orig.txt new.txt
--- orig.txt 2024-07-29 14:29:40.606576769 -0700
+++ new.txt 2024-07-29 14:30:04.142737434 -0700
...
[email protected] [0x30]: event: 3
[email protected] [0xd0]: event: 9
+.
+. ... raw event: size 208 bytes
+. 0000: 09 00 00 00 01 00 d0 00 fc 72 01 86 ff ff ff ff .........r......
+. 0010: 74 7d 2c 00 74 7d 2c 00 fb c3 79 f9 ba d5 05 00 t},.t},...y.....
+. 0020: e6 cb 1a 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
+. 0030: 02 00 00 00 00 00 00 00 76 01 00 00 00 00 00 00 ........v.......
+. 0040: e6 cb 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+. 0050: 62 18 00 00 00 00 00 00 f6 cb 1a 00 00 00 00 00 b...............
+. 0060: 00 00 00 00 00 00 00 00 0c 00 00 00 00 00 00 00 ................
+. 0070: 80 ff ff ff ff ff ff ff fc 72 01 86 ff ff ff ff .........r......
+. 0080: f3 0e 6e 85 ff ff ff ff 0c cb 7f 85 ff ff ff ff ..n.............
+. 0090: bc f2 87 85 ff ff ff ff 44 af 7f 85 ff ff ff ff ........D.......
+. 00a0: bd be 7f 85 ff ff ff ff 26 d0 7f 85 ff ff ff ff ........&.......
+. 00b0: 6d a4 ff 85 ff ff ff ff ea 00 20 86 ff ff ff ff m......... .....
+. 00c0: 00 fe ff ff ff ff ff ff 57 14 4f 43 fc 7e 00 00 ........W.OC.~..
+
+1642373909693435 0xc550 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 2915700/2915700: 0xffffffff860172fc period: 1 addr: 0
+... FP chain: nr:12
+..... 0: ffffffffffffff80
+..... 1: ffffffff860172fc
+..... 2: ffffffff856e0ef3
+..... 3: ffffffff857fcb0c
+..... 4: ffffffff8587f2bc
+..... 5: ffffffff857faf44
+..... 6: ffffffff857fbebd
+..... 7: ffffffff857fd026
+..... 8: ffffffff85ffa46d
+..... 9: ffffffff862000ea
+..... 10: fffffffffffffe00
+..... 11: 00007efc434f1457
+... sample_read:
+.... group nr 2
+..... id 00000000001acbe6, value 0000000000000176, lost 0
+..... id 00000000001acbf6, value 0000000000001862, lost 0
+
[email protected] [0x30]: event: 3
...
This behavior is incorrect as in the case above 'perf inject' should
have done nothing. Fix this behavior by disabling separating samples
for a tool that requests it. Only request this for `perf inject` so as
to not affect other perf tools. With the patch and the test above
there are no differences between the orig.txt and new.txt.
Fixes: e4caec0d1af3d608 ("perf evsel: Add PERF_SAMPLE_READ sample related processing")
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now default is to fold everything but it only shows the name of the
top-level data type which is not very useful. Instead just expand the
top level entry so that it can show the layout at a higher level.
Annotate type: 'struct task_struct' (4 samples)
Percent Offset Size Field
- 100.00 0 9792 struct task_struct { ◆
+ 0.50 0 24 struct thread_info thread_info; ▒
0.00 24 4 unsigned int __state; ▒
0.00 32 8 void* stack; ▒
+ 0.00 40 4 refcount_t usage; ▒
0.00 44 4 unsigned int flags; ▒
0.00 48 4 unsigned int ptrace; ▒
0.00 52 4 int on_cpu; ▒
+ 0.00 56 16 struct __call_single_node wake_entry; ▒
0.00 72 4 unsigned int wakee_flips; ▒
0.00 80 8 long unsigned int wakee_flip_decay_ts;▒
0.00 88 8 struct task_struct* last_wakee; ▒
0.00 96 4 int recent_used_cpu; ▒
0.00 100 4 int wake_cpu; ▒
0.00 104 4 int on_rq; ▒
0.00 108 4 int prio; ▒
0.00 112 4 int static_prio; ▒
0.00 116 4 int normal_prio; ▒
0.00 120 4 unsigned int rt_priority; ▒
+ 0.00 128 256 struct sched_entity se; ▒
+ 0.00 384 48 struct sched_rt_entity rt; ▒
+ 0.00 432 224 struct sched_dl_entity dl; ▒
0.00 656 8 struct sched_class* sched_class; ▒
...
Committer testing:
# perf mem record -a sleep 5s
# perf annotate --group --data-type=pthread_mutex_t
Annotate type: 'pthread_mutex_t' (13 samples)
Percent Offset Size Field
- 100.00 0 40 pthread_mutex_t { ▒
- 100.00 0 40 struct __pthread_mutex_s __data { ▒
39.45 0 4 int __lock; ▒
0.00 4 4 unsigned int __count; ▒
7.80 8 4 int __owner; ▒
6.88 12 4 unsigned int __nusers; ▒
45.87 16 4 int __kind; ▒
0.00 20 2 short int __spins; ▒
0.00 22 2 short int __elision; ▒
+ 0.00 24 16 __pthread_list_t __list; ▒
}; ▒
0.00 0 0 char[] __size; ▒
39.45 0 8 long int __align;
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Like 'perf report', use 'e' or 'E' key to toggle folding the current
entry so that it can control displaying child entries.
Note I didn't add the 'c' and 'C' key to collapse the entry because it's
also handled with the 'e'/'E' since it toggles the state.
Committer testing:
Do some 'perf mem record' for some workload of the whole system, using
the target options, as usual (--pid/-p, -C/--cpu, -a for the system wide
profiling, etc) and then:
# perf annotate --skip-empty --data-type=pthread_mutex_t
That, by default, will start as --tui, then press 'E' to see the whole
struct unfolded, etc.
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Like in the hists browser, it should support folding current entry so
that it can hide unwanted details in some data structures.
The folded entries will be displayed with the '+' sign, while unfolded
entries will have the '-' sign.
Entries that have no children will not show any signs.
Annotate type: 'struct socket' (1 samples)
Percent Offset Size Field
- 100.00 0 128 struct socket { ◆
0.00 0 4 socket_state state; ▒
0.00 4 2 short int type; ▒
0.00 8 8 long unsigned int flags; ▒
0.00 16 8 struct file* file; ▒
100.00 24 8 struct sock* sk; ▒
0.00 32 8 struct proto_ops* ops; ▒
- 0.00 64 64 struct socket_wq wq { ▒
- 0.00 64 24 wait_queue_head_t wait { ▒
+ 0.00 64 4 spinlock_t lock; ▒
- 0.00 72 16 struct list_head head { ▒
0.00 72 8 struct list_head* next; ▒
0.00 80 8 struct list_head* prev; ▒
}; ▒
}; ▒
0.00 88 8 struct fasync_struct* fasync_list; ▒
0.00 96 8 long unsigned int flags; ▒
+ 0.00 104 16 struct callback_head rcu; ▒
}; ▒
}; ▒
This just adds the display logic for folding, actually folding action
will be implemented in the next patch.
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Cache home agent (CHA) events were setting the low rather than high
config1 bits. SNR was using CLX CHA events, however its CHA is similar
to ICX so remove the events.
Incorporate the updates in:
https://github.com/intel/perfmon/pull/215
https://github.com/intel/perfmon/pull/216
Fixes: 4cc49942444e958b ("perf vendor events: Update cascadelakex events/metrics")
Closes: https://lore.kernel.org/linux-perf-users/CAPhsuW4nem9XZP+b=sJJ7kqXG-cafz0djZf51HsgjCiwkGBA+A@mail.gmail.com/
Reported-by: Song Liu <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Co-authored-by: Weilin Wang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The bpf_get_stackid() helper returns a signed type to check whether it
failed to get a stacktrace or not. But it saved the result in u32 and
checked if the value is negative.
376 if (needs_callstack) {
377 pelem->stack_id = bpf_get_stackid(ctx, &stacks,
378 BPF_F_FAST_STACK_CMP | stack_skip);
--> 379 if (pelem->stack_id < 0)
./tools/perf/util/bpf_skel/lock_contention.bpf.c:379 contention_begin()
warn: unsigned 'pelem->stack_id' is never less than zero.
Let's change the type to s32 instead.
Fixes: 6d499a6b3d90277d ("perf lock: Print the number of lost entries for BPF")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In get_member_overhead(), k is updated when it has a entry in the
histogram. But the entry->hists array is allocated with the number of
evsel in the group. So the k should be reset when it iterates the event
using for_each_group_evsel(), otherwise it'd crash due to a buffer
overflow.
Fixes: cb1898f58e0f175d ("perf annotate-data: Support --skip-empty option")
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Current description for the AUX trace buffer size is misleading. When a
user specifies the option '-m,512M', it represents a size value in bytes
(512MiB) but not 512M pages (512M x 4KiB regard to a page of 4KiB).
Make the document clear that the normal buffer and the AUX tracing
buffer share the same semantics. Syncs the documents for consistent
text.
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Similarly to other subcommands (like report, top), it would be handy to
provide a path for addr2line command.
Signed-off-by: Martin Liska <[email protected]>
Cc: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Instead of explicitely initializing just the .name and .alias_name,
use struct member named initialization of just the non-null -name field,
the compiler will initialize all the other non-explicitely initialized
fields to NULL.
This makes the code more robust, avoiding the error recently fixed when
the .alias_name was used and contained a random value.
Reviewed-by: Veronika Molnarova <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Radostin Stoyanov <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Noticed with:
1 6.22 debian:experimental-x-mipsel : FAIL gcc version 13.2.0 (Debian 13.2.0-25)
builtin-daemon.c: In function 'cmd_session_list':
builtin-daemon.c:691:35: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'time_t' {aka 'long long int'} [-Werror=format=]
Use inttypes.h's PRIu64 to deal with that.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/r/ZplvH21aQ8pzmza_@x1
Signed-off-by: Namhyung Kim <[email protected]>
|
|
The --skip-empty option is to hide dummy events in a group. Like other
output mode in 'perf report' and 'perf annotate', the data-type
profiling output should support the option.
Committer testing:
With dummy:
root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24
Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
event[0] = cpu_atom/mem-loads,ldlat=30/P
event[1] = cpu_atom/mem-stores/P
event[2] = dummy:u
============================================================================
Percent offset size field
100.00 100.00 0.00 0 40 pthread_mutex_t {
100.00 100.00 0.00 0 40 struct __pthread_mutex_s __data {
45.21 84.54 0.00 0 4 int __lock;
0.00 0.00 0.00 4 4 unsigned int __count;
0.00 1.83 0.00 8 4 int __owner;
5.19 10.65 0.00 12 4 unsigned int __nusers;
49.61 2.97 0.00 16 4 int __kind;
0.00 0.00 0.00 20 2 short int __spins;
0.00 0.00 0.00 22 2 short int __elision;
0.00 0.00 0.00 24 16 __pthread_list_t __list {
0.00 0.00 0.00 24 8 struct __pthread_internal_list* __prev;
0.00 0.00 0.00 32 8 struct __pthread_internal_list* __next;
};
};
0.00 0.00 0.00 0 0 char[] __size;
45.21 84.54 0.00 0 8 long int __align;
};
Skipping it:
root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24
Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
event[0] = cpu_atom/mem-loads,ldlat=30/P
event[1] = cpu_atom/mem-stores/P
============================================================================
Percent offset size field
100.00 100.00 0 40 pthread_mutex_t {
100.00 100.00 0 40 struct __pthread_mutex_s __data {
45.21 84.54 0 4 int __lock;
0.00 0.00 4 4 unsigned int __count;
0.00 1.83 8 4 int __owner;
5.19 10.65 12 4 unsigned int __nusers;
49.61 2.97 16 4 int __kind;
0.00 0.00 20 2 short int __spins;
0.00 0.00 22 2 short int __elision;
0.00 0.00 24 16 __pthread_list_t __list {
0.00 0.00 24 8 struct __pthread_internal_list* __prev;
0.00 0.00 32 8 struct __pthread_internal_list* __next;
};
};
0.00 0.00 0 0 char[] __size;
45.21 84.54 0 8 long int __align;
};
Annotate type: 'pthread_mutexattr_t' in /usr/lib64/libc.so.6 (1 samples):
root@number:~#
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When --group option is used, it should display all events together. But
the current logic only checks if the first (leader) event has samples or
not. Let's check the member events as well.
Also it missed to put the linked samples from member evsels to the
output RB-tree so that it can be displayed in the output.
For example, take a look at this example.
$ ./perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u
It has three events but 'path_put' function has samples only for
mem-stores (second) event.
$ sudo ./perf annotate --stdio -f path_put
Percent | Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period)
----------------------------------------------------------------------------------------------------------
: 0 0xffffffffae600020 <path_put>:
0.00 : ffffffffae600020: endbr64
0.00 : ffffffffae600024: nopl (%rax, %rax)
91.22 : ffffffffae600029: pushq %rbx
0.00 : ffffffffae60002a: movq %rdi, %rbx
0.00 : ffffffffae60002d: movq 8(%rdi), %rdi
8.78 : ffffffffae600031: callq 0xffffffffae614aa0
0.00 : ffffffffae600036: movq (%rbx), %rdi
0.00 : ffffffffae600039: popq %rbx
0.00 : ffffffffae60003a: jmp 0xffffffffae620670
0.00 : ffffffffae60003f: nop
Therefore, it didn't show up when --group option is used since the
leader ("mem-loads") event has no samples. But now it checks both
events.
Before:
$ sudo ./perf annotate --stdio -f --group path_put
(no output)
After:
$ sudo ./perf annotate --stdio -f --group path_put
Percent | Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
: 0 0xffffffffae600020 <path_put>:
0.00 0.00 0.00 : ffffffffae600020: endbr64
0.00 0.00 0.00 : ffffffffae600024: nopl (%rax, %rax)
0.00 91.22 0.00 : ffffffffae600029: pushq %rbx
0.00 0.00 0.00 : ffffffffae60002a: movq %rdi, %rbx
0.00 0.00 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi
0.00 8.78 0.00 : ffffffffae600031: callq 0xffffffffae614aa0
0.00 0.00 0.00 : ffffffffae600036: movq (%rbx), %rdi
0.00 0.00 0.00 : ffffffffae600039: popq %rbx
0.00 0.00 0.00 : ffffffffae60003a: jmp 0xffffffffae620670
0.00 0.00 0.00 : ffffffffae60003f: nop
Committer testing:
Before:
root@number:~# perf annotate --group --stdio2 clear_page_erms
root@number:~#
After:
root@number:~# perf annotate --group --stdio2 clear_page_erms
Samples: 125 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 13198416, [percent: local period]
clear_page_erms() /proc/kcore
Percent 0xffffffff990c6cc0 <clear_page_erms>:
endbr64
movl $0x1000,%ecx
xorl %eax,%eax
0.00 100.00 0.00 rep stosb %al, (%rdi)
← retq
int3
int3
int3
int3
nop
nop
root@number:~#
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Create a source symlink to the original source in the objdir.
This is similar to what the main kernel build script does.
Committer testing:
⬢[acme@toolbox perf-tools-next]$ make O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
<SNIP>
⬢[acme@toolbox perf-tools-next]$ ls -la /tmp/build/perf-tools-next/source
lrwxrwxrwx. 1 acme acme 41 Aug 9 16:26 /tmp/build/perf-tools-next/source -> /home/acme/git/perf-tools-next/tools/perf
⬢[acme@toolbox perf-tools-next]$
Signed-off-by: Andi Kleen <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In that case we have a set of placeholder functions, one of them uses a
'Dwarf_Addr' type that is not present as it is defined in the missing
DWARF libraries, so provide a placeholder typedef for that as well.
The build error before this patch:
In file included from util/annotate.c:28:
util/debuginfo.h:44:46: error: unknown type name ‘Dwarf_Addr’
44 | Dwarf_Addr *offs __maybe_unused,
| ^~~~~~~~~~
make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: util/annotate.o] Error 1
make[6]: *** Waiting for unfinished jobs....
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Link: https://lore.kernel.org/lkml/CAM9d7ciushSwEfj7yW4rtDEJBTcCB991V4cswwFEL+cv6QF2pg@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For example, when using the Alder Lake PMU memory load event, the
instruction latency is stored in 'ins_lat', while the cache latency
is stored in 'weight'.
This patch reports the 'ins_lat' field for Python scripting.
Committer testing:
On a Rocket Lake Refresh Intel machine (14th gen):
root@number:~# grep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Core(TM) i7-14700K
root@number:~# perf mem record -a sleep 5
Memory events are enabled on a subset of CPUs: 16-27
[ perf record: Woken up 85 times to write data ]
[ perf record: Captured and wrote 41.236 MB perf.data (191390 samples) ]
root@number:~# perf evlist -v
cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
root@number:~#
Now generate a python script to then dump the dictionary that now needs
to have that 'ins_lat' field:
root@number:~# perf script --gen python
generated Python script: perf-script.py
root@number:~# vim perf-script.py
root@number:~# perf script -s perf-script.py | head -40
in trace_begin
in trace_end
root@number:~# vim perf-script.py
Reviewed-by: Adrian Hunter <[email protected]>
Signed-off-by: Zixian Cai <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ben Gainey <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Running on a:
root@x1:~# grep 'model name' -m1 /proc/cpuinfo
model name : 13th Gen Intel(R) Core(TM) i7-1365U
root@x1:~#
It skips all the tests with:
root@x1:~# perf test -vvvv LBR
97: perf record LBR tests:
--- start ---
test child forked, pid 2033388
Skip: only x86 CPUs support LBR
---- end(-2) ----
97: perf record LBR tests : Skip
root@x1:~#
Because the test checks for the /sys/devices/cpu/caps/branches file,
that isn't present as we have instead:
root@x1:~# ls -la /sys/devices/cpu*/caps/branches
-r--r--r--. 1 root root 4096 Aug 8 11:22 /sys/devices/cpu_atom/caps/branches
-r--r--r--. 1 root root 4096 Aug 8 11:21 /sys/devices/cpu_core/caps/branches
root@x1:~#
If we check as well for one of those,
/sys/devices/cpu_core/caps/branches, then we don't skip the tests and
all are run on these x86 Intel Hybrid systems as well, passing all of
them:
root@x1:~# perf test -vvvv LBR
97: perf record LBR tests:
--- start ---
test child forked, pid 2034956
LBR callgraph
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.812 MB /tmp/__perf_test.perf.data.B2HvQ (8114 samples) ]
LBR callgraph [Success]
LBR any branch test
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 6.382 MB /tmp/__perf_test.perf.data.B2HvQ (8071 samples) ]
LBR any branch test: 8071 samples
LBR any branch test [Success]
LBR any call test
[ perf record: Woken up 23 times to write data ]
[ perf record: Captured and wrote 6.208 MB /tmp/__perf_test.perf.data.B2HvQ (8092 samples) ]
LBR any call test: 8092 samples
LBR any call test [Success]
LBR any ret test
[ perf record: Woken up 24 times to write data ]
[ perf record: Captured and wrote 6.396 MB /tmp/__perf_test.perf.data.B2HvQ (8093 samples) ]
LBR any ret test: 8093 samples
LBR any ret test [Success]
LBR any indirect call test
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 6.344 MB /tmp/__perf_test.perf.data.B2HvQ (8067 samples) ]
LBR any indirect call test: 8067 samples
LBR any indirect call test [Success]
LBR any indirect jump test
[ perf record: Woken up 12 times to write data ]
[ perf record: Captured and wrote 3.073 MB /tmp/__perf_test.perf.data.B2HvQ (8061 samples) ]
LBR any indirect jump test: 8061 samples
LBR any indirect jump test [Success]
LBR direct calls test
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 6.380 MB /tmp/__perf_test.perf.data.B2HvQ (8076 samples) ]
LBR direct calls test: 8076 samples
LBR direct calls test [Success]
LBR any indirect user call test
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.597 MB /tmp/__perf_test.perf.data.B2HvQ (8079 samples) ]
LBR any indirect user call test: 8079 samples
LBR any indirect user call test [Success]
LBR system wide any branch test
[ perf record: Woken up 26 times to write data ]
[ perf record: Captured and wrote 9.088 MB /tmp/__perf_test.perf.data.B2HvQ (9209 samples) ]
LBR system wide any branch test: 9209 samples
LBR system wide any branch test [Success]
LBR system wide any call test
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 8.945 MB /tmp/__perf_test.perf.data.B2HvQ (9333 samples) ]
LBR system wide any call test: 9333 samples
LBR system wide any call test [Success]
LBR parallel any branch test
LBR parallel any call test
LBR parallel any ret test
LBR parallel any indirect call test
LBR parallel any indirect jump test
LBR parallel direct calls test
LBR parallel system wide any branch test
LBR parallel any indirect user call test
LBR parallel system wide any call test
[ perf record: Woken up 9 times to write data ]
[ perf record: Woken up 51 times to write data ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Woken up 5 times to write data ]
[ perf record: Woken up 559 times to write data ]
[ perf record: Woken up 14 times to write data ]
[ perf record: Woken up 17 times to write data ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Woken up 11 times to write data ]
[ perf record: Captured and wrote 0.150 MB /tmp/__perf_test.perf.data.lANpR (1909 samples) ]
[ perf record: Captured and wrote 2.371 MB /tmp/__perf_test.perf.data.Olum8 (3033 samples) ]
[ perf record: Captured and wrote 1.230 MB /tmp/__perf_test.perf.data.njfJ8 (1742 samples) ]
[ perf record: Captured and wrote 5.554 MB /tmp/__perf_test.perf.data.4ZTrj (29662 samples) ]
[ perf record: Captured and wrote 19.906 MB /tmp/__perf_test.perf.data.dlGQt (29576 samples) ]
[ perf record: Captured and wrote 0.289 MB /tmp/__perf_test.perf.data.CAT7y (4311 samples) ]
[ perf record: Captured and wrote 3.129 MB /tmp/__perf_test.perf.data.diuKG (3971 samples) ]
LBR parallel any indirect user call test: 1909 samples
[ perf record: Captured and wrote 4.858 MB /tmp/__perf_test.perf.data.sVjtN (6130 samples) ]
LBR parallel any indirect user call test [Success]
[ perf record: Captured and wrote 3.669 MB /tmp/__perf_test.perf.data.AJtNI (4827 samples) ]
LBR parallel any indirect jump test: 4311 samples
LBR parallel any indirect jump test [Success]
LBR parallel direct calls test: 3033 samples
LBR parallel direct calls test [Success]
LBR parallel any indirect call test: 1742 samples
LBR parallel any indirect call test [Success]
LBR parallel any call test: 4827 samples
LBR parallel any call test [Success]
LBR parallel any branch test: 6130 samples
LBR parallel any branch test [Success]
LBR parallel system wide any branch test: 29662 samples
LBR parallel any ret test: 3971 samples
LBR parallel any ret test [Success]
LBR parallel system wide any branch test [Success]
LBR parallel system wide any call test: 29576 samples
LBR parallel system wide any call test [Success]
---- end(0) ----
97: perf record LBR tests : Ok
root@x1:~#
Reviewed-by: Kan Liang <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/ZrTXftup0H46R8WK@x1
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Adds coverage for LBR operations and LBR callgraph.
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Anne Macedo <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps
and map members are reference counted. Ensure these values use a _get
routine to increment the reference counts and use map_symbol__exit() to
release the reference counts.
Do similar for 'struct thread's prev_lbr_cursor, but save the size of
the prev_lbr_cursor array so that it may be iterated.
Ensure that when stitch_nodes are placed on the free list the
map_symbols are exited.
Fix resolve_lbr_callchain_sample() by replacing list_replace_init() to
list_splice_init(), so the whole list is moved and nodes aren't leaked.
A reproduction of the memory leaks is possible with a leak sanitizer
build in the perf report command of:
```
$ perf record -e cycles --call-graph lbr perf test -w thloop
$ perf report --stitch-lbr
```
Reviewed-by: Kan Liang <[email protected]>
Fixes: ff165628d72644e3 ("perf callchain: Stitch LBR call stack")
Signed-off-by: Ian Rogers <[email protected]>
[ Basic tests after applying the patch, repeating the example above ]
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Anne Macedo <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Commit 3e0bf9fde2984469 ("perf pmu: Restore full PMU name wildcard
support") adds a test case "PMU cmdline match" that covers PMU name
wildcard support provided by function perf_pmu__match().
The test works with a wide range of supported combinations of PMU name
matching but omits the case that if the perf_pmu__match() cannot match
the PMU name to the wildcard, it tries to match its alias. However, this
variable is not set up, causing the test case to fail when run with
subprocesses or to segfault if run as a single process.
./perf test -vv 9
9: Sysfs PMU tests :
9.1: Parsing with PMU format directory : Ok
9.2: Parsing with PMU event : Ok
9.3: PMU event names : Ok
9.4: PMU name combining : Ok
9.5: PMU name comparison : Ok
9.6: PMU cmdline match : FAILED!
./perf test -F 9
9.1: Parsing with PMU format directory : Ok
9.2: Parsing with PMU event : Ok
9.3: PMU event names : Ok
9.4: PMU name combining : Ok
9.5: PMU name comparison : Ok
Segmentation fault (core dumped)
Initialize the PMU alias to null for all tests of perf_pmu__match()
as this functionality is not being tested and the alias matching works
exactly the same as the matching of the PMU name.
./perf test -F 9
9.1: Parsing with PMU format directory : Ok
9.2: Parsing with PMU event : Ok
9.3: PMU event names : Ok
9.4: PMU name combining : Ok
9.5: PMU name comparison : Ok
9.6: PMU cmdline match : Ok
Fixes: 3e0bf9fde2984469 ("perf pmu: Restore full PMU name wildcard support")
Signed-off-by: Veronika Molnarova <[email protected]>
Cc: James Clark <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Radostin Stoyanov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In 'perf ftrace profile sleep 0.1' we know that we'll have an specific
kernel function that will take a bit more than 0.1 seconds and will take
place just one time, so we can add a check for that so that we validate
more than just the presence of some functions in the profile.
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: https://lore.kernel.org/lkml/ZrTBo7KACZeuCyLj@x1
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
$ sudo ./perf test ftrace -vv
86: perf ftrace tests:
--- start ---
test child forked, pid 1772223
perf ftrace list test
syscalls for sleep:
__x64_sys_nanosleep
__ia32_sys_nanosleep
__x64_sys_clock_nanosleep
__ia32_sys_clock_nanosleep
perf ftrace list test [Success]
perf ftrace trace test
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) | __x64_sys_clock_nanosleep() {
0) | common_nsleep() {
0) | hrtimer_nanosleep() {
0) | do_nanosleep() {
perf ftrace trace test [Success]
perf ftrace latency test
target function: __x64_sys_clock_nanosleep
# DURATION | COUNT | GRAPH |
32 - 64 ms | 1 | ############################################## |
perf ftrace latency test [Success]
perf ftrace profile test
# Total (us) Avg (us) Max (us) Count Function
100136.400 100136.400 100136.400 1 __x64_sys_clock_nanosleep
100135.200 100135.200 100135.200 1 common_nsleep
100134.700 100134.700 100134.700 1 hrtimer_nanosleep
100133.700 100133.700 100133.700 1 do_nanosleep
100130.600 100130.600 100130.600 1 schedule
166.868 55.623 80.299 3 scheduler_tick
5.926 5.926 5.926 1 native_smp_send_reschedule
301.941 301.941 301.941 1 __x64_sys_execve
295.786 295.786 295.786 1 do_execveat_common.isra.0
71.397 35.699 46.403 2 bprm_execve
2.519 1.260 1.547 2 sched_mm_cid_before_execve
1.098 0.549 0.686 2 sched_mm_cid_after_execve
perf ftrace profile test [Success]
---- end(0) ----
86: perf ftrace tests : Ok
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The die_get_typename() would resolve typedef and get to the original
type. But sometimes the original type is a struct without name and it
makes the output confusing and hard to read.
This is a diff of perf report -s type before and after the change.
New types such as atomic{,64}_t and sigset_t appeared and the portion
of unnamed struct was reduced. Also u32, u64 and size_t were splitted
from the base types.
--- b 2024-08-01 17:02:34.307809952 -0700
+++ a 2024-08-07 14:17:05.245853999 -0700
- 2.40% long unsigned int
+ 2.26% long unsigned int
- 1.56% unsigned int
+ 1.27% unsigned int
- 0.98% struct
- 0.79% long long unsigned int
+ 0.58% long long unsigned int
+ 0.36% struct
+ 0.27% atomic64_t
+ 0.22% u32
+ 0.21% u64
+ 0.19% atomic_t
+ 0.13% size_t
- 0.08% struct seqcount_spinlock
+ 0.08% seqcount_spinlock_t
+ 0.08% sigset_t
+ 0.08% __poll_t
Let's use the typedef name directly and the resolved to get the size of
the type.
Committer testing:
root@x1:~# diff -u before after | head -30
--- before 2024-08-08 09:35:13.917325041 -0300
+++ after 2024-08-08 09:37:35.312257905 -0300
@@ -10,25 +10,27 @@
# ........ .........
#
79.40% (unknown)
- 2.28% union
1.96% (stack operation)
- 1.24% struct
+ 1.87% pthread_mutex_t
0.99% u32[]
- 0.92% unsigned int
0.77% struct task_struct
+ 0.75% U32
0.75% struct pcpu_hot
0.63% struct qspinlock
+ 0.61% atomic_t
0.59% struct list_head
- 0.58% int
0.53% struct cfs_rq
0.51% BYTE*
- 0.48% unsigned char
+ 0.48% BYTE
0.48% long unsigned int
0.46% struct rq
0.41% struct worker
0.41% struct memcg_vmstats_percpu
+ 0.41% pthread_cond_t
0.37% _Bool
+ 0.36% int
root@x1:~#
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In find_data_type(), it creates and deletes a debug info whenver it
tries to find data type for a sample. This is inefficient and it most
likely accesses the same binary again and again.
Let's add a single entry cache the debug info structure for the last DSO.
Depending on sample data, it usually gives me 2~3x (and sometimes more)
speed ups.
Note that this will introduce a little difference in the output due to
the order of checking stack operations. It used to check the stack ops
before checking the availability of debug info but I moved it after the
symbol check. So it'll report stack operations in DSOs without debug
info as unknown. But I think it's ok and better to have the checking
near the caching logic.
Committer testing:
root@x1:~# perf mem record -a sleep 5s
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~# diff -u before after
--- before 2024-08-08 09:33:53.880780784 -0300
+++ after 2024-08-08 09:35:13.917325041 -0300
@@ -81,8 +81,8 @@
# Overhead Data Type
# ........ .........
#
- 55.43% (unknown)
- 11.61% (stack operation)
+ 55.56% (unknown)
+ 11.48% (stack operation)
4.93% struct pcpu_hot
3.26% unsigned int
2.48% struct
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
iter_finish_branch_entry() doesn't put the branch_info from/to map
elements creating memory leaks. This can be seen with:
```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
#0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x564d3400d10b in map__get util/map.h:186
#2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
#3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
#4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
#5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
#6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
#7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
#8 0x564d3403ba52 in do_flush util/ordered-events.c:245
#9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
#10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
#11 0x564d34032480 in perf_session__process_event util/session.c:1877
#12 0x564d340336ad in reader__read_event util/session.c:2399
#13 0x564d34033fdc in reader__process_events util/session.c:2448
#14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
#15 0x564d34033fdc in perf_session__process_events util/session.c:2661
#16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
#17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
#18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
#19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
#20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
#21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```
Clearing up the map_symbols properly creates maps reference count
issues so resolve those. Resolving this issue doesn't improve peak
heap consumption for the test above.
Committer testing:
$ sudo dnf install libasan
$ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up changes from:
0f9ca80fa4f9 fs: Add initial atomic write support info to statx
f9af549d1fd3 fs: export mount options via statmount()
0a3deb11858a fs: Allow listmount() in foreign mount namespace
09b31295f833 fs: export the mount ns id via statmount
d04bccd8c19d listmount: allow listing in reverse order
bfc69fd05ef9 fs/procfs: add build ID fetching to PROCMAP_QUERY API
ed5d583a88a9 fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps
This should be used to beautify FS syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h
diff -u tools/perf/trace/beauty/include/uapi/linux/fs.h include/uapi/linux/fs.h
diff -u tools/perf/trace/beauty/include/uapi/linux/mount.h include/uapi/linux/mount.h
diff -u tools/perf/trace/beauty/include/uapi/linux/stat.h include/uapi/linux/stat.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
To pick up changes from:
d25a92ccae6b net/smc: Introduce IPPROTO_SMC
060f4ba6e403 io_uring/net: move charging socket out of zc io_uring
bb6aaf736680 net: Split a __sys_listen helper for io_uring
dc2e77979412 net: Split a __sys_bind helper for io_uring
This should be used to beautify socket syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
And arch syscall tables to pick up changes from:
b1e31c134a8a powerpc: restore some missing spu syscalls
d3882564a77c syscalls: fix compat_sys_io_pgetevents_time64 usage
54233a425403 uretprobe: change syscall number, again
63ded110979b uprobe: Change uretprobe syscall scope and number
9142be9e6443 x86/syscall: Mark exit[_group] syscall handlers __noreturn
9aae1baa1c5d x86, arm: Add missing license tag to syscall tables files
5c28424e9a34 syscalls: Fix to add sys_uretprobe to syscall.tbl
190fec72df4a uprobe: Wire up uretprobe system call
This should be used to beautify syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
To pick up changes from:
f05c1ffc2745 ALSA: pcm: reinvent the stream synchronization ID API
This should be used to beautify sound syscall arguments and it addresses
these tools/perf build warnings:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h
Please see tools/include/uapi/README for details (it's in the first patch
of this series).
Cc: Jaroslav Kysela <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
|
|
To pick a patch that albeit being for tools/perf/ directory went thru a
different tree and ended up breaking some recent tests introduced in the
perf-tools-next tree to validate duplicate events in the JSON
performance event files.
Link: https://lore.kernel.org/lkml/ZrIqDMg7cBVhstYU@x1
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Duplicate event names break invariants in 'perf list'. Assert that an
event name isn't duplicated so that broken JSON won't build.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Charles Ci-Jyun Wu <[email protected]>
Cc: Eric Lin <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Guilherme Amadio <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inochi Amaoto <[email protected]>
Cc: James Clark <[email protected]>
Cc: Ji Sheng Teoh <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Locus Wei-Han Chen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xu Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
OP_SPEC is repeated twice in the file which will break invariants in
'perf list' as discussed in this thread:
https://lore.kernel.org/linux-perf-users/[email protected]/
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Charles Ci-Jyun Wu <[email protected]>
Cc: Eric Lin <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Guilherme Amadio <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inochi Amaoto <[email protected]>
Cc: James Clark <[email protected]>
Cc: Ji Sheng Teoh <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Locus Wei-Han Chen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xu Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Switch from $? (all the prerequisites that are newer than the target)
to $^ (all the prerequisites) as touching jevents.py will mean that
empty-pmu-events.c won't be passed to the diff command breaking the
build.
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Charles Ci-Jyun Wu <[email protected]>
Cc: Eric Lin <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Guilherme Amadio <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inochi Amaoto <[email protected]>
Cc: James Clark <[email protected]>
Cc: Ji Sheng Teoh <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Locus Wei-Han Chen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xu Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Building with JEVENTS_ARCH=all builds all CPU types and allows things
like assertions to check the validity of the input JSON.
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Charles Ci-Jyun Wu <[email protected]>
Cc: Eric Lin <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Guilherme Amadio <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inochi Amaoto <[email protected]>
Cc: James Clark <[email protected]>
Cc: Ji Sheng Teoh <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Locus Wei-Han Chen <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xu Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Like in 'perf report', we want to hide empty events in the 'perf annotate'
output. This is consistent when the option is set in perf report.
For example, the following command would use 3 events including dummy.
$ perf mem record -a -- perf test -w noploop
$ perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u
Just using perf annotate with --group will show the all 3 events.
$ perf annotate --group --stdio | head
Percent | Source code & Disassembly of ...
--------------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 0.00 : e060: pushq %rbp
0.00 0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 0.00 : e064: pushq %r15
0.00 0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 0.00 : e069: pushq %r14
0.00 0.00 0.00 : e06b: pushq %r13
0.00 0.00 0.00 : e06d: movl %edx, %r13d
Now with --skip-empty, it'll hide the last dummy event.
$ perf annotate --group --stdio --skip-empty | head
Percent | Source code & Disassembly of ...
------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 : e060: pushq %rbp
0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 : e064: pushq %r15
0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 : e069: pushq %r14
0.00 0.00 : e06b: pushq %r13
0.00 0.00 : e06d: movl %edx, %r13d
Committer testing:
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~#
Before:
root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 0.00 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#
After:
root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This is a preparation to support skipping empty events.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The annotation__pcnt_width() calculates the screen width for the
overhead (percent) area considering event groups properly. Use this
function consistently so that we can make sure it has similar output
in different modes. But there's a difference in stdio and tui output:
stdio uses 8 and tui uses 7 for a percent.
Let's use 8 and adjust the print width in __annotation_line__write()
properly.
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We want to use it in different places so make sure it sets properly
in symbol__annotate() before creating the disasm lines.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The data_nr keeps the number of entries in al->data[] so it should use
it when it iterates the array. The notes->src->nr_events should have
the same number but it'd be natural to use al->data_nr.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a common options section and move some items to the section. Also
add description of new options to report options.
Suggested-by: Ian Rogers <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid dropping some of the internal pseudo-extensions, which
breaks *envcfg dependency parsing
- The kernel entry address is now aligned in purgatory, which avoids a
misaligned load that can lead to crash on systems that don't support
misaligned accesses early in boot
- The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of
perf JSON configurations, one of them been updated to
FW_SFENCE_VMA_ASID_SENT
- The starfive cache driver is now restricted to 64-bit systems, as it
isn't 32-bit clean
- A fix for to avoid aliasing legacy-mode perf counters with software
perf counters
- VM_FAULT_SIGSEGV is now handled in the page fault code
- A fix for stalls during CPU hotplug due to IPIs being disabled
- A fix for memblock bounds checking. This manifests as a crash on
systems with discontinuous memory maps that have regions that don't
fit in the linear map
* tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix linear mapping checks for non-contiguous memory regions
RISC-V: Enable the IPI before workqueue_online_cpu()
riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error()
perf: riscv: Fix selecting counters in legacy mode
cache: StarFive: Require a 64-bit system
perf arch events: Fix duplicate RISC-V SBI firmware event name
riscv/purgatory: align riscv_kernel_entry
riscv: cpufeature: Do not drop Linux-internal extensions
|