aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/ui/browsers
AgeCommit message (Collapse)AuthorFilesLines
2024-08-21perf annotate-data: Show offset and size in hexNamhyung Kim1-2/+2
It'd be better to have them in hex to check cacheline alignment. Percent offset size field 100.00 0 0x1c0 struct cfs_rq { 0.00 0 0x10 struct load_weight load { 0.00 0 0x8 long unsigned int weight; 0.00 0x8 0x4 u32 inv_weight; }; 0.00 0x10 0x4 unsigned int nr_running; 14.56 0x14 0x4 unsigned int h_nr_running; 0.00 0x18 0x4 unsigned int idle_nr_running; 0.00 0x1c 0x4 unsigned int idle_h_nr_running; ... Committer notes: Justification from Namhyung when asked about why it would be "better": Cache line sizes are power of 2 so it'd be natural to use hex and check whether an offset is in the same boundary. Also 'perf annotate' shows instruction offsets in hex. > > Maybe this should be selectable? I can add an option and/or a config if you want. Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf annotate: Display the branch counter histogramKan Liang2-3/+18
Display the branch counter histogram in the annotation view. Press 'B' to display the branch counter's abbreviation list as well. Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }', 4000 Hz, Event count (approx.): f3 /home/sdp/test/tchain_edit [Percent: local period] Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%) │ 0000000000401755 <f3>: 0.00 0.00 │ endbr64 │ push %rbp │ mov %rsp,%rbp │ movl $0x0,-0x4(%rbp) 0.00 0.00 │1.33 3 |A |- | ↓ jmp 25 11.03 11.03 │ 11: mov -0x4(%rbp),%eax │ and $0x1,%eax │ test %eax,%eax 17.13 17.13 │2.41 1 |A |- | ↓ je 21 │ addl $0x1,-0x4(%rbp) 21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25 17.13 17.13 │ 21: addl $0x1,-0x4(%rbp) 21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp) 11.03 11.03 │0.61 3 |A |- | ↑ jle 11 │ nop │ pop %rbp 0.00 0.00 │0.24 20 |AA |B | ← ret Originally-by: Tinghao Zhang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf report: Display the branch counter histogramKan Liang1-1/+16
Reusing the existing --total-cycles option to display the branch counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display the logged branch counter events. They are shown right after all the cycle-related annotations. Extend the 'struct block_info' to store and pass the branch counter related information. The annotation_br_cntr_entry() is to print the histogram of each branch counter event. If the number of logged events is less than 4, the exact number of the abbr name is printed. Otherwise, using '+' to stands for more than 3 events. Assume the number of logged events is less than 4. The annotation_br_cntr_abbr_list() prints the branch counter's abbreviation list. Press 'B' to display the list in the TUI mode. $ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter $ perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }' # Event count (approx.): 1610046 # # Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 |A |- | ... 25.27% 1.1M 0.00% 2 |AA |- | ... 15.61% 667.2K 0.00% 1 |A |- | ... 0.16% 6.9K 0.81% 575 |A |- | ... 0.16% 6.8K 1.38% 977 |AA |- | ... 0.16% 6.8K 0.04% 28 |AA |B | ... 0.15% 6.6K 1.33% 946 |A |- | ... 0.11% 4.5K 0.06% 46 |AAA+|- | ... 0.10% 4.4K 0.88% 624 |A |- | ... 0.09% 3.7K 0.74% 524 |AAA+|B | ... With -v applied, # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 A=1 ,B=- ... 25.27% 1.1M 0.00% 2 A=2 ,B=- ... 15.61% 667.2K 0.00% 1 A=1 ,B=- ... 0.16% 6.9K 0.81% 575 A=1 ,B=- ... 0.16% 6.8K 1.38% 977 A=2 ,B=- ... 0.16% 6.8K 0.04% 28 A=2 ,B=1 ... 0.15% 6.6K 1.33% 946 A=1 ,B=- ... 0.11% 4.5K 0.06% 46 A=3+,B=- ... 0.10% 4.4K 0.88% 624 A=1 ,B=- ... 0.09% 3.7K 0.74% 524 A=3+,B=1 ... Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-12perf annotate-data: Show first-level children by default in TUINamhyung Kim1-2/+10
Now default is to fold everything but it only shows the name of the top-level data type which is not very useful. Instead just expand the top level entry so that it can show the layout at a higher level. Annotate type: 'struct task_struct' (4 samples) Percent Offset Size Field - 100.00 0 9792 struct task_struct { ◆ + 0.50 0 24 struct thread_info thread_info; ▒ 0.00 24 4 unsigned int __state; ▒ 0.00 32 8 void* stack; ▒ + 0.00 40 4 refcount_t usage; ▒ 0.00 44 4 unsigned int flags; ▒ 0.00 48 4 unsigned int ptrace; ▒ 0.00 52 4 int on_cpu; ▒ + 0.00 56 16 struct __call_single_node wake_entry; ▒ 0.00 72 4 unsigned int wakee_flips; ▒ 0.00 80 8 long unsigned int wakee_flip_decay_ts;▒ 0.00 88 8 struct task_struct* last_wakee; ▒ 0.00 96 4 int recent_used_cpu; ▒ 0.00 100 4 int wake_cpu; ▒ 0.00 104 4 int on_rq; ▒ 0.00 108 4 int prio; ▒ 0.00 112 4 int static_prio; ▒ 0.00 116 4 int normal_prio; ▒ 0.00 120 4 unsigned int rt_priority; ▒ + 0.00 128 256 struct sched_entity se; ▒ + 0.00 384 48 struct sched_rt_entity rt; ▒ + 0.00 432 224 struct sched_dl_entity dl; ▒ 0.00 656 8 struct sched_class* sched_class; ▒ ... Committer testing: # perf mem record -a sleep 5s # perf annotate --group --data-type=pthread_mutex_t Annotate type: 'pthread_mutex_t' (13 samples) Percent Offset Size Field - 100.00 0 40 pthread_mutex_t { ▒ - 100.00 0 40 struct __pthread_mutex_s __data { ▒ 39.45 0 4 int __lock; ▒ 0.00 4 4 unsigned int __count; ▒ 7.80 8 4 int __owner; ▒ 6.88 12 4 unsigned int __nusers; ▒ 45.87 16 4 int __kind; ▒ 0.00 20 2 short int __spins; ▒ 0.00 22 2 short int __elision; ▒ + 0.00 24 16 __pthread_list_t __list; ▒ }; ▒ 0.00 0 0 char[] __size; ▒ 39.45 0 8 long int __align; Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-12perf annotate-data: Implement folding in TUI browserNamhyung Kim1-6/+92
Like 'perf report', use 'e' or 'E' key to toggle folding the current entry so that it can control displaying child entries. Note I didn't add the 'c' and 'C' key to collapse the entry because it's also handled with the 'e'/'E' since it toggles the state. Committer testing: Do some 'perf mem record' for some workload of the whole system, using the target options, as usual (--pid/-p, -C/--cpu, -a for the system wide profiling, etc) and then: # perf annotate --skip-empty --data-type=pthread_mutex_t That, by default, will start as --tui, then press 'E' to see the whole struct unfolded, etc. Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-12perf annotate-data: Support folding in TUI browserNamhyung Kim1-23/+212
Like in the hists browser, it should support folding current entry so that it can hide unwanted details in some data structures. The folded entries will be displayed with the '+' sign, while unfolded entries will have the '-' sign. Entries that have no children will not show any signs. Annotate type: 'struct socket' (1 samples) Percent Offset Size Field - 100.00 0 128 struct socket { ◆ 0.00 0 4 socket_state state; ▒ 0.00 4 2 short int type; ▒ 0.00 8 8 long unsigned int flags; ▒ 0.00 16 8 struct file* file; ▒ 100.00 24 8 struct sock* sk; ▒ 0.00 32 8 struct proto_ops* ops; ▒ - 0.00 64 64 struct socket_wq wq { ▒ - 0.00 64 24 wait_queue_head_t wait { ▒ + 0.00 64 4 spinlock_t lock; ▒ - 0.00 72 16 struct list_head head { ▒ 0.00 72 8 struct list_head* next; ▒ 0.00 80 8 struct list_head* prev; ▒ }; ▒ }; ▒ 0.00 88 8 struct fasync_struct* fasync_list; ▒ 0.00 96 8 long unsigned int flags; ▒ + 0.00 104 16 struct callback_head rcu; ▒ }; ▒ }; ▒ This just adds the display logic for folding, actually folding action will be implemented in the next patch. Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-12perf annotate-data: Fix a buffer overflow in TUI browserNamhyung Kim1-1/+2
In get_member_overhead(), k is updated when it has a entry in the histogram. But the entry->hists array is allocated with the number of evsel in the group. So the k should be reset when it iterates the event using for_each_group_evsel(), otherwise it'd crash due to a buffer overflow. Fixes: cb1898f58e0f175d ("perf annotate-data: Support --skip-empty option") Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-09perf annotate-data: Support --skip-empty optionNamhyung Kim1-7/+23
The --skip-empty option is to hide dummy events in a group. Like other output mode in 'perf report' and 'perf annotate', the data-type profiling output should support the option. Committer testing: With dummy: root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24 Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples): event[0] = cpu_atom/mem-loads,ldlat=30/P event[1] = cpu_atom/mem-stores/P event[2] = dummy:u ============================================================================ Percent offset size field 100.00 100.00 0.00 0 40 pthread_mutex_t { 100.00 100.00 0.00 0 40 struct __pthread_mutex_s __data { 45.21 84.54 0.00 0 4 int __lock; 0.00 0.00 0.00 4 4 unsigned int __count; 0.00 1.83 0.00 8 4 int __owner; 5.19 10.65 0.00 12 4 unsigned int __nusers; 49.61 2.97 0.00 16 4 int __kind; 0.00 0.00 0.00 20 2 short int __spins; 0.00 0.00 0.00 22 2 short int __elision; 0.00 0.00 0.00 24 16 __pthread_list_t __list { 0.00 0.00 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 0.00 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0.00 0.00 0 0 char[] __size; 45.21 84.54 0.00 0 8 long int __align; }; Skipping it: root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24 Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples): event[0] = cpu_atom/mem-loads,ldlat=30/P event[1] = cpu_atom/mem-stores/P ============================================================================ Percent offset size field 100.00 100.00 0 40 pthread_mutex_t { 100.00 100.00 0 40 struct __pthread_mutex_s __data { 45.21 84.54 0 4 int __lock; 0.00 0.00 4 4 unsigned int __count; 0.00 1.83 8 4 int __owner; 5.19 10.65 12 4 unsigned int __nusers; 49.61 2.97 16 4 int __kind; 0.00 0.00 20 2 short int __spins; 0.00 0.00 22 2 short int __elision; 0.00 0.00 24 16 __pthread_list_t __list { 0.00 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0.00 0 0 char[] __size; 45.21 84.54 0 8 long int __align; }; Annotate type: 'pthread_mutexattr_t' in /usr/lib64/libc.so.6 (1 samples): root@number:~# Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-07-31perf annotate: Convert comma to semicolonChen Ni1-1/+1
Replace a comma between expression statements by a semicolon. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Chen Ni <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ahelenia Ziemiańska <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-06-26perf ui: Make ui its own libraryIan Rogers1-7/+7
Make the ui code its own library. This is done to avoid compiling code twice, once for the perf tool and once for the perf python module. Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: James Clark <[email protected]> Cc: Suzuki K Poulose <[email protected]> Cc: Kees Cook <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Gary Guo <[email protected]> Cc: Alex Gaynor <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Wedson Almeida Filho <[email protected]> Cc: Ze Gao <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Yicong Yang <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Guo Ren <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Will Deacon <[email protected]> Cc: Mike Leach <[email protected]> Cc: Leo Yan <[email protected]> Cc: Oliver Upton <[email protected]> Cc: John Garry <[email protected]> Cc: Benno Lossin <[email protected]> Cc: Björn Roy Baron <[email protected]> Cc: Andreas Hindborg <[email protected]> Cc: Paul Walmsley <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-05-07perf annotate: Use zfree() to avoid possibly accessing dangling pointersArnaldo Carvalho de Melo1-1/+2
When freeing a->b it is good practice to set a->b to NULL using zfree(&a->b) so that when we have a bug where a reference to a freed 'a' pointer is kept somewhere, we can more quickly cause a segfault if some code tries to use a->b. This is mostly done but some new cases were introduced recently, convert them to zfree(). Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/ZjmbHHrjIm5YRIBv@x1 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-05-06perf dso: Add reference count checking and accessor functionsIan Rogers3-9/+9
Add reference count checking to struct dso, this can help with implementing correct reference counting discipline. To avoid RC_CHK_ACCESS everywhere, add accessor functions for the variables in struct dso. The majority of the change is mechanical in nature and not easy to split up. Committer testing: 'perf test' up to this patch shows no regressions. But: util/symbol.c: In function ‘dso__load_bfd_symbols’: util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’ 1683 | dso__set_adjust_symbols(dso); | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from util/symbol.c:21: util/dso.h:268:20: note: declared here 268 | static inline void dso__set_adjust_symbols(struct dso *dso, bool val) | ^~~~~~~~~~~~~~~~~~~~~~~ make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1 MKDIR /tmp/tmp.ZWHbQftdN6/tests/workloads/ make[6]: *** Waiting for unfinished jobs.... This was updated: - symbols__fixup_end(&dso->symbols, false); - symbols__fixup_duplicate(&dso->symbols); - dso->adjust_symbols = 1; + symbols__fixup_end(dso__symbols(dso), false); + symbols__fixup_duplicate(dso__symbols(dso)); + dso__set_adjust_symbols(dso); But not build tested with BUILD_NONDISTRO and libbfd devel files installed (binutils-devel on fedora). Add the missing argument: symbols__fixup_end(dso__symbols(dso), false); symbols__fixup_duplicate(dso__symbols(dso)); - dso__set_adjust_symbols(dso); + dso__set_adjust_symbols(dso, true); Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ahelenia Ziemiańska <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Changbin Du <[email protected]> Cc: Chengen Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dima Kogan <[email protected]> Cc: Ilkka Koskinen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Tiezhu Yang <[email protected]> Cc: Yanteng Si <[email protected]> Cc: zhaimingbing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-04-22Merge remote-tracking branch 'torvalds/master' into perf-tools-nextArnaldo Carvalho de Melo1-1/+1
To pick up fixes sent via perf-tools, by Namhyung Kim. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-04-12perf report: Add a menu item to annotate data type in TUINamhyung Kim1-0/+31
When the hist entry has the type info, it should be able to display the annotation browser for the type like in `perf annotate --data-type`. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-04-12perf annotate-data: Support event group display in TUINamhyung Kim1-10/+40
Like in stdio, it should print all events in a group together. Committer notes: Collect it: root@number:~# perf record -a -e '{cpu_core/mem-loads,ldlat=30/P,cpu_core/mem-stores/P}' ^C[ perf record: Woken up 8 times to write data ] [ perf record: Captured and wrote 4.980 MB perf.data (55825 samples) ] root@number:~# Then do it in stdio: root@number:~# perf annotate --stdio --data-type Annotate type: 'union ' in /usr/lib64/libc.so.6 (1131 samples): event[0] = cpu_core/mem-loads,ldlat=30/P event[1] = cpu_core/mem-stores/P ============================================================================ Percent offset size field 100.00 100.00 0 40 union { 100.00 100.00 0 40 struct __pthread_mutex_s __data { 48.61 23.46 0 4 int __lock; 0.00 0.48 4 4 unsigned int __count; 6.38 41.32 8 4 int __owner; 8.74 34.02 12 4 unsigned int __nusers; 35.66 0.26 16 4 int __kind; 0.61 0.45 20 2 short int __spins; 0.00 0.00 22 2 short int __elision; 0.00 0.00 24 16 __pthread_list_t __list { 0.00 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0.00 0 0 char* __size; 48.61 23.94 0 8 long int __align; }; Now with TUI before this patch: root@number:~# perf annotate --tui --data-type Annotate type: 'union ' (790 samples) Percent Offset Size Field 100.00 0 40 union { 100.00 0 40 struct __pthread_mutex_s __data { 48.61 0 4 int __lock; 0.00 4 4 unsigned int __count; 6.38 8 4 int __owner; 8.74 12 4 unsigned int __nusers; 35.66 16 4 int __kind; 0.61 20 2 short int __spins; 0.00 22 2 short int __elision; 0.00 24 16 __pthread_list_t __list { 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 32 8 struct __pthread_internal_list* __next; 0.00 0 0 char* __size; 48.61 0 8 long int __align; }; And now after this patch: Annotate type: 'union ' (790 samples) Percent Offset Size Field 100.00 100.00 0 40 union { 100.00 100.00 0 40 struct __pthread_mutex_s __data { 48.61 23.46 0 4 int __lock; 0.00 0.48 4 4 unsigned int __count; 6.38 41.32 8 4 int __owner; 8.74 34.02 12 4 unsigned int __nusers; 35.66 0.26 16 4 int __kind; 0.61 0.45 20 2 short int __spins; 0.00 0.00 22 2 short int __elision; 0.00 0.00 24 16 __pthread_list_t __list { 0.00 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0.00 0 0 char* __size; 48.61 23.94 0 8 long int __align; }; On a followup patch the --tui output should have this that is present in --stdio: And the --stdio has all the missing info in TUI: Annotate type: 'union ' in /usr/lib64/libc.so.6 (1131 samples): event[0] = cpu_core/mem-loads,ldlat=30/P event[1] = cpu_core/mem-stores/P Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-04-12perf annotate-data: Add hist_entry__annotate_data_tui()Namhyung Kim2-0/+283
Support data type profiling output on TUI. Testing from Arnaldo: First make sure that the debug information for your workload binaries in embedded in them by building it with '-g' or install the debuginfo packages, since our workload is 'find': root@number:~# type find find is hashed (/usr/bin/find) root@number:~# rpm -qf /usr/bin/find findutils-4.9.0-5.fc39.x86_64 root@number:~# dnf debuginfo-install findutils <SNIP> root@number:~# Then collect some data: root@number:~# echo 1 > /proc/sys/vm/drop_caches root@number:~# perf mem record find / > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.331 MB perf.data (3982 samples) ] root@number:~# Finally do data-type annotation with the following command, that will default, as 'perf report' to the --tui mode, with lines colored to highlight the hotspots, etc. root@number:~# perf annotate --data-type Annotate type: 'struct predicate' (58 samples) Percent Offset Size Field 100.00 0 312 struct predicate { 0.00 0 8 PRED_FUNC pred_func; 0.00 8 8 char* p_name; 0.00 16 4 enum predicate_type p_type; 0.00 20 4 enum predicate_precedence p_prec; 0.00 24 1 _Bool side_effects; 0.00 25 1 _Bool no_default_print; 0.00 26 1 _Bool need_stat; 0.00 27 1 _Bool need_type; 0.00 28 1 _Bool need_inum; 0.00 32 4 enum EvaluationCost p_cost; 0.00 36 4 float est_success_rate; 0.00 40 1 _Bool literal_control_chars; 0.00 41 1 _Bool artificial; 0.00 48 8 char* arg_text; <SNIP> Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-04-11perf annotate: Make sure to call symbol__annotate2() in TUINamhyung Kim1-1/+1
The symbol__annotate2() initializes some data structures needed by TUI. It has a logic to prevent calling it multiple times by checking if it has the annotated source. But data type profiling uses a different code (symbol__annotate) to allocate the annotated lines in advance. So TUI missed to call symbol__annotate2() when it shows the annotation browser. Make symbol__annotate() reentrant and handle that situation properly. This fixes a crash in the annotation browser started by perf report in TUI like below. $ perf report -s type,sym --tui # and press 'a' key and then move down Fixes: 81e57deec325 ("perf report: Support data type profiling") Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-08perf annotate: Move 'max_jump_sources' struct to 'struct annotated_source'Namhyung Kim1-1/+1
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation'). Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-04-08perf annotate: Move 'widths' struct to 'struct annotated_source'Namhyung Kim1-3/+3
It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation'). Also move the 'max_line_len' field into it as it's related. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-04-08perf annotate: Get rid of offsets arrayNamhyung Kim1-4/+1
The struct annotated_source.offsets[] is to save pointers to annotation_line at each offset. We can use annotated_source__get_line() helper instead so let's get rid of the array. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-04-08perf annotate: Introduce annotated_source__get_line()Namhyung Kim1-1/+1
It's a helper function to get annotation_line at the given offset without using the offsets array. The goal is to get rid of the offsets array altogether. It just does the linear search but I think it's better to save memory as it won't be called in a hot path. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-02-20perf: script: prefer capstone to XEDChangbin Du2-2/+2
Now perf can show assembly instructions with libcapstone for x86, and the capstone is better in general. Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-01-04perf TUI: Don't ignore job controlAhelenia Ziemiańska3-0/+4
In its infinite wisdom, by default, SLang sets susp undef, and this can only be un-done by calling SLtty_set_suspend_state(true). After every SLang_init_tty(). Additionally, no provisions are made for maintaining the teletype attributes across suspend/continue (outside of curses emulation mode(?!), which provides full support, naturally), so we need to save and restore the flags ourselves, as well as reset the text colours when going under. We need to also re-draw the screen, and raising SIGWINCH, shockingly, Just Works. The correct solution would be to Not Use SLang, but as a stop-gap, this makes TUI 'perf report' usable. Signed-off-by: Ahelenia Ziemiańska <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: yaowenbin <[email protected]> Link: https://lore.kernel.org/r/0354dcae23a8713f75f4fed609e0caec3c6e3cd5.1672174189.git.nabijaczleweli@nabijaczleweli.xyz Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-12-07perf annotate: Remove remaining usages of local annotation optionsNamhyung Kim1-8/+6
So that it can get rid of the unused data. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-12-07perf ui/browser/annotate: Use global annotation_optionsNamhyung Kim3-60/+41
Now it can use the global options and no need save local browser options separately. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-12-07perf annotate: Use global annotation_optionsNamhyung Kim1-3/+3
Now it can directly use the global options and no need to pass it as an argument. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Fixup build with GTK2=1 ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf annotate: Move offsets array from 'struct annotation' to 'struct ↵Namhyung Kim1-2/+2
annotated_source' The offsets array keeps pointers to 'struct annotation_line' entries which are available in the 'struct annotated_source'. Let's move it to there. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf annotate: Move some source code related fields from 'struct annotation' ↵Namhyung Kim1-6/+6
to 'struct annotated_source' Some fields in the 'struct annotation' are only used with 'struct annotated_source' so better to be moved there in order to reduce memory consumption for other symbols. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf annotate: Split branch stack cycles information out of 'struct ↵Namhyung Kim1-1/+1
annotation_line' The cycles info is used only when branch stack is provided. Separate them from 'struct annotation_line' into a separate struct and lazy allocate them to save some memory. Committer notes: Make annotation__compute_ipc() check if the lazy allocation works, bailing out if so, its callers already do error checking and propagation. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-10-12perf hists browser: Avoid potential NULL dereferenceIan Rogers1-1/+1
On other code paths browser->he_selection is NULL checked, add a missing case reported by clang-tidy. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Kan Liang <[email protected]> Cc: [email protected] Cc: Ming Wang <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2023-10-12perf hists browser: Reorder variables to reduce paddingIan Rogers1-2/+2
Address clang-tidy warning: ``` tools/perf/ui/browsers/hists.c:2416:8: warning: Excessive padding in 'struct popup_action' (8 padding bytes, where 0 is optimal). Optimal fields order: time, thread, evsel, fn, ms, socket, rstype, ``` Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Kan Liang <[email protected]> Cc: [email protected] Cc: Ming Wang <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2023-08-25perf tui slang: Tidy castsIan Rogers1-5/+0
Casts were necessary for older versions of libslang, however, these are now 15 years old and so we no longer need to care about supporting them. Tidy the casts and remove unnecessary logic. Move the ENABLE_SLFUTURE_CONST to the libslang.h common include file, and also enable ENABLE_SLFUTURE_VOID. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Ming Wang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wei Li <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-08-03perf hists browser: Fix the number of entries for 'e' keyNamhyung Kim1-34/+24
The 'e' key is to toggle expand/collapse the selected entry only. But the current code has a bug that it only increases the number of entries by 1 in the hierarchy mode so users cannot move under the current entry after the key stroke. This is due to a wrong assumption in the hist_entry__set_folding(). The commit b33f922651011eff ("perf hists browser: Put hist_entry folding logic into single function") factored out the code, but actually it should be handled separately. The hist_browser__set_folding() is to update fold state for each entry so it needs to traverse all (child) entries regardless of the current fold state. So it increases the number of entries by 1. But the hist_entry__set_folding() only cares the currently selected entry and its all children. So it should count all unfolded child entries. This code is implemented in hist_browser__toggle_fold() already so we can just call it. Fixes: b33f922651011eff ("perf hists browser: Put hist_entry folding logic into single function") Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-08-03perf hists browser: Fix hierarchy mode headerNamhyung Kim1-1/+1
The commit ef9ff6017e3c4593 ("perf ui browser: Move the extra title lines from the hists browser") introduced ui_browser__gotorc_title() to help moving non-title lines easily. But it missed to update the title for the hierarchy mode so it won't print the header line on TUI at all. $ perf report --hierarchy Fixes: ef9ff6017e3c4593 ("perf ui browser: Move the extra title lines from the hists browser") Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-06-20perf annotation: Switch lock from a mutex to a sharded_mutexIan Rogers1-5/+5
Remove the "struct mutex lock" variable from annotation that is allocated per symbol. This removes in the region of 40 bytes per symbol allocation. Use a sharded mutex where the number of shards is set to the number of CPUs. Assuming good hashing of the annotation (done based on the pointer), this means in order to contend there needs to be more threads than CPUs, which is not currently true in any perf command. Were contention an issue it is straightforward to increase the number of shards in the mutex. On my Debian/glibc based machine, this reduces the size of struct annotation from 136 bytes to 96 bytes, or nearly 30%. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Andres Freund <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Yuan Can <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Kan Liang <[email protected]> Cc: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2023-06-12perf thread: Add accessor functions for threadIan Rogers1-9/+10
Using accessors will make it easier to add reference count checking in later patches. Committer notes: thread->nsinfo wasn't wrapped as it is used together with nsinfo__zput(), where does a trick to set the field with a refcount being dropped to NULL, and that doesn't work well with using thread__nsinfo(thread), that loses the &thread->nsinfo pointer. When refcount checking is added to 'struct thread', later in this series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to check the thread pointer. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Changbin Du <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Fangrui Song <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Babrou <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Ye Xingchen <[email protected]> Cc: Yuan Can <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-05-15perf annotate browser: Add '<' and '>' keys for navigationNamhyung Kim1-1/+3
hists__find_annotations() allows to move to next or previous symbols for annotation using the arrow keys. But TUI annotate_browser__run() uses the RIGHT key as ENTER to handle jump/call instructions. That makes the navigation to the next function impossible. I'd like to change it back to move the next symbol but I'm afraid if some users get confused. So I added a new pair of keys to handle that. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19perf dso: Fix use before NULL check introduced by map__dso() introductionArnaldo Carvalho de Melo1-2/+2
James Clark noticed that the recent 63df0e4bc368adbd ("perf map: Add accessor for dso") patch accessed map->dso before the 'map' variable was NULL checked, which is a change in logic that leads to segmentation faults, so comb thru that patch to fix similar cases. Fixes: 63df0e4bc368adbd ("perf map: Add accessor for dso") Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected] Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-10perf util: Move input_name to utilIan Rogers1-1/+1
'input_name' is the name of the input perf.data file, it is used by data convert and ui code. Move it to util to make it more consistent with other global state. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Chengdong Li <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Florian Fischer <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin Liška <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Raul Silvera <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: Xing Zhengjun <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf map: Add accessor for dsoIan Rogers3-12/+18
Later changes will add reference count checking for struct map, with dso being the most frequently accessed variable. Add an accessor so that the reference count check is only necessary in one place. Additional changes: - add a dso variable to avoid repeated map__dso calls. - in builtin-mem.c dump_raw_samples, code only partially tested for dso == NULL. Make the possibility of NULL consistent. - in thread.c thread__memcpy fix use of spaces and use tabs. Committer notes: Did missing conversions on these files: tools/perf/arch/powerpc/util/skip-callchain-idx.c tools/perf/arch/powerpc/util/sym-handling.c tools/perf/ui/browsers/hists.c tools/perf/ui/gtk/annotate.c tools/perf/util/cs-etm.c tools/perf/util/thread.c tools/perf/util/unwind-libunwind-local.c tools/perf/util/unwind-libunwind.c Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf maps: Add functions to access mapsIan Rogers1-1/+2
Introduce functions to access struct maps. These functions reduce the number of places reference counting is necessary. While tidying APIs do some small const-ification, in particlar to unwind_libunwind_ops. Committer notes: Fixed up tools/perf/util/unwind-libunwind.c: - return ops->get_entries(cb, arg, thread, data, max_stack); + return ops->get_entries(cb, arg, thread, data, max_stack, best_effort); Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-06perf annotate: Remove unused struct disasm_line_samplesYuan Can1-5/+0
After commit 3ab6db8d0f3b ("perf annotate browser: Use samples data from struct annotation_line"), no one use struct disasm_line_samples, so remove it. Signed-off-by: Yuan Can <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/linux-perf-users/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-04perf annotate: Toggle full address <-> offset displayNamhyung Kim1-1/+5
Handle 'f' key to toggle the display offset and full address. Obviously it only works when users set to see disassembler output ('o' key). It'd be useful when users want to see the full virtual address in the TUI annotate browser. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-04perf annotate: Update use of pthread mutexIan Rogers1-5/+5
Switch to the use of mutex wrappers that provide better error checking. Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Truong <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andres Freund <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: André Almeida <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dario Petrillo <[email protected]> Cc: Darren Hart <[email protected]> Cc: Dave Marchevsky <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Hewenliang <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin Liška <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Pavithra Gurushankar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Monnet <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Remi Bernon <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Tom Rix <[email protected]> Cc: Weiguo Li <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: William Cohen <[email protected]> Cc: Zechuan Chen <[email protected]> Cc: [email protected] Cc: [email protected] Cc: yaowenbin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-04perf ui: Update use of pthread mutexIan Rogers1-1/+1
Switch to the use of mutex wrappers that provide better error checking. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Truong <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andres Freund <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: André Almeida <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dario Petrillo <[email protected]> Cc: Darren Hart <[email protected]> Cc: Dave Marchevsky <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Hewenliang <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin Liška <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Pavithra Gurushankar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Monnet <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Remi Bernon <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Tom Rix <[email protected]> Cc: Weiguo Li <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: William Cohen <[email protected]> Cc: Zechuan Chen <[email protected]> Cc: [email protected] Cc: [email protected] Cc: yaowenbin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-10perf annotate: Avoid TUI crash when navigating in the annotation of ↵Dario Petrillo1-9/+14
recursive functions In 'perf report', entering a recursive function from inside of itself (either directly of indirectly through some other function) results in calling symbol__annotate2 multiple() times, and freeing the whole disassembly when exiting from the innermost instance. The first issue causes the function's disassembly to be duplicated, and the latter a heap use-after-free (and crash) when trying to access the disassembly again. I reproduced the bug on perf 5.11.22 (Ubuntu 20.04.3 LTS) and 5.16.rc8 with the following testcase (compile with gcc recursive.c -o recursive). To reproduce: - perf record ./recursive - perf report - enter fibonacci and annotate it - move the cursor on one of the "callq fibonacci" instructions and press enter - at this point there will be two copies of the function in the disassembly - go back by pressing q, and perf will crash #include <stdio.h> int fibonacci(int n) { if(n <= 2) return 1; return fibonacci(n-1) + fibonacci(n-2); } int main() { printf("%d\n", fibonacci(40)); } This patch addresses the issue by annotating a function and freeing the associated memory on exit only if no annotation is already present, so that a recursive function is only annotated on entry. Signed-off-by: Dario Petrillo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-09-18perf annotate: Fix fused instr logic for assembly functionsRavi Bangoria1-7/+17
Some x86 microarchitectures fuse a subset of cmp/test/ALU instructions with branch instructions, and thus perf annotate highlight such valid pairs as fused. When annotated with source, perf uses struct disasm_line to contain either source or instruction line from objdump output. Usually, a C statement generates multiple instructions which include such cmp/test/ALU + branch instruction pairs. But in case of assembly function, each individual assembly source line generate one instruction. The 'perf annotate' instruction fusion logic assumes the previous disasm_line as the previous instruction line, which is wrong because, for assembly function, previous disasm_line contains source line. And thus perf fails to highlight valid fused instruction pairs for assembly functions. Fix it by searching backward until we find an instruction line and consider that disasm_line as fused with current branch instruction. Before: │ cmpq %rcx, RIP+8(%rsp) 0.00 │ cmp %rcx,0x88(%rsp) │ je .Lerror_bad_iret <--- Source line 0.14 │ ┌──je b4 <--- Instruction line │ │movl %ecx, %eax After: │ cmpq %rcx, RIP+8(%rsp) 0.00 │ ┌──cmp %rcx,0x88(%rsp) │ │je .Lerror_bad_iret 0.14 │ ├──je b4 │ │movl %ecx, %eax Reviewed-by: Jin Yao <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https //lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-08-03perf annotate: Re-add annotate_warned functionalityJames Clark1-0/+1
Setting annotate_warned to true on errors was removed in commit ee51d851392e ("perf annotate: Introduce strerror for handling symbol__disassemble() errors") which means when 'perf annotate --skip-missing' is used warnings are shown multiple times for the same DSO. Setting this again restores the original behavior of only one warning each. Signed-off-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-09libperf: Move 'idx' from tools/perf to perf_evsel::idxJiri Olsa1-1/+1
Move evsel::idx to perf_evsel::idx, so we can move the group interface to libperf. Committer notes: Fixup evsel->idx usage in tools/perf/util/bpf_counter_cgroup.c, that appeared in my tree in my local tree. Also fixed up these: $ find tools/perf/ -name "*.[ch]" | xargs grep 'evsel->idx' tools/perf/ui/gtk/annotate.c: evsel->idx + i); tools/perf/ui/gtk/annotate.c: evsel->idx); $ That running 'make -C tools/perf build-test' caught. Signed-off-by: Jiri Olsa <[email protected]> Requested-by: Shunsuke Nakamura <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-07perf annotate: Fix 's' on source line when disasm is emptyRiccardo Mancini1-2/+2
If the disasm is empty, 's' should fail. Instead it seemingly works, hiding the empty lines and causing an assertion error on the next time annotate is called (from within perf report). The problem is caused by a buffer overflow, caused by a wrong exit condition in annotate_browser__find_next_asm_line, which checks browser->b.top instead of browser->b.entries. This patch fixes the issue, making annotate_browser__toggle_source fail if the disasm is empty (nothing happens to the user). Fixes: 6de249d66d2e7881 ("perf annotate: Allow 's' on source code lines") Signed-off-by: Riccardo Mancini <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin Liška <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>