aboutsummaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2024-08-22perf dwarf-aux: Handle bitfield members from pointer accessNamhyung Kim1-2/+9
The __die_find_member_offset_cb() missed to handle bitfield members which don't have DW_AT_data_member_location. Like in adding member types in __add_member_cb() it should fallback to check the bit offset when it resolves the member type for an offset. Fixes: 437683a9941891c1 ("perf dwarf-aux: Handle type transfer for memory access") Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-21perf annotate-data: Add 'typecln' sort keyNamhyung Kim3-0/+54
Sometimes it's useful to organize member fields in cache-line boundary. The 'typecln' sort key is short for type-cacheline and to show samples in each cacheline. The cacheline size is fixed to 64 for now, but it can read the actual size once it saves the value from sysfs. For example, you maybe want to which cacheline in a target is hot or cold. The following shows members in the cfs_rq's first cache line. $ perf report -s type,typecln,typeoff -H ... - 2.67% struct cfs_rq + 1.23% struct cfs_rq: cache-line 2 + 0.57% struct cfs_rq: cache-line 4 + 0.46% struct cfs_rq: cache-line 6 - 0.41% struct cfs_rq: cache-line 0 0.39% struct cfs_rq +0x14 (h_nr_running) 0.02% struct cfs_rq +0x38 (tasks_timeline.rb_leftmost) ... Committer testing: # root@number:~# perf report -s type,typecln,typeoff -H --stdio # Total Lost Samples: 0 # # Samples: 5K of event 'cpu_atom/mem-loads,ldlat=5/P' # Event count (approx.): 312251 # # Overhead Data Type / Data Type Cacheline / Data Type Offset # .............. .................................................. # <SNIP> 0.07% struct sigaction 0.05% struct sigaction: cache-line 1 0.02% struct sigaction +0x58 (sa_mask) 0.02% struct sigaction +0x78 (sa_mask) 0.03% struct sigaction: cache-line 0 0.02% struct sigaction +0x38 (sa_mask) 0.01% struct sigaction +0x8 (sa_mask) <SNIP> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-21perf annotate-data: Show offset and size in hexNamhyung Kim3-4/+4
It'd be better to have them in hex to check cacheline alignment. Percent offset size field 100.00 0 0x1c0 struct cfs_rq { 0.00 0 0x10 struct load_weight load { 0.00 0 0x8 long unsigned int weight; 0.00 0x8 0x4 u32 inv_weight; }; 0.00 0x10 0x4 unsigned int nr_running; 14.56 0x14 0x4 unsigned int h_nr_running; 0.00 0x18 0x4 unsigned int idle_nr_running; 0.00 0x1c 0x4 unsigned int idle_h_nr_running; ... Committer notes: Justification from Namhyung when asked about why it would be "better": Cache line sizes are power of 2 so it'd be natural to use hex and check whether an offset is in the same boundary. Also 'perf annotate' shows instruction offsets in hex. > > Maybe this should be selectable? I can add an option and/or a config if you want. Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-21perf bpf: Remove redundant check that map is NULLYang Ruibin1-3/+0
The check that map is NULL is already done in the bpf_map__fd(map) and returns an errno, which does not run further checks. In addition, even if the check for map is run, the return is a pointer, which is not consistent with the err_number returned by bpf_map__fd(map). Signed-off-by: Yang Ruibin <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-21perf annotate-data: Fix percpu pointer checkNamhyung Kim2-59/+66
In check_matching_type(), it checks the type state of the register in a wrong order. When it's the percpu pointer, it should check the type for the pointer, but it checks the CFA bit first and thought it has no type in the stack slot. This resulted in no type info. ----------------------------------------------------------- find data type for 0x28(reg1) at hrtimer_reprogram+0x88 CU for kernel/time/hrtimer.c (die:0x18f219f) frame base: cfa=1 fbreg=7 ... add [72] percpu 0x24500 -> reg1 pointer type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46) bb: [7a - 7e] bb: [80 - 86] (here) bb: [88 - 88] vvv chk [88] reg1 offset=0x28 ok=1 kind=4 cfa : no type information no type information Here, instruction at 0x72 found reg1 has a (percpu) pointer and got the correct type. But when it checks the final result, it wrongly thought it was stack variable because it checks the cfa bit first. After changing the order of state check: ----------------------------------------------------------- find data type for 0x28(reg1) at hrtimer_reprogram+0x88 CU for kernel/time/hrtimer.c (die:0x18f219f) frame base: cfa=1 fbreg=7 ... (here) vvvvvvvvvv chk [88] reg1 offset=0x28 ok=1 kind=4 percpu ptr : Good! found by insn track: 0x28(reg1) type-offset=0x28 final type: type='struct hrtimer_cpu_base' size=0x240 (die:0x18f6d46) Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-21perf annotate-data: Prefer struct/union over base typeNamhyung Kim1-1/+19
Sometimes a compound type can have a single field and the size is the same as the base type. But it's still preferred as struct or union could carry more information than the base type. Also put a slight priority on the typedef for the same reason. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-21perf annotate-data: Fix missing constant copyNamhyung Kim1-0/+1
I found it missed to copy the immediate constant when it moves the register value. This could result in a wrong type inference since the address for the per-cpu variable would be 0 always. Fixes: eb9190afaed6afd5 ("perf annotate-data: Handle ADD instructions") Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-20perf cap: Tidy up and improve capability testingIan Rogers7-72/+75
Remove dependence on libcap. libcap is only used to query whether a capability is supported, which is just 1 capget system call. If the capget system call fails, fall back on root permission checking. Previously if libcap fails then the permission is assumed not present which may be pessimistic/wrong. Add a used_root out argument to perf_cap__capable to say whether the fall back root check was used. This allows the correct error message, "root" vs "users with the CAP_PERFMON or CAP_SYS_ADMIN capability", to be selected. Tidy uses of perf_cap__capable so that tests aren't repeated if capget isn't supported. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Changbin Du <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-20perf annotate-data: Set bitfield member offset and size properlyNamhyung Kim1-6/+28
The bitfield members might not have DW_AT_data_member_location. Let's use DW_AT_data_bit_offset to set the member offset correct. Also use DW_AT_bit_size for the name like in a C program. Before: Annotate type: 'struct sk_buff' (1 samples) Percent Offset Size Field - 100.00 0 232 struct sk_buff { + 0.00 0 24 union ; + 0.00 24 8 union ; + 0.00 32 8 union ; 0.00 40 48 char[] cb; + 0.00 88 16 union ; 0.00 104 8 long unsigned int _nfct; 100.00 112 4 unsigned int len; 0.00 116 4 unsigned int data_len; 0.00 120 2 __u16 mac_len; 0.00 122 2 __u16 hdr_len; 0.00 124 2 __u16 queue_mapping; 0.00 126 0 __u8[] __cloned_offset; 0.00 0 1 __u8 cloned; 0.00 0 1 __u8 nohdr; 0.00 0 1 __u8 fclone; 0.00 0 1 __u8 peeked; 0.00 0 1 __u8 head_frag; 0.00 0 1 __u8 pfmemalloc; 0.00 0 1 __u8 pp_recycle; 0.00 127 1 __u8 active_extensions; + 0.00 128 60 union ; 0.00 188 4 sk_buff_data_t tail; 0.00 192 4 sk_buff_data_t end; 0.00 200 8 unsigned char* head; After: Annotate type: 'struct sk_buff' (1 samples) Percent Offset Size Field - 100.00 0 232 struct sk_buff { + 0.00 0 24 union ; + 0.00 24 8 union ; + 0.00 32 8 union ; 0.00 40 48 char[] cb + 0.00 88 16 union ; 0.00 104 8 long unsigned int _nfct; 100.00 112 4 unsigned int len; 0.00 116 4 unsigned int data_len; 0.00 120 2 __u16 mac_len; 0.00 122 2 __u16 hdr_len; 0.00 124 2 __u16 queue_mapping; 0.00 126 0 __u8[] __cloned_offset; 0.00 126 1 __u8 cloned:1; 0.00 126 1 __u8 nohdr:1; 0.00 126 1 __u8 fclone:2; 0.00 126 1 __u8 peeked:1; 0.00 126 1 __u8 head_frag:1; 0.00 126 1 __u8 pfmemalloc:1; 0.00 126 1 __u8 pp_recycle:1; 0.00 127 1 __u8 active_extensions; + 0.00 128 60 union ; 0.00 188 4 sk_buff_data_t tail; 0.00 192 4 sk_buff_data_t end; 0.00 200 8 unsigned char* head; Commiter notes: Collect some data: root@number:~# perf mem record -a --ldlat 5 -- ping -s 8193 -f 192.168.86.1 Memory events are enabled on a subset of CPUs: 16-27 PING 192.168.86.1 (192.168.86.1) 8193(8221) bytes of data. .^C --- 192.168.86.1 ping statistics --- 13881 packets transmitted, 13880 received, 0.00720409% packet loss, time 8664ms rtt min/avg/max/mdev = 0.510/0.599/7.768/0.115 ms, ipg/ewma 0.624/0.593 ms [ perf record: Woken up 8 times to write data ] [ perf record: Captured and wrote 14.877 MB perf.data (46785 samples) ] root@number:~# root@number:~# perf evlist cpu_atom/mem-loads,ldlat=5/P cpu_atom/mem-stores/P dummy:u root@number:~# perf evlist -v cpu_atom/mem-loads,ldlat=5/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x7 cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 root@number:~# Ok, now lets see what changes from before this patch to after it: root@number:~# perf annotate --data-type > /tmp/before Apply the patch, build: root@number:~# perf annotate --data-type > /tmp/after The first hunk of the diff, for a glib data structure, in userspace, look at those bitfields: root@number:~# diff -u10 /tmp/before /tmp/after | head -20 --- /tmp/before 2024-08-20 17:29:58.306765780 -0300 +++ /tmp/after 2024-08-20 17:33:13.210582596 -0300 @@ -163,22 +163,22 @@ Annotate type: 'GHashTable' in /usr/lib64/libglib-2.0.so.0.8000.3 (1 samples): ============================================================================ Percent offset size field 100.00 0 96 GHashTable { 0.00 0 8 gsize size; 0.00 8 4 gint mod; 100.00 12 4 guint mask; 0.00 16 4 guint nnodes; 0.00 20 4 guint noccupied; - 0.00 0 4 guint have_big_keys; - 0.00 0 4 guint have_big_values; + 0.00 24 1 guint have_big_keys:1; + 0.00 24 1 guint have_big_values:1; 0.00 32 8 gpointer keys; 0.00 40 8 guint* hashes; 0.00 48 8 gpointer values; root@number:~# As advertised :-) Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf daemon: Fix the build on more 32-bit architecturesArnaldo Carvalho de Melo1-4/+4
The previous attempt fixed the build on debian:experimental-x-mipsel, but when building on a larger set of containers I noticed it broke the build on some other 32-bit architectures such as: 42 7.87 ubuntu:18.04-x-arm : FAIL gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) builtin-daemon.c: In function 'cmd_session_list': builtin-daemon.c:692:16: error: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'long int' [-Werror=format=] fprintf(out, "%c%" PRIu64, ^~~~~ builtin-daemon.c:694:13: csv_sep, (curr - daemon->start) / 60); ~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from builtin-daemon.c:3:0: /usr/arm-linux-gnueabihf/include/inttypes.h:105:34: note: format string is defined here # define PRIu64 __PRI64_PREFIX "u" So lets cast that time_t (32-bit/64-bit) to uint64_t to make sure it builds everywhere. Fixes: 4bbe6002931954bb ("perf daemon: Fix the build on 32-bit architectures") Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/ZsPmldtJ0D9Cua9_@x1 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf test: Add cgroup sampling testNamhyung Kim1-0/+23
Add it to the record.sh shell test to verify if it tracks cgroup information correctly. It records with --all-cgroups option can check if it has PERF_RECORD_CGROUP and the names are not "unknown". $ sudo ./perf test -vv 95 95: perf record tests: --- start --- test child forked, pid 2871922 169c90-169cd0 g test_loop perf does have symbol 'test_loop' Basic --per-thread mode test Basic --per-thread mode test [Success] Register capture test Register capture test [Success] Basic --system-wide mode test Basic --system-wide mode test [Success] Basic target workload test Basic target workload test [Success] Branch counter test branch counter feature not supported on all core PMUs (/sys/bus/event_source/devices/cpu) [Skipped] Cgroup sampling test Cgroup sampling test [Success] ---- end(0) ---- 95: perf record tests : Ok Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf record: Fix sample cgroup & namespace trackingNamhyung Kim1-6/+3
The recent change in 'struct perf_tool' constification broke the cgroup and/or namespace tracking by resetting tool fields. It should set the values after perf_tool__init(). Fixes: cecb1cf154b301c6 ("perf record: Use perf_tool__init()") Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf inject: Combine mmap and mmap2 handlingIan Rogers1-62/+56
The handling of mmap and mmap2 events is near identical. Add a common helper function and call that by the two event handling functions. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf inject: Combine different mmap and mmap2 functionsIan Rogers1-123/+76
There are repipe, build ID and JIT dump variants of the mmap and mmap2 repipe functions. The organization doesn't allow JIT dump to work with build ID injection and the structure is less than clear. Combine the function and enable the different behaviors based on ifs. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf inject: Combine build_ids and build_id_all into enumIan Rogers1-20/+31
It is clearer to have a single enum that determines how build ids are injected, it also allows for future extension. Set the header build ID feature whether lazy or all are generated, previously only the lazy case would set it. Allow parsing of known build IDs for either the lazy or all cases. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf test: Expand pipe/inject testIan Rogers1-22/+79
Test recording of call-graphs and injecting --build-all. Add/expand trap handler. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf evsel: Constify evsel__id_hdr_size() argumentIan Rogers2-2/+2
Allows evsel__id_hdr_size() to be used when the evsel is const. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf dso: Constify dso_idIan Rogers7-14/+16
The passed dso_id is copied and so is never an out argument. Remove its mutability. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf jit: Constify filename argumentIan Rogers2-4/+5
Make it clearer the argument is just being used as a string. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf map: API clean upIan Rogers4-22/+19
map__init() is only used internally so make it static. Assume memory is zero initialized, which will better support adding fields to struct map in the future and was already the case for map__new2. To reduce complexity, change set_priv and set_erange_warned to not take a value to assign as they always assign true. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf synthetic-events: Avoid unnecessary memsetIan Rogers1-1/+1
Make sure the memset of a synthesized event only zeros the necessary tracing data part of the event, as a full event can be over 4kb in size. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anne Macedo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Casey Chen <[email protected]> Cc: Chaitanya S Prakash <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Weilin Wang <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yunseong Kim <[email protected]> Cc: Ze Gao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf python: Fix the build on 32-bit arm by including missing "util/sample.h"Xu Yang1-0/+1
The 32-bit arm build system will complain: tools/perf/util/python.c:75:28: error: field ‘sample’ has incomplete type 75 | struct perf_sample sample; However, arm64 build system doesn't complain this. The root cause is arm64 define "HAVE_KVM_STAT_SUPPORT := 1" in tools/perf/arch/arm64/Makefile, but arm arch doesn't define this. This will lead to kvm-stat.h include other header files on arm64 build system, especially "util/sample.h" for util/python.c. This will try to directly include "util/sample.h" for "util/python.c" to avoid such build issue on arm platform. Signed-off-by: Xu Yang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf annotate-data: Update type stat at the end of find_data_type_die()Namhyung Kim1-17/+30
After trying all possibilities with DWARF and instruction tracking. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf annotate-data: Check variables in every scopeNamhyung Kim1-17/+27
Sometimes it matches a variable in the inner scope but it fails because the actual access can be on a different type. Let's try variables in every scope and choose the best one using is_better_type(). I have an example with update_blocked_averages(), at first it found a variable (__mptr) but it's a void pointer. So it moved on to the upper scope and found another variable (cfs_rq). $ perf --debug type-profile annotate --data-type --stdio ... ----------------------------------------------------------- find data type for 0x140(reg14) at update_blocked_averages+0x2db CU for kernel/sched/fair.c (die:0x12dd892) frame base: cfa=1 fbreg=7 found "__mptr" (die: 0x13022f1) in scope=4/4 (die: 0x13022e8) failed: no/void pointer variable location: base=reg14, offset=0x140 type='void*' size=0x8 (die:0x12dd8f9) found "cfs_rq" (die: 0x1301721) in scope=3/4 (die: 0x130171c) type_offset=0x140 variable location: reg14 type='struct cfs_rq' size=0x1c0 (die:0x12e37e5) final type: type='struct cfs_rq' size=0x1c0 (die:0x12e37e5) IIUC the scope is like below: 1: update_blocked_averages 2: __update_blocked_fair 3: for_each_leaf_cfs_rq_safe 4: list_entry -> (container_of) The container_of is implemented like: #define container_of(ptr, type, member) ({ \ void *__mptr = (void *)(ptr); \ static_assert(__same_type(*(ptr), ((type *)0)->member) || \ __same_type(*(ptr), void), \ "pointer type mismatch in container_of()"); \ ((type *)(__mptr - offsetof(type, member))); }) That's why we see the __mptr variable first but it failed since it has no type information. Then for_each_leaf_cfs_rq_safe() is defined as #define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \ list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list, \ leaf_cfs_rq_list) Note that the access was 0x140(r14). And the cfs_rq has leaf_cfs_rq_list at the 0x140. So it converts the list_head pointer to a pointer to struct cfs_rq here. $ pahole --hex -C cfs_rq vmlinux | grep 140 struct cfs_rq struct list_head leaf_cfs_rq_list; /* 0x140 0x10 */ Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf annotate-data: Add is_better_type() helperNamhyung Kim1-10/+51
Sometimes more than one variables are located in the same register or a stack slot. Or it can overwrite existing information with others. I found this is not helpful in some cases so it needs to update the type information from the variable only if it's better. But it's hard to know which one is better, so we needs heuristics. :) As it deals with memory accesses, the location should have a pointer or something similar (like array or reference). So if it had an integer type and a variable is a pointer, we can take the variable's type to resolve the target of the access. If it has a pointer type and a variable with the same location has a different pointer type, it'll take one with bigger target type. This can be useful when the target type embeds a smaller type (like list header or RB-tree node) at the beginning so their location is same. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf annotate-data: Add is_pointer_type() helperNamhyung Kim1-9/+14
It treats pointers and arrays in the same way. Let's add the helper and use it when it checks if it needs a pointer. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf annotate-data: Change return type of find_data_type_block()Namhyung Kim1-59/+58
So that it can return enum variable_match_type to be propagated to the find_data_type_die(). Also update the debug message to show the result of the check_matching_type(). chk [dd] reg0 offset=0 ok=1 kind=1 : Good! or chk [177] reg4 offset=0x138 ok=0 kind=0 cfa : no type information Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf annotate-data: Add variable_state_str()Namhyung Kim1-15/+26
So that it can show a proper debug message in the right place. The check_variable() is used in other places which don't want to print the message. $ perf --debug type-profile annotate --data-type Before: ----------------------------------------------------------- find data type for 0x140(reg14) at update_blocked_averages+0x2db CU for kernel/sched/fair.c (die:0x12dd892) frame base: cfa=1 fbreg=7 no pointer or no type <<<--- removed check variable "__mptr" failed (die: 0x13022f1) variable location: base=reg14, offset=0x140 type='void*' size=0x8 (die:0x12dd8f9) After: ----------------------------------------------------------- find data type for 0x140(reg14) at update_blocked_averages+0x2db CU for kernel/sched/fair.c (die:0x12dd892) frame base: cfa=1 fbreg=7 found "__mptr" (die: 0x13022f1) in scope=4/4 (die: 0x13022e8) failed: no/void pointer <<<--- here variable location: base=reg14, offset=0x140 type='void*' size=0x8 (die:0x12dd8f9) Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf annotate-data: Add 'enum type_match_result'Namhyung Kim1-11/+25
And let check_variable() return the enum value so that callers can know what was the problem. This will be used by the later patch to update the statistics correctly and print the error message in a right place. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf annotate-data: Fix off-by-one in location range checkNamhyung Kim2-2/+2
The location list will have entries with half-open addressing like [start, end) which means it doesn't include the end address. So it should skip entries at the end address and match to the next entry. An example location list looks like this (from readelf -wo): 00237876 ffffffff8110d32b (base address) 0023787f v000000000000000 v000000000000002 views at 00237868 for: ffffffff8110d32b ffffffff8110d4eb (DW_OP_reg3 (rbx)) <<<--- 1 00237885 v000000000000002 v000000000000000 views at 0023786a for: ffffffff8110d4eb ffffffff8110d50b (DW_OP_reg14 (r14)) <<<--- 2 0023788c v000000000000000 v000000000000001 views at 0023786c for: ffffffff8110d50b ffffffff8110d7c4 (DW_OP_reg3 (rbx)) 00237893 v000000000000000 v000000000000000 views at 0023786e for: ffffffff8110d806 ffffffff8110d854 (DW_OP_reg3 (rbx)) 0023789a v000000000000000 v000000000000000 views at 00237870 for: ffffffff8110d876 ffffffff8110d88e (DW_OP_reg3 (rbx)) The first entry at 0023787f has [8110d32b, 8110d4eb) (omitting the ffffffff at the beginning), and the second one has [8110d4eb, 8110d50b). Fixes: 2bc3cf575a162a2c ("perf annotate-data: Improve debug message with location info") Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-19perf dwarf-aux: Check allowed location expressions when collecting variablesNamhyung Kim1-0/+3
It missed to call check_allowed_ops() in __die_collect_vars_cb() so it can take variables with complex location expression incorrectly. For example, I found some variable has this expression. 015d8df8 ffffffff81aacfb3 (base address) 015d8e01 v000000000000004 v000000000000000 views at 015d8df2 for: ffffffff81aacfb3 ffffffff81aacfd2 (DW_OP_fbreg: -176; DW_OP_deref; DW_OP_plus_uconst: 332; DW_OP_deref_size: 4; DW_OP_lit1; DW_OP_shra; DW_OP_const1u: 64; DW_OP_minus; DW_OP_stack_value) 015d8e14 v000000000000000 v000000000000000 views at 015d8df4 for: ffffffff81aacfd2 ffffffff81aacfd7 (DW_OP_reg3 (rbx)) 015d8e19 v000000000000000 v000000000000000 views at 015d8df6 for: ffffffff81aacfd7 ffffffff81aad020 (DW_OP_fbreg: -176; DW_OP_deref; DW_OP_plus_uconst: 332; DW_OP_deref_size: 4; DW_OP_lit1; DW_OP_shra; DW_OP_const1u: 64; DW_OP_minus; DW_OP_stack_value) 015d8e2c <End of list> It looks like '((int *)(-176(%rbp) + 332) >> 1) - 64' but the current code thought it's just -176(%rbp) and processed the variable incorrectly. It should reject such a complex expression if check_allowed_ops() doesn't like it. :) Fixes: 932dcc2c39aedf54 ("perf dwarf-aux: Add die_collect_vars()") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-16Merge remote-tracking branch 'torvalds/master' into perf-tools-nextArnaldo Carvalho de Melo9-20/+204
To pick up the latest perf-tools merge for 6.11, i.e. to have the current perf tools branch that is getting into 6.11 with the perf-tools-next that is geared towards 6.12. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-16perf stat: Display iostat headers correctlyYicong Yang1-1/+2
Currently we'll only print metric headers for metric leader in aggregration mode. This will make `perf iostat` header not shown since it'll aggregrated globally but don't have metric events: root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat --iostat --timeout 1000 Performance counter stats for 'system wide': port 0000:00 0 0 0 0 0000:80 0 0 0 0 [...] Fix this by excluding the iostat in the check of printing metric headers. Then we can see the headers: root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat --iostat --timeout 1000 Performance counter stats for 'system wide': port Inbound Read(MB) Inbound Write(MB) Outbound Read(MB) Outbound Write(MB) 0000:00 0 0 0 0 0000:80 0 0 0 0 [...] Fixes: 193a9e30207f5477 ("perf stat: Don't display metric header for non-leader uncore events") Signed-off-by: Yicong Yang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Junhao He <[email protected]> Cc: [email protected] Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Zeng Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-16perf sched timehist: Fix missing free of session in perf_sched__timehist()Yang Jihong1-1/+2
When perf_time__parse_str() fails in perf_sched__timehist(), need to free session that was previously created, fix it. Fixes: 853b74071110bed3 ("perf sched timehist: Add option to specify time window of interest") Signed-off-by: Yang Jihong <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-15perf hist: Update hist symbol when updating mapsMatt Fleming1-0/+5
AddressSanitizer found a use-after-free bug in the symbol code which manifested as 'perf top' segfaulting. ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80 READ of size 1 at 0x60b00c48844b thread T193 #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310 #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286 #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614 #3 0x5650d804568f in __hists__add_entry util/hist.c:754 #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772 #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997 #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242 #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845 #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208 #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245 #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324 #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120 #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442 #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 When updating hist maps it's also necessary to update the hist symbol reference because the old one gets freed in map__put(). While this bug was probably introduced with 5c24b67aae72f54c ("perf tools: Replace map->referenced & maps->removed_maps with map->refcnt"), the symbol objects were leaked until c087e9480cf33672 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so the bug was masked. Fixes: c087e9480cf33672 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") Reported-by: Yunzhao Li <[email protected]> Signed-off-by: Matt Fleming (Cloudflare) <[email protected]> Cc: Ian Rogers <[email protected]> Cc: [email protected] Cc: Namhyung Kim <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: [email protected] # v5.13+ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf test record.sh: Raise limit of open file descriptorsVeronika Molnarova1-1/+18
Subtest for system-wide record with '--threads=cpu' option fails due to a limit of open file descriptors on systems with 128 or more CPUs as the default limit is set to 1024. The number of open file descriptors should be slightly above nmb_events*nmb_cpus + nmb_cpus(for perf.data.n) + 4*nmb_cpus(for pipes), which equals 8*nmb_cpus. Therefore, temporarily raise the limit to 16*nmb_cpus for the test. Committer notes: Instead of disabling ShellCheck warnings all the uses of 'uname -n', i.e. those: In tests/shell/record.sh line 35: default_fd_limit=$(ulimit -Sn) ^-^ SC3045 (warning): In POSIX sh, ulimit -S is undefined. We can just switch from using '/bin/sh' to '/bin/bash' for this test, as bash _has_ 'ulimit -n', so ShellCheck will not emit that warning. There are dozens of 'perf test' shell tests that do just that, '/bin/bash' is a reasonable expectation for those tests. Signed-off-by: Veronika Molnarova <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Radostin Stoyanov <[email protected]> Link: https://lore.kernel.org/linux-perf-users/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf test: Add new test cases for the branch counter featureKan Liang1-5/+12
Enhance the test case for the branch counter feature. Now, the test verifies: - The new filter can be successfully applied on the supported platforms. - The counter value can be outputted via the perf report -D - The counter value and the abbr name can be outputted via the perf script (New) Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf script: Add branch countersKan Liang2-8/+63
It's useful to print the branch counter information for each jump in the brstackinsn when it's available. Add a new field 'brcntr' to display the branch counter information. By default, the abbreviation will be used to indicate the branch counter. In the verbose mode, the real event name is shown. $ perf script -F +brstackinsn,+brcntr # Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (home/sdp/test/tchain_edit) f3+31: 0000000000401774 insn: eb 04 br_cntr: AA # PRED 5 cycles [5] 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 1 cycles [6] 2.00 IPC 0000000000401766 insn: 8b 45 fc 0000000000401769 insn: 83 e0 01 000000000040176c insn: 85 c0 000000000040176e insn: 74 06 br_cntr: A # PRED 1 cycles [7] 4.00 IPC 0000000000401776 insn: 83 45 fc 01 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 7 cycles [14] 0.43 IPC $ perf script -F +brstackinsn,+brcntr -v tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (/home/sdp/os.linux.perf.test-suite/kernels/lbr_kernel/tchain_edit) f3+31: 0000000000401774 insn: eb 04 br_cntr: branch-instructions:ppp 2 branch-misses 0 # PRED 5 cycles [5] 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [6] 2.00 IPC 0000000000401766 insn: 8b 45 fc 0000000000401769 insn: 83 e0 01 000000000040176c insn: 85 c0 000000000040176e insn: 74 06 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [7] 4.00 IPC 0000000000401776 insn: 83 45 fc 01 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 7 cycles [14] 0.43 IPC Originally-by: Tinghao Zhang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf annotate: Display the branch counter histogramKan Liang6-9/+74
Display the branch counter histogram in the annotation view. Press 'B' to display the branch counter's abbreviation list as well. Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }', 4000 Hz, Event count (approx.): f3 /home/sdp/test/tchain_edit [Percent: local period] Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%) │ 0000000000401755 <f3>: 0.00 0.00 │ endbr64 │ push %rbp │ mov %rsp,%rbp │ movl $0x0,-0x4(%rbp) 0.00 0.00 │1.33 3 |A |- | ↓ jmp 25 11.03 11.03 │ 11: mov -0x4(%rbp),%eax │ and $0x1,%eax │ test %eax,%eax 17.13 17.13 │2.41 1 |A |- | ↓ je 21 │ addl $0x1,-0x4(%rbp) 21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25 17.13 17.13 │ 21: addl $0x1,-0x4(%rbp) 21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp) 11.03 11.03 │0.61 3 |A |- | ↑ jle 11 │ nop │ pop %rbp 0.00 0.00 │0.24 20 |AA |B | ← ret Originally-by: Tinghao Zhang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf report: Display the branch counter histogramKan Liang8-18/+246
Reusing the existing --total-cycles option to display the branch counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display the logged branch counter events. They are shown right after all the cycle-related annotations. Extend the 'struct block_info' to store and pass the branch counter related information. The annotation_br_cntr_entry() is to print the histogram of each branch counter event. If the number of logged events is less than 4, the exact number of the abbr name is printed. Otherwise, using '+' to stands for more than 3 events. Assume the number of logged events is less than 4. The annotation_br_cntr_abbr_list() prints the branch counter's abbreviation list. Press 'B' to display the list in the TUI mode. $ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter $ perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }' # Event count (approx.): 1610046 # # Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 |A |- | ... 25.27% 1.1M 0.00% 2 |AA |- | ... 15.61% 667.2K 0.00% 1 |A |- | ... 0.16% 6.9K 0.81% 575 |A |- | ... 0.16% 6.8K 1.38% 977 |AA |- | ... 0.16% 6.8K 0.04% 28 |AA |B | ... 0.15% 6.6K 1.33% 946 |A |- | ... 0.11% 4.5K 0.06% 46 |AAA+|- | ... 0.10% 4.4K 0.88% 624 |A |- | ... 0.09% 3.7K 0.74% 524 |AAA+|B | ... With -v applied, # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 A=1 ,B=- ... 25.27% 1.1M 0.00% 2 A=2 ,B=- ... 15.61% 667.2K 0.00% 1 A=1 ,B=- ... 0.16% 6.9K 0.81% 575 A=1 ,B=- ... 0.16% 6.8K 1.38% 977 A=2 ,B=- ... 0.16% 6.8K 0.04% 28 A=2 ,B=1 ... 0.15% 6.6K 1.33% 946 A=1 ,B=- ... 0.11% 4.5K 0.06% 46 A=3+,B=- ... 0.10% 4.4K 0.88% 624 A=1 ,B=- ... 0.09% 3.7K 0.74% 524 A=3+,B=1 ... Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf evsel: Assign abbr name for the branch counter eventsKan Liang2-1/+60
There could be several branch counter events. If perf tool output the result via the format "event name + a number", the line could be very long and hard to read. An abbreviation is introduced to replace the full event name in the display. The abbreviation starts from 'A' to 'Z9', which can support up to 286 events. The same abbreviation will be assigned if the same events are found in the evlist. The next patch will utilize the abbreviation name to show the branch counter events in the output. Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf annotate: Save branch counters for each blockKan Liang10-22/+80
When annotating a basic block, it's useful to display the occurrences of other events in the block. The branch counter feature is only available for newer Intel platforms. So a dedicated option to display the branch counters is not introduced. Reuse the existing --total-cycles option, which triggers the annotation of a basic block and displays the cycle-related annotation. When the branch counters information is available, the branch counters are automatically appended after all the cycle-related annotation. Accounting the branch counters as well when accounting the cycles in hist__account_cycles(). In 'struct annotated_branch', introduce a br_cntr array to save the accumulation of each branch counter. In a sample, all the branch counters for a branch are saved in a u64 space. Because the saturation of a branch counter is small, e.g., for Intel Sierra Forest, the saturation is only 3. Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter once saturated. That can be used to indicate a potential event lost because of the saturation. Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf evlist: Save branch counters informationKan Liang4-6/+32
The branch counters logging (A.K.A LBR event logging) introduces a per-counter indication of precise event occurrences in LBRs. The kernel only dumps the number of occurrences into a record. The perf tool has to map the number to the corresponding event. Add evlist__update_br_cntr() to go through the evlist to pick the events that are configured to be logged. Assign a logical idx to track them, and add the total number of the events in the leader event. The total number will be used to allocate the space to save the branch counters for a block. The logical idx will be used to locate the corresponding event quickly in the following patches. It only needs to iterate the evlist once. The evsel__has_branch_counters() is also optimized. Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf report: Remove the first overflow check for branch countersKan Liang1-2/+0
A false overflow warning is triggered if a sample doesn't have any LBRs recorded and the branch counters feature is enabled. The current code does OVERFLOW_CHECK_u64() at the very beginning when reading the information of branch counters. It assumes that there is at least one LBR in the PEBS record. But it is a valid case that 0 LBR is recorded especially in a high context switch. Remove the OVERFLOW_CHECK_u64(). The later OVERFLOW_CHECK() should be good enough to check the overflow when reading the information of the branch counters. Fixes: 9fbb4b02302b0ae6 ("perf tools: Add branch counter knob") Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf report: Fix --total-cycles --stdio output errorKan Liang1-1/+2
The --total-cycles may output wrong information with the --stdio. For example: # perf record -e "{cycles,instructions}",cache-misses -b sleep 1 # perf report --total-cycles --stdio The total cycles output of {cycles,instructions} and cache-misses are almost the same. # Samples: 938 of events 'anon group { cycles, instructions }' # Event count (approx.): 938 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] # ............... .............. ........... .......... ..................................................> # 11.19% 2.6K 0.10% 21 [perf_iterate_ctx+48 -> > 5.79% 1.4K 0.45% 97 [__intel_pmu_enable_all.constprop.0+80 -> __intel_> 5.11% 1.2K 0.33% 71 [native_write_msr+0 ->> # Samples: 293 of event 'cache-misses' # Event count (approx.): 293 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] # ............... .............. ........... .......... ..................................................> # 11.19% 2.6K 0.13% 21 [perf_iterate_ctx+48 -> > 5.79% 1.4K 0.59% 97 [__intel_pmu_enable_all.constprop.0+80 -> __intel_> 5.11% 1.2K 0.43% 71 [native_write_msr+0 ->> With the symbol_conf.event_group, the 'perf report' should only report the block information of the leader event in a group. However, the current implementation retrieves the next event's block information, rather than the next group leader's block information. Make sure the index is updated even if the event is skipped. With the patch, # Samples: 293 of event 'cache-misses' # Event count (approx.): 293 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] # ............... .............. ........... .......... ..................................................> # 37.98% 9.0K 4.05% 299 [perf_event_addr_filters_exec+0 -> perf_event_a> 11.19% 2.6K 0.28% 21 [perf_iterate_ctx+48 -> > 5.79% 1.4K 1.32% 97 [__intel_pmu_enable_all.constprop.0+80 -> __intel_> Fixes: 6f7164fa231a5f36 ("perf report: Sort by sampled cycles percent per block for stdio") Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jin Yao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf test annotate: Dump trapping test in trap handlerIan Rogers1-1/+2
Help to better identify the location of test failures but dumping the failing test in the trap handler. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-14perf disasm: Fix memory leak for locked operationsIan Rogers1-0/+1
lock__parse() calls disasm_line__parse() passing &ops->locked.ins.name that will use strdup() to populate it. Ensure ops->locked.ins.name is freed in lock__delete(). Found with address/leak sanitizer. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-13perf inject: Inject build ids for entire call chainIan Rogers1-0/+31
The DSO build id is injected when the dso is first encountered but the checking for first encountered only looks at the sample->ip not the entire callchain. Use the callchain logic to ensure all build ids are inserted. Fixes: 454c407ec17a0c63 ("perf: add perf-inject builtin") Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Casey Chen <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tom Zanussi <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Split from a larger patch that introduced the API and use it ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-13perf callchain: Add a for_each callback style APIIan Rogers2-0/+41
Add a for_each callback style API to callchain with sample__for_each_callchain_node(). Possibly in the future such an API can avoid the overhead of constructing the call chain list. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Casey Chen <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tom Zanussi <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Split from a larger patch that introduced the API and use it ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-08-13perf test: Add test for Intel TPEBS counting modeWeilin Wang1-0/+19
Intel TPEBS sampling mode is supported through perf record. The counting mode code uses perf record to capture retire_latency value and use it in metric calculation. This test checks the counting mode code on Intel platforms. Committer testing: root@x1:~# perf test tpebs 123: test Intel TPEBS counting mode : Ok root@x1:~# set -o vi root@x1:~# perf test tpebs 123: test Intel TPEBS counting mode : Ok root@x1:~# perf test -v tpebs 123: test Intel TPEBS counting mode : Ok root@x1:~# perf test -vvv tpebs 123: test Intel TPEBS counting mode: --- start --- test child forked, pid 16603 Testing without --record-tpebs Testing with --record-tpebs ---- end(0) ---- 123: test Intel TPEBS counting mode : Ok root@x1:~# Reviewed-by: Namhyung Kim <[email protected]> Signed-off-by: Weilin Wang <[email protected]> Acked-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Samantha Alt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>