aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2019-11-06perf parse: Add parse events handle errorIan Rogers3-43/+71
Parse event error handling may overwrite one error string with another creating memory leaks. Introduce a helper routine that warns about multiple error messages as well as avoiding the memory leak. A reproduction of this problem can be seen with: perf stat -e c/c/ After this change this produces: WARNING: multiple event parsing errors event syntax error: 'c/c/' \___ unknown term valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jin Yao <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf stat: Add --per-node agregation supportJiri Olsa5-0/+38
Adding new --per-node option to aggregate counts per NUMA nodes for system-wide mode measurements. You can specify --per-node in live mode: # perf stat -a -I 1000 -e cycles --per-node # time node cpus counts unit events 1.000542550 N0 20 6,202,097 cycles 1.000542550 N1 20 639,559 cycles 2.002040063 N0 20 7,412,495 cycles 2.002040063 N1 20 2,185,577 cycles 3.003451699 N0 20 6,508,917 cycles 3.003451699 N1 20 765,607 cycles ... Or in the record/report stat session: # perf stat record -a -I 1000 -e cycles # time counts unit events 1.000536937 10,008,468 cycles 2.002090152 9,578,539 cycles 3.003625233 7,647,869 cycles 4.005135036 7,032,086 cycles ^C 4.340902364 3,923,893 cycles # perf stat report --per-node # time node cpus counts unit events 1.000536937 N0 20 9,355,086 cycles 1.000536937 N1 20 653,382 cycles 2.002090152 N0 20 7,712,838 cycles 2.002090152 N1 20 1,865,701 cycles 3.003625233 N0 20 6,604,441 cycles 3.003625233 N1 20 1,043,428 cycles 4.005135036 N0 20 6,350,522 cycles 4.005135036 N1 20 681,564 cycles 4.340902364 N0 20 3,403,188 cycles 4.340902364 N1 20 520,705 cycles Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Joe Mario <[email protected]> Cc: Kan Liang <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf env: Add perf_env__numa_node()Jiri Olsa2-0/+46
To speed up cpu to node lookup, add perf_env__numa_node(), that creates cpu array on the first lookup, that holds numa nodes for each stored cpu. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Joe Mario <[email protected]> Cc: Kan Liang <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf tools: Splice events onto evlist even on errorIan Rogers1-6/+11
If event parsing fails the event list is leaked, instead splice the list onto the out result and let the caller cleanup. An example input for parse_events found by libFuzzer that reproduces this memory leak is 'm{'. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jin Yao <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf map_groups: Introduce for_each_entry() and for_each_entry_safe() iteratorsArnaldo Carvalho de Melo3-27/+10
To reduce boilerplate, providing a more compact form to iterate over the maps in a map_group. Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf maps: Add for_each_entry()/_safe() iteratorsArnaldo Carvalho de Melo7-36/+59
To reduce boilerplate, provide a more compact form using an idiom present in other trees of data structures. Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf map: Allow map__next() to receive a NULL argArnaldo Carvalho de Melo1-1/+6
Just like free(), return NULL in that case, will simplify the for_each_entry_safe() iterators. Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf map: Check if the map still has some refcounts on exitArnaldo Carvalho de Melo1-1/+1
We were checking just if it was still on some rb tree, but that is not the only way that this map can still have references, map->refcnt is there exactly for this, use it. Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf dso: Add dso__data_write_cache_addr()Adrian Hunter2-15/+65
Add functions to write into the dso file data cache, but not change the file itself. Signed-off-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf dso: Refactor dso_cache__read()Adrian Hunter1-27/+37
Refactor dso_cache__read() to separate populating the cache from copying data from it. This is preparation for adding a cache "write" that will update the data in the cache. Signed-off-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf auxtrace: Add auxtrace_cache__remove()Adrian Hunter2-0/+29
Add auxtrace_cache__remove(). Intel PT uses an auxtrace_cache to store the results of code-walking, so that the same block of instructions does not have to be decoded repeatedly. However, when there are text poke events, the associated cache entries need to be removed. Signed-off-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf probe: Fix to show ranges of variables in functions without entry_pcMasami Hiramatsu1-2/+2
Fix to show ranges of variables (--range and --vars option) in functions which DIE has only ranges but no entry_pc attribute. Without this fix: # perf probe --range -V clear_tasks_mm_cpumask Available variables at clear_tasks_mm_cpumask @<clear_tasks_mm_cpumask+0> (No matched variables) With this fix: # perf probe --range -V clear_tasks_mm_cpumask Available variables at clear_tasks_mm_cpumask @<clear_tasks_mm_cpumask+0> [VAL] int cpu @<clear_tasks_mm_cpumask+[0-35,317-317,2052-2059]> Committer testing: Before: [root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask Available variables at clear_tasks_mm_cpumask @<clear_tasks_mm_cpumask+0> (No matched variables) [root@quaco ~]# After: [root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask Available variables at clear_tasks_mm_cpumask @<clear_tasks_mm_cpumask+0> [VAL] int cpu @<clear_tasks_mm_cpumask+[0-23,23-105,105-106,106-106,1843-1850,1850-1862]> [root@quaco ~]# Using it: [root@quaco ~]# perf probe clear_tasks_mm_cpumask cpu Added new event: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask with cpu) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1 [root@quaco ~]# perf probe -l probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c with cpu) [root@quaco ~]# [root@quaco ~]# perf trace -e probe:*cpumask ^C[root@quaco ~]# Fixes: 349e8d261131 ("perf probe: Add --range option to show a variable's location range") Signed-off-by: Masami Hiramatsu <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/157199323018.8075.8179744380479673672.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf probe: Fix to show inlined function callsite without entry_pcMasami Hiramatsu1-1/+1
Fix 'perf probe --line' option to show inlined function callsite lines even if the function DIE has only ranges. Without this: # perf probe -L amd_put_event_constraints ... 2 { 3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)) __amd_put_nb_event_constraints(cpuc, event); 5 } With this patch: # perf probe -L amd_put_event_constraints ... 2 { 3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)) 4 __amd_put_nb_event_constraints(cpuc, event); 5 } Committer testing: Before: [root@quaco ~]# perf probe -L amd_put_event_constraints <amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0> 0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event) 2 { 3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)) __amd_put_nb_event_constraints(cpuc, event); 5 } PMU_FORMAT_ATTR(event, "config:0-7,32-35"); PMU_FORMAT_ATTR(umask, "config:8-15" ); [root@quaco ~]# After: [root@quaco ~]# perf probe -L amd_put_event_constraints <amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0> 0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event) 2 { 3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)) 4 __amd_put_nb_event_constraints(cpuc, event); 5 } PMU_FORMAT_ATTR(event, "config:0-7,32-35"); PMU_FORMAT_ATTR(umask, "config:8-15" ); [root@quaco ~]# perf probe amd_put_event_constraints:4 Added new event: probe:amd_put_event_constraints (on amd_put_event_constraints:4) You can now use it in all perf tools, such as: perf record -e probe:amd_put_event_constraints -aR sleep 1 [root@quaco ~]# [root@quaco ~]# perf probe -l probe:amd_put_event_constraints (on amd_put_event_constraints:4@arch/x86/events/amd/core.c) probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c) [root@quaco ~]# Using it: [root@quaco ~]# perf trace -e probe:* ^C[root@quaco ~]# Ok, Intel system here... :-) Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface") Signed-off-by: Masami Hiramatsu <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/157199322107.8075.12659099000567865708.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf probe: Fix to list probe event with correct line numberMasami Hiramatsu1-2/+2
Since debuginfo__find_probe_point() uses dwarf_entrypc() for finding the entry address of the function on which a probe is, it will fail when the function DIE has only ranges attribute. To fix this issue, use die_entrypc() instead of dwarf_entrypc(). Without this fix, perf probe -l shows incorrect offset: # perf probe -l probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579263632@work/linux/linux/kernel/cpu.c) probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask+18446744071579263752@work/linux/linux/kernel/cpu.c) With this: # perf probe -l probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@work/linux/linux/kernel/cpu.c) probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:21@work/linux/linux/kernel/cpu.c) Committer testing: Before: [root@quaco ~]# perf probe -l probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579765152@kernel/cpu.c) [root@quaco ~]# After: [root@quaco ~]# perf probe -l probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c) [root@quaco ~]# Fixes: 1d46ea2a6a40 ("perf probe: Fix listing incorrect line number with inline function") Signed-off-by: Masami Hiramatsu <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/157199321227.8075.14655572419136993015.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf probe: Fix to probe an inline function which has no entry pcMasami Hiramatsu1-1/+1
Fix perf probe to probe an inlne function which has no entry pc or low pc but only has ranges attribute. This seems very rare case, but I could find a few examples, as same as probe_point_search_cb(), use die_entrypc() to get the entry address in probe_point_inline_cb() too. Without this patch: # perf probe -D __amd_put_nb_event_constraints Failed to get entry address of __amd_put_nb_event_constraints. Probe point '__amd_put_nb_event_constraints' not found. Error: Failed to add events. With this patch: # perf probe -D __amd_put_nb_event_constraints p:probe/__amd_put_nb_event_constraints amd_put_event_constraints+43 Committer testing: Before: [root@quaco ~]# perf probe -D __amd_put_nb_event_constraints Failed to get entry address of __amd_put_nb_event_constraints. Probe point '__amd_put_nb_event_constraints' not found. Error: Failed to add events. [root@quaco ~]# After: [root@quaco ~]# perf probe -D __amd_put_nb_event_constraints p:probe/__amd_put_nb_event_constraints _text+33789 [root@quaco ~]# Fixes: 4ea42b181434 ("perf: Add perf probe subcommand, a kprobe-event setup helper") Signed-off-by: Masami Hiramatsu <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/157199320336.8075.16189530425277588587.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf probe: Fix to probe a function which has no entry pcMasami Hiramatsu1-1/+1
Fix 'perf probe' to probe a function which has no entry pc or low pc but only has ranges attribute. probe_point_search_cb() uses dwarf_entrypc() to get the probe address, but that doesn't work for the function DIE which has only ranges attribute. Use die_entrypc() instead. Without this fix: # perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0 Probe point 'clear_tasks_mm_cpumask' not found. Error: Failed to add events. With this: # perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0 p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0 Committer testing: Before: [root@quaco ~]# perf probe clear_tasks_mm_cpumask:0 Probe point 'clear_tasks_mm_cpumask' not found. Error: Failed to add events. [root@quaco ~]# After: [root@quaco ~]# perf probe clear_tasks_mm_cpumask:0 Added new event: probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask) You can now use it in all perf tools, such as: perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1 [root@quaco ~]# Using it with 'perf trace': [root@quaco ~]# perf trace -e probe:clear_tasks_mm_cpumask Doesn't seem to be used in x86_64: $ find . -name "*.c" | xargs grep clear_tasks_mm_cpumask ./kernel/cpu.c: * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU ./kernel/cpu.c:void clear_tasks_mm_cpumask(int cpu) ./arch/xtensa/kernel/smp.c: clear_tasks_mm_cpumask(cpu); ./arch/csky/kernel/smp.c: clear_tasks_mm_cpumask(cpu); ./arch/sh/kernel/smp.c: clear_tasks_mm_cpumask(cpu); ./arch/arm/kernel/smp.c: clear_tasks_mm_cpumask(cpu); ./arch/powerpc/mm/nohash/mmu_context.c: clear_tasks_mm_cpumask(cpu); $ find . -name "*.h" | xargs grep clear_tasks_mm_cpumask ./include/linux/cpu.h:void clear_tasks_mm_cpumask(int cpu); $ find . -name "*.S" | xargs grep clear_tasks_mm_cpumask $ Fixes: e1ecbbc3fa83 ("perf probe: Fix to handle optimized not-inlined functions") Reported-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/157199319438.8075.4695576954550638618.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf probe: Fix wrong address verificationMasami Hiramatsu1-22/+10
Since there are some DIE which has only ranges instead of the combination of entrypc/highpc, address verification must use dwarf_haspc() instead of dwarf_entrypc/dwarf_highpc. Also, the ranges only DIE will have a partial code in different section (e.g. unlikely code will be in text.unlikely as "FUNC.cold" symbol). In that case, we can not use dwarf_entrypc() or die_entrypc(), because the offset from original DIE can be a minus value. Instead, this simply gets the symbol and offset from symtab. Without this patch; # perf probe -D clear_tasks_mm_cpumask:1 Failed to get entry address of clear_tasks_mm_cpumask Error: Failed to add events. And with this patch: # perf probe -D clear_tasks_mm_cpumask:1 p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0 p:probe/clear_tasks_mm_cpumask_1 clear_tasks_mm_cpumask+5 p:probe/clear_tasks_mm_cpumask_2 clear_tasks_mm_cpumask+8 p:probe/clear_tasks_mm_cpumask_3 clear_tasks_mm_cpumask+16 p:probe/clear_tasks_mm_cpumask_4 clear_tasks_mm_cpumask+82 Committer testing: I managed to reproduce the above: [root@quaco ~]# perf probe -D clear_tasks_mm_cpumask:1 p:probe/clear_tasks_mm_cpumask _text+919968 p:probe/clear_tasks_mm_cpumask_1 _text+919973 p:probe/clear_tasks_mm_cpumask_2 _text+919976 [root@quaco ~]# But then when trying to actually put the probe in place, it fails if I use :0 as the offset: [root@quaco ~]# perf probe -L clear_tasks_mm_cpumask | head -5 <clear_tasks_mm_cpumask@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/kernel/cpu.c:0> 0 void clear_tasks_mm_cpumask(int cpu) 1 { 2 struct task_struct *p; [root@quaco ~]# perf probe clear_tasks_mm_cpumask:0 Probe point 'clear_tasks_mm_cpumask' not found. Error: Failed to add events. [root@quaco The next patch is needed to fix this case. Fixes: 576b523721b7 ("perf probe: Fix probing symbols with optimization suffix") Reported-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/157199318513.8075.10463906803299647907.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf probe: Fix to show function entry line as probe-ableMasami Hiramatsu2-1/+26
Fix die_walk_lines() to list the function entry line correctly. Since the dwarf_entrypc() does not return the entry pc if the DIE has only range attribute, __die_walk_funclines() fails to list the declaration line (entry line) in that case. To solve this issue, this introduces die_entrypc() which correctly returns the entry PC (the first address range) even if the DIE has only range attribute. With this fix die_walk_lines() shows the function entry line is able to probe correctly. Fixes: 4cc9cec636e7 ("perf probe: Introduce lines walker interface") Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/157190837419.1859.4619125803596816752.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf probe: Walk function lines in lexical blocksMasami Hiramatsu1-5/+9
Since some inlined functions are in lexical blocks of given function, we have to recursively walk through the DIE tree. Without this fix, perf-probe -L can miss the inlined functions which is in a lexical block (like if (..) { func() } case.) However, even though, to walk the lines in a given function, we don't need to follow the children DIE of inlined functions because those do not have any lines in the specified function. We need to walk though whole trees only if we walk all lines in a given file, because an inlined function can include another inlined function in the same file. Fixes: b0e9cb2802d4 ("perf probe: Fix to search nested inlined functions in CU") Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/157190836514.1859.15996864849678136353.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf probe: Fix to find range-only function instanceMasami Hiramatsu1-1/+5
Fix die_is_func_instance() to find range-only function instance. In some case, a function instance can be made without any low PC or entry PC, but only with address ranges by optimization. (e.g. cold text partially in "text.unlikely" section) To find such function instance, we have to check the range attribute too. Fixes: e1ecbbc3fa83 ("perf probe: Fix to handle optimized not-inlined functions") Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/157190835669.1859.8368628035930950596.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf tools: Avoid a malloc() for array eventsIan Rogers1-5/+3
Use realloc() rather than malloc()+memcpy() to possibly avoid a memory allocation when appending array elements. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jin Yao <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf tools: Move ALLOC_LIST into a functionIan Rogers1-22/+43
Having a YYABORT in a macro makes it hard to free memory for components of a rule. Separate the logic out. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jin Yao <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf evsel: Avoid close(-1)Andi Kleen1-1/+2
In some weak fallback cases close can be called a lot with -1. Check for this case and avoid calling close then. This is mainly to shut up valgrind which complains about this case. Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf evsel: Always preserve errno while cleaning up perf_event_open failuresAndi Kleen1-2/+4
In some cases when perf_event_open fails, it may do some closes to clean up. In special cases these closes can fail too, which overwrites the errno of the perf_event_open, which is then incorrectly reported. Save/restore errno around closes. Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf cs-etm: Fix definition of macro TO_CS_QUEUE_NRLeo Yan1-2/+2
Macro TO_CS_QUEUE_NR definition has a typo, which uses 'trace_id_chan' as its parameter, this doesn't match with its definition body which uses 'trace_chan_id'. So renames the parameter to 'trace_chan_id'. It's luck to have a local variable 'trace_chan_id' in the function cs_etm__setup_queue(), even we wrongly define the macro TO_CS_QUEUE_NR, the local variable 'trace_chan_id' is used rather than the macro's parameter 'trace_id_chan'; so the compiler doesn't complain for this before. After renaming the parameter, it leads to a compiling error due cs_etm__setup_queue() has no variable 'trace_id_chan'. This patch uses the variable 'trace_chan_id' for the macro so that fixes the compiling error. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: Mathieu Poirier <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: coresight ml <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf llvm: Make .o saving a debug message, not an info oneArnaldo Carvalho de Melo1-3/+2
Its a bit annoying to have that message, better make it a debug one. I.e. now this message will only appear when using '-v': [root@quaco tracebuffer]# trace -e bristot.c LLVM: dumping bristot.o ^C[root@quaco tracebuffer]# Cc: Adrian Hunter <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Luis Cláudio Gonçalves <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf record: Put a copy of kcore into the perf.data directoryAdrian Hunter5-0/+57
Add a new 'perf record' option '--kcore' which will put a copy of /proc/kcore, kallsyms and modules into a perf.data directory. Note, that without the --kcore option, output goes to a file as previously. The tools' -o and -i options work with either a file name or directory name. Example: $ sudo perf record --kcore uname $ sudo tree perf.data perf.data ├── kcore_dir │   ├── kallsyms │   ├── kcore │   └── modules └── data $ sudo perf script -v build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541 Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field. Using CPUID GenuineIntel-6-8E-A Using perf.data/kcore_dir/kcore for kernel data Using perf.data/kcore_dir/kallsyms for symbols perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux) perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux) perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux) uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux) uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux) uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux) Committer testing: # rm -rf perf.data* # perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ] # ls -l perf.data -rw-------. 1 root root 34772 Oct 21 11:08 perf.data # perf record --kcore uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ] ls[root@quaco ~]# ls -lad perf.data* drwx------. 3 root root 4096 Oct 21 11:08 perf.data -rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # perf evlist -v -i perf.data/data cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf data: Support single perf.data file directoryAdrian Hunter2-1/+14
Support directory output that contains a regular perf.data file, named "data". By default the directory is named perf.data i.e. perf.data └── data Most of the infrastructure to support a directory is already there. This patch makes the changes needed to support the format above. Presently there is no 'perf record' option to output a directory. This is preparation for adding support for putting a copy of /proc/kcore in the directory. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf session: Fix indent in perf_session__new()"Jiri Olsa1-2/+2
Fix up indentation. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Adrian Hunter <[email protected]> Link: http://lore.kernel.org/lkml/20191007112027.GD6919@krava Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf data: Rename directory "header" file to "data"Adrian Hunter2-2/+2
In preparation to support a single file directory format, rename "header" to "data" because "header" is a mis-leading name when there is only 1 file. Note, in the multi-file case, the "header" file also contains data. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf data: Move perf_dir_version into data.hAdrian Hunter2-4/+4
perf_dir_version belongs to struct perf_data which is declared in data.h. To allow its use in inline perf_data functions, move perf_dir_version to data.h Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf data: Correctly identify directory data filesAdrian Hunter1-1/+1
In order to rename the "header" file to "data" without conflicting, correctly identify the non-header files as starting with "data." Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-05perf tools: Fix time sortingJiri Olsa1-1/+1
The final sort might get confused when the comparison is done over bigger numbers than int like for -s time. Check the following report for longer workloads: $ perf report -s time -F time,overhead --stdio Fix hist_entry__sort() to properly return int64_t and not possible cut int. Fixes: 043ca389a318 ("perf tools: Use hpp formats to sort final output") Signed-off-by: Jiri Olsa <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] # v3.16+ Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-05perf tools: Remove unused trace_find_next_event()Steven Rostedt (VMware)2-33/+0
trace_find_next_event() was buggy and pretty much a useless helper. As there are no more users, just remove it. Signed-off-by: Steven Rostedt (VMware) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-05perf scripting engines: Iterate on tep event arrays directlySteven Rostedt (VMware)2-4/+13
Instead of calling a useless (and broken) helper function to get the next event of a tep event array, just get the array directly and iterate over it. Note, the broken part was from trace_find_next_event() which after this will no longer be used, and can be removed. Committer notes: This fixes a segfault when generating python scripts from perf.data files with multiple tracepoint events, i.e. the following use case is fixed by this patch: # perf record -e sched:* sleep 1 [ perf record: Woken up 31 times to write data ] [ perf record: Captured and wrote 0.031 MB perf.data (9 samples) ] # perf script -g python Segmentation fault (core dumped) # Reported-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-22Merge tag 'perf-core-for-mingo-5.5-20191021' of ↵Ingo Molnar15-130/+222
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf trace: - Add syscall failure stats to -s/--summary and -S/--with-summary, works in combination with specifying just a set of syscalls, see below first with -s/--summary, then with -S/--with-summary just for the syscalls we saw failing with -s: # perf trace -s sleep 1 Summary of events: sleep (16218), 80 events, 93.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) ----------- ----- ------ -------- -------- -------- -------- ------ nanosleep 1 0 1000.091 1000.091 1000.091 1000.091 0.00% mmap 8 0 0.045 0.005 0.006 0.008 7.09% mprotect 4 0 0.028 0.005 0.007 0.009 11.38% openat 3 0 0.021 0.005 0.007 0.009 14.07% munmap 1 0 0.017 0.017 0.017 0.017 0.00% brk 4 0 0.010 0.001 0.002 0.004 23.15% read 4 0 0.009 0.002 0.002 0.003 8.13% close 5 0 0.008 0.001 0.002 0.002 10.83% fstat 3 0 0.006 0.002 0.002 0.002 6.97% access 1 1 0.006 0.006 0.006 0.006 0.00% lseek 3 0 0.005 0.001 0.002 0.002 7.37% arch_prctl 2 1 0.004 0.001 0.002 0.002 17.64% execve 1 0 0.000 0.000 0.000 0.000 0.00% # perf trace -e access,arch_prctl -S sleep 1 0.000 ( 0.006 ms): sleep/19503 arch_prctl(option: 0x3001, arg2: 0x7fff165996b0) = -1 EINVAL (Invalid argument) 0.024 ( 0.006 ms): sleep/19503 access(filename: 0x2177e510, mode: R) = -1 ENOENT (No such file or directory) 0.136 ( 0.002 ms): sleep/19503 arch_prctl(option: SET_FS, arg2: 0x7f9421737580) = 0 Summary of events: sleep (19503), 6 events, 50.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) ---------- ----- ------ ------ ------ ------ ------ ------ arch_prctl 2 1 0.008 0.002 0.004 0.006 57.22% access 1 1 0.006 0.006 0.006 0.006 0.00% # - Introduce --errno-summary, to drill down a bit more in the errno stats: # perf trace --errno-summary -e access,arch_prctl -S sleep 1 0.000 ( 0.006 ms): sleep/5587 arch_prctl(option: 0x3001, arg2: 0x7ffd6ba6aa00) = -1 EINVAL (Invalid argument) 0.028 ( 0.007 ms): sleep/5587 access(filename: 0xb83d9510, mode: R) = -1 ENOENT (No such file or directory) 0.172 ( 0.003 ms): sleep/5587 arch_prctl(option: SET_FS, arg2: 0x7f45b8392580) = 0 Summary of events: sleep (5587), 6 events, 50.0% syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) ---------- ----- ------ ------ ------ ------ ------ ------ arch_prctl 2 1 0.009 0.003 0.005 0.006 38.90% EINVAL: 1 access 1 1 0.007 0.007 0.007 0.007 0.00% ENOENT: 1 # - Filter own pid to avoid a feedback look in 'perf trace record -a' - Add the glue for the auto generated x86 IRQ vector array. - Show error message when not finding a field used in a filter expression # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="cnt>32767" Failed to set filter "(cnt>32767) && (common_pid != 19938 && common_pid != 8922)" on event syscalls:sys_enter_write with 22 (Invalid argument) # # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="count>32767" 0.000 python3.5/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dc53600, count: 172086) 12.641 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db63660, count: 75994) 27.738 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db4b1e0, count: 41635) 136.070 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dbab510, count: 62232) # - Add a generator for x86's IRQ vectors -> strings - Introduce stroul() (string -> number) methods for the strarray and strarrays classes, also strtoul_flags, allowing to go from both strings and or-ed strings to numbers, allowing things like: # perf trace -e syscalls:sys_enter_mmap --filter="flags==DENYWRITE|PRIVATE|FIXED" sleep 1 0.000 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2aa5000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000) 0.011 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2bf2000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000) 0.015 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2c3f000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000) # Allowing to narrow down from the complete set of mmap calls for that workload: # perf trace -e syscalls:sys_enter_mmap sleep 1 0.000 sleep/22695 syscalls:sys_enter_mmap(len: 134773, prot: READ, flags: PRIVATE, fd: 3) 0.041 sleep/22695 syscalls:sys_enter_mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) 0.053 sleep/22695 syscalls:sys_enter_mmap(len: 1857472, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3) 0.069 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd23ffb6000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000) 0.077 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240103000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000) 0.083 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240150000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000) 0.095 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240156000, len: 14272, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) 0.339 sleep/22695 syscalls:sys_enter_mmap(len: 217750512, prot: READ, flags: PRIVATE, fd: 3) # Works with all targets, so, for system wide, looking at who calls mmap with flags set to just "PRIVATE": # perf trace --max-events=5 -e syscalls:sys_enter_mmap --filter="flags==PRIVATE" 0.000 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14) 0.050 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14) 0.062 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14) 0.145 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18) 0.183 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18) # # perf trace --max-events=2 -e syscalls:sys_enter_lseek --filter="whence==SET && offset != 0" 0.000 Cache2 I/O/12047 syscalls:sys_enter_lseek(fd: 277, offset: 43, whence: SET) 1142.070 mozStorage #5/12302 syscalls:sys_enter_lseek(fd: 44</home/acme/.mozilla/firefox/ina67tev.default/cookies.sqlite-wal>, offset: 393536, whence: SET) # perf annotate: - Fix objdump --no-show-raw-insn flag to work with goth gcc and clang. - Streamline objdump execution, preserving the right error codes for better reporting to user. perf report: - Add warning when libunwind not compiled in. perf stat: Jin Yao: - Support --all-kernel/--all-user, to match options available in 'perf record', asking that all the events specified work just with kernel or user events. perf list: Jin Yao: - Hide deprecated events by default, allow showing them with --deprecated. libbperf: Jiri Olsa: - Allow to build with -ltcmalloc. - Finish mmap interface, getting more stuff from tools/perf while adding abstractions to avoid pulling too much stuff, to get libperf to grow as tools needs things like auxtrace, etc. perf scripting engines: Steven Rostedt (VMware): - Iterate on tep event arrays directly, fixing script generation with '-g python' when having multiple tracepoints in a perf.data file. core: - Allow to build with -ltcmalloc. perf test: Leo Yan: - Report failure for mmap events. - Avoid infinite loop for task exit case. - Remove needless headers for bp_account test. - Add dedicated checking helper is_supported(). - Disable bp_signal testing for arm64. Vendor events: arm64: John Garry: - Fix Hisi hip08 DDRC PMU eventname. - Add some missing events for Hisi hip08 DDRC, L3C and HHA PMUs. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2019-10-22Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar5-8/+14
Signed-off-by: Ingo Molnar <[email protected]>
2019-10-19libperf: Move mask setup to perf_evlist__mmap_ops()Jiri Olsa1-1/+0
Move the mask setup to perf_evlist__mmap_ops(), because it's the same on both perf and libperf path. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-19libperf: Move mmap allocation to perf_evlist__mmap_ops::getJiri Olsa1-15/+9
Move allocation of the mmap array into perf_evlist__mmap_ops::get, to centralize the mmap allocation. Also move nr_mmap setup to perf_evlist__mmap_ops so it's centralized and shared by both perf and libperf mmap code. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-19libperf: Introduce perf_evlist__for_each_mmap()Jiri Olsa1-1/+3
Add the perf_evlist__for_each_mmap() function and export it in the perf/evlist.h header, so that the user can iterate through 'struct perf_mmap' objects. Add a internal perf_mmap__link() function to do the actual linking. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-19perf list: Hide deprecated events by defaultJin Yao4-8/+19
There are some deprecated events listed by perf list. But we can't remove them from perf list with ease because some old scripts may use them. Deprecated events are old names of renamed events. When an event gets renamed the old name is kept around for some time and marked with Deprecated. The newer Intel event lists in the tree already have these headers. So we need to keep them in the event list, but provide a new option to show them. The new option is "--deprecated". With this patch, the deprecated events are hidden by default but they can be displayed when option "--deprecated" is enabled. Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-18perf tools: Remove unused trace_find_next_event()Steven Rostedt (VMware)2-33/+0
trace_find_next_event() was buggy and pretty much a useless helper. As there are no more users, just remove it. Signed-off-by: Steven Rostedt (VMware) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-18perf scripting engines: Iterate on tep event arrays directlySteven Rostedt (VMware)2-4/+13
Instead of calling a useless (and broken) helper function to get the next event of a tep event array, just get the array directly and iterate over it. Note, the broken part was from trace_find_next_event() which after this will no longer be used, and can be removed. Committer notes: This fixes a segfault when generating python scripts from perf.data files with multiple tracepoint events, i.e. the following use case is fixed by this patch: # perf record -e sched:* sleep 1 [ perf record: Woken up 31 times to write data ] [ perf record: Captured and wrote 0.031 MB perf.data (9 samples) ] # perf script -g python Segmentation fault (core dumped) # Reported-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-15perf string: Export asprintf__tp_filter_pids()Arnaldo Carvalho de Melo2-1/+5
Will be used directly in 'perf trace' for setting up the command line argv array to pass to cmd_record, as this was how 'perf trace record' was implemented, following the model used in 'perf kvm record', 'perf sched record', etc. Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Luis Cláudio Gonçalves <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-15perf tools: Fix mode setting in copyfile_mode_ns()Adrian Hunter1-3/+5
slow_copyfile() opens the file by name, so "write" permissions must not be removed in copyfile_mode_ns() before calling slow_copyfile(). Example: Before: $ sudo chmod +r /proc/kcore $ sudo setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog,cap_sys_rawio=ep" tools/perf/perf $ tools/perf/perf buildid-cache -k /proc/kcore Couldn't add /proc/kcore After: $ sudo chmod +r /proc/kcore $ sudo setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog,cap_sys_rawio=ep" tools/perf/perf $ tools/perf/perf buildid-cache -v -k /proc/kcore kcore added to build-id cache directory /home/ahunter/.debug/[kernel.kcore]/37e340b1b5a7cf4f57ba8de2bc777359588a957f/2019100709562289 Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-15perf annotate: Fix multiple memory and file descriptor leaksGustavo A. R. Silva1-1/+1
Store SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF in variable *ret*, instead of returning in the middle of the function and leaking multiple resources: prog_linfo, btf, s and bfdf. Addresses-Coverity-ID: 1454832 ("Structurally dead code") Fixes: 11aad897f6d1 ("perf annotate: Don't return -1 for error when doing BPF disassembly") Signed-off-by: Gustavo A. R. Silva <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/20191014171047.GA30850@embeddedor Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-15perf tools: Fix resource leak of closedir() on the error pathsYunfeng Ye2-3/+7
Both build_mem_topology() and rm_rf_depth_pat() have resource leaks of closedir() on the error paths. Fix this by calling closedir() before function returns. Fixes: e2091cedd51b ("perf tools: Add MEM_TOPOLOGY feature to perf data file") Fixes: cdb6b0235f17 ("perf tools: Add pattern name checking to rm_rf") Signed-off-by: Yunfeng Ye <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Feilong Lin <[email protected]> Cc: Hu Shiyuan <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Yonghong Song <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-15perf evlist: Fix fix for freed id arraysAndi Kleen1-1/+1
In the earlier fix for the memory overrun of id arrays I managed to typo the wrong event in the fix. Of course we need to close the current event in the loop, not the original failing event. The same test case as in the original patch still passes. Fixes: 7834fa948beb ("perf evlist: Fix access of freed id arrays") Signed-off-by: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-15perf stat: Support --all-kernel/--all-userJin Yao2-0/+12
'perf record' has supported --all-kernel / --all-user to configure all used events to run in kernel space or run in user space. But 'perf stat' doesn't support these options. It would be useful to support these options in 'perf stat' too to keep the same semantics available in both tools. Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-10-15perf annotate: Fix objdump --no-show-raw-insn flagIan Rogers1-2/+2
Remove redirection of objdump's stderr to /dev/null to help diagnose failures. Fix the '--no-show-raw' flag to be '--no-show-raw-insn' which binutils is permissive and allows, but fails with LLVM objdump. Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>