aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2016-03-09perf tools: Omit unnecessary cast in perf_pmu__parse_scaleJiri Olsa1-2/+2
There's no need to use a const char pointer, we can used char pointer from the beginning and omit the unnecessary cast. Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Cc: David Ahern <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-09perf tools: Pass perf_hpp_list all the way through setup_sort_listJiri Olsa1-18/+26
Pass perf_hpp_list all the way through setup_sort_list so that the sort entry can be added on the arbitrary list. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-09perf tools: Fix perf script python database export crashChris Phlipot1-4/+2
Remove the union in evsel so that the database id and priv pointer can be used simultainously without conflicting and crashing. Detailed Description for the fixed bug follows: perf script crashes with a segmentation fault on user space tool version 4.5.rc7.ge2857b when using the python database export API. It works properly in 4.4 and prior versions. the crash fist appeared in: cfc8874a4859 ("perf script: Process cpu/threads maps") How to reproduce the bug: Remove any temporary files left over from a previous crash (if you have already attemped to reproduce the bug): $ rm -r test_db-perf-data $ dropdb test_db $ perf record timeout 1 yes >/dev/null $ perf script -s scripts/python/export-to-postgresql.py test_db Stack Trace: Program received signal SIGSEGV, Segmentation fault. __GI___libc_free (mem=0x1) at malloc.c:2929 2929 malloc.c: No such file or directory. (gdb) bt at util/stat.c:122 argv=<optimized out>, prefix=<optimized out>) at builtin-script.c:2231 argc=argc@entry=4, argv=argv@entry=0x7fffffffdf70) at perf.c:390 at perf.c:451 Signed-off-by: Chris Phlipot <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Peter Zijlstra <[email protected]> Fixes: cfc8874a4859 ("perf script: Process cpu/threads maps") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-09perf jitdump: DWARF is also neededArnaldo Carvalho de Melo1-0/+3
While building on a Docker container for ubuntu and installing package by package one ends up with: MKDIR /tmp/build/util/ CC /tmp/build/util/genelf.o util/genelf.c:22:19: fatal error: dwarf.h: No such file or directory #include <dwarf.h> ^ compilation terminated. mv: cannot stat '/tmp/build/util/.genelf.o.tmp': No such file or directory Because the jitdump code needs the DWARF related development packages to be installed. So make it dependent on that so that the build can succeed without jitdump support. Cc: Adrian Hunter <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-08perf hists: Fix indent for multiple hierarchy sort keyNamhyung Kim1-0/+1
When multiple sort keys are used in a single hierarchy, it should indent using number of hierarchy levels instead of number of sort keys. Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08perf hists: Support multiple sort keys in a hierarchy levelNamhyung Kim1-10/+32
This implements having multiple sort keys in a single hierarchy level. Originally only single sort key is supported for each level, but now using the group syntax with '{ }', it can set more than one sort key in one level. Note that now it needs to quote in order to prevent shell interpretation. For example: $ perf report --hierarchy -s '{comm,dso},sym' ... # Overhead Command / Shared Object / Symbol # .............. .......................................... # 48.67% swapper [kernel.vmlinux] 34.42% [k] intel_idle 1.30% [k] __tick_nohz_idle_enter 1.03% [k] cpuidle_reflect 8.87% firefox libpthread-2.22.so 6.60% [.] __GI___libc_recvmsg 1.18% [.] pthread_cond_signal@@GLIBC_2.3.2 1.09% [.] 0x000000000000ff4b 6.11% Xorg libc-2.22.so 5.27% [.] __memcpy_sse2_unaligned In the above example, the command name and the shared object name are shown on the same line but the symbol name is on the different line. Since the first two are grouped by '{}', they are in the same level. Suggested-and-Tested=by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08perf hists: Use own hpp_list for hierarchy modeNamhyung Kim3-25/+37
Now each hists has its own hpp lists in hierarchy. So instead of having a pointer to a single perf_hpp_fmt in a hist entry, make it point the hpp_list for its level. This will be used to support multiple sort keys in a single hierarchy level. Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08perf hists: Introduce perf_hpp__setup_hists_formats()Namhyung Kim3-0/+55
The perf_hpp__setup_hists_formats() is to build hists-specific output formats (and sort keys). Currently it's only used in order to build the output format in a hierarchy with same sort keys, but it could be used with different sort keys in non-hierarchy mode later. Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08perf hists: Add level field to struct perf_hpp_fmtNamhyung Kim2-33/+42
The level field is to distinguish levels in the hierarchy mode. Currently each column (perf_hpp_fmt) has a different level. Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08perf tools: Use 64-bit shifts with (TSC) time conversionAdrian Hunter1-1/+1
Commit b9511cd761fa ("perf/x86: Fix time_shift in perf_event_mmap_page") altered the time conversion algorithms documented in the perf_event.h header file, to use 64-bit shifts. That was done to make the code more future-proof (i.e. some time in the future a 32-bit shift could be allowed). Reflect those changes in perf tools. Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08perf jit: Move clockid validationAdrian Hunter1-0/+23
Move clockid validation into jit_process() so it can later be made conditional. Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08perf jit: Let jit_process() return errorsAdrian Hunter1-2/+4
In preparation for moving clockid validation into jit_process(). Previously a return value of zero meant the processing had been done and non-zero meant either the processing was not done (i.e. not the jitdump file mmap event) or an error occurred. Change it so that zero means the processing was not done, one means the processing was done and successful, and negative values are an error. Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08perf session: Simplify tool stubsAdrian Hunter1-33/+7
Some of the stubs are identical so just have one function for them. Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08perf tools: Explicitly declare inc_group_count as a void functionColin Ian King1-1/+1
The return type is not defined, so it defaults to int, however, the function is not returning anything, so this is clearly not correct. Make it a void function. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: He Kuang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-03x86/asm/decoder: Use explicitly signed charsJosh Poimboeuf1-3/+3
When running objtool on a ppc64le host to analyze x86 binaries, it reports a lot of false warnings like: ipc/compat_mq.o: warning: objtool: compat_SyS_mq_open()+0x91: can't find jump dest instruction at .text+0x3a5 The warnings are caused by the x86 instruction decoder setting the wrong value for the jump instruction's immediate field because it assumes that "char == signed char", which isn't true for all architectures. When converting char to int, gcc sign-extends on x86 but doesn't sign-extend on ppc64le. According to the gcc man page, that's a feature, not a bug: > Each kind of machine has a default for what "char" should be. It is > either like "unsigned char" by default or like "signed char" by > default. > > Ideally, a portable program should always use "signed char" or > "unsigned char" when it depends on the signedness of an object. Conform to the "standards" by changing the "char" casts to "signed char". This results in no actual changes to the object code on x86. Note: the x86 decoder now lives in three different locations in the kernel tree, which are all kept in sync via makefile checks and warnings: in-kernel, perf, and objtool. This fixes all three locations. Eventually we should probably try to at least converge the two separate "tools" locations into a single shared location. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/9dd4161719b20e6def9564646d68bfbe498c549f.1456962210.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2016-03-03perf stat: Check for frontend stalled for metricsAndi Kleen2-1/+9
Add an extra check for frontend stalled in the metrics. This avoids an extra column for the --metric-only case when the CPU does not support frontend stalled. v2: Add separate init function Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf test: Fix hists related entriesArnaldo Carvalho de Melo1-15/+22
That got broken by d3a72fd8187b ("perf report: Fix indentation of dynamic entries in hierarchy"), by using the evlist in setup_sorting() without checking if it is NULL, as done in some 'perf test' entries: $ find tools/ -name "*.c" | xargs grep 'setup_sorting(NULL);' tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_cumulate.c: setup_sorting(NULL); tools/perf/tests/hists_cumulate.c: setup_sorting(NULL); tools/perf/tests/hists_cumulate.c: setup_sorting(NULL); tools/perf/tests/hists_cumulate.c: setup_sorting(NULL); $ Fix it. Before: [root@jouet ~]# perf test <SNIP> 15: Test matching and linking multiple hists : FAILED! 16: Try 'import perf' in python, checking link problems : Ok 17: Test breakpoint overflow signal handler : Ok 18: Test breakpoint overflow sampling : Ok 19: Test number of exit event of a simple workload : Ok 20: Test software clock events have valid period values : Ok 21: Test object code reading : Ok 22: Test sample parsing : Ok 23: Test using a dummy software event to keep tracking : Ok 24: Test parsing with no sample_id_all bit set : Ok 25: Test filtering hist entries : FAILED! 26: Test mmap thread lookup : Ok 27: Test thread mg sharing : Ok 28: Test output sorting of hist entries : FAILED! 29: Test cumulation of child hist entries : FAILED! <SNIP> After the patch the above failed tests complete successfully. Acked-by: Namhyung Kim <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Wang Nan <[email protected]> Fixes: d3a72fd8187b ("perf report: Fix indentation of dynamic entries in hierarchy") Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf script: Fix double free on command_lineColin Ian King1-2/+2
The 'command_line' variable is free'd twice if db_export__branch_types() fails. To avoid this, defer the free'ing of 'command_line' to after this call so that the error return path will just free 'command_line' once. Signed-off-by: Colin Ian King <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: He Kuang <[email protected]> Cc: Javi Merino <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf stat: Support metrics in --per-core/socket modeAndi Kleen1-0/+7
Enable metrics printing in --per-core / --per-socket mode. We need to save the shadow metrics in a unique place. Always use the first CPU in the aggregation. Then use the same CPU to retrieve the shadow value later. Example output: % perf stat --per-core -a ./BC1s Performance counter stats for 'system wide': S0-C0 2 2966.020381 task-clock (msec) # 2.004 CPUs utilized (100.00%) S0-C0 2 49 context-switches # 0.017 K/sec (100.00%) S0-C0 2 4 cpu-migrations # 0.001 K/sec (100.00%) S0-C0 2 467 page-faults # 0.157 K/sec S0-C0 2 4,599,061,773 cycles # 1.551 GHz (100.00%) S0-C0 2 9,755,886,883 instructions # 2.12 insn per cycle (100.00%) S0-C0 2 1,906,272,125 branches # 642.704 M/sec (100.00%) S0-C0 2 81,180,867 branch-misses # 4.26% of all branches S0-C1 2 2965.995373 task-clock (msec) # 2.003 CPUs utilized (100.00%) S0-C1 2 62 context-switches # 0.021 K/sec (100.00%) S0-C1 2 8 cpu-migrations # 0.003 K/sec (100.00%) S0-C1 2 281 page-faults # 0.095 K/sec S0-C1 2 6,347,290 cycles # 0.002 GHz (100.00%) S0-C1 2 4,654,156 instructions # 0.73 insn per cycle (100.00%) S0-C1 2 947,121 branches # 0.319 M/sec (100.00%) S0-C1 2 37,322 branch-misses # 3.94% of all branches 1.480409747 seconds time elapsed v2: Rebase to older patches v3: Document shadow cpus. Fix aggr_get_id argument. Fix -A shadows (Jiri) Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf stat: Implement CSV metrics outputAndi Kleen1-1/+1
Now support CSV output for metrics. With the new output callbacks this is relatively straight forward by creating new callbacks. This allows to easily plot metrics from CSV files. The new line callback needs to know the number of fields to skip them correctly Example output before: % perf stat -x, true 0.200687,,task-clock,200687,100.00 0,,context-switches,200687,100.00 0,,cpu-migrations,200687,100.00 40,,page-faults,200687,100.00 730871,,cycles,203601,100.00 551056,,stalled-cycles-frontend,203601,100.00 <not supported>,,stalled-cycles-backend,0,100.00 385523,,instructions,203601,100.00 78028,,branches,203601,100.00 3946,,branch-misses,203601,100.00 After: % perf stat -x, true .502457,,task-clock,502457,100.00,0.485,CPUs utilized 0,,context-switches,502457,100.00,0.000,K/sec 0,,cpu-migrations,502457,100.00,0.000,K/sec 45,,page-faults,502457,100.00,0.090,M/sec 644692,,cycles,509102,100.00,1.283,GHz 423470,,stalled-cycles-frontend,509102,100.00,65.69,frontend cycles idle <not supported>,,stalled-cycles-backend,0,100.00,,,, 492701,,instructions,509102,100.00,0.76,insn per cycle ,,,,,0.86,stalled cycles per insn 97767,,branches,509102,100.00,194.578,M/sec 4788,,branch-misses,509102,100.00,4.90,of all branches or easier readable $ perf stat -x, -o x.csv true $ column -s, -t x.csv 0.490635 task-clock 490635 100.00 0.489 CPUs utilized 0 context-switches 490635 100.00 0.000 K/sec 0 cpu-migrations 490635 100.00 0.000 K/sec 45 page-faults 490635 100.00 0.092 M/sec 629080 cycles 497698 100.00 1.282 GHz 409498 stalled-cycles-frontend 497698 100.00 65.09 frontend cycles idle <not supported> stalled-cycles-backend 0 100.00 491424 instructions 497698 100.00 0.78 insn per cycle 0.83 stalled cycles per insn 97278 branches 497698 100.00 198.270 M/sec 4569 branch-misses 497698 100.00 4.70 of all branches Two new fields are added: metric value and metric name. v2: Split out function argument changes v3: Reenable metrics for real. v4: Fix wrong hunk from refactoring. v5: Remove extra "noise" printing (Jiri), but add it to the not counted case. Print empty metrics for not counted. v6: Avoid outputting metric on empty format. v7: Print metric at the end v8: Remove extra run, ena fields v9: Avoid extra new line for unsupported counters Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf data: Explicitly set byte order for integer typesWang Nan1-0/+6
After babeltrace commit 5cec03e402aa ("ir: copy variants and sequences when setting a field path"), 'perf data convert' gets incorrect result if there's bpf output data. For example: # perf data convert --to-ctf ./out.ctf # babeltrace ./out.ctf [10:44:31.186045346] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E7DD1, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0xC028E32F, [1] = 0x815D0100, [2] = 0x1000000 ] } [10:44:31.286101003] (+0.100055657) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF8105B609, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0x35D9F1EB, [1] = 0x15D81, [2] = 0x2 ] } The expected result of the first sample should be: raw_data = [ [0] = 0x2FE328C0, [1] = 0x15D81, [2] = 0x1 ] } however, 'perf data convert' output big endian value to resuling CTF file. The reason is a internal change (or a bug?) of babeltrace. Before this patch, at the first add_bpf_output_values(), byte order of all integer type is uncertain (is 0, neither 1234 (le) nor 4321 (be)). It would be fixed by: perf_evlist__deliver_sample -> process_sample_event -> ctf_stream ... ->bt_ctf_trace_add_stream_class ->bt_ctf_field_type_structure_set_byte_order ->bt_ctf_field_type_integer_set_byte_order during creating the stream. However, the babeltrace commit mentioned above duplicates types in sequence to prevent potential conflict in following call stack and link the newly allocated type into the 'raw_data' sequence: perf_evlist__deliver_sample -> process_sample_event -> ctf_stream ... -> bt_ctf_trace_add_stream_class -> bt_ctf_stream_class_resolve_types ... -> bt_ctf_field_type_sequence_copy ->bt_ctf_field_type_integer_copy This happens before byte order setting, so only the newly allocated type is initialized, the byte order of original type perf choose to create the first raw_data is still uncertain. Byte order in CTF output is not related to byte order in perf.data. Setting it to anything other than BT_CTF_BYTE_ORDER_NATIVE solves this problem (only BT_CTF_BYTE_ORDER_NATIVE needs to be fixed). To reduce behavior changing, set byte order according to compiling options. Signed-off-by: Wang Nan <[email protected]> Cc: Jeremie Galarneau <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jérémie Galarneau <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zefan Li <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf data: Support converting data from bpf_perf_event_output()Wang Nan1-1/+111
bpf_perf_event_output() outputs data through sample->raw_data. This patch adds support to convert those data into CTF. A python script then can be used to process output data from BPF programs. Test result: # cat ./test_bpf_output_2.c /************************ BEGIN **************************/ #include <uapi/linux/bpf.h> struct bpf_map_def { unsigned int type; unsigned int key_size; unsigned int value_size; unsigned int max_entries; }; #define SEC(NAME) __attribute__((section(NAME), used)) static u64 (*ktime_get_ns)(void) = (void *)BPF_FUNC_ktime_get_ns; static int (*trace_printk)(const char *fmt, int fmt_size, ...) = (void *)BPF_FUNC_trace_printk; static int (*get_smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id; static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) = (void *)BPF_FUNC_perf_event_output; struct bpf_map_def SEC("maps") channel = { .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY, .key_size = sizeof(int), .value_size = sizeof(u32), .max_entries = __NR_CPUS__, }; static inline int __attribute__((always_inline)) func(void *ctx, int type) { struct { u64 ktime; int type; } __attribute__((packed)) output_data; char error_data[] = "Error: failed to output\n"; int err; output_data.type = type; output_data.ktime = ktime_get_ns(); err = perf_event_output(ctx, &channel, get_smp_processor_id(), &output_data, sizeof(output_data)); if (err) trace_printk(error_data, sizeof(error_data)); return 0; } SEC("func_begin=sys_nanosleep") int func_begin(void *ctx) {return func(ctx, 1);} SEC("func_end=sys_nanosleep%return") int func_end(void *ctx) { return func(ctx, 2);} char _license[] SEC("license") = "GPL"; int _version SEC("version") = LINUX_VERSION_CODE; /************************* END ***************************/ # ./perf record -e bpf-output/no-inherit,name=evt/ \ -e ./test_bpf_output_2.c/map:channel.event=evt/ \ usleep 100000 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ] # ./perf script usleep 14942 92503.198504: evt: ffffffff810e0ba1 sys_nanosleep (/lib/modules/4.3.0.... usleep 14942 92503.298562: evt: ffffffff810585e9 kretprobe_trampoline_holder (/lib.... # ./perf data convert --to-ctf ./out.ctf [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ] [ perf data convert: Converted and wrote 0.000 MB (2 samples) ] # babeltrace ./out.ctf [01:41:43.198504134] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] } [01:41:43.298562257] (+0.100058123) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] } # cat ./test_bpf_output_2.py from babeltrace import TraceCollection tc = TraceCollection() tc.add_trace('./out.ctf', 'ctf') d = {1:[], 2:[]} for event in tc.events: if not event.name.startswith('evt'): continue raw_data = event['raw_data'] (time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2]) d[type].append(time) print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1]))))); # python3 ./test_bpf_output_2.py [100056879] Committer note: Make sure you have python3-devel installed, not python-devel, which may be for python2, which will lead to some "PyInstance_Type" errors. Also make sure that you use the right libbabeltrace, because it is shipped in Fedora, for instance, but an older version. To build libbabeltrace's python binding one also needs to use: ./configure --enable-python-bindings And then set PYTHONPATH=/usr/local/lib64/python3.4/site-packages/. Signed-off-by: Wang Nan <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zefan Li <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf tools: Fix locale handling in pmu parsingJiri Olsa1-0/+13
Ingo reported regression on display format of big numbers, which is missing separators (in default perf stat output). triton:~/tip> perf stat -a sleep 1 ... 127008602 cycles # 0.011 GHz 279538533 stalled-cycles-frontend # 220.09% frontend cycles idle 119213269 instructions # 0.94 insn per cycle This is caused by recent change: perf stat: Check existence of frontend/backed stalled cycles that added call to pmu_have_event, that subsequently calls perf_pmu__parse_scale, which has a bug in locale handling. The lc string returned from setlocale, that we use to store old locale value, may be allocated in static storage. Getting a dynamic copy to make it survive another setlocale call. $ perf stat ls ... 2,360,602 cycles # 3.080 GHz 2,703,090 instructions # 1.15 insn per cycle 546,031 branches # 712.511 M/sec Committer note: Since the patch introducing the regression didn't made to perf/core, move it to just before where the regression was introduced, so that we don't break bisection for this feature. Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-29perf tools: Fix python extension buildJiri Olsa1-0/+4
The util/python-ext-sources file contains source files required to build the python extension relative to $(srctree)/tools/perf, Such a file path $(FILE).c is handed over to the python extension build system, which builds the final object in the $(PYTHON_EXTBUILD)/tmp/$(FILE).o path. After the build is done all files from $(PYTHON_EXTBUILD)lib/ are carried as the result binaries. Above system fails when we add source file relative to ../lib, which we do for: ../lib/bitmap.c ../lib/find_bit.c ../lib/hweight.c ../lib/rbtree.c All above objects will be built like: $(PYTHON_EXTBUILD)/tmp/../lib/bitmap.c $(PYTHON_EXTBUILD)/tmp/../lib/find_bit.c $(PYTHON_EXTBUILD)/tmp/../lib/hweight.c $(PYTHON_EXTBUILD)/tmp/../lib/rbtree.c which accidentally happens to be final library path: $(PYTHON_EXTBUILD)/lib/ Changing setup.py to pass full paths of source files to Extension build class and thus keep all built objects under $(PYTHON_EXTBUILD)tmp directory. Reported-by: Jeff Bastian <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Josh Boyer <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] # v4.2+ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf tools: Only set filter for tracepoints eventsWang Nan1-0/+3
perf_evlist__set_filter() tries to set filter to every evsel linked in the evlist. However, since filters can only be applied to tracepoints, checking type of evsel before calling perf_evsel__set_filter() would be better. Signed-off-by: Wang Nan <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf config: Bring perf_default_config to the very beginning at main()Wang Nan3-5/+7
Before this patch each subcommand calls perf_config() by themself, reading the default configuration together with subcommand specific options. If a subcommand doesn't have it own options, it needs to call 'perf_config(perf_default_config, NULL)' to ensure .perfconfig is loaded. This patch brings perf_config(perf_default_config, NULL) to the very start of main(), so subcommands don't need to do it. After this patch, 'llvm.clang-path' works for 'perf trace'. Signed-off-by: Wang Nan <[email protected]> Suggested-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf report: Update column width of dynamic entriesNamhyung Kim2-3/+16
The column width of dynamic entries is updated when comparing hist entries. However some unique entries can miss the chance to update. So move the update to output resort stage to make sure every entry will get called before display. To do that, abuse ->sort callback to update the width when the third argument is NULL. When resorting entries in normal path, it never be NULL so it should be fine IMHO. Before: # Overhead ptr / bytes_req / gfp_flags # .............. .......................................... # 37.50% 0xffff8803f7669400 37.50% 448 37.50% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 10.42% 0xffff8803f766be00 8.33% 96 8.33% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 2.08% 512 2.08% GFP_KERNEL|GFP_NOWARN|GFP_REPEAT|GFP <-- here After: # Overhead ptr / bytes_req / gfp_flags # .............. ..................................................... # 37.50% 0xffff8803f7669400 37.50% 448 37.50% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 10.42% 0xffff8803f766be00 8.33% 96 8.33% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 2.08% 512 2.08% GFP_KERNEL|GFP_NOWARN|GFP_REPEAT|GFP_NOMEMALLOC Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf hists: Fix dynamic entry display in hierarchyNamhyung Kim2-1/+4
When dynamic sort key is used it might not show pretty printed output. This is because the trace output was not set only for the first dynamic sort key. During hierarchy_insert_entry() it missed to pass the trace_output to dynamic entries. Also even if it did, only first entry will have it. Subsequent entries might set it during collapsing stage but it's not guaranteed. Before: $ perf report --hierarchy --stdio -s ptr,bytes_req,gfp_flags -g none # # Overhead ptr / bytes_req / gfp_flags # .............. .......................................... # 37.50% 0xffff8803f7669400 37.50% 448 37.50% 66080 10.42% 0xffff8803f766be00 8.33% 96 8.33% 66080 2.08% 512 2.08% 67280 After: # # Overhead ptr / bytes_req / gfp_flags # .............. .......................................... # 37.50% 0xffff8803f7669400 37.50% 448 37.50% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 10.42% 0xffff8803f766be00 8.33% 96 8.33% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 2.08% 512 2.08% GFP_KERNEL|GFP_NOWARN|GFP_REPEAT|GFP Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf report: Fix indentation of dynamic entries in hierarchyNamhyung Kim2-0/+20
When dynamic entries are used in the hierarchy mode with multiple events, the output might not be aligned properly. In the hierarchy mode, the each sort column is indented using total number of sort keys. So it keeps track of number of sort keys when adding them. However a dynamic sort key can be added more than once when multiple events have same field names. This results in unnecessarily long indentation in the output. For example perf kmem records following events: $ perf evlist --trace-fields -i perf.data.kmem kmem:kmalloc: trace_fields: call_site,ptr,bytes_req,bytes_alloc,gfp_flags kmem:kmalloc_node: trace_fields: call_site,ptr,bytes_req,bytes_alloc,gfp_flags,node kmem:kfree: trace_fields: call_site,ptr kmem:kmem_cache_alloc: trace_fields: call_site,ptr,bytes_req,bytes_alloc,gfp_flags kmem:kmem_cache_alloc_node: trace_fields: call_site,ptr,bytes_req,bytes_alloc,gfp_flags,node kmem:kmem_cache_free: trace_fields: call_site,ptr kmem:mm_page_alloc: trace_fields: page,order,gfp_flags,migratetype kmem:mm_page_free: trace_fields: page,order As you can see, many field names shared between kmem events. So adding 'ptr' dynamic sort key alone will set nr_sort_keys to 6. And this adds many unnecessary spaces between columns. Before: $ perf report -i perf.data.kmem --hierarchy -s ptr -g none --stdio ... # Overhead ptr # ....................... ................................... # 99.89% 0xffff8803ffb79720 0.06% 0xffff8803d228a000 0.03% 0xffff8803f7678f00 0.00% 0xffff880401dc5280 0.00% 0xffff880406172380 0.00% 0xffff8803ffac3a00 0.00% 0xffff8803ffac1600 After: # Overhead ptr # ........ .................... # 99.89% 0xffff8803ffb79720 0.06% 0xffff8803d228a000 0.03% 0xffff8803f7678f00 0.00% 0xffff880401dc5280 0.00% 0xffff880406172380 0.00% 0xffff8803ffac3a00 0.00% 0xffff8803ffac1600 Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf hists: Fix comparing of dynamic entriesNamhyung Kim1-0/+8
When hist_entry__cmp() and hist_entry__collapse() are called, they should check if the dynamic entry is comparing matching hists only. Otherwise it might access different hists resulting in incorrect output. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf hists browser: Show message for percent limitNamhyung Kim2-0/+3
Like the stdio, it should show messages about omitted hierarchy entries. Please refer the previous commit for more details. As it needs to check an entry is omitted or not multiple times, add the has_no_entry field in the hist entry. Suggested-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf hists: Add more helper functions for the hierarchy modeNamhyung Kim2-0/+28
The hists__overhead_width() is to calculate width occupied by the overhead (and others) columns before the sort columns. The hist_entry__has_hiearchy_children() is to check whether an entry has lower entries (children) in the hierarchy to be shown in the output. This means the children should not be filtered out and above the percent limit. These two functions will be used to show information when all children of an entry is omitted by the percent limit (or filter). Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-25perf script: Exception handling when the print fmt is emptyTaeung Song2-0/+6
After collecting samples for events 'syscalls:', perf-script with python script doesn't occasionally work generating a segmentation fault. The reason is that the print fmt is empty and a value of event->print_fmt.args is NULL, so dereferencing the null pointer results in a segmentation fault i.e.: # perf record -e syscalls:* # perf script -g python # perf script -s perf-script.py in trace_begin syscalls__sys_enter_brk 3 79841.832099154 3777 test.sh syscall_nr=12, brk=0 ... (omitted) ... Segmentation fault (core dumped) For example, a format of sys_enter_getuid() hasn't print fmt as below. # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_getuid/format name: sys_enter_getuid ID: 188 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:int syscall_nr; offset:8; size:4; signed:1; print fmt: "" So add exception handling to avoid this problem. Signed-off-by: Taeung Song <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-25perf tools: Fix parsing of pmu events with empty list of modifiersArnaldo Carvalho de Melo1-3/+3
In 1d55e8ef340d ("perf tools: Introduce opt_event_config nonterminal") I removed the unconditional "'/' '/'" for pmu events such as "intel_pt//" but forgot to use opt_event_config where it expected some event_config, oops. Fix it. Noticed when trying to use: # perf record -e intel_pt// -a sleep 1 event syntax error: 'intel_pt//' \___ parser error Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events # Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Fixes: 1d55e8ef340d ("perf tools: Introduce opt_event_config nonterminal") Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf hists: Support decaying in hierarchy modeNamhyung Kim1-8/+34
In the hierarchy mode, hist entries should decay their children too. Also update hists__delete_entry() to be able to free child entries. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf ui/stdio: Align column header for hierarchy outputNamhyung Kim2-0/+11
The hierarchy output mode is to group entries so the existing columns won't fit to the new output. Treat all sort keys as a single column and separate headers by "/". # Overhead Command / Shared Object # ........... ................................ # 15.11% swapper 14.97% [kernel.vmlinux] 0.09% [libahci] 0.05% [iwlwifi] ... Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf ui/stdio: Implement hierarchy output modeNamhyung Kim1-0/+2
The hierarchy output mode is to group entries for each level so that user can see higher level picture more easily. It also helps to find out which component is most costly. The output will look like below: 15.11% swapper 14.97% [kernel.vmlinux] 0.09% [libahci] 0.05% [iwlwifi] 10.29% irq/33-iwlwifi 6.45% [kernel.vmlinux] 1.41% [mac80211] 1.15% [iwldvm] 1.14% [iwlwifi] 0.14% [cfg80211] 4.81% firefox 3.92% libxul.so 0.34% [kernel.vmlinux] Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf hists: Count number of sort keysNamhyung Kim1-0/+1
It'll be used for hierarchy output mode to indent entries properly. Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf hists: Resort after filtering hierarchyNamhyung Kim1-0/+54
In hierarchy mode, a filter can affect periods of entries in upper hierarchy. So it needs to resort the hists after filter. For example, let's look at following example: Overhead Command / Shared Object / Symbol ------------ -------------------------------- 30.00% perf 20.00% perf 10.00% main 5.00% pr_debug 5.00% memcpy 10.00% [kernel.vmlinux] 8.00% memset 2.00% cpu_idle If we apply simbol filter for 'mem' it should look like this 13.00% perf 8.00% [kernel.vmlinux] 8.00% memset 5.00% perf 5.00% memcpy Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf hists: Support filtering in hierarchy modeNamhyung Kim1-8/+93
The hists__filter_hierarchy() function implements filtering in hierarchy mode. Now we have hist_entry__filter() so use it for entries in the hierarchy. It returns 3 kind of values. A negative value means that it's not filtered by this type. It marks current entry as filtered tentatively so if a lower level entry removes the filter it also removes the all parent so that we can find the entry in the output. Zero means it's filtered out by this type. A positive value means it's not filtered so it removes the filter and shows in the output. In these cases, it moves to next entry since lower level entry won't match by this type of filter anymore. Thus all children will be filtered or not together. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf hists: Introduce hist_entry__filter()Namhyung Kim3-0/+116
The hist_entry__filter() function is to filter hist entries using sort key related info. This is needed to support hierarchy mode since each hist entry will be associated with a hpp fmt which has a sort key. So each entry should compare to only matching type of filters. To do that, add the ->se_filter callback field to struct sort_entry. This callback takes 'type' argument which determines whether it's matching sort key or not. It returns -1 for non-matching type, 0 for filtered entry and 1 for not filtered entries. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ 'socket' is reserved in sys/socket.h, so replace it with 'sk' ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf hists: Add helper functions for hierarchy modeNamhyung Kim2-0/+72
The rb_hierarchy_{next,prev,last} functions are to traverse all hist entries in a hierarchy. They will be used by various function which supports hierarchy output. As the rb_hierarchy_next() is used to traverse the whole hierarchy, it sometime needs to visit entries regardless of current folding state. So add enum hierarchy_move_dir and pass it to __rb_hierarchy_next() for those cases. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf hists: Resort hist entries with hierarchyNamhyung Kim1-3/+91
For hierarchical output, each entry must be sorted in their rbtree (hroot) properly. Add hists__hierarchy_output_resort() to do the job. Note that those hierarchy entries share the period counts, it'd be important to update the hists->stats only once (for leaves). Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf hists: Basic support of hierarchical report viewNamhyung Kim3-2/+128
In the hierarchical view, entries will be grouped and sorted on the first key, and then on the second key, and so on. Add the he->hroot_{in,out} fields to keep the lower level entries. Actually this can share space, in a union, with callchain's 'sorted_root' since the hroots are only used by non-leaf entries and callchain is only used by leaf entries. It also adds the 'parent_he' and 'depth' fields which can be used by browsers. This patch only implements collapsing part which creates internal entries for each sort key. These need to be sorted by output_sort stage and to be displayed properly in the later patch(es). Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf tools: Add helper functions for some sort keysNamhyung Kim2-0/+36
The 'trace', 'srcline' and 'srcfile' sort keys updates hist entry's field later. With the hierarchy mode, those fields are passed to a matching entry so it needs to identify the sort keys. Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf tools: Make binary data printer code in trace_event public availableWang Nan3-27/+105
Move code printing binray data from trace_event() to utils.c and allows passing different printer. Further commits will use this logic to print bpf output event. Signed-off-by: Wang Nan <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf script: Display data_src valuesJiri Olsa2-0/+17
Adding support to display data_src values, for events with data_src data in sample. Example: $ perf script ... rcuos/3 32 [002] ... 68501042 Local RAM hit|SNP None or Hit|TLB L1 or L2 hit|LCK No ... rcuos/3 32 [002] ... 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No ... swapper 0 [002] ... 68100242 LFB hit|SNP None|TLB L1 or L2 hit|LCK No ... swapper 0 [000] ... 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No ... swapper 0 [000] ... 50100142 L1 hit|SNP None|TLB L2 miss|LCK No ... rcuos/3 32 [002] ... 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No ... plugin-containe 16538 [000] ... 6a100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes ... gkrellm 1736 [000] ... 68100242 LFB hit|SNP None|TLB L1 or L2 hit|LCK No ... gkrellm 1736 [000] ... 6a100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes ... ^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ data_src value data_src translation Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf tools: Change perf_mem__lck_scnprintf to return nb of displayed bytesJiri Olsa2-6/+8
Moving strncat call into scnprintf to easily track number of displayed bytes. It will be used in following patch. Signed-off-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf tools: Change perf_mem__snp_scnprintf to return nb of displayed bytesJiri Olsa2-5/+6
Moving strncat/strcpy calls into scnprintf to easily track number of displayed bytes. It will be used in following patch. Signed-off-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-24perf tools: Change perf_mem__lvl_scnprintf to return nb of displayed bytesJiri Olsa2-7/+8
Moving strncat/strcpy calls into scnprintf to easily track number of displayed bytes. It will be used in following patch. Signed-off-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>