aboutsummaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2016-03-05nfit, tools/testing/nvdimm: test multiple control regions per-dimmDan Williams1-24/+94
ACPI 6.1 clarifies that "The system shall include an NVDIMM Control Region Structure for every Function Interface in the NVDIMM." Implement this clarification in nfit_test. Signed-off-by: Dan Williams <[email protected]>
2016-03-05nfit, tools/testing/nvdimm: add format interface code definitionsDan Williams1-1/+6
ACPI 6.1 and JEDEC Annex L Release 3 formalize the format interface code. Add definitions and update their usage in the unit test. Signed-off-by: Dan Williams <[email protected]>
2016-03-03objtool: Support CROSS_COMPILEJosh Poimboeuf2-9/+14
When building with CONFIG_STACK_VALIDATION on a ppc64le host with an x86 cross-compiler, Stephen Rothwell saw the following objtool build errors: DESCEND objtool CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/builtin-check.o CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/special.o CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/elf.o CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/objtool.o MKDIR /home/sfr/next/x86_64_allmodconfig/tools/objtool/arch/x86/insn/ CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/libstring.o elf.c:22:23: fatal error: sys/types.h: No such file or directory compilation terminated. CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/exec-cmd.o CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/help.o builtin-check.c:28:20: fatal error: string.h: No such file or directory compilation terminated. objtool.c:28:19: fatal error: stdio.h: No such file or directory compilation terminated. It fails to build because it tries to compile objtool with the cross-compiler instead of the host compiler. Ensure that it always uses the host compiler by ignoring CROSS_COMPILE. In order to do that properly, the libsubcmd.a library needs to be built in tools/objtool/ rather than tools/lib/subcmd/. The latter directory contains the cross-compiled version which is needed for perf and possibly other tools. Note that cross-compiling for x86 on a _big_ endian system would result in a bunch of false positive objtool warnings during the kernel build because it isn't endian-aware. But that's generally a rare edge case and there haven't been any reports of anybody needing that. Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/55b63eefc347f1bb28573f972d8d1adbf1f1c31d.1456962210.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2016-03-03x86/asm/decoder: Use explicitly signed charsJosh Poimboeuf2-6/+6
When running objtool on a ppc64le host to analyze x86 binaries, it reports a lot of false warnings like: ipc/compat_mq.o: warning: objtool: compat_SyS_mq_open()+0x91: can't find jump dest instruction at .text+0x3a5 The warnings are caused by the x86 instruction decoder setting the wrong value for the jump instruction's immediate field because it assumes that "char == signed char", which isn't true for all architectures. When converting char to int, gcc sign-extends on x86 but doesn't sign-extend on ppc64le. According to the gcc man page, that's a feature, not a bug: > Each kind of machine has a default for what "char" should be. It is > either like "unsigned char" by default or like "signed char" by > default. > > Ideally, a portable program should always use "signed char" or > "unsigned char" when it depends on the signedness of an object. Conform to the "standards" by changing the "char" casts to "signed char". This results in no actual changes to the object code on x86. Note: the x86 decoder now lives in three different locations in the kernel tree, which are all kept in sync via makefile checks and warnings: in-kernel, perf, and objtool. This fixes all three locations. Eventually we should probably try to at least converge the two separate "tools" locations into a single shared location. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/9dd4161719b20e6def9564646d68bfbe498c549f.1456962210.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2016-03-03perf stat: Check for frontend stalled for metricsAndi Kleen3-1/+10
Add an extra check for frontend stalled in the metrics. This avoids an extra column for the --metric-only case when the CPU does not support frontend stalled. v2: Add separate init function Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03tools/power turbostat: fix various build warningsColin Ian King1-4/+4
When building with gcc 6 we're getting various build warnings that just require some trivial function declaration and call fixes: turbostat.c: In function ‘dump_cstate_pstate_config_info’: turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’ dump_cstate_pstate_config_info(family, model) turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’ turbostat.c: In function ‘get_tdp’: turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’ double get_tdp(model) turbostat.c: In function ‘perf_limit_reasons_probe’: turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’ void perf_limit_reasons_probe(family, model) turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’ Signed-off-by: Colin Ian King <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf tests: Initialize sa.sa_flagsColin Ian King1-0/+1
The sa_flags field is not being initialized, so a garbage value is being passed to sigaction. Initialize it to zero. Signed-off-by: Colin Ian King <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf test: Fix hists related entriesArnaldo Carvalho de Melo1-15/+22
That got broken by d3a72fd8187b ("perf report: Fix indentation of dynamic entries in hierarchy"), by using the evlist in setup_sorting() without checking if it is NULL, as done in some 'perf test' entries: $ find tools/ -name "*.c" | xargs grep 'setup_sorting(NULL);' tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_output.c: setup_sorting(NULL); tools/perf/tests/hists_cumulate.c: setup_sorting(NULL); tools/perf/tests/hists_cumulate.c: setup_sorting(NULL); tools/perf/tests/hists_cumulate.c: setup_sorting(NULL); tools/perf/tests/hists_cumulate.c: setup_sorting(NULL); $ Fix it. Before: [root@jouet ~]# perf test <SNIP> 15: Test matching and linking multiple hists : FAILED! 16: Try 'import perf' in python, checking link problems : Ok 17: Test breakpoint overflow signal handler : Ok 18: Test breakpoint overflow sampling : Ok 19: Test number of exit event of a simple workload : Ok 20: Test software clock events have valid period values : Ok 21: Test object code reading : Ok 22: Test sample parsing : Ok 23: Test using a dummy software event to keep tracking : Ok 24: Test parsing with no sample_id_all bit set : Ok 25: Test filtering hist entries : FAILED! 26: Test mmap thread lookup : Ok 27: Test thread mg sharing : Ok 28: Test output sorting of hist entries : FAILED! 29: Test cumulation of child hist entries : FAILED! <SNIP> After the patch the above failed tests complete successfully. Acked-by: Namhyung Kim <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Wang Nan <[email protected]> Fixes: d3a72fd8187b ("perf report: Fix indentation of dynamic entries in hierarchy") Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03tools lib traceevent: Fix output of %llu for 64 bit values read on 32 bit ↵Steven Rostedt (Red Hat)1-1/+1
machines When a long value is read on 32 bit machines for 64 bit output, the parsing needs to change "%lu" into "%llu", as the value is read natively. Unfortunately, if "%llu" is already there, the code will add another "l" to it and fail to parse it properly. Signed-off-by: Steven Rostedt <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03tools lib traceevent: Set int_array fields to NULL if freeing from errorSteven Rostedt (Red Hat)1-0/+3
Had a bug where on error of parsing __print_array() where the fields are freed after they were allocated, but since they were not set to NULL, the freeing of the arg also tried to free the already freed fields causing a double free. Fix process_hex() while at it. Signed-off-by: Steven Rostedt <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03tools lib traceevent: Fix time stamp rounding issueChaos.Chen1-0/+5
When rounding to microseconds, if the timestamp subsecond is between .999999500 and .999999999, it is rounded to .1000000, when it should instead increment the second counter due to the overflow. For example, if the timestamp is 1234.999999501 instead of seeing: 1235.000000 we see: 1234.1000000 Signed-off-by: Chaos.Chen <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ fixed incrementing "secs" instead of decrementing it ] Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf script: Fix double free on command_lineColin Ian King1-2/+2
The 'command_line' variable is free'd twice if db_export__branch_types() fails. To avoid this, defer the free'ing of 'command_line' to after this call so that the error return path will just free 'command_line' once. Signed-off-by: Colin Ian King <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: He Kuang <[email protected]> Cc: Javi Merino <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03tools build: Use .s extension for preprocessed assembler codeMasahiro Yamada1-1/+1
The "man gcc" says .i extension represents the file is C source code that should not be preprocessed. Here, .s should be used. For clarification, .c ---(preprocess)---> .i .S ---(preprocess)---> .s Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Aaro Koskinen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Lukas Wunner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf stat: Support metrics in --per-core/socket modeAndi Kleen2-8/+63
Enable metrics printing in --per-core / --per-socket mode. We need to save the shadow metrics in a unique place. Always use the first CPU in the aggregation. Then use the same CPU to retrieve the shadow value later. Example output: % perf stat --per-core -a ./BC1s Performance counter stats for 'system wide': S0-C0 2 2966.020381 task-clock (msec) # 2.004 CPUs utilized (100.00%) S0-C0 2 49 context-switches # 0.017 K/sec (100.00%) S0-C0 2 4 cpu-migrations # 0.001 K/sec (100.00%) S0-C0 2 467 page-faults # 0.157 K/sec S0-C0 2 4,599,061,773 cycles # 1.551 GHz (100.00%) S0-C0 2 9,755,886,883 instructions # 2.12 insn per cycle (100.00%) S0-C0 2 1,906,272,125 branches # 642.704 M/sec (100.00%) S0-C0 2 81,180,867 branch-misses # 4.26% of all branches S0-C1 2 2965.995373 task-clock (msec) # 2.003 CPUs utilized (100.00%) S0-C1 2 62 context-switches # 0.021 K/sec (100.00%) S0-C1 2 8 cpu-migrations # 0.003 K/sec (100.00%) S0-C1 2 281 page-faults # 0.095 K/sec S0-C1 2 6,347,290 cycles # 0.002 GHz (100.00%) S0-C1 2 4,654,156 instructions # 0.73 insn per cycle (100.00%) S0-C1 2 947,121 branches # 0.319 M/sec (100.00%) S0-C1 2 37,322 branch-misses # 3.94% of all branches 1.480409747 seconds time elapsed v2: Rebase to older patches v3: Document shadow cpus. Fix aggr_get_id argument. Fix -A shadows (Jiri) Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf stat: Implement CSV metrics outputAndi Kleen2-5/+70
Now support CSV output for metrics. With the new output callbacks this is relatively straight forward by creating new callbacks. This allows to easily plot metrics from CSV files. The new line callback needs to know the number of fields to skip them correctly Example output before: % perf stat -x, true 0.200687,,task-clock,200687,100.00 0,,context-switches,200687,100.00 0,,cpu-migrations,200687,100.00 40,,page-faults,200687,100.00 730871,,cycles,203601,100.00 551056,,stalled-cycles-frontend,203601,100.00 <not supported>,,stalled-cycles-backend,0,100.00 385523,,instructions,203601,100.00 78028,,branches,203601,100.00 3946,,branch-misses,203601,100.00 After: % perf stat -x, true .502457,,task-clock,502457,100.00,0.485,CPUs utilized 0,,context-switches,502457,100.00,0.000,K/sec 0,,cpu-migrations,502457,100.00,0.000,K/sec 45,,page-faults,502457,100.00,0.090,M/sec 644692,,cycles,509102,100.00,1.283,GHz 423470,,stalled-cycles-frontend,509102,100.00,65.69,frontend cycles idle <not supported>,,stalled-cycles-backend,0,100.00,,,, 492701,,instructions,509102,100.00,0.76,insn per cycle ,,,,,0.86,stalled cycles per insn 97767,,branches,509102,100.00,194.578,M/sec 4788,,branch-misses,509102,100.00,4.90,of all branches or easier readable $ perf stat -x, -o x.csv true $ column -s, -t x.csv 0.490635 task-clock 490635 100.00 0.489 CPUs utilized 0 context-switches 490635 100.00 0.000 K/sec 0 cpu-migrations 490635 100.00 0.000 K/sec 45 page-faults 490635 100.00 0.092 M/sec 629080 cycles 497698 100.00 1.282 GHz 409498 stalled-cycles-frontend 497698 100.00 65.09 frontend cycles idle <not supported> stalled-cycles-backend 0 100.00 491424 instructions 497698 100.00 0.78 insn per cycle 0.83 stalled cycles per insn 97278 branches 497698 100.00 198.270 M/sec 4569 branch-misses 497698 100.00 4.70 of all branches Two new fields are added: metric value and metric name. v2: Split out function argument changes v3: Reenable metrics for real. v4: Fix wrong hunk from refactoring. v5: Remove extra "noise" printing (Jiri), but add it to the not counted case. Print empty metrics for not counted. v6: Avoid outputting metric on empty format. v7: Print metric at the end v8: Remove extra run, ena fields v9: Avoid extra new line for unsupported counters Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf record: Ensure return non-zero rc when mmap failWang Nan1-1/+4
perf_evlist__mmap_ex() can fail without setting errno (for example, fail in condition checking. In this case all syscall is success). If this happen, record__open() incorrectly returns 0. Force setting rc is a quick way to avoid this problem, or we have to follow all possible code path in perf_evlist__mmap_ex() to make sure there's at least one system call before returning an error. Signed-off-by: Wang Nan <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: He Kuang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Zefan Li <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: He Kuang <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf record: Introduce record__finish_output() to finish a perf.dataWang Nan1-12/+25
Move code for finalizing 'perf.data' to record__finish_output(). It will be used by following commits to split output to multiple files. Signed-off-by: He Kuang <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: He Kuang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zefan Li <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wang Nan <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf record: Extract synthesize code to record__synthesize()Wang Nan1-55/+70
Create record__synthesize(). It can be used to create tracking events for each perf.data after perf supporting splitting into multiple outputs. Signed-off-by: He Kuang <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: He Kuang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zefan Li <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wang Nan <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf record: Use WARN_ONCE to replace 'if' conditionWang Nan1-8/+7
Commits in a BPF patchkit will extract kernel and module synthesizing code into a separated function and call it multiple times. This patch replace 'if (err < 0)' using WARN_ONCE, makes sure the error message show one time. Signed-off-by: Wang Nan <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: He Kuang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zefan Li <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf data: Explicitly set byte order for integer typesWang Nan1-0/+6
After babeltrace commit 5cec03e402aa ("ir: copy variants and sequences when setting a field path"), 'perf data convert' gets incorrect result if there's bpf output data. For example: # perf data convert --to-ctf ./out.ctf # babeltrace ./out.ctf [10:44:31.186045346] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E7DD1, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0xC028E32F, [1] = 0x815D0100, [2] = 0x1000000 ] } [10:44:31.286101003] (+0.100055657) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF8105B609, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0x35D9F1EB, [1] = 0x15D81, [2] = 0x2 ] } The expected result of the first sample should be: raw_data = [ [0] = 0x2FE328C0, [1] = 0x15D81, [2] = 0x1 ] } however, 'perf data convert' output big endian value to resuling CTF file. The reason is a internal change (or a bug?) of babeltrace. Before this patch, at the first add_bpf_output_values(), byte order of all integer type is uncertain (is 0, neither 1234 (le) nor 4321 (be)). It would be fixed by: perf_evlist__deliver_sample -> process_sample_event -> ctf_stream ... ->bt_ctf_trace_add_stream_class ->bt_ctf_field_type_structure_set_byte_order ->bt_ctf_field_type_integer_set_byte_order during creating the stream. However, the babeltrace commit mentioned above duplicates types in sequence to prevent potential conflict in following call stack and link the newly allocated type into the 'raw_data' sequence: perf_evlist__deliver_sample -> process_sample_event -> ctf_stream ... -> bt_ctf_trace_add_stream_class -> bt_ctf_stream_class_resolve_types ... -> bt_ctf_field_type_sequence_copy ->bt_ctf_field_type_integer_copy This happens before byte order setting, so only the newly allocated type is initialized, the byte order of original type perf choose to create the first raw_data is still uncertain. Byte order in CTF output is not related to byte order in perf.data. Setting it to anything other than BT_CTF_BYTE_ORDER_NATIVE solves this problem (only BT_CTF_BYTE_ORDER_NATIVE needs to be fixed). To reduce behavior changing, set byte order according to compiling options. Signed-off-by: Wang Nan <[email protected]> Cc: Jeremie Galarneau <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jérémie Galarneau <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zefan Li <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf data: Support converting data from bpf_perf_event_output()Wang Nan1-1/+111
bpf_perf_event_output() outputs data through sample->raw_data. This patch adds support to convert those data into CTF. A python script then can be used to process output data from BPF programs. Test result: # cat ./test_bpf_output_2.c /************************ BEGIN **************************/ #include <uapi/linux/bpf.h> struct bpf_map_def { unsigned int type; unsigned int key_size; unsigned int value_size; unsigned int max_entries; }; #define SEC(NAME) __attribute__((section(NAME), used)) static u64 (*ktime_get_ns)(void) = (void *)BPF_FUNC_ktime_get_ns; static int (*trace_printk)(const char *fmt, int fmt_size, ...) = (void *)BPF_FUNC_trace_printk; static int (*get_smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id; static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) = (void *)BPF_FUNC_perf_event_output; struct bpf_map_def SEC("maps") channel = { .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY, .key_size = sizeof(int), .value_size = sizeof(u32), .max_entries = __NR_CPUS__, }; static inline int __attribute__((always_inline)) func(void *ctx, int type) { struct { u64 ktime; int type; } __attribute__((packed)) output_data; char error_data[] = "Error: failed to output\n"; int err; output_data.type = type; output_data.ktime = ktime_get_ns(); err = perf_event_output(ctx, &channel, get_smp_processor_id(), &output_data, sizeof(output_data)); if (err) trace_printk(error_data, sizeof(error_data)); return 0; } SEC("func_begin=sys_nanosleep") int func_begin(void *ctx) {return func(ctx, 1);} SEC("func_end=sys_nanosleep%return") int func_end(void *ctx) { return func(ctx, 2);} char _license[] SEC("license") = "GPL"; int _version SEC("version") = LINUX_VERSION_CODE; /************************* END ***************************/ # ./perf record -e bpf-output/no-inherit,name=evt/ \ -e ./test_bpf_output_2.c/map:channel.event=evt/ \ usleep 100000 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ] # ./perf script usleep 14942 92503.198504: evt: ffffffff810e0ba1 sys_nanosleep (/lib/modules/4.3.0.... usleep 14942 92503.298562: evt: ffffffff810585e9 kretprobe_trampoline_holder (/lib.... # ./perf data convert --to-ctf ./out.ctf [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ] [ perf data convert: Converted and wrote 0.000 MB (2 samples) ] # babeltrace ./out.ctf [01:41:43.198504134] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] } [01:41:43.298562257] (+0.100058123) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] } # cat ./test_bpf_output_2.py from babeltrace import TraceCollection tc = TraceCollection() tc.add_trace('./out.ctf', 'ctf') d = {1:[], 2:[]} for event in tc.events: if not event.name.startswith('evt'): continue raw_data = event['raw_data'] (time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2]) d[type].append(time) print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1]))))); # python3 ./test_bpf_output_2.py [100056879] Committer note: Make sure you have python3-devel installed, not python-devel, which may be for python2, which will lead to some "PyInstance_Type" errors. Also make sure that you use the right libbabeltrace, because it is shipped in Fedora, for instance, but an older version. To build libbabeltrace's python binding one also needs to use: ./configure --enable-python-bindings And then set PYTHONPATH=/usr/local/lib64/python3.4/site-packages/. Signed-off-by: Wang Nan <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zefan Li <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf stat: Check existence of frontend/backed stalled cyclesAndi Kleen1-2/+20
Only put the frontend/backend stalled cycles into the default perf stat events when the CPU actually supports them. This avoids empty columns with --metric-only on newer Intel CPUs. Committer note: Before: $ perf stat ls Performance counter stats for 'ls': 1.080893 task-clock (msec) # 0.619 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 97 page-faults # 0.090 M/sec 3,327,741 cycles # 3.079 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 1,609,544 instructions # 0.48 insn per cycle 319,117 branches # 295.235 M/sec 12,246 branch-misses # 3.84% of all branches 0.001746508 seconds time elapsed $ After: $ perf stat ls Performance counter stats for 'ls': 0.693948 task-clock (msec) # 0.662 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 95 page-faults # 0.137 M/sec 1,792,509 cycles # 2.583 GHz 1,599,047 instructions # 0.89 insn per cycle 316,328 branches # 455.838 M/sec 12,453 branch-misses # 3.94% of all branches 0.001048987 seconds time elapsed $ Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-03perf tools: Fix locale handling in pmu parsingJiri Olsa1-0/+13
Ingo reported regression on display format of big numbers, which is missing separators (in default perf stat output). triton:~/tip> perf stat -a sleep 1 ... 127008602 cycles # 0.011 GHz 279538533 stalled-cycles-frontend # 220.09% frontend cycles idle 119213269 instructions # 0.94 insn per cycle This is caused by recent change: perf stat: Check existence of frontend/backed stalled cycles that added call to pmu_have_event, that subsequently calls perf_pmu__parse_scale, which has a bug in locale handling. The lc string returned from setlocale, that we use to store old locale value, may be allocated in static storage. Getting a dynamic copy to make it survive another setlocale call. $ perf stat ls ... 2,360,602 cycles # 3.080 GHz 2,703,090 instructions # 1.15 insn per cycle 546,031 branches # 712.511 M/sec Committer note: Since the patch introducing the regression didn't made to perf/core, move it to just before where the regression was introduced, so that we don't break bisection for this feature. Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-03-02virtio_ring: Support DMA APIsAndy Lutomirski1-0/+17
virtio_ring currently sends the device (usually a hypervisor) physical addresses of its I/O buffers. This is okay when DMA addresses and physical addresses are the same thing, but this isn't always the case. For example, this never works on Xen guests, and it is likely to fail if a physical "virtio" device ever ends up behind an IOMMU or swiotlb. The immediate use case for me is to enable virtio on Xen guests. For that to work, we need vring to support DMA address translation as well as a corresponding change to virtio_pci or to another driver. Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2016-03-02selftests/powerpc: Test FPU and VMX regs in signal ucontextCyril Bur4-1/+296
Load up the non volatile FPU and VMX regs and ensure that they are the expected value in a signal handler Signed-off-by: Cyril Bur <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-02selftests/powerpc: Test preservation of FPU and VMX regs across preemptionCyril Bur6-3/+310
Loop in assembly checking the registers with many threads. Signed-off-by: Cyril Bur <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-02selftests/powerpc: Test the preservation of FPU and VMX regs across syscallCyril Bur8-1/+625
Test that the non volatile floating point and Altivec registers get correctly preserved across the fork() syscall. fork() works nicely for this purpose, the registers should be the same for both parent and child Signed-off-by: Cyril Bur <[email protected]> [mpe: Add include guards to basic_asm.h, minor formatting] Signed-off-by: Michael Ellerman <[email protected]>
2016-03-02selftests/powerpc: Remove -flto from common CFLAGSSuraj Jitindar Singh1-1/+1
LTO can cause GCC to inline some functions which have attributes set. The act of inlining the functions can lead to GCC forgetting about the attributes which leads to incorrect tests. Notable example being: __attribute__((__target__("no-vsx"))) LTO can also interact strangely with custom assembly functions and cause tests to intermittently fail. Both these cases are hard to detect and require manual inspection of binaries which is unlikely to happen for all tests. Furthermore, LTO optimisations are not necessary for selftests and correctness is paramount and as such it is best to disable LTO. LTO can be enabled on a per test basis. A pseries_le_defconfig kernel on a POWER8 was used to determine that the same subset of selftests pass and fail with and without -flto in the common Makefile. Signed-off-by: Suraj Jitindar Singh <[email protected]> Reviewed-by: Cyril Bur <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-02selftests/powerpc: Fix out of bounds access in TM signal testMichael Ellerman1-1/+1
Gcc helpfully points out that we're accessing past the end of the gprs array: tm-signal-msr-resv.c: In function 'signal_usr1': tm-signal-msr-resv.c:43:37: error: array subscript is above array bounds [-Werror=array-bounds] ucp->uc_mcontext.regs->gpr[PT_MSR] |= (7ULL); We haven't noticed previously because -flto was hiding it somehow. The code is confused, PT_MSR isn't a gpr, instead it's in uc_regs->gregs, so fix it. Signed-off-by: Michael Ellerman <[email protected]>
2016-03-01Merge 4.5-rc6 into char-misc-nextGreg Kroah-Hartman4-16/+98
We want the fixes in here, and others are sending us pull requests based on this kernel tree. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-02-29tools lib traceevent: Split pevent_print_event() into specific functionality ↵Steven Rostedt2-32/+117
functions Currently there's a single function that is used to display a record's data in human readable format. That's pevent_print_event(). Unfortunately, this gives little room for adding other output within the line without updating that function call. I've decided to split that function into 3 parts. pevent_print_event_task() which prints the task comm, pid and the CPU pevent_print_event_time() which outputs the record's timestamp pevent_print_event_data() which outputs the rest of the event data. pevent_print_event() now simply calls these three functions. To save time from doing the search for event from the record's type, I created a new helper function called pevent_find_event_by_record(), which returns the record's event, and this event has to be passed to the above functions. Signed-off-by: Steven Rostedt <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-29perf trace: Check and discard not only 'nr' but also '__syscall_nr'Taeung Song1-2/+6
Format fields of a syscall have the first variable '__syscall_nr' or 'nr' that mean the syscall number. But it isn't relevant here so drop it. 'nr' among fields of syscall was renamed '__syscall_nr'. So add exception handling to drop '__syscall_nr' and modify the comment for this excpetion handling. Reported-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Taeung Song <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-29perf tools: Fix python extension buildJiri Olsa1-0/+4
The util/python-ext-sources file contains source files required to build the python extension relative to $(srctree)/tools/perf, Such a file path $(FILE).c is handed over to the python extension build system, which builds the final object in the $(PYTHON_EXTBUILD)/tmp/$(FILE).o path. After the build is done all files from $(PYTHON_EXTBUILD)lib/ are carried as the result binaries. Above system fails when we add source file relative to ../lib, which we do for: ../lib/bitmap.c ../lib/find_bit.c ../lib/hweight.c ../lib/rbtree.c All above objects will be built like: $(PYTHON_EXTBUILD)/tmp/../lib/bitmap.c $(PYTHON_EXTBUILD)/tmp/../lib/find_bit.c $(PYTHON_EXTBUILD)/tmp/../lib/hweight.c $(PYTHON_EXTBUILD)/tmp/../lib/rbtree.c which accidentally happens to be final library path: $(PYTHON_EXTBUILD)/lib/ Changing setup.py to pass full paths of source files to Extension build class and thus keep all built objects under $(PYTHON_EXTBUILD)tmp directory. Reported-by: Jeff Bastian <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Josh Boyer <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] # v4.2+ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-29tools/lib/lockdep: Fix link creation warningIngo Molnar1-1/+1
This warning triggers if the .so library has already been linked: triton:~/tip/tools/lib/lockdep> make CC common.o CC lockdep.o CC rbtree.o LD liblockdep-in.o LD liblockdep.a ln: failed to create symbolic link ‘liblockdep.so’: File exists LD liblockdep.so.4.5.0-rc6 Overwrite the link. Cc: Alfredo Alvarez Fernandez <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29tools/lib/lockdep: Add tests for AA and ABBA lockingAlfredo Alvarez Fernandez3-4/+63
Add test for AA and 2 threaded ABBA locking. Rename AA.c to ABA.c since it was implementing an ABA instead of a pure AA. Now both cases are covered. The expected output for AA.c is that the process blocks and lockdep reports a deadlock. ABBA_2threads.c differs from ABBA.c in that lockdep keeps separate chains of held locks per task. This can lead to different behaviour regarding lock detection. The expected output for this test is that the process blocks and lockdep reports a circular locking dependency. These tests found a lockdep bug - fixed by the next commit. Signed-off-by: Alfredo Alvarez Fernandez <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/1455864533-7536-3-git-send-email-alfredoalvarezernandez@gmail.com Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29tools/lib/lockdep: Add userspace version of READ_ONCE()Alfredo Alvarez Fernandez1-0/+1
This was added to the kernel code in <1658d35ead5d> ("list: Use READ_ONCE() when testing for empty lists"). There's nothing special we need to do about it in userspace. Signed-off-by: Alfredo Alvarez Fernandez <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/1455864533-7536-2-git-send-email-alfredoalvarezernandez@gmail.com Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29tools/lib/lockdep: Fix the build on recent kernelsIngo Molnar1-0/+6
The following upstream commit: 4a389810bc3c ("kernel/locking/lockdep.c: convert hash tables to hlists") broke the tools/lib/lockdep build. Add trivial RCU wrappers to fix it. These wrappers should probably be moved into their own header file. Cc: Andrew Morton <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Krinkin <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29Merge tag 'v4.5-rc6' into locking/core, to pick up fixesIngo Molnar9-42/+161
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29Merge tag 'v4.5-rc6' into perf/core, to pick up fixesIngo Molnar4-16/+98
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29objtool: Add tool to perform compile-time stack metadata validationJosh Poimboeuf23-6/+5173
This adds a host tool named objtool which has a "check" subcommand which analyzes .o files to ensure the validity of stack metadata. It enforces a set of rules on asm code and C inline assembly code so that stack traces can be reliable. For each function, it recursively follows all possible code paths and validates the correct frame pointer state at each instruction. It also follows code paths involving kernel special sections, like .altinstructions, __jump_table, and __ex_table, which can add alternative execution paths to a given instruction (or set of instructions). Similarly, it knows how to follow switch statements, for which gcc sometimes uses jump tables. Here are some of the benefits of validating stack metadata: a) More reliable stack traces for frame pointer enabled kernels Frame pointers are used for debugging purposes. They allow runtime code and debug tools to be able to walk the stack to determine the chain of function call sites that led to the currently executing code. For some architectures, frame pointers are enabled by CONFIG_FRAME_POINTER. For some other architectures they may be required by the ABI (sometimes referred to as "backchain pointers"). For C code, gcc automatically generates instructions for setting up frame pointers when the -fno-omit-frame-pointer option is used. But for asm code, the frame setup instructions have to be written by hand, which most people don't do. So the end result is that CONFIG_FRAME_POINTER is honored for C code but not for most asm code. For stack traces based on frame pointers to be reliable, all functions which call other functions must first create a stack frame and update the frame pointer. If a first function doesn't properly create a stack frame before calling a second function, the *caller* of the first function will be skipped on the stack trace. For example, consider the following example backtrace with frame pointers enabled: [<ffffffff81812584>] dump_stack+0x4b/0x63 [<ffffffff812d6dc2>] cmdline_proc_show+0x12/0x30 [<ffffffff8127f568>] seq_read+0x108/0x3e0 [<ffffffff812cce62>] proc_reg_read+0x42/0x70 [<ffffffff81256197>] __vfs_read+0x37/0x100 [<ffffffff81256b16>] vfs_read+0x86/0x130 [<ffffffff81257898>] SyS_read+0x58/0xd0 [<ffffffff8181c1f2>] entry_SYSCALL_64_fastpath+0x12/0x76 It correctly shows that the caller of cmdline_proc_show() is seq_read(). If we remove the frame pointer logic from cmdline_proc_show() by replacing the frame pointer related instructions with nops, here's what it looks like instead: [<ffffffff81812584>] dump_stack+0x4b/0x63 [<ffffffff812d6dc2>] cmdline_proc_show+0x12/0x30 [<ffffffff812cce62>] proc_reg_read+0x42/0x70 [<ffffffff81256197>] __vfs_read+0x37/0x100 [<ffffffff81256b16>] vfs_read+0x86/0x130 [<ffffffff81257898>] SyS_read+0x58/0xd0 [<ffffffff8181c1f2>] entry_SYSCALL_64_fastpath+0x12/0x76 Notice that cmdline_proc_show()'s caller, seq_read(), has been skipped. Instead the stack trace seems to show that cmdline_proc_show() was called by proc_reg_read(). The benefit of "objtool check" here is that because it ensures that *all* functions honor CONFIG_FRAME_POINTER, no functions will ever[*] be skipped on a stack trace. [*] unless an interrupt or exception has occurred at the very beginning of a function before the stack frame has been created, or at the very end of the function after the stack frame has been destroyed. This is an inherent limitation of frame pointers. b) 100% reliable stack traces for DWARF enabled kernels This is not yet implemented. For more details about what is planned, see tools/objtool/Documentation/stack-validation.txt. c) Higher live patching compatibility rate This is not yet implemented. For more details about what is planned, see tools/objtool/Documentation/stack-validation.txt. To achieve the validation, "objtool check" enforces the following rules: 1. Each callable function must be annotated as such with the ELF function type. In asm code, this is typically done using the ENTRY/ENDPROC macros. If objtool finds a return instruction outside of a function, it flags an error since that usually indicates callable code which should be annotated accordingly. This rule is needed so that objtool can properly identify each callable function in order to analyze its stack metadata. 2. Conversely, each section of code which is *not* callable should *not* be annotated as an ELF function. The ENDPROC macro shouldn't be used in this case. This rule is needed so that objtool can ignore non-callable code. Such code doesn't have to follow any of the other rules. 3. Each callable function which calls another function must have the correct frame pointer logic, if required by CONFIG_FRAME_POINTER or the architecture's back chain rules. This can by done in asm code with the FRAME_BEGIN/FRAME_END macros. This rule ensures that frame pointer based stack traces will work as designed. If function A doesn't create a stack frame before calling function B, the _caller_ of function A will be skipped on the stack trace. 4. Dynamic jumps and jumps to undefined symbols are only allowed if: a) the jump is part of a switch statement; or b) the jump matches sibling call semantics and the frame pointer has the same value it had on function entry. This rule is needed so that objtool can reliably analyze all of a function's code paths. If a function jumps to code in another file, and it's not a sibling call, objtool has no way to follow the jump because it only analyzes a single file at a time. 5. A callable function may not execute kernel entry/exit instructions. The only code which needs such instructions is kernel entry code, which shouldn't be be in callable functions anyway. This rule is just a sanity check to ensure that callable functions return normally. It currently only supports x86_64. I tried to make the code generic so that support for other architectures can hopefully be plugged in relatively easily. On my Lenovo laptop with a i7-4810MQ 4-core/8-thread CPU, building the kernel with objtool checking every .o file adds about three seconds of total build time. It hasn't been optimized for performance yet, so there are probably some opportunities for better build performance. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Bernd Petrovitsch <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Chris J Arges <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Marek <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Pedro Alves <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/f3efb173de43bd067b060de73f856567c0fa1174.1456719558.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2016-02-26perf trace: Print content of bpf-output eventWang Nan1-1/+34
With this patch the contend of BPF output event is printed by 'perf trace'. For example: # ./perf trace -a --ev bpf-output/no-inherit,name=evt/ \ --ev ./test_bpf_trace.c/map:channel.event=evt/ \ usleep 100000 ... 1.787 ( 0.004 ms): usleep/3832 nanosleep(rqtp: 0x7ffc78b18980 ) ... 1.787 ( ): evt:Raise a BPF event!..) 1.788 ( ): perf_bpf_probe:func_begin:(ffffffff810e97d0)) ... 101.866 (87.038 ms): gmain/1654 poll(ufds: 0x7f57a80008c0, nfds: 2, timeout_msecs: 1000 ) ... 101.866 ( ): evt:Raise a BPF event!..) 101.867 ( ): perf_bpf_probe:func_end:(ffffffff810e97d0 <- ffffffff81796173)) 101.869 (100.087 ms): usleep/3832 ... [continued]: nanosleep()) = 0 ... (There is an extra ')' at the end of several lines. However, it is another problem, unrelated to this commit.) Where test_bpf_trace.c is: /************************ BEGIN **************************/ #include <uapi/linux/bpf.h> struct bpf_map_def { unsigned int type; unsigned int key_size; unsigned int value_size; unsigned int max_entries; }; #define SEC(NAME) __attribute__((section(NAME), used)) static u64 (*ktime_get_ns)(void) = (void *)BPF_FUNC_ktime_get_ns; static int (*trace_printk)(const char *fmt, int fmt_size, ...) = (void *)BPF_FUNC_trace_printk; static int (*get_smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id; static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) = (void *)BPF_FUNC_perf_event_output; struct bpf_map_def SEC("maps") channel = { .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY, .key_size = sizeof(int), .value_size = sizeof(u32), .max_entries = __NR_CPUS__, }; static inline int __attribute__((always_inline)) func(void *ctx, int type) { char output_str[] = "Raise a BPF event!"; char err_str[] = "BAD %d\n"; int err; err = perf_event_output(ctx, &channel, get_smp_processor_id(), &output_str, sizeof(output_str)); if (err) trace_printk(err_str, sizeof(err_str), err); return 1; } SEC("func_begin=sys_nanosleep") int func_begin(void *ctx) {return func(ctx, 1);} SEC("func_end=sys_nanosleep%return") int func_end(void *ctx) { return func(ctx, 2);} char _license[] SEC("license") = "GPL"; int _version SEC("version") = LINUX_VERSION_CODE; /************************* END ***************************/ Signed-off-by: Wang Nan <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf trace: Call bpf__apply_obj_config in 'perf trace'Wang Nan1-0/+11
Without this patch BPF map configuration is not applied. Command like this: # ./perf trace --ev bpf-output/no-inherit,name=evt/ \ --ev ./test_bpf_trace.c/map:channel.event=evt/ \ usleep 100000 Load BPF files without error, but since map:channel.event=evt is not applied, bpf-output event not work. This patch allows 'perf trace' load and run BPF scripts. Signed-off-by: Wang Nan <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf tools: Only set filter for tracepoints eventsWang Nan1-0/+3
perf_evlist__set_filter() tries to set filter to every evsel linked in the evlist. However, since filters can only be applied to tracepoints, checking type of evsel before calling perf_evsel__set_filter() would be better. Signed-off-by: Wang Nan <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf config: Bring perf_default_config to the very beginning at main()Wang Nan10-21/+15
Before this patch each subcommand calls perf_config() by themself, reading the default configuration together with subcommand specific options. If a subcommand doesn't have it own options, it needs to call 'perf_config(perf_default_config, NULL)' to ensure .perfconfig is loaded. This patch brings perf_config(perf_default_config, NULL) to the very start of main(), so subcommands don't need to do it. After this patch, 'llvm.clang-path' works for 'perf trace'. Signed-off-by: Wang Nan <[email protected]> Suggested-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Zefan <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf report: Update column width of dynamic entriesNamhyung Kim2-3/+16
The column width of dynamic entries is updated when comparing hist entries. However some unique entries can miss the chance to update. So move the update to output resort stage to make sure every entry will get called before display. To do that, abuse ->sort callback to update the width when the third argument is NULL. When resorting entries in normal path, it never be NULL so it should be fine IMHO. Before: # Overhead ptr / bytes_req / gfp_flags # .............. .......................................... # 37.50% 0xffff8803f7669400 37.50% 448 37.50% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 10.42% 0xffff8803f766be00 8.33% 96 8.33% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 2.08% 512 2.08% GFP_KERNEL|GFP_NOWARN|GFP_REPEAT|GFP <-- here After: # Overhead ptr / bytes_req / gfp_flags # .............. ..................................................... # 37.50% 0xffff8803f7669400 37.50% 448 37.50% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 10.42% 0xffff8803f766be00 8.33% 96 8.33% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 2.08% 512 2.08% GFP_KERNEL|GFP_NOWARN|GFP_REPEAT|GFP_NOMEMALLOC Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf hists: Fix dynamic entry display in hierarchyNamhyung Kim2-1/+4
When dynamic sort key is used it might not show pretty printed output. This is because the trace output was not set only for the first dynamic sort key. During hierarchy_insert_entry() it missed to pass the trace_output to dynamic entries. Also even if it did, only first entry will have it. Subsequent entries might set it during collapsing stage but it's not guaranteed. Before: $ perf report --hierarchy --stdio -s ptr,bytes_req,gfp_flags -g none # # Overhead ptr / bytes_req / gfp_flags # .............. .......................................... # 37.50% 0xffff8803f7669400 37.50% 448 37.50% 66080 10.42% 0xffff8803f766be00 8.33% 96 8.33% 66080 2.08% 512 2.08% 67280 After: # # Overhead ptr / bytes_req / gfp_flags # .............. .......................................... # 37.50% 0xffff8803f7669400 37.50% 448 37.50% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 10.42% 0xffff8803f766be00 8.33% 96 8.33% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 2.08% 512 2.08% GFP_KERNEL|GFP_NOWARN|GFP_REPEAT|GFP Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf report: Left align dynamic entries in hierarchyNamhyung Kim2-11/+33
The dynamic entries are right-aligned unlike other entries since it usually has numeric value. But for the hierarchy mode, left alignment is more appropriate IMHO. Also trim spaces on the left so that we can easily identify the hierarchy. Before: $ perf report --hierarchy -i perf.data.kmem -s gfp_flags,ptr,bytes_req --stdio -g none ... # # Overhead gfp_flags / ptr / bytes_req # .............. ................................................................................................. # 91.67% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 37.50% 0xffff8803f7669400 37.50% 448 8.33% 0xffff8803f766be00 8.33% 96 4.17% 0xffff8800d156dc00 4.17% 704 After: # Overhead gfp_flags / ptr / bytes_req # .............. .................................... # 91.67% GFP_ATOMIC|GFP_NOWARN|GFP_NOMEMALLOC 37.50% 0xffff8803f7669400 37.50% 448 8.33% 0xffff8803f766be00 8.33% 96 4.17% 0xffff8800d156dc00 4.17% 704 Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf report: Fix indentation of dynamic entries in hierarchyNamhyung Kim4-6/+26
When dynamic entries are used in the hierarchy mode with multiple events, the output might not be aligned properly. In the hierarchy mode, the each sort column is indented using total number of sort keys. So it keeps track of number of sort keys when adding them. However a dynamic sort key can be added more than once when multiple events have same field names. This results in unnecessarily long indentation in the output. For example perf kmem records following events: $ perf evlist --trace-fields -i perf.data.kmem kmem:kmalloc: trace_fields: call_site,ptr,bytes_req,bytes_alloc,gfp_flags kmem:kmalloc_node: trace_fields: call_site,ptr,bytes_req,bytes_alloc,gfp_flags,node kmem:kfree: trace_fields: call_site,ptr kmem:kmem_cache_alloc: trace_fields: call_site,ptr,bytes_req,bytes_alloc,gfp_flags kmem:kmem_cache_alloc_node: trace_fields: call_site,ptr,bytes_req,bytes_alloc,gfp_flags,node kmem:kmem_cache_free: trace_fields: call_site,ptr kmem:mm_page_alloc: trace_fields: page,order,gfp_flags,migratetype kmem:mm_page_free: trace_fields: page,order As you can see, many field names shared between kmem events. So adding 'ptr' dynamic sort key alone will set nr_sort_keys to 6. And this adds many unnecessary spaces between columns. Before: $ perf report -i perf.data.kmem --hierarchy -s ptr -g none --stdio ... # Overhead ptr # ....................... ................................... # 99.89% 0xffff8803ffb79720 0.06% 0xffff8803d228a000 0.03% 0xffff8803f7678f00 0.00% 0xffff880401dc5280 0.00% 0xffff880406172380 0.00% 0xffff8803ffac3a00 0.00% 0xffff8803ffac1600 After: # Overhead ptr # ........ .................... # 99.89% 0xffff8803ffb79720 0.06% 0xffff8803d228a000 0.03% 0xffff8803f7678f00 0.00% 0xffff880401dc5280 0.00% 0xffff880406172380 0.00% 0xffff8803ffac3a00 0.00% 0xffff8803ffac1600 Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf hists: Fix comparing of dynamic entriesNamhyung Kim1-0/+8
When hist_entry__cmp() and hist_entry__collapse() are called, they should check if the dynamic entry is comparing matching hists only. Otherwise it might access different hists resulting in incorrect output. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-02-26perf report: Show message for percent limit on gtkNamhyung Kim1-0/+11
Like the stdio, it should show messages about omitted hierarchy entries. Please refer the previous commit for more details. Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>