Age | Commit message | Author | Files | Lines
2015-10-27 | perf tools: Search for more options when passing args to -h | Arnaldo Carvalho de Melo | 1 | -1/+14
Recently 'perf <tool> -h' was made aware of arguments and would show just the help for the arguments specified, but that required a strict form, i.e.:

  $ perf -h --tui

worked, but:

  $ perf -h tui

didn't.

Make it support both cases and also look at the option help when neither matches, so that the following examples work:

  $ perf report -h interface

   Usage: perf report [<options>]

      --gtk      Use the GTK2 interface
      --stdio    Use the stdio interface
      --tui      Use the TUI interface

  $ perf report -h stack

   Usage: perf report [<options>]

      -g, --call-graph <print_type,threshold[,print_limit],order,sort_key[,branch]>
                  Display call graph (stack chain/backtrace):

                  print_type:  call graph printing style (graph|flat|fractal|none)
                  threshold:   minimum call graph inclusion threshold (<percent>)
                  print_limit: maximum number of call graph entry (<number>)
                  order:       call graph order (caller|callee)
                  sort_key:    call graph sort key (function|address)
                  branch:      include last branch info to call graph (branch)

                  Default: graph,0.5,caller,function
      --max-stack <n>
                  Set the maximum stack depth when parsing the callchain, anything beyond the specified depth will be ignored. Default: 127
  $

Suggested-by: Ingo Molnar <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Chandler Carruth <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
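The lookup order described in this commit message (strict `--name` form, then the bare name, then the help text) could be sketched in C roughly as follows. This is an illustration only, not perf's parse-options code: `struct help_opt` and `help_opt_matches()` are hypothetical names, short options are ignored, and the help-text search here is case-sensitive, which may differ from what perf does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified option record; perf's real option struct has more fields. */
struct help_opt {
	const char *long_name;	/* e.g. "tui" */
	const char *help;	/* e.g. "Use the TUI interface" */
};

/*
 * Return true if the word 'arg' given after -h selects option 'o':
 *   1. "--name" must match the long name exactly (the strict form),
 *   2. a bare "name" may match the long name,
 *   3. otherwise 'arg' may appear anywhere in the help text.
 */
static bool help_opt_matches(const struct help_opt *o, const char *arg)
{
	assert(o != NULL && arg != NULL);

	if (strncmp(arg, "--", 2) == 0)
		return strcmp(arg + 2, o->long_name) == 0;
	if (strcmp(arg, o->long_name) == 0)
		return true;
	return strstr(o->help, arg) != NULL;
}
```

With an entry `{ "tui", "Use the TUI interface" }`, the words `--tui`, `tui` and `interface` would all select it under this sketch, while `stack` would not.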
2015-10-27 | perf stat: Cache aggregated map entries in extra cpumap | Jiri Olsa | 1 | -4/+55
Currently any time we need to access socket or core id for given cpu, we access the sysfs topology file. Adding a cpus_aggr_map cpu_map to cache those entries. Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
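The caching idea can be illustrated with a small C sketch. Everything here is hypothetical: `read_socket_id_slow()` merely stands in for the sysfs topology read, the fixed-size array stands in for the extra cpu_map, and the 4-CPUs-per-socket layout is invented for the example.

```c
#include <assert.h>

#define MAX_CPUS 8

/* Counts how often the (pretend) expensive lookup runs. */
static int slow_lookups;

/* Hypothetical stand-in for reading the socket id from sysfs. */
static int read_socket_id_slow(int cpu)
{
	slow_lookups++;
	return cpu / 4;	/* assumption: 4 CPUs per socket */
}

/* Cache of socket ids; -1 means "not looked up yet". */
static int socket_cache[MAX_CPUS] = { -1, -1, -1, -1, -1, -1, -1, -1 };

/* Return the socket id for 'cpu', doing the slow lookup at most once per CPU. */
static int get_socket_id(int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);

	if (socket_cache[cpu] == -1)
		socket_cache[cpu] = read_socket_id_slow(cpu);
	return socket_cache[cpu];
}
```

Repeated calls for the same CPU then hit the cache, so `slow_lookups` grows only with the number of distinct CPUs queried.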
2015-10-27 | perf cpu_map: Add cpu_map__empty_new function | Jiri Olsa | 2 | -0/+18
Adding cpu_map__empty_new interface to create empty cpumap with given size. The cpumap entries are initialized with -1. It'll be used for caching cpu_map in following patches. Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
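A constructor of this shape might look like the sketch below. The struct is a trimmed-down, hypothetical stand-in (perf's real cpu_map also carries a reference count, among other things), and the `_sketch` names are invented to make clear this is not the actual function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down cpu map: a count plus a flexible array of ids. */
struct cpu_map_sketch {
	int nr;
	int map[];
};

/* Allocate a map with 'nr' slots, each set to -1 to mean "empty". */
static struct cpu_map_sketch *cpu_map_sketch__empty_new(int nr)
{
	struct cpu_map_sketch *cpus;

	assert(nr >= 0);

	cpus = malloc(sizeof(*cpus) + sizeof(int) * (size_t)nr);
	if (cpus != NULL) {
		cpus->nr = nr;
		for (int i = 0; i < nr; i++)
			cpus->map[i] = -1;
	}
	return cpus;	/* NULL on allocation failure; caller frees with free() */
}
```

Using -1 as the initial value lets a later cache lookup tell an unfilled slot apart from a valid id of 0.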
2015-10-27 | perf evsel: Move id_offset out of struct perf_evsel union member | Jiri Olsa | 1 | -1/+1
Because the 'perf stat record' patches will use the id_offset member together with the priv pointer. Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-27 | perf tools: Introduce usage_with_options_msg() | Namhyung Kim | 9 | -28/+62
Now that usage_with_options() sets up a pager before printing its message, output from a normal printf() or pr_err() will not be shown. usage_with_options_msg() can be used to print a help message before the usage strings.

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-26 | perf tools: Setup pager when printing usage and help | Namhyung Kim | 1 | -2/+13
It's annoying to read an error or help message when a command has many options, as in perf record, report or top. So set up the pager when printing a parser error or help message; this should be OK since no UI is enabled at parsing time. usage_with_options() already disables it by calling exit_browser() anyway.

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-26 | perf report: Rename to --show-cpu-utilization | Namhyung Kim | 3 | -2/+5
So that it can be more consistent with other --show-* options. The old name (--showcpuutilization) is provided only for compatibility. Signed-off-by: Namhyung Kim <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-26 | perf tools: Improve ambiguous option help message | Namhyung Kim | 1 | -9/+8
Currently if an option name is ambiguous it only prints the first two matched option names but no help. It would be better if it could show all possible names and help messages too.

Before:

  $ perf report --show
    Error: Ambiguous option: show (could be --show-total-period or --show-ref-call-graph)
   Usage: perf report [<options>]

After:

  $ perf report --show
    Error: Ambiguous option: show (could be --show-total-period or --show-ref-call-graph)
   Usage: perf report [<options>]

      -n, --show-nr-samples      Show a column with the number of samples
          --showcpuutilization   Show sample percentage for different cpu modes
      -I, --show-info            Display extended information about perf.data file
          --show-total-period    Show a column with the sum of periods
          --show-ref-call-graph  Show callgraph from reference event

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-25 | Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core | Ingo Molnar | 1 | -9/+81
Pull perf/core improvements from Arnaldo Carvalho de Melo:

New user-visible features:

  - Show ordered command line options when -h is used or when an unknown option is specified. (Arnaldo Carvalho de Melo)
  - If options are passed after -h, show just their descriptions, not all options. (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2015-10-23 | perf tools: Provide help for subset of options | Arnaldo Carvalho de Melo | 1 | -9/+33
Some tools have a lot of options, so providing a way to show help just for some of them may come in handy:

  $ perf report -h --tui

   Usage: perf report [<options>]

      --tui      Use the TUI interface

  $ perf report -h --tui --showcpuutilization -b -c

   Usage: perf report [<options>]

      -b, --branch-stack    use branch records for per branch histogram filling
      -c, --comms <comm[,comm...]>
                            only consider symbols in these comms
      --showcpuutilization  Show sample percentage for different cpu modes
      --tui                 Use the TUI interface
  $

Using it with perf bash completion is also handy, just make sure you source the needed file:

  $ . ~/git/linux/tools/perf/perf-completion.sh

Then press tab/tab after -- to see a list of options, put them after -h and only the options chosen will have their help presented:

  $ perf report -h --
  --asm-raw         --demangle-kernel     --group            --kallsyms       --pretty               --stdio
  --branch-history  --disassembler-style  --gtk              --max-stack      --showcpuutilization   --symbol-filter
  --branch-stack    --dsos                --header           --mem-mode       --show-info            --symbols
  --call-graph      --dump-raw-trace      --header-only      --modules        --show-nr-samples      --symfs
  --children        --exclude-other       --hide-unresolved  --objdump        --show-ref-call-graph  --threads
  --column-widths   --fields              --ignore-callees   --parent         --show-total-period    --tid
  --comms           --field-separator     --input            --percentage     --socket-filter        --tui
  --cpu             --force               --inverted         --percent-limit  --sort                 --verbose
  --demangle        --full-source-path    --itrace           --pid            --source               --vmlinux
  $ perf report -h --socket-filter

   Usage: perf report [<options>]

      --socket-filter <n>
                  only show processor socket that match with this filter

Suggested-by: Ingo Molnar <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Chandler Carruth <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-23 | perf tools: Show tool command line options ordered | Arnaldo Carvalho de Melo | 1 | -0/+48
When asking for a listing of the options, be it using -h or when an unknown option is passed, order it by one-letter options, then the ones having just long names. Suggested-by: Ingo Molnar <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Chandler Carruth <[email protected]> Cc: David Ahern <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
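One way to get the ordering described here (one-letter options first, long-only options after) is a `qsort()` comparator along the following lines. This is a sketch under assumptions: `struct opt_entry` is a hypothetical simplification, and the tie-breaking rules (by letter, then by long name) are a guess at a sensible order rather than a copy of perf's comparator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified option record. */
struct opt_entry {
	int short_name;		/* 0 when the option has no one-letter form */
	const char *long_name;
};

/* qsort() comparator: one-letter options first, then long-only ones by name. */
static int opt_entry_cmp(const void *va, const void *vb)
{
	const struct opt_entry *a = va, *b = vb;

	if (a->short_name && b->short_name)
		return a->short_name - b->short_name;
	if (a->short_name)
		return -1;
	if (b->short_name)
		return 1;
	return strcmp(a->long_name, b->long_name);
}

/* Sort 'nr' options in place into help-listing order. */
static void sort_opt_entries(struct opt_entry *opts, size_t nr)
{
	assert(opts != NULL || nr == 0);
	qsort(opts, nr, sizeof(*opts), opt_entry_cmp);
}
```

Given `--tui`, `-n/--show-nr-samples`, `--gtk` and `-b/--branch-stack`, this yields `-b`, `-n`, `--gtk`, `--tui`.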
2015-10-23 | Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core | Ingo Molnar | 12 | -49/+151
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

  - The default for callchains is back to 'callee' when --children is not used. (Namhyung Kim)
  - Move the 'use_offset' option to the right place where the annotate code expects it to be to be able to properly handle it. (Namhyung Kim)
  - Don't die when an unknown 'annotate' option is found in the perf config file (usually ~/.perfconfig), just warn the user. (Arnaldo Carvalho de Melo)

Infrastructure changes:

  - Support %ps/%pS in libtraceevent. (Scott Wood)

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2015-10-22 | perf annotate: Don't die() when finding an invalid config option | Arnaldo Carvalho de Melo | 1 | -3/+3
The perf_config() infrastructure we inherited from git calls die() when the provided config callback returns -1, meaning some key in a config section is unexpected, that seems ok for a stdio based tool, but in --tui we end up messing up the output, so just tell the user about the error, wait for a keystroke and return 0, being more resilient and proceeding with what we managed to parse. That die() needs to die, tho :-) Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Ahern <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-22 | perf ui tui: Register the error callbacks before initializing the widgets | Arnaldo Carvalho de Melo | 1 | -4/+4
I.e. we want to tell the user about errors found during, for instance, the ui_browser initialization, so that a call to ui__warning() appears as a window waiting for a key to be pressed. Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Ahern <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-22 | perf annotate: Fix 'annotate.use_offset' config variable usage | Namhyung Kim | 1 | -1/+1
The annotate__configs should be sorted so that it can use bsearch(3). However, commit 0c4a5bcea460 ("perf annotate: Display total number of samples with --show-total-period") added a new config item at the end. As a result, the 'annotate.use_offset' config variable could not be found and perf terminated as shown below:

  $ perf report
    bad config file line 6 in ~/.perfconfig

Signed-off-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Martin Liška <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Taeung Song <[email protected]>
Fixes: 0c4a5bcea460 ("perf annotate: Display total number of samples with --show-total-period")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
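The sorted-table requirement behind this fix can be shown with a small `bsearch(3)` sketch. The table below is illustrative only: the key names are chosen to resemble annotate settings, but the real annotate__configs table has a different element type and may contain other entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative config table; bsearch() only works if it stays sorted by name. */
static const char *const config_names[] = {
	"hide_src_code",
	"jump_arrows",
	"show_nr_jumps",
	"show_total_period",	/* a new key must go here, not at the end */
	"use_offset",
};

/* bsearch() comparator: 'key' is a string, 'elem' points at a table slot. */
static int config_name_cmp(const void *key, const void *elem)
{
	return strcmp(key, *(const char *const *)elem);
}

/* Return the index of 'name' in the table, or -1 if it is not found. */
static int config_index(const char *name)
{
	const char *const *hit;
	size_t nr = sizeof(config_names) / sizeof(config_names[0]);

	assert(name != NULL);

	hit = bsearch(name, config_names, nr, sizeof(config_names[0]),
		      config_name_cmp);
	return hit ? (int)(hit - config_names) : -1;
}
```

If "show_total_period" were appended after "use_offset" instead, the array would no longer be sorted and a binary search could miss entries that are actually present, which matches the failure reported in the commit message.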
2015-10-22 | perf tools: Improve call graph documents and help messages | Namhyung Kim | 6 | -30/+62
The --call-graph option is complex so we should provide better guide for users. Also change help message to be consistent with config option names. Now perf top will show help like below: $ perf top --call-graph Error: option `call-graph' requires a value Usage: perf top [<options>] --call-graph <record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]> setup and enables call-graph (stack chain/backtrace): record_mode: call graph recording mode (fp|dwarf|lbr) record_size: if record_mode is 'dwarf', max size of stack recording (<bytes>) default: 8192 (bytes) print_type: call graph printing style (graph|flat|fractal|none) threshold: minimum call graph inclusion threshold (<percent>) print_limit: maximum number of call graph entry (<number>) order: call graph order (caller|callee) sort_key: call graph sort key (function|address) branch: include last branch info to call graph (branch) Default: fp,graph,0.5,caller,function Requested-by: Ingo Molnar <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Chandler Carruth <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-22 | perf tools: Defaults to 'caller' callchain order only if --children is enabled | Namhyung Kim | 5 | -1/+9
The caller callchain order is useful with --children option since it can show 'overview' style output, but other commands which don't use --children feature like 'perf script' or even 'perf report/top' without --children are better to keep callee order. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Brendan Gregg <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Chandler Carruth <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-22 | perf top: Support call-graph display options also | Namhyung Kim | 4 | -10/+62
Currently the 'perf top --call-graph' option is the same as in 'perf record'. But 'perf top' also needs to accept the display options of 'perf report'. To do that, change parse_callchain_report_opt() to allow record options too.

Now perf top can receive display options like below:

  $ perf top --call-graph
    Error: option `call-graph' requires a value

   Usage: perf top [<options>]

      --call-graph <mode[,dump_size],output_type,min_percent[,print_limit],call_order[,branch]>
                   setup and enables call-graph (stack chain/backtrace) recording: fp dwarf lbr, output_type (graph, flat, fractal, or none), min percent threshold, optional print limit, callchain order, key (function or address), add branches

  $ perf top --call-graph callee,graph,fp

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Chandler Carruth <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-22 | perf tools: Move callchain help messages to callchain.h | Namhyung Kim | 3 | -10/+20
These messages will be used by 'perf top' in the next patch. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Chandler Carruth <[email protected]> Cc: David Ahern <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-22 | tools lib traceevent: Support %ps/%pS | Scott Wood | 1 | -2/+2
Commits such as 65dd297ac25565 ("xfs: %pF is only for function pointers") caused a regression because pretty_print() didn't support %ps/%pS. The current %pf/%pF implementation in pretty_print() is what %ps/%pS is supposed to do, so use the same code for %ps/%pS. Addressing the incorrect %pf/%pF implementation is beyond the scope of this patch. Signed-off-by: Scott Wood <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: Dave Chinner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-22 | Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core | Ingo Molnar | 5 | -10/+24
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

  - Print branch filter state in verbose mode. (Andi Kleen)
  - Fix core dump caused by per-socket/core system-wide stat. (Kan Liang)
  - Update libtraceevent KVM plugin. (Paolo Bonzini)

Infrastructure changes:

  - Add fixdep to 'tools/build' .gitignore. (Yunlong Song)

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2015-10-21 | perf annotate: Add debug message for out of bounds sample | Arnaldo Carvalho de Melo | 1 | -1/+4
Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Ahern <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-21 | perf evsel: Print branch filter state with -vv | Andi Kleen | 1 | -0/+1
Add a missing field to the perf_event_attr debug output. Signed-off-by: Andi Kleen <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Print it between config2 and sample_regs_user (peterz)] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-20 | perf cpu_map: Fix core dump caused by per-socket/core system-wide stat | Kan Liang | 1 | -1/+1
Perf will core dump if --per-socket/core -a are applied for perf stat. The root cause is that cpu_map__build_map set refcnt of evlist's cpu_map to 1. It should set refcnt for the newly created cpu_map, not evlist's cpu_map. Here is the example: # perf stat -e cycles --per-socket -a sleep 1 Performance counter stats for 'system wide': S0 36 30,196,257 cycles S1 28 15,823,536 cycles 1.001126828 seconds time elapsed *** Error in `./perf': corrupted double-linked list: 0x00000000021f9090 *** ======= Backtrace: ========= /lib64/libc.so.6[0x3002e7bbe7] /lib64/libc.so.6[0x3002e7d2b5] ./perf(perf_evsel__delete+0x28)[0x485bdd] ./perf[0x4800e8] ./perf(perf_evlist__delete+0x5e)[0x482cd5] ./perf(cmd_stat+0xf25)[0x432328] ./perf[0x4768e0] ./perf[0x476ad6] ./perf[0x476b41] ./perf(main+0x1d0)[0x476db2] /lib64/libc.so.6(__libc_start_main+0xf5)[0x3002e21b45] ./perf[0x4202c5] Signed-off-by: Kan Liang <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-20 | tools lib traceevent: update KVM plugin | Paolo Bonzini | 1 | -8/+17
The format of the role word has changed through the years and the plugin was never updated; some VMX exit reasons were missing too. Signed-off-by: Paolo Bonzini <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-20 | perf build: Add fixdep to .gitignore | Yunlong Song | 1 | -0/+1
Commit 7c422f5572667fef0db38d2046ecce69dcf0afc8 ("tools build: Build fixdep helper from perf and basic libs") dynamically creates fixdep during the perf building. Add it to .gitignore. Signed-off-by: Yunlong Song <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Fixes: 7c422f557266 ("tools build: Build fixdep helper from perf and basic libs") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-20 | perf record: Add ability to sample call branches | Stephane Eranian | 2 | -0/+2
This patch adds a new branch type sampling filter to perf record. It is named 'call' and maps to PERF_SAMPLE_BRANCH_CALL. It samples direct call branches only, unlike 'any_call' which includes indirect calls as well.

  $ perf record -j call -e cycles .....

The man page is updated accordingly.

Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20 | perf/powerpc: Add support for PERF_SAMPLE_BRANCH_CALL | Stephane Eranian | 1 | -0/+3
The patch catches PERF_SAMPLE_BRANCH_CALL because it is not clear whether this is actually supported by the hardware. Signed-off-by: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20 | perf/x86: Add support for PERF_SAMPLE_BRANCH_CALL | Stephane Eranian | 1 | -0/+4
This patch enables support for PERF_SAMPLE_BRANCH_CALL on Intel x86 processors. When the processor supports LBR filtering, the selection is done in hardware. Otherwise, the filter is applied by software. Note that we chose to include zero length calls because they also represent calls.

Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20 | perf: Add PERF_SAMPLE_BRANCH_CALL | Stephane Eranian | 1 | -0/+2
Add a new branch sample type to cover only call branches (function calls). The current ANY_CALL included direct, indirect calls and far jumps. We want to be able to differentiate indirect from direct calls. Therefore we introduce PERF_SAMPLE_BRANCH_CALL. The implementation is up to each architecture. Signed-off-by: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-20 | perf/x86: Fix time_shift in perf_event_mmap_page | Adrian Hunter | 2 | -2/+13
Commit: b20112edeadf ("perf/x86: Improve accuracy of perf/sched clock") allowed the time_shift value in perf_event_mmap_page to be as much as 32. Unfortunately the documented algorithms for using time_shift have it shifting an integer, whereas to work correctly with the value 32, the type must be u64. In the case of perf tools, Intel PT decodes correctly but the timestamps that are output (for example by perf script) have lost 32-bits of granularity so they look like they are not changing at all. Fix by limiting the shift to 31 and adjusting the multiplier accordingly. Also update the documentation of perf_event_mmap_page so that new code based on it will be more future-proof. Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Fixes: b20112edeadf ("perf/x86: Improve accuracy of perf/sched clock") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
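The arithmetic at issue can be sketched in C. The conversion below follows the cycles-to-time formula documented for perf_event_mmap_page, written so that every shift operates on a 64-bit value; `clamp_shift()` is a hypothetical helper illustrating the "limit the shift to 31 and adjust the multiplier" idea from the commit message, not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a cycle count to a time delta using 64-bit arithmetic throughout.
 * Shifting a 32-bit int by 32 would be undefined in C, which is why the
 * operands are widened before any shift.
 */
static uint64_t cyc_to_time(uint64_t cyc, uint32_t time_mult,
			    uint16_t time_shift, uint64_t time_offset)
{
	uint64_t quot, rem;

	assert(time_shift < 64);

	quot = cyc >> time_shift;
	rem = cyc & (((uint64_t)1 << time_shift) - 1);
	return time_offset + quot * time_mult +
	       ((rem * time_mult) >> time_shift);
}

/* Hypothetical helper: clamp a shift of 32 to 31, halving the multiplier. */
static void clamp_shift(uint32_t *time_mult, uint16_t *time_shift)
{
	if (*time_shift == 32) {
		*time_shift = 31;
		*time_mult >>= 1;
	}
}
```

Halving the multiplier while dropping the shift by one keeps the ratio time_mult / 2^time_shift the same, apart from the lost low bit of the multiplier, so existing consumers that shift a 32-bit integer keep working.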
2015-10-20 | Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core | Ingo Molnar | 22 | -514/+487
Pull perf/core improvements and fixes:

User visible changes:

  - 'perf bench mem' now prefaults unconditionally, no sense in providing modes where page faults are measured. (Ingo Molnar)
  - Harmonize -l/--nr_loops across 'perf bench'. (Ingo Molnar)
  - Various 'perf bench' consistency improvements. (Ingo Molnar)
  - Suppress libtraceevent warnings in non-verbose 'perf test' mode. (Namhyung Kim)
  - Move some tracepoint event test error messages to the verbose mode of 'perf test'. (Namhyung Kim)
  - Make 'perf help' usage message consistent with other tools. (Yunlong Song)

Build fixes:

  - Fix 'perf bench' build with gcc 4.4.7. (Arnaldo Carvalho de Melo)

Infrastructure changes:

  - 'perf stat' prep work for the 'perf stat scripting' patchkit. (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2015-10-19 | perf bench: Use named initializers in the trailer too | Arnaldo Carvalho de Melo | 1 | -2/+2
To avoid this splat with gcc 4.4.7: cc1: warnings being treated as errors bench/mem-functions.c:273: error: missing initializer bench/mem-functions.c:273: error: (near initialization for ‘memcpy_functions[4].desc’) bench/mem-functions.c:366: error: missing initializer bench/mem-functions.c:366: error: (near initialization for ‘memset_functions[4].desc’) Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
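The pattern involved is a table whose last entry is a sentinel. The sketch below uses a hypothetical `struct bench_fn` and table rather than the real ones in bench/mem-functions.c; it assumes the warning in question is the compiler's missing-field-initializers check, which does not fire for designated (named) initializers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry, loosely modelled on a benchmark function list. */
struct bench_fn {
	const char *name;
	const char *desc;
	int (*fn)(void);
};

static int run_default(void)
{
	return 0;
}

static const struct bench_fn bench_fns[] = {
	{ .name = "default", .desc = "Default implementation", .fn = run_default },
	/* Trailer: a named initializer here, instead of a positional { NULL, }. */
	{ .name = NULL, },
};

/* Count entries up to, but not including, the NULL-named trailer. */
static size_t bench_fn_count(void)
{
	size_t n = 0;

	while (bench_fns[n].name != NULL)
		n++;
	assert(n < sizeof(bench_fns) / sizeof(bench_fns[0]));
	return n;
}
```

Fields omitted from a designated initializer are still zero-initialized, so the trailer's `desc` and `fn` are NULL just as they would be with the positional form.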
2015-10-19 | perf script: Check output fields only for samples | Jiri Olsa | 1 | -1/+4
There's no need to check sampling output fields for events without perf_event_attr::sample_type field set. Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19 | perf cpu_map: Add data arg to cpu_map__build_map callback | Jiri Olsa | 5 | -15/+27
Adding data arg to cpu_map__build_map callback, so we could pass data along to the callback. It'll be needed in following patches to retrieve topology info from perf.data. Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19 | perf cpu_map: Make cpu_map__build_map global | Jiri Olsa | 2 | -2/+4
We'll need to call it from perf stat in the stat_script patchkit Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19 | perf stat: Add AGGR_UNSET mode | Jiri Olsa | 3 | -0/+7
Adding AGGR_UNSET mode, so we could distinguish unset aggr_mode in following patches. Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19 | perf stat: Rename perf_stat struct into perf_stat_evsel | Jiri Olsa | 3 | -8/+8
It's used as the perf_evsel::priv data, so the name suits better. Also we'll need the perf_stat name free for more generic struct. Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf help: Change 'usage' to 'Usage' for consistencyYunlong Song2-3/+3
Capitalize 'usage' to make it consistent with all the other occurrences of 'Usage' in the code, e.g., usage_builtin. Signed-off-by: Yunlong Song <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ramkumar Ramachandra <[email protected]> Cc: Sriram Raghunathan <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench: Run benchmarks, don't test themIngo Molnar1-4/+4
So right now we output this text:
  memcpy: Benchmark for memcpy() functions
  memset: Benchmark for memset() functions
  all: Test all memory access benchmarks
But the right verb to use with benchmarks is to 'run' them, not 'test' them. So change this (and all similar texts) to:
  memcpy: Benchmark for memcpy() functions
  memset: Benchmark for memset() functions
  all: Run all memory access benchmarks
Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench mem: Rename 'routine' to 'function'Ingo Molnar2-38/+38
So right now there's a somewhat inconsistent mess of the benchmarking code and options sometimes calling benchmarked functions 'functions', sometimes calling them 'routines'. Name them 'functions' consistently. Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Updated perf-bench man page, pointed out by David Ahern ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench: Harmonize all the -l/--nr_loops optionsIngo Molnar4-23/+23
We have three benchmarking subsystems that specify some sort of 'number of loops' parameter - but all of them do it inconsistently:
  numa: -l/--nr_loops
  sched messaging: -l/--loops
  mem memset/memcpy: -i/--iterations
Harmonize them to -l/--nr_loops by picking the numa variant - which is also the most likely one to have existing scripting which we don't want to break. Plus improve the parameter help texts to indicate the default value for the nr_loops variable to keep users from guessing ... Also propagate the naming to internal variables.
Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Let the harmonisation reach the perf-bench man page as well ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench mem: Reorganize the code a bitIngo Molnar1-19/+19
Reorder functions a bit, so that we synchronize the layout of the memcpy() and memset() portions of the code. This improves the code, especially once we add a strlcpy() variant as well. Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench mem: Improve user visible stringsIngo Molnar2-15/+20
- fix various typos in user visible output strings
- make the output consistent (wrt. capitalization and spelling)
- offer the list of routines to benchmark on '-r help'.
Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench mem: Fix 'length' vs. 'size' naming confusionIngo Molnar2-50/+50
So 'perf bench mem memcpy/memset' consistently uses 'len' and 'length' for buffer sizes - while it's really a memory buffer size. (strings have length.) Rename all affected variables. Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Update perf-bench man page ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench mem: Rename 'routine' to 'routine_str'Ingo Molnar1-6/+6
So bench/mem-functions.c has a 'routine' name for the routines parameter string, but a 'length_str' name for the length parameter string. We also have another entity named 'routine': 'struct routine'. This is inconsistent and confusing: rename 'routine' to 'routine_str'. Also fix typos in the --routine help text. Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench mem: Change 'cycle' to 'cycles'Ingo Molnar2-30/+30
So 'perf bench mem memset/memcpy' has a CPU cycles measurement method, but calls it 'cycle' (singular) throughout the code, which makes it harder to read. Rename all related functions, variables and options to a plural 'cycles' nomenclature. Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ s/--cycle/--cycles/g in perf-bench man page ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench: List output formatting options on 'perf bench -h'Ingo Molnar1-1/+1
So 'perf bench -h' is not very helpful when printing the help line about the output formatting options:
  -f, --format <default> Specify format style
There are two output format styles, 'default' and 'simple', so improve the help text to:
  -f, --format <default|simple> Specify the output formatting style
Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Removed leftovers from the mem-functions.c rename ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2015-10-19perf bench: Remove the prefaulting complication from 'perf bench mem mem*'Ingo Molnar2-112/+50
So 'perf bench mem memcpy/memset' has elaborate code to measure memcpy()/memset() performance both with freshly allocated buffers (which includes initial page fault overhead) and with preallocated buffers. But the thing is, the resulting bandwidth results are mostly meaningless, because page faults dominate so much of the cost. It might make sense to measure cache cold vs. cache hot performance, but the code does not do this. So remove this complication, and always prefault the ranges before using them. Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Remove --no-prefault, --only-prefault from docs, noticed by David Ahern ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
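The "always prefault before measuring" approach described above can be sketched roughly as below. This is a minimal illustration and not the code in bench/mem-functions.c: `timed_memcpy` is a hypothetical helper, and it uses the standard C `clock()` for timing, which is an assumption made for portability rather than what perf does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: returns the CPU time in seconds spent in nr_loops
 * memcpy() calls of 'size' bytes, or -1.0 if size is 0 or an allocation
 * fails. */
static double timed_memcpy(size_t size, int nr_loops)
{
	char *src, *dst;
	clock_t start, end;
	int i;

	assert(nr_loops > 0);

	if (size == 0)
		return -1.0;

	src = malloc(size);
	dst = malloc(size);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1.0;
	}

	/* Prefault: write to every page of both buffers first, so that the
	 * initial page faults happen outside the timed region. */
	memset(src, 0, size);
	memset(dst, 0, size);

	start = clock();
	for (i = 0; i < nr_loops; i++)
		memcpy(dst, src, size);
	end = clock();

	/* Note: a real benchmark would also have to keep the compiler from
	 * optimizing the copy loop away; this sketch does not attempt that. */
	free(src);
	free(dst);

	return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller could derive a bandwidth figure as `size * nr_loops / elapsed` when the elapsed time is non-zero. Because the buffers are touched before the clock starts, that figure reflects copy cost rather than page fault cost.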
2015-10-19perf bench: Rename 'mem-memcpy.c' => 'mem-functions.c'Ingo Molnar2-1/+1
So mem-memcpy.c started out as a simple memcpy() benchmark, then it grew memset() functionality and now I plan to add string copy benchmarks as well. This makes the file name a misnomer: rename it to the more generic mem-functions.c name. Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> Cc: Hitoshi Mitake <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ The "rename" was introducing __unused, wasn't removing the old file, and didn't update tools/perf/bench/Build, fix it ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>