path: root/tools/perf
Age | Commit message | Author | Files | Lines
2017-03-14 | perf sched timehist: Add --next option | Brendan Gregg | 2 | -5/+24
The --next option shows the next task for each context switch, providing more context for the sequence of scheduler events.

  $ perf sched timehist --next | head
  Samples do not have callchains.
        time  cpu  task name   waittime  schdelay  run time
                   [tid/pid]     (msec)    (msec)    (msec)
  ---------- --- ---------- --------- ------ -----
  374.793792 [0]  <idle>          0.000     0.000     0.000  next: rngd[1524]
  374.793801 [0]  rngd[1524]      0.000     0.000     0.009  next: swapper/0[0]
  374.794048 [7]  <idle>          0.000     0.000     0.000  next: yes[30884]
  374.794066 [7]  yes[30884]      0.000     0.000     0.018  next: swapper/7[0]
  374.794126 [2]  <idle>          0.000     0.000     0.000  next: rngd[1524]
  374.794140 [2]  rngd[1524]      0.325     0.006     0.013  next: swapper/2[0]
  374.794281 [3]  <idle>          0.000     0.000     0.000  next: perf[31070]

Signed-off-by: Brendan Gregg <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-14 | perf tools: Add 'cgroup_id' sort order keyword | Hari Bathini | 5 | -1/+59
This patch introduces a cgroup identifier entry field in perf report to identify or distinguish data of different cgroups. It uses the device number and inode number of the cgroup namespace, included in perf data with the new PERF_RECORD_NAMESPACES event, as the cgroup identifier.

With the assumption that each container is created with its own cgroup namespace, this allows assessment/analysis of multiple containers at once.

A simple test for this would be to clone a few processes passing SIGCHLD & CLONE_NEWCGROUP flags to each of them, execute a shell and run different workloads on each of those contexts, while running the perf record command with the --namespaces option.

Shown below is the output of perf report, sorted with cgroup identifier, on perf.data generated with the above test scenario, clearly indicating one context's considerable use of kernel memory in comparison with others:

  $ perf report -s cgroup_id,sample --stdio
  #
  # Total Lost Samples: 0
  #
  # Samples: 5K of event 'kmem:kmalloc'
  # Event count (approx.): 5965
  #
  # Overhead  cgroup id (dev/inode)       Samples
  # ........  .....................  ............
  #
      81.27%  3/0xeffffffb                   4848
      16.24%  3/0xf00000d0                    969
       1.16%  3/0xf00000ce                     69
       0.82%  3/0xf00000cf                     49
       0.50%  0/0x0                            30

While this is a start, there is further scope for improving this. For example, instead of the cgroup namespace's device and inode numbers, the dev and inode numbers of some or all namespaces may be used to distinguish which processes are running in a given container context. Also, scripts to map device and inode info to containers sound plausible for better tracing of containers.

Signed-off-by: Hari Bathini <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Aravinda Prasad <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sargun Dhillon <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-14 | perf script: Add script print support for namespace events | Hari Bathini | 2 | -0/+43
Introduce a new option to display events of type PERF_RECORD_NAMESPACES and update perf-script documentation accordingly.

Shown below is output (trimmed) of the perf script command with the newly introduced option, on perf.data generated with the perf record command using the --namespaces option.

  $ perf script --show-namespace-events
  swapper 0 [000] 0.000000: PERF_RECORD_NAMESPACES 1/1 - nr_namespaces: 7
      [0/net: 3/0xf000001c, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
       4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
  swapper 0 [000] 0.000000: PERF_RECORD_NAMESPACES 2/2 - nr_namespaces: 7
      [0/net: 3/0xf000001c, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
       4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]

Committer notes:

Testing it:

Investigating that double PERF_RECORD_NAMESPACES for the 19155 pid/tid...

It's more than that, there are two PERF_RECORD_COMM as well, and with zeroed timestamps, so probably a synthesizing artifact...

  # perf script --show-task --show-namespace
  <SNIP>
  perf 0 [000] 0.000000: PERF_RECORD_COMM: perf:19154/19154
  perf 0 [000] 0.000000: PERF_RECORD_FORK(19155:19155):(19154:19154)
  perf 0 [000] 0.000000: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
      [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
       4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
  perf 0 [000] 0.000000: PERF_RECORD_COMM: perf:19155/19155
  perf 0 [000] 0.000000: PERF_RECORD_COMM: perf:19155/19155
  perf 0 [000] 0.000000: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
      [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
       4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
  swapper 0 [000] 3110.881834: 1 cycles: ffffffffa7060bf6 native_write_msr (/lib/modules/4.11.0-rc1+/build/vmlinux)
  <SNIP>

Signed-off-by: Hari Bathini <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Aravinda Prasad <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sargun Dhillon <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-14 | perf record: Synthesize namespace events for current processes | Hari Bathini | 3 | -10/+119
Synthesize PERF_RECORD_NAMESPACES events for processes that were running prior to invocation of perf record. The data for this is taken from /proc/$PID/ns. These changes make way for analyzing events with regard to namespaces.

Committer notes:

Check if 'tool' is NULL in perf_event__synthesize_namespaces(), as in the test__mmap_thread_lookup case, i.e. 'perf test Lookup mmap thread'.

Testing it:

  # ps axH > /tmp/allthreads
  # perf record -a --namespaces usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.169 MB perf.data (8 samples) ]
  # perf report -D | grep PERF_RECORD_NAMESPACES | wc -l
  602
  # wc -l /tmp/allthreads
  601 /tmp/allthreads
  # tail /tmp/allthreads
  16951 pts/4 T  0:00 git rebase -i a033bf1bfacdaa25642e6bcc857a7d0f67cc3c92^
  16952 pts/4 T  0:00 /bin/sh /usr/libexec/git-core/git-rebase -i a033bf1bfacdaa25642e6bcc857a7d0f67cc3c92^
  17176 pts/4 T  0:00 git commit --amend --no-post-rewrite
  17204 pts/4 T  0:00 vim /home/acme/git/linux/.git/COMMIT_EDITMSG
  18939 ?     S  0:00 [kworker/2:1]
  18947 ?     S  0:00 [kworker/3:0]
  18974 ?     S  0:00 [kworker/1:0]
  19047 ?     S  0:00 [kworker/0:1]
  19152 pts/6 S+ 0:00 weechat
  19153 pts/7 R+ 0:00 ps axH
  # perf report -D | grep PERF_RECORD_NAMESPACES | tail
  0 0 0x125068 [0xa0]: PERF_RECORD_NAMESPACES 17176/17176 - nr_namespaces: 7
  0 0 0x1255b8 [0xa0]: PERF_RECORD_NAMESPACES 17204/17204 - nr_namespaces: 7
  0 0 0x125df0 [0xa0]: PERF_RECORD_NAMESPACES 18939/18939 - nr_namespaces: 7
  0 0 0x125f00 [0xa0]: PERF_RECORD_NAMESPACES 18947/18947 - nr_namespaces: 7
  0 0 0x126010 [0xa0]: PERF_RECORD_NAMESPACES 18974/18974 - nr_namespaces: 7
  0 0 0x126120 [0xa0]: PERF_RECORD_NAMESPACES 19047/19047 - nr_namespaces: 7
  0 0 0x126230 [0xa0]: PERF_RECORD_NAMESPACES 19152/19152 - nr_namespaces: 7
  0 0 0x129330 [0xa0]: PERF_RECORD_NAMESPACES 19154/19154 - nr_namespaces: 7
  0 0 0x12a1f8 [0xa0]: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
  0 0 0x12b0b8 [0xa0]: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
  #

Humm, investigate why we got two records for the 19155 pid/tid...

Signed-off-by: Hari Bathini <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Aravinda Prasad <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sargun Dhillon <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/148891931111.25309.11073854609798681633.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-14 | perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info | Hari Bathini | 27 | -3/+265
Introduce a new option to record PERF_RECORD_NAMESPACES events emitted by the kernel when fork, clone, setns or unshare are invoked. Also update the perf-record documentation with the new option to record namespace events.

Committer notes:

Combined it with a later patch to allow printing it via 'perf report -D' and be able to test the feature introduced in this patch. Had to move here also perf_ns__name(), that was introduced in another later patch.

Also used PRIu64 and PRIx64 to fix the build in some environments wrt:

  util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
    ret += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
    ^

Testing it:

  # perf record --namespaces -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
  #
  # perf report -D
  <SNIP>
  3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
      [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
       4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]

  0x1151e0 [0x30]: event: 9
  .
  . ... raw event: size 48 bytes
  .  0000:  09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00  ......0..q.h....
  .  0010:  a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00  .9...9...(.c....
  .  0020:  03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00  ................
<SNIP> NAMESPACES events: 1 <SNIP> # Signed-off-by: Hari Bathini <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Aravinda Prasad <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sargun Dhillon <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
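The build fix mentioned in the committer notes above can be illustrated with a minimal sketch. The helper name format_ns_entry() and its signature are hypothetical and are not taken from util/event.c; the only point shown is that the <inttypes.h> macros pick the correct length modifier for a 64-bit value on every ABI, where a hard-coded "%lu"/"%lx" does not:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not perf's code): format one namespace entry in
 * the "idx/name: dev/0xinode" shape seen in the 'perf report -D' output.
 */
int format_ns_entry(char *buf, size_t size, unsigned int idx,
		    const char *name, uint64_t dev, uint64_t ino)
{
	assert(buf != NULL && name != NULL);

	/*
	 * PRIu64/PRIx64 expand to "lu"/"lx" or "llu"/"llx" depending on
	 * how the platform defines uint64_t, so -Werror=format stays quiet.
	 */
	return snprintf(buf, size, "%u/%s: %" PRIu64 "/0x%" PRIx64,
			idx, name, dev, ino);
}
```

Calling format_ns_entry(buf, sizeof(buf), 6, "cgroup", 3, 0xeffffffb) fills buf with a string in the same shape as the cgroup entry in the output above.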
2017-03-13 | perf hists browser: Fix typo in function switch_data_file | Changbin Du | 1 | -1/+1
Should clear buf 'abs_path', not 'options'. Signed-off-by: Changbin Du <[email protected]> Cc: Feng Tang <[email protected]> Cc: Peter Zijlstra <[email protected]> Fixes: 341487ab561f ("perf hists browser: Add option for runtime switching perf data file") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-13 | perf report: Document +field style argument support for --field option | Changbin Du | 1 | -0/+3
Commit 2f3f9bcf000b ("perf tools: Add +field argument support for --field option") by Jiri Olsa <[email protected]> introduced +field style argument support for the --field option. This is useful, but the documentation was not updated. This adds a short description there.

Signed-off-by: Changbin Du <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Slightly improved the phrase structure ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-13 | perf sort: Fix segfault with basic block 'cycles' sort dimension | Changbin Du | 1 | -0/+5
Skip samples that don't have branch_info to avoid a segmentation fault.

The fault can be reproduced by:

  perf record -a
  perf report -F cycles

Signed-off-by: Changbin Du <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 0e332f033a82 ("perf tools: Add support for cycles, weight branch_info field")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
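The kind of guard described above can be sketched as follows. The struct and function names are simplified stand-ins invented for illustration, and the ordering chosen for entries without branch_info (sorted first) is an assumption rather than what util/sort.c necessarily does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for perf's structures. */
struct branch_info_sketch {
	uint64_t cycles;
};

struct hist_entry_sketch {
	struct branch_info_sketch *branch_info; /* NULL for non-branch samples */
};

/*
 * Compare two entries by cycles. Entries without branch_info are never
 * dereferenced; here they simply sort before entries that have it.
 */
int cmp_cycles(const struct hist_entry_sketch *left,
	       const struct hist_entry_sketch *right)
{
	assert(left != NULL && right != NULL);

	if (!left->branch_info || !right->branch_info) {
		if (!left->branch_info && !right->branch_info)
			return 0;
		return left->branch_info ? 1 : -1;
	}

	/* Three-way compare without risking overflow from subtraction. */
	return (left->branch_info->cycles > right->branch_info->cycles) -
	       (left->branch_info->cycles < right->branch_info->cycles);
}
```

Without the NULL test, a sample recorded by a plain 'perf record -a' (no branch stack) would reach the cycles comparison with branch_info unset, which is the crash the commit describes.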
2017-03-13 | perf tools: Ignore generated files pmu-events/{jevents,pmu-events.c} for git | Changbin Du | 1 | -0/+2
Ignore two files: pmu-events/{jevents,pmu-events.c} which are generated during the build. Committer notes: Testing it: $ make -C tools/perf/ $ git status On branch perf/core Untracked files: (use "git add <file>..." to include in what will be committed) tools/perf/pmu-events/jevents tools/perf/pmu-events/pmu-events.c nothing added to commit but untracked files present (use "git add" to track) $ After the patch: $ git status On branch perf/core nothing to commit, working tree clean $ Signed-off-by: Changbin Du <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-13 | perf tools: Missing c2c command in command-list | Changbin Du | 1 | -0/+1
Add the c2c command to command-list.txt so perf help can list this command. Committer notes: Before: # perf help | grep c2c # After: # perf help | grep c2c c2c Shared Data C2C/HITM Analyzer. # Signed-off-by: Changbin Du <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-07 | perf c2c: Fix display bug when using pipe | Namhyung Kim | 1 | -1/+1
Currently 'perf c2c report' determines display mode using the --stdio option, but it could be a problem if stdout is not a tty since setup_browser falls back to stdio in this case. But perf c2c didn't know this and tried to use TUI browser anyway. It should check "use_browser" variable instead. For example, the following command showed nothing and broke terminal setting. Now it's fixed.. $ perf c2c report | head ================================================= Trace Event Information ================================================= Total records : 136 Locked Load/Store Operations : 6 Load Operations : 62 Loads - uncacheable : 0 Loads - IO : 1 Loads - Miss : 7 Loads - no mapping : 2 Committer notes: When trying it without a proper perf.data file it results in a stuck terminal, just as Namhyung reported above: [acme@jouet ~]$ perf c2c report | head WARNING: no sample cpu value[acme@jouet ~]$ One has to kill it from some other xterm. Confirm that this patch fixes it: After: $ perf c2c report | head WARNING: no sample cpu value================================================= Trace Event Information ================================================= Total records : 14 Locked Load/Store Operations : 0 Load Operations : 0 Loads - uncacheable : 0 Loads - IO : 0 Loads - Miss : 0 Loads - no mapping : 0 $ Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-07 | perf c2c: Clarify help message of --stats option | Namhyung Kim | 1 | -1/+1
As it is not strictly asking for only stdio output, but will imply using it. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-07 | perf report: Hide tip message when -q option is given | Namhyung Kim | 1 | -2/+1
The tip message at the end was printed regardless of the -q option. Originally, the message suggested only '-s comm,dso' option for higher level view when no sort option and parent option were given. Now it shows random help message regardless of the options so the condition can be simplified to honor the -q option. Committer notes: Before: $ perf report --stdio -q 42.77% ls ls [.] _init 13.21% ls ld-2.24.so [.] match_symbol 12.55% ls libc-2.24.so [.] __strcoll_l 11.94% ls libc-2.24.so [.] _init # # (Tip: Show current config key-value pairs: perf config --list) # $ After: $ perf report --stdio -q 42.77% ls ls [.] _init 13.21% ls ld-2.24.so [.] match_symbol 12.55% ls libc-2.24.so [.] __strcoll_l 11.94% ls libc-2.24.so [.] _init $ We still have those two extra lines tho (that git commit insists in turning into one, or git commit --amend doesn't make me add), food for another patch... Reported-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-06 | perf bench numa: Add more comment for -c option | Jiri Olsa | 1 | -1/+2
Adding more commentary for the -c/--show_convergence option, to explain how the convergence is defined.

Before:

  -c, --show_convergence show convergence details

Now:

  -c, --show_convergence convergence is reached when each process \
                         (all its threads) is running on a single NUMA node.

Suggested-by: Jiri Hladky <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Hladky <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Rephrased a bit based on an IRC conversation with Jiri ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03 | perf bench futex: Fix build on musl + clang | Arnaldo Carvalho de Melo | 5 | -0/+5
When building with clang on a musl libc system, Alpine Linux, we end up hitting a problem where memset() is used but its prototype is not present, add it to avoid this: bench/futex-wake.c:99:3: error: implicitly declaring library function 'memset' with type 'void *(void *, int, unsigned long)' [-Werror,-Wimplicit-function-declaration] CPU_ZERO(&cpu); ^ /usr/include/sched.h:127:23: note: expanded from macro 'CPU_ZERO' #define CPU_ZERO(set) CPU_ZERO_S(sizeof(cpu_set_t),set) ^ /usr/include/sched.h:110:30: note: expanded from macro 'CPU_ZERO_S' #define CPU_ZERO_S(size,set) memset(set,0,size) ^ bench/futex-wake.c:99:3: note: include the header <string.h> or explicitly provide a declaration for 'memset' Found while updating my test build containers to build perf with clang in more systems. Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03 | perf bench futex: Use __maybe_unused | Arnaldo Carvalho de Melo | 1 | -6/+4
Instead of attributing a variable to itself to silence the compiler, use the attribute designed for that, avoiding this: In file included from bench/futex-hash.c:24: bench/futex.h:95:7: error: explicitly assigning value of variable of type 'pthread_attr_t *' to itself [-Werror,-Wself-assign] attr = attr; ~~~~ ^ ~~~~ bench/futex.h:96:13: error: explicitly assigning value of variable of type 'size_t' (aka 'unsigned long') to itself [-Werror,-Wself-assign] cpusetsize = cpusetsize; ~~~~~~~~~~ ^ ~~~~~~~~~~ bench/futex.h:97:9: error: explicitly assigning value of variable of type 'cpu_set_t *' (aka 'struct cpu_set_t *') to itself [-Werror,-Wself-assign] cpuset = cpuset; ~~~~~~ ^ ~~~~~~ That is only triggered when HAVE_PTHREAD_ATTR_SETAFFINITY_NP isn't set. Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
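As a rough illustration of the change described above, a stub can mark its parameters as intentionally unused instead of assigning each one to itself. MAYBE_UNUSED and set_affinity_stub() are local, hypothetical names for this sketch (the real code uses the tools' __maybe_unused macro in bench/futex.h), and the attribute syntax assumes GCC or clang:

```c
#include <stddef.h>

/* Local stand-in for the __maybe_unused macro; GCC/clang attribute. */
#define MAYBE_UNUSED __attribute__((unused))

/*
 * Hypothetical fallback used when the real affinity call is not
 * available: the parameters are deliberately ignored, and the attribute
 * says so without the "attr = attr;" trick that clang's -Wself-assign
 * rejects.
 */
int set_affinity_stub(void *attr MAYBE_UNUSED,
		      size_t cpusetsize MAYBE_UNUSED,
		      void *cpuset MAYBE_UNUSED)
{
	return 0;
}
```

The attribute only suppresses the unused-parameter diagnostics; it does not change the generated code.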
2017-03-03 | tools build: Add test for sched_getcpu() | Arnaldo Carvalho de Melo | 3 | -8/+6
Instead of trying to go on adding more ifdef conditions, do a feature test and define HAVE_SCHED_GETCPU_SUPPORT instead, then use it to provide the prototype. No need to change the stub, as it is already a __weak symbol. Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03 | perf tools: Force uncore events to system wide monitoring | Jiri Olsa | 2 | -5/+33
Make system wide (-a) the default option if no target was specified and one of following conditions is met: - there's no workload specified (current behaviour) - there is workload specified but all requested events are system wide ones Mixed events core/uncore with workload: $ perf stat -e 'uncore_cbox_0/clockticks/,cycles' sleep 1 Performance counter stats for 'sleep 1': <not supported> uncore_cbox_0/clockticks/ 980,489 cycles 1.000897406 seconds time elapsed Uncore event with workload: $ perf stat -e 'uncore_cbox_0/clockticks/' sleep 1 Performance counter stats for 'system wide': 281,473,897,192,670 uncore_cbox_0/clockticks/ 1.000833784 seconds time elapsed Committer note: When testing I realized the default case for !root, i.e. no events passed via -e, was broke by v2 of this patch, reported and after a patch provided by Jiri it is back working: [acme@jouet linux]$ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.401335 task-clock:u (msec) # 0.297 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 48 page-faults:u # 0.120 M/sec 458,146 cycles:u # 1.142 GHz 245,113 instructions:u # 0.54 insn per cycle 47,991 branches:u # 119.578 M/sec 4,022 branch-misses:u # 8.38% of all branches 0.001350029 seconds time elapsed [acme@jouet linux]$ Suggested-and-Tested-by: Borislav Petkov <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/20170227094818.GA12764@krava Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
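The rule stated at the top of this commit message can be written down as a small predicate. This is only a sketch of the described conditions: struct event_sketch and default_to_system_wide() are hypothetical names, and treating "no events passed via -e" as not system wide is an assumption based on the committer note about the default case:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a requested event. */
struct event_sketch {
	const char *name;
	bool system_wide; /* e.g. true for an uncore PMU event */
};

/*
 * Default to system wide (-a) when no target was specified and either
 * there is no workload, or every requested event is a system wide one.
 */
bool default_to_system_wide(bool target_given, bool has_workload,
			    const struct event_sketch *events,
			    size_t nr_events)
{
	size_t i;

	assert(events != NULL || nr_events == 0);

	if (target_given)
		return false;
	if (!has_workload)
		return true;
	if (nr_events == 0)
		return false; /* default events with a workload: per task */

	for (i = 0; i < nr_events; i++) {
		if (!events[i].system_wide)
			return false; /* mixed core/uncore stays per task */
	}
	return true;
}
```

With the two examples from the message, the mixed 'uncore_cbox_0/clockticks/,cycles' list keeps counting for 'sleep 1' only, while the uncore-only list switches to 'system wide'.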
2017-03-03 | perf intel-PT/BTS: Add missing initialization | Adrian Hunter | 1 | -0/+2
$ perf test decoder 57: x86 instruction decoder - new instructions : FAILED! $ Failed to decode 'rel' value (0xfffffffc vs expected 0): 0f 1b 80 78 56 34 12 bndstx %bnd0,0x12345678(%rax) Failed to decode 'rel' value (0xfffffffc vs expected 0): 0f 1b 85 78 56 34 12 bndstx %bnd0,0x12345678(%rbp) Failed to decode 'rel' value (0xfffffffc vs expected 0): 0f 1b 84 01 78 56 34 12 bndstx %bnd0,0x12345678(%rcx,%rax,1) Failed to decode 'rel' value (0xfffffffc vs expected 0): 0f 1b 84 05 78 56 34 12 bndstx %bnd0,0x12345678(%rbp,%rax,1) Failed to decode 'rel' value (0xfffffffc vs expected 0): 0f 1b 84 08 78 56 34 12 bndstx %bnd0,0x12345678(%rax,%rcx,1) There is missing initialization. It only affects the test because it is checking 'rel' even in cases where there is no value. Fix it. Reported-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03 | perf probe: Generalize probe event file open routine | Naveen N. Rao | 2 | -9/+12
Generalize probe event file open routine into a generic function for opening trace files. Signed-off-by: Naveen N. Rao <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/b580465c7a4dcd5d3b40fdf8568e6be45d0a6333.1487849577.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03 | perf ftrace: Use pager for displaying result | Namhyung Kim | 1 | -0/+3
It's convenient to use the pager when the result spans many lines. Note that setup_pager() should be called after perf_evlist__prepare_workload() since they can interfere with each other regarding shared stdio streams.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03 | perf ftrace: Add support for -a and -C option | Namhyung Kim | 2 | -0/+74
The -a/--all-cpus and -C/--cpu options control which CPUs are traced.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03 | perf cpumap: Introduce cpu_map__snprint_mask() | Namhyung Kim | 2 | -0/+47
The cpu_map__snprint_mask() generates a string representation of a cpumask bitmap. For cpu 0 to 11, it'll return "fff". Committer notes: Fix compiler warning on some toolchains: 19 fedora:24-x-ARC-uClibc: FAIL CC /tmp/build/perf/util/cpumap.o util/cpumap.c: In function 'hex_char': util/cpumap.c:679:2: error: comparison is always true due to limited range of data type [-Werror=type-limits] if (0 <= val && val <= 9) ^ cc1: all warnings being treated as errors Applying patch from Namhyung that makes function receive an 'unsigned char', that is what the callers are passing to this function. Signed-off-by: Namhyung Kim <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
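The "cpu 0 to 11 gives fff" example above can be reproduced with a short sketch. This is not the code from util/cpumap.c: cpus_to_hex_mask() and hex_digit() are hypothetical names, the input is a plain array of CPU numbers rather than a struct cpu_map, the 512 CPU limit is arbitrary, and no grouping separators are emitted. It does take the nibble as 'unsigned char', which is the kind of signature change the committer notes mention for the type-limits warning:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unsigned parameter: no "0 <= val" test is needed, so no -Wtype-limits. */
static char hex_digit(unsigned char val)
{
	assert(val < 16);
	return val < 10 ? (char)('0' + val) : (char)('a' + (val - 10));
}

/*
 * Hypothetical helper: write the hex mask for the CPUs listed in cpus[]
 * into buf. Returns the string length, or 0 if a CPU number is out of
 * range or buf is too small.
 */
size_t cpus_to_hex_mask(const int *cpus, size_t nr, char *buf, size_t size)
{
	unsigned char nibbles[128]; /* 4 CPUs per entry, CPUs 0..511 */
	size_t i, nr_nibbles, len = 0;
	int max_cpu = 0;

	assert(cpus != NULL && buf != NULL && nr > 0);
	memset(nibbles, 0, sizeof(nibbles));

	for (i = 0; i < nr; i++) {
		if (cpus[i] < 0 || cpus[i] >= 512)
			return 0;
		nibbles[cpus[i] / 4] |= (unsigned char)(1u << (cpus[i] % 4));
		if (cpus[i] > max_cpu)
			max_cpu = cpus[i];
	}

	nr_nibbles = (size_t)max_cpu / 4 + 1;
	if (size < nr_nibbles + 1)
		return 0;

	/* Most significant nibble first, as in a usual hex number. */
	for (i = nr_nibbles; i > 0; i--)
		buf[len++] = hex_digit(nibbles[i - 1]);
	buf[len] = '\0';
	return len;
}
```

Passing the CPU numbers 0 through 11 sets the low twelve bits, so three hex digits are written, matching the example in the commit message.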
2017-03-03 | perf ftrace: Add support for --pid option | Namhyung Kim | 2 | -27/+68
The -p (--pid) option enables to trace existing process by its pid. Committer notes: Testing it: Using the function_graph tracer on a process that is just waiting for user input and thus will make 'perf ftrace' sit there waiting for that, then press any key on that mutt session and see what happens: # perf ftrace -t function_graph -p `pidof mutt` | head -40 2) 1.038 us | switch_mm_irqs_off(); ------------------------------------------ 2) <idle>-0 => mutt-3595 ------------------------------------------ 2) | finish_task_switch() { 2) | smp_irq_work_interrupt() { 2) | irq_enter() { 2) 0.180 us | rcu_irq_enter(); 2) 1.248 us | } 2) | __wake_up() { 2) 0.126 us | _raw_spin_lock_irqsave(); 2) | __wake_up_common() { 2) | pollwake() { 2) | default_wake_function() { 2) | try_to_wake_up() { 2) 0.662 us | _raw_spin_lock_irqsave(); 2) | select_task_rq_fair() { 2) 1.719 us | effective_load.isra.41(); 2) 1.343 us | effective_load.isra.41(); 2) | select_idle_sibling() { 2) 0.331 us | idle_cpu(); 2) 1.458 us | } 2) 8.350 us | } 2) 0.200 us | _raw_spin_lock(); 2) | ttwu_do_activate() { 2) | activate_task() { 2) 0.136 us | update_rq_clock.part.77(); 2) | enqueue_task_fair() { 2) | enqueue_entity() { 2) 0.146 us | update_curr(); 2) 0.330 us | account_entity_enqueue(); 2) 0.280 us | update_cfs_shares(); 2) 0.321 us | place_entity(); 2) 0.206 us | __enqueue_entity(); 2) 6.926 us | } 2) | enqueue_entity() { 2) 0.105 us | update_curr(); 2) 0.175 us | account_entity_enqueue(); 2) 0.531 us | update_cfs_shares(); # Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03 | perf tools: Allow sorting by symbol size | Charles Baylis | 4 | -0/+44
Add new sort key 'symbol_size' to allow user to sort by symbol size, or (more usefully) display the symbol size using --fields=...,symbol_size. Committer note: Testing it together with the recently added -q, to remove the headers, and using the '+' sign with -s, to add the symbol_size sort order to the default, which is '-s/--sort comm,dso,symbol': # perf report -q -s +symbol_size | head -10 10.39% swapper [kernel.vmlinux] [k] intel_idle 270 3.45% swapper [kernel.vmlinux] [k] update_blocked_averages 1546 2.61% swapper [kernel.vmlinux] [k] update_load_avg 1292 2.36% swapper [kernel.vmlinux] [k] update_cfs_shares 240 1.83% swapper [kernel.vmlinux] [k] __hrtimer_run_queues 606 1.74% swapper [kernel.vmlinux] [k] update_cfs_rq_load_avg. 1187 1.66% swapper [kernel.vmlinux] [k] apic_timer_interrupt 152 1.60% CPU 0/KVM [kvm] [k] kvm_set_msr_common 3046 1.60% gnome-shell libglib-2.0.so.0 [.] g_slist_find 37 1.46% gnome-termina libglib-2.0.so.0 [.] g_hash_table_lookup 370 # Signed-off-by: Charles Baylis <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Maxim Kuvyrkov <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Use symbol__size(), remove needless %lld + (long long) casting ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
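A sort key of this kind boils down to a comparison on the size of each symbol. The sketch below uses a made-up struct symbol_sketch and assumes the size is simply end minus start; the real helper is symbol__size(), whose exact definition is not shown in this log. The start addresses in any usage are arbitrary:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified symbol: [start, end) address range. */
struct symbol_sketch {
	const char *name;
	uint64_t start;
	uint64_t end;
};

/* Assumption for this sketch: size is end - start. */
uint64_t symbol_sketch_size(const struct symbol_sketch *sym)
{
	assert(sym != NULL && sym->end >= sym->start);
	return sym->end - sym->start;
}

/* qsort() callback: ascending by symbol size. */
int cmp_symbol_size(const void *a, const void *b)
{
	uint64_t sa = symbol_sketch_size(a);
	uint64_t sb = symbol_sketch_size(b);

	return (sa > sb) - (sa < sb);
}

void sort_by_symbol_size(struct symbol_sketch *syms, size_t nr)
{
	qsort(syms, nr, sizeof(*syms), cmp_symbol_size);
}
```

Sorting three symbols with the sizes reported above for g_slist_find (37), update_cfs_shares (240) and intel_idle (270) leaves them in that order.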
2017-03-03 | perf evlist: Clarify a bit the use of perf_mmap->refcnt | Arnaldo Carvalho de Melo | 1 | -1/+12
This is an odd refcount use case, so add some more comments to help understand that when it hits zero it really means that the mmap()ed area (on a perf_event_open() returned fd) has been munmap()ed. Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Elena Reshetova <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03 | perf thread_map: Convert thread_map.refcnt from atomic_t to refcount_t | Elena Reshetova | 3 | -15/+15
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This helps avoid accidental reference counter overflows that might lead to use-after-free situations.

Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: David Windsor <[email protected]>
Signed-off-by: Hans Liljestrand <[email protected]>
Signed-off-by: Kees Kook <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: David Windsor <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hans Liljestrand <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matija Glavinic Pecotic <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
[ Did missing tests/thread-map.c conversion ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
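To show why a dedicated reference counter type helps with overflows, here is a minimal saturating counter built on C11 atomics. It is a sketch of the general idea only: struct refcount_sketch and its functions are hypothetical names, and this is not the kernel's or the tools' refcount_t implementation. The assumed policy is that a counter that reaches UINT_MAX stays pinned there, so it can never wrap to a small value and trigger an early free:

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference counter. */
struct refcount_sketch {
	atomic_uint refs;
};

void refcount_sketch_set(struct refcount_sketch *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

/* Take a reference unless the object was already released. */
bool refcount_sketch_inc_not_zero(struct refcount_sketch *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false; /* already released */
		if (old == UINT_MAX)
			return true;  /* saturated: stay pinned, never wrap */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));

	return true;
}

/* Drop a reference; returns true when the count reaches zero. */
bool refcount_sketch_dec_and_test(struct refcount_sketch *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == UINT_MAX)
			return false; /* refuse underflow; saturated is never freed */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

A plain atomic increment would instead wrap from UINT_MAX to 0, after which a later put could free an object that is still in use; saturating trades a possible leak for that use-after-free.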
2017-03-03perf thread: convert thread.refcnt from atomic_t to refcount_tElena Reshetova3-6/+6
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Kook <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: David Windsor <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hans Liljestrand <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matija Glavinic Pecotic <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ Did missing conversion in __machine__remove_thread() ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03perf evlist: Convert perf_map.refcnt from atomic_t to refcount_tElena Reshetova2-11/+11
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Kook <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: David Windsor <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hans Liljestrand <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matija Glavinic Pecotic <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03perf map: Convert map_groups.refcnt from atomic_t to refcount_tElena Reshetova3-10/+10
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Kook <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: David Windsor <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hans Liljestrand <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matija Glavinic Pecotic <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ Did the missing conversion of tests/thread-mg-share.c too ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03perf map: Convert map.refcnt from atomic_t to refcount_tElena Reshetova2-6/+6
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Kook <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: David Windsor <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hans Liljestrand <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matija Glavinic Pecotic <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03perf dso: Convert dso.refcnt from atomic_t to refcount_tElena Reshetova2-5/+5
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Kook <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: David Windsor <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hans Liljestrand <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matija Glavinic Pecotic <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03perf comm: Convert comm_str.refcnt from atomic_t to refcount_tElena Reshetova1-9/+6
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Kook <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: David Windsor <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hans Liljestrand <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matija Glavinic Pecotic <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ Reinstated comm_str__get() function, needed when reusing entries in the rbtree ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03perf cpumap: Convert cpu_map.refcnt from atomic_t to refcount_tElena Reshetova3-11/+11
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Kook <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: David Windsor <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hans Liljestrand <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matija Glavinic Pecotic <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ fixed mixed conversion to refcount in tests/cpumap.c ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03perf cgroup: Convert cgroup_sel.refcnt from atomic_t to refcount_tElena Reshetova2-5/+5
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: David Windsor <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Kook <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: [email protected] Cc: Andrew Morton <[email protected]> Cc: David Windsor <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hans Liljestrand <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matija Glavinic Pecotic <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03tools include: Adopt kernel's refcount.hArnaldo Carvalho de Melo1-0/+1
To aid in catching bugs when using atomics as a reference count. This is a trimmed down version with just what is used by tools/ at this point. After this, the patches submitted by Elena for tools/ doing the conversion from atomic_ to refcount_ methods can be applied and tested. To activate it, build perf with: make DEBUG=1 -C tools/perf Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Elena Reshetova <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03tools arch x86: Include asm/cmpxchg.hArnaldo Carvalho de Melo1-0/+1
Will be included from atomic.h and used in refcount.h Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Elena Reshetova <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-03perf stat: Issue a HW watchdog disable hintBorislav Petkov1-0/+11
When using perf stat on an AMD F15h system with the default hw events attributes, some of the events don't get counted:

   Performance counter stats for 'sleep 1':

          0.749208      task-clock (msec)         #    0.001 CPUs utilized
                 1      context-switches          #    0.001 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                54      page-faults               #    0.072 M/sec
         1,122,815      cycles                    #    1.499 GHz
           286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle
     <not counted>      stalled-cycles-backend      (0.00%)
     ^^^^^^^^^^^^
     <not counted>      instructions                (0.00%)
     ^^^^^^^^^^^^
     <not counted>      branches                    (0.00%)
     <not counted>      branch-misses               (0.00%)

       1.001550070 seconds time elapsed

The reason is that we have the HW watchdog consuming one PMU counter and when perf tries to schedule 6 events on 6 counters and some of those counters are constrained to only a specific subset of PMCs by the hardware, the event scheduling fails.

So issue a hint to disable the HW watchdog around a perf stat session.

Committer note:

Testing it...

  # perf stat -d usleep 1

   Performance counter stats for 'usleep 1':

          1.180203      task-clock (msec)         #    0.490 CPUs utilized
                 1      context-switches          #    0.847 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                54      page-faults               #    0.046 M/sec
           184,754      cycles                    #    0.157 GHz
           714,553      instructions              #    3.87 insn per cycle
           154,661      branches                  #  131.046 M/sec
             7,247      branch-misses             #    4.69% of all branches
           219,984      L1-dcache-loads           #  186.395 M/sec
            17,600      L1-dcache-load-misses     #    8.00% of all L1-dcache hits    (90.16%)
     <not counted>      LLC-loads                   (0.00%)
     <not counted>      LLC-load-misses             (0.00%)

       0.002406823 seconds time elapsed

  Some events weren't counted. Try disabling the NMI watchdog:
        echo 0 > /proc/sys/kernel/nmi_watchdog
        perf stat ...
        echo 1 > /proc/sys/kernel/nmi_watchdog
  #

Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
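The decision behind such a hint can be sketched as follows. This is not the code from the patch: the helper names are hypothetical, and the sketch assumes the hint is shown only when at least one event was not counted and the watchdog sysctl reads as a positive integer.

```c
#include <stdbool.h>
#include <stdio.h>

/* True if a sysctl value such as "1\n" parses as a positive integer. */
static bool watchdog_value_enabled(const char *value)
{
	int n = 0;

	return value && sscanf(value, "%d", &n) == 1 && n > 0;
}

/* Read the sysctl file at 'path'; an unreadable file counts as disabled. */
static bool nmi_watchdog_enabled(const char *path)
{
	char buf[16] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return false;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return watchdog_value_enabled(buf);
}

/* Print the hint when it applies; returns 1 if printed, 0 otherwise. */
static int print_watchdog_hint(FILE *out, int not_counted, bool watchdog_on)
{
	if (not_counted == 0 || !watchdog_on)
		return 0;

	fprintf(out,
		"Some events weren't counted. Try disabling the NMI watchdog:\n"
		"\techo 0 > /proc/sys/kernel/nmi_watchdog\n"
		"\tperf stat ...\n"
		"\techo 1 > /proc/sys/kernel/nmi_watchdog\n");
	return 1;
}
```

A caller would pass the number of `<not counted>` events together with `nmi_watchdog_enabled("/proc/sys/kernel/nmi_watchdog")`. Keeping the parsing separate from the file access makes the policy easy to check without touching /proc.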
2017-03-03perf vendor events: Add mapping for KnightsMill PMU eventsKarol Wachowski1-0/+1
Reuse events from KnightsLanding for KnightsMill Signed-off-by: Karol Wachowski <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Piotr Luc <[email protected]> Cc: Srinivas Pandruvada <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-03-01x86/events: Remove last remnants of old filenamesBorislav Petkov1-1/+1
Update to the new file paths, remove them from introductory comments. Signed-off-by: Borislav Petkov <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2017-02-28Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds50-124/+248
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes on the kernel and tooling side - nothing in particular stands out" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf/core: Fix the perf_cpu_time_max_percent check perf/core: Fix perf_event_enable_on_exec() timekeeping (again) perf/core: Remove confusing comment and move put_ctx() perf record: Honor --quiet option properly perf annotate: Add -q/--quiet option perf diff: Add -q/--quiet option perf report: Add -q/--quiet option perf utils: Check verbose flag properly perf utils: Add perf_quiet_option() perf record: Add -a as default target perf stat: Add -a as default target perf tools: Fail on using multiple bits long terms without value perf tools: Move new_term arguments into struct parse_events_term template perf build: Add special fixdep cleaning rule perf tools: Replace _SC_NPROCESSORS_CONF with max_present_cpu in cpu_topology_map perf header: Make build_cpu_topology skip offline/absent CPUs perf cpumap: Add cpu__max_present_cpu() perf session: Fix DEBUG=1 build with clang tools lib traceevent: It's preempt not prempt perf python: Filter out -specs=/a/b/c from the python binding cc options ...
2017-02-27Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-5/+5
Merge yet more updates from Andrew Morton: - a few MM remainders - misc things - autofs updates - signals - affs updates - ipc - nilfs2 - spelling.txt updates * emailed patches from Andrew Morton <[email protected]>: (78 commits) mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear() mm: add arch-independent testcases for RODATA hfs: atomically read inode size mm: clarify mm_struct.mm_{users,count} documentation mm: use mmget_not_zero() helper mm: add new mmget() helper mm: add new mmgrab() helper checkpatch: warn when formats use %Z and suggest %z lib/vsprintf.c: remove %Z support scripts/spelling.txt: add some typo-words scripts/spelling.txt: add "followings" pattern and fix typo instances scripts/spelling.txt: add "therfore" pattern and fix typo instances scripts/spelling.txt: add "overwriten" pattern and fix typo instances scripts/spelling.txt: add "overwritting" pattern and fix typo instances scripts/spelling.txt: add "deintialize(d)" pattern and fix typo instances scripts/spelling.txt: add "disassocation" pattern and fix typo instances scripts/spelling.txt: add "omited" pattern and fix typo instances scripts/spelling.txt: add "explictely" pattern and fix typo instances scripts/spelling.txt: add "applys" pattern and fix typo instances scripts/spelling.txt: add "configuartion" pattern and fix typo instances ...
2017-02-27Merge branch 'for-4.11' of ↵Linus Torvalds1-7/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Several noteworthy changes. - Parav's rdma controller is finally merged. It is very straightforward and can limit the absolute numbers of common rdma constructs used by different cgroups. - kernel/cgroup.c got too chubby and disorganized. Created kernel/cgroup/ subdirectory and moved all cgroup related files under kernel/ there and reorganized the core code. This hurts for backporting patches but was long overdue. - cgroup v2 process listing reimplemented so that it no longer depends on allocating a buffer large enough to cache the entire result to sort and uniq the output. v2 has always mangled the sort order to ensure that users don't depend on the sorted output, so this shouldn't surprise anybody. This makes the pid listing functions use the same iterators that are used internally, which have to have the same iterating capabilities anyway. - perf cgroup filtering now works automatically on cgroup v2. This patch was posted a long time ago but somehow fell through the cracks. - misc fixes and documentation updates" * 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits) kernfs: fix locking around kernfs_ops->release() callback cgroup: drop the matching uid requirement on migration for cgroup v2 cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy cgroup: misc cleanups cgroup: call subsys->*attach() only for subsystems which are actually affected by migration cgroup: track migration context in cgroup_mgctx cgroup: cosmetic update to cgroup_taskset_add() rdmacg: Fixed uninitialized current resource usage cgroup: Add missing cgroup-v2 PID controller documentation. 
rdmacg: Added documentation for rdmacg IB/core: added support to use rdma cgroup controller rdmacg: Added rdma cgroup controller cgroup: fix a comment typo cgroup: fix RCU related sparse warnings cgroup: move namespace code to kernel/cgroup/namespace.c cgroup: rename functions for consistency cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c cgroup: separate out cgroup1_kf_syscall_ops cgroup: refactor mount path and clearly distinguish v1 and v2 paths cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c ...
2017-02-27scripts/spelling.txt: add "an one" pattern and fix typo instancesMasahiro Yamada1-1/+1
Fix typos and add the following to the scripts/spelling.txt: an one||a one I dropped the "an" before "one or more" in drivers/net/ethernet/sfc/mcdi_pcol.h. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-27scripts/spelling.txt: add "an union" pattern and fix typo instancesMasahiro Yamada2-3/+3
Fix typos and add the following to the scripts/spelling.txt: an union||a union Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-27scripts/spelling.txt: add "an user" pattern and fix typo instancesMasahiro Yamada1-1/+1
Fix typos and add the following to the scripts/spelling.txt: an user||a user an userspace||a userspace I also added "userspace" to the list since it is a common word in Linux. I found some instances for "an userfaultfd", but I did not add it to the list. I felt it is endless to find words that start with "user" such as "userland" etc., so must draw a line somewhere. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2-0/+2
Pull networking updates from David Miller: "Highlights: 1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini Varadhan. 2) Simplify classifier state on sk_buff in order to shrink it a bit. From Willem de Bruijn. 3) Introduce SIPHASH and its usage for secure sequence numbers and syncookies. From Jason A. Donenfeld. 4) Reduce CPU usage for ICMP replies we are going to limit or suppress, from Jesper Dangaard Brouer. 5) Introduce Shared Memory Communications socket layer, from Ursula Braun. 6) Add RACK loss detection and allow it to actually trigger fast recovery instead of just assisting after other algorithms have triggered it. From Yuchung Cheng. 7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot. 8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert. 9) Export MPLS packet stats via netlink, from Robert Shearman. 10) Significantly improve inet port bind conflict handling, especially when an application is restarted and changes its setting of reuseport. From Josef Bacik. 11) Implement TX batching in vhost_net, from Jason Wang. 12) Extend the dummy device so that VF (virtual function) features, such as configuration, can be more easily tested. From Phil Sutter. 13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric Dumazet. 14) Add new bpf MAP, implementing a longest prefix match trie. From Daniel Mack. 15) Packet sample offloading support in mlxsw driver, from Yotam Gigi. 16) Add new aquantia driver, from David VomLehn. 17) Add bpf tracepoints, from Daniel Borkmann. 18) Add support for port mirroring to b53 and bcm_sf2 drivers, from Florian Fainelli. 19) Remove custom busy polling in many drivers, it is done in the core networking since 4.5 times. From Eric Dumazet. 20) Support XDP adjust_head in virtio_net, from John Fastabend. 21) Fix several major holes in neighbour entry confirmation, from Julian Anastasov. 22) Add XDP support to bnxt_en driver, from Michael Chan. 
23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan. 24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi. 25) Support GRO in IPSEC protocols, from Steffen Klassert" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits) Revert "ath10k: Search SMBIOS for OEM board file extension" net: socket: fix recvmmsg not returning error from sock_error bnxt_en: use eth_hw_addr_random() bpf: fix unlocking of jited image when module ronx not set arch: add ARCH_HAS_SET_MEMORY config net: napi_watchdog() can use napi_schedule_irqoff() tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" net/hsr: use eth_hw_addr_random() net: mvpp2: enable building on 64-bit platforms net: mvpp2: switch to build_skb() in the RX path net: mvpp2: simplify MVPP2_PRS_RI_* definitions net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT net: mvpp2: remove unused register definitions net: mvpp2: simplify mvpp2_bm_bufs_add() net: mvpp2: drop useless fields in mvpp2_bm_pool and related code net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue' net: mvpp2: release reference to txq_cpu[] entry after unmapping net: mvpp2: handle too large value in mvpp2_rx_time_coal_set() net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set() net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set ...
2017-02-20perf record: Honor --quiet option properlyNamhyung Kim1-0/+2
It should call perf_quiet_option() to suppress messages. Signed-off-by: Namhyung Kim <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ Fix merge clash with 483635a9d080 ("perf record: Add -a as default target") ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-02-20perf annotate: Add -q/--quiet optionNamhyung Kim2-0/+8
The -q/--quiet option is to suppress any message. Sometimes users just want to see the numbers and it can be used for that case. Signed-off-by: Namhyung Kim <[email protected]> Suggested-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-02-20perf diff: Add -q/--quiet optionNamhyung Kim2-4/+14
The -q/--quiet option is to suppress any message. Sometimes users just want to see the numbers and it can be used for that case.

Committer notes:

Before:

  # perf diff | head -10
  Failed to open /tmp/perf-6678.map, continuing without symbols
  Failed to open /tmp/perf-6678.map, continuing without symbols
  Failed to open /tmp/perf-2646.map, continuing without symbols
  # Event 'cycles'
  #
  # Baseline  Delta Abs  Shared Object               Symbol
  # ........  .........  ..........................  ............................................
  #
       5.36%     -1.76%  [kernel.vmlinux]            [k] intel_idle
       2.80%     +1.48%  firefox                     [.] 0x00000000000101fe
      57.12%     -1.25%  libxul.so                   [.] 0x00000000009bea92
       1.36%     -1.11%  [kernel.vmlinux]            [k] __schedule
       4.26%     -1.00%  perf-6678.map               [.] 0x00007fac4b0e9320

After:

  # perf diff -q | head -10
       5.36%     -1.76%  [kernel.vmlinux]            [k] intel_idle
       2.80%     +1.48%  firefox                     [.] 0x00000000000101fe
      57.12%     -1.25%  libxul.so                   [.] 0x00000000009bea92
       1.36%     -1.11%  [kernel.vmlinux]            [k] __schedule
       4.26%     -1.00%  perf-6678.map               [.] 0x00007fac4b0e9320
       1.86%     +0.95%  [kernel.vmlinux]            [k] update_blocked_averages
       0.80%     -0.70%  [kernel.vmlinux]            [k] native_sched_clock
       0.74%     -0.58%  [kernel.vmlinux]            [k] native_write_msr
       0.76%     -0.56%  qemu-system-x86_64          [.] 0x00000000002395c0
                 +0.54%  libpulsecommon-10.0.so      [.] 0x000000000002d91b
  #

Signed-off-by: Namhyung Kim <[email protected]>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>