|
Getting unwieldily long; for this app domain it should be descriptive enough,
and the use of __ to separate the class from the method names should
help with avoiding clashes with other code bases.
Reported-by: David Ahern <[email protected]>
Suggested-by: Ingo Molnar <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Print the related option help messages only when option processing
fails. While at it, modify parse_options_usage() to skip the usage part
so that it can be used to show multiple option help messages
naturally, as below:
$ perf stat -Bx, ls
-B option not supported with -x
usage: perf stat [<options>] [<command>]
-B, --big-num print large numbers with thousands' separators
-x, --field-separator <separator>
print counts with custom separator
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Enthusiastically-Supported-by: Ingo Molnar <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Ingo pointed out that the task-clock counter should have the units
explicitly stated since it is not a counter.
Before:
perf stat -a -- sleep 1
Performance counter stats for 'sleep 1':
16186.874834 task-clock # 16.154 CPUs utilized
...
After:
perf stat -a -- sleep 1
Performance counter stats for 'system wide':
16146.402138 task-clock (msec) # 16.125 CPUs utilized
...
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The "perf stat" command can collect counters system wide or on one or more
CPUs. For these options, do not require a workload to be specified.
v2: use perf_target__none per Namhyung's comment.
Signed-off-by: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The "perf stat" tool displays the command run in its summary output
which is misleading when using a cpu list or system wide collection.
Before:
perf stat -a -- sleep 1
Performance counter stats for 'sleep 1':
16152.670249 task-clock # 16.132 CPUs utilized
417 context-switches # 0.002 M/sec
7 cpu-migrations # 0.030 K/sec
...
After:
perf stat -a -- sleep 1
Performance counter stats for 'system wide':
16206.931120 task-clock # 16.144 CPUs utilized
395 context-switches # 0.002 M/sec
5 cpu-migrations # 0.030 K/sec
...
or
perf stat -C1 -- sleep 1
Performance counter stats for 'CPU(s) 1':
1001.669257 task-clock # 1.000 CPUs utilized
4,264 context-switches # 0.004 M/sec
3 cpu-migrations # 0.003 K/sec
...
Signed-off-by: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When only the instructions event is requested:
$ perf stat -e instructions git s
M builtin-stat.c
Performance counter stats for 'git s':
917,453,420 instructions # 0.00 insns per cycle
0.213002926 seconds time elapsed
The 0.00 insns per cycle comment in the output is totally bogus and
misleading. It happens because update_shadow_stats() doesn't touch
runtime_cycles_stats when only the instructions event is requested. So,
omit printing the bogus data altogether.
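The guard described above can be sketched as follows. This is an illustration in Python, not the builtin-stat.c code; the function name and the output format are assumptions made for the example. The ratio is only produced when a cycles average was actually recorded.

```python
def insns_per_cycle_comment(instructions, avg_cycles):
    """Return the 'insns per cycle' comment, or None when no cycles were counted.

    Hypothetical helper for illustration; not part of perf.
    """
    if not avg_cycles:
        # The cycles event was not requested, so there is nothing to divide by.
        return None
    return "# %7.2f insns per cycle" % (instructions / avg_cycles)
```

With only the instructions event requested, avg_cycles stays at zero and no comment is printed.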
Signed-off-by: Ramkumar Ramachandra <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When only the cycles event is requested:
$ perf stat -e cycles dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 0.26123 s, 2.0 GB/s
Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':
911,626,453 cycles # 0.000 GHz
0.262113350 seconds time elapsed
The 0.000 GHz comment in the output is totally bogus and misleading. It
happens because update_shadow_stats() doesn't touch runtime_nsecs_stats;
it is only written when a requested counter matches a SW_TASK_CLOCK. In
our case, since we have only requested HW_CPU_CYCLES,
runtime_nsecs_stats is unavailable. So, omit printing the comment
altogether.
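As a rough sketch of the same idea for the frequency comment (Python, with a hypothetical helper name and format): cycles divided by nanoseconds of task-clock gives GHz directly, so the value is only meaningful when a task-clock runtime was recorded.

```python
def ghz_comment(cycles, runtime_nsecs):
    """Return the GHz comment, or None when no task-clock runtime is available.

    Hypothetical helper for illustration; not part of perf.
    """
    if not runtime_nsecs:
        # No SW_TASK_CLOCK counter was requested: omit the comment entirely.
        return None
    # cycles per nanosecond is numerically equal to GHz
    return "# %8.3f GHz" % (cycles / runtime_nsecs)
```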
Signed-off-by: Ramkumar Ramachandra <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
|
|
Commit acf2892270dc ("perf stat: Use
perf_evlist__prepare/start_workload()") converted perf stat to use those
functions but forgot to update child_pid. Fix it.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add support to perf stat to print the basic transactional execution statistics:
Total cycles, Cycles in Transaction, and Cycles in aborted transactions,
using the in_tx and in_tx_checkpoint qualifiers;
Transaction Starts and Elision Starts, to compute the average transaction
length.
This gives a reasonable overview of the success of the transactions.
Also support architectures that have a transaction aborted cycles
counter, like POWER8. Since that is awkward to handle abstractly in the
kernel, handle both cases here.
Enable with a new --transaction / -T option.
This requires measuring these events in a group, since they depend on each
other.
This is implemented by using TM sysfs events exported by the kernel.
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When interval mode is outputting to a pipe, each measurement should be
flushed individually, so that the reader sees it promptly.
With a terminal each line is automatically flushed by stdio, but that is
disabled for non-terminal output.
Simply fflush the output after each time interval.
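The effect can be sketched in Python (the function name and the line format are assumptions for the example; perf itself calls fflush() from C): each interval line is written and then flushed explicitly, so a reader on the other end of a pipe does not wait for the buffer to fill.

```python
import sys

def print_intervals(samples, out=sys.stdout):
    """Write one line per interval and flush after each one."""
    for timestamp, count, event in samples:
        out.write("%.9f,%d,%s\n" % (timestamp, count, event))
        # Non-terminal output is block-buffered, so push each interval out now.
        out.flush()
```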
Signed-off-by: Andi Kleen <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When measuring workloads the startup phase -- doing page faults, dynamic
linking, opening files -- is often very different from the rest of the
workload. Especially with smaller kernels and using counter
multiplexing this can give significant measurement errors.
Multiplexing assumes that the workload is mostly the same over longer
periods. But at startup there is typically some spike of activity which
is relatively short. If many groups are multiplexing, the one group
that sees the spike, which is then scaled up over the time to run all
groups, may show a significant error.
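To make the scaling error concrete, here is a small Python sketch. It assumes the usual extrapolation of a multiplexed count by time_enabled / time_running; the numbers are invented for illustration and are not measurements.

```python
def scale_count(raw, time_enabled, time_running):
    """Extrapolate a multiplexed count to the whole enabled time."""
    if time_running == 0:
        return None  # the group never got onto the PMU
    return raw * time_enabled / time_running

# Hypothetical numbers: 4 groups share the PMU over 1000 ms, so each runs 250 ms.
# Steady state is 1,000 events per 250 ms slice. The startup spike adds 9,000
# events, all of which land in the one slice this group happened to own.
steady = scale_count(1000, 1000, 250)          # estimate without the spike
spiked = scale_count(1000 + 9000, 1000, 250)   # estimate with the spike
```

Here steady is 4000.0 and spiked is 40000.0, although only 13,000 events actually happened: the short spike is multiplied by the same factor as the steady-state activity.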
Also in general it's often not useful to measure the startup, because it
is so different from the rest.
One way around this is to use interval mode and discard the first
sample, but this can be awkward because interval mode doesn't support
intervals of less than 100ms, and also a useful interval is not
necessarily the same as a useful startup delay.
This patch adds a new --initial-delay / -D option to skip measuring for
the startup phase. The delay is specified in ms.
Here's a simple example:
perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
...
3,721 page-faults
...
If we just wait 20 ms the number of page faults is about a quarter less:
perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
...
2,823 page-faults
...
So we filtered out most of the startup noise from bash.
Signed-off-by: Andi Kleen <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch fixes a problem reported by Andi Kleen on perf
stat when measuring uncore events:
# perf stat --per-socket -e uncore_pcu/event=0x0/ -I1000 -a sleep 2
It would not report counts for the second socket. That was due to a
cpu mapping bug in print_aggr().
This patch also fixes the socket numbering bug for <not counted>
events.
Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Stephane Eranian <[email protected]>
Tested-by: Andi Kleen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/20130705170645.GA32519@quad
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch fixes a problem with perf stat whereby on termination it may
send a SIGTERM signal to random processes on systems with high PID
recycling. I got some actual bug reports on this.
There is a race between the SIGCHLD and sig_atexit() handlers. This patch
addresses this problem by clearing child_pid in the SIGCHLD handler.
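A simplified model of the fix, in Python with hypothetical names (the real handlers are C signal handlers in perf stat): once the child has been reaped, its PID is forgotten, so the exit path can no longer signal a process that merely reuses the number.

```python
import signal

class StatSession:
    """Toy model of the child_pid bookkeeping; names are illustrative only."""

    def __init__(self, child_pid):
        self.child_pid = child_pid

    def on_sigchld(self):
        # The child is gone and its PID may be recycled at any moment,
        # so forget it before anything else can signal it.
        self.child_pid = -1

    def on_exit(self, kill):
        # Only signal a child that is still known to be ours.
        if self.child_pid != -1:
            kill(self.child_pid, signal.SIGTERM)
```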
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/20130604154426.GA2928@quad
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch adds the --per-core option to perf stat.
This option is used to aggregate system-wide counts
on a per physical core basis. On processors with
hyperthreading, this means counts of all HT threads
running on a physical core are aggregated.
This mode is useful to find imbalance between physical
cores running a uniform workload. Cores are identified
by socket: S0-C1, means physical core 1 on socket 0. Note
that cores are identified using their physical core id,
thus their numbering may not be continuous.
Per core aggregation can be combined with interval printing:
# perf stat -a --per-core -I 1000 -e cycles sleep 1000
# time core cpus counts events
1.000090030 S0-C0 1 4,765,747 cycles
1.000090030 S0-C1 1 5,580,647 cycles
1.000090030 S0-C2 1 221,181 cycles
1.000090030 S0-C3 1 266,092 cycles
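The aggregation itself can be sketched in Python. The data layout (a per-CPU count map and a CPU to (socket, core) topology map) is an assumption for the example, not how perf stores it.

```python
from collections import defaultdict

def aggregate_per_core(per_cpu_counts, topology):
    """per_cpu_counts: {cpu: count}; topology: {cpu: (socket, core)}.

    Returns {'S<socket>-C<core>': (nr_cpus, total)}, ordered by socket, then core.
    """
    totals = defaultdict(lambda: [0, 0])
    for cpu, count in per_cpu_counts.items():
        entry = totals[topology[cpu]]
        entry[0] += 1       # CPUs (HT threads) folded into this core
        entry[1] += count   # their counts are summed
    return {"S%d-C%d" % key: tuple(totals[key]) for key in sorted(totals)}
```

With two hyperthreads per core, each entry reports 2 cpus and the sum of both threads' counts.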
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ committer note: Remove parts already applied on 86ee6e1 to keep bisectability ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To make it more obvious what this option does as suggested by Andi on
LKML.
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Refactor aggregation code by introducing a single aggr_mode variable and an
enum for aggregation.
Also refactor cpumap code having to do with cpu to socket mappings. All in
preparation for extended modes, such as cpu -> core.
Also fix socket aggregation and ensure that sockets are printed in increasing
order.
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ committer note: Fixup conflicts with a7e191c "--repeat forever" and
acf2892 "Use perf_evlist__prepare/start_workload()" ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Reducing the noise in the main logic.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The following patch causes 'perf stat --repeat 0' to be interpreted as
'forever', displaying the stats for every run.
We act as if a single run was asked, and reset the stats in each
iteration. In this mode SIGINT is passed to perf to be able to stop the
loop with Ctrl+C.
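A rough model of this behaviour in Python (function and parameter names are hypothetical; the interrupted callback stands in for the SIGINT handling): with repeat == 0 every run is reported on its own and the stats are reset, otherwise the runs are accumulated and reported once.

```python
def run_repeated(run_once, report, repeat, interrupted):
    """Run the workload `repeat` times; repeat == 0 means until interrupted."""
    forever = (repeat == 0)
    stats = []
    runs = 0
    while (forever or runs < repeat) and not interrupted():
        stats.append(run_once())
        if forever:
            report(stats)  # show this run on its own...
            stats = []     # ...and reset the stats for the next iteration
        runs += 1
    if not forever:
        report(stats)      # normal mode: one summary over all runs
```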
Signed-off-by: Frederik Deweerdt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf stat open-coded work that duplicates the helper. Use the helper,
as it can now be called without struct perf_record_opts.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Introduce and use the thread_map__nr() function to protect a possible
NULL pointer dereference and cleanup the code a bit.
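The idea behind such an accessor, sketched in Python with a hypothetical name. Treating a missing map as a single thread is an assumption of this sketch; the point is that callers no longer dereference the map themselves.

```python
def thread_map_nr(threads):
    """Number of threads in the map; a missing map is treated as one thread."""
    return len(threads) if threads is not None else 1
```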
Signed-off-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It's almost always used with NULL for both arguments. Get rid of the
arguments from the signature and use perf_evlist__set_maps() if needed.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ committer note: replaced spaces with tabs in some of the affected lines ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch adds per-processor socket count aggregation for system-wide
mode measurements. This is a useful mode to detect imbalance between
sockets.
To enable this mode, use --aggr-socket in addition
to -a (system-wide).
The output includes the socket number and the number of online
processors on that socket. This is useful to gauge the amount of
aggregation.
# ./perf stat -I 1000 -a --aggr-socket -e cycles sleep 2
# time socket cpus counts events
1.000097680 S0 4 5,788,785 cycles
2.000379943 S0 4 27,361,546 cycles
2.001167808 S0 4 818,275 cycles
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ committer note: Added missing man page entry based on above comments ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The ->counts field was never freed in the current code. Add
perf_evsel__free_counts() function to free it properly.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch adds a new printing mode for perf stat: interval
printing. perf stat can now print event deltas at regular
time intervals. This is useful to detect phases in programs.
The -I option enables interval printing. It expects an interval duration
in milliseconds. The minimum is 100ms. Once activated, perf stat prints
event deltas since the last printout. All modes are supported.
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
# time counts events
1.000109853 2,388,560,546 cycles
2.000262846 2,393,332,358 cycles
3.000354131 2,393,176,537 cycles
4.000439503 2,393,203,790 cycles
5.000527075 2,393,167,675 cycles
6.000609052 2,393,203,670 cycles
7.000691082 2,393,175,678 cycles
The output format makes it easy to feed into a plotting program such as
gnuplot when the -I option is used in combination with the -x option:
$ perf stat -x, -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000084113,2378775498,cycles
2.000245798,2391056897,cycles
3.000354445,2392089414,cycles
4.000459115,2390936603,cycles
5.000565341,2392108173,cycles
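As an example of consuming this format, the following Python sketch parses the -x, output shown above into (time, count, event) tuples. It assumes exactly the three fields seen in this example and skips anything else, such as the workload's own output.

```python
def parse_interval_csv(lines, sep=","):
    """Parse 'time,count,event' lines; skip lines that do not fit."""
    rows = []
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) != 3:
            continue  # e.g. output printed by the workload itself
        try:
            rows.append((float(fields[0]), int(fields[1]), fields[2]))
        except ValueError:
            continue  # non-numeric time or count
    return rows
```

The resulting tuples can be handed to any plotting tool.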
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
That consolidates the error messages in 'record', 'stat' and 'top', that
now get a consistent set of messages and allow other tools to use the
new method to report problems using whatever UI toolkit.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Instead of doing it in stat, top, record or any other tool that opens
event descriptors.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Convert perf_evsel__is_group_member to perf_evsel__is_group_leader.
This is because most use cases use the negated form to check
whether the given evsel is a leader or not, and it's IMHO somewhat
ambiguous: the leader also *is* a member of the group.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To clarify what is being tested, instead of assuming that evsel->leader
== NULL means either an 'isolated' evsel or a 'group leader'.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix event attributes for groups defined via '{}'.
Currently the 'enable_on_exec' attribute in the record command and both the
'disabled' and 'enable_on_exec' attributes in the stat command are set
based on the 'group' option. This prevents proper setup of '{}'
defined groups, as they don't set the 'group' option.
Base the values of the above attributes on 'evsel->leader', as this is
common to both kinds of group definition.
Move the perf_evlist__set_leader call within builtin-record ahead of the
perf_evlist__config_attrs call, because the latter needs possible group
leader links in place.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In order to measure kernel builds, one has to do some pre/post cleanup
work in order to do the repeat build.
So provide --pre and --post command hooks to allow doing just that.
perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \
-- make -s -j64 O=defconfig-build/ bzImage
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/1350992414.13456.5.camel@twins
[ committer note: Added respective entries in Documentation/perf-stat.txt ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Some variables were global but used in just one function, so move them to
where they belong.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Because that is what it really does, i.e. it applies the filters that
were parsed from the command line and stashed into the evsels they refer
to.
We'll need the set_filter method name to actually apply a filter to all
the evsels in an evlist, for instance, to ask that a syswide tracer
doesn't trace itself.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If the user doesn't explicitly specify a CPU list, perf-stat only collects
events on CPUs listed in the PMU cpumask file.
Signed-off-by: "Yah, Zheng" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Then, the code can be shared between kvm events and perf stat.
Signed-off-by: Xiao Guangrong <[email protected]>
[ Dong Hao <[email protected]>: rebase it on acme's git tree ]
Signed-off-by: Dong Hao <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Marcelo Tosatti <[email protected]>
Cc: Runzhen Wang <[email protected]>
Cc: Xiao Guangrong <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf defines both __used and __unused variables to use for marking
unused variables. The variable __used is defined to
__attribute__((__unused__)), which contradicts the kernel definition to
__attribute__((__used__)) for new gcc versions. On Android, __used is
also defined in system headers and this leads to warnings like: warning:
'__used__' attribute ignored
__unused is not defined in the kernel and is not a standard definition.
If __unused is included everywhere instead of __used, this leads to
conflicts with glibc headers, since glibc has variables with this name
in its headers.
The best approach is to use __maybe_unused, the definition used in the
kernel for __attribute__((unused)). In this way there is only one
definition in perf sources (instead of 2 definitions that point to the
same thing: __used and __unused) and it works on both Linux and Android.
This patch simply replaces all instances of __used and __unused with
__maybe_unused.
Signed-off-by: Irina Tirdea <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ committer note: fixed up conflict with a116e05 in builtin-sched.c ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Allows perf to clean up properly on program termination.
Signed-off-by: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To replace the longer list_entry constructs for things that are widely
used:
perf_evlist__{first,last}(evlist)
perf_evsel__next(evsel)
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Just like was done for parse_events__set_leader.
Also we need to have the list_entry set_leader method in evlist.c so that we
don't grow another dep in the python binding:
# ~acme/git/linux/tools/perf/python/twatch.py
Traceback (most recent call last):
File "/home/acme/git/linux/tools/perf/python/twatch.py", line 16, in <module>
import perf
ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: parse_events__set_leader
And also remove a pr_debug from evsel.c so that we avoid this one too:
# ~acme/git/linux/tools/perf/python/twatch.py
Traceback (most recent call last):
File "/home/acme/git/linux/tools/perf/python/twatch.py", line 16, in <module>
import perf
ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: eprintf
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch adds functionality that allows creating event groups
based on the way they are specified on the command line. It extends
the '{}' group syntax introduced in an earlier patch.
The current '--group/-g' option behaviour remains intact. If you
specify it for record/stat/top command, all the specified events
become members of a single group with the first event as a group
leader.
With the new '{}' group syntax you can create a group like:
# perf record -e '{cycles,faults}' ls
resulting in a single event group containing the 'cycles' and 'faults'
events, with the cycles event as group leader.
All groups are created with regard to threads and cpus. Thus
recording an event group for 2 threads on a server with
4 CPUs will create 8 separate groups.
Examples (first event in brackets is group leader):
# 1 group (cpu-clock,task-clock)
perf record --group -e cpu-clock,task-clock ls
perf record -e '{cpu-clock,task-clock}' ls
# 2 groups (cpu-clock,task-clock) (minor-faults,major-faults)
perf record -e '{cpu-clock,task-clock},{minor-faults,major-faults}' ls
# 1 group (cpu-clock,task-clock,minor-faults,major-faults)
perf record --group -e cpu-clock,task-clock -e minor-faults,major-faults ls
perf record -e '{cpu-clock,task-clock,minor-faults,major-faults}' ls
# 2 groups (cpu-clock,task-clock) (minor-faults,major-faults)
perf record -e '{cpu-clock,task-clock}' -e '{minor-faults,major-faults}' \
-e instructions ls
# 1 group
# (cpu-clock,task-clock,minor-faults,major-faults,instructions)
perf record --group -e cpu-clock,task-clock \
-e minor-faults,major-faults -e instructions ls
perf record -e '{cpu-clock,task-clock,minor-faults,major-faults,instructions}' ls
It's possible to use a standard event modifier for a group. It spans
all events in the group and updates each event's modifier settings,
for example:
# perf record -e '{faults:k,cache-references}:p'
resulting in the ':kp' modifier being used for the 'faults' event and the
':p' modifier being used for the 'cache-references' event.
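How a group modifier combines with per-event modifiers can be sketched in Python. This toy parser is an assumption for illustration only; it handles a single '{}' group of plain event names and is not perf's event parser.

```python
def expand_group(spec):
    """'{faults:k,cache-references}:p' -> [('faults', 'kp'), ('cache-references', 'p')]

    The first event in the returned list is the group leader.
    """
    body, _, group_mod = spec.partition("}")
    group_mod = group_mod.lstrip(":")
    events = []
    for item in body.lstrip("{").split(","):
        name, _, mod = item.partition(":")
        # The group modifier is appended to whatever the event already has.
        events.append((name, mod + group_mod))
    return events
```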
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ulrich Drepper <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf improvements from Arnaldo Carvalho de Melo:
* Replace event_name with perf_evsel__name, that handles the event
modifiers and doesn't use static variables.
* GTK browser improvements, from Namhyung Kim
* Fix possible NULL pointer deref in the TUI annotate browser, from
Samuel Liao
* Add sort by source file:line number, using addr2line.
* Allow printing histogram text snapshots at any point in top/report.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
So that we don't use global variables that could make us misreport event
names when having a multi window top, for instance.
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The following commit:
commit 56f3bae70638b33477a6015fd362ccfe354fd3ee
Author: Jim Cromie <[email protected]>
Date: Wed Sep 7 17:14:00 2011 -0600
perf stat: Add --log-fd <N> option to redirect stderr elsewhere
introduced a bug in the way perf stat outputs the results by default,
i.e., without the --log-fd or --output option. It would default to
writing to file descriptor 0, i.e., stdin. Writing to stdin is allowed
and is equivalent to writing to stdout. However, there is a major
difference for any script that was already capturing the output of perf
stat via redirection:
perf stat >/tmp/log .... or perf stat 2>/tmp/log ....
They would not capture anything anymore. They would have to do:
perf stat 0>/tmp/log ...
This breaks compatibility with existing scripts and does not look very
natural.
This patch fixes the problem by looking at output_fd only when it was
modified by the user (> 0). It also checks that the value is positive.
Passing --log-fd 0 is ignored.
I would also argue that defaulting to stderr for the results is not the
right thing to do, though this patch does not address this specific
issue.
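A minimal sketch of the selection rule described above, assuming a
hypothetical helper select_output() and a caller-supplied default stream
(neither name comes from the patch itself):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: honour output_fd only when the user set it to a
 * positive value. An unset descriptor, or an explicit 0, falls back to
 * the default stream, so existing "2>/tmp/log" redirections keep working.
 * Returns NULL if fdopen() fails on a user-supplied descriptor.
 */
static FILE *select_output(int output_fd, FILE *dflt)
{
	assert(dflt != NULL);

	if (output_fd <= 0)
		return dflt;

	return fdopen(output_fd, "w");
}
```

A caller would pass stderr as the default, e.g. select_output(fd, stderr),
and treat a NULL return as an error.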
Signed-off-by: Stephane Eranian <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jim Cromie <[email protected]>
Link: http://lkml.kernel.org/r/20120515111111.GA9870@quad
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When no event is specified the tools use perf_evlist__add_default(), that will
call event_attr_init to initialize the KVM exclusion bits.
When the change was made to the tools so that by default guest samples would be
excluded, the changes were made just to the parsing routines and to
perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far
just by perf stat to add multiple events, according to the level of detail
specified.
Recently the tools were changed to reconstruct the event name from all the
details in perf_event_attr, not just from .type and .config, but taking into
account all the feature bits (.exclude_{guest,host,user,kernel,etc},
.precise_ip, etc).
That is when we noticed that the default for perf stat wasn't the one for the
rest of the tools, i.e. the .exclude_guest bit wasn't being set.
I.e. the default, which doesn't call event_attr_init, was showing the :HG
modifier:
$ perf stat usleep 1
Performance counter stats for 'usleep 1':
0.942119 task-clock # 0.454 CPUs utilized
1 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 K/sec
126 page-faults # 0.134 M/sec
693,193 cycles:HG # 0.736 GHz [40.11%]
407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%]
365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle
465,982 instructions:HG # 0.67 insns per cycle
# 0.87 stalled cycles per insn
89,760 branches:HG # 95.275 M/sec
6,178 branch-misses:HG # 6.88% of all branches
0.002077228 seconds time elapsed
While if one explicitly specifies the same events, the parsing code is
called and thus event_attr_init is called:
$ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1
Performance counter stats for 'usleep 1':
1.040349 task-clock # 0.500 CPUs utilized
2 context-switches # 0.002 M/sec
0 CPU-migrations # 0.000 K/sec
127 page-faults # 0.122 M/sec
587,966 cycles # 0.565 GHz [13.18%]
459,167 stalled-cycles-frontend # 78.09% frontend cycles idle
390,249 stalled-cycles-backend # 66.37% backend cycles idle
504,006 instructions # 0.86 insns per cycle
# 0.91 stalled cycles per insn
96,455 branches # 92.714 M/sec
6,522 branch-misses # 6.76% of all branches [96.12%]
0.002078681 seconds time elapsed
Fix it by introducing a perf_evlist__add_default_attrs method that will call
event_attr_init in all the perf_event_attr entries before adding the events.
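The shape of that fix can be sketched as below. The struct and both
functions are simplified, hypothetical stand-ins (struct attr_sketch is not
perf_event_attr, and the init step here only sets the guest exclusion bit
discussed above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct perf_event_attr. */
struct attr_sketch {
	unsigned int type;
	unsigned long long config;
	bool exclude_guest;
};

/* Stand-in for the init step: apply the tool-wide default. */
static void attr_init_sketch(struct attr_sketch *attr)
{
	attr->exclude_guest = true;
}

/*
 * Initialize every entry before the events are added, so that the
 * default event list gets the same bits as explicitly parsed events.
 * Returns the number of entries prepared.
 */
static size_t add_default_attrs_sketch(struct attr_sketch *attrs, size_t nr)
{
	size_t i;

	assert(attrs != NULL || nr == 0);

	for (i = 0; i < nr; i++)
		attr_init_sketch(&attrs[i]);

	/* A real implementation would now add the events to the list. */
	return nr;
}
```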
Reported-by: Ingo Molnar <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Merge reason: We are going to queue up a dependent patch:
"perf tools: Move parse event automated tests to separated object"
That depends on:
commit e7c72d8
perf tools: Add 'G' and 'H' modifiers to event parsing
Conflicts:
tools/perf/builtin-stat.c
Conflicted with the recent 'perf_target' patches when checking the
result of perf_evsel open routines to see if a retry is needed to cope
with older kernels where the exclude guest/host perf_event_attr bits
were not used.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Rename perf_target__no_{cpu,task} to perf_target__has_{cpu,task} because
it's more intuitive and easy to parse (for human beings) when used with
negation.
The names came from David Ahern. It is intended to be a
mechanical substitution without any functional change.
The perf_target__none remains unchanged since I couldn't find a right
name and it is hardly used with negation.
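To illustrate why the positive names read better under negation, here is a
minimal sketch. struct target_sketch and its fields are assumptions made
for this example and are not copied from the perf sources:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified target description. */
struct target_sketch {
	const char *pid;
	const char *tid;
	const char *uid_str;
	const char *cpu_list;
	bool system_wide;
};

/* True when a task-like target (pid, tid or uid) was given. */
static bool target_has_task(const struct target_sketch *t)
{
	assert(t != NULL);
	return t->pid || t->tid || t->uid_str;
}

/* True when a cpu-like target (system wide or a cpu list) was given. */
static bool target_has_cpu(const struct target_sketch *t)
{
	assert(t != NULL);
	return t->system_wide || t->cpu_list;
}

/* True when neither kind of target was given. */
static bool target_none(const struct target_sketch *t)
{
	return !target_has_task(t) && !target_has_cpu(t);
}
```

A check such as "if (!target_has_task(&t))" reads directly as "no task was
given", whereas the old spelling would need a double negative to express
"a task was given".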
Signed-off-by: Namhyung Kim <[email protected]>
Suggested-by: David Ahern <[email protected]>
Suggested-by: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf stat on PPC currently fails to run:
$ perf stat -- sleep 1
Error: open_counter returned with 6 (No such device or address). /bin/dmesg may provide additional information.
Fatal: Not all events could be opened.
The problem is that until 2.6.37 (behavior changed with commit b0a873e)
perf on PPC returns ENXIO when hw_perf_event_init() fails. With this
patch we get the expected behavior:
$ perf stat -v -- sleep 1
cycles event is not supported by the kernel.
stalled-cycles-frontend event is not supported by the kernel.
stalled-cycles-backend event is not supported by the kernel.
instructions event is not supported by the kernel.
branches event is not supported by the kernel.
branch-misses event is not supported by the kernel.
...
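The error classification behind this behavior can be sketched as follows.
The helper name is hypothetical, and apart from ENXIO, which is the case
this patch is about, the errno values listed are assumptions about which
failures a tool would treat as "event not supported":

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: decide whether a failed counter open means
 * "this event is not supported here" (skip it and carry on) rather than
 * a fatal error. ENXIO covers older PPC kernels; the other values are
 * assumptions made for this sketch.
 */
static bool open_error_means_unsupported(int err)
{
	assert(err > 0); /* errno values are positive */

	return err == EINVAL || err == ENOSYS || err == ENOENT ||
	       err == EOPNOTSUPP || err == ENXIO;
}
```

A caller would check this right after the open fails, print the "event is
not supported by the kernel" message in verbose mode, and continue with
the next counter.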
Signed-off-by: David Ahern <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use the same function as perf record and top, to share the code that checks
combinations of different switches.
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
There are places that check whether a target task/cpu is given or not, and
some of them didn't check the newly introduced uid or cpu list. Add and use
three helper functions to treat them properly.
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|