Age | Commit message (Collapse) | Author | Files | Lines |
|
To write config items to a particular config file, we should know where
is each config section and item from.
Current setting functionality of perf-config use autogenerating way by
overwriting collected config items to a config file.
For example, when collecting config items from user and system config
files (i.e. ~/.perfconfig and $(sysconf)/perfconfig), perf_config_set
can contain both user and system config items. So we should know where
each value is from to avoid merging user and system config items on user
config file.
Signed-off-by: Taeung Song <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Nambong Ha <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wookje Kwon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add setting feature that can add config variables with their values to a
config file (i.e. user or system config file) or modify config key-value
pairs in a config file. For the syntax examples:
perf config [<file-option>] [section.name[=value] ...]
e.g. You can set the ui.show-headers to false with
# perf config ui.show-headers=false
If you want to add or modify several config items, you can do like
# perf config annotate.show_nr_jumps=false kmem.default=slab
Committer notes:
Testing it:
$ perf config -l
top.children=true
report.children=false
$
$ perf config top.children=false
$ perf config -l
top.children=false
report.children=false
$
$ perf config kmem.default=slab
$ perf config -l
top.children=false
report.children=false
kmem.default=slab
$
Signed-off-by: Taeung Song <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Nambong Ha <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wookje Kwon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Combined patch with docs update with this one ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
You can show the values for several config items as below:
# perf config report.queue-size call-graph.record-mode
but it is necessary to more precisely check arguments, before passing
them to show_spec_config(). This validation function would be also used
when parsing config key-value pairs arguments in the near future.
Committer notes:
Testing it:
$ perf config bla.
The config variable does not contain a variable name: bla.
$ perf config .bla
The config variable does not contain a section name: .bla
$ perf config bla.bla
$
Signed-off-by: Taeung Song <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Nambong Ha <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wookje Kwon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Fix some spelling errors ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a functionality getting specific config key-value pairs.
For the syntax examples,
perf config [<file-option>] [section.name ...]
e.g. To query config items 'report.queue-size' and 'report.children', do
# perf config report.queue-size report.children
Signed-off-by: Taeung Song <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Nambong Ha <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wookje Kwon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Combined patch with docs update with this one ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now when jvmti compilation is plugged into Makefile.perf, there's no
need for this makefile.
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Stephane Eranian <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: William Cohen <[email protected]>
Link: http://lkml.kernel.org/r/20161112121016.GA17194@krava
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Compile jvmti agent as part of the perf build. The agent library is
called libperf-jvmti.so and is installed in default place together with
other files:
$ make libperf-jvmti.so
BUILD: Doing 'make -j4' parallel build
...
CC jvmti/libjvmti.o
CC jvmti/jvmti_agent.o
LD jvmti/jvmti-in.o
LINK libperf-jvmti.so
$ make DESTDIR=/tmp/krava/ install-bin
...
$ find /tmp/krava/ | grep libperf
/tmp/krava/lib64/libperf-jvmti.so
/tmp/krava/lib64/libperf-gtk.so
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Stephane Eranian <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: William Cohen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since the unprivileged sched switch event was added in perf, PT doesn't
need need perf_event_paranoid=-1 anymore for per cpu decoding.
Add a note stating that that is only needed for kernels < 4.2.
Reported-by: Andi Kleen <[email protected]>
Report-Link: http://lkml.kernel.org/r/http://lkml.kernel.org/n/[email protected]
Acked-by: Adrian Hunter <[email protected]>
Fixes: 45ac1403f564 ("perf: Add PERF_RECORD_SWITCH to indicate context switches")
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Markus reported that there's a weird behavior on perf top --hierarchy
regarding the column length.
Looking at the code, I found a dubious code which affects the symptoms.
When --hierarchy option is used, the last column length might be
inaccurate since it skips to update the length on leaf entries.
I cannot remember why it did and looks like a leftover from previous
version during the development.
Anyway, updating the column length often is not harmful. So let's move
the code out.
Reported-and-Tested-by: Markus Trippelsdorf <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 1a3906a7e6b9 ("perf hists: Resort hist entries with hierarchy")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When horizontall scrolling is used in hierarchy mode, the the right most
column has unnecessary indentation. Actually it's needed only if some
of left (overhead) columns were shown.
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Markus Trippelsdorf <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When horizontal scrolling is used in hierarchy mode, the folded signed
disappears at the right most column.
Committer note:
To test it, run 'perf top --hierarchy, see the '+' symbol at the first
column, then press the right arrow key, the '+' symbol will disappear,
this patch fixes that.
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Markus Trippelsdorf <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Move 'width -= 2' invariant to right after the if/else ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It should indent 2 spaces for folded sign and a whitespace.
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Markus Trippelsdorf <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The perf report/top on TUI supports horizontal scrolling using LEFT and
RIGHT keys.
But it calculate the number of columns incorrectly when hierarchy mode
is enabled so that keep pressing RIGHT key can make the output
disappeared.
In the hierarchy mode, all sort keys are collapsed into a single column,
so it needs to be applied when calculating column numbers.
Reported-and-Tested-by: Markus Trippelsdorf <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since 841e3558b2d ("perf callchain: Recording 'dwarf' callchains do not
need DWARF unwinding support"), --call-graph dwarf is allowed in 'perf
record' even without unwind support. A couple of other places don't
reflect this yet though: the help text should list dwarf as a valid
record mode and the dump_size config should be respected too.
Signed-off-by: Rabin Vincent <[email protected]>
Cc: He Kuang <[email protected]>
Fixes: 841e3558b2de ("perf callchain: Recording 'dwarf' callchains do not need DWARF unwinding support")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In ac12f6764c50 ("perf tools: Implement branch_type event parameter") we
started using the parse_branch_str() function from one of the files used
in the python binding, which caused this entry in 'perf test' to fail:
# perf test -v python
16: Try 'import perf' in python, checking link problems :
--- start ---
test child forked, pid 16667
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /tmp/build/perf/python/perf.so: undefined symbol:
parse_branch_str
test child finished with -1
---- end ----
Try 'import perf' in python, checking link problems: FAILED!
#
I must've commited some mistake when running 'perf test' to send the
pull request for the perf-core-for-mingo-20161024 tag, to have let this
regression to pass, sigh.
Just add tools/perf/util/parse-branch-options.c and switch from using
ui__warning(), that is not available in the python binding, use
pr_warning() instead, which is good enough for this case.
Now:
# perf test python
16: Try 'import perf' in python, checking link problems : Ok
#
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Andi Kleen <[email protected]>
Fixes: ac12f6764c50 ("perf tools: Implement branch_type event parameter")
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Removing one more set of die() calls.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Both register_perl_scripting() and register_python_scripting() allocate
this variable, fix it by checking if it already was.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Wang Nan <[email protected]>
Fixes: 7e4b21b84c43 ("perf/scripts: Add Python scripting engine")
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Introduced in commit f9afc6197e9b ("x86: Wire up protection keys system
calls")
This will make 'perf trace' aware of them on x86_64.
Cc: Adrian Hunter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Ignore export.h and EXPORT_SYMBOL in:
784d5699eddc ("x86: move exports to actual definitions")
We're not dragging this stuff, not useful in tools/
This silences the following warnings while building perf:
Warning: tools/arch/x86/lib/memcpy_64.S differs from kernel
Warning: tools/arch/x86/lib/memset_64.S differs from kernel
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add support in perf list topic to only show events belonging to a
specific vendor events topic. For example the following works now:
% perf list frontend
List of pre-defined events (to be used in -e):
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
stalled-cycles-frontend OR cpu/stalled-cycles-frontend/ [Kernel PMU event]
frontend:
dsb2mite_switches.count
[Decode Stream Buffer (DSB)-to-MITE switches]
dsb2mite_switches.penalty_cycles
[Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles]
dsb_fill.exceed_dsb_lines
[Cycles when Decode Stream Buffer (DSB) fill encounter more than 3 Decode Stream Buffer (DSB)
lines]
icache.hit
[Number of Instruction Cache, Streaming Buffer and Victim Cache Reads. both cacheable and
noncacheable, including UC fetches]
...
Signed-off-by: Andi Kleen <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Joonwoo reported that there's a mismatch between timestamps in script
and sched commands. This was because of difference in printing the
timestamp. Factor out the code and share it so that they can be in
sync. Also I found that sched map has similar problem, fix it too.
Committer notes:
Fixed the max_lat_at bug introduced by Namhyung's original patch, as
pointed out by Joonwoo, and made it a function following the scnprintf()
model, i.e. returning the number of bytes formatted, and receiving as
the first parameter the object from where the data to the formatting is
obtained, renaming it from:
char *timestamp_in_usec(char *bf, size_t size, u64 timestamp)
to
int timestamp__scnprintf_usec(u64 timestamp, char *bf, size_t size)
Reported-by: Joonwoo Park <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
I'd like to see the name of tasks with perf sched map, but it only shows
name of new tasks and then use short names after all. This is not good
for long running tasks since it's hard for users to track the short
names. This patch makes it show the names (except the idle task) when
-v option is used. Probably we may make it as default behavior.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Applying cpu color always doesn't help readability IMHO. Instead it
might be better to applying the color when there's an activity on those
CPUs.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The -i and -v options can be used in subcommands so enable cascading the
sched_options. This fixes the following inconvenience in 'perf sched':
$ perf sched -i perf.data.sched map
... (it works well) ...
$ perf sched map -i perf.data.sched
Error: unknown switch `i'
Usage: perf sched map [<options>]
--color-cpus <cpus>
highlight given CPUs in map
--color-pids <pids>
highlight given pids in map
--compact map output in compact mode
--cpus <cpus> display given CPUs in map
With this patch, the second command line works with the perf.data.sched
data file.
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The perf report/top on TUI supports horizontal scrolling using LEFT and
RIGHT keys.
But it calculate the number of columns incorrectly when hierarchy mode
is enabled so that keep pressing RIGHT key can make the output
disappeared.
In the hierarchy mode, all sort keys are collapsed into a single column,
so it needs to be applied when calculating column numbers.
Reported-and-Tested-by: Markus Trippelsdorf <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This gets rid of oddities such as:
perf bench futex hash -t -4
perf: calloc: Cannot allocate memory
Runtime (and many more) are equally busted, i.e. run for bogus amounts of
time. Just use the abs, instead of, for example errorring out.
Committer note:
After the patch:
$ perf bench futex hash -t -4
# Running 'futex/hash' benchmark:
Run summary [PID 10178]: 4 threads, each operating on 1024 [private] futexes for 10 secs.
[thread 0] futexes: 0x34f9fa0 ... 0x34faf9c [ 4702208 ops/sec ]
[thread 1] futexes: 0x34fb140 ... 0x34fc13c [ 4707020 ops/sec ]
[thread 2] futexes: 0x34fc2e0 ... 0x34fd2dc [ 4711526 ops/sec ]
[thread 3] futexes: 0x34fd480 ... 0x34fe47c [ 4709683 ops/sec ]
Averaged 4707609 operations/sec (+- 0.04%), total secs = 10
$
Signed-off-by: Davidlohr Bueso <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Sebastian noted that overhead for worker thread ops (throughput)
accounting was producing 'perf' to appear in the profiles, consuming a
non-trivial (i.e. 13%) amount of CPU.
This is due to cacheline bouncing due to the increment of w->ops.
We can easily fix this by just working on a local copy and updating the
actual worker once done running, and ready to show the program summary.
There is no danger of the worker being concurrent, so we can trust that
no stale value is being seen by another thread.
This also gets rid of the unnecessary cache alignment hack; its not
worth it.
Reported-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Printing the full path of the selected link is obviously not needed,
hence removing.
Signed-off-by: Mathieu Poirier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Make the 'perf list' glob matching for vendor events case insensitive.
This allows to use the upper case vendor events with perf list too.
Now the following works:
% perf list LONGEST_LAT
...
cache:
longest_lat_cache.miss
[Core-originated cacheable demand requests missed LLC]
longest_lat_cache.reference
[Core-originated cacheable demand requests that refer to LLC]
Signed-off-by: Andi Kleen <[email protected]>
Suggested-by: Ingo Molnar <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Instead of the one when another syscall takes place while another is being
processed (in another CPU, but we show it serialized, so need to "interrupt"
the other), and also when finally showing the sys_enter + sys_exit + duration,
where we were showing the sample->time for the sys_exit, duh.
Before:
# perf trace sleep 1
<SNIP>
0.373 ( 0.001 ms): close(fd: 3 ) = 0
1000.626 (1000.211 ms): nanosleep(rqtp: 0x7ffd6ddddfb0) = 0
1000.653 ( 0.003 ms): close(fd: 1 ) = 0
1000.657 ( 0.002 ms): close(fd: 2 ) = 0
1000.667 ( 0.000 ms): exit_group( )
#
After:
# perf trace sleep 1
<SNIP>
0.336 ( 0.001 ms): close(fd: 3 ) = 0
0.373 (1000.086 ms): nanosleep(rqtp: 0x7ffe303e9550) = 0
1000.481 ( 0.002 ms): close(fd: 1 ) = 0
1000.485 ( 0.001 ms): close(fd: 2 ) = 0
1000.494 ( 0.000 ms): exit_group( )
[root@jouet linux]#
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Fixes: 752fde44fd1c ("perf trace: Support interrupted syscalls")
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Not used at all, we need just the entry_time to calculate the syscall
duration.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It popped up in perf testing that the worker consumes some amount of
CPU. It boils down to the increment of `ops` which causes cache line
bouncing between the individual threads.
This patch aligns the struct by 256 bytes to ensure that not a cache
line is shared among CPUs. 128 byte is the x86 worst case and grep says
that L1_CACHE_SHIFT is set to 8 on s390.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We already have handling for errors when processing PERF_RECORD_ events,
so instead of calling die() when not being able to alloc, propagate the
error, so that the normal UI exit sequence can take place, the user be
warned and possibly the terminal be properly reset to a sane mode.
Cc: Adrian Hunter <[email protected]>
Cc: Brice Goglin <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It already returns whatever strbuf_(grow|addch)() returns in case of
failure, so just return -ENOSPC in the only case where it was die()ing.
When it returns, its only caller will call die() anyway, so no need to
be so eager, die later.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Instead of having all tests perform alloc/free, do it in the code that
calls the do_cycles() and do_gettimeofday() functions.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Hitoshi Mitake <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In the perf wiki todo-list[1], there is an entry regarding initial-delay
and 'perf trace'; the following small patch tries to fulfill this point.
It has been generated against the branch tip/perf/core.
It has only been implemented in the "trace__run" case.
Ex.:
$ sudo strace -- ./perf trace --delay 5 sleep 1 2>&1
...
fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
ioctl(7, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0
ioctl(11, PERF_EVENT_IOC_SET_OUTPUT, 0x7) = 0
fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
ioctl(11, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0
write(6, "\0", 1) = 1
close(6) = 0
nanosleep({0, 5000000}, NULL) = 0 # DELAY OF 5 MS BEFORE ENABLING THE EVENTS
ioctl(3, PERF_EVENT_IOC_ENABLE, 0) = 0
ioctl(4, PERF_EVENT_IOC_ENABLE, 0) = 0
ioctl(5, PERF_EVENT_IOC_ENABLE, 0) = 0
ioctl(7, PERF_EVENT_IOC_ENABLE, 0) = 0
...
[1]: https://perf.wiki.kernel.org/index.php/Todo
Signed-off-by: Alexis Berlemont <[email protected]>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Add entry to the manpage, cut'n'pasted from stat's and record's ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Here is a small patch which tries to fulfill a point in the perf todo
list:
* Make pressing 'V' multiple times to go on cycling thru various
verbosity levels in 'perf top', so that info that is present in
'perf top -v' can be obtained without having to restart the tool
(acme).
After a small grep in the code, the max verbosity level seems 3; so,
we cycle at 4; I did not dare define a MAX_VERBOSE_LEVEL constant.
Signed-off-by: Alexis Berlemont <[email protected]>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The latter version occurs much more when running git grep.
Signed-off-by: Alexander Alemayhu <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
With uncore event aliases which are duplicated over multiple PMUs the
"Using CPUID" message with -v could be printed many times. Only print
it once.
Signed-off-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch adds a formal specification of the jitdump format. The goal
is to help jit runtime developers implement the jitdump support without
having to read the jvmti code.
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Check the version number when opening a jitdump file. Accept older
versions, but not newer ones.
Signed-off-by: Stefano Sanfilippo <[email protected]>
Signed-off-by: Ross McIlroy <[email protected]>
Reviewed-by: Stephane Eranian <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When the jit_buf_desc contains unwinding information, it is emitted as
eh_frame unwinding sections in the DSOs generated by perf inject.
The unwinding information is required to unwind of JITed code which do
not maintain the frame pointer register during function calls. It can
be emitted by V8 / Chromium when the --perf_prof_unwinding_info is
passed to V8.
The eh_frame and eh_frame_hdr sections are emitted immediately after the
.text.
The .eh_frame is aligned at a 8-byte boundary, and .eh_frame_hdr at a
4-byte one. Since size of the .eh_frame is required to be a multiple of
the word size, which means there will never be additional padding
between it and the .eh_frame_hdr on machines where the word size is 4 or
8 bytes.
However, additional padding might be inserted between .text and
.eh_frame to reach the correct alignment, which will always be 8 bytes,
also on 32bit machines. The reasoning behind this choice is that 4 extra
bytes of padding worst case are not a large cost for the advantage of
removing word-size dependent offset calculations when emitting the
jitdump.
Signed-off-by: Stefano Sanfilippo <[email protected]>
Signed-off-by: Ross McIlroy <[email protected]>
Reviewed-by: Stephane Eranian <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This record is intended to provide unwinding information in the
eh_frame format. This is required to unwind JITed code which
does not maintain the frame pointer register during function calls.
The eh_frame unwinding information can be emitted by V8 / Chromium
when the --perf_prof_unwinding_info is passed.
A record of type jr_code_unwinding_info comes before the jr_code_load
it referred to and contains both the .eh_frame and .eh_frame_hdr.
The fields in the header have the following meaning:
* unwinding_size: size of the eh_frame and eh_frame_hdr, necessary
for distinguishing the content from the padding.
* eh_frame_hdr_size: as the name says.
* mapped_size: size of the payload that was in memory at runtime.
typically unwinding_size if the .eh_frame_hdr and .eh_frame were
mapped, or 0 if they weren't. It should always be the former case,
since the .eh_frame is guaranteed to be mapped in memory. However,
certain JITs might want to inject an .eh_frame_hdr with an empty LUT
to trigger fp-based unwinding fallback in libunwind. The only part
of the .eh_frame_hdr that libunwind reads from remote memory is the
LUT, and since there is none, mapping the unwinding info in memory
is not necessary, and 0 in this field signifies that it wasn't.
This practical hack allows to save bytes in code memory for those
JIT compilers that might or might not maintain a valid frame pointer.
The payload that follows is assumed to contain first the .eh_frame and
then the .eh_header_hdr, with no padding between the two.
Signed-off-by: Stefano Sanfilippo <[email protected]>
Signed-off-by: Ross McIlroy <[email protected]>
Reviewed-by: Stephane Eranian <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When calculating .eh_frame_hdr base and LUT offsets do not always assume
that pgoff is zero.
The assumption is false for DSOs built from the jitdump by perf inject,
because the ELF header did not exist in memory at sampling time.
Signed-off-by: Stefano Sanfilippo <[email protected]>
Signed-off-by: Ross McIlroy <[email protected]>
Reviewed-by: Stephane Eranian <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The behavior before this commit was to skip the remaining portion of the
jitdump in case an unknown record was found, including those records
that perf could handle.
With this change, parsing a record with an unknown id will cause a
warning to be emitted, the record will be skipped and parsing will
resume from the next (valid) one.
The patch aims at making perf more future proof, by extracting as much
information as possible from jitdumps.
Signed-off-by: Stefano Sanfilippo <[email protected]>
Signed-off-by: Ross McIlroy <[email protected]>
Reviewed-by: Stephane Eranian <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch removes all the string padding generated in the jitdump file.
They are not necessary and were adding unnecessary complexity. Modern
processors can handle unaligned accesses quite well. The perf.data/
jitdump file are always post-processed, no need to add extra complexity
for no real gain.
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch modifies the build dependencies on the jitdump support in
perf. As it stands jitdump was wrongfully made dependent 100% on using
DWARF. However, the dwarf dependency, only exist if generating the
source line table in genelf_debug.c. The rest of the support does not
need DWARF.
This patch removes the dependency on DWARF for the entire jitdump
support. It keeps it only for the genelf_debug.c support.
Signed-off-by: Maciej Debski <[email protected]>
Reviewed-by: Stephane Eranian <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Fixes: e12b202f8fb9 ("perf jitdump: Build only on supported archs")
[ Make it build only if NO_LIBELF isn't defined, as jitdump.o will only be built in that case ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This patch improves the usefulness of error messages generated by the
JVMTI interfac.e This can help identify the root cause of a problem by
printing the actual error code. The patch adds a new helper function
called print_error().
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nilay Vaish <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Handle failure to convert numeric error to a string in print_error() ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Such as CentOS5, where such define is not present in elf.h.
This file, genelf.c, wasn't being built for several systems, because
it mistakenly was conditional on some DWARF features, now that it
is just needing libelf, after "perf jit: Enable jitdump support without
dwarf" it fails.
So, as preparation for "perf jit: Enable jitdump support without dwarf",
conditionally define it, if not available.
Cc: Adrian Hunter <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Maciej Debski <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When the loop body isn't executed at all, then the 'ret' local variable,
that is uninitialized will be used as the return value.
This triggers this error on Alpine Linux:
CC /tmp/build/perf/util/demangle-java.o
CC /tmp/build/perf/util/demangle-rust.o
CC /tmp/build/perf/util/jitdump.o
CC /tmp/build/perf/util/genelf.o
util/jitdump.c: In function 'jit_process':
util/jitdump.c:622:3: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
fprintf(stderr, "injected: %s (%d)\n", path, ret);
^
util/jitdump.c:584:6: note: 'ret' was declared here
int ret;
^
FLEX /tmp/build/perf/util/parse-events-flex.c
/ $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-alpine-linux-musl/5.3.0/lto-wrapper
Target: x86_64-alpine-linux-musl
Configured with: /home/buildozer/aports/main/gcc/src/gcc-5.3.0/configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info
+--build=x86_64-alpine-linux-musl --host=x86_64-alpine-linux-musl --target=x86_64-alpine-linux-musl --with-pkgversion='Alpine 5.3.0' --enable-checking=release
+--disable-fixed-point --disable-libstdcxx-pch --disable-multilib --disable-nls --disable-werror --disable-symvers --enable-__cxa_atexit --enable-esp
+--enable-cloog-backend --enable-languages=c,c++,objc,java,fortran,ada --disable-libssp --disable-libmudflap --disable-libsanitizer --enable-shared
+--enable-threads --enable-tls --with-system-zlib
Thread model: posix
gcc version 5.3.0 (Alpine 5.3.0)
But this so far got under the radar, not causing any build problem, till the
"perf jit: enable jitdump support without dwarf" gets applied, when the above
problem takes place, some combination of inlining or whatever, the problem
is real, so fix it by initializing the variable to zero.
Cc: Anton Blanchard <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Maciej Debski <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It can be useful to specify branch type state per event, for example if
we want to collect both software trace points and last branch PMU events
in a single collection. Currently this doesn't work because the software
trace point errors out with -b.
There was already a branch-type parameter to configure branch sample
types per event in the parser, but it was stubbed out. This patch
implements the necessary plumbing to actually enable it.
Now:
$ perf record -e sched:sched_switch,cpu/cpu-cycles,branch_type=any/ ...
works.
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|