So that one can, for instance, use it with wc -l:
# perf list *:*write* | wc -l
60
Or to look for the "bio" tracepoints, without 'perf list' headers:
# perf list *:*bio* | head
block:block_bio_backmerge [Tracepoint event]
block:block_bio_bounce [Tracepoint event]
block:block_bio_complete [Tracepoint event]
block:block_bio_frontmerge [Tracepoint event]
block:block_bio_queue [Tracepoint event]
block:block_bio_remap [Tracepoint event]
#
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf probe shows a more precise message when it finds that the given
%return target function is inlined.
Without this fix:
----
# ./perf probe -V getname_flags%return
Return probe must be on the head of a real function.
Debuginfo analysis failed.
Error: Failed to show vars.
----
With this fix:
----
# ./perf probe -V getname_flags%return
Failed to find "getname_flags%return",
because getname_flags is an inlined function and has no return point.
Debuginfo analysis failed.
Error: Failed to show vars.
----
Suggested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf probe --list will get a segfault if the first kprobe event is on a
module and the second or a later one is on the kernel.
e.g.
----
# ./perf probe -q -m pcspkr pcspkr_event
# ./perf probe -q vfs_read
# ./perf probe -l
Segmentation fault (core dumped)
----
This is because the debuginfo_cache fails to handle a NULL module name,
which causes a segfault in strcmp(). (Note that strcmp("something", NULL)
is undefined behavior and in practice crashes here.)
To fix this, debuginfo_cache__open() always translates a NULL module name
to "kernel". This is correct, because a NULL module name means opening the
debuginfo for the kernel.
----
# ./perf probe -l
probe:pcspkr_event (on pcspkr_event@drivers/input/misc/pcspkr.c
in pcspkr)
probe:vfs_read (on vfs_read@ksrc/linux-3/fs/read_write.c)
----
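The idea behind the fix can be sketched as follows. This is an illustrative Python model, not perf's C code: the class, the opener callback, and the returned strings are all hypothetical. The point it shows is that the module name is normalized before any comparison, so the cache key is always a real string.

```python
# Hypothetical model of a one-entry debuginfo cache; not perf's actual code.
KERNEL_NAME = "kernel"  # stand-in used when no module name is given


class DebuginfoCache:
    def __init__(self, opener):
        self._opener = opener  # callable that "opens" debuginfo for a name
        self._name = None      # name of the cached entry, if any
        self._dinfo = None

    def open(self, module=None):
        # Normalize first, so every comparison below sees a real string.
        name = KERNEL_NAME if module is None else module
        if self._name == name:
            return self._dinfo  # cache hit
        self._dinfo = self._opener(name)
        self._name = name
        return self._dinfo


opened = []  # records every real "open", to make cache misses visible
cache = DebuginfoCache(lambda name: opened.append(name) or f"dinfo:{name}")
```

Opening a module first and the kernel second, the sequence that used to crash, now produces two ordinary cache misses.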
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf probe always failed to find the appropriate line numbers because it
failed to find the .text start address offset from the debuginfo.
e.g.
----
# ./perf probe -m pcspkr pcspkr_event:5
Added new events:
probe:pcspkr_event (on pcspkr_event:5 in pcspkr)
probe:pcspkr_event_1 (on pcspkr_event:5 in pcspkr)
You can now use it in all perf tools, such as:
perf record -e probe:pcspkr_event_1 -aR sleep 1
# ./perf probe -l
Failed to find debug information for address ffffffffa031f006
Failed to find debug information for address ffffffffa031f016
probe:pcspkr_event (on pcspkr_event+6 in pcspkr)
probe:pcspkr_event_1 (on pcspkr_event+22 in pcspkr)
----
This fixes the above issue as follows:
1. Get the relative address of the symbol in .text by using
map->start.
2. Adjust the address by adding the offset of .text section
in the kernel module binary.
With this fix, perf probe -l shows lines correctly.
----
# ./perf probe -l
probe:pcspkr_event (on pcspkr_event:5@drivers/input/misc/pcspkr.c in pcspkr)
probe:pcspkr_event_1 (on pcspkr_event:5@drivers/input/misc/pcspkr.c in pcspkr)
----
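The two-step adjustment above amounts to simple address arithmetic. A minimal sketch in Python, assuming the map start and the .text section offset are already known; the function name and the example values for map_start and text_offset are hypothetical, only the runtime address is taken from the output above.

```python
def module_debuginfo_address(runtime_addr, map_start, text_offset):
    """Translate a runtime address inside a module to a debuginfo address."""
    relative = runtime_addr - map_start  # step 1: address relative to .text
    return relative + text_offset        # step 2: add the .text offset in the binary


# Hypothetical layout: the module's .text is mapped at map_start, and .text
# sits 0x40 bytes into the module binary.
addr = module_debuginfo_address(0xFFFFFFFFA031F006, 0xFFFFFFFFA031F000, 0x40)
```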
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fix a trivial bug in the libdwfl report session usage: perf should
explicitly begin and end a report session around dwfl_report_offline().
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Remove the dot suffix (e.g. .const, .isra) from the second and later
events, which have suffix numbers.
The previous commit 35a23ff928b0 ("perf probe: Cut off the gcc
optimization postfixes from function name") didn't handle
suffix-numbered events, so we get an error when we add additional
events on the same dot-suffixed function.
e.g.
----
# ./perf probe -f -a get_sigframe.isra.2.constprop.3 \
-a get_sigframe.isra.2.constprop.3
Failed to write event: Invalid argument
Error: Failed to add events.
----
This fixes the above issue, as shown below:
----
# ./perf probe -f -a get_sigframe.isra.2.constprop.3 \
-a get_sigframe.isra.2.constprop.3
Added new events:
probe:get_sigframe (on get_sigframe.isra.2.constprop.3)
probe:get_sigframe_1 (on get_sigframe.isra.2.constprop.3)
You can now use it in all perf tools, such as:
perf record -e probe:get_sigframe_1 -aR sleep 1
----
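The naming behaviour shown above can be modelled in a few lines. This is a Python sketch with hypothetical helper names, not the perf implementation: the dot suffix is cut before the numeric suffix is chosen, so later events on the same function get a valid name as well.

```python
def event_base_name(symbol):
    # Cut at the first '.', dropping suffixes such as .isra.2 or .constprop.3
    return symbol.split(".", 1)[0]


def unique_event_name(symbol, existing):
    """Return an event name for symbol that does not collide with existing."""
    base = event_base_name(symbol)
    if base not in existing:
        return base
    n = 1
    while f"{base}_{n}" in existing:
        n += 1
    return f"{base}_{n}"


existing = set()
names = []
for sym in ["get_sigframe.isra.2.constprop.3"] * 2:
    name = unique_event_name(sym, existing)
    existing.add(name)
    names.append(name)
```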
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Map 't', 'T' (text, local, global), 'w' and 'W' (weak text, local,
global) as STT_FUNC, and the rest as STT_OBJECT
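A minimal sketch of this mapping, written in Python for brevity (perf itself is C). The function name is hypothetical; STT_OBJECT = 1 and STT_FUNC = 2 are the values from the ELF specification.

```python
STT_OBJECT = 1  # ELF symbol type: data object
STT_FUNC = 2    # ELF symbol type: function

# kallsyms type letters treated as text: local/global text and weak symbols
_TEXT_LETTERS = frozenset("tTwW")


def kallsyms_letter_to_elf_type(letter):
    """Map a kallsyms type letter to an ELF symbol type."""
    return STT_FUNC if letter in _TEXT_LETTERS else STT_OBJECT
```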
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It is about binding, not type. We have just a letter in kallsyms that
should map both to the ELF type (STT_FUNC, etc) and to the ELF
symbol binding (STB_WEAK, STB_GLOBAL, etc), so rename it now before
introducing kallsyms2_elf_type().
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
And it is also a step in the direction of killing the separation of data
and text maps in map_groups.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In places where we were using its open coded equivalent.
Cc: Adrian Hunter <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The perf_regs.c file does not get built on PowerPC as CONFIG_PERF_REGS
is false. So the weak definition for 'sample_regs_masks' doesn't get
picked up.
Adding perf_regs.o to util/Build unconditionally exposes a redefinition
error for the 'perf_reg_value()' function (due to the static inline version
in util/perf_regs.h). So use '#ifdef HAVE_PERF_REGS_SUPPORT' around that
function.
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Dominik Dingel <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The --max-stack option was added as an optimization to reduce processing time,
so people specifying --max-stack might get an increased processing time if
combined with synthesized callchains, but otherwise no real harm.
A warning about setting both --max-stack and the synthesized callchains max
depth seems like overkill. Amend the documentation.
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Out of map_groups__find_symbol_by_name(), so that we can turn this latter
one first into a call to maps__find_symbol_by_name(MAP__FUNCTION) +
MAP__VARIABLE, and then into just one call, as we'll merge MAP__FUNCTION with
MAP__VARIABLE maps to simplify the code.
Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The error variable breaks build on CentOS 6.7, due to a collision with a
global error symbol:
CC util/parse-events.o
cc1: warnings being treated as errors
util/parse-events.c:419: error: declaration of ‘error’ shadows a global declaration
util/util.h:135: error: shadowed declaration is here
util/parse-events.c: In function ‘add_tracepoint_multi_event’:
...
Use different argument names instead to fix it.
Reported-by: Vinson Lee <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected]
Cc: Matt Fleming <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Raphael Beamonte <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Fix one more case, at line 770 ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The error variable breaks the build on CentOS 6.7, due to a collision with
the global error symbol:
CC util/evlist.o
cc1: warnings being treated as errors
In file included from util/evlist.c:28:
tools/include/linux/err.h: In function ‘ERR_PTR’:
tools/include/linux/err.h:34: error: declaration of ‘error’ shadows a global declaration
util/util.h:135: error: shadowed declaration is here
Use the 'error_' name instead to fix it.
Signed-off-by: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Reported-by: Vinson Lee <[email protected]>
[ Use 'error_' instead of 'err' to, visually, not diverge too much from include/linux/err.h ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next
Jonathan writes:
First round of new driver, new functionality and cleanups for IIO in the 4.4 cycle
New device support
* APDS9960 ALS + proximity driver
* bmg160 SPI devices.
* HDC100x humidity sensors
* Holt HI-8435 threshold detector
* mma8453Q accelerometer added to the mma8452 driver
* mma8652FC and mma8653FC accelerometers added to the mma8452 driver
* mxc4005 accelerometer
* PulsedLight LIDAR
* SensorTech VZ89x volatile organic compound sensor
* UPISEMI uS5182d ALS and proximity sensors
New core functionality
* triggered events - use triggers to check for changes in threshold type
detectors on devices without interrupt support. First user is the Holt
comparator.
* chemical concentration and resistance channel types.
New driver functionality
* vf610
- buffer support.
- followup coccinelle warning fix.
Core rework
* buffers
- break out callback buffer to own module.
- move buffer implementations to a new subdirectory
* percolate the error code from iio_event_getfd out to userspace
rather than giving a misleading error later on.
Cleanups
* addac drivers
- use BIT macro where appropriate.
* meter drivers
- use BIT macro where appropriate.
* ad7303
- add an OF match table to line up with the binding docs.
* adc128s052
- add an OF match table to line up with the binding docs.
* adf4350
- add an OF match table to line up with the binding docs
* as3935
- add an OF match table to line up with the binding docs.
* berlin2-adc
- use GENMASK and BIT for masks
- prevent attempting to sample multiple channels at once by moving a
mutex scope
- coding style cleanups
* bmc150_magn
- kconfig sort order was wrong - fix it.
* bmg160
- use i2c regmap and drop all uses of i2c_client
- separate i2c and core driver
* cc10001_adc
- kconfig sort order was wrong - fix it.
* evgen (dummy driver helper module)
- move interrupt generation to irq_work to reduce differences between
the dummy driver and real hardware drivers.
* hmc5843
- set the name dynamically rather than to a fixed value for one of the
supported parts.
- export module alias information to allow autoprobing of module.
* lpc32xx
- on failure to get resource or irq return -ENXIO as opposed to -EBUSY
* max1027
- set .of_match_table to actually allow OF style matching.
* max5821
- add MODULE_DEVICE_TABLE for OF table.
* mma8452
- refactor to separate out chip specific data.
- add freefall / motion interrupt source for devices that do their
interrupts slightly differently.
- update copyright notice.
- leave naming of events directory in sysfs to the core
* mcp320x
- set .of_match_table so that it can be used for OF style matching.
* mlx90614
- Implement filter configuration (note the datasheet changed as a result
of the driver reviews to include the values we needed ;)
* opt3001
- drop .owner field as assigned by platform driver core.
* si7020
- replace a bitmask on the humidity values with a more correct range
check.
* stk3310
- improved error handling.
- use BIT macro where appropriate and use the resulting defines
instead of magic numbers in the code.
- fix indentation
* st-sensors
- add debugfs register read hook
* tsl4531
- fix error handling in check_id
* twl6030
- fix module autoload for OF
* iio-trig-sysfs
- document add and remove attribute
* trigger in staging
- code alignment fixes.
- braces on both branches of if statement if needed for one.
* xilinx-xadc
- push interrupts into hardirq context as there isn't much in them
any more and it avoids breaking PREEMPT_RT builds due to the use
of a spinlock between the hardirq and the thread.
Tools
* event-monitor
- report unsupported events. We keep expanding what can come from drivers
so give a helpful error if one turns up in an out of date userspace
program.
* generic-buffer
- helpful message about needing to enable a channel to start the buffer.
|
|
The function always returns non-negative values.
The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
Signed-off-by: Andrzej Hajda <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We want the USB fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This patch enables config terms for tracepoint perf events. Valid terms
for tracepoint events are 'call-graph' and 'stack-size', so we can use
different callgraph settings for each event and eliminate unnecessary
overhead.
Here is an example for using different call-graph config for each
tracepoint.
$ perf record -e syscalls:sys_enter_write/call-graph=fp/ \
-e syscalls:sys_exit_write/call-graph=no/ \
dd if=/dev/zero of=test bs=4k count=10
$ perf report --stdio
#
# Total Lost Samples: 0
#
# Samples: 13 of event 'syscalls:sys_enter_write'
# Event count (approx.): 13
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... .................. ......................
#
76.92% 76.92% dd libpthread-2.20.so [.] __write_nocancel
|
---__write_nocancel
23.08% 23.08% dd libc-2.20.so [.] write
|
---write
|
|--33.33%-- 0x2031342820736574
|
|--33.33%-- 0xa6e69207364726f
|
--33.33%-- 0x34202c7320393039
...
# Samples: 13 of event 'syscalls:sys_exit_write'
# Event count (approx.): 13
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... .................. ......................
#
76.92% 76.92% dd libpthread-2.20.so [.] __write_nocancel
23.08% 23.08% dd libc-2.20.so [.] write
7.69% 0.00% dd [unknown] [.] 0x0a6e69207364726f
7.69% 0.00% dd [unknown] [.] 0x2031342820736574
7.69% 0.00% dd [unknown] [.] 0x34202c7320393039
Signed-off-by: He Kuang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add rules for parsing tracepoint names. Change the tracepoint rules that
derive from PE_NAMEs so that they use tracepoint names directly, which makes
adding more rules based on tracepoint names easier.
Changes v2-v3:
- Change __event_legacy_tracepoint label in bison file to tracepoint_name
- Fix formats error.
Signed-off-by: He Kuang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Show a proper error message and the valid terms when wrong config terms
are specified for hw/sw type perf events.
This patch makes the original error format function formats_error_string()
more generic, so that it outputs only the static config terms for hw/sw perf
events, and prepends the pmu formats for pmu events.
Before this patch:
$ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
invalid or unsupported event: 'cpu-clock/freqx=200/'
Run 'perf list' for a list of valid events
usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
After this patch:
$ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
event syntax error: 'cpu-clock/freqx=200/'
\___ unknown term
valid terms: config,config1,config2,name,period,freq,branch_type,time,call-graph,stack-size
Run 'perf list' for a list of valid events
usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
Signed-off-by: He Kuang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Currently, the config_term() function is used for checking the config terms of
all types of events, and unknown terms are not reported as an error
because pmu events have valid terms in sysfs.
But this is wrong when unknown terms are specified for hw/sw events.
This patch adds the config_term callback so we can use separate check
routines for each type of event.
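The callback idea can be illustrated with a small Python sketch. It is not the perf parser: the function names are hypothetical, and the set of static terms is simply copied from the "valid terms" line printed by perf elsewhere in this log. Each event type supplies its own checker, so hw/sw events can reject unknown terms while pmu events still accept them.

```python
# Static terms, taken from the "valid terms:" output quoted in this log.
STATIC_TERMS = {
    "config", "config1", "config2", "name", "period", "freq",
    "branch_type", "time", "call-graph", "stack-size",
}


def check_hw_sw_term(term):
    # hw/sw events only know the static terms; anything else is an error
    return term in STATIC_TERMS


def check_pmu_term(term):
    # pmu events may define extra terms in sysfs, so unknown terms pass here
    return True


def unknown_terms(terms, config_term):
    """Return the terms rejected by the per-event-type callback."""
    return [t for t in terms if not config_term(t)]
```

With this split, 'freqx' is reported for a hw/sw event but not for a pmu event.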
Signed-off-by: He Kuang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
autofdo incorrectly expects branch flags to include either mispred or
predicted. In fact mispred = predicted = 0 is valid and means the flags
are not supported, which they aren't by Intel PT.
To make autofdo work, add a config option which will cause Intel PT
decoder to set the mispred flag on all branches.
Below is an example of using Intel PT with autofdo. The example is
also added to the Intel PT documentation. It requires autofdo
(https://github.com/google/autofdo) and gcc version 5. The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
amended to take the number of elements as a parameter.
$ gcc-5 -O3 sort.c -o sort_optimized
$ ./sort_optimized 30000
Bubble sorting array of 30000 elements
2254 ms
$ cat ~/.perfconfig
[intel-pt]
mispred-all
$ perf record -e intel_pt//u ./sort 3000
Bubble sorting array of 3000 elements
58 ms
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 3.939 MB perf.data ]
$ perf inject -i perf.data -o inj --itrace=i100usle --strip
$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
$ ./sort_autofdo 30000
Bubble sorting array of 30000 elements
2155 ms
Note there is currently no advantage to using Intel PT instead of LBR,
but that may change in the future if greater use is made of the data.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a new option --strip which is used with --itrace to strip out
non-synthesized events. This results in a perf.data file that is
simpler for external tools to parse. In particular, this can be used to
prepare a perf.data file for consumption by autofdo.
A subsequent patch makes a change to Intel PT also to enable use with
autofdo and gives an example of that use.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Made it use perf_evlist__remove() + perf_evsel__delete() ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf inject can process instruction traces (using the --itrace option)
which removes aux-related events and replaces them with the requested
synthesized events.
However there are still some leftovers, namely PERF_RECORD_ITRACE_START
events and the original evsel (selected event) e.g. intel_pt//
For the sake of completeness, remove them too.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Made it use perf_evlist__remove() + perf_evsel__delete() ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a counterpart to perf_evlist__add() that does the opposite and
deletes the evsel.
This will be used by perf inject to remove unwanted evsels.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Renamed it from perf_evlist__del() to perf_evlist__remove() and removed the perf_evsel__delete() call ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf_evlist__id2evsel_strict() is the same as perf_evlist__id2evsel()
except that it ensures that the id must match.
This will be used by perf inject to find a specific evsel that is to be
deleted, hence the need to match exactly.
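Why an exact match matters for deletion can be shown with a toy model. This Python sketch assumes that the non-strict lookup falls back to the first evsel when the id is unknown; that fallback, the function names, and the data layout are assumptions made for illustration, not a description of the perf code.

```python
def id2evsel(evsels_by_id, first_evsel, sample_id):
    """Lenient lookup: unknown ids fall back to the first evsel (assumed)."""
    return evsels_by_id.get(sample_id, first_evsel)


def id2evsel_strict(evsels_by_id, sample_id):
    """Strict lookup: the id must match, otherwise nothing is returned."""
    return evsels_by_id.get(sample_id)


evsels = {10: "intel_pt//", 11: "instructions"}
```

A caller that is about to delete the result would remove the wrong evsel with the lenient version whenever the id is missing, which the strict version rules out.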
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf script has a setting to set the maximum stack depth when processing
callchains. The setting defaults to the hard-coded maximum definition
PERF_MAX_STACK_DEPTH which is 127.
It is possible, when processing instruction traces, to synthesize
callchains. Synthesized callchains do not have the kernel size
limitation and are whatever size the user requests, although validation
presently prevents the user from requesting a value greater than 1024. The
default value is 16.
To allow for synthesized callchains, make the scripting_max_stack value
at least the same size as the synthesized callchain size.
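The rule described above reduces to taking a maximum. A minimal Python sketch using the numbers quoted in this message (127, 16 and 1024); the function and constant names other than PERF_MAX_STACK_DEPTH are hypothetical.

```python
PERF_MAX_STACK_DEPTH = 127         # hard-coded default named in the text
DEFAULT_SYNTH_CALLCHAIN_SZ = 16    # default synthesized callchain size
MAX_SYNTH_CALLCHAIN_SZ = 1024      # upper bound enforced by validation


def effective_max_stack(synth_callchains, synth_sz=DEFAULT_SYNTH_CALLCHAIN_SZ):
    """Pick a max stack depth large enough for synthesized callchains."""
    if not synth_callchains:
        return PERF_MAX_STACK_DEPTH
    if not 1 <= synth_sz <= MAX_SYNTH_CALLCHAIN_SZ:
        raise ValueError("synthesized callchain size out of range")
    # Never go below the default; grow only if the user asked for more.
    return max(PERF_MAX_STACK_DEPTH, synth_sz)
```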
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use the scripting_max_stack value to allow for values greater than
PERF_MAX_STACK_DEPTH.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a setting for maximum stack depth in preparation for allowing for
synthesized callchains.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use the max_stack value instead of PERF_MAX_STACK_DEPTH so that
arbitrary-sized callchains can be supported.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf report has an option (--max-stack) to set the maximum stack depth
when processing callchains. The option defaults to the hard-coded
maximum definition PERF_MAX_STACK_DEPTH which is 127. The intention of
the option is to allow the user to reduce the processing time by
reducing the amount of the callchain that is processed.
It is also possible, when processing instruction traces, to synthesize
callchains. Synthesized callchains do not have the kernel size
limitation and are whatever size the user requests, although validation
presently prevents the user from requesting a value greater than 1024. The
default value is 16.
To allow for synthesized callchains, make the max_stack value at least
the same size as the synthesized callchain size.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add support for generating branch stack context for PT samples. The
decoder reports a configurable number of branches as branch context for
each sample. Internally it keeps track of them by using a simple sliding
window. We also flush the last branch buffer on each sample to avoid
overlapping intervals.
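The sliding window with a flush per sample can be sketched like this. It is a Python model rather than the decoder's C code; the class name is hypothetical, and returning the most recent branch first is an assumption borrowed from how LBR-style branch stacks are usually ordered.

```python
from collections import deque


class LastBranchBuffer:
    """Keep only the most recent `size` branches as sample context."""

    def __init__(self, size):
        self._branches = deque(maxlen=size)  # old entries fall off the front

    def record(self, from_ip, to_ip):
        self._branches.append((from_ip, to_ip))

    def sample(self):
        # Assumed ordering: most recent branch first.
        stack = list(reversed(self._branches))
        # Flush so the next sample's context does not overlap this one.
        self._branches.clear()
        return stack


buf = LastBranchBuffer(size=2)
for branch in [(1, 2), (3, 4), (5, 6)]:
    buf.record(*branch)
first = buf.sample()
second = buf.sample()
```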
This is useful for:
- Reporting accurate basic block edge frequencies through the perf
report branch view
- Using with --branch-history to get the wider context of samples
- Other users of LBRs
Also the Documentation is updated.
Examples:
Record with Intel PT:
perf record -e intel_pt//u ls
Branch stacks are used by default if synthesized so:
perf report --itrace=ile
is the same as:
perf report --itrace=ile -b
Branch history can be requested also:
perf report --itrace=igle --branch-history
Based-on-patch-by: Andi Kleen <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
intel_pt_synth_branch_sample() skips synthesizing if the branch does not
match the branch filter. That logic was sitting in the middle of the
function but is more efficiently placed at the start of the function, so
move it.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The branch stack feature flag is set by 'perf record' when recording
data that contains branch stacks. Consequently, when 'perf inject'
synthesizes branch stacks, the feature flag should be set also.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
A non-synthesized event might not have a branch stack if branch stacks
have been synthesized (using itrace options).
An example of that is when Intel PT records sched_switch events for
decoding purposes. Those sched_switch events do not have branch stacks
even though the Intel PT decoder may be synthesizing other events that
do due to the itrace options.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The 'perf report' tool will default to displaying branch stacks (-b
option) if they are present. Make that also happen for synthesized
branch stacks.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf report looks at event sample types to determine if branch stacks
have been sampled. Adjust the validation to know about instruction
tracing options.
This change allows the use of the -b option which otherwise would
complain with an error like:
Error:
Selected -b but no branch data. Did you call perf record without -b?
# To display the perf.data header info,
# please use --header/--header-only options.
#
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add AUX area tracing option 'l' to synthesize branch stacks on samples
just like sample type PERF_SAMPLE_BRANCH_STACK. This is taken into use
by Intel PT in a subsequent patch.
Based-on-patch-by: Andi Kleen <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add some comments to the script and some 'views' to the created database
that better illustrate the database structure and how it can be used.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
By default 'perf record' will postprocess the perf.data file to
determine build-ids. When that happens, the number of lost perf events
is displayed.
Make that also happen for AUX events.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add option --ns to display time to 9 decimal places. That is useful in
some cases, for example when using Intel PT cycle accurate mode.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Logging is only used for debugging. Use macros to save calling into the
functions only to return immediately when logging is not enabled.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
TSC packets contain only 7 bytes of TSC. The 8th byte is assumed to
change so infrequently that its value can be inferred. However the
logic must cater for a 7 byte wraparound, which it does by adding 1 to
the top byte.
The existing code was doing that with a while loop even though the
addition should only need to be done once. That logic won't work (will
loop forever) if TSC wraps around at the 8th byte. Theoretically that
would take at least 10 years, unless something else went wrong.
And what else could go wrong? Well, if the chunks of trace data are
processed out of order, it will make it look like the 7-byte TSC has
gone backwards (i.e. wrapped). If that happens 256 times, the code gets
stuck in the while loop.
Fix that by getting rid of the unnecessary while loop.
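The reasoning above can be sketched as follows. This is an illustration
under stated assumptions, not the decoder's actual code: the helper name
example_tsc_reconstruct() and the macro EXAMPLE_TSC_MASK are hypothetical.

```c
#include <stdint.h>

/* Low 7 bytes (56 bits) carried by the TSC packet. */
#define EXAMPLE_TSC_MASK ((UINT64_C(1) << 56) - 1)

/*
 * Hypothetical helper: rebuild a 64-bit timestamp from a 7-byte payload,
 * inferring the top byte from the previous full timestamp.
 */
static uint64_t example_tsc_reconstruct(uint64_t prev, uint64_t payload)
{
	uint64_t ts = (payload & EXAMPLE_TSC_MASK) |
		      (prev & ~EXAMPLE_TSC_MASK);

	/*
	 * A smaller value means the 7-byte counter wrapped: add 1 to the
	 * top byte exactly once. A 'while (ts < prev)' loop here would
	 * never terminate if the addition overflowed the 8th byte.
	 */
	if (ts < prev)
		ts += UINT64_C(1) << 56;

	return ts;
}
```

With a single addition the function always returns, even when the top byte
is already 0xff and the sum overflows 64 bits.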
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Processing instruction tracing data (e.g. Intel PT) can synthesize
callchains e.g.
$ perf record -e intel_pt//u uname
$ perf report --stdio --itrace=ige
However perf report's callgraph option gets extra validation, so:
$ perf report --stdio --itrace=ige -gflat
Error:
Selected -g or --branch-history but no callchain data. Did
you call 'perf record' without -g?
# To display the perf.data header info,
# please use --header/--header-only options.
#
Fix the validation to know about instruction tracing options so
that the above command works.
A side-effect of the change is that the default option to
accumulate the callchain of child functions comes into force.
To get the previous behaviour the --no-children option can be
used e.g.
$ perf report --stdio --itrace=ige -gflat --no-children
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Instruction tracing options (i.e. --itrace) include an option for
sampling instructions at an arbitrary period, e.g.
--itrace=i10us
means make an 'instructions' sample for every 10us of trace.
Currently the logic does not distinguish between a period of
zero and no period being specified at all, so it gets treated
as the default period which is 100000. That doesn't really
make sense.
Fix it so that zero period is accepted and treated as meaning
"as often as possible".
In the case of Intel PT that is the same as a period of 1 and
a unit of 'instructions' (i.e. --itrace=i1i).
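One way to tell "zero" apart from "not specified" is to record whether any
digits were parsed. The sketch below assumes that approach. The struct, the
function and the macro are hypothetical names, and the input is taken to be
the text following the 'i' option letter.

```c
#include <stdbool.h>
#include <stdlib.h>

#define EXAMPLE_DEFAULT_PERIOD 100000ULL	/* default from the text above */

/* Hypothetical option state, not the structure used by perf. */
struct example_opts {
	unsigned long long period;
	bool period_set;	/* true if a number was actually given */
};

/* Parse the optional period that follows the 'i' in --itrace=i<period>. */
static void example_parse_period(const char *s, struct example_opts *o)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 10);

	/* strtoull() leaves end == s when no digits were consumed. */
	o->period_set = (end != s);
	o->period = o->period_set ? v : EXAMPLE_DEFAULT_PERIOD;
}
```

With this, "0us" yields an explicit period of 0, while an empty string
falls back to the default.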
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Add a few lines describing this in the Documentation/intel-pt.txt file ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add the fixdep target to Makefile.include to ease building the fixdep
helper, which needs to be built before we dive into the build itself.
The user can invoke the fixdep target to build the helper.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
And use the new 'prepare' target for the $(PERF_IN) target.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Invoke the fixdep helper from within dep-cmd.
Each user of the build framework needs to make sure fixdep exists before
executing the build itself.
If the build doesn't find fixdep, it falls back to the old style
dependency tracking.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
So it's easier to add more functionality in the following commit.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|