path: root/tools/perf/util
Age | Commit message | Author | Files | Lines
2009-10-12 | perf tools: Fix const char type propagation | Randy Dunlap | 1 | -4/+4
The following perf build warnings/errors in function argument types:

  builtin-sched.c:1894: warning: passing argument 1 of 'sort_dimension__add' discards qualifiers from pointer target type
  util/trace-event-parse.c:685: warning: passing argument 2 of 'read_expected' discards qualifiers from pointer target type
  util/trace-event-parse.c:741: warning: passing argument 4 of 'test_type_token' discards qualifiers from pointer target type
  util/trace-event-parse.c:706: warning: passing argument 2 of 'read_expected_item' discards qualifiers from pointer target type

... trigger because older GCC is not able to prove that sort_dimension__add() does not change the string. The same goes for test_type_token().

Fix this by improving type consistency.

Signed-off-by: Randy Dunlap <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
[ Also remove ugly type cast now unnecessary. ]
Signed-off-by: Ingo Molnar <[email protected]>
2009-10-08 | perf tools: Provide backward compatibility with previous perf.data version | Frederic Weisbecker | 1 | -1/+7
We have merged the trace.info file into perf.data by adding one section in the perf headers. This makes it incompatible with the previous version: the new perf tools can't read the older perf.data.

To support the previous format, we check the header size. If it is the same size as in the previous format, we ignore the trace info section, which doesn't exist in that case.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
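The header-size check described above might look roughly like the following sketch. The field layout, the magic string, and both sizes are invented for illustration and are not the real perf.data header format; only the idea (choose the parser by the declared header size, and treat the trace info section as absent in old files) comes from the commit message.

```python
import struct

# Hypothetical layouts: magic, declared header size, then (offset, size)
# pairs per section. Illustrative only, not the real perf.data format.
OLD_FMT = "<8sQ" + "QQ" * 3   # three sections in the previous format
NEW_FMT = OLD_FMT + "QQ"      # trace info section appended at the end
OLD_SIZE = struct.calcsize(OLD_FMT)
NEW_SIZE = struct.calcsize(NEW_FMT)


def read_header(buf):
    """Parse a header, tolerating the older layout without trace info."""
    _magic, size = struct.unpack_from("<8sQ", buf, 0)
    if size == OLD_SIZE:
        # Previous format: the trace info section does not exist.
        fields = struct.unpack_from(OLD_FMT, buf, 0)
        trace_info = None
    elif size == NEW_SIZE:
        fields = struct.unpack_from(NEW_FMT, buf, 0)
        trace_info = fields[8:10]
    else:
        raise ValueError("unsupported header size: %d" % size)
    return {"sections": fields[2:8], "trace_info": trace_info}
```

A reader built this way accepts both generations of file; anything with an unexpected size is rejected rather than guessed at.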
2009-10-08 | perf tools: Fix thread comm resolution in perf sched | Frederic Weisbecker | 2 | -28/+7
This reverts commit 9a92b479b2f088ee2d3194243f4c8e59b1b8c9c2 ("perf tools: Improve thread comm resolution in perf sched") and fixes the real bug.

The bug was elsewhere: we are failing to resolve thread names in perf sched because the table of threads we are building, on top of comm events, has a per-process granularity. But perf sched, unlike the other perf tools, needs a per-thread granularity as we are profiling every task individually.

So fix it by building our threads table using the tid instead of the pid as the thread identifier.

v2: Revert the previous fix - it is not really needed

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
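The difference between keying the thread table by pid and by tid can be illustrated with a small sketch. The event dictionaries and the helper names below are hypothetical stand-ins, not perf's data structures; the unresolved form `:tid` mirrors the `:5124:5124` style entries shown in the "Improve thread comm resolution" commit.

```python
def build_comm_table(comm_events, key):
    """Build an id -> comm table from comm events, keyed by 'pid' or 'tid'."""
    table = {}
    for ev in comm_events:
        table[ev[key]] = ev["comm"]
    return table


def resolve(table, tid):
    """Return the comm for a thread, or ':<tid>' when it is unknown."""
    return table.get(tid, ":%d" % tid)


# Two threads of one process plus a thread of another process (made-up data).
events = [
    {"pid": 5080, "tid": 5080, "comm": "firefox"},
    {"pid": 5080, "tid": 5124, "comm": "firefox"},
    {"pid": 6225, "tid": 6244, "comm": "npviewer.bin"},
]

by_pid = build_comm_table(events, "pid")  # per-process granularity
by_tid = build_comm_table(events, "tid")  # per-thread granularity
```

With the per-process table, a lookup for thread 5124 finds nothing because only the pid 5080 was recorded; the per-thread table resolves it.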
2009-10-08 | perf tools: Improve kernel/modules symbol lookup | Arnaldo Carvalho de Melo | 1 | -87/+201
This removes the overlapping of vmlinux addresses with modules, using the ELF section name when using --vmlinux and creating a unique DSO name when using /proc/kallsyms ([kernel].N).

This is done by creating multiple 'struct map' instances for address ranges backed by DSOs that have just the symbols for that range and a name that is derived from the ELF section name.

Now it is possible to ask for just the symbols in some particular kernel section:

  $ perf report -m --vmlinux ../build/tip-recvmmsg/vmlinux \
        --dsos [kernel].vsyscall_fn | head -15
      52.73%  Xorg            [.] vread_hpet
      18.61%  firefox         [.] vread_hpet
      14.50%  npviewer.bin    [.] vread_hpet
       6.83%  compiz          [.] vread_hpet
       5.73%  glxgears        [.] vread_hpet
       0.63%  java            [.] vread_hpet
       0.30%  gnome-terminal  [.] vread_hpet
       0.23%  perf            [.] vread_hpet
       0.18%  xchat           [.] vread_hpet
  $

Now we don't have to first look up the list of modules and then, if that fails, the vmlinux symbols; it's just a simple lookup for the map and then the symbols, just like for threads.

Reports generated using /proc/kallsyms and --vmlinux should provide the same results, modulo the DSO name for sections other than ".text".

But they don't right now because of things like:

  ffffffff81011c20-ffffffff81012068 system_call
  ffffffff81011c30-ffffffff81011c9b system_call_after_swapgs
  ffffffff81011c9c-ffffffff81011cb6 system_call_fastpath
  ffffffff81011cb7-ffffffff81011cbb ret_from_sys_call

I.e. overlapping symbols, again some ASM special case that we have to fix up.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
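The "look up the map, then the symbol" scheme can be sketched as below. This is a minimal illustration, assuming non-overlapping ranges with inclusive end addresses; the class names, addresses, and DSO names are made up, and a sorted list with `bisect` stands in for whatever ordered structure the real code uses.

```python
import bisect


class Map:
    """An address range backed by one DSO and holding only its symbols."""

    def __init__(self, start, end, dso, symbols):
        self.start, self.end, self.dso = start, end, dso
        self.symbols = sorted(symbols)  # (start, end, name), end inclusive
        self._starts = [s[0] for s in self.symbols]

    def find_symbol(self, addr):
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and self.symbols[i][0] <= addr <= self.symbols[i][1]:
            return self.symbols[i][2]
        return None


class MapGroup:
    """All maps of an address space, searched by address."""

    def __init__(self, maps):
        self.maps = sorted(maps, key=lambda m: m.start)
        self._starts = [m.start for m in self.maps]

    def find_map(self, addr):
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and self.maps[i].start <= addr <= self.maps[i].end:
            return self.maps[i]
        return None


def resolve(group, addr):
    """Two-step lookup: first the map, then the symbol inside that map."""
    m = group.find_map(addr)
    if m is None:
        return None
    return m.dso, m.find_symbol(addr)


# Made-up ranges: a kernel text map, a kernel section map, and a module map.
kernel = MapGroup([
    Map(0x1000, 0x1FFF, "[kernel]", [(0x1000, 0x10FF, "read_hpet")]),
    Map(0x2000, 0x2FFF, "[kernel].vsyscall_fn", [(0x2000, 0x207F, "vread_hpet")]),
    Map(0x9000, 0x9FFF, "[drm]", [(0x9100, 0x91FF, "drm_ioctl")]),
])
```

Because every range has its own map and DSO name, filtering by a DSO such as a single kernel section falls out of the same lookup.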
2009-10-08 | perf tools: Up the verbose level for some really verbose stuff | Arnaldo Carvalho de Melo | 1 | -2/+2
Like printing every symbol created. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-10-08 | perf tools: Improve thread comm resolution in perf sched | Frederic Weisbecker | 2 | -7/+28
When we get sched traces that involve a task that was already created before opening the event, we won't have the comm event for it.

So if we can't find the comm event for a given thread, we look at the traces that may contain this information.

Before:

  ata/1:371         |  0.000 ms |  1 | avg: 3988.693 ms | max: 3988.693 ms |
  kondemand/1:421   |  0.096 ms |  3 | avg:  345.346 ms | max: 1035.989 ms |
  kondemand/0:420   |  0.025 ms |  3 | avg:  421.332 ms | max:  964.014 ms |
  :5124:5124        |  0.103 ms |  5 | avg:   74.082 ms | max:  277.194 ms |
  :6244:6244        |  0.691 ms |  9 | avg:  125.655 ms | max:  271.306 ms |
  firefox:5080      |  0.924 ms |  5 | avg:   53.833 ms | max:  257.828 ms |
  npviewer.bin:6225 | 21.871 ms | 53 | avg:   22.462 ms | max:  220.835 ms |
  :6245:6245        |  9.631 ms | 21 | avg:   41.864 ms | max:  213.349 ms |

After:

  ata/1:371         |  0.000 ms |  1 | avg: 3988.693 ms | max: 3988.693 ms |
  kondemand/1:421   |  0.096 ms |  3 | avg:  345.346 ms | max: 1035.989 ms |
  kondemand/0:420   |  0.025 ms |  3 | avg:  421.332 ms | max:  964.014 ms |
  firefox:5124      |  0.103 ms |  5 | avg:   74.082 ms | max:  277.194 ms |
  npviewer.bin:6244 |  0.691 ms |  9 | avg:  125.655 ms | max:  271.306 ms |
  firefox:5080      |  0.924 ms |  5 | avg:   53.833 ms | max:  257.828 ms |
  npviewer.bin:6225 | 21.871 ms | 53 | avg:   22.462 ms | max:  220.835 ms |
  npviewer.bin:6245 |  9.631 ms | 21 | avg:   41.864 ms | max:  213.349 ms |

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-10-08 | perf tools: Unify perf.data mapping and events handling | Frederic Weisbecker | 2 | -0/+253
This librarizes the perf.data file mapping and handling in various perf tools, roughly reducing the amount of code and fixing the places that mmap from beginning of the file whereas we want to mmap from the beginning of the data, leading to page fault because the mmap window is too small since the trace info are written in the file too. TODO: - convert perf timechart too Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arjan van de Ven <[email protected]> LKML-Reference: <20091007104729.GD5043@nowhere> Signed-off-by: Ingo Molnar <[email protected]>
2009-10-07 | perf tools: Merge trace.info content into perf.data | Frederic Weisbecker | 5 | -12/+51
This drops the trace.info file and moves its contents into the common perf.data file. This is done by creating a new trace_info section in this file.

A user of perf headers needs to call perf_header__set_trace_info() to save the trace meta information into the perf.data file.

A file created by perf after this patch is unsupported by previous versions because the size of the headers has increased.

That said, only two new fields have been added at the end of the headers, and those could be ignored by previous versions if they just handled the dynamic header size and then ignored the unknown part. The offsets guarantee the compatibility. We'll do a -stable fix for that.

But current previous versions handle the header size using its static size, not dynamic, so it's not backward compatible with trace records.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <20091006213643.GA5343@nowhere>
Signed-off-by: Ingo Molnar <[email protected]>
2009-10-06 | perf trace: Add string/dynamic cases to format_flags | Tom Zanussi | 2 | -0/+26
Needed for distinguishing string fields in event stream processing. Signed-off-by: Tom Zanussi <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-10-06 | perf trace: Add subsystem string to struct event | Tom Zanussi | 2 | -2/+5
Needed to fully qualify event names for event stream processing. Signed-off-by: Tom Zanussi <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-10-06 | tracing/events: Add 'signed' field to format files | Tom Zanussi | 2 | -0/+25
The sign info used for filters in the kernel is also useful to applications that process the trace stream. Add it to the format files and make it available to userspace. Signed-off-by: Tom Zanussi <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-10-06 | Merge branch 'perf/urgent' into perf/core | Ingo Molnar | 3 | -7/+19
Merge reason: Upcoming patch is dependent on a fix in perf/urgent. Signed-off-by: Ingo Molnar <[email protected]>
2009-10-06 | perf tools: elf_sym__is_function() should accept "zero" sized functions | Arnaldo Carvalho de Melo | 1 | -2/+1
Asm routines that end up having size equal to zero are not really zero sized, and as now we do kernel_maps__fixup_sym_end, at least for kernel routines this gets fixed. A similar fixup needs to be done for the userspace bits as well, but as this fixup started only because in /proc/kallsyms we don't have the end address nor the function size, it appeared here first. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Frédéric Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
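The kind of end-address fixup mentioned above can be sketched as follows. This is an illustrative reading of what a routine like kernel_maps__fixup_sym_end might do, not its actual code: a symbol whose end equals its start (as happens when the source, such as /proc/kallsyms, gives no size) is extended up to just before the next symbol. The dictionary layout and the addresses are hypothetical.

```python
def fixup_sym_end(symbols):
    """Give zero-sized symbols an end just before the next symbol's start."""
    syms = sorted((dict(s) for s in symbols), key=lambda s: s["start"])
    for cur, nxt in zip(syms, syms[1:]):
        if cur["end"] == cur["start"]:
            cur["end"] = nxt["start"] - 1
    # The last symbol has no successor, so this sketch leaves it untouched.
    return syms


# Made-up addresses: asm_entry has no size information.
raw = [
    {"name": "asm_entry", "start": 0x100, "end": 0x100},
    {"name": "c_func", "start": 0x140, "end": 0x17F},
    {"name": "tail", "start": 0x180, "end": 0x180},
]
fixed = fixup_sym_end(raw)
```

After such a fixup the symbol has a usable range, which is why rejecting "zero" sized functions up front would discard symbols that are not really empty.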
2009-10-06 | perf trace: Update eval_flag() flags array to match interrupt.h | Tom Zanussi | 1 | -4/+5
Add missing BLOCK_IOPOLL_SOFTIRQ entry. Signed-off-by: Tom Zanussi <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-10-05 | perf tools: /proc/modules names don't always match its name | Arnaldo Carvalho de Melo | 3 | -1/+14
  $ cut -d' ' -f1 /proc/modules|grep _|wc -l
  29
  $ cut -d' ' -f1 /proc/modules|grep _|sed 's/$/.ko'/g|while read n;do find /lib/modules/`uname -r` -name $n;done|wc -l
  12

For instance:

  $ grep ^aes_x86 /proc/modules
  aes_x86_64 9056 2 - Live 0xffffffffa0091000
  $ l /lib/modules/2.6.31-tip/kernel/arch/x86/crypto/aes-x86_64.ko
  -rw-r--r-- 1 root root 136438 2009-09-22 19:05 /lib/modules/2.6.31-tip/kernel/arch/x86/crypto/aes-x86_64.ko

Handle that by introducing a strxfrchar routine that replaces dashes with underscores when matching file names to loaded modules.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
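The name matching described above can be sketched like this. The Python `strxfrchar` below only mimics the behaviour named in the commit message (replace one character with another); its signature and the helper `module_name_from_path` are assumptions made for this example.

```python
import os


def strxfrchar(s, from_char, to_char):
    """Replace every occurrence of from_char in s with to_char."""
    return s.replace(from_char, to_char)


def module_name_from_path(path):
    """Derive the /proc/modules style name from a .ko file path (hypothetical helper)."""
    base = os.path.basename(path)
    if base.endswith(".ko"):
        base = base[:-len(".ko")]
    return strxfrchar(base, "-", "_")


path = "/lib/modules/2.6.31-tip/kernel/arch/x86/crypto/aes-x86_64.ko"
loaded = {"aes_x86_64"}  # name as it appears in /proc/modules
matched = module_name_from_path(path) in loaded
```

Only the base name is transformed, so dashes in directory components such as `2.6.31-tip` are left alone.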
2009-10-05 | perf tools: Create maps for modules when processing kallsyms | Arnaldo Carvalho de Melo | 2 | -41/+122
So that we get kallsyms processing closer to vmlinux + modules symtabs processing.

One change in behaviour: just as -m must be used to ask for modules when --vmlinux is specified, it is now required for kallsyms as well.

Also continue if one manages to load the vmlinux data but module processing fails, so that at least some analysis can be done with part of the needed symbols.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-10-04 | perf tools: Remove show_mask bitmask | Arnaldo Carvalho de Melo | 1 | -6/+0
As it was not being exposed via any command line and with --dsos/--comms we can do this and even more, like asking for just kernel + some module:

  [root@doppio linux-2.6-tip]# perf report --dsos \[kernel\],\[drm\] --vmlinux /home/acme/git/build/tip-recvmmsg/vmlinux --modules | head -15
  # Samples: 619669
  #
  # Overhead  Command          Shared Object  Symbol
  # ........  ...............  .............  ......
  #
       7.12%  swapper          [kernel]       [k] read_hpet
       6.86%  init             [kernel]       [k] read_hpet
       6.22%  init             [kernel]       [k] mwait_idle_with_hints
       5.34%  swapper          [kernel]       [k] mwait_idle_with_hints
       3.01%  firefox          [kernel]       [.] vread_hpet
       2.14%  Xorg             [drm]          [k] drm_clflush_pages
       2.09%  pidgin           [kernel]       [.] vread_hpet
       1.58%  npviewer.bin     [kernel]       [.] vread_hpet
       1.37%  swapper          [kernel]       [k] hpet_next_event
       1.23%  Xorg             [kernel]       [k] read_hpet
  [root@doppio linux-2.6-tip]#

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-10-03 | perf tools: Move hist_entry__add common code to hist.c | Arnaldo Carvalho de Melo | 2 | -0/+49
Now perf report and annotate do the callgraph/hit processing in their specialized hist_entry__add functions. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Frédéric Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-10-02 | perf tools: Rewrite and improve support for kernel modules | Arnaldo Carvalho de Melo | 9 | -792/+362
Representing modules as struct map entries, backed by a DSO, etc, using /proc/modules to find where the module is loaded.

DSOs now can have a short and long name, so that in verbose mode we can show exactly which .ko or vmlinux image was used.

As kernel modules now are a DSO separate from the kernel, we can ask for just the hits for a particular set of kernel modules, just like we can do with shared libraries:

  [root@doppio linux-2.6-tip]# perf report -n --vmlinux /home/acme/git/build/tip-recvmmsg/vmlinux --modules --dsos \[drm\] | head -15
      84.58%  13266  Xorg  [k] drm_clflush_pages
       4.02%    630  Xorg  [k] trace_kmalloc.clone.0
       3.95%    619  Xorg  [k] drm_ioctl
       2.07%    324  Xorg  [k] drm_addbufs
       1.68%    263  Xorg  [k] drm_gem_close_ioctl
       0.77%    120  Xorg  [k] drm_setmaster_ioctl
       0.70%    110  Xorg  [k] drm_lastclose
       0.68%    106  Xorg  [k] drm_open
       0.54%     85  Xorg  [k] drm_mm_search_free
  [root@doppio linux-2.6-tip]#

Specifying --dsos /lib/modules/2.6.31-tip/kernel/drivers/gpu/drm/drm.ko would have the same effect. Allowing specifying just 'drm.ko' is left for another patch.

Processing kallsyms so that per kernel module struct map are instantiated was also left for another patch. That will allow removing the module name from each of its symbols.

struct symbol was reduced by removing the ->module backpointer and moving it (well now the map) to struct symbol_entry in perf top, that is its only user right now.

The total linecount went down by ~500 lines.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Avi Kivity <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-10-01 | perf timechart: Add a power-only mode | Arjan van de Ven | 1 | -1/+13
For doing work on the Linux power management components, I need to make long (30+ seconds) traces. Currently, this then results in a HUGE svg file, with mostly process data that isn't interesting. This patch adds a --power-only mode to perf timechart that only outputs the CPU power section of the SVG; this significantly reduces the size of the SVG file, making even 30+ second traces viewable with inkscape. As a minor tweak for the same effect, the minimum text size is decreased; current inkscape cannot zoom in deep enough to show text this small, but it reduces inkscape compute time. Signed-off-by: Arjan van de Ven <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-30 | perf tools: Use rb_tree for maps | Arnaldo Carvalho de Melo | 3 | -51/+94
Threads can have many and kernel modules will be represented as a tree of maps as well.

Ah, and for a perf.data with 146607 samples:

Before:

  [root@doppio ~]# perf stat -r 5 perf report > /dev/null

   Performance counter stats for 'perf report' (5 runs):

       699.823680  task-clock-msecs   #      0.991 CPUs    ( +-   0.454% )
               74  context-switches   #      0.000 M/sec   ( +-   1.709% )
                2  CPU-migrations     #      0.000 M/sec   ( +-  17.008% )
            23114  page-faults        #      0.033 M/sec   ( +-   0.000% )
       1381257019  cycles             #   1973.721 M/sec   ( +-   0.290% )
       1456894438  instructions       #      1.055 IPC     ( +-   0.007% )
         18779818  cache-references   #     26.835 M/sec   ( +-   0.380% )
           641799  cache-misses       #      0.917 M/sec   ( +-   1.200% )

      0.705972729  seconds time elapsed   ( +-   0.501% )

  [root@doppio ~]#

After

   Performance counter stats for 'perf report' (5 runs):

       691.261451  task-clock-msecs   #      0.993 CPUs    ( +-   0.307% )
               72  context-switches   #      0.000 M/sec   ( +-   0.829% )
                6  CPU-migrations     #      0.000 M/sec   ( +-  18.409% )
            23127  page-faults        #      0.033 M/sec   ( +-   0.000% )
       1366395876  cycles             #   1976.670 M/sec   ( +-   0.153% )
       1443136016  instructions       #      1.056 IPC     ( +-   0.012% )
         17956402  cache-references   #     25.976 M/sec   ( +-   0.325% )
           661924  cache-misses       #      0.958 M/sec   ( +-   1.335% )

      0.696127275  seconds time elapsed   ( +-   0.377% )

I.e. we see some speedup too.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-30 | perf tools: Put common histogram functions in their own file | John Kacur | 2 | -0/+211
Move histogram related functions into their own files (hist.c and hist.h) and make use of them in builtin-annotate.c and builtin-report.c. Signed-off-by: John Kacur <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-24 | perf tools: Create util/sort.and use it | John Kacur | 2 | -0/+361
Create util/sort.[ch] and move common functionality for builtin-report.c and builtin-annotate.c there, and make use of it. Signed-off-by: John Kacur <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-24 | perf tools: Protect header files with a consistent style | John Kacur | 24 | -62/+72
There was a colorful mix of header guards - standardize them. Signed-off-by: John Kacur <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-24 | perf tools: Dont use openat() | Eric Dumazet | 1 | -29/+20
openat() is still a young glibc facility, better to not use it in a non performance critical program (perf list) Many machines have older glibc (RHEL 4 Update 5 -> glibc-2.3.4-2.36 on my dev machine for example). Signed-off-by: Eric Dumazet <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ulrich Drepper <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-24 | perf tools: Fix buffer allocation | Eric Dumazet | 1 | -1/+1
"perf top" cores dump on my dev machine, if run from a directory where vmlinux is present: *** glibc detected *** malloc(): memory corruption: 0x085670d0 *** Signed-off-by: Eric Dumazet <[email protected]> Cc: <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-24 | perf tools: Handle relative paths while loading module symbols | Mike Galbraith | 1 | -30/+66
Inform util/module.c::mod_dso__load_module_paths() that relative paths do exist in some modules.dep, and make it fail noisily should it encounter a path that it doesn't understand, or a module it cannot open. Reported-by: Avi Kivity <[email protected]> Signed-off-by: Mike Galbraith <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: [email protected] Cc: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Masami Hiramatsu <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-23 | perf tools: Fix module symbol loading bug | Mike Galbraith | 1 | -4/+13
Avi Kivity reported 'perf annotate' failures with modules: the requested function was not annotated.

If there are no modules currently loaded, or the last module scanned is not loaded, dso__load_modules() steps on the value from dso__load_vmlinux(), so we happily load the kallsyms symbols on top of what we've already loaded.

Fix that such that the total count of symbols loaded is returned. Should module symbol load fail after parsing of vmlinux, it's a hard failure, so do not silently fall back to kallsyms.

Reported-by: Avi Kivity <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: Mathieu Desnoyers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-21 | perf: Do the big rename: Performance Counters -> Performance Events | Ingo Molnar | 7 | -31/+31
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out of its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate.

All in all, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Paul Mackerras <[email protected]>
Reviewed-by: Arjan van de Ven <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Howells <[email protected]>
Cc: Kyle McMartin <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-20 | perf util: SVG performance improvements | Arjan van de Ven | 1 | -23/+55
Tweak the output SVG to increase performance in SVG viewers by limiting the different types of font sizes and by smarter transformations on the text. At least with Inkscape this gives a notable performance improvement during zoom and scrolling. Signed-off-by: Arjan van de Ven <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-20 | perf util: Make the timechart SVG width dynamic | Arjan van de Ven | 2 | -12/+24
This patch adds a command line option for timechart that allows the user to specify the width of the SVG file. This patch also makes sure that each second of recording has at least 200 units (pixels at 96 DPI) of width. This impacts recordings longer than 5 seconds; recordings shorter than 5 second will scale up to have a width of 1000 units for the whole recording (as before). Signed-off-by: Arjan van de Ven <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
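The automatic width rule stated above reduces to a small formula. The sketch below assumes that a fixed 1000-unit canvas is the floor and that longer recordings simply get 200 units per second; the function name and constants are chosen for this example, and the user-supplied width option is deliberately left out because the commit message does not say how it interacts with the automatic rule.

```python
MIN_TOTAL_WIDTH = 1000   # units for the whole recording (the previous behaviour)
MIN_UNITS_PER_SEC = 200  # guaranteed units for each second of recording


def svg_width(duration_seconds):
    """Return the SVG width in units for a recording of the given length."""
    return max(MIN_TOTAL_WIDTH, MIN_UNITS_PER_SEC * duration_seconds)
```

The two rules meet at exactly 5 seconds (200 × 5 = 1000), which is why only recordings longer than that are affected.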
2009-09-20 | perf timechart: Show the duration of scheduler delays in the SVG | Arjan van de Ven | 2 | -5/+54
Given that scheduler latencies are the hot thing nowadays, show the duration of said latencies in the SVG in text form. In addition, if the latency is more than 10 msec, pick a brighter yellow color as a way to point these long delays out. Signed-off-by: Arjan van de Ven <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
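As a rough illustration of the rule above, the sketch below turns a delay into a text label and picks a highlight class once the delay exceeds 10 msec. The label format, the unit switch below 1 msec, and the style names are assumptions made for this example and are not taken from the timechart code.

```python
LONG_DELAY_MS = 10.0  # threshold named in the commit message


def describe_delay(duration_ns):
    """Return (label, style) for a scheduler delay given in nanoseconds."""
    ms = duration_ns / 1e6
    if ms >= 1.0:
        label = "%.1f ms" % ms
    else:
        label = "%.1f us" % (duration_ns / 1e3)
    # Delays strictly longer than the threshold get the brighter style.
    style = "long-wait" if ms > LONG_DELAY_MS else "wait"
    return label, style
```

A renderer could then map "long-wait" to the brighter yellow and "wait" to the regular color.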
2009-09-20 | perf timechart: Show the name of the waker/wakee in timechart | Arjan van de Ven | 2 | -8/+22
Timechart currently shows thin green lines for sending or receiving wakeups. This patch also prints (in a very small font) the name of the process that is being woken/wakes up this process. Signed-off-by: Arjan van de Ven <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-19 | perf utils: Use a define for the maximum length of a trace event | Arjan van de Ven | 1 | -7/+7
As per Ingo's review: use a #define rather than an open coded constant for the maximum length of a trace event for storing in the perf.data file. Signed-off-by: Arjan van de Ven <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Paul Mackerras <[email protected]> LKML-Reference: <[email protected]> [ add a few comments to nearby functions ] Signed-off-by: Ingo Molnar <[email protected]>
2009-09-19 | perf utils: Be consistent about minimum text size in the svghelper | Arjan van de Ven | 1 | -11/+13
Be more consistent in the svghelper about the minimum text size by having a global #define for this. There needs to be a minimum text size in order to keep the size of the SVG file within the reach of what current SVG viewers can cope with. Signed-off-by: Arjan van de Ven <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Paul Mackerras <[email protected]> Cc: Arjan van de Ven <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-19 | perf: Add a SVG helper library file | Arjan van de Ven | 2 | -0/+407
The timechart tool writes out SVG format output; this patch adds a set of helper functions to abstract dealing with SVG from the core timechart code. Signed-off-by: Arjan van de Ven <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-19 | perf: Add a sample_event type to the event_union | Arjan van de Ven | 1 | -0/+7
Add a sample_event type to the event_union so that raw samples can be processed easily. Signed-off-by: Arjan van de Ven <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-19 | perf: Allow perf utilities to have "callback" options without arguments | Arjan van de Ven | 1 | -0/+2
timechart needs to add a "callback" type command line argument that does not take arguments. This patch adds the parse-options.h infrastructure to make this possible. Signed-off-by: Arjan van de Ven <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-19 | perf: Store trace event name/id pairs in perf.data | Arjan van de Ven | 3 | -0/+90
The trace event name<->id mapping is dynamic for each kernel compile. In order for perf.data to be usable outside the actual system, we thus need to store a table of this mapping for later use.

This patch adds this table to perf.data, and provides helper functions for looking up fields from this table.

To avoid mistakes, lookup-from-table is kept completely separate from lookup-from-local-debugfs.

Signed-off-by: Arjan van de Ven <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
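A stored name/id table with lookups in both directions can be sketched as below. The class, its methods, and the ids are hypothetical; the point is only that the table travels with the recorded data and is consulted on its own, without touching the local system's debugfs.

```python
class TraceEventTable:
    """Name <-> id pairs captured at record time (illustrative only)."""

    def __init__(self, pairs=()):
        self._by_id = {}
        self._by_name = {}
        for event_id, name in pairs:
            self.add(event_id, name)

    def add(self, event_id, name):
        self._by_id[event_id] = name
        self._by_name[name] = event_id

    def name_for(self, event_id):
        """Look up an event name by id; None if the id was never stored."""
        return self._by_id.get(event_id)

    def id_for(self, name):
        """Look up an event id by name; None if the name was never stored."""
        return self._by_name.get(name)

    def dump(self):
        """Return the pairs in a stable order, ready to be written out."""
        return sorted(self._by_id.items())


# Made-up ids, as they might have been on the recording machine.
recorded = TraceEventTable([(42, "sched:sched_switch"), (43, "sched:sched_wakeup")])
# Reloading the dumped pairs elsewhere gives the same answers.
reloaded = TraceEventTable(recorded.dump())
```

Keeping this path separate from any live-system lookup means an analysis machine with a different kernel build cannot silently substitute its own ids.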
2009-09-19perf: Add a timestamp to fork eventsArjan van de Ven1-0/+1
perf timechart needs to know when a process forked, in order to be able to visualize properly when tasks start. This patch adds a time field to the event structure, and fills it in appropriately. Signed-off-by: Arjan van de Ven <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-18perf trace: Sample timestamp and cpu when using record flagLi Zefan1-1/+4
Sample timestamp and cpu just like the -R option.
Before:
init-0 [-01] 1266874889.17179869184709551615: irq_handler_entry: irq=18 handler=eth0
init-0 [-01] 1266874889.17179869184709551615: irq_handler_entry: irq=18 handler=eth0
init-0 [-01] 1266874889.17179869184709551615: irq_handler_entry: irq=1 handler=i8042
init-0 [-01] 1266874889.17179869184709551615: irq_handler_entry: irq=18 handler=eth0
init-0 [-01] 1266874889.17179869184709551615: irq_handler_entry: irq=1 handler=i8042
After:
init-0 [001] 7364.568965353: irq_handler_entry: irq=18 handler=eth0
init-0 [001] 7365.530226877: irq_handler_entry: irq=1 handler=i8042
init-0 [001] 7365.542831563: irq_handler_entry: irq=18 handler=eth0
init-0 [001] 7365.644156299: irq_handler_entry: irq=18 handler=eth0
init-0 [001] 7365.694556201: irq_handler_entry: irq=18 handler=eth0
Signed-off-by: Li Zefan <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-18perf tools: Increase MAX_EVENT_LENGTHLi Zefan1-1/+1
The name length of some trace events is longer than 30, like sys_enter_sched_get_priority_max and ext4_mb_discard_preallocations. Passing those events to perf record fails. To reproduce, try: # ./perf record -f -e syscalls:sys_enter_sched_get_priority_max -F 1 -a Signed-off-by: Li Zefan <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
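The length problem can be checked with a few lines of C. This sketch assumes the limit is a fixed buffer size that must also hold the terminating NUL; `demo_name_fits` and both capacities are hypothetical and are not the values or the check used in the perf sources.

```c
#include <string.h>

/* Hypothetical capacities, for illustration only. */
#define DEMO_SMALL_CAP 30
#define DEMO_LARGE_CAP 64

/* Returns 1 if name plus its terminating NUL fits in cap bytes. */
static int demo_name_fits(const char *name, size_t cap)
{
	return strlen(name) + 1 <= cap;
}
```

Under that assumption, `sys_enter_sched_get_priority_max` (32 characters) does not fit in a 30-byte buffer, and neither does `ext4_mb_discard_preallocations` (30 characters) once the terminator is counted; both fit in the larger hypothetical capacity.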
2009-09-18perf tools: Fix memory leak in read_ftrace_printk()Li Zefan1-3/+4
get_tracing_file() should be paired with put_tracing_file(). Signed-off-by: Li Zefan <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
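The pairing rule can be sketched generically as follows. The helpers `demo_get_path()` and `demo_put_path()` are hypothetical stand-ins and are not the bodies of get_tracing_file() and put_tracing_file(); the sketch assumes the "get" side allocates a path string and the "put" side frees it, so that every exit path after a successful get must reach the put.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int demo_live_paths; /* outstanding get calls, for checking balance */

/* Hypothetical "get": allocates "dir/name"; caller must call demo_put_path(). */
static char *demo_get_path(const char *dir, const char *name)
{
	size_t len = strlen(dir) + 1 + strlen(name) + 1;
	char *path = malloc(len);

	if (!path)
		return NULL;
	snprintf(path, len, "%s/%s", dir, name);
	demo_live_paths++;
	return path;
}

/* Hypothetical "put": releases what demo_get_path() returned. */
static void demo_put_path(char *path)
{
	if (!path)
		return;
	free(path);
	demo_live_paths--;
}

/* Uses a path and releases it before returning, so nothing leaks. */
static long demo_path_length(const char *dir, const char *name)
{
	char *path = demo_get_path(dir, name);
	long len;

	if (!path)
		return -1;
	len = (long)strlen(path);
	demo_put_path(path); /* the paired release */
	return len;
}
```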
2009-09-16perf sched: Add 'perf sched map' scheduling event map printoutIngo Molnar2-1/+2
This prints a textual context-switching outline of a workload captured via perf sched record. For example, on a 16 CPU box it outputs:
N1 O1 . . . S1 . . . B0 . *I0 C1 . M1 . 23002.773423 secs
N1 O1 . *Q0 . S1 . . . B0 . I0 C1 . M1 . 23002.773423 secs
N1 O1 . Q0 . S1 . . . B0 . *R1 C1 . M1 . 23002.773485 secs
N1 O1 . Q0 . S1 . *S0 . B0 . R1 C1 . M1 . 23002.773478 secs
*L0 O1 . Q0 . S1 . S0 . B0 . R1 C1 . M1 . 23002.773523 secs
L0 O1 . *. . S1 . S0 . B0 . R1 C1 . M1 . 23002.773531 secs
L0 O1 . . . S1 . S0 . B0 . R1 C1 *T1 M1 . 23002.773547 secs T1 => irqbalance:2089
L0 O1 . . . S1 . S0 . *P0 . R1 C1 T1 M1 . 23002.773549 secs
*N1 O1 . . . S1 . S0 . P0 . R1 C1 T1 M1 . 23002.773566 secs
N1 O1 . . . *J0 . S0 . P0 . R1 C1 T1 M1 . 23002.773571 secs
N1 O1 . . . J0 . S0 *B0 P0 . R1 C1 T1 M1 . 23002.773592 secs
N1 O1 . . . J0 . *U0 B0 P0 . R1 C1 T1 M1 . 23002.773582 secs
N1 O1 . . . *S1 . U0 B0 P0 . R1 C1 T1 M1 . 23002.773604 secs
N1 O1 . . . S1 . U0 B0 *. . R1 C1 T1 M1 . 23002.773615 secs
N1 O1 . . . S1 . U0 B0 . . *K0 C1 T1 M1 . 23002.773631 secs
N1 O1 . *M0 . S1 . U0 B0 . . K0 C1 T1 M1 . 23002.773624 secs
N1 O1 . M0 . S1 . U0 *. . . K0 C1 T1 M1 . 23002.773644 secs
N1 O1 . M0 . S1 . U0 . . . *R1 C1 T1 M1 . 23002.773662 secs
N1 O1 . M0 . S1 . *. . . . R1 C1 T1 M1 . 23002.773648 secs
N1 O1 . *. . S1 . . . . . R1 C1 T1 M1 . 23002.773680 secs
N1 O1 . . . *L0 . . . . . R1 C1 T1 M1 . 23002.773717 secs
*N0 O1 . . . L0 . . . . . R1 C1 T1 M1 . 23002.773709 secs
*N1 O1 . . . L0 . . . . . R1 C1 T1 M1 . 23002.773747 secs
Columns stand for individual CPUs, from CPU0 to CPU15, and the two-letter shortcuts stand for tasks that are running on a CPU. '*' denotes the CPU that had the event. A dot signals an idle CPU. New tasks are assigned new two-letter shortcuts, which are printed when they first occur. In the above example 'T1' stood for irqbalance: T1 => irqbalance:2089
Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-16perf sched: Make idle thread and comm/pid names more consistentIngo Molnar1-1/+1
Peter noticed that we have 3 ways of referring to the idle thread: '[idle]:0', 'swapper:0' and 'swapper-0'. Standardize on 'swapper:0'. Reported-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-16perf sched: Account for lost events, increase default bufferingIngo Molnar1-1/+1
Output statistics about lost events and state machine weirdness, such as:
TOTAL: | 14974.910 ms | 46384 |
---------------------------------------------------
INFO: 8.865% lost events (19132 out of 215819, in 8 chunks)
INFO: 0.198% state machine bugs (49 out of 24708) (due to lost events?)
And increase buffering to -m 1024 (4 MB) by default. Since we use output multiplexing that kind of space is needed. Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
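The arithmetic behind those figures can be reproduced with a short sketch. The function names are hypothetical and this is not the code in builtin-sched.c; the 4 KiB page size used to turn "-m 1024" into 4 MB is an assumption about the machine.

```c
#include <stdio.h>

/* Percentage of part in total; 0 when total is 0. */
static double demo_percent(unsigned long part, unsigned long total)
{
	return total ? 100.0 * (double)part / (double)total : 0.0;
}

/* Formats a lost-events line in the style quoted in the message. */
static int demo_format_lost(char *buf, size_t len, unsigned long lost,
			    unsigned long total, unsigned long chunks)
{
	return snprintf(buf, len,
			"INFO: %.3f%% lost events (%lu out of %lu, in %lu chunks)",
			demo_percent(lost, total), lost, total, chunks);
}

/* Buffer size in bytes for a given page count; page size is an assumption. */
static unsigned long demo_mmap_bytes(unsigned long pages, unsigned long page_size)
{
	return pages * page_size;
}
```

With 19132 lost events out of 215819 this yields 8.865%, and 49 out of 24708 yields 0.198%; 1024 pages of 4096 bytes come to 4194304 bytes.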
2009-09-14perf tools: Implement counter output multiplexingIngo Molnar1-2/+4
Finish the -M/--multiplex option implementation:
- separate it out from group_fd
- correctly set it via the ioctl and don't mmap counters that are multiplexed
- modify the perf record event loop to deal with buffer-less counters
- remove the -g option from perf sched record
- account for unordered events in perf sched latency
- (add -f to perf sched record to ease measurements)
- skip idle threads (pid==0) in latency output
The result is better latency output by 'perf sched latency':
-----------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------
ksoftirqd/8 | 0.071 ms | 2 | avg: 0.458 ms | max: 0.913 ms |
at-spi-registry | 0.609 ms | 19 | avg: 0.013 ms | max: 0.023 ms |
perf | 3.316 ms | 16 | avg: 0.013 ms | max: 0.054 ms |
Xorg | 0.392 ms | 19 | avg: 0.011 ms | max: 0.018 ms |
sleep | 0.537 ms | 2 | avg: 0.009 ms | max: 0.009 ms |
-----------------------------------------------------------------------------------
TOTAL: | 4.925 ms | 58 |
---------------------------------------------
Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-13perf sched: Implement the 'perf sched record' subcommandIngo Molnar1-1/+2
Implement the 'perf sched record' subcommand that adds a default list of events, turns on raw sampling and system-wide tracing and passes off the rest of the command to perf record. This is more convenient than having to specify the events all the time.
Before:
$ perf record -a -R -e sched:sched_switch:r -e sched:sched_stat_wait:r -e sched:sched_stat_sleep:r -e sched:sched_stat_iowait:r -e sched:sched_process_exit:r -e sched:sched_process_fork:r -e sched:sched_wakeup:r -e sched:sched_migrate_task:r -c 1 sleep 1
After:
$ perf sched record -f sleep 1
Also fix the event string parser, which assumed that the strings passed in can be modified. (In this case they can't be, as they come from a read-only constant section.) Signed-off-by: Ingo Molnar <[email protected]>
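The "default arguments first, then the user's arguments" idea can be sketched as below. The option list is copied from the "Before" command in the message, but `demo_build_record_argv()` is a hypothetical helper and the ordering of the defaults is an illustrative choice, not the code in builtin-sched.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Defaults taken from the "Before" command line quoted in the message. */
static const char * const demo_record_args[] = {
	"record", "-a", "-R", "-c", "1",
	"-e", "sched:sched_switch:r",
	"-e", "sched:sched_stat_wait:r",
	"-e", "sched:sched_stat_sleep:r",
	"-e", "sched:sched_stat_iowait:r",
	"-e", "sched:sched_process_exit:r",
	"-e", "sched:sched_process_fork:r",
	"-e", "sched:sched_wakeup:r",
	"-e", "sched:sched_migrate_task:r",
};

/*
 * Returns a NULL-terminated argv holding the defaults followed by the
 * caller's arguments. Only the array is allocated: the strings are
 * shared, and the defaults are string literals that must not be
 * written to. The caller frees the returned array.
 */
static const char **demo_build_record_argv(int argc, const char **argv,
					   int *out_argc)
{
	size_t nr_defaults = sizeof(demo_record_args) / sizeof(demo_record_args[0]);
	const char **out = calloc(nr_defaults + (size_t)argc + 1, sizeof(*out));
	size_t i, n = 0;

	if (!out)
		return NULL;
	for (i = 0; i < nr_defaults; i++)
		out[n++] = demo_record_args[i];
	for (i = 0; i < (size_t)argc; i++)
		out[n++] = argv[i];
	out[n] = NULL;
	*out_argc = (int)n;
	return out;
}
```

Since the default strings live in read-only storage, anything that later tokenizes them in place would have to work on a private copy, which is the kind of assumption the second paragraph of the message refers to.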
2009-09-13perf sched: Clean up PID sorting logicIngo Molnar1-4/+4
Use a sort list for thread atoms insertion as well, instead of a comparison hardcoded to PID. Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
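One way to picture a sort list is as an ordered array of comparison functions in which the first non-zero result decides. The sketch below uses hypothetical types and names (`demo_atoms`, `demo_cmp_list`) and is not the data structure used by perf sched; a PID-only ordering is then just a list with one entry.

```c
#include <stddef.h>

/* Hypothetical per-thread record with two sortable keys. */
struct demo_atoms {
	int pid;
	unsigned long long max_lat;
};

typedef int demo_cmp_fn(const struct demo_atoms *a, const struct demo_atoms *b);

static int demo_cmp_pid(const struct demo_atoms *a, const struct demo_atoms *b)
{
	return (a->pid > b->pid) - (a->pid < b->pid);
}

static int demo_cmp_max(const struct demo_atoms *a, const struct demo_atoms *b)
{
	return (a->max_lat > b->max_lat) - (a->max_lat < b->max_lat);
}

/* Walk the comparator list in order; the first non-zero result wins. */
static int demo_cmp_list(demo_cmp_fn *const *list, size_t nr,
			 const struct demo_atoms *a, const struct demo_atoms *b)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = list[i](a, b);

		if (ret)
			return ret;
	}
	return 0;
}
```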
2009-09-13perf sched: Display time in milliseconds, reorganize outputIngo Molnar1-2/+4
After:
-----------------------------------------------------------------------------------
Task | runtime ms | switches | average delay ms | maximum delay ms |
-----------------------------------------------------------------------------------
migration/0 | 0.000 ms | 1 | avg: 0.047 ms | max: 0.047 ms |
ksoftirqd/0 | 0.000 ms | 1 | avg: 0.039 ms | max: 0.039 ms |
migration/1 | 0.000 ms | 3 | avg: 0.013 ms | max: 0.016 ms |
migration/3 | 0.000 ms | 2 | avg: 0.003 ms | max: 0.004 ms |
migration/4 | 0.000 ms | 1 | avg: 0.022 ms | max: 0.022 ms |
distccd | 0.000 ms | 1 | avg: 0.004 ms | max: 0.004 ms |
distccd | 0.000 ms | 1 | avg: 0.014 ms | max: 0.014 ms |
distccd | 0.000 ms | 2 | avg: 0.000 ms | max: 0.000 ms |
distccd | 0.000 ms | 2 | avg: 0.012 ms | max: 0.019 ms |
distccd | 0.000 ms | 1 | avg: 0.002 ms | max: 0.002 ms |
as | 0.000 ms | 2 | avg: 0.019 ms | max: 0.019 ms |
as | 0.000 ms | 3 | avg: 0.015 ms | max: 0.017 ms |
as | 0.000 ms | 1 | avg: 0.009 ms | max: 0.009 ms |
perf | 0.000 ms | 1 | avg: 0.001 ms | max: 0.001 ms |
gcc | 0.000 ms | 1 | avg: 0.021 ms | max: 0.021 ms |
run-mozilla.sh | 0.000 ms | 2 | avg: 0.010 ms | max: 0.017 ms |
mozilla-plugin- | 0.000 ms | 1 | avg: 0.006 ms | max: 0.006 ms |
gcc | 0.000 ms | 2 | avg: 0.013 ms | max: 0.013 ms |
-----------------------------------------------------------------------------------
(The runtime ms column is not filled in yet.) Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>