aboutsummaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2010-12-09perf report: Allow user to specify path to kallsyms fileDavid Ahern4-2/+13
This is useful for analyzing a perf data file on a different system than the one data was collected on and still include symbols from loaded kernel modules in the output. Commiter note: Updated the man page accordingly. LKML-Reference: <[email protected]> Signed-off-by: David Ahern <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-07perf makefile: Allow strong and weak functions in LIB_OBJSIan Munsie1-1/+1
When we build perf we place all of the .o files from the library files (util, arch/x/util, etc) into libperf.a which is then linked into perf. The problem is that the linker will by default only consider .o files within the .a archive if they are necessary to satisfy an unresolved symbol. As weak functions are not unresolved, it will not consider a .o file from the archive containing the strong versions of weak functions unless it requires it for another reason. This patch adds the --whole-archive flags to the linker when passing in the libperf.a file to ensure that it will consider every .o file in the archive, not just what it believes that it needs. The end result is that weak functions can now be overridden by strong variants of them in the libperf.a file. Cc: "tom.leiming" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ian Munsie <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-06perf record: Fix eternal wait for stillborn childArnaldo Carvalho de Melo1-2/+4
When execvp fails to find the specified command on the path we won't get SIGCHLD, so send a SIGUSR1 and exit right away. Current situation would require a SIGINT performed by the user and would produce meaningless summary. Now: [acme@emilia linux]$ ./foo -bash: ./foo: No such file or directory [acme@emilia linux]$ perf record ./foo ./foo: No such file or directory [acme@emilia linux]$ Acked-by: Thomas Gleixner <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Thomas Gleixner <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-06perf tools: Catch a few uncheck calloc/malloc'sChris Samuel5-0/+15
There were a few stray calloc()'s and malloc()'s which were not having their return values checked for success. As the calling code either already coped with failure or didn't actually care we just return -ENOMEM at that point. Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Chris Samuel <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-06perf script: Fix compiler warning in builtin_script.c:is_top_script()Stephane Eranian1-5/+5
Fix annoying compiler warning in the is_top_script() function. The issue was that a const char * was cast into a char * to call ends_with(). We fix the users of ends_with() instead. Some are passing a char *, but it is okay to cast the return value of ends_with() to char * (because we understand what ends_with() does). Cc: David S. Miller <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Stephane Eranian <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Stephane Eranian <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-06perf session: Sort all events if ordered_samples=trueThomas Gleixner1-53/+72
Now that we have timestamps on FORK, EXIT, COMM, MMAP events we can sort everything in time order. This fixes the following observed problem: mmap(file1) -> pagefault() -> munmap(file1) mmap(file2) -> pagefault() -> munmap(file2) Resulted in decoding both pagefaults in file2 because the file1 map was already replaced by the file2 map when the map address was identical. With all events sorted we decode both pagefaults correctly. Cc: Frederic Weisbecker <[email protected]> Cc: Ian Munsie <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-06perf options: add OPT_CALLBACK_DEFAULT_NOOPTAkihiro Nagai1-0/+4
Add new macro OPT_CALLBACK_DEFAULT_NOOPT for parse_options. It enables to pass the default value (opt->defval) to the callback function processing options require no argument. Reviewed-by: Masami Hiramatsu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Akihiro Nagai <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-06perf hist: Better displaying of unresolved DSOs and symbolsIan Munsie1-3/+3
In the event that a DSO has not been identified, just print out [unknown] instead of the instruction pointer as we previously were doing, which is pretty meaningless for a shared object (at least to the users perspective). The IP we print out is fairly meaningless in general anyway - it's just one (the first) of the many addresses that were lumped together as unidentified, and could span many shared objects and symbols. In reality if we see this [unknown] output then the report -D output is going to be more useful anyway as we can see all the different address that it represents. If we are printing the symbols we are still going to see this IP in that column anyway since they shouldn't resolve either. This patch also changes the symbol address printouts so that they print out 0x before the address, are left aligned, and changes the %L format string (which relies on a glibc bug) to %ll. Before: 74.11% :3259 4a6c [k] 4a6c After: 74.11% :3259 [unknown] [k] 0x4a6c Cc: Frederic Weisbecker <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ian Munsie <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-04perf tools: Ask for ID PERF_SAMPLE_ info on all PERF_RECORD_ eventsArnaldo Carvalho de Melo9-100/+315
So that we can use -T == --timestamp, asking for PERF_SAMPLE_TIME: $ perf record -aT $ perf report -D | grep PERF_RECORD_ <SNIP> 3 5951915425 0x47530 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff8138c1a2 period: 215979 cpu:3 3 5952026879 0x47588 [0x90]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff810cb480 period: 215979 cpu:3 3 5952059959 0x47618 [0x38]: PERF_RECORD_FORK(6853:6853):(16811:16811) 3 5952138878 0x47650 [0x78]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff811bac35 period: 431478 cpu:3 3 5952375068 0x476c8 [0x30]: PERF_RECORD_COMM: find:6853 3 5952395923 0x476f8 [0x50]: PERF_RECORD_MMAP 6853/6853: [0x400000(0x25000) @ 0]: /usr/bin/find 3 5952413756 0x47748 [0xa0]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff810d080f period: 859332 cpu:3 3 5952419837 0x477e8 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44600000(0x21d000) @ 0]: /lib64/ld-2.5.so 3 5952437929 0x47840 [0x48]: PERF_RECORD_MMAP 6853/6853: [0x7fff7e1c9000(0x1000) @ 0x7fff7e1c9000]: [vdso] 3 5952570127 0x47888 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f46200000(0x218000) @ 0]: /lib64/libselinux.so.1 3 5952623637 0x478e0 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44a00000(0x356000) @ 0]: /lib64/libc-2.5.so 3 5952675720 0x47938 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44e00000(0x204000) @ 0]: /lib64/libdl-2.5.so 3 5952710080 0x47990 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f45a00000(0x246000) @ 0]: /lib64/libsepol.so.1 3 5952847802 0x479e8 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff813897f0 period: 1142536 cpu:3 <SNIP> First column is the cpu and the second the timestamp. That way we can investigate problems in the event stream. If the new perf binary is run on an older kernel, it will disable this feature automatically. Tested-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Ian Munsie <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Frédéric Weisbecker <[email protected]> Cc: Ian Munsie <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Stephane Eranian <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-04perf session: Parse sample earlierArnaldo Carvalho de Melo18-192/+236
At perf_session__process_event, so that we reduce the number of lines in eache tool sample processing routine that now receives a sample_data pointer already parsed. This will also be useful in the next patch, where we'll allow sample the identity fields in MMAP, FORK, EXIT, etc, when it will be possible to see (cpu, timestamp) just after before every event. Also validate callchains in perf_session__process_event, i.e. as early as possible, and keep a counter of the number of events discarded due to invalid callchains, warning the user about it if it happens. There is an assumption that was kept that all events have the same sample_type, that will be dealt with in the future, when this preexisting limitation will be removed. Tested-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Ian Munsie <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Frédéric Weisbecker <[email protected]> Cc: Ian Munsie <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Stephane Eranian <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-02Merge branch 'perf/core' of ↵Ingo Molnar15-68/+324
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core
2010-12-01perf stat: Add csv-style outputStephane Eranian2-40/+109
This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <[email protected]> Cc: Frederik Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Stephane Eranian <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf stat: Use --big-num format by defaultArnaldo Carvalho de Melo1-1/+1
[acme@mica linux]$ perf stat ls > /dev/null Performance counter stats for 'ls': 1.512532 task-clock-msecs # 0.801 CPUs 2 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 241 page-faults # 0.159 M/sec 2,973,331 cycles # 1965.797 M/sec 1,460,802 instructions # 0.491 IPC 314,642 branches # 208.023 M/sec 18,475 branch-misses # 5.872 % <not counted> cache-references <not counted> cache-misses 0.001887676 seconds time elapsed To get the previous behaviour just use --no-big-num: [acme@mica linux]$ perf stat --no-big-num ls > /dev/null Performance counter stats for 'ls': 1.468014 task-clock-msecs # 0.795 CPUs 1 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 241 page-faults # 0.164 M/sec 2900254 cycles # 1975.631 M/sec 1437991 instructions # 0.496 IPC 310905 branches # 211.786 M/sec 17912 branch-misses # 5.761 % <not counted> cache-references <not counted> cache-misses 0.001845435 seconds time elapsed [acme@mica linux]$ Suggested-by: Ingo Molnar <[email protected]> Cc: Frédéric Weisbecker <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Stephane Eranian <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf stat: Document missing optionsShawn Bohrer1-7/+27
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf test: Fix spelling mistake in documentationShawn Bohrer1-1/+1
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf trace: Document missing optionsShawn Bohrer1-0/+7
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf top: Document missing optionsShawn Bohrer1-4/+24
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf sched: Document missing optionsShawn Bohrer1-2/+16
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf report: Document missing optionsShawn Bohrer1-4/+45
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf record: Document missing optionsShawn Bohrer1-4/+13
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf probe: Fix spelling mistake in documentationShawn Bohrer1-1/+1
Acked-by: Masami Hiramatsu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf lock: Document missing optionsShawn Bohrer1-0/+15
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf kvm: Document missing optionsShawn Bohrer1-1/+7
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf diff: Document missing optionsShawn Bohrer1-1/+18
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf diff: Fix displacement and modules options short flagShawn Bohrer1-1/+1
The --displacement and --modules options to perf diff both use -m as a short flag. Change --displacement to use -M since other perf commands use -m, --modules. Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf buildid-list: Document missing optionsShawn Bohrer1-0/+3
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01perf annotate: Document missing options.Shawn Bohrer1-1/+36
Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-12-01Merge branch 'perf/rename' into perf/coreIngo Molnar13-121/+121
Merge reason: This is an older commit under testing that was not pushed yet - merge it. Also fix up the merge in command-list.txt. Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Tom Zanussi <[email protected]>
2010-11-30perf tools: fix event parsing of comma-separated tracepoint eventsCorey Ashford1-4/+8
There are number of issues that prevent the use of multiple tracepoint events being specified in a -e/--event switch, separated by commas. For example, perf stat -e irq:irq_handler_entry,irq:irq_handler_exit ... fails because the tracepoint event parsing code doesn't recognize the comma separator properly. This patch corrects those issues. Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Reported-by: Michael Ellerman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Corey Ashford <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf packaging: add memcpy to perf MANIFESTDon Zickus1-0/+1
There seems to be a new dependency on arch/*/lib/memcpy*.S when compiling the perf tool. Make sure that file is included in the MANIFEST when creating the tarball. Cc: Ingo Molnar <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Don Zickus <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf debug: Simplify trace_eventArnaldo Carvalho de Melo1-28/+13
No need to check that many times if debug_trace is on. Cc: Frédéric Weisbecker <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Stephane Eranian <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf session: Allocate chunks of sample objectsThomas Gleixner2-5/+19
The ordered sample code allocates singular reference objects struct sample_queue which have 48byte size on 64bit and 20 bytes on 32bit. That's silly. Allocate ~64k sized chunks and hand them out. Performance gain: ~ 15% Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf session: Cache sample objectsThomas Gleixner2-4/+27
When the sample queue is flushed we free the sample reference objects. Though we need to malloc new objects when we process further. Stop the malloc/free orgy and cache the already allocated object for resuage. Only allocate when the cache is empty. Performance gain: ~ 10% Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf session: Keep file mmaped instead of malloc/memcpyThomas Gleixner1-16/+11
Profiling perf with perf revealed that a large part of the processing time is spent in malloc/memcpy/free in the sample ordering code. That code copies the data from the mmap into malloc'ed memory. That's silly. We can keep the mmap and just store the pointer in the queuing data structure. For 64 bit this is not a problem as we map the whole file anyway. On 32bit we keep 8 maps around and unmap the oldest before mmaping the next chunk of the file. Performance gain: 2.95s -> 1.23s (Faktor 2.4) Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf session: Use sensible mmap sizeThomas Gleixner1-12/+29
On 64bit we can map the whole file in one go, on 32bit we can at least map 32MB and not map/unmap tiny chunks of the file. Base the progress bar on 1/16 of the data size. Preparatory patch to get rid of the malloc/memcpy/free of trace data. Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf session: Simplify termination checksThomas Gleixner1-9/+11
No need to check twice. Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf session: Move ui_progress_update in __perf_session__process_events()Thomas Gleixner1-1/+1
The progress bar is changed when the file offset changes. This happens only when the next mmap is done. No need to call ui_progress_update() for every event. Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf session: Cleanup __perf_session__process_events()Thomas Gleixner1-40/+37
Replace the pseudo C++ self argument with session and give the mmap related variables a sensible name. shift is a complete misnomer - it took me several rounds of cursing to figure out that it's not a shift value. Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf session: Use appropriate pointer type instead of silly typecastingThomas Gleixner1-2/+2
There is no reason to use a struct sample_event pointer in struct sample_queue and type cast it when flushing the queue. Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf session: Fix list sort algorithmThomas Gleixner2-68/+49
The homebrewn sort algorithm fails to sort in time order. One of the problem spots is that it fails to deal with equal timestamps correctly. My first gut reaction was to replace the fancy list with an rbtree, but the performance is 3 times worse. Rewrite it so it works. Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf header: Don't assume there's no attr info if no sample ids is providedFranck Bui-Huu1-3/+8
This primarily fixes perf-report, which didn't report the correct type of event if perf-record was called to record one event different from 'cycles': $ perf record -e instructions true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.007 MB perf.data (~295 samples) ] $ perf report | head -n1 # Events: 7 cycles LPU-Reference: <[email protected]> Signed-off-by: Franck Bui-Huu <[email protected]>
2010-11-30perf symbols: Figure out start address of kernel map from kallsymsMing Lei1-1/+42
On ARM, module symbol start address is ahead of kernel symbol start address, so we can't suppose that the start address of kernel map always is zero, otherwise may cause incorrect .start and .end of kernel map (caused by fixup) when there are modules loaded, then map_groups__find may return incorrect map for symbol query. This patch always figures out the start address of kernel map from /proc/kallsyms if the file is available, so fix the issues on ARM for module loaded case. This patch fixes the following issues on ARM when modules are loaded: - vmlinux symbol can't be found by kallsyms maps doing 'perf test' - module symbols are parsed mistakenlly when doing 'perf top'/'perf report' Cc: Ian Munsie <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Zanussi <[email protected]> LKML-Reference: <20101125192725.62d31b42@tom-lei> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-30perf symbols: Fix kallsyms kernel/module map splittingArnaldo Carvalho de Melo1-5/+11
On ARM, module addresss space is ahead of kernel space, so the module symbols are handled before kernel symbol in dso__split_kallsyms, then was causing one map to be created for each kernel symbol. Reported-by: Ming Lei <[email protected]> Tested-by: Ming Lei <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Ming Lei <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Tom Zanussi <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-27perf tools: Fix lost and unknown events handlingArnaldo Carvalho de Melo4-1/+46
Fix it by explaining what can be happening and giving the number of processed and lost events. Also holler if unknown events were found, that can be due to processing a perf.data file collected using a newer tool where newer events got added on reporting using an older perf tool, that or a bug, so ask for a report to be made. Works on both --tui and --stdio. Suggested-by: Thomas Gleixner <[email protected]> Cc: Frédéric Weisbecker <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-27perf trace: Handle DT_UNKNOWN on filesystems that don't support d_typeShawn Bohrer1-8/+25
Some filesystems like xfs and reiserfs will return DT_UNKNOWN for the d_type. Handle this case by calling stat() to determine the type. Cc: Andreas Schwab <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Shawn Bohrer <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-27perf symbols: Correct final kernel map guessesIan Munsie2-2/+2
If a 32bit userspace perf is running on a 64bit kernel, the end of the final map in the kernel would incorrectly be set to 2^32-1 rather than 2^64-1. Cc: Ingo Molnar <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ian Munsie <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-26perf events: Default to using event__process_lostArnaldo Carvalho de Melo1-1/+1
Tool developers have to fill in a 'perf_event_ops' method table to specify how to handle each event, so far the ones that were not explicitely especified would get a stub that would just discard the event. Change that so that tool developers can get the lost event details and the total number of such events at the end of 'perf report -D' output. Suggested-by: Thomas Gleixner <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> CC: Thomas Gleixner <[email protected]> Cc: Tom Zanussi <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-26perf record: Add option to disable collecting build-idsArnaldo Carvalho de Melo4-5/+26
Collecting build-ids for long running sessions may take a long time because it needs to traverse the whole just collected perf.data stream of events, marking the DSOs that had hits and then looking for the .note.gnu.build-id ELF section. For things like the 'trace' tool that records and right away consumes the data on systems where its unlikely that the DSOs being monitored will change while 'trace' runs, it is desirable to remove build id collection, so add a -B/--no-buildid option to perf record to allow such use case. Longer term we'll avoid all this if we, at DSO load time, in the kernel, take advantage of this slow code path to collect the build-id and stash it somewhere, so that we can insert it in the PERF_RECORD_MMAP event. Reported-by: Thomas Gleixner <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tom Zanussi <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2010-11-26Merge branch 'perf/urgent' into perf/coreIngo Molnar2-9/+12
Conflicts: arch/x86/kernel/apic/hw_nmi.c Merge reason: Resolve conflict, queue up dependent patch. Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26perf bench: Add feature that measures the performance of the ↵Hitoshi Mitake7-0/+62
arch/x86/lib/memcpy_64.S memcpy routines via 'perf bench mem' This patch ports arch/x86/lib/memcpy_64.S to perf bench mem memcpy for benchmarking memcpy() in userland with tricky and dirty way. util/include/asm/cpufeature.h, util/include/asm/dwarf2.h, and util/include/linux/linkage.h are mostly dummy files with small wrappers, so that we are able to include memcpy_64.S unmodified. Signed-off-by: Hitoshi Mitake <[email protected]> Cc: [email protected] Cc: Miao Xie <[email protected]> Cc: Ma Ling <[email protected]> Cc: Zhao Yakui <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Andi Kleen <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>