author | Linus Torvalds <[email protected]> | 2017-09-04 08:39:02 -0700 |
---|---|---|
committer | Linus Torvalds <[email protected]> | 2017-09-04 08:39:02 -0700 |
commit | 9657752cb5039c7498d4b27c4a75530f93b87d9b (patch) | |
tree | ef4198ba427da0ef5e1cb8fb4ec62843b645aed9 /tools/perf/util/annotate.c | |
parent | 0081a0ce809b611c1f37da5d6ae5bc8027ffd1c4 (diff) | |
parent | 1b2f76d77a277bb70d38ad0991ed7f16bbc115a9 (diff) |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Add branch type profiling/tracing support. (Jin Yao)
- Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of
physical memory addresses, where the PMU supports it. (Kan Liang)
- Export some PMU capability details in the new
/sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi
Kleen)
- Aux data fixes and updates (Will Deacon)
- kprobes fixes and updates (Masami Hiramatsu)
- AMD uncore PMU driver fixes and updates (Janakarajan Natarajan)
On the tooling side, here's a (limited!) list of highlights - there
were many other changes that I could not list, see the shortlog and
git history for details:
UI improvements:
- Implement a visual marker for fused x86 instructions in the
annotate TUI browser, available now in 'perf report', more work
needed to have it available as well in 'perf top' (Jin Yao)
Further explanation from one of Jin's patches:
│ ┌──cmpl $0x0,argp_program_version_hook
81.93 │ ├──je 20
│ │ lock cmpxchg %esi,0x38a9a4(%rip)
│ │↓ jne 29
│ │↓ jmp 43
11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
That means the cmpl+je is a fused instruction pair and they should
be considered together.
- Record the branch type and then show statistics and info about it in
callchain entries (Jin Yao)
Example from one of Jin's patches:
# perf record -g -j any,save_type
# perf report --branch-history --stdio --no-children
38.50% div.c:45 [.] main div
       |
       ---main div.c:42 (RET CROSS_2M cycles:2)
          compute_flag div.c:28 (cycles:2)
          compute_flag div.c:27 (RET CROSS_2M cycles:1)
          rand rand.c:28 (cycles:1)
          rand rand.c:28 (RET CROSS_2M cycles:1)
          __random random.c:298 (cycles:1)
          __random random.c:297 (COND_BWD CROSS_2M cycles:1)
          __random random.c:295 (cycles:1)
          __random random.c:295 (COND_BWD CROSS_2M cycles:1)
          __random random.c:295 (cycles:1)
          __random random.c:295 (RET CROSS_2M cycles:9)
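As a rough illustration of what labels such as COND_BWD and CROSS_2M in the output above encode, the sketch below classifies a single from/to branch pair. It assumes that COND_BWD means a conditional branch whose target is not above its source, and that CROSS_4K/CROSS_2M mean the source and target fall in different 4 KiB- or 2 MiB-aligned regions; the branch kind (conditional or not) is taken as an input, as it would come from the recorded sample. `struct branch_labels`, `cross_area()` and `classify_branch()` are hypothetical names, not the perf tools' API, and perf's own classification may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AREA_4K ((uint64_t)4096)
#define AREA_2M ((uint64_t)2 * 1024 * 1024)

/* Hypothetical per-branch summary of the labels shown in the callchain. */
struct branch_labels {
	bool cond_fwd;  /* conditional branch to a higher address */
	bool cond_bwd;  /* conditional branch to a lower or equal address */
	bool cross_4k;  /* target in a different 4 KiB-aligned region */
	bool cross_2m;  /* target in a different 2 MiB-aligned region */
};

/* True if addr1 and addr2 fall in different size-aligned regions. */
static bool cross_area(uint64_t addr1, uint64_t addr2, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0); /* must be a power of two */
	return (addr1 & ~(size - 1)) != (addr2 & ~(size - 1));
}

static struct branch_labels classify_branch(bool conditional, uint64_t from, uint64_t to)
{
	struct branch_labels l = { false, false, false, false };

	if (conditional) {
		if (to > from)
			l.cond_fwd = true;
		else
			l.cond_bwd = true;
	}

	/* Report only the largest boundary crossed, so a 2M crossing is not also a 4K one. */
	if (cross_area(from, to, AREA_2M))
		l.cross_2m = true;
	else if (cross_area(from, to, AREA_4K))
		l.cross_4k = true;

	return l;
}
```

For example, `classify_branch(true, 0x400a10, 0x400a00)` yields a backward conditional branch that stays within one 4 KiB page, while a return from `0x7f0000000100` to `0x400500` is flagged as crossing a 2 MiB boundary.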
namespaces support:
- Add initial support for namespaces, using setns to access files in
namespaces, grabbing their build-ids, etc. (Krister Johansen)
perf trace enhancements:
- Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace'
(Arnaldo Carvalho de Melo)
- Add initial 'clone' syscall args beautifier in 'perf trace'
(Arnaldo Carvalho de Melo)
- Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace'
(Arnaldo Carvalho de Melo)
- Beautifiers for the 'cmd' arg of several ioctl types, including:
sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de
Melo)
- Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data'
CTF conversion, allowing CTF trace visualization tools to show
callchains and to resolve symbols (Geneviève Bastien)
- Beautify the fcntl syscall, which is an interesting one in the
sense that infrastructure had to be put in place to change the
formatters of some arguments according to the value in a previous
one, i.e. cmd dictates how arg and the syscall return will be
formatted. (Arnaldo Carvalho de Melo)
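The fcntl point above, that `cmd` decides how `arg` should be rendered, can be sketched as a dispatcher that switches on `cmd` before choosing a formatter. This is a hypothetical helper (`format_fcntl_arg()`), not perf trace's beautifier code; it covers only a few commands and two status flags from `<fcntl.h>`, and falls back to the raw value for anything else. A similar switch could be used to decode the return value.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table of O_* status flags decoded for F_SETFL (not exhaustive). */
struct flag_name {
	long bit;
	const char *name;
};

static const struct flag_name status_flags[] = {
	{ O_APPEND, "APPEND" },
	{ O_NONBLOCK, "NONBLOCK" },
};

/* Write "NAME|NAME" for the known bits set in value, or "0" if none are set. */
static void format_status_flags(long value, char *buf, size_t len)
{
	size_t used = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(status_flags) / sizeof(status_flags[0]); i++) {
		if (!(value & status_flags[i].bit))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", status_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return; /* out of space: keep what fits */
		used += (size_t)n;
	}
	if (used == 0)
		snprintf(buf, len, "0");
}

/* Hypothetical beautifier: pick the formatter for arg based on cmd. */
static void format_fcntl_arg(int cmd, long arg, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	switch (cmd) {
	case F_GETFD:
	case F_GETFL:
		buf[0] = '\0'; /* arg is ignored by these commands */
		break;
	case F_SETFD:
		snprintf(buf, len, "%s", (arg & FD_CLOEXEC) ? "CLOEXEC" : "0");
		break;
	case F_SETFL:
		format_status_flags(arg, buf, len);
		break;
	case F_DUPFD:
		snprintf(buf, len, "%ld", arg); /* lowest acceptable new fd number */
		break;
	default:
		snprintf(buf, len, "%#lx", arg); /* unknown cmd: raw value */
		break;
	}
}
```

The design point is that the same `long arg` is a flag set for `F_SETFD`/`F_SETFL` but a plain number for `F_DUPFD`, so it cannot be formatted without first looking at `cmd`.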
perf stat enhancements:
- Use group read for event groups in 'perf stat', reducing overhead
when groups are defined in the event specification, i.e. when using
{} to enclose a list of events, asking them to be read at the same
time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa)
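Group reads are cheaper because, when the group leader's `read_format` includes `PERF_FORMAT_GROUP`, a single read() on the leader returns every member's count in one buffer. Per the perf_event_open(2) man page, with `read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID` that buffer is `{ u64 nr; { u64 value; u64 id; } values[nr]; }`. The sketch below only decodes such a buffer; it assumes exactly those two flags (no `time_enabled`/`time_running` fields) and does not open any events. `struct group_value` and `parse_group_read()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded group member: counter value plus its event ID (PERF_FORMAT_ID). */
struct group_value {
	uint64_t value;
	uint64_t id;
};

/*
 * Decode a buffer read from a group leader whose read_format is exactly
 * PERF_FORMAT_GROUP | PERF_FORMAT_ID: nr, then nr (value, id) pairs.
 * buf_words is the number of u64 words actually read.
 * Returns the number of members stored in out (at most max_out),
 * or -1 if the buffer is too short for the advertised nr.
 */
static int parse_group_read(const uint64_t *buf, size_t buf_words,
			    struct group_value *out, size_t max_out)
{
	assert(buf != NULL && out != NULL);

	if (buf_words < 1)
		return -1;

	uint64_t nr = buf[0];

	if (nr > (buf_words - 1) / 2)
		return -1; /* truncated read */

	size_t n = nr < max_out ? (size_t)nr : max_out;

	for (size_t i = 0; i < n; i++) {
		out[i].value = buf[1 + 2 * i];
		out[i].id = buf[2 + 2 * i];
	}
	return (int)n;
}
```

With real events, `buf` would be filled by `read(leader_fd, buf, sizeof(buf))`, and `buf_words` would be read()'s return value divided by `sizeof(uint64_t)`; the IDs then map each value back to its event (e.g. cycles or instructions).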
pipe mode improvements:
- Process tracing data in 'perf annotate' pipe mode (David
Carrillo-Cisneros)
- Add header record types to pipe mode; now this command:
$ perf record -o - -e cycles sleep 1 | perf report --stdio --header
will show the same output as in non-pipe mode, i.e. when a perf.data
file is involved (David Carrillo-Cisneros)
Vendor specific hardware event support updates/enhancements:
- Update POWER9 vendor events tables (Sukadev Bhattiprolu)
- Add POWER9 PMU events (Sukadev Bhattiprolu)
- Support additional POWER8+ PVR in PMU mapfile (Shriya)
- Add Skylake server uncore JSON vendor events (Andi Kleen)
- Support exporting Intel PT data to sqlite3 with python perf
scripts, this is in addition to the postgresql support that was
already there (Adrian Hunter)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits)
perf symbols: Fix plt entry calculation for ARM and AARCH64
perf probe: Fix kprobe blacklist checking condition
perf/x86: Fix caps/ for !Intel
perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR
perf/core, pt, bts: Get rid of itrace_started
perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments
tools headers: Sync cpu features kernel ABI headers with tooling headers
perf tools: Pass full path of FEATURES_DUMP
perf tools: Robustify detection of clang binary
tools lib: Allow external definition of CC, AR and LD
perf tools: Allow external definition of flex and bison binary names
tools build tests: Don't hardcode gcc name
perf report: Group stat values on global event id
perf values: Zero value buffers
perf values: Fix allocation check
perf values: Fix thread index bug
perf report: Add dump_read function
perf record: Set read_format for inherit_stat
perf c2c: Fix remote HITM detection for Skylake
perf tools: Fix static build with newer toolchains
...
Diffstat (limited to 'tools/perf/util/annotate.c')
-rw-r--r-- | tools/perf/util/annotate.c | 137 |
1 files changed, 89 insertions, 48 deletions
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index be1caabb9290..4397a8b6e6cd 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -47,7 +47,12 @@ struct arch {
 	bool sorted_instructions;
 	bool initialized;
 	void *priv;
+	unsigned int model;
+	unsigned int family;
 	int (*init)(struct arch *arch);
+	bool (*ins_is_fused)(struct arch *arch, const char *ins1,
+			     const char *ins2);
+	int (*cpuid_parse)(struct arch *arch, char *cpuid);
 	struct {
 		char comment_char;
 		char skip_functions_char;
@@ -129,6 +134,8 @@ static struct arch architectures[] = {
 		.name = "x86",
 		.instructions = x86__instructions,
 		.nr_instructions = ARRAY_SIZE(x86__instructions),
+		.ins_is_fused = x86__ins_is_fused,
+		.cpuid_parse = x86__cpuid_parse,
 		.objdump = {
 			.comment_char = '#',
 		},
@@ -171,6 +178,14 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size,
 	return ins__raw_scnprintf(ins, bf, size, ops);
 }
 
+bool ins__is_fused(struct arch *arch, const char *ins1, const char *ins2)
+{
+	if (!arch || !arch->ins_is_fused)
+		return false;
+
+	return arch->ins_is_fused(arch, ins1, ins2);
+}
+
 static int call__parse(struct arch *arch, struct ins_operands *ops, struct map *map)
 {
 	char *endptr, *tok, *name;
@@ -502,6 +517,11 @@ bool ins__is_ret(const struct ins *ins)
 	return ins->ops == &ret_ops;
 }
 
+bool ins__is_lock(const struct ins *ins)
+{
+	return ins->ops == &lock_ops;
+}
+
 static int ins__key_cmp(const void *name, const void *insp)
 {
 	const struct ins *ins = insp;
@@ -590,10 +610,10 @@ int symbol__alloc_hist(struct symbol *sym)
 	size_t sizeof_sym_hist;
 
 	/* Check for overflow when calculating sizeof_sym_hist */
-	if (size > (SIZE_MAX - sizeof(struct sym_hist)) / sizeof(u64))
+	if (size > (SIZE_MAX - sizeof(struct sym_hist)) / sizeof(struct sym_hist_entry))
 		return -1;
 
-	sizeof_sym_hist = (sizeof(struct sym_hist) + size * sizeof(u64));
+	sizeof_sym_hist = (sizeof(struct sym_hist) + size * sizeof(struct sym_hist_entry));
 
 	/* Check for overflow in zalloc argument */
 	if (sizeof_sym_hist > (SIZE_MAX - sizeof(*notes->src))
@@ -677,7 +697,8 @@ static int __symbol__account_cycles(struct annotation *notes,
 }
 
 static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,
-				      struct annotation *notes, int evidx, u64 addr)
+				      struct annotation *notes, int evidx, u64 addr,
+				      struct perf_sample *sample)
 {
 	unsigned offset;
 	struct sym_hist *h;
@@ -693,12 +714,15 @@ static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,
 
 	offset = addr - sym->start;
 	h = annotation__histogram(notes, evidx);
-	h->sum++;
-	h->addr[offset]++;
+	h->nr_samples++;
+	h->addr[offset].nr_samples++;
+	h->period += sample->period;
+	h->addr[offset].period += sample->period;
 
 	pr_debug3("%#" PRIx64 " %s: period++ [addr: %#" PRIx64 ", %#" PRIx64
-		  ", evidx=%d] => %" PRIu64 "\n", sym->start, sym->name,
-		  addr, addr - sym->start, evidx, h->addr[offset]);
+		  ", evidx=%d] => nr_samples: %" PRIu64 ", period: %" PRIu64 "\n",
+		  sym->start, sym->name, addr, addr - sym->start, evidx,
+		  h->addr[offset].nr_samples, h->addr[offset].period);
 	return 0;
 }
 
@@ -718,7 +742,8 @@ static struct annotation *symbol__get_annotation(struct symbol *sym, bool cycles
 }
 
 static int symbol__inc_addr_samples(struct symbol *sym, struct map *map,
-				    int evidx, u64 addr)
+				    int evidx, u64 addr,
+				    struct perf_sample *sample)
 {
 	struct annotation *notes;
 
@@ -727,7 +752,7 @@ static int symbol__inc_addr_samples(struct symbol *sym, struct map *map,
 	notes = symbol__get_annotation(sym, false);
 	if (notes == NULL)
 		return -ENOMEM;
-	return __symbol__inc_addr_samples(sym, map, notes, evidx, addr);
+	return __symbol__inc_addr_samples(sym, map, notes, evidx, addr, sample);
 }
 
 static int symbol__account_cycles(u64 addr, u64 start,
@@ -791,14 +816,16 @@ int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
 	return err;
 }
 
-int addr_map_symbol__inc_samples(struct addr_map_symbol *ams, int evidx)
+int addr_map_symbol__inc_samples(struct addr_map_symbol *ams, struct perf_sample *sample,
+				 int evidx)
 {
-	return symbol__inc_addr_samples(ams->sym, ams->map, evidx, ams->al_addr);
+	return symbol__inc_addr_samples(ams->sym, ams->map, evidx, ams->al_addr, sample);
 }
 
-int hist_entry__inc_addr_samples(struct hist_entry *he, int evidx, u64 ip)
+int hist_entry__inc_addr_samples(struct hist_entry *he, struct perf_sample *sample,
+				 int evidx, u64 ip)
 {
-	return symbol__inc_addr_samples(he->ms.sym, he->ms.map, evidx, ip);
+	return symbol__inc_addr_samples(he->ms.sym, he->ms.map, evidx, ip, sample);
 }
 
 static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map *map)
@@ -908,11 +935,12 @@ struct disasm_line *disasm__get_next_ip_line(struct list_head *head, struct disa
 }
 
 double disasm__calc_percent(struct annotation *notes, int evidx, s64 offset,
-			    s64 end, const char **path, u64 *nr_samples)
+			    s64 end, const char **path, struct sym_hist_entry *sample)
 {
 	struct source_line *src_line = notes->src->lines;
 	double percent = 0.0;
-	*nr_samples = 0;
+
+	sample->nr_samples = sample->period = 0;
 
 	if (src_line) {
 		size_t sizeof_src_line = sizeof(*src_line) +
@@ -926,19 +954,24 @@ double disasm__calc_percent(struct annotation *notes, int evidx, s64 offset,
 				*path = src_line->path;
 
 			percent += src_line->samples[evidx].percent;
-			*nr_samples += src_line->samples[evidx].nr;
+			sample->nr_samples += src_line->samples[evidx].nr;
 			offset++;
 		}
 	} else {
 		struct sym_hist *h = annotation__histogram(notes, evidx);
 		unsigned int hits = 0;
+		u64 period = 0;
 
-		while (offset < end)
-			hits += h->addr[offset++];
+		while (offset < end) {
+			hits += h->addr[offset].nr_samples;
+			period += h->addr[offset].period;
+			++offset;
+		}
 
-		if (h->sum) {
-			*nr_samples = hits;
-			percent = 100.0 * hits / h->sum;
+		if (h->nr_samples) {
+			sample->period = period;
+			sample->nr_samples = hits;
+			percent = 100.0 * hits / h->nr_samples;
 		}
 	}
 
@@ -1037,10 +1070,10 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
 
 	if (dl->offset != -1) {
 		const char *path = NULL;
-		u64 nr_samples;
 		double percent, max_percent = 0.0;
 		double *ppercents = &percent;
-		u64 *psamples = &nr_samples;
+		struct sym_hist_entry sample;
+		struct sym_hist_entry *psamples = &sample;
 		int i, nr_percent = 1;
 		const char *color;
 		struct annotation *notes = symbol__annotation(sym);
@@ -1054,7 +1087,7 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
 		if (perf_evsel__is_group_event(evsel)) {
 			nr_percent = evsel->nr_members;
 			ppercents = calloc(nr_percent, sizeof(double));
-			psamples = calloc(nr_percent, sizeof(u64));
+			psamples = calloc(nr_percent, sizeof(struct sym_hist_entry));
 			if (ppercents == NULL || psamples == NULL) {
 				return -1;
 			}
@@ -1065,10 +1098,10 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
 					notes->src->lines ? i : evsel->idx + i,
 					offset,
 					next ? next->offset : (s64) len,
-					&path, &nr_samples);
+					&path, &sample);
 
 			ppercents[i] = percent;
-			psamples[i] = nr_samples;
+			psamples[i] = sample;
 			if (percent > max_percent)
 				max_percent = percent;
 		}
@@ -1106,12 +1139,15 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
 
 		for (i = 0; i < nr_percent; i++) {
 			percent = ppercents[i];
-			nr_samples = psamples[i];
+			sample = psamples[i];
 			color = get_percent_color(percent);
 
 			if (symbol_conf.show_total_period)
+				color_fprintf(stdout, color, " %11" PRIu64,
+					      sample.period);
+			else if (symbol_conf.show_nr_samples)
 				color_fprintf(stdout, color, " %7" PRIu64,
-					      nr_samples);
+					      sample.nr_samples);
 			else
 				color_fprintf(stdout, color, " %7.2f", percent);
 		}
@@ -1127,13 +1163,13 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
 		if (ppercents != &percent)
 			free(ppercents);
 
-		if (psamples != &nr_samples)
+		if (psamples != &sample)
 			free(psamples);
 
 	} else if (max_lines && printed >= max_lines)
 		return 1;
 	else {
-		int width = 8;
+		int width = symbol_conf.show_total_period ? 12 : 8;
 
 		if (queue)
 			return -1;
@@ -1327,7 +1363,7 @@ static int dso__disassemble_filename(struct dso *dso, char *filename, size_t fil
 	    !dso__is_kcore(dso))
 		return SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX;
 
-	build_id_filename = dso__build_id_filename(dso, NULL, 0);
+	build_id_filename = dso__build_id_filename(dso, NULL, 0, false);
 	if (build_id_filename) {
 		__symbol__join_symfs(filename, filename_size, build_id_filename);
 		free(build_id_filename);
@@ -1381,7 +1417,7 @@ static const char *annotate__norm_arch(const char *arch_name)
 
 int symbol__disassemble(struct symbol *sym, struct map *map,
 			const char *arch_name, size_t privsize,
-			struct arch **parch)
+			struct arch **parch, char *cpuid)
 {
 	struct dso *dso = map->dso;
 	char command[PATH_MAX * 2];
@@ -1418,6 +1454,9 @@ int symbol__disassemble(struct symbol *sym, struct map *map,
 		}
 	}
 
+	if (arch->cpuid_parse && cpuid)
+		arch->cpuid_parse(arch, cpuid);
+
 	pr_debug("%s: filename=%s, sym=%s, start=%#" PRIx64 ", end=%#" PRIx64 "\n", __func__,
 		 symfs_filename, sym->name, map->unmap_ip(map, sym->start),
 		 map->unmap_ip(map, sym->end));
@@ -1648,19 +1687,19 @@ static int symbol__get_source_line(struct symbol *sym, struct map *map,
 	struct sym_hist *h = annotation__histogram(notes, evidx);
 	struct rb_root tmp_root = RB_ROOT;
 	int nr_pcnt = 1;
-	u64 h_sum = h->sum;
+	u64 nr_samples = h->nr_samples;
 	size_t sizeof_src_line = sizeof(struct source_line);
 
 	if (perf_evsel__is_group_event(evsel)) {
 		for (i = 1; i < evsel->nr_members; i++) {
 			h = annotation__histogram(notes, evidx + i);
-			h_sum += h->sum;
+			nr_samples += h->nr_samples;
 		}
 		nr_pcnt = evsel->nr_members;
 		sizeof_src_line += (nr_pcnt - 1) * sizeof(src_line->samples);
 	}
 
-	if (!h_sum)
+	if (!nr_samples)
 		return 0;
 
 	src_line = notes->src->lines = calloc(len, sizeof_src_line);
@@ -1670,7 +1709,7 @@ static int symbol__get_source_line(struct symbol *sym, struct map *map,
 	start = map__rip_2objdump(map, sym->start);
 
 	for (i = 0; i < len; i++) {
-		u64 offset, nr_samples;
+		u64 offset;
 		double percent_max = 0.0;
 
 		src_line->nr_pcnt = nr_pcnt;
@@ -1679,9 +1718,9 @@ static int symbol__get_source_line(struct symbol *sym, struct map *map,
 			double percent = 0.0;
 
 			h = annotation__histogram(notes, evidx + k);
-			nr_samples = h->addr[i];
-			if (h->sum)
-				percent = 100.0 * nr_samples / h->sum;
+			nr_samples = h->addr[i].nr_samples;
+			if (h->nr_samples)
+				percent = 100.0 * nr_samples / h->nr_samples;
 
 			if (percent > percent_max)
 				percent_max = percent;
@@ -1750,10 +1789,10 @@ static void symbol__annotate_hits(struct symbol *sym, struct perf_evsel *evsel)
 	u64 len = symbol__size(sym), offset;
 
 	for (offset = 0; offset < len; ++offset)
-		if (h->addr[offset] != 0)
+		if (h->addr[offset].nr_samples != 0)
 			printf("%*" PRIx64 ": %" PRIu64 "\n", BITS_PER_LONG / 2,
-			       sym->start + offset, h->addr[offset]);
-	printf("%*s: %" PRIu64 "\n", BITS_PER_LONG / 2, "h->sum", h->sum);
+			       sym->start + offset, h->addr[offset].nr_samples);
+	printf("%*s: %" PRIu64 "\n", BITS_PER_LONG / 2, "h->nr_samples", h->nr_samples);
 }
 
 int symbol__annotate_printf(struct symbol *sym, struct map *map,
@@ -1771,7 +1810,7 @@ int symbol__annotate_printf(struct symbol *sym, struct map *map,
 	int printed = 2, queue_len = 0;
 	int more = 0;
 	u64 len;
-	int width = 8;
+	int width = symbol_conf.show_total_period ? 12 : 8;
 	int graph_dotted_len;
 
 	filename = strdup(dso->long_name);
@@ -1789,7 +1828,9 @@ int symbol__annotate_printf(struct symbol *sym, struct map *map,
 		width *= evsel->nr_members;
 
 	graph_dotted_len = printf(" %-*.*s| Source code & Disassembly of %s for %s (%" PRIu64 " samples)\n",
-				  width, width, "Percent", d_filename, evsel_name, h->sum);
+				  width, width, symbol_conf.show_total_period ? "Period" :
+				  symbol_conf.show_nr_samples ? "Samples" : "Percent",
+				  d_filename, evsel_name, h->nr_samples);
 
 	printf("%-*.*s----\n",
 	       graph_dotted_len, graph_dotted_len, graph_dotted_line);
@@ -1853,10 +1894,10 @@ void symbol__annotate_decay_histogram(struct symbol *sym, int evidx)
 	struct sym_hist *h = annotation__histogram(notes, evidx);
 	int len = symbol__size(sym), offset;
 
-	h->sum = 0;
+	h->nr_samples = 0;
 	for (offset = 0; offset < len; ++offset) {
-		h->addr[offset] = h->addr[offset] * 7 / 8;
-		h->sum += h->addr[offset];
+		h->addr[offset].nr_samples = h->addr[offset].nr_samples * 7 / 8;
+		h->nr_samples += h->addr[offset].nr_samples;
 	}
 }
 
@@ -1907,7 +1948,7 @@ int symbol__tty_annotate(struct symbol *sym, struct map *map,
 	u64 len;
 
 	if (symbol__disassemble(sym, map, perf_evsel__env_arch(evsel),
-				0, NULL) < 0)
+				0, NULL, NULL) < 0)
 		return -1;
 
 	len = symbol__size(sym);