author     Linus Torvalds <torvalds@linux-foundation.org>    2023-05-07 11:32:18 -0700
committer  Linus Torvalds <torvalds@linux-foundation.org>    2023-05-07 11:32:18 -0700
commit     f085df1be60abf670315c11036261cfaec16b2eb (patch)
tree       c02c07ad31578b90c3cc99be6b5ba680d163bca7 /tools/perf/util/stat-shadow.c
parent     17784de648be93b4eef0ef8fe28a16ff04feecc7 (diff)
parent     9a2d5178b9d51e1c5f9e08989ff97fc8d4893f31 (diff)
Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"Third version of perf tool updates, with the build problems with with
using a 'vmlinux.h' generated from the main build fixed, and the bpf
skeleton build disabled by default.
Build:
- Require libtraceevent to build, one can disable it using
NO_LIBTRACEEVENT=1.
It is required for tools like 'perf sched', 'perf kvm', 'perf
trace', etc.
libtraceevent is available in most distros so installing
'libtraceevent-devel' should be a one-time event to continue
building perf as usual.
Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
sufficient for lots of users not interested in those
libtraceevent-dependent features (see the build sketch at the end of
this section).
- Allow Python support in 'perf script' when libtraceevent isn't
linked, as not all features require it; for instance, Intel PT does
not use tracepoints.
- Error if the python interpreter needed for jevents to work isn't
available and NO_JEVENTS=1 isn't set, preventing a build without
support for JSON vendor events, which is a rare but possible
condition. The two check error messages:
$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)
- Make libbpf 1.0 the minimum required when building with an
out-of-tree, distro-provided libbpf.
- Use libstdc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
demangler, and add a 'perf test' entry for it.
- Make the binutils libraries opt-in, as distros disable building
with them due to licensing; they were used for C++ demangling, for
instance.
- Switch libpfm4 to opt-out rather than opt-in; if libpfm-devel (or
equivalent) isn't installed, we'll just get a build warning:
Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
- Add a feature test for scandirat(), which is not implemented so far
in musl and uclibc, disabling features that need it, such as
scanning for tracepoints in /sys/kernel/tracing/events.
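A build sketch tying the knobs above together (illustrative; which
feature checks fire depends on what is installed on the build host):
$ make -C tools/perf                      # full build, wants libtraceevent-devel
$ make -C tools/perf NO_LIBTRACEEVENT=1   # skip the libtraceevent-dependent tools
$ make -C tools/perf NO_JEVENTS=1         # skip JSON vendor events, no python needed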
perf BPF filters:
- New feature where BPF can be used to filter samples, for instance:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
- In addition to 'period' (PERF_SAMPLE_PERIOD), the other
PERF_SAMPLE_ values can be used for filtering, as can some other
sample-accessible values; from tools/perf/Documentation/perf-record.txt
(a combined example follows the grammar):
Essentially the BPF filter expression is:
<term> <operator> <value> (("," | "||") <term> <operator> <value>)*
The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops
The <operator> can be one of:
==, !=, >, >=, <, <=, &
The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_lock)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
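For instance, a sketch combining two of the terms above, assuming ','
joins terms that must all hold:
$ sudo ./perf record -e cycles --filter 'period > 1000, cpu == 0' true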
perf lock contention:
- Show lock type with address.
- Track and show mmap_lock, siglock and per-cpu rq_lock with address.
This is done for mmap_lock by following the current->mm pointer:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80
- Update default map size to 16384.
- Allocate the single-letter option -M for --map-nr-entries, as it is
proving to be frequently used (see the example after this list).
- Fix struct rq lock access for older kernels with BPF's CO-RE
(Compile once, run everywhere).
- Fix problems found with MSan (MemorySanitizer).
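Putting the options above together (a sketch; 32768 is just an
arbitrary map size larger than the new 16384 default):
$ sudo ./perf lock con -abl -M 32768 -- sleep 10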
perf report/top:
- Add inline information when using --call-graph=fp or lbr, as was
already done for the --call-graph=dwarf callchain mode.
- Improve the 'srcfile' sort key performance by really using an
optimization introduced in 6.2 for the 'srcline' sort key that
avoids calling addr2line for comparison with each sample.
perf sched:
- Make 'perf sched latency/map/replay' use "sched:sched_waking"
instead of "sched:sched_wakeup", consistent with 'perf record'
since d566a9c2d482 ("perf sched: Prefer sched_waking event when it
exists").
perf ftrace:
- Make system-wide the default target for the latency subcommand; run
the following command, then generate some network traffic and press
Ctrl+C:
# perf ftrace latency -T __kfree_skb
^C
DURATION | COUNT | GRAPH |
0 - 1 us | 27 | ############# |
1 - 2 us | 22 | ########### |
2 - 4 us | 8 | #### |
4 - 8 us | 5 | ## |
8 - 16 us | 24 | ############ |
16 - 32 us | 2 | # |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
#
perf top:
- Add the --branch-history (LBR: Last Branch Record) option, just
like the one already available for 'perf record' (see the example
after this list).
- Fix segfault in thread__comm_len() where thread->comm was being
used outside thread->comm_lock.
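For instance (a sketch; needs LBR-capable hardware, such as recent
Intel CPUs):
# perf top --branch-history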
perf annotate:
- Allow configuring objdump and addr2line in ~/.perfconfig, so that
you can use alternative binaries, such as LLVM's, as sketched below.
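A sketch of such a setup via 'perf config', which writes to
~/.perfconfig; 'annotate.objdump' is a pre-existing key, while the
addr2line key name here is an assumption based on this series:
$ perf config annotate.objdump=llvm-objdump
$ perf config annotate.addr2line=llvm-addr2line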
perf kvm:
- Add TUI mode for 'perf kvm stat report'.
Reference counting:
- Add reference count checking infrastructure to check for use after
free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
more to come.
To build with it, pass -DREFCNT_CHECKING=1 on the make command line
when building tools/perf (see the build sketch after this list).
Documented at:
- The above caught, for instance, this fix, present in this series:
- Fix maps use after put in 'perf test "Share thread maps"':
'maps' is copied from the leader, but the leader is put on line 79
and then 'maps' is used to read the reference count below - so
a use after put, with the put of maps happening within
thread__put.
Fixed by reversing the order of puts so that the leader is put
last.
- Also several fixes were made to places where reference counts were
not being held.
- Make this one of the tests in 'make -C tools/perf build-test' to
regularly build-test it and to make sure no direct accesses to the
reference-counted structs are made, doing that via accessors that
check the validity of the struct pointer.
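A build sketch, passing the define via EXTRA_CFLAGS (one way of
getting it onto the make command line):
$ make -C tools/perf EXTRA_CFLAGS='-DREFCNT_CHECKING=1'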
ARM64:
- Fix 'perf report' segfault when filtering coresight traces by
sparse lists of CPUs.
- Add support for 'simd' as a sort field for 'perf report', to show
ARM's NEON SIMD's predicate flags: "partial" and "empty".
arm64 vendor events:
- Add N1 metrics.
Intel vendor events:
- Add graniterapids, grandridge and sierraforest events.
- Refresh events for: alderlake, alderlaken, broadwell, broadwellde,
broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex,
jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
silvermont, skylake, tigerlake and westmereep-dp.
- Refresh metrics for alderlake-n, broadwell, broadwellde,
broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
skylakex.
perf stat:
- Implement --topdown using JSON metrics (see the example after this
list).
- Add TopdownL1 JSON metric as a default if present, but disable it
for now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.
- Use metrics for --smi-cost.
- Update topdown documentation.
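For instance (a sketch; the metrics printed depend on the CPU's
TopdownL1 JSON definitions):
# perf stat --topdown -a -- sleep 1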
Vendor events (JSON) infrastructure:
- Add support for computing and printing metric threshold values. For
instance, here is one found in the sapphirerapids json file (an
invocation example follows this list):
{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
- Test parsing metric thresholds with the fake PMU in 'perf test
pmu-events'.
- Support for printing metric thresholds in 'perf list'.
- Add --metric-no-threshold option to 'perf stat'.
- Add rand (reverse and) and has_pmem (optane memory) support to
metrics.
- Sort the list of input files to avoid depending on the order from
readdir(), helping in obtaining reproducible builds.
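For example, exercising the threshold machinery with the metric shown
above (a sketch; 'smi_cycles' is only present on CPUs whose JSON
defines it):
# perf stat -M smi_cycles -a -- sleep 1
# perf stat -M smi_cycles --metric-no-threshold -a -- sleep 1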
S/390:
- Add common metrics: CPI (cycles per instruction), prbstate (ratio
of instructions executed in problem state compared to total number
of instructions) and l1mp (Level one instruction and data cache
misses per 100 instructions).
- Add cache metrics for z13, z14, z15 and z16.
- Add metric for TLB and cache.
ARM:
- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
(Memory Tagging Extension) and MOPS (Memory Operations) load/store.
Intel PT hardware tracing:
- Add event type names UINTR (User interrupt delivered) and UIRET
(Exiting from user interrupt routine), documented in table 32-50
"CFE Packet Type and Vector Fields Details" in the Intel Processor
Trace chapter of The Intel SDM Volume 3 version 078.
- Add support for new branch instructions ERETS and ERETU.
- Fix CYC timestamps after standalone CBR.
ARM CoreSight hardware tracing:
- Allow user to override timestamp and contextid settings.
- Fix segfault in dso lookup.
- Fix timeless decode mode detection.
- Add separate decode paths for timeless and per-thread modes.
auxtrace:
- Fix address filter entire kernel size.
Miscellaneous:
- Fix use-after-free and unaligned bugs in the PLT handling routines.
- Use zfree() to reduce chances of use after free.
- Add missing 0x prefix for addresses printed in hexadecimal in 'perf
probe'.
- Suppress massive unsupported target platform errors in the unwind
code.
- Fix returning an incorrect build_id size in elf_read_build_id().
- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2.
- Add missing new parameter in kfree_skb tracepoint to the python
scripts using it.
- Add 'perf bench syscall fork' benchmark.
- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
'perf mem'.
- Fix wrong size expectation for perf test 'Setup struct
perf_event_attr' caused by the patch adding
perf_event_attr::config3.
- Fix some spelling mistakes"
* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
Revert "perf build: Warn for BPF skeletons if endian mismatches"
perf metrics: Fix SEGV with --for-each-cgroup
perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
perf stat: Separate bperf from bpf_profiler
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf tracepoint: Fix memory leak in is_valid_tracepoint()
perf cs-etm: Add fix for coresight trace for any range of CPUs
perf build: Fix unescaped # in perf build-test
perf unwind: Suppress massive unsupported target platform errors
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
perf script: Print raw ip instead of binary offset for callchain
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf list: Modify the warning message about scandirat(3)
perf list: Fix memory leaks in print_tracepoint_events()
perf lock contention: Rework offset calculation with BPF CO-RE
perf lock contention: Fix struct rq lock access
perf stat: Disable TopdownL1 on hybrid
perf stat: Avoid SEGV on counter->name
...
Diffstat (limited to 'tools/perf/util/stat-shadow.c')
-rw-r--r--   tools/perf/util/stat-shadow.c   1287
1 file changed, 329 insertions, 958 deletions
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index 806b32156459..eeccab6751d7 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -16,142 +16,43 @@ #include "iostat.h" #include "util/hashmap.h" -/* - * AGGR_GLOBAL: Use CPU 0 - * AGGR_SOCKET: Use first CPU of socket - * AGGR_DIE: Use first CPU of die - * AGGR_CORE: Use first CPU of core - * AGGR_NONE: Use matching CPU - * AGGR_THREAD: Not supported? - */ - -struct runtime_stat rt_stat; struct stats walltime_nsecs_stats; struct rusage_stats ru_stats; -struct saved_value { - struct rb_node rb_node; - struct evsel *evsel; - enum stat_type type; - int ctx; - int map_idx; /* cpu or thread map index */ - struct cgroup *cgrp; - struct stats stats; - u64 metric_total; - int metric_other; +enum { + CTX_BIT_USER = 1 << 0, + CTX_BIT_KERNEL = 1 << 1, + CTX_BIT_HV = 1 << 2, + CTX_BIT_HOST = 1 << 3, + CTX_BIT_IDLE = 1 << 4, + CTX_BIT_MAX = 1 << 5, }; -static int saved_value_cmp(struct rb_node *rb_node, const void *entry) -{ - struct saved_value *a = container_of(rb_node, - struct saved_value, - rb_node); - const struct saved_value *b = entry; - - if (a->map_idx != b->map_idx) - return a->map_idx - b->map_idx; - - /* - * Previously the rbtree was used to link generic metrics. - * The keys were evsel/cpu. Now the rbtree is extended to support - * per-thread shadow stats. For shadow stats case, the keys - * are cpu/type/ctx/stat (evsel is NULL). For generic metrics - * case, the keys are still evsel/cpu (type/ctx/stat are 0 or NULL). - */ - if (a->type != b->type) - return a->type - b->type; - - if (a->ctx != b->ctx) - return a->ctx - b->ctx; - - if (a->cgrp != b->cgrp) - return (char *)a->cgrp < (char *)b->cgrp ? -1 : +1; - - if (a->evsel == b->evsel) - return 0; - if ((char *)a->evsel < (char *)b->evsel) - return -1; - return +1; -} - -static struct rb_node *saved_value_new(struct rblist *rblist __maybe_unused, - const void *entry) -{ - struct saved_value *nd = malloc(sizeof(struct saved_value)); - - if (!nd) - return NULL; - memcpy(nd, entry, sizeof(struct saved_value)); - return &nd->rb_node; -} - -static void saved_value_delete(struct rblist *rblist __maybe_unused, - struct rb_node *rb_node) -{ - struct saved_value *v; - - BUG_ON(!rb_node); - v = container_of(rb_node, struct saved_value, rb_node); - free(v); -} - -static struct saved_value *saved_value_lookup(struct evsel *evsel, - int map_idx, - bool create, - enum stat_type type, - int ctx, - struct runtime_stat *st, - struct cgroup *cgrp) -{ - struct rblist *rblist; - struct rb_node *nd; - struct saved_value dm = { - .map_idx = map_idx, - .evsel = evsel, - .type = type, - .ctx = ctx, - .cgrp = cgrp, - }; - - rblist = &st->value_list; - - /* don't use context info for clock events */ - if (type == STAT_NSECS) - dm.ctx = 0; - - nd = rblist__find(rblist, &dm); - if (nd) - return container_of(nd, struct saved_value, rb_node); - if (create) { - rblist__add_node(rblist, &dm); - nd = rblist__find(rblist, &dm); - if (nd) - return container_of(nd, struct saved_value, rb_node); - } - return NULL; -} - -void runtime_stat__init(struct runtime_stat *st) -{ - struct rblist *rblist = &st->value_list; - - rblist__init(rblist); - rblist->node_cmp = saved_value_cmp; - rblist->node_new = saved_value_new; - rblist->node_delete = saved_value_delete; -} - -void runtime_stat__exit(struct runtime_stat *st) -{ - rblist__exit(&st->value_list); -} - -void perf_stat__init_shadow_stats(void) -{ - runtime_stat__init(&rt_stat); -} +enum stat_type { + STAT_NONE = 0, + 
STAT_NSECS, + STAT_CYCLES, + STAT_INSTRUCTIONS, + STAT_STALLED_CYCLES_FRONT, + STAT_STALLED_CYCLES_BACK, + STAT_BRANCHES, + STAT_BRANCH_MISS, + STAT_CACHE_REFS, + STAT_CACHE_MISSES, + STAT_L1_DCACHE, + STAT_L1_ICACHE, + STAT_LL_CACHE, + STAT_ITLB_CACHE, + STAT_DTLB_CACHE, + STAT_L1D_MISS, + STAT_L1I_MISS, + STAT_LL_MISS, + STAT_DTLB_MISS, + STAT_ITLB_MISS, + STAT_MAX +}; -static int evsel_context(struct evsel *evsel) +static int evsel_context(const struct evsel *evsel) { int ctx = 0; @@ -169,553 +70,307 @@ static int evsel_context(struct evsel *evsel) return ctx; } -static void reset_stat(struct runtime_stat *st) -{ - struct rblist *rblist; - struct rb_node *pos, *next; - - rblist = &st->value_list; - next = rb_first_cached(&rblist->entries); - while (next) { - pos = next; - next = rb_next(pos); - memset(&container_of(pos, struct saved_value, rb_node)->stats, - 0, - sizeof(struct stats)); - } -} - void perf_stat__reset_shadow_stats(void) { - reset_stat(&rt_stat); memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats)); memset(&ru_stats, 0, sizeof(ru_stats)); } -void perf_stat__reset_shadow_per_stat(struct runtime_stat *st) +static enum stat_type evsel__stat_type(const struct evsel *evsel) +{ + /* Fake perf_hw_cache_op_id values for use with evsel__match. */ + u64 PERF_COUNT_hw_cache_l1d_miss = PERF_COUNT_HW_CACHE_L1D | + ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | + ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16); + u64 PERF_COUNT_hw_cache_l1i_miss = PERF_COUNT_HW_CACHE_L1I | + ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | + ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16); + u64 PERF_COUNT_hw_cache_ll_miss = PERF_COUNT_HW_CACHE_LL | + ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | + ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16); + u64 PERF_COUNT_hw_cache_dtlb_miss = PERF_COUNT_HW_CACHE_DTLB | + ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | + ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16); + u64 PERF_COUNT_hw_cache_itlb_miss = PERF_COUNT_HW_CACHE_ITLB | + ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | + ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16); + + if (evsel__is_clock(evsel)) + return STAT_NSECS; + else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) + return STAT_CYCLES; + else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) + return STAT_INSTRUCTIONS; + else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) + return STAT_STALLED_CYCLES_FRONT; + else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) + return STAT_STALLED_CYCLES_BACK; + else if (evsel__match(evsel, HARDWARE, HW_BRANCH_INSTRUCTIONS)) + return STAT_BRANCHES; + else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) + return STAT_BRANCH_MISS; + else if (evsel__match(evsel, HARDWARE, HW_CACHE_REFERENCES)) + return STAT_CACHE_REFS; + else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) + return STAT_CACHE_MISSES; + else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1D)) + return STAT_L1_DCACHE; + else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1I)) + return STAT_L1_ICACHE; + else if (evsel__match(evsel, HW_CACHE, HW_CACHE_LL)) + return STAT_LL_CACHE; + else if (evsel__match(evsel, HW_CACHE, HW_CACHE_DTLB)) + return STAT_DTLB_CACHE; + else if (evsel__match(evsel, HW_CACHE, HW_CACHE_ITLB)) + return STAT_ITLB_CACHE; + else if (evsel__match(evsel, HW_CACHE, hw_cache_l1d_miss)) + return STAT_L1D_MISS; + else if (evsel__match(evsel, HW_CACHE, hw_cache_l1i_miss)) + return STAT_L1I_MISS; + else if (evsel__match(evsel, HW_CACHE, hw_cache_ll_miss)) + return STAT_LL_MISS; + else if (evsel__match(evsel, HW_CACHE, hw_cache_dtlb_miss)) + return 
STAT_DTLB_MISS; + else if (evsel__match(evsel, HW_CACHE, hw_cache_itlb_miss)) + return STAT_ITLB_MISS; + return STAT_NONE; +} + +static const char *get_ratio_color(const double ratios[3], double val) { - reset_stat(st); -} - -struct runtime_stat_data { - int ctx; - struct cgroup *cgrp; -}; - -static void update_runtime_stat(struct runtime_stat *st, - enum stat_type type, - int map_idx, u64 count, - struct runtime_stat_data *rsd) -{ - struct saved_value *v = saved_value_lookup(NULL, map_idx, true, type, - rsd->ctx, st, rsd->cgrp); - - if (v) - update_stats(&v->stats, count); -} - -/* - * Update various tracking values we maintain to print - * more semantic information such as miss/hit ratios, - * instruction rates, etc: - */ -void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, - int map_idx, struct runtime_stat *st) -{ - u64 count_ns = count; - struct saved_value *v; - struct runtime_stat_data rsd = { - .ctx = evsel_context(counter), - .cgrp = counter->cgrp, - }; - - count *= counter->scale; - - if (evsel__is_clock(counter)) - update_runtime_stat(st, STAT_NSECS, map_idx, count_ns, &rsd); - else if (evsel__match(counter, HARDWARE, HW_CPU_CYCLES)) - update_runtime_stat(st, STAT_CYCLES, map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, CYCLES_IN_TX)) - update_runtime_stat(st, STAT_CYCLES_IN_TX, map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TRANSACTION_START)) - update_runtime_stat(st, STAT_TRANSACTION, map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, ELISION_START)) - update_runtime_stat(st, STAT_ELISION, map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS)) - update_runtime_stat(st, STAT_TOPDOWN_TOTAL_SLOTS, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED)) - update_runtime_stat(st, STAT_TOPDOWN_SLOTS_ISSUED, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED)) - update_runtime_stat(st, STAT_TOPDOWN_SLOTS_RETIRED, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES)) - update_runtime_stat(st, STAT_TOPDOWN_FETCH_BUBBLES, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES)) - update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_RETIRING)) - update_runtime_stat(st, STAT_TOPDOWN_RETIRING, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_BAD_SPEC)) - update_runtime_stat(st, STAT_TOPDOWN_BAD_SPEC, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_FE_BOUND)) - update_runtime_stat(st, STAT_TOPDOWN_FE_BOUND, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND)) - update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_HEAVY_OPS)) - update_runtime_stat(st, STAT_TOPDOWN_HEAVY_OPS, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_BR_MISPREDICT)) - update_runtime_stat(st, STAT_TOPDOWN_BR_MISPREDICT, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_LAT)) - update_runtime_stat(st, STAT_TOPDOWN_FETCH_LAT, - map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, TOPDOWN_MEM_BOUND)) - update_runtime_stat(st, STAT_TOPDOWN_MEM_BOUND, - map_idx, count, &rsd); - else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) - update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT, - map_idx, count, 
&rsd); - else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND)) - update_runtime_stat(st, STAT_STALLED_CYCLES_BACK, - map_idx, count, &rsd); - else if (evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS)) - update_runtime_stat(st, STAT_BRANCHES, map_idx, count, &rsd); - else if (evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES)) - update_runtime_stat(st, STAT_CACHEREFS, map_idx, count, &rsd); - else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1D)) - update_runtime_stat(st, STAT_L1_DCACHE, map_idx, count, &rsd); - else if (evsel__match(counter, HW_CACHE, HW_CACHE_L1I)) - update_runtime_stat(st, STAT_L1_ICACHE, map_idx, count, &rsd); - else if (evsel__match(counter, HW_CACHE, HW_CACHE_LL)) - update_runtime_stat(st, STAT_LL_CACHE, map_idx, count, &rsd); - else if (evsel__match(counter, HW_CACHE, HW_CACHE_DTLB)) - update_runtime_stat(st, STAT_DTLB_CACHE, map_idx, count, &rsd); - else if (evsel__match(counter, HW_CACHE, HW_CACHE_ITLB)) - update_runtime_stat(st, STAT_ITLB_CACHE, map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, SMI_NUM)) - update_runtime_stat(st, STAT_SMI_NUM, map_idx, count, &rsd); - else if (perf_stat_evsel__is(counter, APERF)) - update_runtime_stat(st, STAT_APERF, map_idx, count, &rsd); - - if (counter->collect_stat) { - v = saved_value_lookup(counter, map_idx, true, STAT_NONE, 0, st, - rsd.cgrp); - update_stats(&v->stats, count); - if (counter->metric_leader) - v->metric_total += count; - } else if (counter->metric_leader && !counter->merged_stat) { - v = saved_value_lookup(counter->metric_leader, - map_idx, true, STAT_NONE, 0, st, rsd.cgrp); - v->metric_total += count; - v->metric_other++; - } -} - -/* used for get_ratio_color() */ -enum grc_type { - GRC_STALLED_CYCLES_FE, - GRC_STALLED_CYCLES_BE, - GRC_CACHE_MISSES, - GRC_MAX_NR -}; - -static const char *get_ratio_color(enum grc_type type, double ratio) -{ - static const double grc_table[GRC_MAX_NR][3] = { - [GRC_STALLED_CYCLES_FE] = { 50.0, 30.0, 10.0 }, - [GRC_STALLED_CYCLES_BE] = { 75.0, 50.0, 20.0 }, - [GRC_CACHE_MISSES] = { 20.0, 10.0, 5.0 }, - }; const char *color = PERF_COLOR_NORMAL; - if (ratio > grc_table[type][0]) + if (val > ratios[0]) color = PERF_COLOR_RED; - else if (ratio > grc_table[type][1]) + else if (val > ratios[1]) color = PERF_COLOR_MAGENTA; - else if (ratio > grc_table[type][2]) + else if (val > ratios[2]) color = PERF_COLOR_YELLOW; return color; } -static double runtime_stat_avg(struct runtime_stat *st, - enum stat_type type, int map_idx, - struct runtime_stat_data *rsd) -{ - struct saved_value *v; - - v = saved_value_lookup(NULL, map_idx, false, type, rsd->ctx, st, rsd->cgrp); - if (!v) - return 0.0; - - return avg_stats(&v->stats); +static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type type) +{ + const struct evsel *cur; + int evsel_ctx = evsel_context(evsel); + + evlist__for_each_entry(evsel->evlist, cur) { + struct perf_stat_aggr *aggr; + + /* Ignore the evsel that is being searched from. */ + if (evsel == cur) + continue; + + /* Ignore evsels that are part of different groups. */ + if (evsel->core.leader->nr_members > 1 && + evsel->core.leader != cur->core.leader) + continue; + /* Ignore evsels with mismatched modifiers. */ + if (evsel_ctx != evsel_context(cur)) + continue; + /* Ignore if not the cgroup we're looking for. */ + if (evsel->cgrp != cur->cgrp) + continue; + /* Ignore if not the stat we're looking for. 
*/ + if (type != evsel__stat_type(cur)) + continue; + + aggr = &cur->stats->aggr[aggr_idx]; + if (type == STAT_NSECS) + return aggr->counts.val; + return aggr->counts.val * cur->scale; + } + return 0.0; } -static double runtime_stat_n(struct runtime_stat *st, - enum stat_type type, int map_idx, - struct runtime_stat_data *rsd) +static void print_ratio(struct perf_stat_config *config, + const struct evsel *evsel, int aggr_idx, + double numerator, struct perf_stat_output_ctx *out, + enum stat_type denominator_type, + const double color_ratios[3], const char *unit) { - struct saved_value *v; + double denominator = find_stat(evsel, aggr_idx, denominator_type); - v = saved_value_lookup(NULL, map_idx, false, type, rsd->ctx, st, rsd->cgrp); - if (!v) - return 0.0; + if (numerator && denominator) { + double ratio = numerator / denominator * 100.0; + const char *color = get_ratio_color(color_ratios, ratio); - return v->stats.n; + out->print_metric(config, out->ctx, color, "%7.2f%%", unit, ratio); + } else + out->print_metric(config, out->ctx, NULL, NULL, unit, 0); } -static void print_stalled_cycles_frontend(struct perf_stat_config *config, - int map_idx, double avg, - struct perf_stat_output_ctx *out, - struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_stalled_cycles_front(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double stalled, + struct perf_stat_output_ctx *out) { - double total, ratio = 0.0; - const char *color; - - total = runtime_stat_avg(st, STAT_CYCLES, map_idx, rsd); - - if (total) - ratio = avg / total * 100.0; - - color = get_ratio_color(GRC_STALLED_CYCLES_FE, ratio); + static const double color_ratios[3] = {50.0, 30.0, 10.0}; - if (ratio) - out->print_metric(config, out->ctx, color, "%7.2f%%", "frontend cycles idle", - ratio); - else - out->print_metric(config, out->ctx, NULL, NULL, "frontend cycles idle", 0); + print_ratio(config, evsel, aggr_idx, stalled, out, STAT_CYCLES, color_ratios, + "frontend cycles idle"); } -static void print_stalled_cycles_backend(struct perf_stat_config *config, - int map_idx, double avg, - struct perf_stat_output_ctx *out, - struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_stalled_cycles_back(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double stalled, + struct perf_stat_output_ctx *out) { - double total, ratio = 0.0; - const char *color; + static const double color_ratios[3] = {75.0, 50.0, 20.0}; - total = runtime_stat_avg(st, STAT_CYCLES, map_idx, rsd); - - if (total) - ratio = avg / total * 100.0; - - color = get_ratio_color(GRC_STALLED_CYCLES_BE, ratio); - - out->print_metric(config, out->ctx, color, "%7.2f%%", "backend cycles idle", ratio); + print_ratio(config, evsel, aggr_idx, stalled, out, STAT_CYCLES, color_ratios, + "backend cycles idle"); } -static void print_branch_misses(struct perf_stat_config *config, - int map_idx, double avg, - struct perf_stat_output_ctx *out, - struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_branch_miss(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double misses, + struct perf_stat_output_ctx *out) { - double total, ratio = 0.0; - const char *color; - - total = runtime_stat_avg(st, STAT_BRANCHES, map_idx, rsd); + static const double color_ratios[3] = {20.0, 10.0, 5.0}; - if (total) - ratio = avg / total * 100.0; - - color = get_ratio_color(GRC_CACHE_MISSES, ratio); - - out->print_metric(config, out->ctx, color, "%7.2f%%", "of all 
branches", ratio); + print_ratio(config, evsel, aggr_idx, misses, out, STAT_BRANCHES, color_ratios, + "of all branches"); } -static void print_l1_dcache_misses(struct perf_stat_config *config, - int map_idx, double avg, - struct perf_stat_output_ctx *out, - struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_l1d_miss(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double misses, + struct perf_stat_output_ctx *out) { - double total, ratio = 0.0; - const char *color; - - total = runtime_stat_avg(st, STAT_L1_DCACHE, map_idx, rsd); + static const double color_ratios[3] = {20.0, 10.0, 5.0}; - if (total) - ratio = avg / total * 100.0; - - color = get_ratio_color(GRC_CACHE_MISSES, ratio); - - out->print_metric(config, out->ctx, color, "%7.2f%%", "of all L1-dcache accesses", ratio); + print_ratio(config, evsel, aggr_idx, misses, out, STAT_L1_DCACHE, color_ratios, + "of all L1-dcache accesses"); } -static void print_l1_icache_misses(struct perf_stat_config *config, - int map_idx, double avg, - struct perf_stat_output_ctx *out, - struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_l1i_miss(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double misses, + struct perf_stat_output_ctx *out) { - double total, ratio = 0.0; - const char *color; - - total = runtime_stat_avg(st, STAT_L1_ICACHE, map_idx, rsd); - - if (total) - ratio = avg / total * 100.0; + static const double color_ratios[3] = {20.0, 10.0, 5.0}; - color = get_ratio_color(GRC_CACHE_MISSES, ratio); - out->print_metric(config, out->ctx, color, "%7.2f%%", "of all L1-icache accesses", ratio); + print_ratio(config, evsel, aggr_idx, misses, out, STAT_L1_ICACHE, color_ratios, + "of all L1-icache accesses"); } -static void print_dtlb_cache_misses(struct perf_stat_config *config, - int map_idx, double avg, - struct perf_stat_output_ctx *out, - struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_ll_miss(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double misses, + struct perf_stat_output_ctx *out) { - double total, ratio = 0.0; - const char *color; + static const double color_ratios[3] = {20.0, 10.0, 5.0}; - total = runtime_stat_avg(st, STAT_DTLB_CACHE, map_idx, rsd); - - if (total) - ratio = avg / total * 100.0; - - color = get_ratio_color(GRC_CACHE_MISSES, ratio); - out->print_metric(config, out->ctx, color, "%7.2f%%", "of all dTLB cache accesses", ratio); + print_ratio(config, evsel, aggr_idx, misses, out, STAT_LL_CACHE, color_ratios, + "of all L1-icache accesses"); } -static void print_itlb_cache_misses(struct perf_stat_config *config, - int map_idx, double avg, - struct perf_stat_output_ctx *out, - struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_dtlb_miss(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double misses, + struct perf_stat_output_ctx *out) { - double total, ratio = 0.0; - const char *color; - - total = runtime_stat_avg(st, STAT_ITLB_CACHE, map_idx, rsd); - - if (total) - ratio = avg / total * 100.0; + static const double color_ratios[3] = {20.0, 10.0, 5.0}; - color = get_ratio_color(GRC_CACHE_MISSES, ratio); - out->print_metric(config, out->ctx, color, "%7.2f%%", "of all iTLB cache accesses", ratio); + print_ratio(config, evsel, aggr_idx, misses, out, STAT_DTLB_CACHE, color_ratios, + "of all dTLB cache accesses"); } -static void print_ll_cache_misses(struct perf_stat_config *config, - int map_idx, double avg, - 
struct perf_stat_output_ctx *out, - struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_itlb_miss(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double misses, + struct perf_stat_output_ctx *out) { - double total, ratio = 0.0; - const char *color; + static const double color_ratios[3] = {20.0, 10.0, 5.0}; - total = runtime_stat_avg(st, STAT_LL_CACHE, map_idx, rsd); - - if (total) - ratio = avg / total * 100.0; - - color = get_ratio_color(GRC_CACHE_MISSES, ratio); - out->print_metric(config, out->ctx, color, "%7.2f%%", "of all LL-cache accesses", ratio); -} - -/* - * High level "TopDown" CPU core pipe line bottleneck break down. - * - * Basic concept following - * Yasin, A Top Down Method for Performance analysis and Counter architecture - * ISPASS14 - * - * The CPU pipeline is divided into 4 areas that can be bottlenecks: - * - * Frontend -> Backend -> Retiring - * BadSpeculation in addition means out of order execution that is thrown away - * (for example branch mispredictions) - * Frontend is instruction decoding. - * Backend is execution, like computation and accessing data in memory - * Retiring is good execution that is not directly bottlenecked - * - * The formulas are computed in slots. - * A slot is an entry in the pipeline each for the pipeline width - * (for example a 4-wide pipeline has 4 slots for each cycle) - * - * Formulas: - * BadSpeculation = ((SlotsIssued - SlotsRetired) + RecoveryBubbles) / - * TotalSlots - * Retiring = SlotsRetired / TotalSlots - * FrontendBound = FetchBubbles / TotalSlots - * BackendBound = 1.0 - BadSpeculation - Retiring - FrontendBound - * - * The kernel provides the mapping to the low level CPU events and any scaling - * needed for the CPU pipeline width, for example: - * - * TotalSlots = Cycles * 4 - * - * The scaling factor is communicated in the sysfs unit. - * - * In some cases the CPU may not be able to measure all the formulas due to - * missing events. In this case multiple formulas are combined, as possible. - * - * Full TopDown supports more levels to sub-divide each area: for example - * BackendBound into computing bound and memory bound. For now we only - * support Level 1 TopDown. 
- */ - -static double sanitize_val(double x) -{ - if (x < 0 && x >= -0.02) - return 0.0; - return x; + print_ratio(config, evsel, aggr_idx, misses, out, STAT_ITLB_CACHE, color_ratios, + "of all iTLB cache accesses"); } -static double td_total_slots(int map_idx, struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_cache_miss(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double misses, + struct perf_stat_output_ctx *out) { - return runtime_stat_avg(st, STAT_TOPDOWN_TOTAL_SLOTS, map_idx, rsd); -} + static const double color_ratios[3] = {20.0, 10.0, 5.0}; -static double td_bad_spec(int map_idx, struct runtime_stat *st, - struct runtime_stat_data *rsd) -{ - double bad_spec = 0; - double total_slots; - double total; - - total = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_ISSUED, map_idx, rsd) - - runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, map_idx, rsd) + - runtime_stat_avg(st, STAT_TOPDOWN_RECOVERY_BUBBLES, map_idx, rsd); - - total_slots = td_total_slots(map_idx, st, rsd); - if (total_slots) - bad_spec = total / total_slots; - return sanitize_val(bad_spec); + print_ratio(config, evsel, aggr_idx, misses, out, STAT_CACHE_REFS, color_ratios, + "of all cache refs"); } -static double td_retiring(int map_idx, struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_instructions(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double instructions, + struct perf_stat_output_ctx *out) { - double retiring = 0; - double total_slots = td_total_slots(map_idx, st, rsd); - double ret_slots = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, - map_idx, rsd); - - if (total_slots) - retiring = ret_slots / total_slots; - return retiring; -} - -static double td_fe_bound(int map_idx, struct runtime_stat *st, - struct runtime_stat_data *rsd) -{ - double fe_bound = 0; - double total_slots = td_total_slots(map_idx, st, rsd); - double fetch_bub = runtime_stat_avg(st, STAT_TOPDOWN_FETCH_BUBBLES, - map_idx, rsd); - - if (total_slots) - fe_bound = fetch_bub / total_slots; - return fe_bound; -} - -static double td_be_bound(int map_idx, struct runtime_stat *st, - struct runtime_stat_data *rsd) -{ - double sum = (td_fe_bound(map_idx, st, rsd) + - td_bad_spec(map_idx, st, rsd) + - td_retiring(map_idx, st, rsd)); - if (sum == 0) - return 0; - return sanitize_val(1.0 - sum); + print_metric_t print_metric = out->print_metric; + void *ctxp = out->ctx; + double cycles = find_stat(evsel, aggr_idx, STAT_CYCLES); + double max_stalled = max(find_stat(evsel, aggr_idx, STAT_STALLED_CYCLES_FRONT), + find_stat(evsel, aggr_idx, STAT_STALLED_CYCLES_BACK)); + + if (cycles) { + print_metric(config, ctxp, NULL, "%7.2f ", "insn per cycle", + instructions / cycles); + } else + print_metric(config, ctxp, NULL, NULL, "insn per cycle", 0); + + if (max_stalled && instructions) { + out->new_line(config, ctxp); + print_metric(config, ctxp, NULL, "%7.2f ", "stalled cycles per insn", + max_stalled / instructions); + } } -/* - * Kernel reports metrics multiplied with slots. To get back - * the ratios we need to recreate the sum. 
- */ - -static double td_metric_ratio(int map_idx, enum stat_type type, - struct runtime_stat *stat, - struct runtime_stat_data *rsd) +static void print_cycles(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double cycles, + struct perf_stat_output_ctx *out) { - double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, map_idx, rsd) + - runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, map_idx, rsd) + - runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, map_idx, rsd) + - runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, map_idx, rsd); - double d = runtime_stat_avg(stat, type, map_idx, rsd); - - if (sum) - return d / sum; - return 0; -} + double nsecs = find_stat(evsel, aggr_idx, STAT_NSECS); -/* - * ... but only if most of the values are actually available. - * We allow two missing. - */ + if (cycles && nsecs) { + double ratio = cycles / nsecs; -static bool full_td(int map_idx, struct runtime_stat *stat, - struct runtime_stat_data *rsd) -{ - int c = 0; - - if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, map_idx, rsd) > 0) - c++; - if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, map_idx, rsd) > 0) - c++; - if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, map_idx, rsd) > 0) - c++; - if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, map_idx, rsd) > 0) - c++; - return c >= 2; + out->print_metric(config, out->ctx, NULL, "%8.3f", "GHz", ratio); + } else + out->print_metric(config, out->ctx, NULL, NULL, "GHz", 0); } -static void print_smi_cost(struct perf_stat_config *config, int map_idx, - struct perf_stat_output_ctx *out, - struct runtime_stat *st, - struct runtime_stat_data *rsd) +static void print_nsecs(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx __maybe_unused, double nsecs, + struct perf_stat_output_ctx *out) { - double smi_num, aperf, cycles, cost = 0.0; - const char *color = NULL; - - smi_num = runtime_stat_avg(st, STAT_SMI_NUM, map_idx, rsd); - aperf = runtime_stat_avg(st, STAT_APERF, map_idx, rsd); - cycles = runtime_stat_avg(st, STAT_CYCLES, map_idx, rsd); - - if ((cycles == 0) || (aperf == 0)) - return; - - if (smi_num) - cost = (aperf - cycles) / aperf * 100.00; + print_metric_t print_metric = out->print_metric; + void *ctxp = out->ctx; + double wall_time = avg_stats(&walltime_nsecs_stats); - if (cost > 10) - color = PERF_COLOR_RED; - out->print_metric(config, out->ctx, color, "%8.1f%%", "SMI cycles%", cost); - out->print_metric(config, out->ctx, NULL, "%4.0f", "SMI#", smi_num); + if (wall_time) { + print_metric(config, ctxp, NULL, "%8.3f", "CPUs utilized", + nsecs / (wall_time * evsel->scale)); + } else + print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0); } static int prepare_metric(struct evsel **metric_events, struct metric_ref *metric_refs, struct expr_parse_ctx *pctx, - int map_idx, - struct runtime_stat *st) + int aggr_idx) { - double scale; - char *n; - int i, j, ret; + int i; for (i = 0; metric_events[i]; i++) { - struct saved_value *v; - struct stats *stats; - u64 metric_total = 0; - int source_count; + char *n; + double val; + int source_count = 0; if (evsel__is_tool(metric_events[i])) { - source_count = 1; + struct stats *stats; + double scale; + switch (metric_events[i]->tool_event) { case PERF_TOOL_DURATION_TIME: stats = &walltime_nsecs_stats; @@ -739,35 +394,32 @@ static int prepare_metric(struct evsel **metric_events, pr_err("Unknown tool event '%s'", evsel__name(metric_events[i])); abort(); } + val = avg_stats(stats) * scale; + source_count = 1; } else { - v = saved_value_lookup(metric_events[i], 
map_idx, false, - STAT_NONE, 0, st, - metric_events[i]->cgrp); - if (!v) + struct perf_stat_evsel *ps = metric_events[i]->stats; + struct perf_stat_aggr *aggr = &ps->aggr[aggr_idx]; + + if (!aggr) break; - stats = &v->stats; + /* * If an event was scaled during stat gathering, reverse * the scale before computing the metric. */ - scale = 1.0 / metric_events[i]->scale; - + val = aggr->counts.val * (1.0 / metric_events[i]->scale); source_count = evsel__source_count(metric_events[i]); - - if (v->metric_other) - metric_total = v->metric_total * scale; } n = strdup(evsel__metric_id(metric_events[i])); if (!n) return -ENOMEM; - expr__add_id_val_source_count(pctx, n, - metric_total ? : avg_stats(stats) * scale, - source_count); + expr__add_id_val_source_count(pctx, n, val, source_count); } - for (j = 0; metric_refs && metric_refs[j].metric_name; j++) { - ret = expr__add_ref(pctx, &metric_refs[j]); + for (int j = 0; metric_refs && metric_refs[j].metric_name; j++) { + int ret = expr__add_ref(pctx, &metric_refs[j]); + if (ret) return ret; } @@ -777,21 +429,22 @@ static int prepare_metric(struct evsel **metric_events, static void generic_metric(struct perf_stat_config *config, const char *metric_expr, + const char *metric_threshold, struct evsel **metric_events, struct metric_ref *metric_refs, char *name, const char *metric_name, const char *metric_unit, int runtime, - int map_idx, - struct perf_stat_output_ctx *out, - struct runtime_stat *st) + int aggr_idx, + struct perf_stat_output_ctx *out) { print_metric_t print_metric = out->print_metric; struct expr_parse_ctx *pctx; - double ratio, scale; + double ratio, scale, threshold; int i; void *ctxp = out->ctx; + const char *color = NULL; pctx = expr__ctx_new(); if (!pctx) @@ -801,7 +454,7 @@ static void generic_metric(struct perf_stat_config *config, pctx->sctx.user_requested_cpu_list = strdup(config->user_requested_cpu_list); pctx->sctx.runtime = runtime; pctx->sctx.system_wide = config->system_wide; - i = prepare_metric(metric_events, metric_refs, pctx, map_idx, st); + i = prepare_metric(metric_events, metric_refs, pctx, aggr_idx); if (i < 0) { expr__ctx_free(pctx); return; @@ -811,6 +464,13 @@ static void generic_metric(struct perf_stat_config *config, char *unit; char metric_bf[64]; + if (metric_threshold && + expr__parse(&threshold, pctx, metric_threshold) == 0 && + !isnan(threshold)) { + color = fpclassify(threshold) == FP_ZERO + ? PERF_COLOR_GREEN : PERF_COLOR_RED; + } + if (metric_unit && metric_name) { if (perf_pmu__convert_scale(metric_unit, &unit, &scale) >= 0) { @@ -823,22 +483,22 @@ static void generic_metric(struct perf_stat_config *config, scnprintf(metric_bf, sizeof(metric_bf), "%s %s", unit, metric_name); - print_metric(config, ctxp, NULL, "%8.1f", + print_metric(config, ctxp, color, "%8.1f", metric_bf, ratio); } else { - print_metric(config, ctxp, NULL, "%8.2f", + print_metric(config, ctxp, color, "%8.2f", metric_name ? metric_name : out->force_header ? name : "", ratio); } } else { - print_metric(config, ctxp, NULL, NULL, + print_metric(config, ctxp, color, /*unit=*/NULL, out->force_header ? (metric_name ? metric_name : name) : "", 0); } } else { - print_metric(config, ctxp, NULL, NULL, + print_metric(config, ctxp, color, /*unit=*/NULL, out->force_header ? (metric_name ? 
metric_name : name) : "", 0); } @@ -846,7 +506,7 @@ static void generic_metric(struct perf_stat_config *config, expr__ctx_free(pctx); } -double test_generic_metric(struct metric_expr *mexp, int map_idx, struct runtime_stat *st) +double test_generic_metric(struct metric_expr *mexp, int aggr_idx) { struct expr_parse_ctx *pctx; double ratio = 0.0; @@ -855,7 +515,7 @@ double test_generic_metric(struct metric_expr *mexp, int map_idx, struct runtime if (!pctx) return NAN; - if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, map_idx, st) < 0) + if (prepare_metric(mexp->metric_events, mexp->metric_refs, pctx, aggr_idx) < 0) goto out; if (expr__parse(&ratio, pctx, mexp->metric_expr)) @@ -868,344 +528,55 @@ out: void perf_stat__print_shadow_stats(struct perf_stat_config *config, struct evsel *evsel, - double avg, int map_idx, + double avg, int aggr_idx, struct perf_stat_output_ctx *out, - struct rblist *metric_events, - struct runtime_stat *st) -{ - void *ctxp = out->ctx; - print_metric_t print_metric = out->print_metric; - double total, ratio = 0.0, total2; - const char *color = NULL; - struct runtime_stat_data rsd = { - .ctx = evsel_context(evsel), - .cgrp = evsel->cgrp, + struct rblist *metric_events) +{ + typedef void (*stat_print_function_t)(struct perf_stat_config *config, + const struct evsel *evsel, + int aggr_idx, double misses, + struct perf_stat_output_ctx *out); + static const stat_print_function_t stat_print_function[STAT_MAX] = { + [STAT_INSTRUCTIONS] = print_instructions, + [STAT_BRANCH_MISS] = print_branch_miss, + [STAT_L1D_MISS] = print_l1d_miss, + [STAT_L1I_MISS] = print_l1i_miss, + [STAT_DTLB_MISS] = print_dtlb_miss, + [STAT_ITLB_MISS] = print_itlb_miss, + [STAT_LL_MISS] = print_ll_miss, + [STAT_CACHE_MISSES] = print_cache_miss, + [STAT_STALLED_CYCLES_FRONT] = print_stalled_cycles_front, + [STAT_STALLED_CYCLES_BACK] = print_stalled_cycles_back, + [STAT_CYCLES] = print_cycles, + [STAT_NSECS] = print_nsecs, }; + print_metric_t print_metric = out->print_metric; + void *ctxp = out->ctx; struct metric_event *me; int num = 1; if (config->iostat_run) { iostat_print_metric(config, evsel, out); - } else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) { - total = runtime_stat_avg(st, STAT_CYCLES, map_idx, &rsd); - - if (total) { - ratio = avg / total; - print_metric(config, ctxp, NULL, "%7.2f ", - "insn per cycle", ratio); - } else { - print_metric(config, ctxp, NULL, NULL, "insn per cycle", 0); - } - - total = runtime_stat_avg(st, STAT_STALLED_CYCLES_FRONT, map_idx, &rsd); - - total = max(total, runtime_stat_avg(st, - STAT_STALLED_CYCLES_BACK, - map_idx, &rsd)); - - if (total && avg) { - out->new_line(config, ctxp); - ratio = total / avg; - print_metric(config, ctxp, NULL, "%7.2f ", - "stalled cycles per insn", - ratio); - } - } else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) { - if (runtime_stat_n(st, STAT_BRANCHES, map_idx, &rsd) != 0) - print_branch_misses(config, map_idx, avg, out, st, &rsd); - else - print_metric(config, ctxp, NULL, NULL, "of all branches", 0); - } else if ( - evsel->core.attr.type == PERF_TYPE_HW_CACHE && - evsel->core.attr.config == ( PERF_COUNT_HW_CACHE_L1D | - ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | - ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - - if (runtime_stat_n(st, STAT_L1_DCACHE, map_idx, &rsd) != 0) - print_l1_dcache_misses(config, map_idx, avg, out, st, &rsd); - else - print_metric(config, ctxp, NULL, NULL, "of all L1-dcache accesses", 0); - } else if ( - evsel->core.attr.type == PERF_TYPE_HW_CACHE && - evsel->core.attr.config 
== ( PERF_COUNT_HW_CACHE_L1I | - ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | - ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - - if (runtime_stat_n(st, STAT_L1_ICACHE, map_idx, &rsd) != 0) - print_l1_icache_misses(config, map_idx, avg, out, st, &rsd); - else - print_metric(config, ctxp, NULL, NULL, "of all L1-icache accesses", 0); - } else if ( - evsel->core.attr.type == PERF_TYPE_HW_CACHE && - evsel->core.attr.config == ( PERF_COUNT_HW_CACHE_DTLB | - ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | - ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - - if (runtime_stat_n(st, STAT_DTLB_CACHE, map_idx, &rsd) != 0) - print_dtlb_cache_misses(config, map_idx, avg, out, st, &rsd); - else - print_metric(config, ctxp, NULL, NULL, "of all dTLB cache accesses", 0); - } else if ( - evsel->core.attr.type == PERF_TYPE_HW_CACHE && - evsel->core.attr.config == ( PERF_COUNT_HW_CACHE_ITLB | - ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | - ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - - if (runtime_stat_n(st, STAT_ITLB_CACHE, map_idx, &rsd) != 0) - print_itlb_cache_misses(config, map_idx, avg, out, st, &rsd); - else - print_metric(config, ctxp, NULL, NULL, "of all iTLB cache accesses", 0); - } else if ( - evsel->core.attr.type == PERF_TYPE_HW_CACHE && - evsel->core.attr.config == ( PERF_COUNT_HW_CACHE_LL | - ((PERF_COUNT_HW_CACHE_OP_READ) << 8) | - ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) { - - if (runtime_stat_n(st, STAT_LL_CACHE, map_idx, &rsd) != 0) - print_ll_cache_misses(config, map_idx, avg, out, st, &rsd); - else - print_metric(config, ctxp, NULL, NULL, "of all LL-cache accesses", 0); - } else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) { - total = runtime_stat_avg(st, STAT_CACHEREFS, map_idx, &rsd); - - if (total) - ratio = avg * 100 / total; - - if (runtime_stat_n(st, STAT_CACHEREFS, map_idx, &rsd) != 0) - print_metric(config, ctxp, NULL, "%8.3f %%", - "of all cache refs", ratio); - else - print_metric(config, ctxp, NULL, NULL, "of all cache refs", 0); - } else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) { - print_stalled_cycles_frontend(config, map_idx, avg, out, st, &rsd); - } else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) { - print_stalled_cycles_backend(config, map_idx, avg, out, st, &rsd); - } else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) { - total = runtime_stat_avg(st, STAT_NSECS, map_idx, &rsd); - - if (total) { - ratio = avg / total; - print_metric(config, ctxp, NULL, "%8.3f", "GHz", ratio); - } else { - print_metric(config, ctxp, NULL, NULL, "Ghz", 0); - } - } else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX)) { - total = runtime_stat_avg(st, STAT_CYCLES, map_idx, &rsd); - - if (total) - print_metric(config, ctxp, NULL, - "%7.2f%%", "transactional cycles", - 100.0 * (avg / total)); - else - print_metric(config, ctxp, NULL, NULL, "transactional cycles", - 0); - } else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX_CP)) { - total = runtime_stat_avg(st, STAT_CYCLES, map_idx, &rsd); - total2 = runtime_stat_avg(st, STAT_CYCLES_IN_TX, map_idx, &rsd); - - if (total2 < avg) - total2 = avg; - if (total) - print_metric(config, ctxp, NULL, "%7.2f%%", "aborted cycles", - 100.0 * ((total2-avg) / total)); - else - print_metric(config, ctxp, NULL, NULL, "aborted cycles", 0); - } else if (perf_stat_evsel__is(evsel, TRANSACTION_START)) { - total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, map_idx, &rsd); - - if (avg) - ratio = total / avg; - - if (runtime_stat_n(st, STAT_CYCLES_IN_TX, map_idx, &rsd) != 0) - print_metric(config, ctxp, NULL, "%8.0f", - "cycles / 
transaction", ratio); - else - print_metric(config, ctxp, NULL, NULL, "cycles / transaction", - 0); - } else if (perf_stat_evsel__is(evsel, ELISION_START)) { - total = runtime_stat_avg(st, STAT_CYCLES_IN_TX, map_idx, &rsd); - - if (avg) - ratio = total / avg; - - print_metric(config, ctxp, NULL, "%8.0f", "cycles / elision", ratio); - } else if (evsel__is_clock(evsel)) { - if ((ratio = avg_stats(&walltime_nsecs_stats)) != 0) - print_metric(config, ctxp, NULL, "%8.3f", "CPUs utilized", - avg / (ratio * evsel->scale)); - else - print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) { - double fe_bound = td_fe_bound(map_idx, st, &rsd); - - if (fe_bound > 0.2) - color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "frontend bound", - fe_bound * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) { - double retiring = td_retiring(map_idx, st, &rsd); - - if (retiring > 0.7) - color = PERF_COLOR_GREEN; - print_metric(config, ctxp, color, "%8.1f%%", "retiring", - retiring * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) { - double bad_spec = td_bad_spec(map_idx, st, &rsd); - - if (bad_spec > 0.1) - color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "bad speculation", - bad_spec * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) { - double be_bound = td_be_bound(map_idx, st, &rsd); - const char *name = "backend bound"; - static int have_recovery_bubbles = -1; - - /* In case the CPU does not support topdown-recovery-bubbles */ - if (have_recovery_bubbles < 0) - have_recovery_bubbles = pmu_have_event("cpu", - "topdown-recovery-bubbles"); - if (!have_recovery_bubbles) - name = "backend bound/bad spec"; - - if (be_bound > 0.2) - color = PERF_COLOR_RED; - if (td_total_slots(map_idx, st, &rsd) > 0) - print_metric(config, ctxp, color, "%8.1f%%", name, - be_bound * 100.); - else - print_metric(config, ctxp, NULL, NULL, name, 0); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_RETIRING) && - full_td(map_idx, st, &rsd)) { - double retiring = td_metric_ratio(map_idx, - STAT_TOPDOWN_RETIRING, st, - &rsd); - if (retiring > 0.7) - color = PERF_COLOR_GREEN; - print_metric(config, ctxp, color, "%8.1f%%", "Retiring", - retiring * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_FE_BOUND) && - full_td(map_idx, st, &rsd)) { - double fe_bound = td_metric_ratio(map_idx, - STAT_TOPDOWN_FE_BOUND, st, - &rsd); - if (fe_bound > 0.2) - color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "Frontend Bound", - fe_bound * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_BE_BOUND) && - full_td(map_idx, st, &rsd)) { - double be_bound = td_metric_ratio(map_idx, - STAT_TOPDOWN_BE_BOUND, st, - &rsd); - if (be_bound > 0.2) - color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "Backend Bound", - be_bound * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_BAD_SPEC) && - full_td(map_idx, st, &rsd)) { - double bad_spec = td_metric_ratio(map_idx, - STAT_TOPDOWN_BAD_SPEC, st, - &rsd); - if (bad_spec > 0.1) - color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "Bad Speculation", - bad_spec * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_HEAVY_OPS) && - full_td(map_idx, st, &rsd) && (config->topdown_level > 1)) { - double retiring = td_metric_ratio(map_idx, - STAT_TOPDOWN_RETIRING, st, - &rsd); - double heavy_ops = td_metric_ratio(map_idx, - STAT_TOPDOWN_HEAVY_OPS, st, - &rsd); - double 
light_ops = retiring - heavy_ops; - - if (retiring > 0.7 && heavy_ops > 0.1) - color = PERF_COLOR_GREEN; - print_metric(config, ctxp, color, "%8.1f%%", "Heavy Operations", - heavy_ops * 100.); - if (retiring > 0.7 && light_ops > 0.6) - color = PERF_COLOR_GREEN; - else - color = NULL; - print_metric(config, ctxp, color, "%8.1f%%", "Light Operations", - light_ops * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_BR_MISPREDICT) && - full_td(map_idx, st, &rsd) && (config->topdown_level > 1)) { - double bad_spec = td_metric_ratio(map_idx, - STAT_TOPDOWN_BAD_SPEC, st, - &rsd); - double br_mis = td_metric_ratio(map_idx, - STAT_TOPDOWN_BR_MISPREDICT, st, - &rsd); - double m_clears = bad_spec - br_mis; - - if (bad_spec > 0.1 && br_mis > 0.05) - color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "Branch Mispredict", - br_mis * 100.); - if (bad_spec > 0.1 && m_clears > 0.05) - color = PERF_COLOR_RED; - else - color = NULL; - print_metric(config, ctxp, color, "%8.1f%%", "Machine Clears", - m_clears * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_LAT) && - full_td(map_idx, st, &rsd) && (config->topdown_level > 1)) { - double fe_bound = td_metric_ratio(map_idx, - STAT_TOPDOWN_FE_BOUND, st, - &rsd); - double fetch_lat = td_metric_ratio(map_idx, - STAT_TOPDOWN_FETCH_LAT, st, - &rsd); - double fetch_bw = fe_bound - fetch_lat; - - if (fe_bound > 0.2 && fetch_lat > 0.15) - color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "Fetch Latency", - fetch_lat * 100.); - if (fe_bound > 0.2 && fetch_bw > 0.1) - color = PERF_COLOR_RED; - else - color = NULL; - print_metric(config, ctxp, color, "%8.1f%%", "Fetch Bandwidth", - fetch_bw * 100.); - } else if (perf_stat_evsel__is(evsel, TOPDOWN_MEM_BOUND) && - full_td(map_idx, st, &rsd) && (config->topdown_level > 1)) { - double be_bound = td_metric_ratio(map_idx, - STAT_TOPDOWN_BE_BOUND, st, - &rsd); - double mem_bound = td_metric_ratio(map_idx, - STAT_TOPDOWN_MEM_BOUND, st, - &rsd); - double core_bound = be_bound - mem_bound; - - if (be_bound > 0.2 && mem_bound > 0.2) - color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "Memory Bound", - mem_bound * 100.); - if (be_bound > 0.2 && core_bound > 0.1) - color = PERF_COLOR_RED; - else - color = NULL; - print_metric(config, ctxp, color, "%8.1f%%", "Core Bound", - core_bound * 100.); - } else if (runtime_stat_n(st, STAT_NSECS, map_idx, &rsd) != 0) { - char unit = ' '; - char unit_buf[10] = "/sec"; - - total = runtime_stat_avg(st, STAT_NSECS, map_idx, &rsd); - if (total) - ratio = convert_unit_double(1000000000.0 * avg / total, &unit); - - if (unit != ' ') - snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit); - print_metric(config, ctxp, NULL, "%8.3f", unit_buf, ratio); - } else if (perf_stat_evsel__is(evsel, SMI_NUM)) { - print_smi_cost(config, map_idx, out, st, &rsd); } else { - num = 0; + stat_print_function_t fn = stat_print_function[evsel__stat_type(evsel)]; + + if (fn) + fn(config, evsel, aggr_idx, avg, out); + else { + double nsecs = find_stat(evsel, aggr_idx, STAT_NSECS); + + if (nsecs) { + char unit = ' '; + char unit_buf[10] = "/sec"; + double ratio = convert_unit_double(1000000000.0 * avg / nsecs, + &unit); + + if (unit != ' ') + snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit); + print_metric(config, ctxp, NULL, "%8.3f", unit_buf, ratio); + } else + num = 0; + } } if ((me = metricgroup__lookup(metric_events, evsel, false)) != NULL) { @@ -1214,10 +585,10 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, 
list_for_each_entry (mexp, &me->head, nd) { if (num++ > 0) out->new_line(config, ctxp); - generic_metric(config, mexp->metric_expr, mexp->metric_events, - mexp->metric_refs, evsel->name, mexp->metric_name, - mexp->metric_unit, mexp->runtime, - map_idx, out, st); + generic_metric(config, mexp->metric_expr, mexp->metric_threshold, + mexp->metric_events, mexp->metric_refs, evsel->name, + mexp->metric_name, mexp->metric_unit, mexp->runtime, + aggr_idx, out); } } if (num == 0) |