perf doc: Refresh topdown documentation

perf stat now supports --topdown for any platform with the TopdownL1
metric group including Intel before Icelake. Tweak the documentation
to reflect this.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-43-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This commit is contained in:
Ian Rogers 2023-02-19 01:28:39 -08:00 committed by Arnaldo Carvalho de Melo
parent 7b86475f02
commit 20cb10eadb
2 changed files with 43 additions and 52 deletions

View file

@ -394,10 +394,10 @@ See perf list output for the possible metrics and metricgroups.
Do not aggregate counts across all monitored CPUs.
--topdown::
Print complete top-down metrics supported by the CPU. This allows to
determine bottle necks in the CPU pipeline for CPU bound workloads,
by breaking the cycles consumed down into frontend bound, backend bound,
bad speculation and retiring.
Print top-down metrics supported by the CPU. This allows to determine
bottle necks in the CPU pipeline for CPU bound workloads, by breaking
the cycles consumed down into frontend bound, backend bound, bad
speculation and retiring.
Frontend bound means that the CPU cannot fetch and decode instructions fast
enough. Backend bound means that computation or memory access is the bottle
@ -430,15 +430,18 @@ CPUs the workload runs on. If needed the CPUs can be forced using
taskset.
--td-level::
Print the top-down statistics that equal to or lower than the input level.
It allows users to print the interested top-down metrics level instead of
the complete top-down metrics.
Print the top-down statistics that equal the input level. It allows
users to print the interested top-down metrics level instead of the
level 1 top-down metrics.
The availability of the top-down metrics level depends on the hardware. For
example, Ice Lake only supports L1 top-down metrics. The Sapphire Rapids
supports both L1 and L2 top-down metrics.
As the higher levels gather more metrics and use more counters they
will be less accurate. By convention a metric can be examined by
appending '_group' to it and this will increase accuracy compared to
gathering all metrics for a level. For example, level 1 analysis may
highlight 'tma_frontend_bound'. This metric may be drilled into with
'tma_frontend_bound_group' with
'perf stat -M tma_frontend_bound_group...'.
Default: 0 means the max level that the current hardware support.
Error out if the input is higher than the supported max level.
--no-merge::

View file

@ -1,46 +1,35 @@
Using TopDown metrics in user space
-----------------------------------
Using TopDown metrics
---------------------
Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
methodology to break down CPU pipeline execution into 4 bottlenecks:
frontend bound, backend bound, bad speculation, retiring.
TopDown metrics break apart performance bottlenecks. Starting at level
1 it is typical to get metrics on retiring, bad speculation, frontend
bound, and backend bound. Higher levels provide more detail in to the
level 1 bottlenecks, such as at level 2: core bound, memory bound,
heavy operations, light operations, branch mispredicts, machine
clears, fetch latency and fetch bandwidth. For more details see [1][2][3].
For more details on Topdown see [1][5]
perf stat --topdown implements this using available metrics that vary
per architecture.
Traditionally this was implemented by events in generic counters
and specific formulas to compute the bottlenecks.
% perf stat -a --topdown -I1000
# time % tma_retiring % tma_backend_bound % tma_frontend_bound % tma_bad_speculation
1.001141351 11.5 34.9 46.9 6.7
2.006141972 13.4 28.1 50.4 8.1
3.010162040 12.9 28.1 51.1 8.0
4.014009311 12.5 28.6 51.8 7.2
5.017838554 11.8 33.0 48.0 7.2
5.704818971 14.0 27.5 51.3 7.3
...
perf stat --topdown implements this.
Full Top Down includes more levels that can break down the
bottlenecks further. This is not directly implemented in perf,
but available in other tools that can run on top of perf,
such as toplev[2] or vtune[3]
New Topdown features in Ice Lake
===============================
New Topdown features in Intel Ice Lake
======================================
With Ice Lake CPUs the TopDown metrics are directly available as
fixed counters and do not require generic counters. This allows
to collect TopDown always in addition to other events.
% perf stat -a --topdown -I1000
# time retiring bad speculation frontend bound backend bound
1.001281330 23.0% 15.3% 29.6% 32.1%
2.003009005 5.0% 6.8% 46.6% 41.6%
3.004646182 6.7% 6.7% 46.0% 40.6%
4.006326375 5.0% 6.4% 47.6% 41.0%
5.007991804 5.1% 6.3% 46.3% 42.3%
6.009626773 6.2% 7.1% 47.3% 39.3%
7.011296356 4.7% 6.7% 46.2% 42.4%
8.012951831 4.7% 6.7% 47.5% 41.1%
...
This also enables measuring TopDown per thread/process instead
of only per core.
Using TopDown through RDPMC in applications on Ice Lake
======================================================
Using TopDown through RDPMC in applications on Intel Ice Lake
=============================================================
For more fine grained measurements it can be useful to
access the new directly from user space. This is more complicated,
@ -301,8 +290,8 @@ This "opens" a new measurement period.
A program using RDPMC for TopDown should schedule such a reset
regularly, as in every few seconds.
Limits on Ice Lake
==================
Limits on Intel Ice Lake
========================
Four pseudo TopDown metric events are exposed for the end-users,
topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
@ -318,8 +307,8 @@ a sampling read group. Since the SLOTS event must be the leader of a TopDown
group, the second event of the group is the sampling event.
For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
Extension on Sapphire Rapids Server
===================================
Extension on Intel Sapphire Rapids Server
=========================================
The metrics counter is extended to support TMA method level 2 metrics.
The lower half of the register is the TMA level 1 metrics (legacy).
The upper half is also divided into four 8-bit fields for the new level 2
@ -338,7 +327,6 @@ other four level 2 metrics by subtracting corresponding metrics as below.
[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
[2] https://sites.google.com/site/analysismethods/yasin-pubs
[3] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
[4] https://github.com/andikleen/pmu-tools/tree/master/jevents
[5] https://sites.google.com/site/analysismethods/yasin-pubs