builtin-test-list is primarily concerned with shell script
tests. Rename the file to better reflect this and add a missed header
guard.
Signed-off-by: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Justin Stitt <[email protected]>
Cc: Bill Wendling <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Athira Jajeev <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
perf test -vv Symbols is used to identify symbols within the perf
binary. Add the -F flag so that the test command doesn't fork the test
before running. This removes a little overhead.
Acked-by: Adrian Hunter <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Justin Stitt <[email protected]>
Cc: Bill Wendling <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Athira Jajeev <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
scandirat is used during the printing of tracepoint events but may be
missing from certain libcs. Add a compatibility implementation that
uses the symlink of an fd in /proc as a path for the reliably present
scandir.
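The idea can be sketched in a few lines. This is only an illustration of the approach described above, not the code from the patch (which is C): the helper name scandir_at is made up, and it assumes a Linux /proc where /proc/self/fd/<fd> is a symlink to the open directory.

```python
import os


def scandir_at(dir_fd, name):
    """List the entries of `name`, resolved relative to the open directory `dir_fd`.

    Illustrative stand-in for scandirat: build a path through the fd's
    symlink under /proc/self/fd and pass it to the ordinary scandir.
    """
    path = f"/proc/self/fd/{dir_fd}/{name}"
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)
```

A caller would open the parent directory once with os.open(path, os.O_RDONLY | os.O_DIRECTORY) and reuse the fd for each subdirectory it wants to list.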
Signed-off-by: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Justin Stitt <[email protected]>
Cc: Bill Wendling <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Athira Jajeev <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Scanning /proc is inherently racy. Scanning /proc/pid/task within that
is also racy as the pid can terminate. Rather than failing in
__thread_map__new_all_cpus, skip pids for such failures.
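A minimal sketch of the skip-on-failure pattern, written in Python for brevity rather than taken from the patch. The function name collect_threads and the proc_root parameter are hypothetical; which errors count as "the pid went away" is an assumption made here.

```python
import os


def collect_threads(proc_root="/proc"):
    """Return {pid: [tid, ...]} for every process whose task directory is readable."""
    threads = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            tids = os.listdir(os.path.join(proc_root, entry, "task"))
        except (FileNotFoundError, ProcessLookupError):
            # The process exited between the two directory reads:
            # skip it instead of failing the whole scan.
            continue
        threads[int(entry)] = sorted(int(t) for t in tids if t.isdigit())
    return threads
```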
Signed-off-by: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Justin Stitt <[email protected]>
Cc: Bill Wendling <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Athira Jajeev <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Correct the short description of the following events:
DCW_REQ, DCW_REQ_CHIP_HIT, DCW_REQ_DRAWER_HIT, DCW_REQ_IV,
DCW_ON_CHIP, DCW_ON_CHIP_IV, DCW_ON_CHIP_CHIP_HIT,
DCW_ON_CHIP_DRAWER_HIT, DCW_ON_MODULE, DCW_ON_DRAWER,
DCW_OFF_DRAWER, IDCW_ON_MODULE_IV, IDCW_ON_MODULE_CHIP_HIT,
IDCW_ON_MODULE_DRAWER_HIT, IDCW_ON_DRAWER_IV, IDCW_ON_DRAWER_CHIP_HIT,
IDCW_ON_DRAWER_DRAWER_HIT, IDCW_OFF_DRAWER_IV, IDCW_OFF_DRAWER_CHIP_HIT,
IDCW_OFF_DRAWER_DRAWER_HIT, ICW_REQ, ICW_REQ_IV, ICW_REQ_CHIP_HIT,
ICW_REQ_DRAWER_HIT, ICW_ON_CHIP, ICW_ON_CHIP_IV, ICW_ON_CHIP_CHIP_HIT,
ICW_ON_CHIP_DRAWER_HIT, ICW_ON_MODULE and ICW_OFF_DRAWER.
The second Cache should be L2-Cache.
Output before (display diff of the first four events)
# perf list -d
DCW_REQ
[Directory Write Level 1 Data Cache from Cache. Unit: cpum_cf]
DCW_REQ_CHIP_HIT
[Directory Write Level 1 Data Cache from Cache with Chip HP \
Hit. Unit: cpum_cf]
DCW_REQ_DRAWER_HIT
[Directory Write Level 1 Data Cache from Cache with Drawer \
HP Hit. Unit: cpum_cf]
DCW_REQ_IV
[Directory Write Level 1 Data Cache from Cache with Intervention. \
Unit: cpum_cf]
Output after:
# perf list -d
DCW_REQ
[Directory Write Level 1 Data Cache from L2-Cache. Unit: cpum_cf]
DCW_REQ_CHIP_HIT
[Directory Write Level 1 Data Cache from L2-Cache with Chip HP \
Hit. Unit: cpum_cf]
DCW_REQ_DRAWER_HIT
[Directory Write Level 1 Data Cache from L2-Cache with Drawer \
HP Hit. Unit: cpum_cf]
DCW_REQ_IV
[Directory Write Level 1 Data Cache from L2-Cache with \
Intervention. Unit: cpum_cf]
Fixes: 7f76b3113068 ("perf list: Add IBM z16 event description for s390")
Reported-by: Andreas Krebbel <[email protected]>
Signed-off-by: Thomas Richter <[email protected]>
Acked-by: Andreas Krebbel <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The aggregation index was being computed using the evsel's cpumap, which
may have different (typically the same or fewer) entries.
Before:
```
$ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1
Performance counter stats for 'system wide':
MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total
CPU0 12.8 0.0 12.9 12.7 0.0 12.6
CPU1
1.007806367 seconds time elapsed
```
After:
```
$ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1
Performance counter stats for 'system wide':
MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total
CPU0 15.4 0.0 15.3 15.0 0.0 14.9
CPU18 0.0 0.0 13.5 5.2 0.0 11.9
1.007858736 seconds time elapsed
```
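To illustrate why an index taken from one CPU map cannot be reused in another, here is a small sketch. The two maps are assumptions chosen to resemble the output above (an uncore event opened on CPUs 0 and 18, and a 36-CPU system); the actual maps used by perf are not shown in this message.

```python
def index_in(cpu_map, cpu):
    """Position of `cpu` in `cpu_map`, or -1 if it is absent."""
    try:
        return cpu_map.index(cpu)
    except ValueError:
        return -1


# Assumed maps: the event is opened on one CPU per socket, while the
# map used for aggregation covers every CPU.
evsel_cpus = [0, 18]
all_cpus = list(range(36))
aggr_labels = [f"CPU{cpu}" for cpu in all_cpus]

idx_from_evsel = index_in(evsel_cpus, 18)  # position within the event's own map
idx_from_all = index_in(all_cpus, 18)      # position within the aggregation map

print(aggr_labels[idx_from_evsel])  # row selected with the wrong index
print(aggr_labels[idx_from_all])    # row selected with the matching index
```

Under these assumptions the first lookup lands on the CPU1 row and the second on CPU18, which is consistent with the stray CPU1 line in the "Before" output.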
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Kaige Ye <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: John Garry <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When merging counts from multiple uncore PMUs the metric is only
computed for the metric leader. When merging/aggregation is disabled,
prior to this patch just the leader's metric would be computed. Fix
this by computing the metric for each PMU.
On a SkylakeX:
Before:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1
Performance counter stats for 'system wide':
CPU0 82,217 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.2 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total
CPU0 61,395 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0 81,570 UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU18 113,886 UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU0 62,330 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18 66,942 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0 75,489 UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU18 27,958 UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU0 55,864 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18 38,727 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0 75,423 UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU18 104,527 UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU0 57,596 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18 56,777 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0 1,003,440,851 ns duration_time
1.003440851 seconds time elapsed
```
After:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1
Performance counter stats for 'system wide':
CPU0 88,968 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.5 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total
CPU0 59,498 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0 88,635 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 9.5 MB/s memory_bandwidth_total
CPU18 117,975 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 11.5 MB/s memory_bandwidth_total
CPU0 60,829 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18 62,105 UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0 82,238 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 8.7 MB/s memory_bandwidth_total
CPU18 22,906 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 3.6 MB/s memory_bandwidth_total
CPU0 53,959 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18 32,990 UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total
CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total
CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0 83,595 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 8.9 MB/s memory_bandwidth_total
CPU18 110,151 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 10.5 MB/s memory_bandwidth_total
CPU0 56,540 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18 53,816 UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0 1,003,353,416 ns duration_time
```
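The per-PMU values in the "After" output can be reproduced by hand. The formula below is an assumption, not quoted from the metric definition: 64 bytes per CAS, read plus write counts, divided by duration_time. It does agree with the numbers printed above.

```python
CACHE_LINE_BYTES = 64  # assumed bytes transferred per CAS


def memory_bandwidth_mb_per_s(cas_rd, cas_wr, duration_ns):
    """MB/s for one PMU on one CPU, from its own RD/WR counts."""
    total_bytes = (cas_rd + cas_wr) * CACHE_LINE_BYTES
    return (total_bytes / 1e6) / (duration_ns / 1e9)


duration_ns = 1_003_353_416  # duration_time from the "After" run

# (cpu, pmu) -> (UNC_M_CAS_COUNT.RD, UNC_M_CAS_COUNT.WR), copied from the output
counts = {
    ("CPU0", "uncore_imc_0"): (88_968, 59_498),
    ("CPU18", "uncore_imc_2"): (117_975, 62_105),
    ("CPU18", "uncore_imc_3"): (22_906, 32_990),
}

for (cpu, pmu), (rd, wr) in counts.items():
    mbps = memory_bandwidth_mb_per_s(rd, wr, duration_ns)
    print(f"{cpu} {pmu}: {mbps:.1f} MB/s")
```

Each PMU gets its own value because only that PMU's counts enter the calculation, which is what the fix enables when merging is disabled.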
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Kaige Ye <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: John Garry <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Pass metric_expr and evsel rather than specific variables from the
struct, thereby reducing the number of arguments. This will enable
later fixes.
To reduce the size of the diff, local variables are added to match the
previous parameter names. This isn't done in the case of "name" as
evsel->name is more intention revealing. A whitespace issue is also
addressed.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Kaige Ye <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: John Garry <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
perf can now show assembly instructions with libcapstone on x86, and
capstone is better in general.
Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: Thomas Richter <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
'--insn-trace' now accepts an argument to specify the output format:
- raw: display raw instructions.
- disasm: display mnemonic instructions (if capstone is installed).
$ sudo perf script --insn-trace=raw
ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: 48 89 e7
ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: e8 e8 0c 00 00
ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: f3 0f 1e fa
$ sudo perf script --insn-trace=disasm
ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi
ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0
ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) illegal instruction
ls 1443864 [006] 2275506.209908875: 7f216b426df4 _dl_start+0x4 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %rbp
ls 1443864 [006] 2275506.209908875: 7f216b426df5 _dl_start+0x5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rbp
ls 1443864 [006] 2275506.209908875: 7f216b426df8 _dl_start+0x8 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %r15
Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: Thomas Richter <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In addition to the 'insn' field, this adds a new field 'disasm' to
display mnemonic instructions instead of the raw code.
$ sudo perf script -F +disasm
perf-exec 1443864 [006] 2275506.209848: psb: psb offs: 0 0 [unknown] ([unknown])
perf-exec 1443864 [006] 2275506.209848: cbr: cbr: 41 freq: 4100 MHz (114%) 0 [unknown] ([unknown])
ls 1443864 [006] 2275506.209905: 1 branches:uH: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi
ls 1443864 [006] 2275506.209908: 1 branches:uH: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0
Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: Thomas Richter <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Currently, the instructions of samples are shown as raw hex strings,
which are hard to read. x86 has a special option '--xed' to disassemble
the hex string via the Intel XED tool.
Here we use capstone as our disassembler engine to show more readable
instructions. We select libcapstone because capstone can provide more
insn details. Perf will fall back to raw instructions if libcapstone is
not available.
The advantages compared to XED tool:
* Support arm, arm64, x86-32, x86_64 (more could be supported),
xed only for x86_64.
* Immediate address operands are shown as symbol+offs.
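The fallback behaviour can be sketched with the capstone Python bindings. This is an illustration of the pattern only, not perf's C implementation: load_disassembler and format_insn are made-up names, the AT&T syntax setting is chosen to resemble perf's output, and falling back to hex when decoding fails is a choice made for this sketch.

```python
def load_disassembler():
    """Return a callable (code, addr) -> text, or None if capstone is not installed."""
    try:
        from capstone import Cs, CS_ARCH_X86, CS_MODE_64, CS_OPT_SYNTAX_ATT
    except ImportError:
        return None
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    md.syntax = CS_OPT_SYNTAX_ATT

    def disassemble(code, addr):
        # Only the first decoded instruction is needed for one sample.
        for insn in md.disasm(code, addr):
            return f"{insn.mnemonic} {insn.op_str}".strip()
        return None  # the bytes could not be decoded

    return disassemble


def format_insn(code, addr, disassemble=None):
    """Mnemonic text when a disassembler is available, raw hex bytes otherwise."""
    if disassemble is not None:
        text = disassemble(code, addr)
        if text is not None:
            return text
    return "insn: " + " ".join(f"{byte:02x}" for byte in code)
```

A caller would do format_insn(sample_bytes, sample_ip, load_disassembler()), so the same call site works whether or not the library is present.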
Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: Thomas Richter <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Later we will use libcapstone to disassemble instructions of samples.
Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: Thomas Richter <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
If perf list is invoked with 'metricgroups' include the description
unless it is invoked with flags to exclude it. Make the description of
metricgroup dumping dependent on the desc flag in print_state as with
metrics.
Before:
```
$ perf list metricgroups
List of pre-defined events (to be used in -e or -M):
Metric Groups:
Backend
Bad
BadSpec
...
```
After:
```
$ perf list metricgroups
List of pre-defined events (to be used in -e or -M):
Metric Groups:
Backend [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
Bad [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
BadSpec
...
```
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
I got a strange error on ARM: processing a FINISHED_ROUND record
failed. It turned out that symbol__alloc_hist() was failing because
the symbol size was too big.
It failed when a sample was captured on a specific BPF program. I
added debug code and found that the end address of the symbol came from
the next module, which is placed far away.
ffff800008795778-ffff80000879d6d8: bpf_prog_1bac53b8aac4bc58_netcg_sock [bpf]
ffff80000879d6d8-ffff80000ad656b4: bpf_prog_76867454b5944e15_netcg_getsockopt [bpf]
ffff80000ad656b4-ffffd69b7af74048: bpf_prog_1d50286d2eb1be85_hn_egress [bpf] <---------- here
ffffd69b7af74048-ffffd69b7af74048: $x.5 [sha3_generic]
ffffd69b7af74048-ffffd69b7af740b8: crypto_sha3_init [sha3_generic]
ffffd69b7af740b8-ffffd69b7af741e0: crypto_sha3_update [sha3_generic]
The logic in symbols__fixup_end() just uses curr->start to update
prev->end. But in this case that doesn't work, as the two addresses are
too far apart.
I think ARM has a different kernel memory layout for modules and BPF
than x86. There is already logic to handle the kernel and module
boundary. Let's do the same for symbols between different modules.
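A rough sketch of the idea, in Python rather than the C of symbols__fixup_end(). The Symbol class and the page-rounding bound for the last symbol of a module are assumptions made for illustration; this message does not spell out the exact bound.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed bound for a symbol at a module boundary


@dataclass
class Symbol:
    start: int
    end: int
    name: str
    module: str


def round_up(value, align):
    return (value + align - 1) // align * align


def fixup_end(symbols):
    """Fill in missing end addresses; `symbols` must be sorted by start."""
    for prev, curr in zip(symbols, symbols[1:]):
        if prev.end != prev.start:
            continue  # end address already known
        if prev.module != curr.module:
            # Last symbol of its module: do not stretch it up to the next
            # module, which may be mapped far away.
            prev.end = round_up(prev.start + PAGE_SIZE, PAGE_SIZE)
        else:
            prev.end = curr.start


syms = [
    Symbol(0xffff80000879d6d8, 0xffff80000879d6d8, "bpf_prog_76867454b5944e15_netcg_getsockopt", "bpf"),
    Symbol(0xffff80000ad656b4, 0xffff80000ad656b4, "bpf_prog_1d50286d2eb1be85_hn_egress", "bpf"),
    Symbol(0xffffd69b7af74048, 0xffffd69b7af74048, "crypto_sha3_init", "sha3_generic"),
]
fixup_end(syms)
for sym in syms:
    print(f"{sym.start:x}-{sym.end:x}: {sym.name} [{sym.module}]")
```

With the addresses from the listing above, the hn_egress symbol now ends at the next page boundary instead of spanning the gap to sha3_generic.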
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: John Garry <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- tma_c01_wait and tma_c02_wait metrics measure power-performance
states.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Add metrics tma_fp_vector_128b, tma_fp_vector_256b and
tma_info_system_cpus_utilized.
- Remove metrics tma_info_system_mem_parallel_requests,
tma_info_system_core_frequency and
tma_info_system_mem_request_latency.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc and tma_info_inst_mix_ipflop.
- Removal of tma_info_bad_spec_branch_misprediction_cost.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc and tma_info_inst_mix_ipflop.
- Removal of tma_info_bad_spec_branch_misprediction_cost.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc and tma_info_inst_mix_ipflop.
- Removal of tma_info_bad_spec_branch_misprediction_cost.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- tma_c01_wait and tma_c02_wait metrics measure power-performance
states.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update alderlake events to v1.15 released in:
https://github.com/intel/perfmon/commit/282a6951fd9f025cff6c8c0ea16b1fcec786a4cd
Documentation fixes, removal of TOPDOWN.BR_MISPREDICT_SLOTS,
deprecation of UNC_ARB_DAT_REQUESTS.RD, UNC_ARB_DAT_REQUESTS.RD and
UNC_ARB_IFA_OCCUPANCY.ALL.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update skylake events to v58 released in:
https://github.com/intel/perfmon/commit/625fb7507373fef8297052c5f9af9ffe78d460c0
Improves documentation.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update sierraforest events to v1.01 released in:
https://github.com/intel/perfmon/commit/582bca24aa0d742306cd4697c5bd1b1b529aa3ce
Adds the majority of core and uncore events.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update alderlake events to v1.02 released in:
https://github.com/intel/perfmon/commit/4931178d1ede1099a3e4ac7e04ed9f073e03d219
Improves documentation and removes TOPDOWN.BR_MISPREDICT_SLOTS.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update meteorlake events to v1.07 released in:
https://github.com/intel/perfmon/commit/62517223080e46bfa9a905a1195c7febae7fdb3e
Umask changed on atom mem_bound events. Adds atom events
ARITH.FPDIV_ACTIVE, FP_FLOPS_RETIRED.ALL, FP_FLOPS_RETIRED.DP,
FP_FLOPS_RETIRED.FP32, ARITH.DIV_ACTIVE, BR_INST_RETIRED.COND,
BR_INST_RETIRED.COND_TAKEN, BR_INST_RETIRED.INDIRECT,
BR_INST_RETIRED.INDIRECT_CALL, BR_INST_RETIRED.IND_CALL,
BR_INST_RETIRED.NEAR_RETURN, DTLB_LOAD_MISSES.WALK_COMPLETED_4K,
DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M,
DTLB_STORE_MISSES.WALK_COMPLETED_4K, ITLB_MISSES.WALK_COMPLETED_4K,
and alias events.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update icelake events to v1.21 released in:
https://github.com/intel/perfmon/commit/54f1246b0496112c1d2b2a49e4859c85caa3dbf4
Improves descriptions, removes TOPDOWN.BR_MISPREDICT_SLOTS.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update haswell events to v35 released in:
https://github.com/intel/perfmon/commit/c0f9b34d421941bc3e13c6ca5554e6a54e8bd574
Updates "must be precise" on RTM_RETIRED.ABORTED.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update grandridge events to v1.01 released in:
https://github.com/intel/perfmon/commit/211d60716509d8248e57450e434de98cc6e511d8
Adds the majority of core and uncore events.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update emeraldrapids events to v1.03 released in:
https://github.com/intel/perfmon/commit/c7c6f72dae07fee35d5982232829c0cd37f9e28e
Adds uncore CHA events.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update broadwell events to v29 released in:
https://github.com/intel/perfmon/commit/47117146c6b9e38811618beca31eba4e41c3d874
Updates "must be precise" on RTM_RETIRED.ABORTED.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update alderlaken events to v1.24 released in:
https://github.com/intel/perfmon/commit/e627dd8d89e2d2110f1d499608dd6f37aae37a8c
Adds LBR_INSERTS.ANY/MISC_RETIRED.LBR_INSERTS event.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Update alderlake events to v1.24 released in:
https://github.com/intel/perfmon/commit/e627dd8d89e2d2110f1d499608dd6f37aae37a8c
Adds aliased events, improves documentation and fixes some event fields.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Edward Baker <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Samantha Alt <[email protected]>
Cc: Weilin Wang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
If we instead decide to generate vmlinux.h from BTF info, it will be
there:
  $ pahole timespec64
  struct timespec64 {
  	time64_t tv_sec; /* 0 8 */
  	long int tv_nsec; /* 8 8 */
  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  $
pahole manages to find it from /sys/kernel/btf/vmlinux, which is
generated from the kernel types.
With this, linux/bpf.h doesn't need to be included, as it's already in the
minimalistic tools/perf/util/bpf_skel/vmlinux/vmlinux.h file or what we
need comes when generating a vmlinux.h file from BTF info, i.e. when
using GEN_VMLINUX_H=1, as noticed by Namhyung in a build break before
removing linux/bpf.h.
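As a rough cross-check of the layout pahole reports for struct timespec64, the two members can be modelled with Python's ctypes. This is only a sketch: the class name Timespec64 and the helper layout() are hypothetical, and both members are declared as 64-bit integers on the assumption of an LP64 target where "long int" is 8 bytes.

```python
import ctypes


class Timespec64(ctypes.Structure):
    # Hypothetical mirror of the struct printed by pahole. Both members are
    # modelled as 64-bit signed integers (assumes "long int" is 8 bytes).
    _fields_ = [
        ("tv_sec", ctypes.c_int64),   # time64_t
        ("tv_nsec", ctypes.c_int64),  # long int (assumed 8 bytes)
    ]


def layout(struct_type):
    """Return (name, offset, size) per member, like pahole's comment columns."""
    return [
        (name, getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, _ctype in struct_type._fields_
    ]


if __name__ == "__main__":
    for name, offset, size in layout(Timespec64):
        print(f"{name:8s} offset={offset} size={size}")
    print("total size:", ctypes.sizeof(Timespec64))
```

Under that assumption the offsets and sizes agree with the "0 8" and "8 8" columns and the 16-byte total in the pahole output.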
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/Zc_fp6CgDClPhS_O@x1
|
|
Signed-off-by: Michael Petlan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Test the perf interface to kprobes: listing, adding and removing probes. It
is run as part of the perftool-testsuite_probe test case.
Signed-off-by: Veronika Molnarova <[email protected]>
Signed-off-by: Michael Petlan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
As a form of validation, it is common practice to check whether the
outputs of commands contain expected patterns or match a certain
regex.
Add helpers for verifying that all regexes are found in the output, that
all lines match any pattern from a set and that a certain expression is
not present in the output.
In verbose mode these helpers log mismatches for easier failure
investigation.
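The three checks described above can be sketched in Python as follows. This is an illustration of the idea only, not the helpers added by the patch: the function names, the verbose flag and the choice to ignore blank lines are all assumptions made for the example.

```python
import re


def all_patterns_found(output, patterns, verbose=False):
    """True if every regex in `patterns` matches somewhere in `output`."""
    missing = [p for p in patterns if not re.search(p, output, re.MULTILINE)]
    if verbose:
        for p in missing:
            print(f"pattern not found: {p}")
    return not missing


def all_lines_matching(output, patterns, verbose=False):
    """True if every non-blank line matches at least one regex in `patterns`."""
    bad = [
        line for line in output.splitlines()
        if line.strip() and not any(re.search(p, line) for p in patterns)
    ]
    if verbose:
        for line in bad:
            print(f"line matches no pattern: {line!r}")
    return not bad


def no_patterns_found(output, patterns, verbose=False):
    """True if none of the regexes in `patterns` matches in `output`."""
    found = [p for p in patterns if re.search(p, output, re.MULTILINE)]
    if verbose:
        for p in found:
            print(f"unexpected pattern found: {p}")
    return not found
```

A test would capture a command's output as a string and combine the checks, for example requiring certain patterns to be present while also asserting that no error pattern appears.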
Signed-off-by: Veronika Molnarova <[email protected]>
Signed-off-by: Michael Petlan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add a new perf probe test case that acts as an entry in the perf test
list. It runs multiple subtests from the "base_probe" directory, which
will be added in upcoming patches and can be expanded without further editing.
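The "expand without further editing" property comes from discovering subtests at run time instead of listing them. A minimal Python sketch of that pattern is below. It is not the actual test driver: the function names, the "test_*.sh" file-name pattern and the use of "sh" as runner are assumptions for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def discover_subtests(directory, pattern="test_*.sh"):
    """Return the subtest scripts in `directory`, sorted by file name."""
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


def run_subtests(directory, runner=("sh",), pattern="test_*.sh"):
    """Run every discovered script and return {file name: exit code}."""
    results = {}
    for script in discover_subtests(directory, pattern):
        proc = subprocess.run([*runner, str(script)],
                              capture_output=True, text=True)
        results[script.name] = proc.returncode
    return results


if __name__ == "__main__":
    # Demo with throwaway Python stand-ins so it runs without a shell.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "test_pass.py").write_text("raise SystemExit(0)\n")
        Path(tmp, "test_fail.py").write_text("raise SystemExit(1)\n")
        results = run_subtests(tmp, runner=(sys.executable,),
                               pattern="test_*.py")
        for name, code in results.items():
            print(name, "PASS" if code == 0 else "FAIL")
```

Dropping another matching script into the directory is enough for it to be picked up on the next run, which is the behaviour the commit message describes for "base_probe".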
Signed-off-by: Veronika Molnarova <[email protected]>
Signed-off-by: Michael Petlan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|