path: root/tools/perf
Age | Commit message | Author | Files | Lines
2024-02-22 | perf test: Rename builtin-test-list and add missed header guard | Ian Rogers | 4 | -3/+7
builtin-test-list is primarily concerned with shell script tests. Rename the file to better reflect this and add a missed header guard. Signed-off-by: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Kan Liang <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-22 | perf tests: Avoid fork in perf_has_symbol test | Ian Rogers | 1 | -1/+1
perf test -vv Symbols is used to identify symbols within the perf binary. Add the -F flag so that the test command doesn't fork the test before running. This removes a little overhead. Acked-by: Adrian Hunter <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Kan Liang <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-22 | perf list: Add scandirat compatibility function | Ian Rogers | 3 | -9/+31
scandirat is used during the printing of tracepoint events but may be missing from certain libcs. Add a compatibility implementation that uses the symlink of an fd in /proc as a path for the reliably present scandir. Signed-off-by: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Kan Liang <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-22 | perf thread_map: Skip exited threads when scanning /proc | Ian Rogers | 1 | -5/+4
Scanning /proc is inherently racy. Scanning /proc/pid/task within that is also racy as the pid can terminate. Rather than failing in __thread_map__new_all_cpus, skip pids for such failures. Signed-off-by: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Kan Liang <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
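The skip-on-failure idea can be illustrated with a small, self-contained sketch. It is not the code of __thread_map__new_all_cpus: the helper name `count_all_threads` is hypothetical, and it uses opendir/readdir purely for illustration. The point is that a pid that disappears between listing /proc and opening /proc/pid/task is skipped instead of aborting the whole scan.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/*
 * Hypothetical helper: count the threads currently visible in /proc.
 * Returns -1 only if /proc itself cannot be opened; a pid whose task
 * directory has vanished (the process exited) is skipped.
 */
static long count_all_threads(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *pent;
	long total = 0;

	if (!proc)
		return -1;

	while ((pent = readdir(proc)) != NULL) {
		char path[300];
		struct dirent *tent;
		DIR *task;

		/* The buffer must fit "/proc/" + d_name + "/task". */
		assert(sizeof(path) > sizeof("/proc//task") + sizeof(pent->d_name));

		if (!isdigit((unsigned char)pent->d_name[0]))
			continue; /* not a pid directory */

		snprintf(path, sizeof(path), "/proc/%s/task", pent->d_name);
		task = opendir(path);
		if (!task)
			continue; /* pid exited since readdir(): skip, don't fail */

		while ((tent = readdir(task)) != NULL) {
			if (isdigit((unsigned char)tent->d_name[0]))
				total++;
		}
		closedir(task);
	}
	closedir(proc);
	return total;
}
```

The result is only a snapshot, since threads can still exit right after being counted; the scan itself, however, no longer fails because of a short-lived process.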
2024-02-22 | perf list: fix short description for some cache events | Thomas Richter | 1 | -31/+31
Correct the short description of the following events: DCW_REQ, DCW_REQ_CHIP_HIT, DCW_REQ_DRAWER_HIT, DCW_REQ_IV, DCW_ON_CHIP, DCW_ON_CHIP_IV, DCW_ON_CHIP_CHIP_HIT, DCW_ON_CHIP_DRAWER_HIT, CW_ON_MODULE, DCW_ON_DRAWER, DCW_OFF_DRAWER, IDCW_ON_MODULE_IV, IDCW_ON_MODULE_CHIP_HIT, IDCW_ON_MODULE_DRAWER_HIT, IDCW_ON_DRAWER_IV, IDCW_ON_DRAWER_CHIP_HIT, IDCW_ON_DRAWER_DRAWER_HIT, IDCW_OFF_DRAWER_IV, IDCW_OFF_DRAWER_CHIP_HIT, IDCW_OFF_DRAWER_DRAWER_HIT, ICW_REQ, ICW_REQ_IV, CW_REQ_CHIP_HIT, ICW_REQ_DRAWER_HIT, ICW_ON_CHIP, ICW_ON_CHIP_IV, ICW_ON_CHIP_CHIP_HIT, ICW_ON_CHIP_DRAWER_HIT, ICW_ON_MODULE and ICW_OFF_DRAWER. The second Cache should be L2-Cache.

Output before (display diff of the first four events):

  # perf list -d
  DCW_REQ
       [Directory Write Level 1 Data Cache from Cache. Unit: cpum_cf]
  DCW_REQ_CHIP_HIT
       [Directory Write Level 1 Data Cache from Cache with Chip HP \
        Hit. Unit: cpum_cf]
  DCW_REQ_DRAWER_HIT
       [Directory Write Level 1 Data Cache from Cache with Drawer \
        HP Hit. Unit: cpum_cf]
  DCW_REQ_IV
       [Directory Write Level 1 Data Cache from Cache with Intervention. \
        Unit: cpum_cf]

Output after:

  # perf list -d
  DCW_REQ
       [Directory Write Level 1 Data Cache from L2-Cache. Unit: cpum_cf]
  DCW_REQ_CHIP_HIT
       [Directory Write Level 1 Data Cache from L2-Cache with Chip HP \
        Hit. Unit: cpum_cf]
  DCW_REQ_DRAWER_HIT
       [Directory Write Level 1 Data Cache from L2-Cache with Drawer \
        HP Hit. Unit: cpum_cf]
  DCW_REQ_IV
       [Directory Write Level 1 Data Cache from L2-Cache with \
        Intervention. Unit: cpum_cf]

Fixes: 7f76b3113068 ("perf list: Add IBM z16 event description for s390") Reported-by: Andreas Krebbel <[email protected]> Signed-off-by: Thomas Richter <[email protected]> Acked-by: Andreas Krebbel <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-22 | perf stat: Fix metric-only aggregation index | Ian Rogers | 1 | -2/+7
The aggregation index was being computed using the evsel's cpumap, which may have a different number of entries (typically the same or fewer).

Before:
```
$ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1

 Performance counter stats for 'system wide':

      MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total
CPU0  12.8 0.0 12.9 12.7 0.0 12.6
CPU1

       1.007806367 seconds time elapsed
```

After:
```
$ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1

 Performance counter stats for 'system wide':

      MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total
CPU0  15.4 0.0 15.3 15.0 0.0 14.9
CPU18 0.0 0.0 13.5 5.2 0.0 11.9

       1.007858736 seconds time elapsed
```

Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Kaige Ye <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Kan Liang <[email protected]> Cc: John Garry <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-22 | perf metrics: Compute unmerged uncore metrics individually | Ian Rogers | 2 | -4/+29
When merging counts from multiple uncore PMUs the metric is only computed for the metric leader. When merging/aggregation is disabled, prior to this patch just the leader's metric would be computed. Fix this by computing the metric for each PMU.

On a SkylakeX:

Before:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1

 Performance counter stats for 'system wide':

CPU0       82,217  UNC_M_CAS_COUNT.RD [uncore_imc_0]  #  9.2 MB/s  memory_bandwidth_total
CPU18           0  UNC_M_CAS_COUNT.RD [uncore_imc_0]  #  0.0 MB/s  memory_bandwidth_total
CPU0       61,395  UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18           0  UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0            0  UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU18           0  UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU0            0  UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18           0  UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0       81,570  UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU18     113,886  UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU0       62,330  UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18      66,942  UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0       75,489  UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU18      27,958  UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU0       55,864  UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18      38,727  UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0            0  UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU18           0  UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU0            0  UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18           0  UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0       75,423  UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU18     104,527  UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU0       57,596  UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18      56,777  UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0 1,003,440,851 ns duration_time

       1.003440851 seconds time elapsed
```

After:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1

 Performance counter stats for 'system wide':

CPU0       88,968  UNC_M_CAS_COUNT.RD [uncore_imc_0]  #  9.5 MB/s  memory_bandwidth_total
CPU18           0  UNC_M_CAS_COUNT.RD [uncore_imc_0]  #  0.0 MB/s  memory_bandwidth_total
CPU0       59,498  UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18           0  UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0            0  UNC_M_CAS_COUNT.RD [uncore_imc_1]  #  0.0 MB/s  memory_bandwidth_total
CPU18           0  UNC_M_CAS_COUNT.RD [uncore_imc_1]  #  0.0 MB/s  memory_bandwidth_total
CPU0            0  UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18           0  UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0       88,635  UNC_M_CAS_COUNT.RD [uncore_imc_2]  #  9.5 MB/s  memory_bandwidth_total
CPU18     117,975  UNC_M_CAS_COUNT.RD [uncore_imc_2]  # 11.5 MB/s  memory_bandwidth_total
CPU0       60,829  UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18      62,105  UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0       82,238  UNC_M_CAS_COUNT.RD [uncore_imc_3]  #  8.7 MB/s  memory_bandwidth_total
CPU18      22,906  UNC_M_CAS_COUNT.RD [uncore_imc_3]  #  3.6 MB/s  memory_bandwidth_total
CPU0       53,959  UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18      32,990  UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0            0  UNC_M_CAS_COUNT.RD [uncore_imc_4]  #  0.0 MB/s  memory_bandwidth_total
CPU18           0  UNC_M_CAS_COUNT.RD [uncore_imc_4]  #  0.0 MB/s  memory_bandwidth_total
CPU0            0  UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18           0  UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0       83,595  UNC_M_CAS_COUNT.RD [uncore_imc_5]  #  8.9 MB/s  memory_bandwidth_total
CPU18     110,151  UNC_M_CAS_COUNT.RD [uncore_imc_5]  # 10.5 MB/s  memory_bandwidth_total
CPU0       56,540  UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18      53,816  UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0 1,003,353,416 ns duration_time
```

Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Kaige Ye <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: John Garry <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-22 | perf stat: Pass fewer metric arguments | Ian Rogers | 1 | -20/+18
Pass metric_expr and evsel rather than specific variables from the struct, thereby reducing the number of arguments. This will enable later fixes. To reduce the size of the diff, local variables are added to match the previous parameter names. This isn't done in the case of "name" as evsel->name is more intention revealing. A whitespace issue is also addressed. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Kaige Ye <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: John Garry <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20 | perf: script: prefer capstone to XED | Changbin Du | 3 | -7/+11
Now perf can show assembly instructions with libcapstone for x86, and the capstone is better in general. Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20 | perf: script: add raw|disasm arguments to --insn-trace option | Changbin Du | 2 | -7/+22
Now '--insn-trace' accepts an argument to specify the output format:
 - raw: display raw instructions.
 - disasm: display mnemonic instructions (if capstone is installed).

$ sudo perf script --insn-trace=raw
  ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: 48 89 e7
  ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: e8 e8 0c 00 00
  ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: f3 0f 1e fa

$ sudo perf script --insn-trace=disasm
  ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi
  ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0
  ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) illegal instruction
  ls 1443864 [006] 2275506.209908875: 7f216b426df4 _dl_start+0x4 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %rbp
  ls 1443864 [006] 2275506.209908875: 7f216b426df5 _dl_start+0x5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rbp
  ls 1443864 [006] 2275506.209908875: 7f216b426df8 _dl_start+0x8 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %r15

Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20 | perf: script: add field 'disasm' to display mnemonic instructions | Changbin Du | 2 | -7/+21
In addition to the 'insn' field, this adds a new field 'disasm' to display mnemonic instructions instead of the raw code. $ sudo perf script -F +disasm perf-exec 1443864 [006] 2275506.209848: psb: psb offs: 0 0 [unknown] ([unknown]) perf-exec 1443864 [006] 2275506.209848: cbr: cbr: 41 freq: 4100 MHz (114%) 0 [unknown] ([unknown]) ls 1443864 [006] 2275506.209905: 1 branches:uH: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908: 1 branches:uH: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20 | perf: util: use capstone disasm engine to show assembly instructions | Changbin Du | 5 | -6/+155
Currently, the instructions of samples are shown as raw hex strings, which are hard to read. x86 has a special option '--xed' to disassemble the hex string via the Intel XED tool. Here we use capstone as our disassembler engine to show more readable instructions. We select libcapstone because capstone can provide more insn details. Perf will fall back to raw instructions if libcapstone is not available.

The advantages compared to the XED tool:
 * Supports arm, arm64, x86-32 and x86_64 (more could be supported); XED supports only x86_64.
 * Immediate address operands are shown as symbol+offs.

Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20 | perf: build: introduce the libcapstone | Changbin Du | 4 | -1/+28
Later we will use libcapstone to disassemble instructions of samples. Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Cc: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf list: For metricgroup only list include description | Ian Rogers | 1 | -7/+14
If perf list is invoked with 'metricgroups' include the description unless it is invoked with flags to exclude it. Make the description of metricgroup dumping dependent on the desc flag in print_state as with metrics.

Before:
```
$ perf list metricgroups
List of pre-defined events (to be used in -e or -M):

Metric Groups:

Backend
Bad
BadSpec
...
```

After:
```
$ perf list metricgroups
List of pre-defined events (to be used in -e or -M):

Metric Groups:

Backend [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
Bad [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
BadSpec
...
```

Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf tools: Fixup module symbol end address properly | Namhyung Kim | 1 | -2/+19
I hit a strange error on ARM: processing a FINISHED_ROUND record failed. It turned out that it was failing in symbol__alloc_hist() because the symbol size was too big. The failure happened when a sample was captured on a specific BPF program. I added some debug code and found that the end address of the symbol comes from the next module, which is placed far away.

  ffff800008795778-ffff80000879d6d8: bpf_prog_1bac53b8aac4bc58_netcg_sock [bpf]
  ffff80000879d6d8-ffff80000ad656b4: bpf_prog_76867454b5944e15_netcg_getsockopt [bpf]
  ffff80000ad656b4-ffffd69b7af74048: bpf_prog_1d50286d2eb1be85_hn_egress [bpf] <---------- here
  ffffd69b7af74048-ffffd69b7af74048: $x.5 [sha3_generic]
  ffffd69b7af74048-ffffd69b7af740b8: crypto_sha3_init [sha3_generic]
  ffffd69b7af740b8-ffffd69b7af741e0: crypto_sha3_update [sha3_generic]

The logic in symbols__fixup_end() just uses curr->start to update prev->end. But in this case that won't work, as the two addresses are too far apart. I think ARM has a different kernel memory layout for modules and BPF than x86. There is already logic to handle the kernel and module boundary. Let's do the same for symbols between different modules.

Signed-off-by: Namhyung Kim <[email protected]> Reviewed-by: Leo Yan <[email protected]> Cc: Will Deacon <[email protected]> Cc: Mike Leach <[email protected]> Cc: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected]
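To make the boundary handling concrete, here is a simplified sketch of the idea. It is not the actual symbols__fixup_end() code: `struct sym`, `fixup_ends`, the 4 KiB `PAGE_SZ` and the "round up to the next page" policy are assumptions chosen for illustration. A symbol whose end is unknown takes the next symbol's start only when both belong to the same module; across a module boundary its end is capped at the next page boundary instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096ULL /* assumed page size for this sketch */

/* Hypothetical, simplified symbol record. */
struct sym {
	uint64_t start;
	uint64_t end;       /* end == start means "end not known yet" */
	const char *module; /* NULL for core kernel symbols */
};

static int same_module(const struct sym *a, const struct sym *b)
{
	if (!a->module || !b->module)
		return a->module == b->module;
	return strcmp(a->module, b->module) == 0;
}

/* syms must be sorted by start address. */
static void fixup_ends(struct sym *syms, size_t n)
{
	assert(syms != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		struct sym *prev = &syms[i - 1];
		struct sym *curr = &syms[i];
		uint64_t page_end;

		if (prev->end != prev->start)
			continue; /* end already known */

		if (same_module(prev, curr)) {
			prev->end = curr->start;
			continue;
		}

		/* Different module: do not stretch prev up to curr->start. */
		page_end = (prev->start + PAGE_SZ) & ~(PAGE_SZ - 1);
		prev->end = page_end < curr->start ? page_end : curr->start;
	}
}
```

With the addresses from the listing above, the hn_egress symbol in [bpf] would end at ffff80000ad66000 in this sketch rather than at the start of the first sha3_generic symbol.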
2024-02-16 | perf vendor events intel: Update tigerlake TMA metrics to 4.7 | Ian Rogers | 2 | -157/+261
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf vendor events intel: Update skylakex TMA metrics to 4.7 | Ian Rogers | 2 | -168/+392
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf vendor events intel: Update skylake TMA metrics to 4.7 | Ian Rogers | 2 | -161/+246
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf vendor events intel: Update sapphirerapids TMA metrics to 4.7 | Ian Rogers | 2 | -221/+564
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - tma_c01_wait and tma_c02_wait metrics measure power-performance states. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf vendor events intel: Update sandybridge TMA metrics to 4.7 | Ian Rogers | 2 | -32/+46
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Add metrics tma_fp_vector_128b, tma_fp_vector_256b and tma_info_system_cpus_utilized. - Remove metrics tma_info_system_mem_parallel_requests, tma_info_system_core_frequency and tma_info_system_mem_request_latency. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf vendor events intel: Update rocketlake TMA metrics to 4.7 | Ian Rogers | 2 | -157/+261
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf vendor events intel: Update jaketown TMA metrics to 4.7 | Ian Rogers | 2 | -19/+52
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf vendor events intel: Update ivytown TMA metrics to 4.7 | Ian Rogers | 2 | -91/+116
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Reduced number of events when SMT is off. - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf vendor events intel: Update ivybridge TMA metrics to 4.7 | Ian Rogers | 2 | -98/+106
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Reduced number of events when SMT is off. - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16 | perf vendor events intel: Update icelakex TMA metrics to 4.7 | Ian Rogers | 2 | -177/+421
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update icelake TMA metrics to 4.7Ian Rogers2-150/+260
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update haswellx TMA metrics to 4.7Ian Rogers2-92/+139
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update haswell TMA metrics to 4.7Ian Rogers2-102/+83
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
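For readers unfamiliar with the cycle-accounting hierarchy these commits mention, the following is a minimal sketch of the level-1 split of pipeline slots into four categories. It assumes a 4-wide core and the classic published level-1 formulas. The function name, field names and counter values are hypothetical and are not taken from the generated perfmon JSON, whose metric expressions differ per CPU model.

```python
from dataclasses import dataclass

PIPELINE_WIDTH = 4  # assumption: 4 issue slots per cycle


@dataclass
class Level1:
    frontend_bound: float
    bad_speculation: float
    retiring: float
    backend_bound: float


def level1_breakdown(cycles, uops_not_delivered, uops_issued,
                     uops_retired_slots, recovery_cycles):
    """Split issue slots into the four level-1 top-down categories.

    All arguments are raw counter values (hypothetical inputs here).
    """
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    slots = PIPELINE_WIDTH * cycles
    frontend = uops_not_delivered / slots
    # Slots wasted on uops that were issued but never retired, plus
    # slots lost while the pipeline recovers from a misprediction.
    bad_spec = (uops_issued - uops_retired_slots
                + PIPELINE_WIDTH * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Whatever is left over is attributed to the backend.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return Level1(frontend, bad_spec, retiring, backend)
```

Deeper levels of the hierarchy refine each of these four fractions further, which is where per-metric thresholds such as those tuned in this update come into play.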
2024-02-16perf vendor events intel: Update cascadelakex TMA metrics to 4.7Ian Rogers2-174/+404
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update broadwellx TMA metrics to 4.7Ian Rogers2-104/+153
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update broadwellde TMA metrics to 4.7Ian Rogers2-101/+97
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update broadwell TMA metrics to 4.7Ian Rogers2-114/+97
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update alderlake TMA metrics to 4.7Ian Rogers2-168/+302
Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - tma_c01_wait and tma_c02_wait metrics measure power-performance states. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update tigerlake events to v1.15Ian Rogers4-11/+5
Update tigerlake events to v1.15 released in: https://github.com/intel/perfmon/commit/282a6951fd9f025cff6c8c0ea16b1fcec786a4cd Documentation fixes, removal of TOPDOWN.BR_MISPREDICT_SLOTS, deprecation of UNC_ARB_DAT_REQUESTS.RD, UNC_ARB_DAT_REQUESTS.RD and UNC_ARB_IFA_OCCUPANCY.ALL. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update skylake events to v58Ian Rogers4-4/+4
Update skylake events to v58 released in: https://github.com/intel/perfmon/commit/625fb7507373fef8297052c5f9af9ffe78d460c0 Improves documentation. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update sierraforest events to v1.01Ian Rogers14-4/+6942
Update sierraforest events to v1.01 released in: https://github.com/intel/perfmon/commit/582bca24aa0d742306cd4697c5bd1b1b529aa3ce Adds the majority of core and uncore events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update rocketlake events to v1.02Ian Rogers4-11/+4
Update rocketlake events to v1.02 released in: https://github.com/intel/perfmon/commit/4931178d1ede1099a3e4ac7e04ed9f073e03d219 Improves documentation and removes TOPDOWN.BR_MISPREDICT_SLOTS. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update meteorlake events to v1.07Ian Rogers6-8/+210
Update meteorlake events to v1.07 released in: https://github.com/intel/perfmon/commit/62517223080e46bfa9a905a1195c7febae7fdb3e Umask changed on atom mem_bound events. Adds atom events ARITH.FPDIV_ACTIVE, FP_FLOPS_RETIRED.ALL, FP_FLOPS_RETIRED.DP, FP_FLOPS_RETIRED.FP32, ARITH.DIV_ACTIVE, BR_INST_RETIRED.COND, BR_INST_RETIRED.COND_TAKEN, BR_INST_RETIRED.INDIRECT, BR_INST_RETIRED.INDIRECT_CALL, BR_INST_RETIRED.IND_CALL, BR_INST_RETIRED.NEAR_RETURN, DTLB_LOAD_MISSES.WALK_COMPLETED_4K, DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M, DTLB_STORE_MISSES.WALK_COMPLETED_4K, ITLB_MISSES.WALK_COMPLETED_4K, and alias events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update icelake events to v1.21Ian Rogers4-11/+4
Update icelake events to v1.21 released in: https://github.com/intel/perfmon/commit/54f1246b0496112c1d2b2a49e4859c85caa3dbf4 Improves descriptions, removes TOPDOWN.BR_MISPREDICT_SLOTS. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update haswell events to v35Ian Rogers2-2/+2
Update haswell events to v35 released in: https://github.com/intel/perfmon/commit/c0f9b34d421941bc3e13c6ca5554e6a54e8bd574 Updates "must be precise" on RTM_RETIRED.ABORTED. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update grandridge events to v1.01Ian Rogers13-4/+4367
Update grandridge events to v1.01 released in: https://github.com/intel/perfmon/commit/211d60716509d8248e57450e434de98cc6e511d8 Adds the majority of core and uncore events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update emeraldrapids events to v1.03Ian Rogers2-1/+153
Update emeraldrapids events to v1.03 released in: https://github.com/intel/perfmon/commit/c7c6f72dae07fee35d5982232829c0cd37f9e28e Adds uncore CHA events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update broadwell events to v29Ian Rogers2-2/+2
Update broadwell events to v29 released in: https://github.com/intel/perfmon/commit/47117146c6b9e38811618beca31eba4e41c3d874 Updates "must be precise" on RTM_RETIRED.ABORTED. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update alderlaken events to v1.24Ian Rogers3-1/+19
Update alderlaken events to v1.24 released in: https://github.com/intel/perfmon/commit/e627dd8d89e2d2110f1d499608dd6f37aae37a8c Adds LBR_INSERTS.ANY/MISC_RETIRED.LBR_INSERTS event. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf vendor events intel: Update alderlake events to v1.24Ian Rogers4-4/+51
Update alderlake events to v1.24 released in: https://github.com/intel/perfmon/commit/e627dd8d89e2d2110f1d499608dd6f37aae37a8c Adds aliased events, improves documentation and fixes some event fields. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Edward Baker <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Samantha Alt <[email protected]> Cc: Weilin Wang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.hArnaldo Carvalho de Melo2-14/+8
If we instead decide to generate vmlinux.h from BTF info, it will be there:

  $ pahole timespec64
  struct timespec64 {
          time64_t                   tv_sec;               /*     0     8 */
          long int                   tv_nsec;              /*     8     8 */

          /* size: 16, cachelines: 1, members: 2 */
          /* last cacheline: 16 bytes */
  };
  $

pahole manages to find it from /sys/kernel/btf/vmlinux, which is generated from the kernel types. With this, linux/bpf.h doesn't need to be included: what we need is either already in the minimalistic tools/perf/util/bpf_skel/vmlinux/vmlinux.h file or comes from generating a vmlinux.h file from BTF info, i.e. when using GEN_VMLINUX_H=1, as noticed by Namhyung in a build break before removing linux/bpf.h. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/Zc_fp6CgDClPhS_O@x1
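The pahole layout quoted above (tv_sec at offset 0, tv_nsec at offset 8, 16 bytes in total) can be cross-checked with a small sketch. This is only an illustration using Python's ctypes, not part of the patch. The class name Timespec64 and the helper describe_layout are hypothetical, and mapping time64_t to a 64-bit integer and "long int" to c_long is an assumption that reproduces the 8-byte tv_nsec only on LP64 systems such as 64-bit Linux.

```python
import ctypes


class Timespec64(ctypes.Structure):
    """Hypothetical mirror of the kernel's struct timespec64."""
    _fields_ = [
        ("tv_sec", ctypes.c_int64),  # time64_t, assumed to be a signed 64-bit integer
        ("tv_nsec", ctypes.c_long),  # long int; 8 bytes on LP64 Linux
    ]


def describe_layout(struct_type):
    """Return (name, offset, size) for each field, similar to pahole's columns."""
    rows = []
    for name, _ctype in struct_type._fields_:
        field = getattr(struct_type, name)
        rows.append((name, field.offset, field.size))
    return rows


if __name__ == "__main__":
    for name, offset, size in describe_layout(Timespec64):
        print(f"{name:8s} offset={offset} size={size}")
    print("total size:", ctypes.sizeof(Timespec64))
```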
2024-02-16perf testsuite: Install kprobe tests and common filesMichael Petlan1-0/+5
Signed-off-by: Michael Petlan <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16perf testsuite: Add test for kprobe handlingVeronika Molnarova2-0/+326
Test the perf interface to kprobes: listing, adding and removing probes. It is run as part of the perftool-testsuite_probe test case. Signed-off-by: Veronika Molnarova <[email protected]> Signed-off-by: Michael Petlan <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
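The add, list and remove sequence exercised by such a test can be sketched as the command lines below. This is a hedged illustration, not the test added by the commit: the helper name probe_commands and the default probe point are hypothetical. Only the perf probe options --add, --list and --del are assumed from the perf-probe manual, and actually running them normally requires root privileges.

```python
def probe_commands(function="vfs_read", perf="perf"):
    """Build the argv lists for adding, listing and deleting one kprobe.

    The probed function is a placeholder; nothing is executed here.
    """
    return [
        [perf, "probe", f"--add={function}"],  # create the probe
        [perf, "probe", "--list"],             # list existing probes
        [perf, "probe", f"--del={function}"],  # remove the probe again
    ]


if __name__ == "__main__":
    for argv in probe_commands():
        print(" ".join(argv))
```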
2024-02-16perf testsuite: Add common output checking helpersVeronika Molnarova3-0/+107
As a form of validation, it is common practice to check whether the outputs of commands contain expected patterns or match a certain regex. Add helpers for verifying that all regexes are found in the output, that all lines match any pattern from a set and that a certain expression is not present in the output. In verbose mode these helpers log mismatches for easier failure investigation. Signed-off-by: Veronika Molnarova <[email protected]> Signed-off-by: Michael Petlan <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
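The three checks described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the helpers added by the commit: the function names are hypothetical, each function returns the list of mismatches (an empty list means the check passed) as a stand-in for the verbose logging, and skipping blank lines in the "all lines matched" check is an assumption.

```python
import re


def missing_patterns(patterns, text):
    """Patterns that match no line of the output (all must be found)."""
    lines = text.splitlines()
    return [p for p in patterns
            if not any(re.search(p, line) for line in lines)]


def unmatched_lines(patterns, text):
    """Non-blank lines that match none of the allowed patterns."""
    return [line for line in text.splitlines()
            if line.strip() and not any(re.search(p, line) for p in patterns)]


def forbidden_hits(patterns, text):
    """(pattern, line) pairs where an expression that must be absent appears."""
    return [(p, line) for p in patterns
            for line in text.splitlines() if re.search(p, line)]
```

A caller would treat any non-empty result as a failure and print it when running verbosely.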
2024-02-16perf testsuite: Add test case for perf probeVeronika Molnarova1-0/+23
Add a new perf probe test case that acts as an entry element in the perf test list. It runs multiple subtests from the "base_probe" directory, which will be added in upcoming patches and can be expanded without further editing. Signed-off-by: Veronika Molnarova <[email protected]> Signed-off-by: Michael Petlan <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]