aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util/cpumap.h
AgeCommit message (Collapse)AuthorFilesLines
2024-05-07perf cpumap: Remove refcnt from 'struct cpu_aggr_map'Ian Rogers1-2/+0
It is assigned a value of 1 and never incremented. Remove and replace puts with delete. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-02-09perf stat: Support per-cluster aggregationYicong Yang1-3/+16
Some platforms have 'cluster' topology and CPUs in the cluster will share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2 cache (for Intel Jacobsville). Currently parsing and building cluster topology have been supported since [1]. perf stat has already supported aggregation for other topologies like die or socket, etc. It'll be useful to aggregate per-cluster to find problems like L3T bandwidth contention. This patch add support for "--per-cluster" option for per-cluster aggregation. Also update the docs and related test. The output will be like: [root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5 Performance counter stats for 'system wide': S56-D0-CLS158 4 1,321,521,570 LLC-load S56-D0-CLS594 4 794,211,453 LLC-load S56-D0-CLS1030 4 41,623 LLC-load S56-D0-CLS1466 4 41,646 LLC-load S56-D0-CLS1902 4 16,863 LLC-load S56-D0-CLS2338 4 15,721 LLC-load S56-D0-CLS2774 4 22,671 LLC-load [...] On a legacy system without cluster or cluster support, the output will be look like: [root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1 Performance counter stats for 'system wide': S56-D0-CLS0 64 18,011,485 cycles S7182-D0-CLS0 64 16,548,835 cycles Note that this patch doesn't mix the cluster information in the outputs of --per-core to avoid breaking any tools/scripts using it. Note that perf recently supports "--per-cache" aggregation, but it's not the same with the cluster although cluster CPUs may share some cache resources. For example on my machine all clusters within a die share the same L3 cache: $ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list 0-31 $ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list 0-3 [1] commit c5e22feffdd7 ("topology: Represent clusters of CPUs within a die") Tested-by: Jie Zhan <[email protected]> Reviewed-by: Tim Chen <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Yicong Yang <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-05-27perf pmu: Add CPU map for "cpu" PMUsIan Rogers1-2/+2
A typical "cpu" PMU has no "cpus" or "cpumask" file meaning the CPU map is set to NULL, which also encodes an empty CPU map. Update pmu_cpumask so that if the "cpu" PMU fails to load a CPU map, use a default of all online PMUs. Remove const from cpu_map__online for the sake of reference counting. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ali Saidi <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kang Minchul <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Ming Wang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Will Deacon <[email protected]> Cc: Xing Zhengjun <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-05-23perf stat: Setup the foundation to allow aggregation based on cache topologyK Prateek Nayak1-0/+7
Processors based on chiplet architecture, such as AMD EPYC and Hygon do not expose the chiplet details in the sysfs CPU topology information. However, this information can be derived from the per CPU cache level information from the sysfs. 'perf stat' has already supported aggregation based on topology information using core ID, socket ID, etc. It'll be useful to aggregate based on the cache topology to detect problems like imbalance and cache-to-cache sharing at various cache levels. This patch lays the foundation for aggregating data in 'perf stat' based on the processor's cache topology. The cmdline option to aggregate data based on the cache topology is added in Patch 4 of the series while this patch sets up all the necessary functions and variables required to support the new aggregation option. The patch also adds support to display per-cache aggregation, or save it as a JSON or CSV, as splitting it into a separate patch would break builds when compiling with "-Werror=switch-enum" where the compiler will complain about the lack of handling for the AGGR_CACHE case in the output functions. Committer notes: Don't use perf_stat_config in tools/perf/util/cpumap.c, this would make code that is in util/, thus not really specific to a single builtin, use a specific builtin config structure. Move the functions introduced in this patch from tools/perf/util/cpumap.c since it needs access to builtin specific and is not strictly needed to live in the util/ directory. With this 'perf test python' is back building. Suggested-by: Gautham Shenoy <[email protected]> Signed-off-by: K Prateek Nayak <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ananth Narayan <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Wen Pu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-02-19perf stat: Add cpu_aggr_map for loopIan Rogers1-0/+3
Rename variables, add a comment and add a cpu_aggr_map__for_each_idx to aid the readability of the stat-display code. In particular, try to make sure aggr_idx is used consistently to differentiate from other kinds of index. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Caleb Biggers <[email protected]> Cc: Eduard Zingerman <[email protected]> Cc: Florian Fischer <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Perry Taylor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Xing Zhengjun <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-11-16perf cpumap: Tidy libperf includesIan Rogers1-1/+1
Use public API when possible, don't include internal API in header files in evsel.h. Fix any related breakages. Committer note: There was one missing case, when building for arm64: arch/arm64/util/pmu.c: In function 'pmu_events_table__find': arch/arm64/util/pmu.c:18:30: error: invalid use of undefined type 'struct perf_cpu_map' 18 | if (pmu->cpus->nr != cpu__max_cpu().cpu) | ^~ Fix it by adding one more exception, including <internal/cpumap.h> Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-27perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()Namhyung Kim1-1/+1
In case of no aggregation, it needs to keep the original (cpu) ordering in the aggr_map so that it can be in sync with the cpu map. This will make the code easier to handle AGGR_NONE similar to others. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-27perf stat: Add aggr id for global modeNamhyung Kim1-1/+5
To make the code simpler, I'd like to use the same aggregation code for the global mode. We can simply add an id function to return cpu 0 and use print_aggr(). No functional change intended. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-10-06perf stat: Rename to aggr_cpu_id.thread_idxNamhyung Kim1-1/+1
The aggr_cpu_id has a thread value but it's actually an index to the thread_map. To reduce possible confusion, rename it to thread_idx. Suggested-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-19perf cpumap: Fix alignment for masks in event encodingIan Rogers1-1/+3
A mask encoding of a cpu map is laid out as: u16 nr u16 long_size unsigned long mask[]; However, the mask may be 8-byte aligned meaning there is a 4-byte pad after long_size. This means 32-bit and 64-bit builds see the mask as being at different offsets. On top of this the structure is in the byte data[] encoded as: u16 type char data[] This means the mask's struct isn't the required 4 or 8 byte aligned, but is offset by 2. Consequently the long reads and writes are causing undefined behavior as the alignment is broken. Fix the mask struct by creating explicit 32 and 64-bit variants, use a union to avoid data[] and casts; the struct must be packed so the layout matches the existing perf.data layout. Taking an address of a member of a packed struct breaks alignment so pass the packed perf_record_cpu_map_data to functions, so they can access variables with the right alignment. As the 64-bit version has 4 bytes of padding, optimizing writing to only write the 32-bit version. Committer notes: Disable warnings about 'packed' that break the build in some arches like riscv64, but just around that specific struct. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dave Marchevsky <[email protected]> Cc: German Gomez <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Kees Kook <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-22perf cpumap: Remove duplicate include in cpumap.hLv Ruyi1-1/+0
Remove all but the first include of stdbool.h from cpumap.h. Reported-by: Zeal Robot <[email protected]> Signed-off-by: Lv Ruyi <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: CGEL ZTE <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-22perf cpumap: Migrate to libperf cpumap apiIan Rogers1-1/+1
Switch from directly accessing the perf_cpu_map to using the appropriate libperf API when possible. Using the API simplifies the job of refactoring use of perf_cpu_map. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: AndrĂ© Almeida <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: James Clark <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-15perf cpumap: Add is_dummy() methodArnaldo Carvalho de Melo1-0/+10
Needed to check if a cpu_map is dummy, i.e. not a cpu map at all, for pid monitoring scenarios. This probably needs to move to libperf, but since perf itself is the first and so far only user, leave it at tools/perf/util/. Acked-by: Andi Kleen <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Give CPUs their own typeIan Rogers1-13/+13
A common problem is confusing CPU map indices with the CPU, by wrapping the CPU with a struct then this is avoided. This approach is similar to atomic_t. Committer notes: To make it build with BUILD_BPF_SKEL=1 these files needed the conversions to 'struct perf_cpu' usage: tools/perf/util/bpf_counter.c tools/perf/util/bpf_counter_cgroup.c tools/perf/util/bpf_ftrace.c Also perf_env__get_cpu() was removed back in "perf cpumap: Switch cpu_map__build_map to cpu function". Additionally these needed to be fixed for the ARM builds to complete: tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/pmu.c Suggested-by: John Garry <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Add CPU to aggr_cpu_idIan Rogers1-0/+8
With no aggregration, such as 'perf stat -A', the aggr_cpu_id lacks a way to describe per CPU aggregation and the core is set to the CPU in places like print_counter_aggrdata in stat-display.c. Setting the core to the CPU is undesirable as the CPU will exceed valid core values and lead to confusion. Add a CPU variable to address this. Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Add some comments to cpu_aggr_mapIan Rogers1-1/+9
Move cpu_aggr_map__empty_new() to be with other cpu_aggr_map function. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Move 'has' function to libperfIan Rogers1-2/+0
Make the cpu map argument const for consistency with the rest of the API. Modify cpu_map__idx accordingly. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Rename cpu_map__get_X_aggr_by_cpu functionsIan Rogers1-4/+25
The functions don't use a cpu_map so reduce them to being like constructors of aggr_cpu_id. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Refactor cpu_map__build_map()Ian Rogers1-7/+9
Turn it into a cpu_aggr_map__new(). Pass helper functions. Refactor builtin-stat calls to manually pass function pointers. Try to reduce some copy-paste code. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Remove cpu_map__cpu(), use libperf functionIan Rogers1-1/+0
Switch the remaining few users of cpu_map__cpu() to perf_cpu_map__cpu() and remove the function. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] [ Did the conversion to perf_ftrace__latency_prepare_bpf() as well, used when building with BUILD_BPF_SKEL=1 ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Remove map from function names that don't use a mapIan Rogers1-3/+16
Move to the cpu name and document for consistency. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Document cpu__get_node() and remove redundant functionIan Rogers1-1/+4
cpu_map__get_node_id() isn't used externally and merely delegates to cpu__get_node(). Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Rename empty functionsIan Rogers1-2/+2
Remove cpu_map from name as a cpu_map isn't used. Pass a const pointer rather than by value to avoid unnecessary copying. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Simplify equal function nameIan Rogers1-1/+1
Rename cpu_map__compare_aggr_cpu_id() to aggr_cpu_id__equal(), the cpu_map part of the name is misleading. Equal better describes the function than compare. Switch to const pointer rather than value as struct given the number of variables in aggr_cpu_id(). Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Remove unused cpu_map__socket()Ian Rogers1-7/+0
Unused function so remove. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Add comments to aggr_cpu_id()Ian Rogers1-0/+9
This code is already tested in topology.c. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Remove map+index get_node()Ian Rogers1-1/+0
Migrate final users to appropriate cpu variant. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Remove map+index get_core()Ian Rogers1-1/+0
Migrate final users to appropriate cpu variant. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Remove map+index get_die()Ian Rogers1-1/+0
Migrate final users to appropriate cpu variant. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Remove map+index get_socket()Ian Rogers1-1/+0
Migrate final users to appropriate cpu variant. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf cpumap: Switch cpu_map__build_map() to cpu functionIan Rogers1-1/+1
Avoid error prone cpu_map + idx variant. Remove now unused functions. Committer notes: Remove by now unused perf_env__get_cpu(). Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-01-12perf stat: Add aggr creators that are passed a cpuIan Rogers1-1/+5
The cpu_map and index can get confused. Add variants of the cpu_map__get routines that are passed a cpu. Make the existing cpu_map__get routines use the new functions with a view to remove them when no longer used. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Vineet Singh <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf stat aggregation: Add separate thread memberJames Clark1-1/+1
A separate field isn't strictly required. The core field could be re-used for thread IDs as a single field was used previously. But separating them will avoid confusion and catch potential errors where core IDs are read as thread IDs and vice versa. Also remove the placeholder id field which is now no longer used. Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf stat aggregation: Add separate core memberJames Clark1-5/+1
Add core as a separate member so that it doesn't have to be packed into the int value. Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf stat aggregation: Add separate die memberJames Clark1-5/+1
Add die as a separate member so that it doesn't have to be packed into the int value. Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf stat aggregation: Add separate socket memberJames Clark1-5/+1
Add socket as a separate member so that it doesn't have to be packed into the int value. When the socket ID was larger than 8 bits the output appeared corrupted or incomplete. For example, here on ThunderX2 'perf stat' reports a socket of -1 and an invalid die number: ./perf stat -a --per-die The socket id number is too big. Performance counter stats for 'system wide': S-1-D255 128 687.99 msec cpu-clock # 57.240 CPUs utilized ... S36-D0 128 842.34 msec cpu-clock # 70.081 CPUs utilized ... And with --per-core there is an entry with an invalid core ID: ./perf stat record -a --per-core The socket id number is too big. Performance counter stats for 'system wide': S-1-D255-C65535 128 671.04 msec cpu-clock # 54.112 CPUs utilized ... S36-D0-C0 4 28.27 msec cpu-clock # 2.279 CPUs utilized ... This fixes the "Session topology" self test on ThunderX2. After this fix the output contains the correct socket and die IDs and no longer prints a warning about the size of the socket ID: ./perf stat --per-die -a Performance counter stats for 'system wide': S36-D0 128 169,869.39 msec cpu-clock # 127.501 CPUs utilized ... S3612-D0 128 169,733.05 msec cpu-clock # 127.398 CPUs utilized Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf stat aggregation: Add separate node memberJames Clark1-0/+1
Add node as a separate member so that it doesn't have to be packed into the int value. Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf stat aggregation: Start using cpu_aggr_id in mapJames Clark1-1/+1
Use the new cpu_aggr_id struct in the cpu map instead of int so that it can store more data. No functional changes. Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: John Garry <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf cpumap: Drop in cpu_aggr_map structJames Clark1-5/+5
Replace usages of perf_cpu_map with cpu_aggr map in places that are involved with 'perf stat' aggregation. This will then later be changed to be a map of cpu_aggr_id rather than an int so that more data can be stored. No functional changes. Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: John Garry <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf cpumap: Add new map type for aggregationJames Clark1-0/+8
Currently this is a duplicate of perf_cpu_map so that it can be used as a drop in replacement. In a later commit it will be changed from a map of ints to use the new cpu_aggr_id struct. No functional changes. Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf stat: Replace aggregation ID with a structJames Clark1-5/+5
Replace all occurences of the usage of int with the new struct cpu_aggr_id. No functional changes. Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-12-24perf cpumap: Add new struct for cpu aggregationJames Clark1-0/+8
This struct currently has only a single int member so that it can be used as a drop in replacement for the existing behaviour. Comparison and constructor functions have also been added that will replace usages of '==' and '= -1'. No functional changes. Signed-off-by: James Clark <[email protected]> Acked-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: John Garry <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-29perf evsel: Add iterator to iterate over events ordered by CPUAndi Kleen1-0/+1
Add some common code that is needed to iterate over all events in CPU order. Used in followon patches Signed-off-by: Andi Kleen <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06perf stat: Add --per-node agregation supportJiri Olsa1-0/+3
Adding new --per-node option to aggregate counts per NUMA nodes for system-wide mode measurements. You can specify --per-node in live mode: # perf stat -a -I 1000 -e cycles --per-node # time node cpus counts unit events 1.000542550 N0 20 6,202,097 cycles 1.000542550 N1 20 639,559 cycles 2.002040063 N0 20 7,412,495 cycles 2.002040063 N1 20 2,185,577 cycles 3.003451699 N0 20 6,508,917 cycles 3.003451699 N1 20 765,607 cycles ... Or in the record/report stat session: # perf stat record -a -I 1000 -e cycles # time counts unit events 1.000536937 10,008,468 cycles 2.002090152 9,578,539 cycles 3.003625233 7,647,869 cycles 4.005135036 7,032,086 cycles ^C 4.340902364 3,923,893 cycles # perf stat report --per-node # time node cpus counts unit events 1.000536937 N0 20 9,355,086 cycles 1.000536937 N1 20 653,382 cycles 2.002090152 N0 20 7,712,838 cycles 2.002090152 N1 20 1,865,701 cycles 3.003625233 N0 20 6,604,441 cycles 3.003625233 N1 20 1,043,428 cycles 4.005135036 N0 20 6,350,522 cycles 4.005135036 N1 20 681,564 cycles 4.340902364 N0 20 3,403,188 cycles 4.340902364 N1 20 520,705 cycles Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Joe Mario <[email protected]> Cc: Kan Liang <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-08-29libperf: Rename the PERF_RECORD_ structs to have a "perf" prefixJiri Olsa1-2/+2
Even more, to have a "perf_record_" prefix, so that they match the PERF_RECORD_ enum they map to. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-08-26perf cpumap: No need to include perf.h, ditch itArnaldo Carvalho de Melo1-2/+0
From a quick look this was never needed and just polluted the build, needlessly making things including cpumap.h to be rebuild if perf.h or anything it includes gets changed. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-08-22perf cpumap: Remove needless includes from cpumap.hArnaldo Carvalho de Melo1-2/+2
The util/cpumap.h file doesn't use anything in refcount.h not in debug.h, it needs just a forward reference to 'struct cpu_map_data', that is defined in util/event.h and cpumap.h was getting indirectly via, of all things, debug.h Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-08-22libperf: Move perf's cpu_map__idx() to perf_cpu_map__idx()Jiri Olsa1-1/+0
As an internal function that will be used by both perf and libperf, but is not exported at this point. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-08-22libperf: Move perf's cpu_map__empty() to perf_cpu_map__empty()Jiri Olsa1-6/+1
So it's part of the libperf library as one of basic functions operating on the perf_cpu_map class. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-08-22perf tools: Use perf_cpu_map__nr instead of cpu_map__nrJiri Olsa1-5/+0
Switch the rest of the perf code to use libperf's perf_cpu_map__nr(), which is the same as current cpu_map__nr() and remove the cpu_map__nr() function. Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>