aboutsummaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2023-11-27perf tests coresight: Remove unused variableszhujun23-3/+0
These variables are never referenced in the code, just remove them. Reviewed-by: James Clark <[email protected]> Signed-off-by: zhujun2 <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-27perf lock: Fix a memory leak on an error pathzhaimingbing1-1/+3
if a strdup-ed string is NULL,the allocated memory needs freeing. Signed-off-by: zhaimingbing <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-27perf parse-events: Make legacy events lower priority than sysfs/JSONIan Rogers4-88/+231
The perf tool has previously made legacy events the priority so with or without a PMU the legacy event would be opened: $ perf stat -e cpu-cycles,cpu/cpu-cycles/ true Using CPUID GenuineIntel-6-8D-1 intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch Attempting to add event pmu 'cpu' with 'cpu-cycles,' that may result in non-fatal errors After aliases, add event pmu 'cpu' with 'cpu-cycles,' that may result in non-fatal errors Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid 833967 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ ... Fixes to make hybrid/BIG.little PMUs behave correctly, ie as core PMUs capable of opening legacy events on each, removing hard coded "cpu_core" and "cpu_atom" Intel PMU names, etc. caused a behavioral difference on Apple/ARM due to latent issues in the PMU driver reported in: https://lore.kernel.org/lkml/[email protected]/ As part of that report Mark Rutland <[email protected]> requested that legacy events not be higher in priority when a PMU is specified reversing what has until this change been perf's default behavior. With this change the above becomes: $ perf stat -e cpu-cycles,cpu/cpu-cycles/ true Using CPUID GenuineIntel-6-8D-1 Attempt to add: cpu/cpu-cycles=0/ ..after resolving event: cpu/event=0x3c/ Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid 827628 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 4 (PERF_TYPE_RAW) size 136 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ ... So the second event has become a raw event as /sys/devices/cpu/events/cpu-cycles exists. A fix was necessary to config_term_pmu in parse-events.c as check_alias expansion needs to happen after config_term_pmu, and config_term_pmu may need calling a second time because of this. config_term_pmu is updated to not use the legacy event when the PMU has such a named event (either from JSON or sysfs). The bulk of this change is updating all of the parse-events test expectations so that if a sysfs/JSON event exists for a PMU the test doesn't fail - a further sign, if it were needed, that the legacy event priority was a known and tested behavior of the perf tool. Reported-by: Hector Martin <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Hector Martin <[email protected]> Tested-by: Marc Zyngier <[email protected]> Acked-by: Mark Rutland <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Initialize the 'alias_rewrote_terms' variable to false to address a clang warning ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-27perf cs-etm: Enable itrace option 'T'Leo Yan1-3/+18
Prior to Armv8.4, the feature FEAT_TRF is not supported by Arm CPUs. Consequently, the sysfs node 'ts_source' will not be set as 1 by the CoreSight ETM driver. On the other hand, the perf tool relies on the 'ts_source' node to determine whether the kernel timestamp is traced. Since the 'ts_source' is not set for Arm CPUs prior to Armv8.4, platforms in this case cannot utilize the traced timestamp as the kernel time. This patch enables the 'T' itrace option, which forcibly utilizes the traced timestamp as the kernel time. If users are aware that their working platform's Arm CoreSight shares the same counter with the kernel time, they can specify 'T' option to decode the traced timestamp as the kernel time. An usage example is: # perf record -e cs_etm// -- test_program # perf script --itrace=i10ibT # perf report --itrace=i10ibT Reviewed-by: James Clark <[email protected]> Signed-off-by: Leo Yan <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-27perf auxtrace: Add 'T' itrace option for timestamp traceLeo Yan3-0/+7
An AUX trace can contain timestamp, but in some situations, the hardware trace module (e.g. Arm CoreSight) cannot decide the traced timestamp is the same source with CPU's time, thus the decoder can not use the timestamp trace for samples. This patch introduces 'T' itrace option. If users know the platforms they are working on have the same time counter with CPUs, users can use this new option to tell a decoder for using timestamp trace as kernel time. Signed-off-by: Leo Yan <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-27perf test: Basic branch counter supportKan Liang1-0/+30
Add a basic test for the branch counter feature. The test verifies that - The new filter can be successfully applied on the supported platforms. - The counter value can be outputted via the perf report -D Signed-off-by: Kan Liang <[email protected]> Tested-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tinghao Zhang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-27perf script perl: Fail check on dynamic allocationzhaimingbing1-0/+3
Return ENOMEM when dynamic allocation failed. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: zhaimingbing <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-27perf script python: Fail check on dynamic allocationParan Lee1-2/+15
Add PyList_New() Fail check in get_field_numeric_entry() function and dynamic allocation checking for set_regs_in_dict(), python_start_script(). Reviewed-by: Adrian Hunter <[email protected]> Reviewed-by: MichelleJin <[email protected]> Signed-off-by: Paran Lee <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Austin Kim <[email protected]> Cc: Honggyu Kim <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-27perf test: Remove atomics from test_loop to avoid test failuresNick Forrington1-3/+1
The current use of atomics can lead to test failures, as tests (such as tests/shell/record.sh) search for samples with "test_loop" as the top-most stack frame, but find frames related to the atomic operation (e.g. __aarch64_ldadd4_relax). This change simply removes the "count" variable, as it is not necessary. Fixes: 1962ab6f6e0b39e4 ("perf test workload thloop: Make count increments atomic") Reviewed-by: James Clark <[email protected]> Signed-off-by: Nick Forrington <[email protected]> Acked-by: Leo Yan <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-23perf tools: Address python 3.6 DeprecationWarning for string scapesBenjamin Gray4-6/+6
Python 3.6 introduced a DeprecationWarning for invalid escape sequences. This is upgraded to a SyntaxWarning in Python 3.12, and will eventually be a syntax error. Fix these now to get ahead of it before it's an error. Signed-off-by: Benjamin Gray <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Hartley Sweeten <[email protected]> Cc: Ian Abbott <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jan Kiszka <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kieran Bingham <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mykola Lysenko <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Todd E Brandt <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-22perf build: Ensure sysreg-defs Makefile respects output dirOliver Upton2-10/+16
Currently the sysreg-defs are written out to the source tree unconditionally, ignoring the specified output directory. Correct the build rule to emit the header to the output directory. Opportunistically reorganize the rules to avoid interleaving with the set of beauty make rules. Reported-by: Ian Rogers <[email protected]> Signed-off-by: Oliver Upton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2023-11-22tools perf: Add arm64 sysreg files to MANIFESTOliver Upton1-0/+2
Ian pointed out that source tarballs are incomplete as of commit e2bdd172e665 ("perf build: Generate arm64's sysreg-defs.h and add to include path"), since the source files needed from the kernel tree do not appear in the manifest. Add them. Reported-by: Ian Rogers <[email protected]> Fixes: e2bdd172e665 ("perf build: Generate arm64's sysreg-defs.h and add to include path") Signed-off-by: Oliver Upton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2023-11-22tools/perf: Update tools's copy of mips syscall tableNamhyung Kim1-0/+4
tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/*.sh | head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char *fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: Thomas Bogendoerfer <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-22tools/perf: Update tools's copy of s390 syscall tableNamhyung Kim1-0/+4
tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/*.sh | head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char *fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-22tools/perf: Update tools's copy of powerpc syscall tableNamhyung Kim1-0/+4
tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/*.sh | head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char *fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-22tools/perf: Update tools's copy of x86 syscall tableNamhyung Kim1-0/+3
tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/*.sh | head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char *fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-22tools headers: Update tools's copy of socket.h headerNamhyung Kim1-0/+1
tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/*.sh | head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char *fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-21perf lock contention: Fix a build error on 32-bitYang Jihong1-1/+2
Fix a build error on 32-bit system: util/bpf_lock_contention.c: In function 'lock_contention_get_name': util/bpf_lock_contention.c:253:50: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u64 {aka long long unsigned int}' [-Werror=format=] snprintf(name_buf, sizeof(name_buf), "cgroup:%lu", cgrp_id); ~~^ %llu cc1: all warnings being treated as errors Fixes: d0c502e46e97 ("perf lock contention: Prepare to handle cgroups") Signed-off-by: Yang Jihong <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2023-11-21perf kwork: Fix a build error on 32-bitYang Jihong1-1/+1
lkft reported a build error for 32-bit system: builtin-kwork.c: In function 'top_print_work': builtin-kwork.c:1646:28: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=] 1646 | ret += printf(" %*ld ", PRINT_PID_WIDTH, work->id); | ~~~^ ~~~~~~~~ | | | | long int u64 {aka long long unsigned int} | %*lld cc1: all warnings being treated as errors make[3]: *** [/builds/linux/tools/build/Makefile.build:106: /home/tuxbuild/.cache/tuxmake/builds/1/build/builtin-kwork.o] Error 1 Fix it. Fixes: 55c40e505234 ("perf kwork top: Introduce new top utility") Reported-by: Linux Kernel Functional Testing <[email protected]> Signed-off-by: Yang Jihong <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
2023-11-15perf vendor events riscv: Add StarFive Dubhe-80 JSON fileJi Sheng Teoh3-0/+241
StarFive's Dubhe-80 supports raw event id 0x00 - 0x22. The raw events are enabled through PMU node of DT binding. Besides raw event, add standard RISC-V firmware events to support monitoring of firmware event. Example of PMU DT node: pmu { compatible = "riscv,pmu"; riscv,raw-event-to-mhpmcounters = /* Event ID 1-31 */ <0x00 0x00 0xFFFFFFFF 0xFFFFFFE0 0x00007FF8>, /* Event ID 32-33 */ <0x00 0x20 0xFFFFFFFF 0xFFFFFFFE 0x00007FF8>, /* Event ID 34 */ <0x00 0x22 0xFFFFFFFF 0xFFFFFF22 0x00007FF8>; }; Example of 'perf stat' output: [root@user]# perf stat -a \ -e access_mmu_stlb \ -e miss_mmu_stlb \ -e access_mmu_pte_c \ -e rob_flush \ -e btb_prediction_miss \ -e itlb_miss \ -e sync_del_fetch_g \ -e icache_miss \ -e bpu_br_retire \ -e bpu_br_miss \ -e ret_ins_retire \ -e ret_ins_miss \ -- openssl speed rsa2048 Doing 2048 bits private rsa's for 10s: 39 2048 bits private RSA's in 10.14s Doing 2048 bits public rsa's for 10s: 1563 2048 bits public RSA's in 10.00s version: 3.0.11 built on: Tue Sep 19 13:02:31 2023 UTC options: bn(64,64) CPUINFO: N/A sign verify sign/s verify/s rsa 2048 bits 0.260000s 0.006398s 3.8 156.3 Performance counter stats for 'system wide': 1338350 access_mmu_stlb 1154025 miss_mmu_stlb 1162691 access_mmu_pte_c 34067 rob_flush 11212384 btb_prediction_miss 1256242 itlb_miss 652523491 sync_del_fetch_g 384465 icache_miss 64635789 bpu_br_retire 323440 bpu_br_miss 8785143 ret_ins_retire 31236 ret_ins_miss 20.760822480 seconds time elapsed Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Ji Sheng Teoh <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nikita Shubin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-15perf report: Add s390 raw data interpretation for PAI countersThomas Richter2-8/+103
Commit 39d62336f5c126ad ("s390/pai: add support for cryptography counters") added support for Processor Activity Instrumentation Facility (PAI) counters. These counters values are added as raw data with the perf sample during 'perf record'. Now add support to display these counters in the 'perf report' command. The counter number, its assigned name and value is now printed in addition to the hexadecimal output. Output before: # perf report -D 6 514766399626050 0x7b058 [0x48]: PERF_RECORD_SAMPLE(IP, 0x1): 303977/303977: 0 period: 1 addr: 0 ... thread: paitest:303977 ...... dso: <not found> 0x7b0a0@/root/perf.data.paicrypto [0x48]: event: 9 . . ... raw event: size 72 bytes . 0000: 00 00 00 09 00 01 00 48 00 00 00 00 00 00 00 00 .......H........ . 0010: 00 04 a3 69 00 04 a3 69 00 01 d4 2d 76 de a0 bb ...i...i...-v... . 0020: 00 00 00 00 00 01 5c 53 00 00 00 06 00 00 00 00 ......\S........ . 0030: 00 00 00 00 00 00 00 01 00 00 00 0c 00 07 00 00 ................ . 0040: 00 00 00 53 96 af 00 00 ...S.... Output after: # perf report -D 6 514766399626050 0x7b058 [0x48]: PERF_RECORD_SAMPLE(IP, 0x1): 303977/303977: 0 period: 1 addr: 0 ... thread: paitest:303977 ...... dso: <not found> 0x7b0a0@/root/perf.data.paicrypto [0x48]: event: 9 . . ... raw event: size 72 bytes . 0000: 00 00 00 09 00 01 00 48 00 00 00 00 00 00 00 00 .......H........ . 0010: 00 04 a3 69 00 04 a3 69 00 01 d4 2d 76 de a0 bb ...i...i...-v... . 0020: 00 00 00 00 00 01 5c 53 00 00 00 06 00 00 00 00 ......\S........ . 0030: 00 00 00 00 00 00 00 01 00 00 00 0c 00 07 00 00 ................ . 0040: 00 00 00 53 96 af 00 00 ...S.... Counter:007 km_aes_128 Value:0x00000000005396af <--- new Committer notes: Had to add ignore pragmas for that __packed function: +#pragma GCC diagnostic ignored "-Wpacked" +#pragma GCC diagnostic ignored "-Wattributes" Otherwise this doesn't build in things like debian experimentao cross building to mips64, etc. Signed-off-by: Thomas Richter <[email protected]> Tested-by: Sumanth Korikkar <[email protected]> Acked-by: Sumanth Korikkar <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Corrected non-existent commit referred to the right one: 39d62336f5c126ad ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-12LSM: wireup Linux Security Module syscallsCasey Schaufler4-0/+12
Wireup lsm_get_self_attr, lsm_set_self_attr and lsm_list_modules system calls. Signed-off-by: Casey Schaufler <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Cc: [email protected] Reviewed-by: Mickaël Salaün <[email protected]> [PM: forward ported beyond v6.6 due merge window changes] Signed-off-by: Paul Moore <[email protected]>
2023-11-10perf probe: Convert to check dwarf_getcfi featureNamhyung Kim2-4/+9
Now it has a feature check for the dwarf_getcfi(), use it and convert the code to check HAVE_DWARF_CFI_SUPPORT definition. Suggested-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-10perf dwarf-aux: Add die_find_variable_by_reg() helperNamhyung Kim2-0/+79
The die_find_variable_by_reg() will search for a variable or a parameter sub-DIE in the given scope DIE where the location matches to the given register. For the simplest and most common case, memory access usually happens with a base register and an offset to the field so the register holds a pointer in a variable or function parameter. Then we can find one if it has a location expression at the (instruction) address. This function only handles such a simple case for now. In this case, the expression has a DW_OP_regN operation where N < 32. If the register index (N) is greater than or equal to 32, DW_OP_regx operation with an operand which saves the value for the N would be used. It rejects expressions with more operations. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-10perf dwarf-aux: Add die_get_scopes() alternative to dwarf_getscopes()Namhyung Kim2-0/+56
The die_get_scopes() returns the number of enclosing DIEs for the given address and it fills an array of DIEs like dwarf_getscopes(). But it doesn't follow the abstract origin of inlined functions as we want information of the concrete instance. This is needed to check the location of parameters and local variables properly. Users can check the origin separately if needed. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-10perf dwarf-aux: Move #else block of #ifdef HAVE_DWARF_GETLOCATIONS_SUPPORT ↵Namhyung Kim2-9/+17
code to the header file It's a usual convention that the conditional code is handled in a header file. As I'm planning to add some more of them, let's move the current code to the header first. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-10perf dwarf-aux: Fix die_get_typename() for void *Namhyung Kim1-1/+8
The die_get_typename() is to return a C-like type name from DWARF debug entry and it follows data type if the target entry is a pointer type. But I found that void pointers don't have the type attribute to follow and then the function returns an error for that case. This results in a broken type string for void pointer types. For example, the following type entries are pointer types. <1><48c>: Abbrev Number: 4 (DW_TAG_pointer_type) <48d> DW_AT_byte_size : 8 <48d> DW_AT_type : <0x481> <1><491>: Abbrev Number: 211 (DW_TAG_pointer_type) <493> DW_AT_byte_size : 8 <1><494>: Abbrev Number: 4 (DW_TAG_pointer_type) <495> DW_AT_byte_size : 8 <495> DW_AT_type : <0x49e> The first one at offset 48c and the third one at offset 494 have type information. Then they are pointer types for the referenced types. But the second one at offset 491 doesn't have the type attribute. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-10perf tools: Add util/debuginfo.[ch] filesNamhyung Kim5-210/+272
Split debuginfo data structure and related functions into a separate file so that it can be used by other components than the probe-finder. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-10perf annotate: Move raw_comment and raw_func_start fields out of 'struct ↵Namhyung Kim3-9/+20
ins_operands' Thoese two fields are used only for the jump_ops, so move them into the union to save some bytes. Also add jump__delete() callback not to free the fields as they didn't allocate new strings. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: WANG Rui <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-10perf annotate: Pass "-l" option to objdump conditionallyNamhyung Kim1-1/+2
The "-l" option is to print line numbers in the objdump output. perf annotate TUI only can show the line numbers later but it causes big slow downs for the kernel binary. Similarly, showing source code also takes a long time and it already has an option to control it. $ time objdump ... -d -S -C vmlinux > /dev/null real 0m3.474s user 0m3.047s sys 0m0.428s $ time objdump ... -d -l -C vmlinux > /dev/null real 0m1.796s user 0m1.459s sys 0m0.338s $ time objdump ... -d -C vmlinux > /dev/null real 0m0.051s user 0m0.036s sys 0m0.016s As it's not needed for data type profiling, let's make it conditional so that it can skip the unnecessary work. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-10perf header: Additional note on AMD IBS for max_precise pmu capArnaldo Carvalho de Melo4-5/+35
x86 core PMU exposes supported maximum precision level via max_precise PMU capability. Although, AMD core PMU does not support precise mode, certain core PMU events with precise_ip > 0 are allowed and forwarded to IBS OP PMU. Display a note about this in the 'perf report' header output and document the details in the perf-list man page. Signed-off-by: Ravi Bangoria <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ananth Narayan <[email protected]> Cc: Changbin Du <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Ming Wang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Santosh Shukla <[email protected]> Cc: Yang Jihong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf bpf: Don't synthesize BPF events when disabledIan Rogers1-0/+3
If BPF sideband events are disabled on the command line, don't synthesize BPF events too. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Changbin Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: German Gomez <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Ming Wang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Vincent Whitchurch <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Yang Jihong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf test: Add support for setting objdump binary via perf configJames Clark2-0/+16
Add a 'perf config' variable that does the same thing as "perf test --objdump <x>". Also update the man page. Committer testing: # perf config test.objdump # perf test "object code reading" 26: Object code reading : Ok # perf config test.objdump=blah # perf config test.objdump test.objdump=blah # perf test "object code reading" 26: Object code reading : FAILED! # perf test -v "object code reading" 26: Object code reading : --- start --- test child forked, pid 600599 Looking at the vmlinux_path (8 entries long) Using /proc/kcore for kernel data Using /proc/kallsyms for symbols Parsing event 'cycles' Using CPUID AuthenticAMD-25-21-0 mmap size 528384B Reading object code for memory address: 0x4d9a02 File is: /home/acme/bin/perf On file address is: 0xd9a02 Objdump command is: blah -z -d --start-address=0x4d9a02 --stop-address=0x4d9a82 /home/acme/bin/perf objdump read too few bytes: 128 Bytes read differ from those read by objdump buf1 (dso): 0x48 0x85 0xff 0x74 0x29 0xe8 0x94 0xdf 0x07 0x00 0x8b 0x73 0x1c 0x48 0x8b 0x43 0x08 0xeb 0xa5 0x0f 0x1f 0x00 0x48 0x8b 0x45 0xe8 0x64 0x48 0x2b 0x04 0x25 0x28 0x00 0x00 0x00 0x75 0x0f 0x48 0x8b 0x5d 0xf8 0xc9 0xc3 0x0f 0x1f 0x00 0x48 0x8b 0x43 0x08 0xeb 0x84 0xe8 0xc5 0x3e 0xf3 0xff 0x0f 0x1f 0x44 0x00 0x00 0x55 0x48 0x89 0xe5 0x41 0x56 0x41 0x55 0x49 0x89 0xd5 0x41 0x54 0x49 0x89 0xfc 0x53 0x48 0x89 0xf3 0x48 0x83 0xec 0x30 0x48 0x8b 0x7e 0x20 0x64 0x48 0x8b 0x04 0x25 0x28 0x00 0x00 0x00 0x48 0x89 0x45 0xd8 0x31 0xc0 0x48 0x89 0x75 0xb0 0x48 0xc7 0x45 0xb8 0x00 0x00 0x00 0x00 0x48 0xc7 0x45 0xc0 0x00 0x00 0x00 0x00 0xe8 0xad 0xfa buf2 (objdump): 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 test child finished with -1 ---- end ---- Object code reading: FAILED! # perf config test.objdump=/usr/bin/objdump # perf config test.objdump test.objdump=/usr/bin/objdump # perf test "object code reading" 26: Object code reading : Ok # Signed-off-by: James Clark <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Tom Rix <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf test: Add option to change objdump binaryJames Clark3-1/+5
All of the other Perf subcommands that use objdump have an option to specify the binary, so add the same option to 'perf test'. This is useful if you have built the kernel with a different toolchain to the system one, where the system objdump may fail to disassemble vmlinux. Now this can be fixed with something like this: $ perf test --objdump llvm-objdump "object code reading" Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Tom Rix <[email protected]> Cc: Yang Jihong <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf tests offcpu: Adjust test case perf record offcpu profiling tests for s390Thomas Richter1-2/+2
On s390 using linux-next the test case: 87: perf record offcpu profiling tests fails. The root cause is this command # ./perf record --off-cpu -e dummy -- ./perf bench sched messaging -l 10 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.231 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.077 MB perf.data (401 samples) ] # It does not generate 800+ sample entries, on s390 usually around 40[1-9], sometimes a few more, but never more than 450. The higher the number of CPUs the lower the number of samples. Looking at function chain: bench_sched_messaging() +--> group() the senders and receiver threads are created. The senders and receivers call function ready() which writes one bytes and wait for a reply using poll system() call. As context switches are counted, the function ready() will trigger a context switch when no input data is available after the write system call. The write system call does not trigger context switches when the data size is small. And writing 1000 bytes (10 iterations with 100 bytes) is not much and certainly won't block. The 400+ context switch on s390 occur when the some receiver/sender threads call ready() and wait for the response from function bench_sched_messaging() being kicked off. Lower the number of expected context switches to 400 to succeed on s390. Suggested-by: Namhyung Kim <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Thomas Richter <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Co-developed-by: Ilya Leoshkevich <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf tools: Add the python_ext_build directory to .gitignoreYang Jihong1-0/+1
`python_ext_build` is the build directory for python.so, ignore it for cleaner git status. Signed-off-by: Yang Jihong <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf tests attr: Fix spelling mistake "whic" to "which"zhaimingbing1-1/+1
There is a spelling mistake, Please fix it. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: zhaimingbing <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf annotate: Move offsets array from 'struct annotation' to 'struct ↵Namhyung Kim3-13/+13
annotated_source' The offsets array keeps pointers to 'struct annotation_line' entries which are available in the 'struct annotated_source'. Let's move it to there. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf annotate: Move some source code related fields from 'struct annotation' ↵Namhyung Kim3-21/+22
to 'struct annotated_source' Some fields in the 'struct annotation' are only used with 'struct annotated_source' so better to be moved there in order to reduce memory consumption for other symbols. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf annotate: Move max_coverage from 'struct annotation' to 'struct ↵Namhyung Kim4-5/+15
annotated_branch' The max_coverage field is only used when branch stack info is available so it'd be natural to move to 'struct annotated_branch'. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf annotate: Split branch stack cycles info from 'struct annotation'Namhyung Kim4-61/+73
The cycles info is only meaningful when sample has branch stacks. To save the memory for normal cases, move those fields to a new 'struct annotated_branch' and dynamically allocate it when needed. Also move cycles_hist from annotated_source as it's related here. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf annotate: Split branch stack cycles information out of 'struct ↵Namhyung Kim3-24/+54
annotation_line' The cycles info is used only when branch stack is provided. Separate them from 'struct annotation_line' into a separate struct and lazy allocate them to save some memory. Committer notes: Make annotation__compute_ipc() check if the lazy allocation works, bailing out if so, its callers already do error checking and propagation. Signed-off-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf test: Simplify "object code reading" testNamhyung Kim1-53/+23
It tries cycles (or cpu-clock on s390) event with exclude_kernel bit to open. But other arch on a VM can fail with the hardware event and need to fallback to the software event in the same way. So let's get rid of the cpuid check and use generic fallback mechanism using an array of event candidates. Now event in the odd index excludes the kernel so use that for the return value. Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: James Clark <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf machine thread: Remove exited threads by defaultIan Rogers5-4/+35
'struct thread' values hold onto references to mmaps, DSOs, etc. When a thread exits it is necessary to clean all of this memory up by removing the thread from the machine's threads. Some tools require this doesn't happen, such as auxtrace events, 'perf report' if offcpu events exist or if a task list is being generated, so add a 'struct symbol_conf' member to make the behavior optional. When an exited thread is left in the machine's threads, mark it as exited. This change relates to commit 40826c45eb0b8856 ("perf thread: Remove notion of dead threads") . Dead threads were removed as they had a reference count of 0 and were difficult to reason about with the reference count checker. Here a thread is removed from threads when it exits, unless via symbol_conf the exited thread isn't remove and is marked as exited. Reference counting behaves as it normally does. Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Changbin Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: German Gomez <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Ming Wang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Vincent Whitchurch <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Yang Jihong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf record: Lazy load kernel symbolsIan Rogers4-3/+12
Commit 5b7ba82a75915e73 ("perf symbols: Load kernel maps before using") changed it so that loading a kernel DSO would cause the symbols for the DSO to be eagerly loaded. For 'perf record' this is overhead as the symbols won't be used. Add a field to 'struct symbol_conf' to control the behavior and disable it for 'perf record' and 'perf inject'. Reviewed-by: Adrian Hunter <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Changbin Du <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dmitrii Dolgov <[email protected]> Cc: German Gomez <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Dong <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Ming Wang <[email protected]> Cc: Nick Terrell <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Vincent Whitchurch <[email protected]> Cc: Wenyu Liu <[email protected]> Cc: Yang Jihong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf tools: Fix spelling mistake "parametrized" -> "parameterized"Colin Ian King2-3/+3
There are spelling mistakes in comments and a pr_debug message. Fix them. Reviewed-by: Athira Jajeev <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-09perf tools: Add branch counter knobKan Liang7-3/+55
Add a new branch filter, "counter", for the branch counter option. It is used to mark the events which should be logged in the branch. If it is applied with the -j option, the counters of all the events should be logged in the branch. If the legacy kernel doesn't support the new branch sample type, switching off the branch counter filter. The stored counter values in each branch are displayed right after the regular branch stack information via perf report -D. Usage examples: # perf record -e "{branch-instructions,branch-misses}:S" -j any,counter Only the first event, branch-instructions, collect the LBR. Both branch-instructions and branch-misses are marked as logged events. The occurrences information of them can be found in the branch stack extension space of each branch. # perf record -e "{cpu/branch-instructions,branch_type=any/,cpu/branch-misses,branch_type=counter/}" Only the first event, branch-instructions, collect the LBR. Only the branch-misses event is marked as a logged event. Committer notes: I noticed 'perf test "Sample parsing"' failing, reported to the list and Kan provided a patch that checks if the evsel has a leader and that evsel->evlist is set, the comment in the source code further explains it. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Kan Liang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Tinghao Zhang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-06perf header: Support num and width of branch countersKan Liang2-3/+20
To support the branch counters feature, the information of the maximum number of supported counters and the width of the counters is exposed in the sysfs caps folder. The perf tool can use the information to parse the logged counters in each branch. Store the information in the perf_env for later usage. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Kan Liang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Tinghao Zhang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-11-03Merge tag 'perf-tools-for-v6.7-1-2023-11-01' of ↵Linus Torvalds242-1801/+31495
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "Build: - Compile BPF programs by default if clang (>= 12.0.1) is available to enable more features like kernel lock contention, off-cpu profiling, kwork, sample filtering and so on. This can be disabled by passing BUILD_BPF_SKEL=0 to make. - Produce better error messages for bison on debug build (make DEBUG=1) by defining YYDEBUG symbol internally. perf record: - Track sideband events (like FORK/MMAP) from all CPUs even if perf record targets a subset of CPUs only (using -C option). Otherwise it may lose some information happened on a CPU out of the target list. - Fix checking raw sched_switch tracepoint argument using system BTF. This affects off-cpu profiling which attaches a BPF program to the raw tracepoint. perf lock contention: - Add --lock-cgroup option to see contention by cgroups. This should be used with BPF only (using -b option). $ sudo perf lock con -ab --lock-cgroup -- sleep 1 contended total wait max wait avg wait cgroup 835 14.06 ms 41.19 us 16.83 us /system.slice/led.service 25 122.38 us 13.77 us 4.89 us / 44 23.73 us 3.87 us 539 ns /user.slice/user-657345.slice/session-c4.scope 1 491 ns 491 ns 491 ns /system.slice/connectd.service - Add -G/--cgroup-filter option to see contention only for given cgroups. This can be useful when you identified a cgroup in the above command and want to investigate more on it. It also works with other output options like -t/--threads and -l/--lock-addr. $ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1 contended total wait max wait avg wait type caller 8 77.11 us 17.98 us 9.64 us spinlock futex_wake+0xc8 2 24.56 us 14.66 us 12.28 us spinlock tick_do_update_jiffies64+0x25 1 4.97 us 4.97 us 4.97 us spinlock futex_q_lock+0x2a - Use per-cpu array for better spinlock tracking. This is to improve performance of the BPF program and to avoid nested contention on a lock in the BPF hash map. - Update callstack check for PowerPC. To find a representative caller of a lock, it needs to look up the call stacks. It ends the lookup when it sees 0 in the call stack buffer. However, PowerPC call stacks can have 0 values in the beginning so skip them when it expects valid call stacks after. perf kwork: - Support 'sched' class (for -k option) so that it can see task scheduling event (using sched_switch tracepoint) as well as irq and workqueue items. - Add perf kwork top subcommand to show more accurate cpu utilization with sched class above. It works both with a recorded data (using perf kwork record command) and BPF (using -b option). Unlike perf top command, it does not support interactive mode (yet). $ sudo perf kwork top -b -k sched Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [|||||||||||||||||| 61.66%] %Cpu1 [|||||||||||||||||| 61.27%] %Cpu2 [||||||||||||||||||| 66.40%] %Cpu3 [|||||||||||||||||| 61.28%] %Cpu4 [|||||||||||||||||| 61.82%] %Cpu5 [||||||||||||||||||||||| 77.41%] %Cpu6 [|||||||||||||||||| 61.73%] %Cpu7 [|||||||||||||||||| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> - Add hard/soft-irq statistics to perf kwork top. This will show the total CPU utilization with IRQ stats like below: $ sudo perf kwork top -b -k sched,irq,softirq Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 12554.889 ms, 8 cpus %Cpu(s): 96.23% id, 0.10% hi, 0.19% si <---- here %Cpu0 [| 4.60%] %Cpu1 [| 4.59%] %Cpu2 [ 2.73%] %Cpu3 [| 3.81%] <SNIP> perf bench: - Add -G/--cgroups option to perf bench sched pipe. The pipe bench is good to measure context switch overhead. With this option, it puts the reader and writer tasks in separate cgroups to enforce context switch between two different cgroups. Also it needs to set CPU affinity of the tasks in a CPU to accurately measure the impact of cgroup context switches. $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.307 [sec] 3.078180 usecs/op 324867 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000': 200,026 context-switches 63 cgroup-switches 0.321637922 seconds time elapsed You can see small number of cgroup-switches because both write and read tasks are in the same cgroup. $ sudo mkdir /sys/fs/cgroup/{AAA,BBB} $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.351 [sec] 3.512990 usecs/op 284657 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB': 200,020 context-switches 200,019 cgroup-switches 0.365034567 seconds time elapsed Now context-switches and cgroup-switches are almost same. And you can see the pipe operation took little more. - Kill child processes when perf bench sched messaging exited abnormally. Otherwise it'd leave the child doing unnecessary work. perf test: - Fix various shellcheck issues on the tests written in shell script. - Skip tests when condition is not satisfied: - object code reading test for non-text section addresses. - CoreSight test if cs_etm// event is not available. - lock contention test if not enough CPUs. Event parsing: - Make PMU alias name loading lazy to reduce the startup time in the event parsing code for perf record, stat and others in the general case. - Lazily compute PMU default config. In the same sense, delay PMU initialization until it's really needed to reduce the startup cost. - Fix event term values that are raw events. The event specification can have several terms including event name. But sometimes it clashes with raw event encoding which starts with 'r' and has hex-digits. For example, an event named 'read' should be processed as a normal event but it was mis-treated as a raw encoding and caused a failure. $ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1 event syntax error: '..nning/event=read/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Event metrics: - Add "Compat" regex to match event with multiple identifiers. - Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne. Misc: - Assorted memory leak fixes and footprint reduction. - Add "bpf_skeletons" to perf version --build-options so that users can check whether their perf tools have BPF support easily. - Fix unaligned access in Intel-PT packet decoder found by undefined-behavior sanitizer. - Avoid frequency mode for the dummy event. Surprisingly it'd impact kernel timer tick handler performance by force iterating all PMU events. - Update bash shell completion for events and metrics" * tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (187 commits) perf vendor events intel: Update tsx_cycles_per_elision metrics perf vendor events intel: Update bonnell version number to v5 perf vendor events intel: Update westmereex events to v4 perf vendor events intel: Update meteorlake events to v1.06 perf vendor events intel: Update knightslanding events to v16 perf vendor events intel: Add typo fix for ivybridge FP perf vendor events intel: Update a spelling in haswell/haswellx perf vendor events intel: Update emeraldrapids to v1.01 perf vendor events intel: Update alderlake/alderlake events to v1.23 perf build: Disable BPF skeletons if clang version is < 12.0.1 perf callchain: Fix spelling mistake "statisitcs" -> "statistics" perf report: Fix spelling mistake "heirachy" -> "hierarchy" perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit() perf tests: test_arm_coresight: Simplify source iteration perf vendor events intel: Add tigerlake two metrics perf vendor events intel: Add broadwellde two metrics perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit perf callchain: Minor layout changes to callchain_list perf callchain: Make brtype_stat in callchain_list optional ...
2023-11-03perf build: Warn about missing libelf before warning about missing libbpfArnaldo Carvalho de Melo1-4/+4
As libelf is a requirement for libbpf if it is not available, as in some container build tests where NO_LIBELF=1 is used, then better warn about the most basic library first. Ditto for libz, check its availability before libbpf too. Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>