aboutsummaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2024-03-21perf intel-pt/intel-bts: Switch perf_cpu_map__has_any_cpu_or_is_empty useIan Rogers2-7/+7
Switch perf_cpu_map__has_any_cpu_or_is_empty() to perf_cpu_map__is_any_cpu_or_is_empty() as a CPU map may contain CPUs as well as the dummy event and perf_cpu_map__is_any_cpu_or_is_empty() is a more correct alternative. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: Andrew Jones <[email protected]> Cc: André Almeida <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Atish Patra <[email protected]> Cc: Changbin Du <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yang Li <[email protected]> Cc: Yanteng Si <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf arm-spe/cs-etm: Directly iterate CPU mapsIan Rogers2-67/+51
Rather than iterate all CPUs and see if they are in CPU maps, directly iterate the CPU map. Similarly make use of the intersect function taking care for when "any" CPU is specified. Switch perf_cpu_map__has_any_cpu_or_is_empty() to more appropriate alternatives. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: Andrew Jones <[email protected]> Cc: André Almeida <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Atish Patra <[email protected]> Cc: Changbin Du <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yang Li <[email protected]> Cc: Yanteng Si <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf build: Fix out of tree build related to installation of sysreg-defsEthan Adams1-3/+4
It seems that a previous modification to sysreg-defs, which corrected emitting the header to the specified output directory, exposed missing subdir, prefix variables. This breaks out of tree builds of perf as the file is now built into the output directory, but still tries to descend into output directory as a subdir. Fixes: a29ee6aea7030786 ("perf build: Ensure sysreg-defs Makefile respects output dir") Reviewed-by: Oliver Upton <[email protected]> Reviewed-by: Tycho Andersen <[email protected]> Signed-off-by: Ethan Adams <[email protected]> Tested-by: Tycho Andersen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf auxtrace: Fix multiple use of --itrace optionAdrian Hunter1-1/+3
If the --itrace option is used more than once, the options are combined, but "i" and "y" (sub-)options can be corrupted because itrace_do_parse_synth_opts() incorrectly overwrites the period type and period with default values. For example, with: --itrace=i0ns --itrace=e The processing of "--itrace=e", resets the "i" period from 0 nanoseconds to the default 100 microseconds. Fix by performing the default setting of period type and period only if "i" or "y" are present in the currently processed --itrace value. Fixes: f6986c95af84ff2a ("perf session: Add instruction tracing options") Signed-off-by: Adrian Hunter <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf script: Show also errors for --insn-trace optionAdrian Hunter1-1/+1
The trace could be misleading if trace errors are not taken into account, so display them also by adding the itrace "e" option. Note --call-trace and --call-ret-trace already add the itrace "e" option. Fixes: b585ebdb5912cf14 ("perf script: Add --insn-trace for instruction decoding") Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf docs arm_spe: Clarify more SPE requirements related to KPTIJames Clark1-1/+11
The question of exactly when KPTI needs to be disabled comes up a lot because it doesn't always need to be done. Add the relevant kernel function and some examples that describe the behavior. Also describe the interrupt requirement and that no error message will be printed if this isn't met. Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21tools headers: Remove almost unused copy of uapi/stat.h, add few conditional ↵Arnaldo Carvalho de Melo2-2/+11
defines These were used to build perf to provide defines not available in older distros, but this was back in 2017, nowadays most the distros that are supported and I have build containers for work using just the system headers, so ditch them. For the few that don't have STATX_MNT_ID{_UNIQUE}, or STATX_MNT_DIOALIGN add them conditionally. Some of these older distros may not have things that are used in 'perf trace', but then they also don't have libtraceevent packages, so don't build 'perf trace'. Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and ↵Arnaldo Carvalho de Melo1-2/+0
asm/fcntl.h These were used to build perf to provide defines not available in older distros, but this was back in 2017, nowadays all the distros that are supported and I have build containers for work using just the system headers, so ditch them. Some of these older distros may not have things that are used in 'perf trace', but then they also don't have libtraceevent packages, so don't build 'perf trace'. Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Use the system linux/fcntl.h instead of a copy from the kernelArnaldo Carvalho de Melo3-3/+3
Builds ok all the way back to these older distros: 1 almalinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20) , clang version 16.0.6 (Red Hat 16.0.6-2.module_el8.9.0+3621+df7f7146) flex 2.6.1 3 alpine:3.15 : Ok gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027 , Alpine clang version 12.0.1 flex 2.6.4 15 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0 , Debian clang version 11.0.1-2~deb10u1 flex 2.6.4 32 opensuse:15.4 : Ok gcc (SUSE Linux) 7.5.0 , clang version 15.0.7 flex 2.6.4 23 fedora:35 : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-3) , clang version 13.0.1 (Fedora 13.0.1-1.fc35) flex 2.6.4 38 ubuntu:18.04 : Ok gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 flex 2.6.4 Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Move prctl.h files (uapi/linux and x86's) copy out of the ↵Arnaldo Carvalho de Melo6-13/+364
directory used to build perf It is used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/{include,arch}/ hierarchies, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Stop using the copy of uapi/linux/prctl.hArnaldo Carvalho de Melo1-1/+1
Use the system one, nothing used in that file isn't available in the supported, active distros. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> To: Ian Rogers <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Move arch/x86/include/asm/irq_vectors.h copy out of the ↵Arnaldo Carvalho de Melo4-6/+150
directory used to build perf It is used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it. Suggested-by: Ian Rogers <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Move uapi/sound/asound.h copy out of the directory used to ↵Arnaldo Carvalho de Melo5-9/+1262
build perf It is used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. Suggested-by: Ian Rogers <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Move uapi/linux/usbdevice_fs.h copy out of the directory used ↵Arnaldo Carvalho de Melo4-6/+237
to build perf It is mostly used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Move uapi/linux/mount.h copy out of the directory used to build ↵Arnaldo Carvalho de Melo8-27/+237
perf It is mostly used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/mount.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Don't include uapi/linux/mount.h, use sys/mount.h insteadArnaldo Carvalho de Melo1-1/+8
The tools/include/uapi/linux/mount.h file is mostly used for scrapping defines into id->string tables, this is the only place were it is being directly used, stop doing so. Define MOUNT_ATTR_RELATIME and MOUNT_ATTR__ATIME if not available in the system's headers. Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Move uapi/linux/fs.h copy out of the directory used to build perfArnaldo Carvalho de Melo6-8/+386
It is mostly used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. The only case where it was being used to build was in tools/perf/trace/beauty/sync_file_range.c, because some older systems doesn't have the SYNC_FILE_RANGE_WRITE_AND_WAIT define, just use the system's linux/fs.h header instead, defining it if not available. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/fs.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf beauty: Fix dependency of tables using uapi/linux/mount.hArnaldo Carvalho de Melo1-5/+5
Several such tables were depending on uapi/linux/fs.h, cut and paste error when they were introduced, fix it. Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/Ze9vjxv42PN_QGZv@x1 Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf c2c: Fix a punctuationBhaskar Chowdhury1-1/+1
s/dont/don\'t/ Signed-off-by: Bhaskar Chowdhury <[email protected]> Acked-by: Randy Dunlap <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-21perf trace: Collect sys_nanosleep first argumentArnaldo Carvalho de Melo2-0/+23
That is a 'struct timespec' passed from userspace to the kernel as we can see with a system wide syscall tracing: root@number:~# perf trace -e nanosleep 0.000 (10.102 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 38.924 (10.077 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 100.177 (10.107 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 139.171 (10.063 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 200.603 (10.105 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 239.399 (10.064 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 300.994 (10.096 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 339.584 (10.067 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 401.335 (10.057 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 439.758 (10.166 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 501.814 (10.110 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 539.983 (10.227 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 602.284 (10.199 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 640.208 (10.105 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 702.662 (10.163 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 740.440 (10.107 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 802.993 (10.159 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 ^Croot@number:~# strace -p 9150 -e nanosleep If we then use the ptrace method to look at that podman process: root@number:~# strace -p 9150 -e nanosleep strace: Process 9150 attached nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 ^Cstrace: Process 9150 detached root@number:~# With some changes we can get something closer to the strace output, still in system wide mode: root@number:~# perf config trace.show_arg_names=false root@number:~# perf config trace.show_duration=false root@number:~# perf config trace.show_timestamp=false root@number:~# perf config trace.show_zeros=true root@number:~# perf config trace.args_alignment=0 root@number:~# perf trace -e nanosleep --max-events=10 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 root@number:~# root@number:~# perf config trace.show_arg_names=false trace.show_duration=false trace.show_timestamp=false trace.show_zeros=true trace.args_alignment=0 root@number:~# cat ~/.perfconfig # this file is auto-generated. [trace] show_arg_names = false show_duration = false show_timestamp = false show_zeros = true args_alignment = 0 root@number:~# This will not get reused by any other syscall as nanosleep is the only one to have as its first argument a 'struct timespec" pointer argument passed from userspace to the kernel: root@number:~# grep timespec /sys/kernel/tracing/events/syscalls/sys_enter_*/format | grep offset:16 /sys/kernel/tracing/events/syscalls/sys_enter_nanosleep/format: field:struct __kernel_timespec * rqtp; offset:16; size:8; signed:0; root@number:~# BTF based pretty printing will simplify all this, but then lets just get the low hanging fruits first. Reviewed-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2024-03-14net: remove {revc,send}msg_copy_msghdr() from exportsJens Axboe1-7/+0
The only user of these was io_uring, and it's not using them anymore. Make them static and remove them from the socket header file. Signed-off-by: Jens Axboe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-12riscv: andes: Support specifying symbolic firmware and hardware raw eventsLocus Wei-Han Chen5-0/+330
Add the Andes AX45 JSON files that allows specifying symbolic event names for the raw PMU events. Signed-off-by: Locus Wei-Han Chen <[email protected]> Reviewed-by: Yu Chien Peter Lin <[email protected]> Reviewed-by: Charles Ci-Jyun Wu <[email protected]> Reviewed-by: Leo Yu-Chi Liang <[email protected]> Tested-by: Lad Prabhakar <[email protected]> Acked-by: Atish Patra <[email protected]> Acked-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-03-06perf annotate: Add comments in the data structuresNamhyung Kim1-7/+62
Reviewed-by: Ian Rogers <[email protected]> Reviewed-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-06perf annotate: Remove sym_hist.addr[] arrayNamhyung Kim2-34/+6
It's not used anymore and the code is coverted to use a hash map. Now sym_hist has a static size, so no need to have sizeof_sym_hist in the struct annotated_source. Reviewed-by: Ian Rogers <[email protected]> Reviewed-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-06perf annotate: Calculate instruction overhead using hashmapNamhyung Kim3-17/+52
Use annotated_source.samples hashmap instead of addr array in the struct sym_hist. Reviewed-by: Ian Rogers <[email protected]> Reviewed-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-06perf annotate: Add a hashmap for symbol histogramNamhyung Kim2-2/+42
Now symbol histogram uses an array to save per-offset sample counts. But it wastes a lot of memory if the symbol has a few samples only. Add a hashmap to save values only for actual samples. For now, it has duplicate histogram (one in the existing array and another in the new hash map). Once it can convert to use the hash in all places, we can get rid of the array later. Reviewed-by: Ian Rogers <[email protected]> Reviewed-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-03perf threads: Reduce table size from 256 to 8Ian Rogers1-1/+1
The threads data structure is an array of hashmaps, previously rbtrees. The two levels allows for a fixed outer array where access is guarded by rw_semaphores. Commit 91e467bc568f ("perf machine: Use hashtable for machine threads") sized the outer table at 256 entries to avoid future scalability problems, however, this means the threads struct is sized at 30,720 bytes. As the hashmaps allow O(1) access for the common find/insert/remove operations, lower the number of entries to 8. This reduces the size overhead to 960 bytes. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-03perf threads: Switch from rbtree to hashmapIan Rogers2-105/+47
The rbtree provides a sorting on entries but this is unused. Switch to using hashmap for O(1) rather than O(log n) find/insert/remove complexity. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-03perf threads: Move threads to its own filesIan Rogers5-273/+285
Move threads out of machine and into its own file. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-03perf machine: Move machine's threads into its own abstractionIan Rogers5-203/+243
Move thread_rb_node into the machine.c file. This hides the implementation of threads from the rest of the code allowing for it to be refactored. Locking discipline is tightened up in this change. As the lock is now encapsulated in threads, the findnew function requires holding it (as it already did in machine). Rather than do conditionals with locks based on whether the thread should be created (which could potentially be error prone with a read lock match with a write unlock), have a separate threads__find that won't create the thread and only holds the read lock. This effectively duplicates the findnew logic, with the existing findnew logic only operating under a write lock assuming creation is necessary as a previous find failed. The creation may still fail with the write lock due to another thread. The duplication is removed in a later next patch that delegates the implementation to hashtable. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-03perf machine: Move fprintf to for_each loop and a callbackIan Rogers1-16/+27
Avoid exposing the threads data structure by switching to the callback machine__for_each_thread approach. machine__fprintf is only used in tests and verbose >3 output so don't turn to list and sort. Add machine__threads_nr to be refactored later. Note, all existing *_fprintf routines ignore fprintf errors. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-03perf trace: Ignore thread hashing in summaryIan Rogers2-23/+23
Commit 91e467bc568f ("perf machine: Use hashtable for machine threads") made the iteration of thread tids unordered. The perf trace --summary output sorts and prints each hash bucket, rather than all threads globally. Change this behavior by turn all threads into a list, sort the list by number of trace events then by tids, finally print the list. This also allows the rbtree in threads to be not accessed outside of machine. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-03perf report: Sort child tasks by tidIan Rogers3-89/+168
Commit 91e467bc568f ("perf machine: Use hashtable for machine threads") made the iteration of thread tids unordered. The perf report --tasks output now shows child threads in an order determined by the hashing. For example, in this snippet tid 3 appears after tid 256 even though they have the same ppid 2: ``` $ perf report --tasks % pid tid ppid comm 0 0 -1 |swapper 2 2 0 | kthreadd 256 256 2 | kworker/12:1H-k 693761 693761 2 | kworker/10:1-mm 1301762 1301762 2 | kworker/1:1-mm_ 1302530 1302530 2 | kworker/u32:0-k 3 3 2 | rcu_gp ... ``` The output is easier to read if threads appear numerically increasing. To allow for this, read all threads into a list then sort with a comparator that orders by the child task's of the first common parent. The list creation and deletion are created as utilities on machine. The indentation is possible by counting the number of parents a child has. With this change the output for the same data file is now like: ``` $ perf report --tasks % pid tid ppid comm 0 0 -1 |swapper 1 1 0 | systemd 823 823 1 | systemd-journal 853 853 1 | systemd-udevd 3230 3230 1 | systemd-timesyn 3236 3236 1 | auditd 3239 3239 3236 | audisp-syslog 3321 3321 1 | accounts-daemon ... ``` Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Oliver Upton <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-03perf vendor events amd: Fix Zen 4 cache latency eventsSandipan Das2-0/+60
L3PMCx0AC and L3PMCx0AD, used in l3_xi_sampled_latency* events, have a quirk that requires them to be programmed with SliceId set to 0x3. Without this, the events do not count at all and affects dependent metrics such as l3_read_miss_latency. If ThreadMask is not specified, the amd-uncore driver internally sets ThreadMask to 0x3, EnAllCores to 0x1 and EnAllSlices to 0x1 but does not set SliceId. Since SliceId must also be set to 0x3 in this case, specify all the other fields explicitly. E.g. $ sudo perf stat -e l3_xi_sampled_latency.all,l3_xi_sampled_latency_requests.all -a sleep 1 Before: Performance counter stats for 'system wide': 0 l3_xi_sampled_latency.all 0 l3_xi_sampled_latency_requests.all 1.005155399 seconds time elapsed After: Performance counter stats for 'system wide': 921,446 l3_xi_sampled_latency.all 54,210 l3_xi_sampled_latency_requests.all 1.005664472 seconds time elapsed Fixes: 5b2ca349c313 ("perf vendor events amd: Add Zen 4 uncore events") Signed-off-by: Sandipan Das <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-03perf version: Display availability of OpenCSD supportJames Clark1-0/+1
This is useful for scripts that work with Perf and ETM trace. Rather than them trying to parse Perf's error output at runtime to see if it was linked or not. Signed-off-by: James Clark <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-29perf vendor events intel: Add umasks/occ_sel to PCU events.Ian Rogers9-0/+27
UMasks were being dropped leading to all PCU UNC_P_POWER_STATE_OCCUPANCY events having the same encoding. Don't drop the umask trying to be consistent with other sources of events like libpfm4 [1]. Older models need to use occ_sel rather than umask, correct these values too. This applies the change from [2]. [1] https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/lib/events/intel_skx_unc_pcu_events.h#l30 [2] https://github.com/captain5050/perfmon/commit/cbd4aee81023e5bfa09677b1ce170ff69e9c423d Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-29perf map: Fix map reference count issuesIan Rogers2-10/+8
The find will get the map, ensure puts are done on all paths. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-29perf lock contention: Account contending locks tooNamhyung Kim3-7/+136
Currently it accounts the contention using delta between timestamps in lock:contention_begin and lock:contention_end tracepoints. But it means the lock should see the both events during the monitoring period. Actually there are 4 cases that happen with the monitoring: monitoring period / \ | | 1: B------+-----------------------+--------E 2: B----+-------------E | 3: | B-----------+----E 4: | B-------------E | | | t0 t1 where B and E mean contention BEGIN and END, respectively. So it only accounts the case 4 for now. It seems there's no way to handle the case 1. The case 2 might be handled if it saved the timestamp (t0), but it lacks the information from the B notably the flags which shows the lock types. Also it could be a nested lock which it currently ignores. So I think we should ignore the case 2. However we can handle the case 3 if we save the timestamp (t1) at the end of the period. And then it can iterate the map entries in the userspace and update the lock stat accordinly. Signed-off-by: Namhyung Kim <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Reviwed-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2024-02-29perf metrics: Fix segv for metrics with no eventsIan Rogers1-1/+1
A metric may have no events, for example, the transaction metrics on x86 are dependent on there being TSX events. Fix a segv where an evsel of NULL is dereferenced for a metric leader value. Fixes: a59fb796a36b ("perf metrics: Compute unmerged uncore metrics individually") Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-29perf metrics: Fix metric matchingIan Rogers1-12/+10
The metric match function fails for cases like looking for "metric" in the string "all;foo_metric;metric" as the "metric" in "foo_metric" matches but isn't preceeded by a ';'. Fix this by matching the first list item and recursively matching on failure the next item after a semicolon. Signed-off-by: Ian Rogers <[email protected]> Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-26perf pmu: Fix a potential memory leak in perf_pmu__lookup()Christophe JAILLET1-4/+3
The commit in Fixes has reordered some code, but missed an error handling path. 'goto err' now, in order to avoid a memory leak in case of error. Fixes: f63a536f03a2 ("perf pmu: Merge JSON events with sysfs at load time") Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/9538b2b634894c33168dfe9d848d4df31fd4d801.1693085544.git.christophe.jaillet@wanadoo.fr
2024-02-26perf test: Fix spelling mistake "curent" -> "current"Colin Ian King1-1/+1
There is a spelling mistake in a pr_debug message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Adrian Hunter <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-26perf test: Use TEST_FAIL in the TEST_ASSERT macros instead of -1Arnaldo Carvalho de Melo1-8/+8
Just to make things clearer, return TEST_FAIL (-1) instead of an open coded -1. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/ZdepeMsjagbf1ufD@x1
2024-02-26perf data convert: Fix segfault when converting to json when cpu_desc isn't setIlkka Koskinen1-1/+3
Arm64 doesn't have Model in /proc/cpuinfo and, thus, cpu_desc doesn't get assigned. Running $ perf data convert --to-json perf.data.json ends up calling output_json_string() with NULL pointer, which causes a segmentation fault. Signed-off-by: Ilkka Koskinen <[email protected]> Acked-by: Arnaldo Carvalho de Melo <[email protected]> Cc: James Clark <[email protected]> Cc: Evgeny Pistun <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-26perf bpf: Check that the minimal vmlinux.h installed is the latest oneArnaldo Carvalho de Melo1-1/+1
When building BPF skels perf will, by default, install a minimalistic vmlinux.h file with the types needed by the BPF skels in tools/perf/util/bpf_skel/ in its build directory. When 29d16de26df17e94 ("perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h") was added, a type used in the augmented_raw_syscalls BPF skel, 'struct timespec64' was not found when building from a pre-existing build directory, because the vmlinux.h there didn't contain that type, ending up with this error, spotted in linux-next: CLANG /tmp/build/perf-tools-next/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o util/bpf_skel/augmented_raw_syscalls.bpf.c:329:15: error: invalid application of 'sizeof' to an incomplete type 'struct timespec64' 329 | __u32 size = sizeof(struct timespec64); | ^ ~~~~~~~~~~~~~~~~~~~ util/bpf_skel/augmented_raw_syscalls.bpf.c:329:29: note: forward declaration of 'struct timespec64' 329 | __u32 size = sizeof(struct timespec64); | ^ util/bpf_skel/augmented_raw_syscalls.bpf.c:350:15: error: invalid application of 'sizeof' to an incomplete type 'struct timespec64' 350 | __u32 size = sizeof(struct timespec64); | ^ ~~~~~~~~~~~~~~~~~~~ util/bpf_skel/augmented_raw_syscalls.bpf.c:350:29: note: forward declaration of 'struct timespec64' 350 | __u32 size = sizeof(struct timespec64); | ^ 2 errors generated. make[2]: *** [Makefile.perf:1158: /tmp/build/perf-tools-next/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o] Error 1 make[2]: *** Waiting for unfinished jobs.... make[1]: *** [Makefile.perf:261: sub-make] Error 2 make: *** [Makefile:113: install-bin] Error 2 make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf' So add a Makefile dependency (Namhyung's suggestion) to make sure that the new tools/perf/util/bpf_skel/vmlinux/vmlinux.h minimal vmlinux is updated in the build directory, providing the moved 'struct timespec64' type. Fixes: 29d16de26df17e94 ("perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h") Reported-by: Stephen Rothwell <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Suggested-by: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/ZdoPrWg-qYFpBJbz@x1
2024-02-23treewide: remove meaningless assignments in MakefilesMasahiro Yamada8-53/+53
In Makefiles, $(error ), $(warning ), and $(info ) expand to the empty string, as explained in the GNU Make manual [1]: "The result of the expansion of this function is the empty string." Therefore, they are no-op except for logging purposes. $(shell ...) expands to the output of the command. It expands to the empty string when the command does not print anything to stdout. Hence, $(shell mkdir ...) is no-op except for creating the directory. Remove meaningless assignments. [1]: https://www.gnu.org/software/make/manual/make.html#Make-Control-Functions Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Arnaldo Carvalho de Melo <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected]
2024-02-23perf print-events: make is_event_supported() more robustMark Rutland1-8/+19
Currently the perf tool doesn't detect support for extended event types on Apple M1/M2 systems, and will not auto-expand plain PERF_EVENT_TYPE hardware events into per-PMU events. This is due to the detection of extended event types not handling mandatory filters required by the M1/M2 PMU driver. PMU drivers and the core perf_events code can require that perf_event_attr::exclude_* filters are configured in a specific way and may reject certain configurations of filters, for example: (a) Many PMUs lack support for any event filtering, and require all perf_event_attr::exclude_* bits to be clear. This includes Alpha's CPU PMU, and ARM CPU PMUs prior to the introduction of PMUv2 in ARMv7, (b) When /proc/sys/kernel/perf_event_paranoid >= 2, the perf core requires that perf_event_attr::exclude_kernel is set. (c) The Apple M1/M2 PMU requires that perf_event_attr::exclude_guest is set as the hardware PMU does not count while a guest is running (but might be extended in future to do so). In is_event_supported(), we try to account for cases (a) and (b), first attempting to open an event without any filters, and if this fails, retrying with perf_event_attr::exclude_kernel set. We do not account for case (c), or any other filters that drivers could theoretically require to be set. Thus is_event_supported() will fail to detect support for any events targeting an Apple M1/M2 PMU, even where events would be supported with perf_event_attr:::exclude_guest set. Since commit: 82fe2e45cdb00de4 ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type") ... we use is_event_supported() to detect support for extended types, with the PMU ID encoded into the perf_event_attr::type. As above, on an Apple M1/M2 system this will always fail to detect that the event is supported, and consequently we fail to detect support for extended types even when these are supported, as they have been since commit: 5c816728651ae425 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability") Due to this, the perf tool will not automatically expand plain PERF_TYPE_HARDWARE events into per-PMU events, even when all the necessary kernel support is present. This patch updates is_event_supported() to additionally try opening events with perf_event_attr::exclude_guest set, allowing support for events to be detected on Apple M1/M2 systems. I believe that this is sufficient for all contemporary CPU PMU drivers, though in future it may be necessary to check for other combinations of filter bits. I've deliberately changed the check to not expect a specific error code for missing filters, as today ;the kernel may return a number of different error codes for missing filters (e.g. -EACCESS, -EINVAL, or -EOPNOTSUPP) depending on why and where the filter configuration is rejected, and retrying for any error is more robust. Note that this does not remove the need for commit: a24d9d9dc096fc0d ("perf parse-events: Make legacy events lower priority than sysfs/JSON") ... which is still necessary so that named-pmu/event/ events work on kernels without extended type support, even if the event name happens to be the same as a PERF_EVENT_TYPE_HARDWARE event (e.g. as is the case for the M1/M2 PMU's 'cycles' and 'instructions' events). Fixes: 82fe2e45cdb00de4 ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type") Signed-off-by: Mark Rutland <[email protected]> Tested-by: Ian Rogers <[email protected]> Tested-by: James Clark <[email protected]> Tested-by: Marc Zyngier <[email protected]> Cc: Hector Martin <[email protected]> Cc: James Clark <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mike Leach <[email protected]> Cc: Suzuki K Poulose <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-22perf tests: Add option to run tests in parallelIan Rogers1-99/+215
By default tests are forked, add an option (-p or --parallel) so that the forked tests are all started in parallel and then their output gathered serially. This is opt-in as running in parallel can cause test flakes. Rather than fork within the code, the start_command/finish_command from libsubcmd are used. This changes how stderr and stdout are handled. The child stderr and stdout are always read to avoid the child blocking. If verbose is 1 (-v) then if the test fails the child stdout and stderr are displayed. If the verbose is >1 (e.g. -vv) then the stdout and stderr from the child are immediately displayed. An unscientific test on my laptop shows the wall clock time for perf test without parallel being 5 minutes 21 seconds and with parallel (-p) being 1 minute 50 seconds. Signed-off-by: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Kan Liang <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-22perf tests: Run time generate shell test suitesIan Rogers3-149/+80
Rather than special shell test logic, do a single pass to create an array of test suites. Hold the shell test file name in the test suite priv field. This makes the special shell test logic in builtin-test.c redundant so remove it. Signed-off-by: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Kan Liang <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-22perf tests: Use scandirat for shell script findingIan Rogers3-71/+95
Avoid filename appending buffers by using openat, faccessat and scandirat more widely. Turn the script's path back to a file name using readlink from /proc/<pid>/fd/<fd>. Read the script's description using api/io.h to avoid fdopen conversions. Whilst reading perform additional sanity checks on the script's contents. Signed-off-by: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Kan Liang <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: [email protected] Signed-off-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected]