aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2022-07-18perf buildid-list: Add a "-m" option to show kernel and modules build-idsBlake Jones2-0/+20
This new option displays all of the information needed to do external BuildID-based symbolization of kernel stack traces, such as those collected by bpf_get_stackid(). For each kernel module plus the main kernel, it displays the BuildID, the start and end virtual addresses of that module's text range (rounded out to page boundaries), and the pathname of the module. When run as a non-privileged user, the actual addresses of the modules' text ranges are not available, so the tools displays "0, <text length>" for kernel modules and "0, 0xffffffffffffffff" for the kernel itself. Sample output: root# perf buildid-list -m cf6df852fd4da122d616153353cc8f560fd12fe0 ffffffffa5400000 ffffffffa6001e27 [kernel.kallsyms] 1aa7209aa2acb067d66ed6cf7676d65066384d61 ffffffffc0087000 ffffffffc008b000 /lib/modules/5.15.15-1rodete2-amd64/kernel/crypto/sha512_generic.ko 3857815b5bf0183697b68f8fe0ea06121644041e ffffffffc008c000 ffffffffc0098000 /lib/modules/5.15.15-1rodete2-amd64/kernel/arch/x86/crypto/sha512-ssse3.ko 4081fde0bca2bc097cb3e9d1efcb836047d485f1 ffffffffc0099000 ffffffffc009f000 /lib/modules/5.15.15-1rodete2-amd64/kernel/drivers/acpi/button.ko 1ef81ba4890552ea6b0314f9635fc43fc8cef568 ffffffffc00a4000 ffffffffc00aa000 /lib/modules/5.15.15-1rodete2-amd64/kernel/crypto/cryptd.ko cc5c985506cb240d7d082b55ed260cbb851f983e ffffffffc00af000 ffffffffc00b6000 /lib/modules/5.15.15-1rodete2-amd64/kernel/drivers/i2c/busses/i2c-piix4.ko [...] Committer notes: u64 formatter should be PRIx64 for printing as hex numbers, fix this: 28 5.28 debian:experimental-x-mips : FAIL gcc version 11.2.0 (Debian 11.2.0-18) builtin-buildid-list.c: In function 'buildid__map_cb': builtin-buildid-list.c:32:24: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=] 32 | printf("%s %16lx %16lx", bid_buf, map->start, map->end); | ~~~~^ ~~~~~~~~~~ | | | | long unsigned int u64 {aka long long unsigned int} | %16llx builtin-buildid-list.c:32:30: error: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=] 32 | printf("%s %16lx %16lx", bid_buf, map->start, map->end); | ~~~~^ ~~~~~~~~ | | | | long unsigned int u64 {aka long long unsigned int} | %16llx cc1: all warnings being treated as errors Signed-off-by: Blake Jones <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-07-18Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo10-15/+90
To update the perf/core codebase. Fix conflict by moving arch__post_evsel_config(evsel, attr) to the end of evsel__config(), after what was added in: 49c692b7dfc9b6c0 ("perf offcpu: Accept allowed sample types only") Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-07-12perf record: Allow to specify max stack depth of fp callchainNamhyung Kim1-6/+12
Currently it has no interface to specify the max stack depth for perf record. Extend the command line parameter to accept a number after 'fp' to specify the depth like '--call-graph fp,32'. Signed-off-by: Namhyung Kim <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-07-02perf synthetic-events: Ignore dead threads during event synthesisNamhyung Kim1-2/+3
When it synthesize various task events, it scans the list of task first and then accesses later. There's a window threads can die between the two and proc entries may not be available. Instead of bailing out, we can ignore that thread and move on. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-07-02perf synthetic-events: Don't sort the task scan result from /procNamhyung Kim1-2/+2
It should not sort the result as procfs already returns a proper ordering of tasks. Actually sorting the order caused problems that it doesn't guararantee to process the main thread first. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-07-02perf unwind: Fix unitialized 'offset' variable on aarch64Ivan Babrou1-1/+1
Commit dc2cf4ca866f5715 ("perf unwind: Fix segbase for ld.lld linked objects") uncovered the following issue on aarch64: util/unwind-libunwind-local.c: In function 'find_proc_info': util/unwind-libunwind-local.c:386:28: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized] 386 | if (ofs > 0) { | ^ util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here 199 | u64 address, offset; | ^~~~~~ util/unwind-libunwind-local.c:371:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized] 371 | if (ofs <= 0) { | ^ util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here 199 | u64 address, offset; | ^~~~~~ util/unwind-libunwind-local.c:363:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized] 363 | if (ofs <= 0) { | ^ util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here 199 | u64 address, offset; | ^~~~~~ In file included from util/libunwind/arm64.c:37: Fixes: dc2cf4ca866f5715 ("perf unwind: Fix segbase for ld.lld linked objects") Signed-off-by: Ivan Babrou <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Ian Rogers <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-28perf bpf: 8 byte align bpil dataIan Rogers1-3/+2
bpil data is accessed assuming 64-bit alignment resulting in undefined behavior as the data is just byte aligned. With an -fsanitize=undefined build the following errors are observed: $ sudo perf record -a sleep 1 util/bpf-event.c:310:22: runtime error: load of misaligned address 0x55f61084520f for type '__u64', which requires 8 byte alignment 0x55f61084520f: note: pointer points here a8 fe ff ff 3c 51 d3 c0 ff ff ff ff 04 84 d3 c0 ff ff ff ff d8 aa d3 c0 ff ff ff ff a4 c0 d3 c0 ^ util/bpf-event.c:311:20: runtime error: load of misaligned address 0x55f61084522f for type '__u32', which requires 4 byte alignment 0x55f61084522f: note: pointer points here ff ff ff ff c7 17 00 00 f1 02 00 00 1f 04 00 00 58 04 00 00 00 00 00 00 0f 00 00 00 63 02 00 00 ^ util/bpf-event.c:198:33: runtime error: member access within misaligned address 0x55f61084523f for type 'const struct bpf_func_info', which requires 4 byte alignment 0x55f61084523f: note: pointer points here 58 04 00 00 00 00 00 00 0f 00 00 00 63 02 00 00 3b 00 00 00 ab 02 00 00 44 00 00 00 14 03 00 00 Correct this by rouding up the data sizes and aligning the pointers. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Dave Marchevsky <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Fastabend <[email protected]> Cc: KP Singh <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Monnet <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-28perf offcpu: Accept allowed sample types onlyNamhyung Kim3-1/+24
As offcpu-time event is synthesized at the end, it could not get the all the sample info. Define OFFCPU_SAMPLE_TYPES for allowed ones and mask out others in evsel__config() to prevent parse errors. Because perf sample parsing assumes a specific ordering with the sample types, setting unsupported one would make it fail to read data like perf record -d/--data. Fixes: edc41a1099c2d08c ("perf record: Enable off-cpu analysis with BPF") Signed-off-by: Namhyung Kim <[email protected]> Cc: Blake Jones <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Milian Wolff <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-28perf offcpu: Fix build failure on old kernelsNamhyung Kim1-6/+14
Old kernels have a 'struct task_struct' which contains a "state" field and newer kernels have "__state" instead. While the get_task_state() in the BPF code handles that in some way, it assumed the current kernel has the new definition and it caused a build error on old kernels. We should not assume anything and access them carefully. Do not use 'task struct' directly access it instead using new and old definitions in a row. Fixes: edc41a1099c2d08c ("perf record: Enable off-cpu analysis with BPF") Reported-by: Ian Rogers <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Blake Jones <[email protected]> Cc: Hao Luo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Milian Wolff <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-26perf inject: Adjust output data offset for backward compatibilityRaul Silvera2-0/+16
When 'perf inject' creates a new file, it reuses the data offset from the input file. If there has been a change on the size of the header, as happened in v5.12 -> v5.13, the new offsets will be wrong, resulting in a corrupted output file. This change adds the function perf_session__data_offset to compute the data offset based on the current header size, and uses that instead of the offset from the original input file. Signed-off-by: Raul Silvera <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dave Marchevsky <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-26perf build-id: Fix caching files with a wrong build IDAdrian Hunter1-0/+28
Build ID events associate a file name with a build ID. However, when using perf inject, there is no guarantee that the file on the current machine at the current time has that build ID. Fix by comparing the build IDs and skip adding to the cache if they are different. Example: $ echo "int main() {return 0;}" > prog.c $ gcc -o prog prog.c $ perf record --buildid-all ./prog [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data ] $ file-buildid() { file $1 | awk -F= '{print $2}' | awk -F, '{print $1}' ; } $ file-buildid prog 444ad9be165d8058a48ce2ffb4e9f55854a3293e $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf 444ad9be165d8058a48ce2ffb4e9f55854a3293e $ echo "int main() {return 1;}" > prog.c $ gcc -o prog prog.c $ file-buildid prog 885524d5aaa24008a3e2b06caa3ea95d013c0fc5 Before: $ perf buildid-cache --purge $(pwd)/prog $ perf inject -i perf.data -o junk $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf 885524d5aaa24008a3e2b06caa3ea95d013c0fc5 $ After: $ perf buildid-cache --purge $(pwd)/prog $ perf inject -i perf.data -o junk $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf $ Fixes: 454c407ec17a0c63 ("perf: add perf-inject builtin") Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Tom Zanussi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-24perf script ibs: Support new IBS bits in raw trace dumpRavi Bangoria1-6/+58
Interpret Additional set of IBS register bits while doing perf report/script raw dump. IBS op PMU ex: $ sudo ./perf record -c 130 -a -e ibs_op/l3missonly=1/ --raw-samples $ sudo ./perf report -D ... ibs_op_ctl: 0000004500070008 MaxCnt 128 L3MissOnly 1 En 1 Val 1 CntCtl 0=cycles CurCnt 69 ibs_op_data: 0000000000710002 CompToRetCtr 2 TagToRetCtr 113 BrnRet 0 RipInvalid 0 BrnFuse 0 Microcode 0 ibs_op_data2: 0000000000000002 CacheHitSt 0=M-state RmtNode 0 DataSrc 2=A peer cache in a near CCX ibs_op_data3: 000000681d1700a1 LdOp 1 StOp 0 DcL1TlbMiss 0 DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 1 DcL2TlbHit2M 0 DcMiss 1 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0 DcMissNoMabAlloc 1 DcLinAddrValid 1 DcPhyAddrValid 1 DcL2TlbHit1G 0 L2Miss 1 SwPf 0 OpMemWidth 8 bytes OpDcMissOpenMemReqs 7 DcMissLat 104 TlbRefillLat 0 IBS Fetch PMU ex: $ sudo ./perf record -c 130 -a -e ibs_fetch/l3missonly=1/ --raw-samples $ sudo ./perf report -D ... ibs_fetch_ctl: 3c1f00c700080008 MaxCnt 128 Cnt 128 Lat 199 En 1 Val 1 Comp 1 IcMiss 1 PhyAddrValid 1 L1TlbPgSz 4KB L1TlbMiss 0 L2TlbMiss 0 RandEn 0 L2Miss 1 L3MissOnly 1 FetchOcMiss 1 FetchL3Miss 1 With the DataSrc extensions, the source of data can be decoded among: - Local L3 or other L1/L2 in CCX. - A peer cache in a near CCX. - Data returned from DRAM. - A peer cache in a far CCX. - DRAM address map with "long latency" bit set. - Data returned from MMIO/Config/PCI/APIC. - Extension Memory (S-Link, GenZ, etc - identified by the CS target and/or address map at DF's choice). - Peer Agent Memory. Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ananth Narayan <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Santosh Shukla <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-24perf tool ibs: Sync AMD IBS header fileRavi Bangoria1-2/+2
IBS support has been enhanced with two new features in upcoming uarch: 1. DataSrc extension 2. L3 miss filtering. Additional set of bits has been introduced in IBS registers to exploit these features. New bits are already defining in arch/x86/ header. Sync it with tools header file. Also rename existing ibs_op_data field 'data_src' to 'data_src_lo'. Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ananth Narayan <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Santosh Shukla <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-24perf header: Record non-CPU PMU capabilitiesRavi Bangoria4-83/+151
PMUs advertise their capabilities via sysfs attribute files but the perf tool currently parses only core(CPU) or hybrid core PMU capabilities. Add support of recording non-core PMU capabilities int perf.data header. Note that a newly proposed HEADER_PMU_CAPS is replacing existing HEADER_HYBRID_CPU_PMU_CAPS. Special care is taken for hybrid core PMUs by writing their capabilities first in the perf.data header to make sure new perf.data file being read by old perf tool does not break. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ananth Narayan <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Santosh Shukla <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-24perf header: Store PMU caps in an array of stringsRavi Bangoria3-32/+41
Currently all capabilities are stored in a single string separated by NULL character. Instead, store them in an array which makes searching of capability easier. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ananth Narayan <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Santosh Shukla <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-24perf header: Pass "cpu" pmu name while printing capsRavi Bangoria1-9/+3
Avoid unnecessary conditional code to check if pmu name is NULL or not by passing "cpu" pmu name to the printing function. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ananth Narayan <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Santosh Shukla <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-24perf pmu: Parse pmu caps sysfs only onceRavi Bangoria2-4/+13
In addition to returning nr_caps, cache it locally in struct perf_pmu. Similarly, cache status of whether caps sysfs has already been parsed or not. These will help to avoid parsing sysfs every time the function gets called. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ananth Narayan <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Santosh Shukla <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-24perf record ibs: Warn about sampling period skewRavi Bangoria2-0/+8
Samples without an L3 miss are discarded and counter is reset with random value (between 1-15 for fetch PMU and 1-127 for op PMU) when IBS L3 miss filtering is enabled. This causes a sampling period skew but there is no way to reconstruct aggregated sampling period. So print a warning at perf record if user sets l3missonly=1. Ex: # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/ WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled and tagged operation does not cause L3 Miss. This causes sampling period skew. Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Ananth Narayan <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Robert Richter <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Santosh Shukla <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-23perf script: Add some missing event dumpsAdrian Hunter1-0/+3
When the -D option is used, the details of thread-map, cpu-map and event-update events are not currently dumped. Add prints so that they are. Example: # perf record --kcore sleep 0.1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.021 MB perf.data (7 samples) ] # perf script -D | grep 'THREAD\|CPU' 0 0x4950 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 35116 0 0x4978 [0x20]: PERF_RECORD_CPU_MAP: 0-7 # perf script -D | grep -A4 'UPDATE' 0 0x4920 [0x30]: PERF_RECORD_EVENT_UPDATE ... id: 147 ... 0-7 Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-23perf record: Add finished init eventAdrian Hunter3-1/+7
In preparation for recording sideband events in a virtual machine guest so that they can be injected into a host perf.data file. This is needed to enable injecting events after the initial synthesized user events (that have an all zero id sample) but before regular events. Committer notes: Add entry about PERF_RECORD_FINISHED_INIT to tools/perf/Documentation/perf.data-file-format.txt. Committer testing: Before: # perf report -D | grep FINISHED 0 0x5910 [0x8]: PERF_RECORD_FINISHED_ROUND FINISHED_ROUND events: 1 ( 0.5%) # After: # perf record -- sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ] # perf report -D | grep FINISHED 0 0x5068 [0x8]: PERF_RECORD_FINISHED_INIT: unhandled! 0 0x5390 [0x8]: PERF_RECORD_FINISHED_ROUND FINISHED_ROUND events: 1 ( 0.5%) FINISHED_INIT events: 1 ( 0.5%) # Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-23perf record: Add new option to sample identifierAdrian Hunter2-1/+2
In preparation for recording sideband events in a virtual machine guest so that they can be injected into a host perf.data file. Add an option to always include sample type PERF_SAMPLE_IDENTIFIER. Committer testing: # perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ] # perf evlist -v cycles: size: 128, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # # # perf record --sample-identifier sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.022 MB perf.data (7 samples) ] # perf evlist -v cycles: size: 128, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-23perf record: Always record id indexAdrian Hunter1-2/+5
In preparation for recording sideband events in a virtual machine guest so that they can be injected into a host perf.data file. Adjust the logic so that if there are IDs then the id index is recorded. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-23perf data convert: Prefer sampled CPU when exporting JSONShawn M. Chapla1-1/+4
When CPU has been explicitly sampled (via --sample-cpu), prefer this sampled value over the thread CPU value when exporting to JSON. Signed-off-by: Shawn M. Chapla <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-19perf metrics: Ensure at least 1 id per metricIan Rogers1-0/+9
We may have no events for a metric evaluated to a constant. In such a case ensure a tool event is at least evaluated for metric parsing and displaying. Fixes: 8586d2744ff3065e ("perf metrics: Don't add all tool events for sharing") Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-19perf arm-spe: Don't set data source if it's not a memory operationLeo Yan1-14/+8
Except for memory load and store operations, ARM SPE records also can support other operation types, bug when set the data source field the current code assumes a record is a either load operation or store operation, this leads to wrongly synthesize memory samples. This patch strictly checks the record operation type, it only sets data source only for the operation types ARM_SPE_LD and ARM_SPE_ST, otherwise, returns zero for data source. Therefore, we can synthesize memory samples only when data source is a non-zero value, the function arm_spe__is_memory_event() is useless and removed. Fixes: e55ed3423c1bb29f ("perf arm-spe: Synthesize memory event") Reviewed-by: Ali Saidi <[email protected]> Reviewed-by: German Gomez <[email protected]> Signed-off-by: Leo Yan <[email protected]> Tested-by: Ali Saidi <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: [email protected] Cc: Andrew Kilroy <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Leo Yan <[email protected]> Cc: Li Huafei <[email protected]> Cc: [email protected] Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nick Forrington <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-19perf expr: Allow exponents on floating point valuesIan Rogers1-1/+1
Pass the optional exponent component through to strtod that already supports it. We already have exponents in ScaleUnit and so this adds uniformity. Reported-by: Zhengjun Xing <[email protected]> Reviewed-By: Kajol Jain <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-19perf unwind: Fix uninitialized variableIan Rogers1-1/+1
The 'ret' variable may be uninitialized on error goto paths. Fixes: dc2cf4ca866f5715 ("perf unwind: Fix segbase for ld.lld linked objects") Reported-by: Sedat Dilek <[email protected]> Reviewed-by: Fangrui Song <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Sedat Dilek <[email protected]> # LLVM-14 (x86-64) Cc: Fangrui Song <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Ullrich <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-06-03perf unwind: Fix segbase for ld.lld linked objectsFangrui Song2-30/+77
segbase is the address of .eh_frame_hdr and table_data is segbase plus the header size. find_proc_info computes segbase as `map->start + segbase - map->pgoff` which is wrong when * .eh_frame_hdr and .text are in different PT_LOAD program headers * and their p_vaddr difference does not equal their p_offset difference Since 10.0, ld.lld's default --rosegment -z noseparate-code layout has such R and RX PT_LOAD program headers. ld.lld (default) => perf report fails to unwind `perf record --call-graph dwarf` recorded data ld.lld --no-rosegment => ok (trivial, no R PT_LOAD) ld.lld -z separate-code => ok but by luck: there are two PT_LOAD but their p_vaddr difference equals p_offset difference ld.bfd -z noseparate-code => ok (trivial, no R PT_LOAD) ld.bfd -z separate-code (default for Linux/x86) => ok but by luck: there are two PT_LOAD but their p_vaddr difference equals p_offset difference To fix the issue, compute segbase as dso's base address plus PT_GNU_EH_FRAME's p_vaddr. The base address is computed by iterating over all dso-associated maps and then subtract the first PT_LOAD p_vaddr (the minimum guaranteed by generic ABI) from the minimum address. In libunwind, find_proc_info transitively called by unw_step is cached, so the iteration overhead is acceptable. Reported-by: Sebastian Ullrich <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Signed-off-by: Fangrui Song <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: https://github.com/ClangBuiltLinux/linux/issues/1646 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-27perf scripting python: Expose dso and map informationLeo Yan1-4/+17
This change adds dso build_id and corresponding map's start and end address. The info of dso build_id can be used to find dso file path, and we can validate if a branch address falls into the range of map's start and end addresses. In addition, the map's start address can be used as an offset for disassembly. Signed-off-by: Leo Yan <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Al Grant <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Eelco Chaudron <[email protected]> Cc: German Gomez <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Tanmay Jagdale <[email protected]> Cc: [email protected] Cc: zengshun . wu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-27perf tools arm64: Add support for VG registerJames Clark1-0/+2
Add the name of the VG register so it can be used in --user-regs The event will fail to open if the register is requested but not available so only add it to the mask if the kernel supports sve and also if it supports that specific register. Committer notes: Add conditional definition of HWCAP_SVE, as suggested by Leo Yan, to build on older systems where this is not available in the system headers. Reviewed-by: Leo Yan <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Brown <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf unwind: Use dynamic register set for DWARF unwindJames Clark2-1/+2
Architectures can detect availability of extra registers at runtime so use this more complete set for unwinding. This will include the VG register on arm64 in a later commit. If the function isn't implemented then PERF_REGS_MASK is returned and there is no change. Committer notes: Added util/perf_regs.c to tools/perf/util/python-ext-sources so that 'perf test python' passes, i.e. the perf python binding has all the symbols it needs, addressing: $ perf test -v python 19: 'import perf' in python : --- start --- test child forked, pid 2037817 python usage test: "echo "import sys ; sys.path.append('/tmp/build/perf/python'); import perf" | '/usr/bin/python3' " Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: /tmp/build/perf/python/perf.cpython-310-x86_64-linux-gnu.so: undefined symbol: arch__user_reg_mask test child finished with -1 ---- end ---- 'import perf' in python: FAILED! $ Reviewed-by: Leo Yan <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Brown <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf unwind arm64: Use perf's copy of kernel headersJames Clark1-1/+1
Fix this include path to use perf's copy of the kernel header rather than the one from the root of the repo. This fixes build errors when only applying the perf tools part of a patchset rather than both sides. Reported-by: German Gomez <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: German Gomez <[email protected]> Cc: <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf record: Add cgroup support for off-cpu profilingNamhyung Kim3-4/+84
This covers two different use cases. The first one is cgroup filtering given by -G/--cgroup option which controls the off-cpu profiling for tasks in the given cgroups only. The other use case is cgroup sampling which is enabled by --all-cgroups option and it adds PERF_SAMPLE_CGROUP to the sample_type to set the cgroup id of the task in the sample data. Example output. $ sudo perf record -a --off-cpu --all-cgroups sleep 1 $ sudo perf report --stdio -s comm,cgroup --call-graph=no ... # Samples: 144 of event 'offcpu-time' # Event count (approx.): 48452045427 # # Children Self Command Cgroup # ........ ........ ............... .......................................... # 61.57% 5.60% Chrome_ChildIOT /user.slice/user-657345.slice/[email protected]/app.slice/... 29.51% 7.38% Web Content /user.slice/user-657345.slice/[email protected]/app.slice/... 17.48% 1.59% Chrome_IOThread /user.slice/user-657345.slice/[email protected]/app.slice/... 16.48% 4.12% pipewire-pulse /user.slice/user-657345.slice/[email protected]/session.slice/... 14.48% 2.07% perf /user.slice/user-657345.slice/[email protected]/app.slice/... 14.30% 7.15% CompositorTileW /user.slice/user-657345.slice/[email protected]/app.slice/... 13.33% 6.67% Timer /user.slice/user-657345.slice/[email protected]/app.slice/... ... Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Blake Jones <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Milian Wolff <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf record: Handle argument change in sched_switchNamhyung Kim2-11/+52
Recently sched_switch tracepoint added a new argument for prev_state, but it's hard to handle the change in a BPF program. Instead, we can check the function prototype in BTF before loading the program. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Blake Jones <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Milian Wolff <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf record: Implement basic filtering for off-cpuNamhyung Kim3-14/+122
It should honor cpu and task filtering with -a, -C or -p, -t options. Committer testing: # perf record --off-cpu --cpu 1 perf bench sched messaging -l 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 1.722 [sec] [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 1.446 MB perf.data (7248 samples) ] # # perf script | head -20 perf 97164 [001] 38287.696761: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97164 [001] 38287.696764: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97164 [001] 38287.696765: 9 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97164 [001] 38287.696767: 212 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97164 [001] 38287.696768: 5130 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97164 [001] 38287.696770: 123063 cycles: ffffffffb6e0011e syscall_return_via_sysret+0x38 (vmlinux) perf 97164 [001] 38287.696803: 2292748 cycles: ffffffffb636c82d __fput+0xad (vmlinux) swapper 0 [001] 38287.702852: 1927474 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) :97513 97513 [001] 38287.767207: 1172536 cycles: ffffffffb612ff65 newidle_balance+0x5 (vmlinux) swapper 0 [001] 38287.769567: 1073081 cycles: ffffffffb618216d ktime_get_mono_fast_ns+0xd (vmlinux) :97533 97533 [001] 38287.770962: 984460 cycles: ffffffffb65b2900 selinux_socket_sendmsg+0x0 (vmlinux) :97540 97540 [001] 38287.772242: 883462 cycles: ffffffffb6d0bf59 irqentry_exit_to_user_mode+0x9 (vmlinux) swapper 0 [001] 38287.773633: 741963 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) :97552 97552 [001] 38287.774539: 606680 cycles: ffffffffb62eda0a page_add_file_rmap+0x7a (vmlinux) :97556 97556 [001] 38287.775333: 502254 cycles: ffffffffb634f964 get_obj_cgroup_from_current+0xc4 (vmlinux) :97561 97561 [001] 38287.776163: 427891 cycles: ffffffffb61b1522 cgroup_rstat_updated+0x22 (vmlinux) swapper 0 [001] 38287.776854: 359030 cycles: ffffffffb612fc5e load_balance+0x9ce (vmlinux) :97567 97567 [001] 38287.777312: 330371 cycles: ffffffffb6a8d8d0 skb_set_owner_w+0x0 (vmlinux) :97566 97566 [001] 38287.777589: 311622 cycles: ffffffffb614a7a8 native_queued_spin_lock_slowpath+0x148 (vmlinux) :97512 97512 [001] 38287.777671: 307851 cycles: ffffffffb62e0f35 find_vma+0x55 (vmlinux) # # perf record --off-cpu --cpu 4 perf bench sched messaging -l 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 1.613 [sec] [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 1.415 MB perf.data (6729 samples) ] # perf script | head -20 perf 97650 [004] 38323.728036: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97650 [004] 38323.728040: 1 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97650 [004] 38323.728041: 9 cycles: ffffffffb6070174 native_write_msr+0x4 (vmlinux) perf 97650 [004] 38323.728042: 208 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97650 [004] 38323.728044: 5026 cycles: ffffffffb6070176 native_write_msr+0x6 (vmlinux) perf 97650 [004] 38323.728046: 119970 cycles: ffffffffb6d0bebc syscall_exit_to_user_mode+0x1c (vmlinux) perf 97650 [004] 38323.728078: 2190103 cycles: 54b756 perf_tool__process_synth_event+0x16 (/home/acme/bin/perf) swapper 0 [004] 38323.783357: 1593139 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.785352: 1593139 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.797330: 1418936 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.802350: 1418936 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) swapper 0 [004] 38323.806333: 1418936 cycles: ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux) :97996 97996 [004] 38323.807145: 1418936 cycles: 7f5db9be6917 [unknown] ([unknown]) :97959 97959 [004] 38323.807730: 1445074 cycles: ffffffffb6329d36 memcg_slab_post_alloc_hook+0x146 (vmlinux) :97959 97959 [004] 38323.808103: 1341584 cycles: ffffffffb62fd90f get_page_from_freelist+0x112f (vmlinux) :97959 97959 [004] 38323.808451: 1227537 cycles: ffffffffb65b2905 selinux_socket_sendmsg+0x5 (vmlinux) :97959 97959 [004] 38323.808768: 1184321 cycles: ffffffffb6d1ba35 _raw_spin_lock_irqsave+0x15 (vmlinux) :97959 97959 [004] 38323.809073: 1153017 cycles: ffffffffb6a8d92d skb_set_owner_w+0x5d (vmlinux) :97959 97959 [004] 38323.809402: 1126875 cycles: ffffffffb6329c64 memcg_slab_post_alloc_hook+0x74 (vmlinux) :97959 97959 [004] 38323.809695: 1073248 cycles: ffffffffb6e0001d entry_SYSCALL_64+0x1d (vmlinux) # Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Blake Jones <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Milian Wolff <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf record: Enable off-cpu analysis with BPFNamhyung Kim4-0/+368
Add --off-cpu option to enable the off-cpu profiling with BPF. It'd use a bpf_output event and rename it to "offcpu-time". Samples will be synthesized at the end of the record session using data from a BPF map which contains the aggregated off-cpu time at context switches. So it needs root privilege to get the off-cpu profiling. Each sample will have a separate user stacktrace so it will skip kernel threads. The sample ip will be set from the stacktrace and other sample data will be updated accordingly. Currently it only handles some basic sample types. The sample timestamp is set to a dummy value just not to bother with other events during the sorting. So it has a very big initial value and increase it on processing each samples. Good thing is that it can be used together with regular profiling like cpu cycles. If you don't want to that, you can use a dummy event to enable off-cpu profiling only. Example output: $ sudo perf record --off-cpu perf bench sched messaging -l 1000 $ sudo perf report --stdio --call-graph=no # Total Lost Samples: 0 # # Samples: 41K of event 'cycles' # Event count (approx.): 42137343851 ... # Samples: 1K of event 'offcpu-time' # Event count (approx.): 587990831640 # # Children Self Command Shared Object Symbol # ........ ........ ............... .................. ......................... # 81.66% 0.00% sched-messaging libc-2.33.so [.] __libc_start_main 81.66% 0.00% sched-messaging perf [.] cmd_bench 81.66% 0.00% sched-messaging perf [.] main 81.66% 0.00% sched-messaging perf [.] run_builtin 81.43% 0.00% sched-messaging perf [.] bench_sched_messaging 40.86% 40.86% sched-messaging libpthread-2.33.so [.] __read 37.66% 37.66% sched-messaging libpthread-2.33.so [.] __write 2.91% 2.91% sched-messaging libc-2.33.so [.] __poll ... As you can see it spent most of off-cpu time in read and write in bench_sched_messaging(). The --call-graph=no was added just to make the output concise here. It uses perf hooks facility to control BPF program during the record session rather than adding new BPF/off-cpu specific calls. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Blake Jones <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Milian Wolff <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf report: Do not extend sample type of bpf-output eventNamhyung Kim1-2/+2
Currently evsel__new_idx() sets more sample_type bits when it finds a BPF-output event. But it should honor what's recorded in the perf data file rather than blindly sets the bits. Otherwise it could lead to a parse error when it recorded with a modified sample_type. Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Blake Jones <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Milian Wolff <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf stat: Add requires_cpu flag for uncoreAdrian Hunter2-1/+2
Uncore events require a CPU i.e. it cannot be -1. The evsel system_wide flag is intended for events that should be on every CPU, which does not make sense for uncore events because uncore events do not map one-to-one with CPUs. These 2 requirements are not exactly the same, so introduce a new flag 'requires_cpu' for the uncore case. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf tools: Allow all_cpus to be a superset of user_requested_cpusAdrian Hunter1-1/+1
To support collection of system-wide events with user requested CPUs, all_cpus must be a superset of user_requested_cpus. In order to support all_cpus to be a superset of user_requested_cpus, all_cpus must be used instead of user_requested_cpus when dealing with CPUs of all events instead of CPUs of requested events. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf evlist: Add evlist__add_dummy_on_all_cpus()Adrian Hunter2-0/+50
Add evlist__add_dummy_on_all_cpus() to enable creating a system-wide dummy event that sets up the system-wide maps before map propagation. For convenience, add evlist__add_aux_dummy() so that the logic can be used whether or not the event needs to be system-wide. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf evlist: Factor out evlist__dummy_event()Adrian Hunter1-2/+8
Factor out evlist__dummy_event() so it can be reused. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameterAdrian Hunter4-10/+7
Remove auxtrace_mmap_params__set_idx() per_cpu parameter because it isn't needed. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf auxtrace: Add mmap_needed to auxtrace_mmap_paramsAdrian Hunter4-6/+21
Add mmap_needed to auxtrace_mmap_params. Currently an auxtrace mmap is always attempted even if the event is not an auxtrace event. That works because, when AUX area tracing, there is always an auxtrace event first for every mmap. Prepare for that not being the case, which it won't be when sideband tracking events are allowed on all CPUs even when auxtrace is limited to selected CPUs. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf build: Stop using __weak bpf_map_create() to handle older libbpf versionsArnaldo Carvalho de Melo1-1/+5
By adding a feature test for bpf_map_create() and providing a fallback if it isn't present in older versions of libbpf. This also fixes the build with torvalds/master at this point: $ git log --oneline -5 torvalds/master babf0bb978e3c9fc (torvalds/master) Merge tag 'xfs-5.19-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux e375780b631a5fc2 Merge tag 'fsnotify_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 8b728edc5be16179 Merge tag 'fs_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 3f306ea2e18568f6 Merge tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping fbe86daca0ba878b Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi $ Coping with: $ git log --oneline -2 d16495a982324f75 d16495a982324f75 libbpf: remove bpf_create_map*() APIs e2371b1632b1c61c libbpf: start 1.0 development cycle $ As the __weak function fails to build as it calls the now removed bpf_create_map() API. Testing: $ rpm -q libbpf-devel libbpf-devel-0.4.0-2.fc35.x86_64 $ $ make -C tools/perf BUILD_BPF_SKEL=1 LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin $ cat /tmp/build/perf/feature/test-libbpf-bpf_map_create.make.output test-libbpf-bpf_map_create.c: In function ‘main’: test-libbpf-bpf_map_create.c:6:16: error: implicit declaration of function ‘bpf_map_create’; did you mean ‘bpf_map_freeze’? [-Werror=implicit-function-declaration] 6 | return bpf_map_create(0 /* map_type */, NULL /* map_name */, 0, /* key_size */, | ^~~~~~~~~~~~~~ | bpf_map_freeze test-libbpf-bpf_map_create.c:6:87: error: expected expression before ‘,’ token 6 | return bpf_map_create(0 /* map_type */, NULL /* map_name */, 0, /* key_size */, | ^ cc1: all warnings being treated as errors $ $ objdump -dS /tmp/build/perf/perf | grep '<bpf_map_create>:' -A20 000000000058b290 <bpf_map_create>: { 58b290: 55 push %rbp 58b291: 48 89 e5 mov %rsp,%rbp 58b294: 48 83 ec 10 sub $0x10,%rsp 58b298: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax 58b29f: 00 00 58b2a1: 48 89 45 f8 mov %rax,-0x8(%rbp) 58b2a5: 31 c0 xor %eax,%eax return bpf_create_map(map_type, key_size, value_size, max_entries, 0); 58b2a7: 48 8b 45 f8 mov -0x8(%rbp),%rax 58b2ab: 64 48 2b 04 25 28 00 sub %fs:0x28,%rax 58b2b2: 00 00 58b2b4: 75 10 jne 58b2c6 <bpf_map_create+0x36> } 58b2b6: c9 leave 58b2b7: 89 d6 mov %edx,%esi 58b2b9: 89 ca mov %ecx,%edx 58b2bb: 44 89 c1 mov %r8d,%ecx return bpf_create_map(map_type, key_size, value_size, max_entries, 0); 58b2be: 45 31 c0 xor %r8d,%r8d $ Cc: Andrii Nakryiko <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Song Liu <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: http://lore.kernel.org/linux-perf-users/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf build: Stop using __weak btf__raw_data() to handle older libbpf versionsJiri Olsa1-1/+3
By adding a feature test for btf__raw_data() and providing a fallback if it isn't present in older versions of libbpf. Committer testing: $ rpm -q libbpf-devel libbpf-devel-0.4.0-2.fc35.x86_64 $ make -C tools/perf LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin $ cat /tmp/build/perf/feature/test-libbpf-btf__raw_data.make.output test-libbpf-btf__raw_data.c: In function ‘main’: test-libbpf-btf__raw_data.c:6:9: error: implicit declaration of function ‘btf__raw_data’; did you mean ‘btf__get_raw_data’? [-Werror=implicit-function-declaration] 6 | btf__raw_data(NULL /* btf_ro */, NULL /* size */); | ^~~~~~~~~~~~~ | btf__get_raw_data cc1: all warnings being treated as errors $ objdump -dS /tmp/build/perf/perf | grep '<btf__raw_data>:' -A20 00000000005b3050 <btf__raw_data>: { 5b3050: 55 push %rbp 5b3051: 48 89 e5 mov %rsp,%rbp 5b3054: 48 83 ec 10 sub $0x10,%rsp 5b3058: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax 5b305f: 00 00 5b3061: 48 89 45 f8 mov %rax,-0x8(%rbp) 5b3065: 31 c0 xor %eax,%eax return btf__get_raw_data(btf_ro, size); 5b3067: 48 8b 45 f8 mov -0x8(%rbp),%rax 5b306b: 64 48 2b 04 25 28 00 sub %fs:0x28,%rax 5b3072: 00 00 5b3074: 75 06 jne 5b307c <btf__raw_data+0x2c> } 5b3076: c9 leave return btf__get_raw_data(btf_ro, size); 5b3077: e9 14 99 e5 ff jmp 40c990 <btf__get_raw_data@plt> 5b307c: e8 af a7 e5 ff call 40d830 <__stack_chk_fail@plt> 5b3081: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 5b3088: 00 00 00 00 $ Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: http://lore.kernel.org/linux-perf-users/YozLKby7ITEtchC9@krava Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf build: Stop using __weak bpf_object__next_map() to handle older libbpf ↵Jiri Olsa1-1/+3
versions By adding a feature test for bpf_object__next_map() and providing a fallback if it isn't present in older versions of libbpf. Committer testing: $ rpm -q libbpf-devel libbpf-devel-0.4.0-2.fc35.x86_64 $ make -C tools/perf LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin $ cat /tmp/build/perf/feature/test-libbpf-bpf_object__next_map.make.output test-libbpf-bpf_object__next_map.c: In function ‘main’: test-libbpf-bpf_object__next_map.c:6:9: error: implicit declaration of function ‘bpf_object__next_map’; did you mean ‘bpf_object__next’? [-Werror=implicit-function-declaration] 6 | bpf_object__next_map(NULL /* obj */, NULL /* prev */); | ^~~~~~~~~~~~~~~~~~~~ | bpf_object__next cc1: all warnings being treated as errors $ $ objdump -dS /tmp/build/perf/perf | grep '<bpf_object__next_map>:' -A20 00000000005b2e00 <bpf_object__next_map>: { 5b2e00: 55 push %rbp 5b2e01: 48 89 e5 mov %rsp,%rbp 5b2e04: 48 83 ec 10 sub $0x10,%rsp 5b2e08: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax 5b2e0f: 00 00 5b2e11: 48 89 45 f8 mov %rax,-0x8(%rbp) 5b2e15: 31 c0 xor %eax,%eax return bpf_map__next(prev, obj); 5b2e17: 48 8b 45 f8 mov -0x8(%rbp),%rax 5b2e1b: 64 48 2b 04 25 28 00 sub %fs:0x28,%rax 5b2e22: 00 00 5b2e24: 75 0f jne 5b2e35 <bpf_object__next_map+0x35> } 5b2e26: c9 leave 5b2e27: 49 89 f8 mov %rdi,%r8 5b2e2a: 48 89 f7 mov %rsi,%rdi return bpf_map__next(prev, obj); 5b2e2d: 4c 89 c6 mov %r8,%rsi 5b2e30: e9 cb b1 e5 ff jmp 40e000 <bpf_map__next@plt> $ Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: http://lore.kernel.org/linux-perf-users/YozLKby7ITEtchC9@krava Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf build: Stop using __weak bpf_object__next_program() to handle older ↵Jiri Olsa1-1/+3
libbpf versions By adding a feature test for bpf_object__next_program() and providing a fallback if it isn't present in older versions of libbpf. Committer testing: $ rpm -q libbpf-devel libbpf-devel-0.4.0-2.fc35.x86_64 $ make -C tools/perf LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin $ cat /tmp/build/perf/feature/test-libbpf-bpf_object__next_program.make.output test-libbpf-bpf_object__next_program.c: In function ‘main’: test-libbpf-bpf_object__next_program.c:6:9: error: implicit declaration of function ‘bpf_object__next_program’; did you mean ‘bpf_object__unpin_programs’? [-Werror=implicit-function-declaration] 6 | bpf_object__next_program(NULL /* obj */, NULL /* prev */); | ^~~~~~~~~~~~~~~~~~~~~~~~ | bpf_object__unpin_programs cc1: all warnings being treated as errors $ $ objdump -dS /tmp/build/perf/perf | grep '<bpf_object__next_program>:' -A20 00000000005b2dc0 <bpf_object__next_program>: { 5b2dc0: 55 push %rbp 5b2dc1: 48 89 e5 mov %rsp,%rbp 5b2dc4: 48 83 ec 10 sub $0x10,%rsp 5b2dc8: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax 5b2dcf: 00 00 5b2dd1: 48 89 45 f8 mov %rax,-0x8(%rbp) 5b2dd5: 31 c0 xor %eax,%eax return bpf_program__next(prev, obj); 5b2dd7: 48 8b 45 f8 mov -0x8(%rbp),%rax 5b2ddb: 64 48 2b 04 25 28 00 sub %fs:0x28,%rax 5b2de2: 00 00 5b2de4: 75 0f jne 5b2df5 <bpf_object__next_program+0x35> } 5b2de6: c9 leave 5b2de7: 49 89 f8 mov %rdi,%r8 5b2dea: 48 89 f7 mov %rsi,%rdi return bpf_program__next(prev, obj); 5b2ded: 4c 89 c6 mov %r8,%rsi 5b2df0: e9 3b b4 e5 ff jmp 40e230 <bpf_program__next@plt> $ Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: http://lore.kernel.org/linux-perf-users/YozLKby7ITEtchC9@krava Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-26perf build: Stop using __weak bpf_prog_load() to handle older libbpf versionsJiri Olsa1-5/+7
By adding a feature test for bpf_prog_load() and providing a fallback if it isn't present in older versions of libbpf. Committer testing: $ rpm -q libbpf-devel libbpf-devel-0.4.0-2.fc35.x86_64 $ make -C tools/perf LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin $ cat /tmp/build/perf/feature/test-libbpf-bpf_prog_load.make.output test-libbpf-bpf_prog_load.c: In function ‘main’: test-libbpf-bpf_prog_load.c:6:16: error: implicit declaration of function ‘bpf_prog_load’ [-Werror=implicit-function-declaration] 6 | return bpf_prog_load(0 /* prog_type */, NULL /* prog_name */, | ^~~~~~~~~~~~~ cc1: all warnings being treated as errors $ $ objdump -dS /tmp/build/perf/perf | grep '<bpf_prog_load>:' -A20 00000000005b2d70 <bpf_prog_load>: { 5b2d70: 55 push %rbp 5b2d71: 48 89 ce mov %rcx,%rsi 5b2d74: 4c 89 c8 mov %r9,%rax 5b2d77: 49 89 d2 mov %rdx,%r10 5b2d7a: 4c 89 c2 mov %r8,%rdx 5b2d7d: 48 89 e5 mov %rsp,%rbp 5b2d80: 48 83 ec 18 sub $0x18,%rsp 5b2d84: 64 48 8b 0c 25 28 00 mov %fs:0x28,%rcx 5b2d8b: 00 00 5b2d8d: 48 89 4d f8 mov %rcx,-0x8(%rbp) 5b2d91: 31 c9 xor %ecx,%ecx return bpf_load_program(prog_type, insns, insn_cnt, license, 5b2d93: 41 8b 49 5c mov 0x5c(%r9),%ecx 5b2d97: 51 push %rcx 5b2d98: 4d 8b 49 60 mov 0x60(%r9),%r9 5b2d9c: 4c 89 d1 mov %r10,%rcx 5b2d9f: 44 8b 40 1c mov 0x1c(%rax),%r8d 5b2da3: e8 f8 aa e5 ff call 40d8a0 <bpf_load_program@plt> } $ Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: http://lore.kernel.org/linux-perf-users/YozLKby7ITEtchC9@krava Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-23perf intel-pt: Add guest_code supportAdrian Hunter1-2/+18
A common case for KVM test programs is that the test program acts as the hypervisor, creating, running and destroying the virtual machine, and providing the guest object code from its own object code. In this case, the VM is not running an OS, but only the functions loaded into it by the hypervisor test program, and conveniently, loaded at the same virtual addresses. To support that, a new option "--guest-code" has been added in previous patches. In this patch, add support also to Intel PT. In particular, ensure guest_code thread is set up before attempting to walk object code or synthesize samples. Example: # perf record --kcore -e intel_pt/cyc/ -- tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.280 MB perf.data ] # perf script --guest-code --itrace=bep --ns -F-period,+addr,+flags [SNIP] tsc_msrs_test 18436 [007] 10897.962087733: branches: call ffffffffc13b2ff5 __vmx_vcpu_run+0x15 (vmlinux) => ffffffffc13b2f50 vmx_update_host_rsp+0x0 (vmlinux) tsc_msrs_test 18436 [007] 10897.962087733: branches: return ffffffffc13b2f5d vmx_update_host_rsp+0xd (vmlinux) => ffffffffc13b2ffa __vmx_vcpu_run+0x1a (vmlinux) tsc_msrs_test 18436 [007] 10897.962087733: branches: call ffffffffc13b303b __vmx_vcpu_run+0x5b (vmlinux) => ffffffffc13b2f80 vmx_vmenter+0x0 (vmlinux) tsc_msrs_test 18436 [007] 10897.962087836: branches: vmentry ffffffffc13b2f82 vmx_vmenter+0x2 (vmlinux) => 0 [unknown] ([unknown]) [guest/18436] 18436 [007] 10897.962087836: branches: vmentry 0 [unknown] ([unknown]) => 402c81 guest_code+0x131 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) [guest/18436] 18436 [007] 10897.962087836: branches: call 402c81 guest_code+0x131 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) => 40dba0 ucall+0x0 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) [guest/18436] 18436 [007] 10897.962088248: branches: vmexit 40dba0 ucall+0x0 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) => 0 [unknown] ([unknown]) tsc_msrs_test 18436 [007] 10897.962088248: branches: vmexit 0 [unknown] ([unknown]) => ffffffffc13b2fa0 vmx_vmexit+0x0 (vmlinux) tsc_msrs_test 18436 [007] 10897.962088248: branches: jmp ffffffffc13b2fa0 vmx_vmexit+0x0 (vmlinux) => ffffffffc13b2fd2 vmx_vmexit+0x32 (vmlinux) tsc_msrs_test 18436 [007] 10897.962088256: branches: return ffffffffc13b2fd2 vmx_vmexit+0x32 (vmlinux) => ffffffffc13b3040 __vmx_vcpu_run+0x60 (vmlinux) tsc_msrs_test 18436 [007] 10897.962088270: branches: return ffffffffc13b30b6 __vmx_vcpu_run+0xd6 (vmlinux) => ffffffffc13b2f2e vmx_vcpu_enter_exit+0x4e (vmlinux) [SNIP] tsc_msrs_test 18436 [007] 10897.962089321: branches: call ffffffffc13b2ff5 __vmx_vcpu_run+0x15 (vmlinux) => ffffffffc13b2f50 vmx_update_host_rsp+0x0 (vmlinux) tsc_msrs_test 18436 [007] 10897.962089321: branches: return ffffffffc13b2f5d vmx_update_host_rsp+0xd (vmlinux) => ffffffffc13b2ffa __vmx_vcpu_run+0x1a (vmlinux) tsc_msrs_test 18436 [007] 10897.962089321: branches: call ffffffffc13b303b __vmx_vcpu_run+0x5b (vmlinux) => ffffffffc13b2f80 vmx_vmenter+0x0 (vmlinux) tsc_msrs_test 18436 [007] 10897.962089424: branches: vmentry ffffffffc13b2f82 vmx_vmenter+0x2 (vmlinux) => 0 [unknown] ([unknown]) [guest/18436] 18436 [007] 10897.962089424: branches: vmentry 0 [unknown] ([unknown]) => 40dba0 ucall+0x0 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) [guest/18436] 18436 [007] 10897.962089701: branches: jmp 40dc1b ucall+0x7b (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) => 40dc39 ucall+0x99 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) [guest/18436] 18436 [007] 10897.962089701: branches: jcc 40dc3c ucall+0x9c (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) => 40dc20 ucall+0x80 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) [guest/18436] 18436 [007] 10897.962089701: branches: jcc 40dc3c ucall+0x9c (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) => 40dc20 ucall+0x80 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) [guest/18436] 18436 [007] 10897.962089701: branches: jcc 40dc37 ucall+0x97 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) => 40dc50 ucall+0xb0 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) [guest/18436] 18436 [007] 10897.962089878: branches: vmexit 40dc55 ucall+0xb5 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) => 0 [unknown] ([unknown]) tsc_msrs_test 18436 [007] 10897.962089878: branches: vmexit 0 [unknown] ([unknown]) => ffffffffc13b2fa0 vmx_vmexit+0x0 (vmlinux) tsc_msrs_test 18436 [007] 10897.962089878: branches: jmp ffffffffc13b2fa0 vmx_vmexit+0x0 (vmlinux) => ffffffffc13b2fd2 vmx_vmexit+0x32 (vmlinux) tsc_msrs_test 18436 [007] 10897.962089887: branches: return ffffffffc13b2fd2 vmx_vmexit+0x32 (vmlinux) => ffffffffc13b3040 __vmx_vcpu_run+0x60 (vmlinux) tsc_msrs_test 18436 [007] 10897.962089901: branches: return ffffffffc13b30b6 __vmx_vcpu_run+0xd6 (vmlinux) => ffffffffc13b2f2e vmx_vcpu_enter_exit+0x4e (vmlinux) [SNIP] # perf kvm --guest-code --guest --host report -i perf.data --stdio | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 12 of event 'instructions' # Event count (approx.): 2274583 # # Children Self Command Shared Object Symbol # ........ ........ ............. .................... ........................................... # 54.70% 0.00% tsc_msrs_test [kernel.vmlinux] [k] entry_SYSCALL_64_after_hwframe | ---entry_SYSCALL_64_after_hwframe do_syscall_64 | |--29.44%--syscall_exit_to_user_mode | exit_to_user_mode_prepare | task_work_run | __fput For more information about Perf tools support for Intel® Processor Trace refer: https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-05-23perf tools: Add guest_code supportAdrian Hunter5-3/+103
A common case for KVM test programs is that the test program acts as the hypervisor, creating, running and destroying the virtual machine, and providing the guest object code from its own object code. In this case, the VM is not running an OS, but only the functions loaded into it by the hypervisor test program, and conveniently, loaded at the same virtual addresses. Normally to resolve addresses, MMAP events are needed to map addresses back to the object code and debug symbols for that object code. Currently, there is no way to get such mapping information from guests but, in the scenario described above, the guest has the same mappings as the hypervisor, so support for that scenario can be achieved. To support that, copy the host thread's maps to the guest thread's maps. Note, we do not discover the guest until we encounter a guest event, which works well because it is not until then that we know that the host thread's maps have been set up. Typically the main function for the guest object code is called "guest_code", hence the name chosen for this feature. Note, that is just a convention, the function could be named anything, and the tools do not care. This is primarily aimed at supporting Intel PT, or similar, where trace data can be recorded for a guest. Refer to the final patch in this series "perf intel-pt: Add guest_code support" for an example. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>