path: root/tools/perf/util
Age | Commit message | Author | Files | Lines
2021-11-07 | perf stat: Fix memory leak on error path | Ian Rogers | 1 | -0/+1
strdup() is used to deduplicate, ensure it isn't leaking an already created string by freeing first. Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
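The pattern described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `replace_string()` is a hypothetical helper, and unlike the commit message (which frees first) it duplicates before freeing so that the old value survives an allocation failure.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the perf source: store a private copy
 * of src in *slot and release whatever string *slot held before, so the
 * previous allocation is not leaked.
 */
static int replace_string(char **slot, const char *src)
{
	char *copy;

	assert(slot != NULL && src != NULL);

	copy = strdup(src);
	if (copy == NULL)
		return -1;	/* *slot is left untouched on failure */

	free(*slot);		/* free(NULL) is a no-op, so the first call is fine */
	*slot = copy;
	return 0;
}
```

Calling the helper twice on the same slot leaves exactly one live allocation, which the caller releases with `free()`.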
2021-11-07 | perf tools: Use __BYTE_ORDER__ | Ilya Leoshkevich | 7 | -10/+10
Switch from the libc-defined __BYTE_ORDER to the compiler-defined __BYTE_ORDER__ in order to make endianness detection more robust, like it was done for libbpf. Signed-off-by: Ilya Leoshkevich <[email protected]> Suggested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
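A minimal sketch of compiler-based endianness detection, assuming GCC or Clang (both predefine `__BYTE_ORDER__` and the `__ORDER_*__` constants). The helper names are illustrative and do not come from the perf tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The macro is predefined by the compiler, so no libc header is involved. */
#if !defined(__BYTE_ORDER__)
#error "this sketch assumes a compiler that predefines __BYTE_ORDER__"
#endif

/* Illustrative helper: 1 when compiling for a little-endian target. */
static int host_is_little_endian(void)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return 1;
#else
	return 0;
#endif
}

/* Runtime cross-check: is the lowest-addressed byte the least significant? */
static int first_byte_is_lsb(void)
{
	uint32_t probe = 0x01020304u;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 0x04;
}
```

On the usual little- and big-endian targets both helpers agree, which is an easy sanity check for the compile-time branch.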
2021-11-07 | perf tools: Refactor out kernel symbol argument sanity checking | James Clark | 2 | -0/+24
User supplied values for vmlinux and kallsyms are checked before continuing. Refactor this into a function so that it can be used elsewhere. Reviewed-by: Denis Nikitin <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-11-07 | perf symbols: Ignore $a/$d symbols for ARM modules | Lexi Shao | 1 | -0/+4
On an ARM machine, kernel symbols from modules can be resolved to $a instead of printing the actual symbol name. Ignore symbols starting with "$" when building kallsyms rbtree. A sample stacktrace is shown as follows: c0f2e39c schedule_hrtimeout+0x14 ([kernel.kallsyms]) bf4a66d8 $a+0x78 ([test_module]) c0a4f5f4 kthread+0x15c ([kernel.kallsyms]) c0a001f8 ret_from_fork+0x14 ([kernel.kallsyms]) On an ARM machine, $a/$d symbols are used by the compiler to mark the beginning of code/data part in code section. These symbols are filtered out when linking vmlinux (see scripts/kallsyms.c ignored_prefixes), but are left on modules. So there are $a symbols in /proc/kallsyms which share the same addresses with the actual module symbols and confuse perf when resolving symbols. After this patch, the module symbol name is printed: c0f2e39c schedule_hrtimeout+0x14 ([kernel.kallsyms]) bf4a66d8 test_func+0x78 ([test_module]) c0a4f5f4 kthread+0x15c ([kernel.kallsyms]) c0a001f8 ret_from_fork+0x14 ([kernel.kallsyms]) Reviewed-by: James Clark <[email protected]> Signed-off-by: Lexi Shao <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Fastabend <[email protected]> Cc: KP Singh <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: QiuXi <[email protected]> Cc: Song Liu <[email protected]> Cc: Wangbing <[email protected]> Cc: Xiaoming Ni <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
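The filtering idea from the commit above, skipping names that start with "$" while collecting symbols, might look roughly like this. The `struct ksym` type and both function names are hypothetical stand-ins, not perf's real data structures; the address in the usage note is derived from the sample trace (bf4a66d8 minus 0x78).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified kallsyms entry (not perf's struct symbol). */
struct ksym {
	unsigned long addr;
	const char *name;
};

/* True for ARM mapping symbols such as "$a" or "$d". */
static bool is_mapping_symbol(const char *name)
{
	assert(name != NULL);
	return name[0] == '$';
}

/*
 * Copy every entry that is not a mapping symbol from in[] to out[] and
 * return how many were kept. out[] must have room for n entries.
 */
static size_t filter_symbols(const struct ksym *in, size_t n, struct ksym *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (is_mapping_symbol(in[i].name))
			continue;	/* shares its address with a real symbol */
		out[kept++] = in[i];
	}
	return kept;
}
```

With entries `$a` and `test_func` both at 0xbf4a6660, only `test_func` survives, so a lookup of 0xbf4a66d8 can no longer resolve to `$a+0x78`.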
2021-11-07 | perf evsel: Don't set exclude_guest by default | Ravi Bangoria | 2 | -5/+8
Perf tool sets exclude_guest by default while calling perf_event_open(). Because IBS does not have filtering capability, it always gets rejected by IBS PMU driver and thus perf falls back to non-precise sampling. Fix it by not setting exclude_guest by default on AMD. Before: $ sudo ./perf record -C 0 -vvv true |& grep precise precise_ip 3 decreasing precise_ip by one (2) precise_ip 2 decreasing precise_ip by one (1) precise_ip 1 decreasing precise_ip by one (0) After: $ sudo ./perf record -C 0 -vvv true |& grep precise precise_ip 3 decreasing precise_ip by one (2) precise_ip 2 Committer notes: Fixup init to zero for perf_env in older compilers: arch/x86/util/evsel.c:15:26: error: missing field 'os_release' initializer [-Werror,-Wmissing-field-initializers] struct perf_env env = {0}; ^ Committer notes: Namhyung remarked: It'd be nice if it can cover explicit "-e cycles:pp" as well. Ravi clarified: For explicit :pp modifier, evsel->precise_max does not get set and thus perf does not try with different attr->precise_ip values while exclude_guest set. 
So no issue with explicit :pp: $ sudo ./perf record -C 0 -e cycles:pp -vvv |& grep "precise_ip\|exclude_guest" precise_ip 2 exclude_guest 1 precise_ip 2 exclude_guest 1 switching off exclude_guest, exclude_host precise_ip 2 ^C Also, with :P modifier, evsel->precise_max gets set but exclude_guest does not and thus :P also works fine: $ sudo ./perf record -C 0 -e cycles:P -vvv |& grep "precise_ip\|exclude_guest" precise_ip 3 decreasing precise_ip by one (2) precise_ip 2 ^C Reported-by: Kim Phillips <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-11-06 | perf evsel: Fix missing exclude_{host,guest} setting | Namhyung Kim | 4 | -5/+42
The current logic for the perf missing feature has a bug that it can wrongly clear some modifiers like G or H. Actually some PMUs don't support any filtering or exclusion while others do. But we check it as a global feature. For example, the cycles event can have 'G' modifier to enable it only in the guest mode on x86. When you don't run any VMs it'll return 0. # perf stat -a -e cycles:G sleep 1 Performance counter stats for 'system wide': 0 cycles:G 1.000721670 seconds time elapsed But when it's used with other pmu events that don't support G modifier, it'll be reset and return non-zero values. # perf stat -a -e cycles:G,msr/tsc/ sleep 1 Performance counter stats for 'system wide': 538,029,960 cycles:G 16,924,010,738 msr/tsc/ 1.001815327 seconds time elapsed This is because of the missing feature detection logic being global. Add a hashmap to set pmu-specific exclude_host/guest features. Committer notes: Fix 'perf test python' by adding a stub for evsel__find_pmu() in tools/perf/util/python.c, document that it is used so far only for the above reasons so that if anybody needs this in the python binding usecases, we can revisit this. Reported-by: Stephane Eranian <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-11-06 | perf bpf: Add missing free to bpf_event__print_bpf_prog_info() | Ian Rogers | 1 | -1/+3
If btf__new() is called then there needs to be a corresponding btf__free(). Fixes: f8dfeae009effc0b ("perf bpf: Show more BPF program info in print_bpf_prog_info()") Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Fastabend <[email protected]> Cc: KP Singh <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Tiezhu Yang <[email protected]> Cc: Yonghong Song <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-11-06 | Merge remote-tracking branch 'torvalds/master' into perf/core | Arnaldo Carvalho de Melo | 3 | -3/+21
To pick up some tools/perf/ patches that went via tip/perf/core, such as: tools/perf: Add mem_hops field in perf_mem_data_src structure Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-11-04 | perf clang: Fixes for more recent LLVM/clang | Ian Rogers | 1 | -8/+13
The parameters to two functions and the location of a variable have changed in more recent LLVM/clang releases. Remove the unnecessary -fmessage-length and -ferror-limit flags; the former causes failures like: 58: builtin clang support : 58.1: builtin clang compile C source to IR : --- start --- test child forked, pid 279307 error: unknown argument: '-fmessage-length' 1 error generated. test child finished with -1 Tested with LLVM 6, 8, 9, 10 and 11. Reviewed-by: Fangrui Song <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sedat Dilek <[email protected]> Cc: [email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-11-02 | Merge tag 'net-next-for-5.16' of ↵ | Linus Torvalds | 1 | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 
GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. 
RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies" * tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits) Revert "net: avoid double accounting for pure zerocopy skbs" selftests: net: add arp_ndisc_evict_nocarrier net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter net: arp: introduce arp_evict_nocarrier sysctl parameter libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. 
net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c net: avoid double accounting for pure zerocopy skbs tcp: rename sk_wmem_free_skb netdevsim: fix uninit value in nsim_drv_configure_vfs() selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose ...
2021-11-01 | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | Jakub Kicinski | 1 | -1/+1
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-11-01 We've added 181 non-merge commits during the last 28 day(s) which contain a total of 280 files changed, 11791 insertions(+), 5879 deletions(-). The main changes are: 1) Fix bpf verifier propagation of 64-bit bounds, from Alexei. 2) Parallelize bpf test_progs, from Yucong and Andrii. 3) Deprecate various libbpf apis including af_xdp, from Andrii, Hengqi, Magnus. 4) Improve bpf selftests on s390, from Ilya. 5) bloomfilter bpf map type, from Joanne. 6) Big improvements to JIT tests especially on Mips, from Johan. 7) Support kernel module function calls from bpf, from Kumar. 8) Support typeless and weak ksym in light skeleton, from Kumar. 9) Disallow unprivileged bpf by default, from Pawan. 10) BTF_KIND_DECL_TAG support, from Yonghong. 11) Various bpftool cleanups, from Quentin. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (181 commits) libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. 
selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose bpf: Factor out helpers for ctx access checking bpf: Factor out a helper to prepare trampoline for struct_ops prog selftests, bpf: Fix broken riscv build riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h tools, build: Add RISC-V to HOSTARCH parsing riscv, bpf: Increase the maximum number of iterations selftests, bpf: Add one test for sockmap with strparser selftests, bpf: Fix test_txmsg_ingress_parser error ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-01 | Merge tag 'x86_misc_for_v5.16_rc1' of ↵ | Linus Torvalds | 1 | -0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 changes from Borislav Petkov: - Use the proper interface for the job: get_unaligned() instead of memcpy() in the insn decoder - A randconfig build fix * tag 'x86_misc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Use get_unaligned() instead of memcpy() x86/Kconfig: Fix an unused variable error in dell-smm-hwmon
2021-11-01 | perf bpf: Pull in bpf_program__get_prog_info_linear() | Dave Marchevsky | 10 | -32/+373
To prepare for impending deprecation of libbpf's bpf_program__get_prog_info_linear(), pull in the function and associated helpers into the perf codebase and migrate existing uses to the perf copy. Since libbpf's deprecated definitions will still be visible to perf, it is necessary to rename perf's definitions. Signed-off-by: Dave Marchevsky <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-11-01 | Merge tag 'perf-core-2021-10-31' of ↵ | Linus Torvalds | 1 | -2/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Core: - Allow ftrace to instrument parts of the perf core code - Add a new mem_hops field to perf_mem_data_src which allows to represent intra-node/package or inter-node/off-package details to prepare for next generation systems which have more hierarchy within the node/package level. Tools: - Update for the new mem_hops field in perf_mem_data_src Arch: - A set of constraints fixes for the Intel uncore PMU - The usual set of small fixes and improvements for x86 and PPC" * tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses tools/perf: Add mem_hops field in perf_mem_data_src structure perf: Add mem_hops field in perf_mem_data_src structure perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line perf/core: Allow ftrace for functions in kernel/event/core.c perf/x86: Add new event for AUX output counter index perf/x86: Add compiler barrier after updating BTS perf/x86/intel/uncore: Fix Intel SPR M3UPI event constraints perf/x86/intel/uncore: Fix Intel SPR M2PCIE event constraints perf/x86/intel/uncore: Fix Intel SPR IIO event constraints perf/x86/intel/uncore: Fix Intel SPR CHA event constraints perf/x86/intel/uncore: Fix Intel ICX IIO event constraints perf/x86/intel/uncore: Fix invalid unit check perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
2021-10-28 | perf evsel: Add bitfield_swap() to handle branch_stack endian issue | Madhavan Srinivasan | 2 | -2/+88
The branch_stack struct has a bit field definition which produces different bit ordering for big/little endian. Because of this, when a branch_stack sample is collected on a BE system and viewed/reported on an LE system, bit fields of the branch stack are not presented properly. To address this issue, an evsel__bitfield_swap_branch_stack() is defined and introduced in evsel__parse_sample. Signed-off-by: Madhavan Srinivasan <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Athira Jajeev <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
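To illustrate why bit-field order matters across endianness, here is a simplified sketch that re-packs a 64-bit word whose fields were allocated from the most significant bit downward into one where the same fields are allocated from the least significant bit upward. It is not the evsel__bitfield_swap_branch_stack() implementation; the function names and the field widths used with it are assumptions chosen for the example, and real layouts depend on the compiler ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract 'size' bits that start 'pos' bits below the most significant bit. */
static uint64_t field_from_msb(uint64_t word, unsigned pos, unsigned size)
{
	assert(size > 0 && size < 64 && pos + size <= 64);
	return (word >> (64 - pos - size)) & ((UINT64_C(1) << size) - 1);
}

/* Place 'size' bits of 'value' starting 'pos' bits above the least significant bit. */
static uint64_t field_to_lsb(uint64_t value, unsigned pos, unsigned size)
{
	assert(size > 0 && size < 64 && pos + size <= 64);
	return (value & ((UINT64_C(1) << size) - 1)) << pos;
}

/*
 * Walk the fields in declaration order. Each field keeps its value, but its
 * position is mirrored from "counted from the top" to "counted from the bottom".
 */
static uint64_t repack_msb_to_lsb(uint64_t word, const unsigned *sizes, size_t n)
{
	uint64_t out = 0;
	unsigned pos = 0;

	for (size_t i = 0; i < n; i++) {
		out |= field_to_lsb(field_from_msb(word, pos, sizes[i]), pos, sizes[i]);
		pos += sizes[i];
	}
	return out;
}
```

For field widths {1, 1, 2, 4} holding the values 1, 0, 2 and 5, the MSB-first word 0xA500000000000000 is re-packed to 0x59.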
2021-10-27 | perf script: Show binary offsets for userspace addr | Lexi Shao | 1 | -3/+9
Show binary offsets for userspace addr with map in perf script output with callchain. In commit 19610184693c("perf script: Show virtual addresses instead of offsets"), the addr shown in perf script output with callchain is changed from binary offsets to virtual address to fix the incorrectness when displaying symbol offset. This is inconvenient in scenario that the binary is stripped and symbol cannot be resolved. If someone wants to further resolve symbols for specific binaries later, he would need an extra step to translate virtual address to binary offset with mapping information recorded in perf.data, which can be difficult for people not familiar with perf. This patch modifies function sample__fprintf_callchain to print binary offset for userspace addr with dsos, and virtual address otherwise. It does not affect symbol offset calculation so symoff remains correct. Before applying this patch: test 1512 78.711307: 533129 cycles: aaaae0da07f4 [unknown] (/tmp/test) aaaae0da0704 [unknown] (/tmp/test) ffffbe9f7ef4 __libc_start_main+0xe4 (/lib64/libc-2.31.so) After this patch: test 1519 111.330127: 406953 cycles: 7f4 [unknown] (/tmp/test) 704 [unknown] (/tmp/test) 20ef4 __libc_start_main+0xe4 (/lib64/libc-2.31.so) Fixes: 19610184693c("perf script: Show virtual addresses instead of offsets") Signed-off-by: Lexi Shao <[email protected]> Cc: Alexander Shishkin <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Jiri Olsa <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: QiuXi <[email protected]> Cc: Wangbing <[email protected]> Cc: Xiaoming Ni <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-27 | perf intel-pt: Support itrace d+o option to direct debug log to stdout | Adrian Hunter | 2 | -6/+7
It can be useful to see debug output in between normal output. Add support for AUXTRACE_LOG_FLG_USE_STDOUT to Intel PT. Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-27 | perf auxtrace: Add itrace d+o option to direct debug log to stdout | Adrian Hunter | 1 | -0/+2
It can be useful to see debug output in between normal output. Add 'o' to the flags of debug option 'd', so that '--itrace=d+o' can specify output of the debug log to stdout. Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-27 | perf intel-pt: Support itrace A option to approximate IPC | Adrian Hunter | 3 | -4/+14
Normally, for cycle-accurate mode, IPC values are an exact number of instructions and cycles. Due to the granularity of timestamps, that happens only when a CYC packet correlates to the event. Support the itrace 'A' option to use, instead, the number of cycles associated with the current timestamp. This provides IPC information for every change of timestamp, but at the expense of accuracy. Due to the granularity of timestamps, the actual number of cycles increases even though the cycles reported does not. The number of instructions is known, but if IPC is reported, cycles can be too low and so IPC is too high. Note that inaccuracy decreases as the period of sampling increases i.e. if the number of cycles is too low by a small amount, that becomes less significant if the number of cycles is large. Furthermore, it can be used in conjunction with dlfilter-show-cycles.so to provide higher granularity cycle information. Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-27 | perf auxtrace: Add itrace A option to approximate IPC | Adrian Hunter | 2 | -0/+6
Add an option to specify that synthesized IPC can be approximate, rather than completely accurate. Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-27 | perf auxtrace: Add missing Z option to ITRACE_HELP | Adrian Hunter | 1 | -0/+1
ITRACE_HELP is used by perf commands to display help text for the --itrace option. Add missing Z option. Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-26 | Merge remote-tracking branch 'torvalds/master' into perf/core | Arnaldo Carvalho de Melo | 1 | -2/+2
To pick up the fixes from upstream. Fix simple conflict on session.c related to the file position fix that went upstream and is touched by the active decomp changes in perf/core. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-25 | perf dso: Fix /proc/kcore access on 32 bit systems | James Clark | 1 | -1/+1
Because _LARGEFILE64_SOURCE is set in perf, file offset sizes can be 64 bits. If a workflow needs to open /proc/kcore on a 32 bit system (for example to decode Arm ETM kernel trace) then the size value will be wrapped to 32 bits in the function file_size() at this line: dso->data.file_size = st.st_size; Setting the file_size member to be u64 fixes the issue and allows /proc/kcore to be opened. Reported-by: Denis Nikitin <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
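The wrap-around can be reproduced in isolation. The two structs below are hypothetical stand-ins for a size member before and after widening; they model a 32-bit member with `uint32_t` rather than reproducing perf's actual struct dso.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the size member before and after the fix. */
struct narrow_data { uint32_t file_size; };	/* models a 32-bit member */
struct wide_data   { uint64_t file_size; };	/* models the u64 member */

/* Store a 64-bit size in the narrow member and read back what survived. */
static uint64_t stored_narrow(uint64_t st_size)
{
	struct narrow_data d;

	d.file_size = (uint32_t)st_size;	/* upper 32 bits are discarded */
	return d.file_size;
}

/* Store the same size in the wide member; nothing is lost. */
static uint64_t stored_wide(uint64_t st_size)
{
	struct wide_data d;

	d.file_size = st_size;
	return d.file_size;
}
```

A size of 4 GiB + 4 KiB (0x100001000) is read back as 0x1000 from the narrow member and unchanged from the wide one.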
2021-10-25 | perf list: Display hybrid PMU events with cpu type | Jin Yao | 6 | -12/+39
Add a new option '--cputype' to 'perf list' to display core-only PMU events or atom-only PMU events. Each hybrid PMU event has been assigned with a PMU name, this patch compares the PMU name before listing the result. For example: perf list --cputype atom ... cache: core_reject_l2q.any [Counts the number of request that were not accepted into the L2Q because the L2Q is FULL. Unit: cpu_atom] ... The "Unit: cpu_atom" is displayed in the brief description section to indicate this is an atom event. Signed-off-by: Jin Yao <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-25 | perf session: Introduce reader EOF function | Alexey Bayduraev | 1 | -1/+7
Introduce function to check end-of-file status. Reviewed-by: Jiri Olsa <[email protected]> Signed-off-by: Alexey Bayduraev <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Link: https://lore.kernel.org/r/b3b0e0904da01f9ec84d4ae9368df99ecd231598.1634113027.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-25 | perf session: Introduce reader return codes | Alexey Bayduraev | 1 | -3/+8
Add READER_OK and READER_NODATA return codes to make the code more clear. Suggested-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Reviewed-by: Riccardo Mancini <[email protected]> Signed-off-by: Alexey Bayduraev <[email protected]> Tested-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/5fca481e91c3c5d2ba033d4c6e9b969f8033ab0f.1634113027.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
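As a rough illustration of how named return codes make a read loop easier to follow, consider the toy reader below. Only the names READER_OK and READER_NODATA come from the commit message; the enum tag, `struct toy_reader`, both functions, and the exact meaning given to READER_NODATA here are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Names from the commit message; the numeric values are an assumption. */
enum reader_status {
	READER_OK,
	READER_NODATA,
};

/* Hypothetical reader over an in-memory array of integer "events". */
struct toy_reader {
	const int *events;
	size_t count;
	size_t pos;
};

/* Fetch the next event, or report that nothing is available right now. */
static enum reader_status toy_read(struct toy_reader *r, int *out)
{
	assert(r != NULL && out != NULL);

	if (r->pos >= r->count)
		return READER_NODATA;

	*out = r->events[r->pos++];
	return READER_OK;
}

/* The caller's loop reads as prose: keep going while the reader says OK. */
static int toy_sum(struct toy_reader *r)
{
	int sum = 0;
	int ev;

	while (toy_read(r, &ev) == READER_OK)
		sum += ev;
	return sum;
}
```

Compared with bare 0 and 1 return values, the loop condition states its intent directly.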
2021-10-25 | perf session: Move the event read code to a separate function | Alexey Bayduraev | 1 | -15/+31
Separate the reading code of a single event to a new reader__read_event() function. Suggested-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Reviewed-by: Riccardo Mancini <[email protected]> Signed-off-by: Alexey Bayduraev <[email protected]> Tested-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/ffe570d937138dd24f282978ce7ed9c46a06ff9b.1634113027.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-25perf session: Move unmap code to reader__mmapAlexey Bayduraev1-17/+13
Move the unmapping code to reader__mmap(), so that the mmap code is located together. Move the head/file_offset computation to reader__mmap(), so all the offset computation is located together and in one place only. Suggested-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Reviewed-by: Riccardo Mancini <[email protected]> Signed-off-by: Alexey Bayduraev <[email protected]> Tested-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/f1c5e17cfa1ecfe912d10b411be203b55d148bc7.1634113027.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-25perf session: Move reader map code to a separate functionAlexey Bayduraev1-15/+28
Move the mapping code into a separate reader__mmap() function. Suggested-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Reviewed-by: Riccardo Mancini <[email protected]> Signed-off-by: Alexey Bayduraev <[email protected]> Tested-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/e445de5bb85bbd91287986802d6ed0ce1b419b5a.1634113027.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-25perf session: Move init/release code to separate functionsAlexey Bayduraev1-13/+32
Separate init/release code into reader__init() and reader__release_decomp() functions. Remove a duplicate call to ui_progress__init_size(), the same call can be found in __perf_session__process_events(). For multiple traces ui_progress should be initialized by total size before reader__init() calls. Suggested-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Reviewed-by: Riccardo Mancini <[email protected]> Signed-off-by: Alexey Bayduraev <[email protected]> Tested-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/8bacf247de220be8e57af1d2b796322175f5e257.1634113027.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-25perf session: Introduce decompressor in reader objectAlexey Bayduraev2-16/+33
Introduce a decompressor data structure with pointers to the decomp objects and to the zstd object. We cannot just move session->zstd_data to decomp_data, as session->zstd_data is not used only for decompression. Add a decompressor data object to the reader object and introduce active_decomp in the perf_session object to select the current decompressor. Decompression can thus be executed separately for each data file. Reviewed-by: Jiri Olsa <[email protected]> Reviewed-by: Riccardo Mancini <[email protected]> Signed-off-by: Alexey Bayduraev <[email protected]> Tested-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/0eee270cb52aebcbd029c8445d9009fd17709d53.1634113027.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-25perf session: Move all state items to reader objectAlexey Bayduraev1-28/+35
We need all the state info about a reader in a separate object to load data from multiple files, so that we can keep multiple readers at the same time. Move all items that need to be kept from reader__process_events to the reader object. Introduce mmap_cur to keep the current mapping. Suggested-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Reviewed-by: Riccardo Mancini <[email protected]> Signed-off-by: Alexey Bayduraev <[email protected]> Tested-by: Riccardo Mancini <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/5c7bdebfaadd7fcb729bd999b181feccaa292e8e.1634113027.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-25perf intel-pt: Add support for PERF_RECORD_AUX_OUTPUT_HW_IDAdrian Hunter1-4/+81
Originally, software only supported redirecting at most one PEBS event to Intel PT (PEBS-via-PT) because it was not able to differentiate one event from another. To overcome that, add support for the PERF_RECORD_AUX_OUTPUT_HW_ID side-band event. Committer notes: Cast the pointer arg to for_each_set_bit() to (unsigned long *), to fix the build on 32-bit systems. Reviewed-by: Alexander Shishkin <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-22perf bpf: Switch to new btf__raw_data APIHengqi Chen1-1/+1
Replace the call to btf__get_raw_data with new API btf__raw_data. The old APIs will be deprecated in libbpf v0.7+. No functionality change. Signed-off-by: Hengqi Chen <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-10-20perf tools: Add support for PERF_RECORD_AUX_OUTPUT_HW_IDAdrian Hunter6-0/+41
The PERF_RECORD_AUX_OUTPUT_HW_ID event provides a way to match AUX output data like Intel PT PEBS-via-PT back to the event that it came from, by providing a hardware ID that is present in the AUX output. Reviewed-by: Alexander Shishkin <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20perf metric: Allow modifiers on metricsIan Rogers1-27/+98
By allowing modifiers on metrics we can, for example, gather the same metric for kernel and user mode. On a SkylakeX with TopDownL1 this gives: $ perf stat -M TopDownL1:u,TopDownL1:k -a sleep 2 Performance counter stats for 'system wide': 849,855,577 uops_issued.any:k # 0.06 Bad_Speculation:k # 0.51 Backend_Bound:k (16.71%) 1,995,257,996 cycles:k # 7981031984.00 SLOTS:k # 0.35 Frontend_Bound:k # 0.08 Retiring:k (16.71%) 2,791,940,753 idq_uops_not_delivered.core:k (16.71%) 641,961,928 uops_retired.retire_slots:k (16.71%) 72,239,337 int_misc.recovery_cycles:k (16.71%) 2,294,413,647 uops_issued.any:u # 0.04 Bad_Speculation:u # 0.39 Backend_Bound:u (16.78%) 1,333,248,940 cycles:u # 5332995760.00 SLOTS:u # 0.16 Frontend_Bound:u # 0.40 Retiring:u (16.78%) 858,517,081 idq_uops_not_delivered.core:u (16.78%) 2,153,789,582 uops_retired.retire_slots:u (16.78%) 19,373,627 int_misc.recovery_cycles:u (16.78%) 31,503,661 cpu_clk_unhalted.one_thread_active:k # 0.18 CoreIPC_SMT:k (16.73%) 315,454,104 inst_retired.any:k # 315454104.00 Instructions:k (16.73%) 42,533,729 cpu_clk_unhalted.ref_xclk:k (16.73%) 2,043,119,037 cpu_clk_unhalted.thread:k (16.73%) 28,843,803 cpu_clk_unhalted.one_thread_active:u # 1.55 CoreIPC_SMT:u (16.60%) 2,153,353,869 inst_retired.any:u # 2153353869.00 Instructions:u (16.60%) 28,844,743 cpu_clk_unhalted.ref_xclk:u (16.60%) 1,387,544,378 cpu_clk_unhalted.thread:u (16.60%) 308,031,603 inst_retired.any:k # 0.15 CoreIPC:k (33.19%) 2,036,774,753 cycles:k (33.19%) 1,994,344,281 inst_retired.any:u # 1.59 CoreIPC:u (33.18%) 1,251,538,227 cycles:u (33.18%) 2.000342948 seconds time elapsed Modifiers are naively copied and pasted onto events, which can yield errors like: $ perf stat -M Kernel_Utilization:k -a sleep 2 event syntax error: '..d.thread:k/kk,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/k..' \___ Bad modifier Usage: perf stat [<options>] [<command>] -M, --metrics <metric/metric group list> monitor specified metrics or metric groups (separated by ,) When modifiers are present with constraints, from --metric-no-group or the NMI watchdog, they are no longer placed in the same set, which may miss deduplicating events. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20perf parse-events: Identify broken modifiersIan Rogers1-0/+10
Previously a broken modifier caused a usage message to be printed but nothing else. After: $ perf stat -e 'cycles:kk' -a sleep 2 event syntax error: 'cycles:kk' \___ Bad modifier Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events $ perf stat -e '{instructions,cycles}:kk' -a sleep 2 event syntax error: '..ns,cycles}:kk' \___ Bad modifier Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20perf metric: Switch fprintf() to pr_err()Ian Rogers1-1/+1
There's no clear reason for the inconsistency that stems from the initial commit. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20perf metrics: Modify setup and deduplicationIan Rogers1-251/+262
Previously find_evsel_group was trying to share events while mark-sweeping to eliminate unused events; this was complicated and had issues around uncore events and grouped sharing. This was further complicated by the event string being created while metrics and metric groups were being added, with the string affecting the evlist order. This change moves deduplication before event parsing. Ungrouped events are placed in a single combined set. Groups are checked to see if an earlier (larger) group can support their events. As the deduplication and sharing detection is done on metric IDs before parsing, wildcard expansion problems with uncore events are avoided. Overall the code is simpler while working better. An example of failing to deduplicate can be seen with a list of metrics like the following, where in the after case multiplexing has been avoided: Before: $ perf stat -M Bad_Speculation,Backend_Bound,Frontend_Bound,Retiring -a sleep 2 Performance counter stats for 'system wide': 959,620,872 uops_issued.any # 0.06 Bad_Speculation (50.03%) 2,163,072,261 cycles # 0.09 Retiring (50.03%) 735,827,436 uops_retired.retire_slots (50.03%) 74,676,484 int_misc.recovery_cycles (50.03%) 987,062,794 uops_issued.any # 0.50 Backend_Bound (49.97%) 2,203,734,187 cycles # 0.35 Frontend_Bound (49.97%) 3,085,016,091 idq_uops_not_delivered.core (49.97%) 758,599,232 uops_retired.retire_slots (49.97%) 75,807,526 int_misc.recovery_cycles (49.97%) 2.002103760 seconds time elapsed After: $ sudo perf stat -M Bad_Speculation,Backend_Bound,Frontend_Bound,Retiring -a sleep 2 Performance counter stats for 'system wide': 769,694,676 uops_issued.any # 0.08 Bad_Speculation # 0.41 Backend_Bound 1,087,548,633 cycles # 0.38 Frontend_Bound # 0.14 Retiring 1,642,085,777 idq_uops_not_delivered.core 603,112,590 uops_retired.retire_slots 43,787,854 int_misc.recovery_cycles 2.003844383 seconds time elapsed Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20perf expr: Add subset_of_ids() utilityIan Rogers2-0/+17
Add a helper that returns true if all the IDs in needles are present in haystack. Later this will be used in sharing events between metrics. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
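The subset test described above can be sketched as follows. This is an illustration over plain string arrays, not the perf helper itself, whose argument types are not shown in this log; toy_subset_of_ids() and ids__contains() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if id occurs in the ids array. */
static bool ids__contains(const char *const *ids, size_t nr, const char *id)
{
	for (size_t i = 0; i < nr; i++) {
		if (!strcmp(ids[i], id))
			return true;
	}
	return false;
}

/*
 * True if every ID in needles is also present in haystack.
 * An empty needles set is a subset of any haystack.
 */
static bool toy_subset_of_ids(const char *const *haystack, size_t nr_haystack,
			      const char *const *needles, size_t nr_needles)
{
	assert(haystack || nr_haystack == 0);
	assert(needles || nr_needles == 0);

	for (size_t i = 0; i < nr_needles; i++) {
		if (!ids__contains(haystack, nr_haystack, needles[i]))
			return false;
	}
	return true;
}
```

A metric whose IDs pass this test against an earlier, larger group could reuse that group's events instead of opening its own. The linear scan is chosen for brevity only.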
2021-10-20perf metric: Encode and use metric-id as qualifierIan Rogers4-78/+228
For a metric like IPC a group of events like {instructions,cycles}:W would be formed. If the event names were changed in parsing then the metric expression parser would fail to find them. This change makes the event encoding be something like: {instructions/metric-id=instructions/, cycles/metric-id=cycles/} and then uses the evsel's stable metric-id value to locate the events. This fixes the case that an event is restricted to user because of the paranoia setting: $ echo 2 > /proc/sys/kernel/perf_event_paranoid $ perf stat -M IPC /bin/true Performance counter stats for '/bin/true': 150,298 inst_retired.any:u # 0.77 IPC 187,095 cpu_clk_unhalted.thread:u 0.002042731 seconds time elapsed 0.000000000 seconds user 0.002377000 seconds sys Adding the metric-id as a qualifier has a complication in that qualifiers will become embedded in qualifiers. For example, msr/tsc/ could become msr/tsc,metric-id=msr/tsc// which will fail parse-events. To solve this problem the metric is encoded and decoded for the metric-id with !<num> standing in for an encoded value. Previously ! wasn't parsed. With this msr/tsc/ becomes msr/tsc,metric-id=msr!3tsc!3/ The metric expression parser is changed so that @ isn't changed to /, instead this is done when the ID is encoded for parse events. metricgroup__add_metric_non_group() and metricgroup__add_metric_weak_group() need to inject the metric-id qualifier, so to avoid repetition they are merged into a single metricgroup__build_event_string with error codes more rigorously checked. stat-shadow's prepare_metric() uses the metric-id to match the metricgroup code. As "metric-id=..." is added to all events, it is added during testing with the fake PMU. This complicates pmu_str_check code as PE_PMU_EVENT_FAKE won't match as part of a configuration. The testing fake PMU case is fixed so that if a known qualifier with an ! is parsed then it isn't reported as a fake PMU. This is sufficient to pass all testing but it and the original mechanism are somewhat brittle. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
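A rough sketch of the !<num> escaping idea, assuming only the one mapping the message spells out ('/' encoded as "!3", so msr/tsc/ becomes msr!3tsc!3). The codes for any other characters are not given in this log and are left out; toy_encode_metric_id() and toy_decode_metric_id() are hypothetical names, not the metricgroup.c functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src to dst, replacing each '/' with "!3" so the result can sit
 * inside a /.../ qualifier. Returns 0 on success, -1 if dst is too small.
 */
static int toy_encode_metric_id(char *dst, size_t dst_len, const char *src)
{
	size_t n = 0;

	assert(dst && src);
	if (dst_len == 0)
		return -1;
	for (; *src; src++) {
		if (*src == '/') {
			if (n + 2 >= dst_len)	/* two chars plus the NUL */
				return -1;
			dst[n++] = '!';
			dst[n++] = '3';
		} else {
			if (n + 1 >= dst_len)	/* one char plus the NUL */
				return -1;
			dst[n++] = *src;
		}
	}
	dst[n] = '\0';
	return 0;
}

/* Reverse of the above: every "!3" becomes '/' again. */
static int toy_decode_metric_id(char *dst, size_t dst_len, const char *src)
{
	size_t n = 0;

	assert(dst && src);
	if (dst_len == 0)
		return -1;
	while (*src) {
		if (n + 1 >= dst_len)
			return -1;
		if (src[0] == '!' && src[1] == '3') {
			dst[n++] = '/';
			src += 2;
		} else {
			dst[n++] = *src++;
		}
	}
	dst[n] = '\0';
	return 0;
}
```

Because the escape uses a character that the event parser did not accept before, an encoded ID cannot be mistaken for the start or end of a nested qualifier.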
2021-10-20perf parse-events: Allow config on kernel PMU eventsIan Rogers3-32/+52
An event like inst_retired.any on an Intel skylake is found in the pmu-events code created from the pipeline event JSON. The event is an alias for cpu/event=0xc0,period=2000003/ and parse-events recognizes the event with the token PE_KERNEL_PMU_EVENT. The parser doesn't currently allow extra configuration on such events, except for modifiers, so: $ perf stat -e inst_retired.any// /bin/true event syntax error: 'inst_retired.any//' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events This patch adds configuration to these events which can be useful for a number of parameters like name and call-graph: $ sudo perf record -e inst_retired.any/call-graph=lbr/ -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.856 MB perf.data (44 samples) ] It is necessary for the metric code so that we may add metric-id values to these events before they are parsed. 
Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20perf parse-events: Add new "metric-id" termIan Rogers8-45/+107
Add a new "metric-id" term to events so that metric parsing can set an ID that can be reliably looked up. Metric parsing currently will turn a metric like "instructions/cycles" into a parse events string of "{instructions,cycles}:W". However, parse-events may change "instructions" into "instructions:u" if perf_event_paranoid=2. When this happens expr__resolve_id currently fails, as stat-shadow adds the ID "instructions:u" to match with the counter value while the metric tries to look up just the ID "instructions". A later patch will use the new term. An example of the current problem: $ echo -1 > /proc/sys/kernel/perf_event_paranoid $ perf stat -M IPC /bin/true Performance counter stats for '/bin/true': 1,217,161 inst_retired.any # 0.97 IPC 1,250,389 cpu_clk_unhalted.thread 0.002064773 seconds time elapsed 0.002378000 seconds user 0.000000000 seconds sys $ echo 2 > /proc/sys/kernel/perf_event_paranoid $ perf stat -M IPC /bin/true Performance counter stats for '/bin/true': 150,298 inst_retired.any:u # nan IPC 187,095 cpu_clk_unhalted.thread:u 0.002042731 seconds time elapsed 0.000000000 seconds user 0.002377000 seconds sys Note: nan IPC is printed as an effect of "perf metric: Use NAN for missing event IDs." but earlier versions of perf just fail with a parse error and display no value. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20perf parse-events: Add const to evsel nameIan Rogers6-20/+27
The evsel name is strdup-ed before assignment and so can be const. A later change will add another similar string. Using const makes it clearer that these are not out arguments. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20perf metric: Simplify metric_refs calculationIan Rogers1-54/+23
Don't build a list and then turn to an array, just directly build the array. The size of the array is known due to the search for a duplicate. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20 perf metric: Document the internal 'struct metric' Ian Rogers 1 -0/+20
Add documentation as part of code tidying. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20 perf metric: Comment data structures Ian Rogers 1 -0/+27
Document the data structures maintained by metricgroup.c and used by stat-shadow.c for metric output. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20 perf metric: Modify resolution and recursion check Ian Rogers 4 -265/+174
Modify resolution. Rather than resolving a list of metrics, resolve a metric immediately after it is added. This simplifies knowing the root of the metric's tree so that IDs may be associated with it. A bug in the current implementation is that all the IDs were placed on the first metric in a metric group. Rather than maintain data on IDs' parents to detect cycles, maintain a list of visited metrics and detect cycles if the same metric is visited twice. Only place the root metric onto the list of metrics. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
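The visited-list approach to cycle detection described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the perf implementation: struct toy_metric, struct toy_visited, toy_find() and toy_resolve() are hypothetical names, metrics refer to each other by name, and the visited list is assumed to track only the current resolution path, so a metric reached twice through different branches is not reported as a cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_REFS 4

/* Hypothetical metric: a name plus the names of the metrics it references. */
struct toy_metric {
	const char *name;
	const char *refs[TOY_MAX_REFS];	/* unused slots are NULL */
};

/* One node per metric on the current resolution path, kept on the stack. */
struct toy_visited {
	const char *name;
	const struct toy_visited *parent;
};

static const struct toy_metric *toy_find(const struct toy_metric *tbl,
					 size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (!strcmp(tbl[i].name, name))
			return &tbl[i];
	}
	return NULL;
}

/*
 * Resolve 'name' and everything it references.
 * Returns 0 on success, -1 if a metric is unknown or if the same metric
 * appears twice on the path from the root (a cycle).
 */
static int toy_resolve(const struct toy_metric *tbl, size_t n,
		       const char *name, const struct toy_visited *path)
{
	const struct toy_metric *m;
	struct toy_visited here = { .name = name, .parent = path };

	assert(tbl && name);

	/* Walk back towards the root looking for the same metric. */
	for (const struct toy_visited *v = path; v; v = v->parent) {
		if (!strcmp(v->name, name))
			return -1;
	}

	m = toy_find(tbl, n, name);
	if (!m)
		return -1;

	for (size_t i = 0; i < TOY_MAX_REFS && m->refs[i]; i++) {
		if (toy_resolve(tbl, n, m->refs[i], &here))
			return -1;
	}
	return 0;
}
```

Calling toy_resolve() with a NULL path starts a resolution at a root metric. Because each visited node lives in the caller's stack frame, no separate per-ID parent bookkeeping has to be allocated or freed.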
2021-10-20 perf metric: Only add a referenced metric once Ian Rogers 1 -3/+9
If a metric references other metrics then the same other metrics may be referenced more than once, but the events and metric ref are only needed once. An example of this is in tests/parse-metric.c where DCache_L2_Hits references the metric DCache_L2_All_Hits twice, once directly and once through DCache_L2_All. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-10-20 perf metric: Add metric new() and free() methods Ian Rogers 1 -62/+75
Metrics are complex enough that a new/free reduces the risk of memory leaks. Move static functions used in new. Reviewed-by: John Garry <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrew Kilroy <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Changbin Du <[email protected]> Cc: Denys Zagorui <[email protected]> Cc: Fabian Hemmer <[email protected]> Cc: Felix Fietkau <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jacob Keller <[email protected]> Cc: Jiapeng Chong <[email protected]> Cc: Jin Yao <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Joakim Zhang <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kees Kook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nicholas Fraser <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paul Clarke <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: ShihCheng Tu <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Sumanth Korikkar <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Wan Jiabing <[email protected]> Cc: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
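As a rough illustration of why a paired new/free helps, the sketch below gives a small struct with two owned strings a single constructor and a single destructor. The names struct owned_metric, owned_metric__new() and owned_metric__free() are hypothetical and the fields are an assumption; the real 'struct metric' in perf holds more state. The point is only that every error path in the constructor funnels through the one free function.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object that owns copies of its strings. */
struct owned_metric {
	char *name;
	char *expr;
};

/* Release everything the object owns; safe to call with NULL. */
static void owned_metric__free(struct owned_metric *m)
{
	if (!m)
		return;
	free(m->name);
	free(m->expr);
	free(m);
}

/* Allocate and fill an object, or return NULL with nothing leaked. */
static struct owned_metric *owned_metric__new(const char *name,
					      const char *expr)
{
	struct owned_metric *m;

	assert(name && expr);

	/* calloc() zeroes the fields so a partial object can be freed. */
	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;

	m->name = strdup(name);
	m->expr = strdup(expr);
	if (!m->name || !m->expr) {
		owned_metric__free(m);
		return NULL;
	}
	return m;
}
```

A caller pairs each successful owned_metric__new() with exactly one owned_metric__free(), and does not need to know which fields were allocated.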