aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-04-29MAINTAINERS: Add Jianjun Wang as MediaTek PCI co-maintainerJianjun Wang1-0/+1
Update entry for MediaTek PCIe controller, add Jianjun Wang as MediaTek PCI co-maintainer. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jianjun Wang <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Acked-by: Ryder Lee <[email protected]>
2021-04-29PCI: mediatek-gen3: Add system PM supportJianjun Wang1-0/+113
Add suspend_noirq and resume_noirq callback functions to implement PM system suspend and resume hooks for the MediaTek Gen3 PCIe controller. When the system suspends, trigger the PCIe link to enter the L2 state and pull down the PERST# pin, gating the clocks of the MAC layer, and then power-off the physical layer to provide power-saving. When the system resumes, the PCIe link should be re-established and the related control register values should be restored. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jianjun Wang <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Acked-by: Ryder Lee <[email protected]>
2021-04-29PCI: mediatek-gen3: Add MSI supportJianjun Wang1-0/+276
Add MSI support for MediaTek Gen3 PCIe controller. This PCIe controller supports up to 256 MSI vectors, the MSI hardware block diagram is as follows: +-----+ | GIC | +-----+ ^ | port->irq | +-+-+-+-+-+-+-+-+ |0|1|2|3|4|5|6|7| (PCIe intc) +-+-+-+-+-+-+-+-+ ^ ^ ^ | | ... | +-------+ +------+ +-----------+ | | | +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ |0|1|...|30|31| |0|1|...|30|31| |0|1|...|30|31| (MSI sets) +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ | | | | | | | | | | | | (MSI vectors) | | | | | | | | | | | | (MSI SET0) (MSI SET1) ... (MSI SET7) With 256 MSI vectors supported, the MSI vectors are composed of 8 sets, each set has its own address for MSI message, and supports 32 MSI vectors to generate interrupt. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jianjun Wang <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Acked-by: Ryder Lee <[email protected]>
2021-04-29PCI: mediatek-gen3: Add INTx supportJianjun Wang1-0/+172
Add INTx support for MediaTek Gen3 PCIe controller. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jianjun Wang <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Acked-by: Ryder Lee <[email protected]>
2021-04-29PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192Jianjun Wang3-0/+480
MediaTek's PCIe host controller has three generation HWs, the new generation HW is an individual bridge, it supports Gen3 speed and compatible with Gen2, Gen1 speed. Add support for new Gen3 controller which can be found on MT8192. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jianjun Wang <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Acked-by: Ryder Lee <[email protected]>
2021-04-29PCI: dwc: Move iATU detection earlierHou Zhiqiang4-3/+12
dw_pcie_ep_init() depends on the detected iATU region numbers to allocate the in/outbound window management bitmap. It fails after 281f1f99cf3a ("PCI: dwc: Detect number of iATU windows"). Move the iATU region detection into a new function, move the detection to the very beginning of dw_pcie_host_init() and dw_pcie_ep_init(). Also remove it from the dw_pcie_setup(), since it's more like a software initialization step than hardware setup. Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-pci/[email protected] Link: https://lore.kernel.org/r/[email protected] Fixes: 281f1f99cf3a ("PCI: dwc: Detect number of iATU windows") Tested-by: Kunihiko Hayashi <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Tested-by: Manivannan Sadhasivam <[email protected]> Signed-off-by: Hou Zhiqiang <[email protected]> [DB: moved dw_pcie_iatu_detect to happen after host_init callback] Signed-off-by: Dmitry Baryshkov <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Reviewed-by: Rob Herring <[email protected]> Cc: [email protected] # v5.11+ Cc: Marek Szyprowski <[email protected]>
2021-04-29PCI: dwc/intel-gw: Remove unused functionJiapeng Chong1-5/+0
Fix the following clang warning: drivers/pci/controller/dwc/pcie-intel-gw.c:84:19: warning: unused function 'pcie_app_rd' [-Wunused-function]. Link: https://lore.kernel.org/r/1618475577-99198-1-git-send-email-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Reviewed-by: Krzysztof Wilczyński <[email protected]>
2021-04-29PCI: dwc: Move dw_pcie_msi_init() to dw_pcie_setup_rc()Jisheng Zhang1-1/+2
If the host which makes use of IP's integrated MSI Receiver losts power during suspend, we need to reinit the RC and MSI Receiver in resume. But after we move dw_pcie_msi_init() into the core, we have no API to do so. Usually the dwc users need to call dw_pcie_setup_rc() to reinit the RC, we can solve this problem by moving dw_pcie_msi_init() to dw_pcie_setup_rc(). Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jisheng Zhang <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Reviewed-by: Rob Herring <[email protected]>
2021-04-29PCI/sysfs: Use sysfs_emit() and sysfs_emit_at() in "show" functionsKrzysztof Wilczyński2-42/+40
The sysfs_emit() and sysfs_emit_at() functions were introduced to make it less ambiguous which function is preferred when writing to the output buffer in a device attribute's "show" callback [1]. Convert the PCI sysfs object "show" functions from sprintf(), snprintf() and scnprintf() to sysfs_emit() and sysfs_emit_at() accordingly, as the latter is aware of the PAGE_SIZE buffer and correctly returns the number of bytes written into the buffer. No functional change intended. [1] Documentation/filesystems/sysfs.rst [bhelgaas: drop dsm_label_utf16s_to_utf8s(), link speed/width changes] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2021-04-29PCI/sysfs: Rearrange smbios_attr_group and acpi_attr_groupKrzysztof Wilczyński1-26/+26
Collect the smbios_attr_group and acpi_attr_group together in the logical order. No functional change intended. [bhelgaas: split to separate patch] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2021-04-29PCI/sysfs: Tidy SMBIOS & ACPI label attributesKrzysztof Wilczyński1-23/+10
Update coding style to reduce distraction. No functional change intended. [bhelgaas: split to separate patch] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2021-04-29PCI/sysfs: Convert "index", "acpi_index", "label" to static attributesKrzysztof Wilczyński3-77/+16
The "label", "index", and "acpi_index" sysfs attributes show firmware label information about the device. If the ACPI Device Name _DSM is implemented for the device, we have: label Device name (optional, may be null) acpi_index Instance number (unique under \_SB scope) When there is no ACPI _DSM and SMBIOS provides an Onboard Devices structure for the device, we have: label Reference Designation, e.g., a silkscreen label index Device Type Instance Previously these attributes were dynamically created either by pci_bus_add_device() or the pci_sysfs_init() initcall, but since they don't need to be created or removed dynamically, we can use a static attribute so the device model takes care of addition and removal automatically. Convert "label", "index", and "acpi_index" to static attributes. Presence of the ACPI _DSM (device_has_acpi_name()) determines whether the ACPI information (label, acpi_index) or the SMBIOS information (label, index) is visible. [bhelgaas: commit log, split to separate patch, add "pci_dev_" prefix] Suggested-by: Oliver O'Halloran <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2021-04-29PCI/sysfs: Define SMBIOS label attributes with DEVICE_ATTR*()Krzysztof Wilczyński1-23/+18
Use DEVICE_ATTR*() to simplify definition of the SMBIOS label attributes. No functional change intended. Note that dev_attr_smbios_label requires __ATTR() because the "label" attribute can be exposed via either ACPI or SMBIOS, and we already have the ACPI label_show() function in this file. [bhelgaas: split to separate patch] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2021-04-29PCI/sysfs: Define ACPI label attributes with DEVICE_ATTR*()Krzysztof Wilczyński1-23/+15
Use DEVICE_ATTR*() to simplify definitions of the ACPI label attributes. No functional change intended. [bhelgaas: split to separate patch] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2021-04-29PCI/sysfs: Rename device_has_dsm() to device_has_acpi_name()Krzysztof Wilczyński1-20/+19
Rename device_has_dsm() to device_has_acpi_name() to better reflect its purpose and move it earlier so it's available for a future SMBIOS .is_visible() function. No functional change intended. [bhelgaas: split to separate patch] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2021-04-29PCI/sysfs: Convert "vpd" to static attributeKrzysztof Wilczyński3-46/+18
The "vpd" sysfs attribute allows access to Vital Product Data (VPD). Previously it was dynamically created either by pci_bus_add_device() or the pci_sysfs_init() initcall, but since it doesn't need to be created or removed dynamically, we can use a static attribute so the device model takes care of addition and removal automatically. Convert "vpd" to a static attribute and use the .is_bin_visible() callback to check whether the device supports VPD. Remove pcie_vpd_create_sysfs_dev_files(), pcie_vpd_remove_sysfs_dev_files(), pci_create_capabilities_sysfs(), and pci_create_capabilities_sysfs(), which are no longer needed. [bhelgaas: This is substantially the same as the earlier patch from Heiner Kallweit <[email protected]>. I included Krzysztof's change here so all the "convert to static attribute" changes are together.] [bhelgaas: rename to vpd_read()/vpd_write() and pci_dev_vpd_attr_group] Suggested-by: Oliver O'Halloran <[email protected]> Based-on: https://lore.kernel.org/r/[email protected] Based-on-patch-by: Heiner Kallweit <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Krzysztof Wilczyński <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2021-04-29PCI/sysfs: Rename "vpd" attribute accessorsBjorn Helgaas1-8/+8
Rename "vpd" attribute accessors so they fit with the BIN_ATTR_RW() macro usage. Currently there is no BIN_ATTR_ADMIN_RW() that uses 0600 permissions, but if there were, it would likely use "vpd_read()" and "vpd_write()". No functional change intended. Extracted from the patch mentioned below by Heiner Kallweit <[email protected]>. Link: https://lore.kernel.org/linux-pci/[email protected]/ Signed-off-by: Bjorn Helgaas <[email protected]>
2021-04-29xfs: fix xfs_reflink_unshare usage of filemap_write_and_wait_rangeDarrick J. Wong1-1/+2
The final parameter of filemap_write_and_wait_range is the end of the range to flush, not the length of the range to flush. Fixes: 46afb0628b86 ("xfs: only flush the unshared range in xfs_reflink_unshare") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Chandan Babu R <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2021-04-29xfs: set aside allocation btree blocks from block reservationBrian Foster1-1/+14
The blocks used for allocation btrees (bnobt and countbt) are technically considered free space. This is because as free space is used, allocbt blocks are removed and naturally become available for traditional allocation. However, this means that a significant portion of free space may consist of in-use btree blocks if free space is severely fragmented. On large filesystems with large perag reservations, this can lead to a rare but nasty condition where a significant amount of physical free space is available, but the majority of actual usable blocks consist of in-use allocbt blocks. We have a record of a (~12TB, 32 AG) filesystem with multiple AGs in a state with ~2.5GB or so free blocks tracked across ~300 total allocbt blocks, but effectively at 100% full because the the free space is entirely consumed by refcountbt perag reservation. Such a large perag reservation is by design on large filesystems. The problem is that because the free space is so fragmented, this AG contributes the 300 or so allocbt blocks to the global counters as free space. If this pattern repeats across enough AGs, the filesystem lands in a state where global block reservation can outrun physical block availability. For example, a streaming buffered write on the affected filesystem continues to allow delayed allocation beyond the point where writeback starts to fail due to physical block allocation failures. The expected behavior is for the delalloc block reservation to fail gracefully with -ENOSPC before physical block allocation failure is a possibility. To address this problem, set aside in-use allocbt blocks at reservation time and thus ensure they cannot be reserved until truly available for physical allocation. This allows alloc btree metadata to continue to reside in free space, but dynamically adjusts reservation availability based on internal state. Note that the logic requires that the allocbt counter is fully populated at reservation time before it is fully effective. We currently rely on the mount time AGF scan in the perag reservation initialization code for this dependency on filesystems where it's most important (i.e. with active perag reservations). Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Chandan Babu R <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-04-29xfs: introduce in-core global counter of allocbt blocksBrian Foster3-0/+22
Introduce an in-core counter to track the sum of all allocbt blocks used by the filesystem. This value is currently tracked per-ag via the ->agf_btreeblks field in the AGF, which also happens to include rmapbt blocks. A global, in-core count of allocbt blocks is required to identify the subset of global ->m_fdblocks that consists of unavailable blocks currently used for allocation btrees. To support this calculation at block reservation time, construct a similar global counter for allocbt blocks, populate it on first read of each AGF and update it as allocbt blocks are used and released. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Chandan Babu R <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-04-29xfs: unconditionally read all AGFs on mounts with perag reservationBrian Foster1-11/+23
perag reservation is enabled at mount time on a per AG basis. The upcoming change to set aside allocbt blocks from block reservation requires a populated allocbt counter as soon as possible after mount to be fully effective against large perag reservations. Therefore as a preparation step, initialize the pagf on all mounts where at least one reservation is active. Note that this already occurs to some degree on most default format filesystems as reservation requirement calculations already depend on the AGF or AGI, depending on the reservation type. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Chandan Babu R <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2021-04-29xfs: count free space btree blocks when scrubbing pre-lazysbcount fsesDarrick J. Wong1-1/+38
Since agf_btreeblks didn't exist before the lazysbcount feature, the fs summary count scrubber needs to walk the free space btrees to determine the amount of space being used by those btrees. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Gao Xiang <[email protected]>
2021-04-29xfs: update superblock counters correctly for !lazysbcountDave Chinner2-3/+16
Keep the mount superblock counters up to date for !lazysbcount filesystems so that when we log the superblock they do not need updating in any way because they are already correct. It's found by what Zorro reported: 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev 2. mount $dev $mnt 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load) 4. umount $mnt 5. xfs_repair -n $dev and I've seen no problem with this patch. Signed-off-by: Dave Chinner <[email protected]> Reported-by: Zorro Lang <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2021-04-29xfs: don't check agf_btreeblks on pre-lazysbcount filesystemsDarrick J. Wong2-2/+8
The AGF free space btree block counter wasn't added until the lazysbcount feature was added to XFS midway through the life of the V4 format, so ignore the field when checking. Online AGF repair requires rmapbt, so it doesn't need the feature check. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2021-04-29xfs: remove obsolete AGF counter debuggingDarrick J. Wong6-31/+0
In commit f8f2835a9cf3 we changed the behavior of XFS to use EFIs to remove blocks from an overfilled AGFL because there were complaints about transaction overruns that stemmed from trying to free multiple blocks in a single transaction. Unfortunately, that commit missed a subtlety in the debug-mode transaction accounting when a realtime volume is attached. If a realtime file undergoes a data fork mapping change such that realtime extents are allocated (or freed) in the same transaction that a data device block is also allocated (or freed), we can trip a debugging assertion. This can happen (for example) if a realtime extent is allocated and it is necessary to reshape the bmbt to hold the new mapping. When we go to allocate a bmbt block from an AG, the first thing the data device block allocator does is ensure that the freelist is the proper length. If the freelist is too long, it will trim the freelist to the proper length. In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to record the decrement in the AG free list count. Prior to f8f28 we would put the free block back in the free space btrees in the same transaction, which calls xfs_trans_agblocks_delta() to record the increment in the AG free block count. Since AGFL blocks are included in the global free block count (fdblocks), there is no corresponding fdblocks update, so the AGFL free satisfies the following condition in xfs_trans_apply_sb_deltas: /* * Check that superblock mods match the mods made to AGF counters. */ ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) == (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta + tp->t_ag_btree_delta)); The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is the number blocks that were allocated. After commit f8f28 we defer the block freeing to the next chained transaction, which means that the calls to xfs_trans_agflist_delta and xfs_trans_agblocks_delta occur in separate transactions. The (first) transaction that shortens the free list trips on the comparison, which has now become: (X + 0) == ((X) + -1 + 0) because we haven't freed the AGFL block yet; we've only logged an intention to free it. When the second transaction (the deferred free) commits, it will evaluate the expression as: (0 + 0) == (1 + 0 + 0) and trip over that in turn. At this point, the astute reader may note that the two commits tagged by this patch have been in the kernel for a long time but haven't generated any bug reports. How is it that the author became aware of this bug? This originally surfaced as an intermittent failure when I was testing realtime rmap, but a different bug report by Zorro Lang reveals the same assertion occuring on !lazysbcount filesystems. The common factor to both reports (and why this problem wasn't previously reported) becomes apparent if we consider when xfs_trans_apply_sb_deltas is called by __xfs_trans_commit(): if (tp->t_flags & XFS_TRANS_SB_DIRTY) xfs_trans_apply_sb_deltas(tp); With a modern lazysbcount filesystem, transactions update only the percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence xfs_trans_apply_sb_deltas is rarely called. However, updates to the count of free realtime extents are not part of lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or removing data fork mappings to realtime files; similarly, XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems. Dave mentioned in response to an earlier version of this patch: "IIUC, what you are saying is that this debug code is simply not exercised in normal testing and hasn't been for the past decade? And it still won't be exercised on anything other than realtime device testing? "...it was debugging code from 1994 that was largely turned into dead code when lazysbcounters were introduced in 2007. Hence I'm not sure it holds any value anymore." This debugging code isn't especially helpful - you can modify the flcount on one AG and the freeblks of another AG, and it won't trigger. Add the fact that nobody noticed for a decade, and let's just get rid of it (and start testing realtime :P). This bug was found by running generic/051 on either a V4 filesystem lacking lazysbcount; or a V5 filesystem with a realtime volume. Cc: [email protected], [email protected] Fixes: f8f2835a9cf3 ("xfs: defer agfl block frees when dfops is available") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2021-04-29perf build: Defer printing detected features to the end of all feature checksArnaldo Carvalho de Melo1-0/+7
We were doing it in tools/build/Makefile.feature, after running the feature checks, but then in tools/perf/Makefile.config we can call more feature checks when we notice that some feature check failed, like when libbfd wasn't detected and we add libraries to the LDFLAGS of its feature check to try again, etc. Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29tools build: Allow deferring printing the results of feature detectionArnaldo Carvalho de Melo1-10/+17
By setting FEATURE_DISPLAY_DEFERRED=1 a tool may ask for the printout of the detected features in tools/build/Makefile.feature to be done later adter extra feature checks are done that are tool specific. The perf tool will do it via its tools/perf/Makefile.config, as it performs such extra feature checks. Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf build: Regenerate the FEATURE_DUMP file after extra feature checksJiri Olsa1-0/+6
Feature detection is done in tools/build/Makefile.feature, we may exit there with some features not detected and then, in tools/perf/Makefile.config try adding extra libraries to link and then do extra feature checks to see if we now find the feature. This is the case with the disassembler-four-args that checks if the diassembler() function in libopcodes (binutils) has a signature with one or with four arguments, as this is not ABI and they changed it at some point. This is not a problem when doing normal builds, for instance: $ make -C tools/perf O=/tmp/build/perf As we don't use what is in FEATURE-DUMP at that point, but is a problem if we pass FEATURE_DUMP=/previously-detected-features as we do in 'make -C tools/perf build-test' to reuse the feature detection in the many build combinations we test there. When that is done feature-disassembler-four-args will be set to 0, but opensuse 15.1 has the four arguments function signature in disassembler(). The build thus fails. Fix it by rewriting the FEATURE-DUMP file at the end of tools/perf/Makefile.config to register features we retested in that make file. Signed-off-by: Jiri Olsa <[email protected]> Reported-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf session: Dump PERF_RECORD_TIME_CONV eventLeo Yan3-1/+46
Now perf tool uses the common stub function process_event_op2_stub() for dumping TIME_CONV event, thus it doesn't output the clock parameters contained in the event. This patch adds the callback function for dumping the hardware clock parameters in TIME_CONV event. Before: # perf report -D 0x978 [0x38]: event: 79 . . ... raw event: size 56 bytes . 0000: 4f 00 00 00 00 00 38 00 15 00 00 00 00 00 00 00 O.....8......... . 0010: 00 00 40 01 00 00 00 00 86 89 0b bf df ff ff ff ..@........<BF><DF><FF><FF><FF> . 0020: d1 c1 b2 39 03 00 00 00 ff ff ff ff ff ff ff 00 <D1><C1><B2>9....<FF><FF><FF><FF><FF><FF><FF>. . 0030: 01 01 00 00 00 00 00 00 ........ 0 0 0x978 [0x38]: PERF_RECORD_TIME_CONV : unhandled! [...] After: # perf report -D 0x978 [0x38]: event: 79 . . ... raw event: size 56 bytes . 0000: 4f 00 00 00 00 00 38 00 15 00 00 00 00 00 00 00 O.....8......... . 0010: 00 00 40 01 00 00 00 00 86 89 0b bf df ff ff ff ..@........<BF><DF><FF><FF><FF> . 0020: d1 c1 b2 39 03 00 00 00 ff ff ff ff ff ff ff 00 <D1><C1><B2>9....<FF><FF><FF><FF><FF><FF><FF>. . 0030: 01 01 00 00 00 00 00 00 ........ 0 0 0x978 [0x38]: PERF_RECORD_TIME_CONV ... Time Shift 21 ... Time Muliplier 20971520 ... Time Zero 18446743935180835206 ... Time Cycles 13852918225 ... Time Mask 0xffffffffffffff ... Cap Time Zero 1 ... Cap Time Short 1 : unhandled! [...] Signed-off-by: Leo Yan <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steve MacLean <[email protected]> Cc: Yonatan Goldschmidt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf session: Add swap operation for event TIME_CONVLeo Yan1-1/+14
Since commit d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV"), the event PERF_RECORD_TIME_CONV has extended the data structure for clock parameters. To be backwards-compatible, this patch adds a dedicated swap operation for the event PERF_RECORD_TIME_CONV, based on checking if the event contains field "time_cycles", it can support both for the old and new event formats. Fixes: d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV") Signed-off-by: Leo Yan <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steve MacLean <[email protected]> Cc: Yonatan Goldschmidt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf jit: Let convert_timestamp() to be backwards-compatibleLeo Yan2-10/+22
Commit d110162cafc80dad ("perf tsc: Support cap_user_time_short for event TIME_CONV") supports the extended parameters for event TIME_CONV, but it broke the backwards compatibility, so any perf data file with old event format fails to convert timestamp. This patch introduces a helper event_contains() to check if an event contains a specific member or not. For the backwards-compatibility, if the event size confirms the extended parameters are supported in the event TIME_CONV, then copies these parameters. Committer notes: To make this compiler backwards compatible add this patch: - struct perf_tsc_conversion tc = { 0 }; + struct perf_tsc_conversion tc = { .time_shift = 0, }; Fixes: d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV") Signed-off-by: Leo Yan <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steve MacLean <[email protected]> Cc: Yonatan Goldschmidt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tools: Change fields type in perf_record_time_convLeo Yan1-2/+3
C standard claims "An object declared as type _Bool is large enough to store the values 0 and 1", bool type size can be 1 byte or larger than 1 byte. Thus it's uncertian for bool type size with different compilers. This patch changes the bool type in structure perf_record_time_conv to __u8 type, and pads extra bytes for 8-byte alignment; this can give reliable structure size. Fixes: d110162cafc8 ("perf tsc: Support cap_user_time_short for event TIME_CONV") Suggested-by: Adrian Hunter <[email protected]> Signed-off-by: Leo Yan <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steve MacLean <[email protected]> Cc: Yonatan Goldschmidt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tools: Enable libtraceevent dynamic linkingMichael Petlan5-2/+32
Currently we support only static linking with kernel's libtraceevent (tools/lib/traceevent). This patch adds libtraceevent package detection and support to link perf with it dynamically. The libtraceevent package status is displayed with: $ make VF=1 LIBTRACEEVENT_DYNAMIC=1 ... ... libtraceevent: [ on ] Default behavior remains the same (static linking). Committer testing: $ make LIBTRACEEVENT_DYNAMIC=1 VF=1 O=/tmp/build/perf -C tools/perf install-bin |& grep traceevent Makefile.config:1090: *** Error: No libtraceevent devel library found, please install libtraceevent-devel. Stop. $ Signed-off-by: Michael Petlan <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> LPU-Reference: [email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf Documentation: Document intel-hybrid supportJin Yao3-0/+217
Add some words and examples to help understanding of Intel hybrid perf support. Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tests: Skip 'perf stat metrics (shadow stat) test' for hybridJin Yao1-0/+3
Currently we don't support shadow stat for hybrid. root@ssp-pwrt-002:~# ./perf stat -e cycles,instructions -a -- sleep 1 Performance counter stats for 'system wide': 12,883,109,591 cpu_core/cycles/ 6,405,163,221 cpu_atom/cycles/ 555,553,778 cpu_core/instructions/ 841,158,734 cpu_atom/instructions/ 1.002644773 seconds time elapsed Now there is no shadow stat 'insn per cycle' reported. We will support it later and now just skip the 'perf stat metrics (shadow stat) test'. Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tests: Support 'Convert perf time to TSC' test for hybridJin Yao1-0/+12
Since for "cycles:u' on hybrid platform, it creates two "cycles". So the second evsel in evlist also needs initialization. With this patch, # ./perf test 71 71: Convert perf time to TSC : Ok Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tests: Support 'Session topology' test for hybridJin Yao1-2/+11
Force to create one event "cpu_core/cycles/" by default, otherwise in evlist__valid_sample_type, the checking of 'if (evlist->core.nr_entries == 1)' would be failed. # ./perf test 41 41: Session topology : Ok Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tests: Support 'Parse and process metrics' test for hybridJin Yao1-2/+6
Some events are not supported. Only pick up some cases for hybrid. # ./perf test 68 68: Parse and process metrics : Ok Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tests: Support 'Track with sched_switch' test for hybridJin Yao1-1/+5
Since for "cycles:u' on hybrid platform, it creates two "cycles". So the number of events in evlist is not expected in next test steps. Now we just use one event "cpu_core/cycles:u/" for hybrid. # ./perf test 35 35: Track with sched_switch : Ok Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tests: Skip 'Setup struct perf_event_attr' test for hybridJin Yao1-0/+4
For hybrid, the attr.type consists of pmu type id + original type. There will be much changes for this test. Now we temporarily skip this test case and TODO in future. Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tests: Add hybrid cases for 'Roundtrip evsel->name' testJin Yao1-7/+12
Since for one hw event, two hybrid events are created. For example, evsel->idx evsel__name(evsel) 0 cycles 1 cycles 2 instructions 3 instructions ... So for comparing the evsel name on hybrid, the evsel->idx needs to be divided by 2. # ./perf test 14 14: Roundtrip evsel->name : Ok Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf tests: Add hybrid cases for 'Parse event definition strings' testJin Yao1-0/+171
Add basic hybrid test cases for 'Parse event definition strings' test. # perf test 6 6: Parse event definition strings : Ok Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf record: Uniquify hybrid event nameJin Yao1-0/+28
For perf-record, it would be useful to tell user the pmu which the event belongs to. For example, # perf record -a -- sleep 1 # perf report # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 106 of event 'cpu_core/cycles/' # Event count (approx.): 22043448 # # Overhead Command Shared Object Symbol # ........ ............ ....................... ............................ # ... Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf stat: Warn group events from different hybrid PMUJin Yao6-0/+62
If a group has events which are from different hybrid PMUs, shows a warning: "WARNING: events in group from different hybrid PMUs!" This is to remind the user not to put the core event and atom event into one group. Next, just disable grouping. # perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -a -- sleep 1 WARNING: events in group from different hybrid PMUs! WARNING: grouped events cpus do not match, disabling group: anon group { cpu_core/cycles/, cpu_atom/cycles/ } Performance counter stats for 'system wide': 5,438,125 cpu_core/cycles/ 3,914,586 cpu_atom/cycles/ 1.004250966 seconds time elapsed Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf stat: Filter out unmatched aggregation for hybrid eventJin Yao1-0/+3
perf-stat has supported some aggregation modes, such as --per-core, --per-socket and etc. While for hybrid event, it may only available on part of cpus. So for --per-core, we need to filter out the unavailable cores, for --per-socket, filter out the unavailable sockets, and so on. Before: # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 2 479,530 cpu_core/cycles/ S0-D0-C4 2 175,007 cpu_core/cycles/ S0-D0-C8 2 166,240 cpu_core/cycles/ S0-D0-C12 2 704,673 cpu_core/cycles/ S0-D0-C16 2 865,835 cpu_core/cycles/ S0-D0-C20 2 2,958,461 cpu_core/cycles/ S0-D0-C24 2 163,988 cpu_core/cycles/ S0-D0-C28 2 164,729 cpu_core/cycles/ S0-D0-C32 0 <not counted> cpu_core/cycles/ S0-D0-C33 0 <not counted> cpu_core/cycles/ S0-D0-C34 0 <not counted> cpu_core/cycles/ S0-D0-C35 0 <not counted> cpu_core/cycles/ S0-D0-C36 0 <not counted> cpu_core/cycles/ S0-D0-C37 0 <not counted> cpu_core/cycles/ S0-D0-C38 0 <not counted> cpu_core/cycles/ S0-D0-C39 0 <not counted> cpu_core/cycles/ 1.003597211 seconds time elapsed After: # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 2 210,428 cpu_core/cycles/ S0-D0-C4 2 444,830 cpu_core/cycles/ S0-D0-C8 2 435,241 cpu_core/cycles/ S0-D0-C12 2 423,976 cpu_core/cycles/ S0-D0-C16 2 859,350 cpu_core/cycles/ S0-D0-C20 2 1,559,589 cpu_core/cycles/ S0-D0-C24 2 163,924 cpu_core/cycles/ S0-D0-C28 2 376,610 cpu_core/cycles/ 1.003621290 seconds time elapsed Signed-off-by: Jin Yao <[email protected]> Co-developed-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf stat: Add default hybrid eventsJin Yao1-0/+28
Previously if '-e' is not specified in perf stat, some software events and hardware events are added to evlist by default. Before: # perf stat -a -- sleep 1 Performance counter stats for 'system wide': 24,044.40 msec cpu-clock # 23.946 CPUs utilized 99 context-switches # 4.117 /sec 24 cpu-migrations # 0.998 /sec 3 page-faults # 0.125 /sec 7,000,244 cycles # 0.000 GHz 2,955,024 instructions # 0.42 insn per cycle 608,941 branches # 25.326 K/sec 31,991 branch-misses # 5.25% of all branches 1.004106859 seconds time elapsed Among the events, cycles, instructions, branches and branch-misses are hardware events. One hybrid platform, two hardware events are created for one hardware event. cpu_core/cycles/, cpu_atom/cycles/, cpu_core/instructions/, cpu_atom/instructions/, cpu_core/branches/, cpu_atom/branches/, cpu_core/branch-misses/, cpu_atom/branch-misses/ These events would be added to evlist on hybrid platform. Since parse_events() has been supported to create two hardware events for one event on hybrid platform, so we just use parse_events(evlist, "cycles,instructions,branches,branch-misses") to create the default events and add them to evlist. After: # perf stat -a -- sleep 1 Performance counter stats for 'system wide': 24,043.99 msec cpu-clock # 23.991 CPUs utilized 139 context-switches # 5.781 /sec 25 cpu-migrations # 1.040 /sec 6 page-faults # 0.250 /sec 10,381,751 cpu_core/cycles/ # 431.782 K/sec 1,264,216 cpu_atom/cycles/ # 52.579 K/sec 3,406,958 cpu_core/instructions/ # 141.697 K/sec 414,588 cpu_atom/instructions/ # 17.243 K/sec 705,149 cpu_core/branches/ # 29.327 K/sec 82,358 cpu_atom/branches/ # 3.425 K/sec 40,821 cpu_core/branch-misses/ # 1.698 K/sec 9,086 cpu_atom/branch-misses/ # 377.891 /sec 1.002228863 seconds time elapsed We can see two events are created for one hardware event. One TODO is, the shadow stats looks a bit different, now it's just 'M/sec'. The perf_stat__update_shadow_stats and perf_stat__print_shadow_stats need to be improved in future if we want to get the original shadow stats. Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf record: Create two hybrid 'cycles' events by defaultJin Yao7-9/+77
When evlist is empty, for example no '-e' specified in perf record, one default 'cycles' event is added to evlist. While on hybrid platform, it needs to create two default 'cycles' events. One is for cpu_core, the other is for cpu_atom. This patch actually calls evsel__new_cycles() two times to create two 'cycles' events. # ./perf record -vv -a -- sleep 1 ... ------------------------------------------------------------ perf_event_attr: size 120 config 0x400000000 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 freq 1 precise_ip 3 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 6 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid -1 cpu 8 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 9 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 10 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 11 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 12 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 13 group_fd -1 flags 0x8 = 19 sys_perf_event_open: pid -1 cpu 14 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 21 ------------------------------------------------------------ perf_event_attr: size 120 config 0x800000000 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|CPU|PERIOD read_format ID disabled 1 inherit 1 freq 1 precise_ip 3 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid -1 cpu 17 group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid -1 cpu 18 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid -1 cpu 19 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid -1 cpu 20 group_fd -1 flags 0x8 = 26 sys_perf_event_open: pid -1 cpu 21 group_fd -1 flags 0x8 = 27 sys_perf_event_open: pid -1 cpu 22 group_fd -1 flags 0x8 = 28 sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 29 ------------------------------------------------------------ We have to create evlist-hybrid.c otherwise due to the symbol dependency the perf test python would be failed. Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf parse-events: Support event inside hybrid pmuJin Yao1-0/+63
On hybrid platform, user may want to enable events on one pmu. Following syntax are supported: cpu_core/<event>/ cpu_atom/<event>/ But the syntax doesn't work for cache event. Before: # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1 event syntax error: 'cpu_core/LLC-loads/' \___ unknown term 'LLC-loads' for pmu 'cpu_core' Cache events are a bit complex. We can't create aliases for them. We use another solution. For example, if we use "cpu_core/LLC-loads/", in parse_events_add_pmu(), term->config is "LLC-loads". Then we create a new parser to scan "LLC-loads". The parse_events_add_cache() would be called during parsing. The parse_state->hybrid_pmu_name is used to identify the pmu where the event should be enabled on. After: # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1 Performance counter stats for 'system wide': 24,593 cpu_core/LLC-loads/ 1.003911601 seconds time elapsed Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf parse-events: Compare with hybrid pmu nameJin Yao5-8/+34
On hybrid platform, user may want to enable event only on one pmu. Following syntax will be supported: cpu_core/<event>/ cpu_atom/<event>/ For hardware event, hardware cache event and raw event, two events are created by default. We pass the specified pmu name in parse_state and it would be checked before event creation. So next only the event with the specified pmu would be created. Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-04-29perf parse-events: Create two hybrid raw eventsJin Yao1-1/+37
On hybrid platform, same raw event is possible to be available on both cpu_core pmu and cpu_atom pmu. It's supported to create two raw events for one event encoding. For raw events, the attr.type is PMU type. # perf stat -e r3c -a -vv -- sleep 1 Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 4 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ ... ------------------------------------------------------------ perf_event_attr: type 4 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 19 ------------------------------------------------------------ perf_event_attr: type 8 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 20 ------------------------------------------------------------ ... ------------------------------------------------------------ perf_event_attr: type 8 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 27 r3c: 0: 434449 1001412521 1001412521 r3c: 1: 173162 1001482031 1001482031 r3c: 2: 231710 1001524974 1001524974 r3c: 3: 110012 1001563523 1001563523 r3c: 4: 191517 1001593221 1001593221 r3c: 5: 956458 1001628147 1001628147 r3c: 6: 416969 1001715626 1001715626 r3c: 7: 1047527 1001596650 1001596650 r3c: 8: 103877 1001633520 1001633520 r3c: 9: 70571 1001637898 1001637898 r3c: 10: 550284 1001714398 1001714398 r3c: 11: 1257274 1001738349 1001738349 r3c: 12: 107797 1001801432 1001801432 r3c: 13: 67471 1001836281 1001836281 r3c: 14: 286782 1001923161 1001923161 r3c: 15: 815509 1001952550 1001952550 r3c: 0: 95994 1002071117 1002071117 r3c: 1: 105570 1002142438 1002142438 r3c: 2: 115921 1002189147 1002189147 r3c: 3: 72747 1002238133 1002238133 r3c: 4: 103519 1002276753 1002276753 r3c: 5: 121382 1002315131 1002315131 r3c: 6: 80298 1002248050 1002248050 r3c: 7: 466790 1002278221 1002278221 r3c: 6821369 16026754282 16026754282 r3c: 1162221 8017758990 8017758990 Performance counter stats for 'system wide': 6,821,369 cpu_core/r3c/ 1,162,221 cpu_atom/r3c/ 1.002289965 seconds time elapsed Signed-off-by: Jin Yao <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>