Age | Commit message (Collapse) | Author | Files | Lines |
|
Capitalize topdown metrics' names to follow the intel SDM.
Before:
# ./perf stat -a sleep 1
Performance counter stats for 'system wide':
228,094.05 msec cpu-clock # 225.026 CPUs utilized
842 context-switches # 3.691 /sec
224 cpu-migrations # 0.982 /sec
70 page-faults # 0.307 /sec
23,164,105 cycles # 0.000 GHz
29,403,446 instructions # 1.27 insn per cycle
5,268,185 branches # 23.097 K/sec
33,239 branch-misses # 0.63% of all branches
136,248,990 slots # 597.337 K/sec
32,976,450 topdown-retiring # 24.2% retiring
4,651,918 topdown-bad-spec # 3.4% bad speculation
26,148,695 topdown-fe-bound # 19.2% frontend bound
72,515,776 topdown-be-bound # 53.2% backend bound
6,008,540 topdown-heavy-ops # 4.4% heavy operations # 19.8% light operations
3,934,049 topdown-br-mispredict # 2.9% branch mispredict # 0.5% machine clears
16,655,439 topdown-fetch-lat # 12.2% fetch latency # 7.0% fetch bandwidth
41,635,972 topdown-mem-bound # 30.5% memory bound # 22.7% Core bound
1.013634593 seconds time elapsed
After:
# ./perf stat -a sleep 1
Performance counter stats for 'system wide':
228,081.94 msec cpu-clock # 225.003 CPUs utilized
824 context-switches # 3.613 /sec
224 cpu-migrations # 0.982 /sec
67 page-faults # 0.294 /sec
22,647,423 cycles # 0.000 GHz
28,870,551 instructions # 1.27 insn per cycle
5,167,099 branches # 22.655 K/sec
32,383 branch-misses # 0.63% of all branches
133,411,074 slots # 584.926 K/sec
32,352,607 topdown-retiring # 24.3% Retiring
4,456,977 topdown-bad-spec # 3.3% Bad Speculation
25,626,487 topdown-fe-bound # 19.2% Frontend Bound
70,955,316 topdown-be-bound # 53.2% Backend Bound
5,834,844 topdown-heavy-ops # 4.4% Heavy Operations # 19.9% Light Operations
3,738,781 topdown-br-mispredict # 2.8% Branch Mispredict # 0.5% Machine Clears
16,286,803 topdown-fetch-lat # 12.2% Fetch Latency # 7.0% Fetch Bandwidth
40,802,069 topdown-mem-bound # 30.6% Memory Bound # 22.6% Core Bound
1.013683125 seconds time elapsed
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Xing Zhengjun <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The recent kernel added lost count can be read from either read(2) or
ring buffer data with PERF_SAMPLE_READ. As it's a variable length data
we need to access it according to the format info.
But for perf tools use cases, PERF_FORMAT_ID is always set. So we can
only check PERF_FORMAT_LOST bit to determine the data format.
Add sample_read_value_size() and next_sample_read_value() helpers to
make it a bit easier to access. Use them in all places where it reads
the struct sample_read_value.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
A mask encoding of a cpu map is laid out as:
u16 nr
u16 long_size
unsigned long mask[];
However, the mask may be 8-byte aligned meaning there is a 4-byte pad
after long_size. This means 32-bit and 64-bit builds see the mask as
being at different offsets. On top of this the structure is in the byte
data[] encoded as:
u16 type
char data[]
This means the mask's struct isn't the required 4 or 8 byte aligned, but
is offset by 2. Consequently the long reads and writes are causing
undefined behavior as the alignment is broken.
Fix the mask struct by creating explicit 32 and 64-bit variants, use a
union to avoid data[] and casts; the struct must be packed so the
layout matches the existing perf.data layout. Taking an address of a
member of a packed struct breaks alignment so pass the packed
perf_record_cpu_map_data to functions, so they can access variables with
the right alignment.
As the 64-bit version has 4 bytes of padding, optimizing writing to only
write the 32-bit version.
Committer notes:
Disable warnings about 'packed' that break the build in some arches like
riscv64, but just around that specific struct.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Athira Jajeev <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Dave Marchevsky <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf_cpu_map__max() computes the cpumap's maximum value, no need to
iterate over all values.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Athira Jajeev <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Dave Marchevsky <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Make the cpumap arguments const to make it clearer they are in rather
than out arguments. Make two functions static and remove external
declarations.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Athira Jajeev <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Dave Marchevsky <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Kees Kook <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The pmu_event passed to the pmu_events_table_for_each_event is invalid
after the loop. Copy the entire struct in metricgroup__find_metric.
Reduce the scope of this function to static.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Hide that the pmu_event structs are an array with a new wrapper struct.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The current code assumes that a struct pmu_event can be iterated over
forward until a NULL pmu_event is encountered.
This makes it difficult to refactor pmu_event.
Add a loop function taking a callback function that's passed the struct
pmu_event.
This way the pmu_event is only needed for one element and not an entire
array.
Switch existing code iterating over the pmu_event arrays to use the new
loop function pmu_events_table_for_each_event.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Move usage of the table to pmu-events.c so it may be hidden. By
abstracting the table the implementation can later be changed.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Preparation for hiding pmu_events_map as an implementation detail. While
the map is passed, the table of events is all that is normally wanted.
While modifying the function's types, rename pmu_events_map__find to
pmu_events_table__find to match later encapsulation. Similarly rename
pmu_add_cpu_aliases_map to pmu_add_cpu_aliases_table.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Move usage of the table to pmu-events.c so it may be hidden. By
abstracting the table the implementation can later be changed.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Delete the repeated word "into" in comments.
Signed-off-by: shaomin Deng <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Delete the repeated word "to" in comments.
Signed-off-by: shaomin Deng <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When ending a 'perf record' session, the querying of a debuginfod server
can take quite some time. Inform a user about it when -v options is
used.
Signed-off-by: Martin Liška <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use tabs instead of 8 spaces for the indentation.
Signed-off-by: Martin Liška <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since the flag PERF_MEM_SNOOPX_PEER is added to support cache snooping
from peer cache line, it can come from a peer core, a peer cluster, or
a remote NUMA node.
This patch adds statistics for the flag PERF_MEM_SNOOPX_PEER. Note, we
take PERF_MEM_SNOOPX_PEER as an affiliated info, it needs to cooperate
with cache level statistics. Therefore, we account the load operations
for both the cache level's metrics (e.g. ld_l2hit, ld_llchit, etc.) and
peer related metrics when flag PERF_MEM_SNOOPX_PEER is set.
So three new metrics are introduced: 'lcl_peer' is for local cache
access, the metric 'rmt_peer' is for remote access (includes remote DRAM
and any caches in remote node), and the metric 'tot_peer' is accounting
the sum value of 'lcl_peer' and 'rmt_peer'.
Reviewed-by: Ali Saidi <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Like Xu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Timothy Hayes <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
the same encoding. I can't find encoding information for any other SPE
implementations to unify their choices with Arm's thus that is left for
future work.
This change populates the mem_lvl_num for Neoverse cores as well as the
deprecated mem_lvl namespace.
Reviewed-by: German Gomez <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: Ali Saidi <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Like Xu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Timothy Hayes <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leo Yan <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since PERF_MEM_SNOOPX_PEER flag is a new snoop type, print this flag if
it is set.
Before:
memstress 3603 [020] 122.463754: 1 l1d-miss: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 l1d-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 llc-miss: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 llc-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 tlb-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 memory: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
After:
memstress 3603 [020] 122.463754: 1 l1d-miss: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 l1d-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 llc-miss: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 llc-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 tlb-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 memory: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
Reviewed-by: Ali Saidi <[email protected]>
Reviewed-by: Kajol Jain <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Like Xu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Timothy Hayes <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
asm/sysreg.h when building arm_spe.h
This cures a current problem where tools/perf/util/arm-spe.c isn't
finding a ARM64 specific asm header, so lets add it for now to make
progress.
Adding a .o specific rule seems clunky, lets try and find if this is
really the right solution.
Signed-off-by: Leo Yan <[email protected]>
Reported-by: Suzuki K Poulose <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: James Morse <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When -p option used or a workload is given, it needs to handle child
processes. The perf_event can inherit those task events
automatically. We can add a new BPF program in task_newtask
tracepoint to track child processes.
Before:
$ sudo perf record --off-cpu -- perf bench sched messaging
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 1
After:
$ sudo perf record -a --off-cpu -- perf bench sched messaging
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 856
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Blake Jones <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The current target code uses thread id for tracking tasks because
perf_events need to be opened for each task. But we can use tgid in
BPF maps and check it easily.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Blake Jones <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Current task filter checks task->pid which is different for each
thread. But we want to profile all the threads in the process. So
let's compare process id (or thread-group id: tgid) instead.
Before:
$ sudo perf record --off-cpu -- perf bench sched messaging -t
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 2
After:
$ sudo perf record --off-cpu -- perf bench sched messaging -t
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 850
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Blake Jones <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Many cases do not use the extra error information provided by
parse_events and instead pass NULL as the struct parse_events_error
pointer. Add a wrapper for those cases so that the pointer is never
NULL.
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
parse_events() is often called with parse_events_error set to NULL.
Make parse_events_error__handle() not segfault in that case.
A subsequent patch changes to avoid passing NULL in the first place.
Fixes: 43eb05d066795bdf ("perf tests: Support 'Track with sched_switch' test for hybrid")
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add missing free of machine->kallsyms_filename to machine__exit().
Fixes: a5367ecb5353fbf2 ("perf tools: Automatically use guest kcore_dir if present")
Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If a memory allocation fail, we should branch to the error handling path
in order to free some resources allocated a few lines above.
Fixes: 15354d54698648e2 ("perf probe: Generate event name with line number")
Signed-off-by: Christophe JAILLET <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/b71bcb01fa0c7b9778647235c3ab490f699ba278.1659797452.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Some processes store jitted code in memfd mappings to avoid having rwx
mappings. These processes map the code with a writeable mapping and a
read-execute mapping. They write the code using the writeable mapping
and then unmap the writeable mapping. All subsequent execution is
through the read-execute mapping.
perf inject --jit ignores //anon* mappings for each process where a
jitdump is present because it expects to inject mmap events for each
jitted code range, and said jitted code ranges will overlap with the
//anon* mappings.
Ignore /memfd: and [anon:* mappings so that jitted code contained in
/memfd: and [anon:* mappings is treated the same way as jitted code
contained in //anon* mappings.
Signed-off-by: Brian Robbins <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
CSV output is tricky to format and column layout changes are susceptible
to breaking parsers. New JSON-formatted output has variable names to
identify fields that are consistent and informative, making the output
parseable.
CSV output example:
1.20,msec,task-clock:u,1204272,100.00,0.697,CPUs utilized
0,,context-switches:u,1204272,100.00,0.000,/sec
0,,cpu-migrations:u,1204272,100.00,0.000,/sec
70,,page-faults:u,1204272,100.00,58.126,K/sec
JSON output example:
{"counter-value" : "3805.723968", "unit" : "msec", "event" :
"cpu-clock", "event-runtime" : 3805731510100.00, "pcnt-running"
: 100.00, "metric-value" : 4.007571, "metric-unit" : "CPUs utilized"}
{"counter-value" : "6166.000000", "unit" : "", "event" :
"context-switches", "event-runtime" : 3805723045100.00, "pcnt-running"
: 100.00, "metric-value" : 1.620191, "metric-unit" : "K/sec"}
{"counter-value" : "466.000000", "unit" : "", "event" :
"cpu-migrations", "event-runtime" : 3805727613100.00, "pcnt-running"
: 100.00, "metric-value" : 122.447136, "metric-unit" : "/sec"}
{"counter-value" : "208.000000", "unit" : "", "event" :
"page-faults", "event-runtime" : 3805726799100.00, "pcnt-running"
: 100.00, "metric-value" : 54.654516, "metric-unit" : "/sec"}
Also added documentation for JSON option.
There is some tidy up of CSV code including a potential memory over run
in the os.nfields set up. To facilitate this an AGGR_MAX value is added.
Committer notes:
Fixed up using PRIu64 to format u64 values, not %lu.
Committer testing:
⬢[acme@toolbox perf]$ perf stat -j sleep 1
{"counter-value" : "0.731750", "unit" : "msec", "event" : "task-clock:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000731, "metric-unit" : "CPUs utilized"}
{"counter-value" : "0.000000", "unit" : "", "event" : "context-switches:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
{"counter-value" : "0.000000", "unit" : "", "event" : "cpu-migrations:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
{"counter-value" : "75.000000", "unit" : "", "event" : "page-faults:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 102.494021, "metric-unit" : "K/sec"}
{"counter-value" : "578765.000000", "unit" : "", "event" : "cycles:u", "event-runtime" : 379366, "pcnt-running" : 49.00, "metric-value" : 0.790933, "metric-unit" : "GHz"}
{"counter-value" : "1298.000000", "unit" : "", "event" : "stalled-cycles-frontend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.224271, "metric-unit" : "frontend cycles idle"}
{"counter-value" : "21984.000000", "unit" : "", "event" : "stalled-cycles-backend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 3.798433, "metric-unit" : "backend cycles idle"}
{"counter-value" : "468197.000000", "unit" : "", "event" : "instructions:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.808959, "metric-unit" : "insn per cycle"}
{"metric-value" : 0.046955, "metric-unit" : "stalled cycles per insn"}
{"counter-value" : "103335.000000", "unit" : "", "event" : "branches:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 141.216262, "metric-unit" : "M/sec"}
{"counter-value" : "2381.000000", "unit" : "", "event" : "branch-misses:u", "event-runtime" : 388654, "pcnt-running" : 50.00, "metric-value" : 2.304156, "metric-unit" : "of all branches"}
⬢[acme@toolbox perf]$
Signed-off-by: Claire Jensen <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alyssa Ross <[email protected]>
Cc: Claire Jensen <[email protected]>
Cc: Florian Fischer <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Like Xu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ian Rogers <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Like the normal 'perf lock contention' output, it'd print the number of
lost entries for BPF if exists or -v option is passed.
Currently it uses BROKEN_CONTENDED stat for the lost count (due to full
stack maps).
$ sudo perf lock con -a -b --map-nr-entries 128 sleep 5
...
=== output for debug===
bad: 43, total: 14903
bad rate: 0.29 %
histogram of events caused bad sequence
acquire: 0
acquired: 0
contended: 43
release: 0
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Blake Jones <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The --map-nr-entries option is to control number of max entries in the
perf lock contention BPF maps.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Blake Jones <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The lock_contention struct is to carry related fields together and to
minimize the change when we add new config options.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Blake Jones <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
First noticed with fedora:rawhide:
48 11.10 fedora:rawhide : FAIL gcc version 12.1.1 20220628 (Red Hat 12.1.1-3) (GCC)
util/scripting-engines/trace-event-python.c: In function 'python_start_script':
util/scripting-engines/trace-event-python.c:1899:9: error: 'PySys_SetArgv' is deprecated [-Werror=deprecated-declarations]
1899 | PySys_SetArgv(argc + 1, command_line);
No time now to address this warning, so don't make it an error, in time
we should either add yet more ifdefs to continue supporting older
systems or just convert to whatever new infra python put in place for
argv processing, sigh.
Acked-by: Ian Rogers <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When genelf was introduced it tested for HAVE_LIBCRYPTO not
HAVE_LIBCRYPTO_SUPPORT, which is the define the feature test for openssl
defines, fix it.
This also adds disables the deprecation warning, someone has to fix this
to build with openssl 3.0 before the warning becomes a hard error.
Fixes: 9b07e27f88b9cd78 ("perf inject: Add jitdump mmap injection support")
Reported-by: 谭梓煊 <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Move print_*_events functions out of parse-events.c into a new
print-events.c. Move tracepoint code into tracepoint.c or
trace-event-info.c (sole user). This reduces the dependencies of
parse-events.c and makes it more amenable to being a library in the
future.
Remove some unnecessary definitions from parse-events.h. Fix a
checkpatch.pl warning on using unsigned rather than unsigned int. Fix
some line length warnings too.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Add include linux/stddef.h before perf_events.h for systems where __always_inline isn't pulled in before used, such as older Alpine Linux ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Adding a #define to side-effect a local include isn't clean, for
example, it inhibits header precompilation. YY_EXTRA_TYPE is
defined to be void* by default, so just remove.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
binutils changed the signature of init_disassemble_info(), which now causes
compilation failures for tools/perf/util/annotate.c, e.g. on debian
unstable.
Relevant binutils commit:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=60a3da00bd5407f07
Wire up the feature test and switch to init_disassemble_info_compat(),
which were introduced in prior commits, fixing the compilation failure.
I verified that perf can still disassemble bpf programs by using bpftrace
under load, recording a perf trace, and then annotating the bpf "function"
with and without the changes. With old binutils there's no change in output
before/after this patch. When comparing the output from old binutils (2.35)
to new bintuils with the patch (upstream snapshot) there are a few output
differences, but they are unrelated to this patch. An example hunk is:
1.15 : 55:mov %rbp,%rdx
0.00 : 58:add $0xfffffffffffffff8,%rdx
0.00 : 5c:xor %ecx,%ecx
- 1.03 : 5e:callq 0xffffffffe12aca3c
+ 1.03 : 5e:call 0xffffffffe12aca3c
0.00 : 63:xor %eax,%eax
- 2.18 : 65:leaveq
- 2.82 : 66:retq
+ 2.18 : 65:leave
+ 2.82 : 66:ret
Signed-off-by: Andres Freund <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Some functions we use for bpf prologue generation are going to be
deprecated. This change reworks current code not to use them.
We need to replace following functions/struct:
bpf_program__set_prep
bpf_program__nth_fd
struct bpf_prog_prep_result
Currently we use bpf_program__set_prep to hook perf callback before
program is loaded and provide new instructions with the prologue.
We replace this function/ality by taking instructions for specific
program, attaching prologue to them and load such new ebpf programs
with prologue using separate bpf_prog_load calls (outside libbpf
load machinery).
Before we can take and use program instructions, we need libbpf to
actually load it. This way we get the final shape of its instructions
with all relocations and verifier adjustments).
There's one glitch though.. perf kprobe program already assumes
generated prologue code with proper values in argument registers,
so loading such program directly will fail in the verifier.
That's where the fallback pre-load handler fits in and prepends
the initialization code to the program. Once such program is loaded
we take its instructions, cut off the initialization code and prepend
the prologue.
I know.. sorry ;-)
To have access to the program when loading this patch adds support to
register 'fallback' section handler to take care of perf kprobe programs.
The fallback means that it handles any section definition besides the
ones that libbpf handles.
The handler serves two purposes:
- allows perf programs to have special arguments in section name
- allows perf to use pre-load callback where we can attach init
code (zeroing all argument registers) to each perf program
Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The libbpf is switching off support for legacy map definitions [1],
which will break the perf llvm tests.
Moving the base source map definition to BTF-defined, so we need
to use -g compile option for to add debug/BTF info.
[1] https://lore.kernel.org/bpf/[email protected]/
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The perf jvmti agent doesn't create program headers, in this case
fallback on section headers as happened previously.
Committer notes:
To test this, from a public post by Ian:
1) download a Java workload dacapo-9.12-MR1-bach.jar from
https://sourceforge.net/projects/dacapobench/
2) build perf such as "make -C tools/perf O=/tmp/perf NO_LIBBFD=1" it
should detect Java and create /tmp/perf/libperf-jvmti.so
3) run perf with the jvmti agent:
perf record -k 1 java -agentpath:/tmp/perf/libperf-jvmti.so -jar dacapo-9.12-MR1-bach.jar -n 10 fop
4) run perf inject:
perf inject -i perf.data -o perf-injected.data -j
5) run perf report
perf report -i perf-injected.data | grep org.apache.fop
With this patch reverted I see lots of symbols like:
0.00% java jitted-388040-4656.so [.] org.apache.fop.fo.FObj.bind(org.apache.fop.fo.PropertyList)
With the patch (2d86612aacb7805f ("perf symbol: Correct address for bss
symbols")) I see lots of:
dso__load_sym_internal: failed to find program header for symbol:
Lorg/apache/fop/fo/FObj;bind(Lorg/apache/fop/fo/PropertyList;)V
st_value: 0x40
Fixes: 2d86612aacb7805f ("perf symbol: Correct address for bss symbols")
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add -a/--all-cpus and -C/--cpu options for cpu filtering. Also -p/--pid
and --tid options are added for task filtering. The short -t option is
taken for --threads already. Tracking the command line workload is
possible as well.
$ sudo perf lock contention -a -b sleep 1
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Blake Jones <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add -b/--use-bpf option to use BPF to collect lock contention stats.
For simplicity it now runs system-wide and requires C-c to stop.
Upcoming changes will add the usual filtering.
$ sudo perf lock con -b
^C
contended total wait max wait avg wait type caller
42 192.67 us 13.64 us 4.59 us spinlock queue_work_on+0x20
23 85.54 us 10.28 us 3.72 us spinlock worker_thread+0x14a
6 13.92 us 6.51 us 2.32 us mutex kernfs_iop_permission+0x30
3 11.59 us 10.04 us 3.86 us mutex kernfs_dop_revalidate+0x3c
1 7.52 us 7.52 us 7.52 us spinlock kthread+0x115
1 7.24 us 7.24 us 7.24 us rwlock:W sys_epoll_wait+0x148
2 7.08 us 3.99 us 3.54 us spinlock delayed_work_timer_fn+0x1b
1 6.41 us 6.41 us 6.41 us spinlock idle_balance+0xa06
2 2.50 us 1.83 us 1.25 us mutex kernfs_iop_lookup+0x2f
1 1.71 us 1.71 us 1.71 us mutex kernfs_iop_getattr+0x2c
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Blake Jones <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up the fixes that went upstream via acme/perf/urgent and to get
to v5.19.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Topdown metrics are missed in the default perf stat on the hybrid machine,
add Topdown metrics in default perf stat for hybrid systems.
Currently, we support the perf metrics Topdown for the p-core PMU in the
perf stat default, the perf metrics Topdown support for e-core PMU will be
implemented later separately. Refactor the code adds two x86 specific
functions. Widen the size of the event name column by 7 chars, so that all
metrics after the "#" become aligned again.
The perf metrics topdown feature is supported on the cpu_core of ADL. The
dedicated perf metrics counter and the fixed counter 3 are used for the
topdown events. Adding the topdown metrics doesn't trigger multiplexing.
Before:
# ./perf stat -a true
Performance counter stats for 'system wide':
53.70 msec cpu-clock # 25.736 CPUs utilized
80 context-switches # 1.490 K/sec
24 cpu-migrations # 446.951 /sec
52 page-faults # 968.394 /sec
2,788,555 cpu_core/cycles/ # 51.931 M/sec
851,129 cpu_atom/cycles/ # 15.851 M/sec
2,974,030 cpu_core/instructions/ # 55.385 M/sec
416,919 cpu_atom/instructions/ # 7.764 M/sec
586,136 cpu_core/branches/ # 10.916 M/sec
79,872 cpu_atom/branches/ # 1.487 M/sec
14,220 cpu_core/branch-misses/ # 264.819 K/sec
7,691 cpu_atom/branch-misses/ # 143.229 K/sec
0.002086438 seconds time elapsed
After:
# ./perf stat -a true
Performance counter stats for 'system wide':
61.39 msec cpu-clock # 24.874 CPUs utilized
76 context-switches # 1.238 K/sec
24 cpu-migrations # 390.968 /sec
52 page-faults # 847.097 /sec
2,753,695 cpu_core/cycles/ # 44.859 M/sec
903,899 cpu_atom/cycles/ # 14.725 M/sec
2,927,529 cpu_core/instructions/ # 47.690 M/sec
428,498 cpu_atom/instructions/ # 6.980 M/sec
581,299 cpu_core/branches/ # 9.470 M/sec
83,409 cpu_atom/branches/ # 1.359 M/sec
13,641 cpu_core/branch-misses/ # 222.216 K/sec
8,008 cpu_atom/branch-misses/ # 130.453 K/sec
14,761,308 cpu_core/slots/ # 240.466 M/sec
3,288,625 cpu_core/topdown-retiring/ # 22.3% retiring
1,323,323 cpu_core/topdown-bad-spec/ # 9.0% bad speculation
5,477,470 cpu_core/topdown-fe-bound/ # 37.1% frontend bound
4,679,199 cpu_core/topdown-be-bound/ # 31.7% backend bound
646,194 cpu_core/topdown-heavy-ops/ # 4.4% heavy operations # 17.9% light operations
1,244,999 cpu_core/topdown-br-mispredict/ # 8.4% branch mispredict # 0.5% machine clears
3,891,800 cpu_core/topdown-fetch-lat/ # 26.4% fetch latency # 10.7% fetch bandwidth
1,879,034 cpu_core/topdown-mem-bound/ # 12.7% memory bound # 19.0% Core bound
0.002467839 seconds time elapsed
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Xing Zhengjun <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Provide a new solution to replace the reverted commit ac2dc29edd21f9ec
("perf stat: Add default hybrid events")
For the default software attrs, nothing is changed.
For the default hardware attrs, create a new evsel for each hybrid pmu.
With the new solution, adding a new default attr will not require the
special support for the hybrid platform anymore.
Also, the "--detailed" is supported on the hybrid platform
With the patch,
$ perf stat -a -ddd sleep 1
Performance counter stats for 'system wide':
32,231.06 msec cpu-clock # 32.056 CPUs utilized
529 context-switches # 16.413 /sec
32 cpu-migrations # 0.993 /sec
69 page-faults # 2.141 /sec
176,754,151 cpu_core/cycles/ # 5.484 M/sec (41.65%)
161,695,280 cpu_atom/cycles/ # 5.017 M/sec (49.92%)
48,595,992 cpu_core/instructions/ # 1.508 M/sec (49.98%)
32,363,337 cpu_atom/instructions/ # 1.004 M/sec (58.26%)
10,088,639 cpu_core/branches/ # 313.010 K/sec (58.31%)
6,390,582 cpu_atom/branches/ # 198.274 K/sec (58.26%)
846,201 cpu_core/branch-misses/ # 26.254 K/sec (66.65%)
676,477 cpu_atom/branch-misses/ # 20.988 K/sec (58.27%)
14,290,070 cpu_core/L1-dcache-loads/ # 443.363 K/sec (66.66%)
9,983,532 cpu_atom/L1-dcache-loads/ # 309.749 K/sec (58.27%)
740,725 cpu_core/L1-dcache-load-misses/ # 22.982 K/sec (66.66%)
<not supported> cpu_atom/L1-dcache-load-misses/
480,441 cpu_core/LLC-loads/ # 14.906 K/sec (66.67%)
326,570 cpu_atom/LLC-loads/ # 10.132 K/sec (58.27%)
329 cpu_core/LLC-load-misses/ # 10.208 /sec (66.68%)
0 cpu_atom/LLC-load-misses/ # 0.000 /sec (58.32%)
<not supported> cpu_core/L1-icache-loads/
21,982,491 cpu_atom/L1-icache-loads/ # 682.028 K/sec (58.43%)
4,493,189 cpu_core/L1-icache-load-misses/ # 139.406 K/sec (33.34%)
4,711,404 cpu_atom/L1-icache-load-misses/ # 146.176 K/sec (50.08%)
13,713,090 cpu_core/dTLB-loads/ # 425.462 K/sec (33.34%)
9,384,727 cpu_atom/dTLB-loads/ # 291.170 K/sec (50.08%)
157,387 cpu_core/dTLB-load-misses/ # 4.883 K/sec (33.33%)
108,328 cpu_atom/dTLB-load-misses/ # 3.361 K/sec (50.08%)
<not supported> cpu_core/iTLB-loads/
<not supported> cpu_atom/iTLB-loads/
37,655 cpu_core/iTLB-load-misses/ # 1.168 K/sec (33.32%)
61,661 cpu_atom/iTLB-load-misses/ # 1.913 K/sec (50.03%)
<not supported> cpu_core/L1-dcache-prefetches/
<not supported> cpu_atom/L1-dcache-prefetches/
<not supported> cpu_core/L1-dcache-prefetch-misses/
<not supported> cpu_atom/L1-dcache-prefetch-misses/
1.005466919 seconds time elapsed
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xing Zhengjun <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Current perf stat uses the evlist__add_default_attrs() to add the
generic default attrs, and uses arch_evlist__add_default_attrs() to add
the Arch specific default attrs, e.g., Topdown for x86.
It works well for the non-hybrid platforms. However, for a hybrid
platform, the hard code generic default attrs don't work.
Uses arch_evlist__add_default_attrs() to replace the
evlist__add_default_attrs(). The arch_evlist__add_default_attrs() is
modified to invoke the same __evlist__add_default_attrs() for the
generic default attrs. No functional change.
Add default_null_attrs[] to indicate the arch specific attrs.
No functional change for the arch specific default attrs either.
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xing Zhengjun <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The commit 55bcf6ef314ae8ba ("perf: Extend PERF_TYPE_HARDWARE and
PERF_TYPE_HW_CACHE") extends the two types to become PMU aware types for
a hybrid system. However, current evsel__hw_name doesn't take the PMU
type into account. It mistakenly returns the "unknown-hardware" for the
hardware event with a specific PMU type.
Add an arch specific arch_evsel__hw_name() to specially handle the PMU
aware hardware event.
Currently, the extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE is only
supported by X86. Only implement the specific arch_evsel__hw_name() for
X86 in the patch.
Nothing is changed for the other archs.
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Xing Zhengjun <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
bpf_perf_object__next() folded the last element in the list test with the
empty list test. However, this meant that offsets were computed against
null and that a struct list_head was compared against a 'struct
bpf_perf_object'.
Working around this with clang's undefined behavior sanitizer required
-fno-sanitize=null and -fno-sanitize=object-size.
Remove the undefined behavior by using the regular Linux list APIs and
handling the starting case separately from the end testing case.
Looking at uses like bpf_perf_object__for_each(), as the constant NULL
or non-NULL argument can be constant propagated, the code is no less
efficient.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Christy Lee <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Miaoqian Lin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Some symbols are observed with the 'st_value' field zeroed. E.g.
libc.so.6 in Ubuntu contains a symbol '__evoke_link_warning_getwd' which
resides in the '.gnu.warning.getwd' section.
Unlike normal sections, such kind of sections are used for linker
warning when a file calls deprecated functions, but they are not part of
memory images, the symbols in these sections should be dropped.
This patch checks the section attribute SHF_ALLOC bit, if the bit is not
set, it skips symbols to avoid spurious ones.
Suggested-by: Fangrui Song <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Chang Rui <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When using 'perf mem' and 'perf c2c', an issue is observed that tool
reports the wrong offset for global data symbols. This is a common
issue on both x86 and Arm64 platforms.
Let's see an example, for a test program, below is the disassembly for
its .bss section which is dumped with objdump:
...
Disassembly of section .bss:
0000000000004040 <completed.0>:
...
0000000000004080 <buf1>:
...
00000000000040c0 <buf2>:
...
0000000000004100 <thread>:
...
First we used 'perf mem record' to run the test program and then used
'perf --debug verbose=4 mem report' to observe what's the symbol info
for 'buf1' and 'buf2' structures.
# ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8
# ./perf --debug verbose=4 mem report
...
dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028
symbol__new: buf2 0x30a8-0x30e8
...
dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028
symbol__new: buf1 0x3068-0x30a8
...
The perf tool relies on libelf to parse symbols, in executable and
shared object files, 'st_value' holds a virtual address; 'sh_addr' is
the address at which section's first byte should reside in memory, and
'sh_offset' is the byte offset from the beginning of the file to the
first byte in the section. The perf tool uses below formula to convert
a symbol's memory address to a file address:
file_address = st_value - sh_addr + sh_offset
^
` Memory address
We can see the final adjusted address ranges for buf1 and buf2 are
[0x30a8-0x30e8) and [0x3068-0x30a8) respectively, apparently this is
incorrect, in the code, the structure for 'buf1' and 'buf2' specifies
compiler attribute with 64-byte alignment.
The problem happens for 'sh_offset', libelf returns it as 0x3028 which
is not 64-byte aligned, combining with disassembly, it's likely libelf
doesn't respect the alignment for .bss section, therefore, it doesn't
return the aligned value for 'sh_offset'.
Suggested by Fangrui Song, ELF file contains program header which
contains PT_LOAD segments, the fields p_vaddr and p_offset in PT_LOAD
segments contain the execution info. A better choice for converting
memory address to file address is using the formula:
file_address = st_value - p_vaddr + p_offset
This patch introduces elf_read_program_header() which returns the
program header based on the passed 'st_value', then it uses the formula
above to calculate the symbol file address; and the debugging log is
updated respectively.
After applying the change:
# ./perf --debug verbose=4 mem report
...
dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28
symbol__new: buf2 0x30c0-0x3100
...
dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28
symbol__new: buf1 0x3080-0x30c0
...
Fixes: f17e04afaff84b5c ("perf report: Fix ELF symbol parsing")
Reported-by: Chang Rui <[email protected]>
Suggested-by: Fangrui Song <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Implements workqueue trace bpf function.
Test cases:
# perf kwork -k workqueue lat -b
Starting trace, Hit <Ctrl+C> to stop and report
^C
Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end |
--------------------------------------------------------------------------------------------------------------------------------
(w)addrconf_verify_work | 0002 | 5.856 ms | 1 | 5.856 ms | 111994.634313 s | 111994.640169 s |
(w)vmstat_update | 0001 | 1.247 ms | 1 | 1.247 ms | 111996.462651 s | 111996.463899 s |
(w)neigh_periodic_work | 0001 | 1.183 ms | 1 | 1.183 ms | 111996.462789 s | 111996.463973 s |
(w)neigh_managed_work | 0001 | 0.989 ms | 2 | 1.635 ms | 111996.462820 s | 111996.464455 s |
(w)wb_workfn | 0000 | 0.667 ms | 1 | 0.667 ms | 111996.384273 s | 111996.384940 s |
(w)bpf_prog_free_deferred | 0001 | 0.495 ms | 1 | 0.495 ms | 111986.314201 s | 111986.314696 s |
(w)mix_interrupt_randomness | 0002 | 0.421 ms | 6 | 0.749 ms | 111995.927750 s | 111995.928499 s |
(w)vmstat_shepherd | 0000 | 0.374 ms | 2 | 0.385 ms | 111991.265242 s | 111991.265627 s |
(w)e1000_watchdog | 0002 | 0.356 ms | 5 | 0.390 ms | 111994.528380 s | 111994.528770 s |
(w)vmstat_update | 0000 | 0.231 ms | 2 | 0.365 ms | 111996.384407 s | 111996.384772 s |
(w)flush_to_ldisc | 0006 | 0.165 ms | 1 | 0.165 ms | 111995.930606 s | 111995.930771 s |
(w)flush_to_ldisc | 0000 | 0.094 ms | 2 | 0.095 ms | 111996.460453 s | 111996.460548 s |
--------------------------------------------------------------------------------------------------------------------------------
# perf kwork -k workqueue rep -b
Starting trace, Hit <Ctrl+C> to stop and report
^C
Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
--------------------------------------------------------------------------------------------------------------------------------
(w)e1000_watchdog | 0002 | 0.627 ms | 2 | 0.324 ms | 112002.720665 s | 112002.720989 s |
(w)flush_to_ldisc | 0007 | 0.598 ms | 2 | 0.534 ms | 112000.875226 s | 112000.875761 s |
(w)wq_barrier_func | 0007 | 0.492 ms | 1 | 0.492 ms | 112000.876981 s | 112000.877473 s |
(w)flush_to_ldisc | 0007 | 0.281 ms | 1 | 0.281 ms | 112005.826882 s | 112005.827163 s |
(w)mix_interrupt_randomness | 0002 | 0.229 ms | 3 | 0.102 ms | 112005.825671 s | 112005.825774 s |
(w)vmstat_shepherd | 0000 | 0.202 ms | 1 | 0.202 ms | 112001.504511 s | 112001.504713 s |
(w)bpf_prog_free_deferred | 0001 | 0.181 ms | 1 | 0.181 ms | 112000.883251 s | 112000.883432 s |
(w)wb_workfn | 0007 | 0.130 ms | 1 | 0.130 ms | 112001.505195 s | 112001.505325 s |
(w)vmstat_update | 0000 | 0.053 ms | 1 | 0.053 ms | 112001.504763 s | 112001.504815 s |
--------------------------------------------------------------------------------------------------------------------------------
Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|