|
Rather than scanning all PMUs for a counter name, scan the PMU
associated with the evsel of the sample. This is done to remove a
dependence on pmu-events.h.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Setting the same information twice for an event would produce an error
message associated with the PMU rather than with the term that set it the
second time. Improve the error message so that it points at the term.
Before:
$ perf stat -e 'cpu/inst_retired.any,inst_retired.any/' true
event syntax error: 'cpu/inst_retired.any,inst_retired.any/'
\___ Bad event or PMU
Unabled to find PMU or event on a PMU of 'cpu'
Run 'perf list' for a list of valid events
$
After:
$ perf stat -e 'cpu/inst_retired.any,inst_retired.any/' true
event syntax error: '..etired.any,inst_retired.any/'
\___ Bad event or PMU
Unabled to find PMU or event on a PMU of 'cpu'
Initial error:
event syntax error: '..etired.any,inst_retired.any/'
\___ Attempt to set event's scale twice
Run 'perf list' for a list of valid events
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add extra underscore before "for" of pmu_events_table_for_each_event
and pmu_metrics_table_for_each_metric.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In order to be able to lazily compute aliases/events for a PMU, move
the struct perf_pmu_alias into pmu.c.
Add perf_pmu__find_event and perf_pmu__for_each_event that take a
callback that is called for the found event or for each event.
The layout of struct pmu and the event/alias list is unchanged but the
API is altered so that aliases are no longer directly accessed, allowing
for later changes.
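The callback-based access described above can be sketched roughly as
follows. This is only an illustration under assumed names: struct
example_pmu, struct example_event, example_pmu_for_each_event() and
example_pmu_find_event() are hypothetical and do not reproduce the actual
perf signatures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types; not the real perf structures. */
struct example_event {
	const char *name;
	const char *desc;
};

struct example_pmu {
	const struct example_event *events;
	size_t nr_events;
};

/* Callback type: a non-zero return value stops the iteration. */
typedef int (*example_event_cb)(void *state, const struct example_event *ev);

/* Call cb for each event; callers never touch the event list directly. */
static int example_pmu_for_each_event(const struct example_pmu *pmu,
				      void *state, example_event_cb cb)
{
	for (size_t i = 0; i < pmu->nr_events; i++) {
		int ret = cb(state, &pmu->events[i]);

		if (ret)
			return ret;
	}
	return 0;
}

/* Call cb for the event called name; return -1 if there is no such event. */
static int example_pmu_find_event(const struct example_pmu *pmu,
				  const char *name, void *state,
				  example_event_cb cb)
{
	for (size_t i = 0; i < pmu->nr_events; i++) {
		if (!strcmp(pmu->events[i].name, name))
			return cb(state, &pmu->events[i]);
	}
	return -1;
}

/* Example callback: counts the events it is called for. */
static int example_count_cb(void *state, const struct example_event *ev)
{
	(void)ev;
	(*(int *)state)++;
	return 0;
}
```

Because callers only see the callback interface, the storage behind it
could later be populated on demand without changing them.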
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The sysfs format files are loaded eagerly in a PMU. Add a flag so that
we create the format but only load the contents when necessary.
Reduce the size of the value in struct perf_pmu_format and avoid holes
so there is no additional space requirement.
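A minimal sketch of the "create now, load when needed" idea, assuming a
simplified entry type. struct example_format, example_format_load() and
example_format_bits() are hypothetical names, and the sysfs read is
replaced by a fixed value so the sketch stays self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified entry; not the real struct perf_pmu_format. */
struct example_format {
	const char *name;
	uint64_t bits;
	bool loaded;	/* set once the contents have been read */
};

/* Counts how often the (simulated) file contents were read. */
static int example_nr_loads;

/* Stand-in for parsing a sysfs format file; it just fills in a fixed mask. */
static void example_format_load(struct example_format *fmt)
{
	if (fmt->loaded)
		return;
	fmt->bits = 0xffULL;
	fmt->loaded = true;
	example_nr_loads++;
}

/* Accessor: creating the entry is cheap, the load happens on first use. */
static uint64_t example_format_bits(struct example_format *fmt)
{
	example_format_load(fmt);
	return fmt->bits;
}
```

Entries whose contents are never asked for are never read, which is where
a saving in file opens would come from.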
For "perf stat -e cycles true" this reduces the number of openat calls
from 648 to 573 (about 12%). The benchmark pmu scan speed is improved
by roughly 5%.
Before:
$ perf bench internals pmu-scan
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 1061.100 usec (+- 9.965 usec)
Average PMU scanning took: 4725.300 usec (+- 260.599 usec)
After:
$ perf bench internals pmu-scan
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 989.170 usec (+- 6.873 usec)
Average PMU scanning took: 4520.960 usec (+- 251.272 usec)
Committer testing:
On an AMD Ryzen 5950x:
Before:
$ perf bench internals pmu-scan -i1000
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 563.466 usec (+- 1.008 usec)
Average PMU scanning took: 1619.174 usec (+- 23.627 usec)
$ perf stat -r5 perf bench internals pmu-scan -i1000
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 583.401 usec (+- 2.098 usec)
Average PMU scanning took: 1677.352 usec (+- 24.636 usec)
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 553.254 usec (+- 0.825 usec)
Average PMU scanning took: 1635.655 usec (+- 24.312 usec)
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 557.733 usec (+- 0.980 usec)
Average PMU scanning took: 1600.659 usec (+- 23.344 usec)
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 554.906 usec (+- 0.774 usec)
Average PMU scanning took: 1595.338 usec (+- 23.288 usec)
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 551.798 usec (+- 0.967 usec)
Average PMU scanning took: 1623.213 usec (+- 23.998 usec)
Performance counter stats for 'perf bench internals pmu-scan -i1000' (5 runs):
3276.82 msec task-clock:u # 0.990 CPUs utilized ( +- 0.82% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
1008 page-faults:u # 307.615 /sec ( +- 0.04% )
12049614778 cycles:u # 3.677 GHz ( +- 0.07% ) (83.34%)
117507478 stalled-cycles-frontend:u # 0.98% frontend cycles idle ( +- 0.33% ) (83.32%)
27106761 stalled-cycles-backend:u # 0.22% backend cycles idle ( +- 9.55% ) (83.36%)
33294953848 instructions:u # 2.76 insn per cycle
# 0.00 stalled cycles per insn ( +- 0.03% ) (83.31%)
6849825049 branches:u # 2.090 G/sec ( +- 0.03% ) (83.37%)
71533903 branch-misses:u # 1.04% of all branches ( +- 0.20% ) (83.30%)
3.3088 +- 0.0302 seconds time elapsed ( +- 0.91% )
$
After:
$ perf stat -r5 perf bench internals pmu-scan -i1000
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 550.702 usec (+- 0.958 usec)
Average PMU scanning took: 1566.577 usec (+- 22.747 usec)
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 548.315 usec (+- 0.555 usec)
Average PMU scanning took: 1565.499 usec (+- 22.760 usec)
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 548.073 usec (+- 0.555 usec)
Average PMU scanning took: 1586.097 usec (+- 23.299 usec)
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 561.184 usec (+- 2.709 usec)
Average PMU scanning took: 1567.153 usec (+- 22.548 usec)
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 1000 times
Average core PMU scanning took: 546.987 usec (+- 0.553 usec)
Average PMU scanning took: 1562.814 usec (+- 22.729 usec)
Performance counter stats for 'perf bench internals pmu-scan -i1000' (5 runs):
3170.86 msec task-clock:u # 0.992 CPUs utilized ( +- 0.22% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
1010 page-faults:u # 318.526 /sec ( +- 0.04% )
11890047674 cycles:u # 3.750 GHz ( +- 0.14% ) (83.27%)
119090499 stalled-cycles-frontend:u # 1.00% frontend cycles idle ( +- 0.46% ) (83.40%)
32502449 stalled-cycles-backend:u # 0.27% backend cycles idle ( +- 8.32% ) (83.30%)
33119141261 instructions:u # 2.79 insn per cycle
# 0.00 stalled cycles per insn ( +- 0.01% ) (83.37%)
6812816561 branches:u # 2.149 G/sec ( +- 0.01% ) (83.29%)
70157855 branch-misses:u # 1.03% of all branches ( +- 0.28% ) (83.38%)
3.19710 +- 0.00826 seconds time elapsed ( +- 0.26% )
$
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Pass the pmu so the aliases and format list can be better abstracted
and later lazily loaded.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Pass the PMU so the format list can be better abstracted and later
lazily loaded.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Did missing conversions in tools/perf/arch/arm*/util/cs-etm.c ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Pass the pmu so the format list can be better abstracted and later
lazily loaded.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Abstract the format list better, hiding it in the PMU, by passing
perf_pmu__config_terms() the PMU rather than the format list in the PMU.
Change the PMU test to pass a dummy PMU for this purpose. Changing the
test allows perf_pmu__del_formats() to become static.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Move declaration from header file to pmu.y and make static.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Avoid having the function in the C and header file, as it is only used
locally by pmu.y.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Rather than read a base path and append it into a second path, read the
base path directly into the output buffer and append to that.
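The single-buffer approach can be sketched like this. The helper names
example_base_path() and example_pmu_path() are hypothetical, and the
hard-coded base path is an assumption made for illustration; the real code
determines it at run time.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: write the base path into buf, return its length or 0. */
static size_t example_base_path(char *buf, size_t size)
{
	int len = snprintf(buf, size, "%s", "/sys/bus/event_source/devices/");

	if (len < 0 || (size_t)len >= size)
		return 0;
	return (size_t)len;
}

/* Build "<base><pmu_name>/<file>" in place, without a second buffer. */
static int example_pmu_path(char *buf, size_t size,
			    const char *pmu_name, const char *file)
{
	size_t len = example_base_path(buf, size);
	int n;

	if (!len)
		return -1;
	/* Append directly after the base path that is already in buf. */
	n = snprintf(buf + len, size - len, "%s/%s", pmu_name, file);
	if (n < 0 || (size_t)n >= size - len)
		return -1;
	return 0;
}
```

This avoids one temporary buffer and one copy per path that is built.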
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Done to reduce dependencies on pmu-events.h.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
I noticed some errors with:
# perf list ex_ret_brn
lzma: fopen failed on /usr/lib/modules/5.15.14-100.fc34.x86_64/kernel/net/bluetooth/bnep/bnep.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.16.16-200.fc35.x86_64/kernel/drivers/gpu/drm/drm_kms_helper.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.18.16-200.fc36.x86_64/kernel/arch/x86/crypto/crct10dif-pclmul.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.16.16-200.fc35.x86_64/kernel/drivers/i2c/busses/i2c-piix4.ko.xz: 'No such file or directory'
<BIG SNIP>
Then, using 'perf probe' + 'perf trace' to debug 'perf list', it seems
there is some inconsistency in the ~/.debug/ cache, where broken build id
symlinks end up making it try to uncompress some kernel modules using the
lzma routines:
395.309 perf/3594447 probe_perf:lzma_decompress_to_file(__probe_ip: 6118448, input_string: "/usr/lib/modules/5.18.17-200.fc36.x86_64/kernel/drivers/nvme/host/nvme.ko.xz")
lzma_decompress_to_file (/var/home/acme/bin/perf)
filename__decompress (/var/home/acme/bin/perf)
filename__read_build_id (/var/home/acme/bin/perf)
filename__sprintf_build_id (inlined)
build_id_cache__valid_id (inlined)
build_id_cache__list_all (/var/home/acme/bin/perf)
print_sdt_events (/var/home/acme/bin/perf)
cmd_list (/var/home/acme/bin/perf)
run_builtin (/var/home/acme/bin/perf)
handle_internal_command (inlined)
run_argv (inlined)
main (/var/home/acme/bin/perf)
__libc_start_call_main (/usr/lib64/libc.so.6)
__libc_start_main@@GLIBC_2.34 (/usr/lib64/libc.so.6)
_start (/var/home/acme/bin/perf)
But callers of filename__decompress() already check its return and use
pr_debug(), so be consistent and make functions it calls also use
pr_debug().
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It is undefined behavior to pass NULL as snprintf()'s fmt argument.
Here is an example to trigger the problem:
$ perf stat --metric-only -x, -e instructions -- sleep 1
insn per cycle,
Segmentation fault (core dumped)
With this patch:
$ perf stat --metric-only -x, -e instructions -- sleep 1
insn per cycle,
,
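One way to guard against a NULL format is sketched below. The helper
example_format_value() is hypothetical and the actual fix in perf may look
different; the sketch only shows the idea of never handing a NULL fmt to
snprintf().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format val with fmt, tolerating fmt == NULL. */
static int example_format_value(char *buf, size_t size,
				const char *fmt, double val)
{
	if (!fmt) {
		/* No format available: emit an empty field instead. */
		if (size)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, size, fmt, val);
}
```

With a guard of this kind a missing format yields an empty field, which
matches the lone "," seen in the output with the patch applied.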
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kaige Ye <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
sizeof(augmented_arg->value) is a power of two.
Similar to what was done in the previous cset for sizeof(saddr), we need
to make sure sizeof(augmented_arg->value) is a power of two to do bounds
checking using &=:
augmented_len &= sizeof(augmented_arg->value) - 1;
Suggested-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
a power of two.
We're using the BPF verifier suggestion:
22: (85) call bpf_probe_read#4
R2 min value is negative, either use unsigned or 'var &= const'
That works only when const is a (power of two - 1) so add an assert to
make sure that that is the case.
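A small sketch of the 'var &= const' bound together with the power-of-two
check. struct example_args and example_bound_len() are hypothetical
stand-ins for the BPF-side code; the 128-byte field size is taken from the
sizeof(saddr) == 128 statement in the related commit.

```c
#include <stddef.h>

/* Hypothetical stand-in for the BPF-side structure. */
struct example_args {
	char saddr[128];
};

#define EXAMPLE_SADDR_SIZE sizeof(((struct example_args *)0)->saddr)

/* The '&=' trick is only a valid bound when the size is a power of two. */
_Static_assert((EXAMPLE_SADDR_SIZE & (EXAMPLE_SADDR_SIZE - 1)) == 0,
	       "saddr size must be a power of two");

static unsigned int example_bound_len(unsigned int len)
{
	len &= EXAMPLE_SADDR_SIZE - 1;	/* result is always in [0, 127] */
	return len;
}
```

Note that masking is not the same as clamping: a length of exactly 128 is
reduced to 0 rather than kept at 128. If the size were not a power of two,
size - 1 would not be an all-ones bit pattern and the mask would reject
some in-range lengths, which is why the assert is useful.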
Suggested-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This will allow writing formulas that are conditional on a specific
CPU type or CPU version. It calls through to the existing
strcmp_cpuid_str() function in Perf which has a default weak version,
and an arch specific version for x86 and arm64.
The function takes an 'ID' type value, which is a string. But in this
case Arm CPU IDs are hex numbers prefixed with '0x'. metric.py
assumes strings are only used by event names, and that they can't start
with a number ('0'), so an additional change has to be made to the
regex to convert hex numbers back to 'ID' types.
Signed-off-by: James Clark <[email protected]>
Reviewed-by: John Garry <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Haixin Yu <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Forrington <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Sohom Datta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
sizeof(saddr)
This works with:
$ clang -v
clang version 14.0.5 (Fedora 14.0.5-2.fc36)
$
But not with:
$ clang -v
clang version 16.0.6 (Fedora 16.0.6-2.fc38)
$
[root@quaco ~]# perf trace -e connect*,sendto* ping -c 10 localhost
libbpf: prog 'sys_enter_sendto': BPF program load failed: Permission denied
libbpf: prog 'sys_enter_sendto': -- BEGIN PROG LOAD LOG --
reg type unsupported for arg#0 function sys_enter_sendto#59
0: R1=ctx(off=0,imm=0) R10=fp0
; int sys_enter_sendto(struct syscall_enter_args *args)
0: (bf) r6 = r1 ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
1: (b7) r1 = 0 ; R1_w=0
; int key = 0;
2: (63) *(u32 *)(r10 -4) = r1 ; R1_w=0 R10=fp0 fp-8=0000????
3: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
;
4: (07) r2 += -4 ; R2_w=fp-4
; return bpf_map_lookup_elem(&augmented_args_tmp, &key);
5: (18) r1 = 0xffff8de5a5b8bc00 ; R1_w=map_ptr(off=0,ks=4,vs=8272,imm=0)
7: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0)
8: (bf) r7 = r0 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0) R7_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0)
9: (b7) r0 = 1 ; R0_w=1
; if (augmented_args == NULL)
10: (15) if r7 == 0x0 goto pc+25 ; R7_w=map_value(off=0,ks=4,vs=8272,imm=0)
; unsigned int socklen = args->args[5];
11: (79) r1 = *(u64 *)(r6 +56) ; R1_w=scalar() R6_w=ctx(off=0,imm=0)
;
12: (bf) r2 = r1 ; R1_w=scalar(id=2) R2_w=scalar(id=2)
13: (67) r2 <<= 32 ; R2_w=scalar(smax=9223372032559808512,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32_max=0)
14: (77) r2 >>= 32 ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
15: (b7) r8 = 128 ; R8=128
; if (socklen > sizeof(augmented_args->saddr))
16: (25) if r2 > 0x80 goto pc+1 ; R2=scalar(umax=128,var_off=(0x0; 0xff))
17: (bf) r8 = r1 ; R1=scalar(id=2) R8_w=scalar(id=2)
; const void *sockaddr_arg = (const void *)args->args[4];
18: (79) r3 = *(u64 *)(r6 +48) ; R3_w=scalar() R6=ctx(off=0,imm=0)
; bpf_probe_read(&augmented_args->saddr, socklen, sockaddr_arg);
19: (bf) r1 = r7 ; R1_w=map_value(off=0,ks=4,vs=8272,imm=0) R7=map_value(off=0,ks=4,vs=8272,imm=0)
20: (07) r1 += 64 ; R1_w=map_value(off=64,ks=4,vs=8272,imm=0)
; bpf_probe_read(&augmented_args->saddr, socklen, sockaddr_arg);
21: (bf) r2 = r8 ; R2_w=scalar(id=2) R8_w=scalar(id=2)
22: (85) call bpf_probe_read#4
R2 min value is negative, either use unsigned or 'var &= const'
processed 22 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'sys_enter_sendto': failed to load: -13
libbpf: failed to load object 'augmented_raw_syscalls_bpf'
libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -13
So use the suggested &= variant since sizeof(saddr) == 128 bytes.
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
util/perf_regs.h includes another perf_regs.h:
#include <perf_regs.h>
This pulls in the architecture-specific header; for example, when building
for an arm64 target, the header tools/perf/arch/arm64/include/perf_regs.h
is included.
Including the architecture-specific header implicitly like this is not
explicit to the reader; furthermore, it couples util/perf_regs.c with the
architecture-specific definitions.
This patch moves the arch-specific header out of util/perf_regs.h to make
the 'util' folder generic. As a result, the source files in the 'arch'
folder explicitly include the architecture's perf_regs.h.
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eric Lin <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Babrou <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Ming Wang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The macros PERF_REGS_MAX and PERF_REGS_MASK are architecture specific, so
remove them from the common file util/perf_regs.c.
As a side effect, the weak functions arch__intr_reg_mask() and
arch__user_reg_mask() just return zero; every arch defines its own
functions in the 'arch' folder to return the right values.
Note that we don't need to return the intr/user register masks
dynamically: these two functions are invoked during the recording phase
but not the decoding phase, so they are always invoked in the native
environment.
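The weak-default pattern mentioned above looks roughly like this with
GCC or clang. example_arch_user_reg_mask() is a hypothetical name used
for illustration, not the perf function itself.

```c
#include <stdint.h>

uint64_t example_arch_user_reg_mask(void);

/*
 * Hypothetical weak default: the linker uses it only when no other object
 * file provides a regular (strong) definition of the same symbol.
 */
__attribute__((weak)) uint64_t example_arch_user_reg_mask(void)
{
	return 0;
}
```

An architecture would supply its own non-weak definition in a separate
source file, and the linker would pick that one over the default.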
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eric Lin <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Babrou <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Ming Wang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use perf_arch_reg_ip() and perf_arch_reg_sp() instead of macros to obtain
the register numbers of IP and SP. This enables cross analysis when
unwinding, so the unwinder is no longer restricted to the values
predefined by the macros.
Consequently, the macros LIBUNWIND__ARCH_REG_{IP|SP} are removed since
they are no longer used.
Committer notes:
Add missing "util/env.h" header to make sure we have the definition for
perf_env__arch(), that when built with NO_LIBUNWIND=1 isn't available,
i.e. it was being included by sheer luck.
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eric Lin <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Babrou <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Ming Wang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The current code uses the macros PERF_REG_IP and PERF_REG_SP for parsing
registers, and perf is built with these macros statically. This means it
can only correctly analyze CPU registers for the native architecture and
fails to support cross analysis (e.g. perf built on x86 cannot analyze
Arm64's registers).
We need to generalize util/perf_regs.c to support multiple architectures.
As a first step, this commit introduces the new functions
perf_arch_reg_ip() and perf_arch_reg_sp(), which dynamically return the IP
and SP register index, respectively, according to the "arch" parameter.
Every architecture has its own functions (like __perf_reg_ip_arm64 and
__perf_reg_sp_arm64). These architecture-specific functions are defined in
each arch source file under the folder util/perf-regs-arch; in the end all
of them are built into the tool for cross analysis.
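The run-time dispatch on the architecture name can be sketched as below.
All example_* names are hypothetical, the register index constants are
illustrative placeholders rather than values copied from the perf headers,
and returning 0 for an unknown architecture is an assumption of this
sketch.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative placeholder indices; not taken from the real headers. */
enum {
	EXAMPLE_X86_SP = 7,
	EXAMPLE_X86_IP = 8,
	EXAMPLE_ARM64_SP = 31,
	EXAMPLE_ARM64_PC = 32,
};

/* Per-architecture helpers, all built into the tool regardless of host. */
static uint64_t example_reg_ip_x86(void) { return EXAMPLE_X86_IP; }
static uint64_t example_reg_sp_x86(void) { return EXAMPLE_X86_SP; }
static uint64_t example_reg_ip_arm64(void) { return EXAMPLE_ARM64_PC; }
static uint64_t example_reg_sp_arm64(void) { return EXAMPLE_ARM64_SP; }

/* Pick the IP register index from the arch recorded with the data. */
static uint64_t example_arch_reg_ip(const char *arch)
{
	if (!strcmp(arch, "x86"))
		return example_reg_ip_x86();
	if (!strcmp(arch, "arm64"))
		return example_reg_ip_arm64();
	return 0;	/* unknown architecture (assumption of this sketch) */
}

/* Same dispatch for the SP register index. */
static uint64_t example_arch_reg_sp(const char *arch)
{
	if (!strcmp(arch, "x86"))
		return example_reg_sp_x86();
	if (!strcmp(arch, "arm64"))
		return example_reg_sp_arm64();
	return 0;	/* unknown architecture (assumption of this sketch) */
}
```

Since the choice is made from a string at run time rather than from a
compile-time macro, a binary built on one architecture can interpret data
recorded on another.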
Committer notes:
Make DWARF_MINIMAL_REGS() an inline function, so that we can use the
__maybe_unused attribute for the 'arch' parameter, as this will avoid a
build failure when that variable is unused in the callers. That happens
when building on unsupported architectures, the ones without
HAVE_PERF_REGS_SUPPORT defined.
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eric Lin <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Babrou <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Ming Wang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Every architecture has a specific register parsing function that returns
a register name based on the register index. To support cross analysis
(e.g. using an x86 perf binary to parse Arm64 perf data), we build all of
these register parsing functions into the tool, which is why all the
related functions are placed in util/perf_regs.c.
Unfortunately, since util/perf_regs.c needs to include every arch's
perf_regs.h, it easily picks up duplicate definitions from multiple
headers, which makes the build fragile and the code hard to maintain.
We cannot simply move these register parsing functions into the
corresponding 'arch' folder, because that folder is only conditionally
built based on the target architecture.
Therefore, this commit creates a new folder util/perf-regs-arch/ and
uses a dedicated source file to keep every architecture's register
parsing function to avoid definition conflicts.
This is only a refactoring, no functionality change is expected.
Committer notes:
Had to add util/perf-regs-arch/*.c to tools/perf/util/python-ext-sources
to keep 'perf test python' passing.
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eric Lin <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Babrou <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Ming Wang <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
linux/bitfield.h can be included as long as linux/kernel.h is included
first, so change the order of the includes and drop the duplicate macro.
Reviewed-by: John Garry <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jing Zhang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Forrington <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Sohom Datta <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add perf_dlfilter_fns.al_cleanup() to do addr_location__exit() on data
passed via perf_dlfilter_fns.resolve_address().
Add dlfilter-test-api-v2 to the "dlfilter C API" test to test it.
Update documentation, clarifying that data returned by APIs should not
be dereferenced after filter_event() and filter_event_early() return.
Fixes: 0dd5041c9a0eaf8c ("perf addr_location: Add init/exit/copy functions")
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
thread__find_symbol_fb()
As thread__find_symbol_fb() will end up calling thread__find_map() and
it in turn will call these on uninitialized memory:
maps__zput(al->maps);
map__zput(al->map);
thread__zput(al->thread);
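The hazard can be illustrated with a small self-contained sketch. All
names here (sketch_obj, sketch_al, sketch_zput and so on) are hypothetical
stand-ins, not the perf structures or helpers; the point is only that a
zput-style helper reads the pointer it is given, so the struct must be
initialized before the first lookup.

```c
#include <stddef.h>

/* Hypothetical reference-counted object standing in for map, maps or thread. */
struct sketch_obj {
	int refcnt;
};

/* Hypothetical stand-in for struct addr_location. */
struct sketch_al {
	struct sketch_obj *map;
	struct sketch_obj *thread;
};

/* Drop one reference (if any) and clear the pointer. */
static void sketch_zput(struct sketch_obj **objp)
{
	if (*objp)
		(*objp)->refcnt--;
	*objp = NULL;
}

/* Must run before the struct is handed to any lookup that calls zput on it. */
void sketch_al_init(struct sketch_al *al)
{
	al->map = NULL;
	al->thread = NULL;
}

/* A lookup that first releases whatever the caller's struct held before. */
void sketch_find(struct sketch_al *al, struct sketch_obj *map,
		 struct sketch_obj *thread)
{
	sketch_zput(&al->map);    /* reads al->map: garbage if never initialized */
	sketch_zput(&al->thread);
	map->refcnt++;
	thread->refcnt++;
	al->map = map;
	al->thread = thread;
}

/* Release the references taken by the lookup. */
void sketch_al_exit(struct sketch_al *al)
{
	sketch_zput(&al->map);
	sketch_zput(&al->thread);
}
```

Without the call to sketch_al_init(), the first sketch_zput() in
sketch_find() would dereference whatever happened to be on the stack.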
Fixes: 0dd5041c9a0eaf8c ("perf addr_location: Add init/exit/copy functions")
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The `file` parameter in evsel__intval() is checked repeatedly, fix it.
No functional change.
Signed-off-by: Yang Jihong <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
For logical OR operator, the actual sample_flags are in the 'groups'
list so it needs to check entries in the list instead. Otherwise it
would show the following error message.
$ sudo perf record -a -e cycles:p --filter 'period > 100 || weight > 0' sleep 1
Error: cycles:p event does not have sample flags 0
failed to set filter "BPF" on event cycles:p with 2 (No such file or directory)
Actually it should warn that 'weight' is used without the WEIGHT flag.
Error: cycles:p event does not have PERF_SAMPLE_WEIGHT
Hint: please add -W option to perf record
failed to set filter "BPF" on event cycles:p with 2 (No such file or directory)
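A minimal sketch of the idea, assuming a much simplified expression node:
the struct layout, the sketch_ names and the flag values below are all
hypothetical and do not match the perf BPF filter code. It only shows why
an OR node, whose own flags are 0, has to be checked through the entries
of its 'groups' list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bits, not the PERF_SAMPLE_* values. */
#define SKETCH_SAMPLE_PERIOD (1u << 0)
#define SKETCH_SAMPLE_WEIGHT (1u << 1)

/* Hypothetical, simplified filter expression node. */
struct sketch_expr {
	bool is_or;                       /* logical OR: operands live in 'groups' */
	unsigned int sample_flags;        /* set on leaf terms; 0 on an OR node */
	const struct sketch_expr *groups; /* operands of an OR node */
	size_t nr_groups;
};

/*
 * Return the flags required by the first term that are absent from 'have',
 * or 0 if every term is satisfied.
 */
unsigned int sketch_missing_flags(const struct sketch_expr *expr,
				  unsigned int have)
{
	if (expr->is_or) {
		/* The OR node itself carries no flags, so look at its operands. */
		for (size_t i = 0; i < expr->nr_groups; i++) {
			unsigned int missing =
				sketch_missing_flags(&expr->groups[i], have);

			if (missing)
				return missing;
		}
		return 0;
	}
	return expr->sample_flags & ~have;
}
```

Checking only the top-level node would report "sample flags 0", which is
the unhelpful message shown in the first error output.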
Fixes: 4310551b76e0d676 ("perf bpf filter: Show warning for missing sample flags")
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Now tools/perf/examples/bpf/augmented_syscalls.c is
tools/perf/util/bpf_skel/augmented_syscalls.bpf.c and not enabled as a
BPF event, tidy the comments to reflect this.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Carsten Haitzler <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wang ShaoBo <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Previously a BPF event of augmented_raw_syscalls.c could be used to
enable augmentation of syscalls by perf trace. As BPF events are no
longer supported, switch to using a BPF skeleton which when attached
explicitly opens the sysenter and sysexit tracepoints.
The dump map is removed as debugging wasn't supported by the
augmentation and bpf_printk can be used when necessary.
Remove tools/perf/examples/bpf/augmented_raw_syscalls.c so that the
rename/migration to a BPF skeleton captures that this was the source.
Committer notes:
Some minor stylistic changes to help visualize the diff.
Use libbpf_strerror when failing to load the augmented raw syscalls BPF.
Use bpf_object__for_each_program(prog, trace.skel->obj) to disable auto
attachment for all but the sys_enter, sys_exit tracepoints, to avoid
having to add extra lines as we go adding support for more pointer
receiving syscalls.
Committer testing:
# perf trace -e open* --max-events=10
0.000 ( 0.022 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11
208.833 ( ): gnome-terminal/3223 openat(dfd: CWD, filename: "/proc/51250/cmdline") ...
249.993 ( 0.024 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11
250.118 ( 0.030 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/memory.pressure", flags: RDONLY|CLOEXEC) = 11
250.205 ( 0.016 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/memory.current", flags: RDONLY|CLOEXEC) = 11
250.244 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/memory.min", flags: RDONLY|CLOEXEC) = 11
250.282 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/memory.low", flags: RDONLY|CLOEXEC) = 11
250.320 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/memory.swap.current", flags: RDONLY|CLOEXEC) = 11
250.355 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/memory.stat", flags: RDONLY|CLOEXEC) = 11
250.717 ( 0.016 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/[email protected]/memory.pressure", flags: RDONLY|CLOEXEC) = 11
#
# perf trace -e *nanosleep* --max-events=10
? ( ): SCTP timer/28304 ... [continued]: clock_nanosleep()) = 0
0.007 (10.058 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0
10.069 ( ): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) ...
10.069 (10.056 ms): SCTP timer/28304 ... [continued]: clock_nanosleep()) = 0
17.059 ( ): podman/3572 nanosleep(rqtp: 0x7fc4f4d75be0) ...
17.059 (10.061 ms): podman/3572 ... [continued]: nanosleep()) = 0
20.131 (10.059 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0
30.195 (10.038 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0
40.238 (10.057 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0
50.301 ( ): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) ...
#
# perf trace -e perf_event* -- perf stat -e instructions,cycles,cache-misses sleep 0.1
0.000 ( 0.011 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
0.013 ( 0.003 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
0.017 ( 0.002 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x3 (PERF_COUNT_HW_CACHE_MISSES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
Performance counter stats for 'sleep 0.1':
1,495,051 instructions # 1.11 insn per cycle
1,347,641 cycles
35,424 cache-misses
0.100935279 seconds time elapsed
0.000924000 seconds user
0.000000000 seconds sys
#
# perf trace -e connect* ssh localhost
0.000 ( 0.012 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
0.118 ( 0.004 ms): ssh/51346 connect(fd: 6, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
0.399 ( 0.007 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
0.426 ( 0.003 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
0.754 ( 0.009 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET, port: 22, addr: 127.0.0.1 }, addrlen: 16) = 0
0.771 ( 0.010 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET6, port: 22, addr: ::1 }, addrlen: 28) = 0
0.798 ( 0.053 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET6, port: 22, addr: ::1 }, addrlen: 28) = 0
0.870 ( 0.004 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
0.904 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
0.930 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
0.957 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
0.981 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
1.006 ( 0.004 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
1.036 ( 0.005 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused)
65.077 ( 0.022 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, addrlen: 110) = 0
66.608 ( 0.014 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, addrlen: 110) = 0
root@localhost's password:
#
# perf trace -e sendto* ping -c 2 localhost
PING localhost(localhost (::1)) 56 data bytes
64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.024 ms
0.000 ( 0.011 ms): ping/51357 sendto(fd: 5, buff: 0x7ffcca35e620, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
0.135 ( 0.026 ms): ping/51357 sendto(fd: 4, buff: 0x5601398f7b20, len: 64, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 0x1c) = 64
1014.929 ( 0.050 ms): ping/51357 sendto(fd: 4, buff: 0x5601398f7b20, len: 64, flags: CONFIRM, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 0x1c) = 64
64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.046 ms
--- localhost ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1015ms
rtt min/avg/max/mdev = 0.024/0.035/0.046/0.011 ms
#
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Carsten Haitzler <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wang ShaoBo <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
New features like the BPF --filter support in perf record have made the
BPF event functionality somewhat redundant. As shown by commit
fcb027c1a4f6 ("perf tools: Revert enable indices setting syntax for BPF
map") and commit 14e4b9f4289a ("perf trace: Raw augmented syscalls fix
libbpf 1.0+ compatibility") the BPF event support hasn't been well
maintained and it adds considerable complexity in areas like event
parsing, not least as '/' is a separator for event modifiers as well as
in paths.
This patch removes support in the event parser for BPF events and then
the associated functions are removed. This leads to the removal of whole
source files like bpf-loader.c. Removing support means that augmented
syscalls in perf trace are broken; this will be fixed in a later commit
that adds support using BPF skeletons.
The removal of BPF events causes an unused label warning from flex
generated code, so update build to ignore it:
```
util/parse-events-flex.c:2704:1: error: label ‘find_rule’ defined but not used [-Werror=unused-label]
2704 | find_rule: /* we branch to this label when backing up */
```
Committer notes:
Extracted from a larger patch that was also removing the support for
linking with libllvm and libclang, that were an alternative to using an
external clang execution to compile the .c event source code into BPF
bytecode.
Testing it:
# perf trace -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
event syntax error: '/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c'
\___ Bad event or PMU
Unabled to find PMU or event on a PMU of 'home'
Initial error:
event syntax error: '/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c'
\___ Cannot find PMU `home'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
#
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Carsten Haitzler <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wang ShaoBo <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This was never in the default build for perf and is difficult to maintain
as it uses clang/llvm internals, so ditch it. Keep, for now, the external
compilation of .c BPF into .o bytecode and its subsequent loading. That is
also going to be removed, but separately, to help bisection and to
properly document what is being removed and why.
Committer notes:
Extracted from a larger patch and removed some leftovers, namely
deleting these now unused feature tests:
tools/build/feature/test-clang.cpp
tools/build/feature/test-cxx.cpp
tools/build/feature/test-llvm-version.cpp
tools/build/feature/test-llvm.cpp
Testing the use of BPF events after applying this patch:
To use the external clang/llvm toolchain to compile a .c event and then
use libbpf to load it, to get the syscalls:sys_enter_open* tracepoints
and read the filename pointer, putting it into the ring buffer right
after the usual tracepoint payload for 'perf trace' to then print it:
[root@quaco ~]# perf trace -e /home/acme/git/perf-tools-next/tools/perf/examples/bpf/augmented_raw_syscalls.c,open* --max-events=10
0.000 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12
0.083 abrt-dump-jour/1453 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
0.063 abrt-dump-jour/1454 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
0.082 abrt-dump-jour/1455 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
250.124 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12
250.521 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/app.slice/memory.pressure", flags: RDONLY|CLOEXEC) = 12
251.047 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/app.slice/memory.current", flags: RDONLY|CLOEXEC) = 12
251.162 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/app.slice/memory.min", flags: RDONLY|CLOEXEC) = 12
251.242 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/app.slice/memory.low", flags: RDONLY|CLOEXEC) = 12
251.353 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/app.slice/memory.swap.current", flags: RDONLY|CLOEXEC) = 12
[root@quaco ~]#
Same thing, but with a prebuilt .o BPF bytecode:
[root@quaco ~]# perf trace -e /home/acme/git/perf-tools-next/tools/perf/examples/bpf/augmented_raw_syscalls.o,open* --max-events=10
0.000 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12
0.083 abrt-dump-jour/1453 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
0.083 abrt-dump-jour/1455 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
0.062 abrt-dump-jour/1454 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
249.985 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12
466.763 thermald/1234 openat(dfd: CWD, filename: "/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/energy_uj") = 13
467.145 thermald/1234 openat(dfd: CWD, filename: "/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj") = 13
467.311 thermald/1234 openat(dfd: CWD, filename: "/sys/class/thermal/thermal_zone2/temp") = 13
500.040 cgroupify/24006 openat(dfd: 4, filename: ".", flags: RDONLY|CLOEXEC|DIRECTORY|NONBLOCK) = 5
500.295 cgroupify/24006 openat(dfd: 4, filename: "24616/cgroup.procs") = 5
[root@quaco ~]#
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Carsten Haitzler <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wang ShaoBo <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: YueHaibing <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
retpolines and IBT
The kprobes optimization check can_optimize() calls
insn_is_indirect_jump() to detect indirect jump instructions in
a target function. If any is found, creating an optprobe is disallowed
in the function because the jump could be from a jump table and could
potentially land in the middle of the target optprobe.
With retpolines, insn_is_indirect_jump() additionally looks for calls to
indirect thunks which the compiler potentially used to replace original
jumps. This extra check is however unnecessary because jump tables are
disabled when the kernel is built with retpolines. The same is currently
the case with IBT.
Based on this observation, remove the logic to look for calls to
indirect thunks and skip the check for indirect jumps altogether if the
kernel is built with retpolines or IBT. Remove subsequently the symbols
__indirect_thunk_start and __indirect_thunk_end which are no longer
needed.
Dropping this logic indirectly fixes a problem where the range
[__indirect_thunk_start, __indirect_thunk_end] wrongly also included the
return thunk. As a result, machines that used the return thunk as a
mitigation and did not have it patched out by any alternative were unable
to use optprobes in any regular function.
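The resulting decision can be sketched as below. This is a simplified
illustration, not the kernel's can_optimize(): the sketch_ names and the
instruction record are hypothetical, the other checks the real function
performs are ignored, and 'no_jump_tables' stands in for a kernel built
with retpolines or IBT.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record. */
struct sketch_insn {
	bool indirect_jump;
};

/*
 * Decide whether a function may host an optprobe, looking only at the
 * indirect-jump rule described above.
 */
bool sketch_can_optimize(const struct sketch_insn *insns, size_t n,
			 bool no_jump_tables)
{
	/* Jump tables are disabled, so no indirect jump can come from one. */
	if (no_jump_tables)
		return true;

	/* Otherwise any indirect jump might land inside the optprobe. */
	for (size_t i = 0; i < n; i++) {
		if (insns[i].indirect_jump)
			return false;
	}
	return true;
}
```

With the thunk range lookup gone, a call to the return thunk no longer
influences the result at all.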
Fixes: 0b53c374b9ef ("x86/retpoline: Use -mfunction-return")
Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Suggested-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Petr Pavlu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Fix missing symbol seen in:
```
19: 'import perf' in python :
--- start ---
test child forked, pid 2640936
python usage test: "echo "import sys ; sys.path.insert(0, 'python'); import perf" | '/usr/bin/python3' "
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: tools/perf/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: perf_pmus__supports_extended_type
test child finished with -1
---- end ----
'import perf' in python: FAILED!
```
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yang Jihong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
its long_name, type and adjust_symbols
Test "object code reading" fails sometimes for kernel address as below:
Reading object code for memory address: 0xc000000000004c3c
File is: [kernel.kallsyms]
On file address is: 0x14c3c
dso__data_read_offset failed
test child finished with -1
---- end ----
Object code reading: FAILED!
Here dso__data_read_offset() fails for symbol address
0xc000000000004c3c. This is because the DSO long_name here is
"[kernel.kallsyms]" and hence open_dso() fails to open this file. There
is an incorrect DSO to map handling here. The key points here are:
- The DSO long_name is set to "[kernel.kallsyms]". This file is
not present, so opening it returns an error
- The DSO binary type is set to DSO_BINARY_TYPE__NOT_FOUND
- The DSO adjust_symbols member is set to zero
In the end dso__data_read_offset() returns -1 and the address 0x14c3c
cannot be resolved, so the test fails. But the address actually maps
to the kernel DSO:
# objdump -z -d --start-address=0xc000000000004c3c --stop-address=0xc000000000004cbc /home/athira/linux/vmlinux
/home/athira/linux/vmlinux: file format elf64-powerpcle
Disassembly of section .head.text:
c000000000004c3c <exc_virt_0x4c00_system_call+0x3c>:
c000000000004c3c: a6 02 9b 7d mfsrr1 r12
c000000000004c40: 78 13 42 7c mr r2,r2
c000000000004c44: 18 00 4d e9 ld r10,24(r13)
c000000000004c48: 60 c6 4a 61 ori r10,r10,50784
c000000000004c4c: a6 03 49 7d mtctr r10
Fix dso__process_kernel_symbol() to set the binary_type and
adjust_symbols members. dso->adjust_symbols is used by
map__rip_2objdump() which converts the symbol start address to the
objdump address. Also set dso->long_name in dso__load_vmlinux().
Suggested-by: Adrian Hunter <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: Disha Goel <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
building with clang < 13.0.0
clang < 13.0.0 doesn't grok -Wno-unused-but-set-variable, so just remove
it to avoid:
error: unknown warning option '-Wno-unused-but-set-variable'; did you mean '-Wno-unused-const-variable'? [-Werror,-Wunknown-warning-option]
make[4]: *** [/git/perf-6.5.0-rc4/tools/build/Makefile.build:128: /tmp/build/perf/util/pmu-flex.o] Error 1
make[4]: *** Waiting for unfinished jobs....
Fixes: ddc8e4c966923ad1 ("perf build: Disable fewer bison warnings")
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/ZNUSWr52jUnVaaa%[email protected]/
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up some more fixes that went upstream via the perf-tools fixes
branch.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Andi reported (see link below) a regression when printing the
'duration_time' tool event, where it gets printed as "not counted" for
most of the CPUs. Fix it by skipping zero counts for tool events.
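A minimal sketch of the idea, not the actual perf code: it assumes a tool
event such as 'duration_time' only carries a value on some CPUs, so a zero
count elsewhere can be dropped rather than shown as "not counted". The
struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one per-CPU counter reading. */
struct counter_reading {
	bool is_tool_event;	/* e.g. 'duration_time' */
	uint64_t count;
};

/* Return true when the per-CPU row should be omitted from the output. */
static bool skip_zero_count(const struct counter_reading *r)
{
	return r->is_tool_event && r->count == 0;
}
```

Non-tool events are left alone in this sketch, so a zero count there is
still reported.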
Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Andi Kleen <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Claire Jensen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/all/ZMlrzcVrVi1lTDmn@tassilo/
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
The tests were made with a specific workload; further tests on a
recently updated Fedora 38 system with a system-wide perf.data file
show 'perf report' taking excessive time resolving inlines in vmlinux,
so let's revert this until a full investigation and improvement of the
addr2line support code is made.
Reported-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Artem Savkov <[email protected]>
Tested-by: Jesper Dangaard Brouer <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Not used in any other place, so just make it static.
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/ZM0pjfOe6R4X%[email protected]/
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
in synthesize_perf_probe_command()
While building perf with EXTRA_CFLAGS="-fsanitize=address", a leak was
detected elsewhere, which led to an audit where we found that
synthesize_perf_probe_command() may leak the string returned by
synthesize_perf_probe_point() on failure. Fix it.
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
to add a probe
Building perf with EXTRA_CFLAGS="-fsanitize=address", a leak is detected
when trying to add a probe to a non-existent function:
# perf probe -x ~/bin/perf dso__neW
Probe point 'dso__neW' not found.
Error: Failed to add events.
=================================================================
==296634==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x7f67642ba097 in calloc (/lib64/libasan.so.8+0xba097)
#1 0x7f67641a76f1 in allocate_cfi (/lib64/libdw.so.1+0x3f6f1)
Direct leak of 65 byte(s) in 1 object(s) allocated from:
#0 0x7f67642b95b5 in __interceptor_realloc.part.0 (/lib64/libasan.so.8+0xb95b5)
#1 0x6cac75 in strbuf_grow util/strbuf.c:64
#2 0x6ca934 in strbuf_init util/strbuf.c:25
#3 0x9337d2 in synthesize_perf_probe_point util/probe-event.c:2018
#4 0x92be51 in try_to_find_probe_trace_events util/probe-event.c:964
#5 0x93d5c6 in convert_to_probe_trace_events util/probe-event.c:3512
#6 0x93d6d5 in convert_perf_probe_events util/probe-event.c:3529
#7 0x56f37f in perf_add_probe_events /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:354
#8 0x572fbc in __cmd_probe /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:738
#9 0x5730f2 in cmd_probe /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:766
#10 0x635d81 in run_builtin /var/home/acme/git/perf-tools-next/tools/perf/perf.c:323
#11 0x6362c1 in handle_internal_command /var/home/acme/git/perf-tools-next/tools/perf/perf.c:377
#12 0x63667a in run_argv /var/home/acme/git/perf-tools-next/tools/perf/perf.c:421
#13 0x636b8d in main /var/home/acme/git/perf-tools-next/tools/perf/perf.c:537
#14 0x7f676302950f in __libc_start_call_main (/lib64/libc.so.6+0x2950f)
SUMMARY: AddressSanitizer: 193 byte(s) leaked in 2 allocation(s).
#
synthesize_perf_probe_point() returns a "detached" strbuf, i.e. a
malloc'ed string that needs to be free'd.
An audit will be performed to find other such cases.
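The ownership rule described above can be sketched in plain C. This is an
illustration only, not the perf code: make_point_string() and add_probe()
are hypothetical stand-ins, and 'found' replaces the real probe lookup.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a function that hands back a detached,
 * malloc'ed string which the caller owns. */
static char *make_point_string(const char *func)
{
	size_t len = strlen("probe:") + strlen(func) + 1;
	char *buf = malloc(len);

	if (buf)
		snprintf(buf, len, "probe:%s", func);
	return buf;
}

/* Returns 0 on success, -1 on failure. 'found' stands in for the lookup. */
static int add_probe(const char *func, bool found)
{
	char *point = make_point_string(func);
	int ret = 0;

	if (!point)
		return -1;
	if (!found) {
		fprintf(stderr, "Probe point '%s' not found.\n", point);
		ret = -1;
	}
	free(point);	/* released on the success and the failure path alike */
	return ret;
}
```

Funnelling both outcomes through a single free() keeps the error path from
skipping the release, which is the kind of leak LeakSanitizer reported.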
Acked-by: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pick up the fixes that were just merged from perf-tools/perf-tools
for v6.5.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In 616b14b47a86d880 ("perf build: Conditionally define NDEBUG") we
started using NDEBUG=1 when DEBUG=1 isn't present, so code that is
enclosed with assert() is not called.
In dd317df072071903 ("perf build: Make binutil libraries opt in") we
stopped linking against binutils-devel, for licensing reasons.
Recently people asked me why annotation of BPF programs wasn't working,
i.e. this:
$ perf annotate bpf_prog_5280546344e3f45c_kfree_skb
was returning:
case SYMBOL_ANNOTATE_ERRNO__NO_LIBOPCODES_FOR_BPF:
scnprintf(buf, buflen, "Please link with binutils's libopcode to enable BPF annotation");
This was on a fedora rpm, so it's new enough that I had to try to test by
rebuilding using BUILD_NONDISTRO=1, only to get it segfaulting on me.
This combination caused this libopcode function not to be called:
assert(bfd_check_format(bfdf, bfd_object));
Changing it to:
if (!bfd_check_format(bfdf, bfd_object))
abort();
Made it work. Looking at this "check" function made me realize it
changes the 'bfdf' internal state, i.e. we had better call it.
So stop using assert() on it, just call it and abort if it fails.
Probably it is better to propagate the error, etc, but it seems it is
unlikely to fail from the usage done so far and we really need to stop
using libopcodes, so do the quick fix above and move on.
With it we have BPF annotation back working when built with
BUILD_NONDISTRO=1:
⬢[acme@toolbox perf-tools-next]$ perf annotate --stdio2 bpf_prog_5280546344e3f45c_kfree_skb | head
No kallsyms or vmlinux with build-id 939bc71a1a51cdc434e60af93c7e734f7d5c0e7e was found
Samples: 12 of event 'cpu-clock:ppp', 4000 Hz, Event count (approx.): 3000000, [percent: local period]
bpf_prog_5280546344e3f45c_kfree_skb() bpf_prog_5280546344e3f45c_kfree_skb
Percent int kfree_skb(struct trace_event_raw_kfree_skb *args) {
nop
33.33 xchg %ax,%ax
push %rbp
mov %rsp,%rbp
sub $0x180,%rsp
push %rbx
push %r13
⬢[acme@toolbox perf-tools-next]$
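The assert() hazard described above can be reproduced in isolation. The
sketch below is an illustration, not the perf or binutils code:
ASSERT_WITH_NDEBUG mimics how <assert.h> defines assert() when NDEBUG is
set, and struct fake_obj and fake_check_format() are hypothetical
stand-ins for a check function that also updates internal state.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for assert() as <assert.h> defines it when NDEBUG is set:
 * the argument expression is discarded and never evaluated. */
#define ASSERT_WITH_NDEBUG(expr) ((void)0)

/* Hypothetical object whose "check" call also sets up internal state. */
struct fake_obj {
	bool format_set;
};

static bool fake_check_format(struct fake_obj *obj)
{
	obj->format_set = true;	/* side effect later code relies on */
	return true;
}

/* Buggy pattern: the check and its side effect vanish under NDEBUG. */
static bool prepare_with_assert(struct fake_obj *obj)
{
	ASSERT_WITH_NDEBUG(fake_check_format(obj));
	return obj->format_set;
}

/* Fixed pattern: always call the function, abort if it fails. */
static bool prepare_with_call(struct fake_obj *obj)
{
	if (!fake_check_format(obj))
		abort();
	return obj->format_set;
}
```

Starting from an object with format_set cleared, prepare_with_assert()
leaves it cleared while prepare_with_call() sets it, which is why calls
with side effects should not live inside assert().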
Fixes: 6987561c9e86eace ("perf annotate: Enable annotation of BPF programs")
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mohamed Mahmoud <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Dave Tucker <[email protected]>
Cc: Derek Barbosa <[email protected]>
Cc: Song Liu <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
With -Werror the build was failing on fedora rawhide:
[perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.2.1-20230728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC)
[perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$
In file included from /usr/include/python3.12/Python.h:44,
from scripts/python/Perf-Trace-Util/Context.c:14:
/usr/include/python3.12/object.h: In function 'Py_SIZE':
/usr/include/python3.12/object.h:217:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
217 | PyVarObject *var_ob = _PyVarObject_CAST(ob);
| ^~~~~~~~~~~
In file included from /usr/include/python3.12/Python.h:53:
/usr/include/python3.12/cpython/longintrepr.h: In function '_PyLong_CompactValue':
/usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
121 | Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK);
| ^~~~~~~~~~
<SNIP>
In file included from /usr/include/python3.12/Python.h:44,
from util/scripting-engines/trace-event-python.c:22:
/usr/include/python3.12/object.h: In function 'Py_SIZE':
/usr/include/python3.12/object.h:217:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
217 | PyVarObject *var_ob = _PyVarObject_CAST(ob);
| ^~~~~~~~~~~
CC /tmp/build/perf/util/units.o
CC /tmp/build/perf/util/time-utils.o
In file included from /usr/include/python3.12/Python.h:53:
/usr/include/python3.12/cpython/longintrepr.h: In function '_PyLong_CompactValue':
/usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
121 | Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK);
| ^~~~~~~~~~
So add -Wno-declaration-after-statement to the python scripting CFLAGS.
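For reference, the construct the warning objects to looks like the sketch
below. The function is a hypothetical example, not code from Python.h: it
is valid C99 and later, but C90 requires declarations to come before any
statement in a block, so -Wdeclaration-after-statement flags it and
-Werror turns that into a build failure.

```c
/* Sum the first n elements of v. */
static int sum_after_statement(const int *v, int n)
{
	if (n <= 0)
		return 0;

	int total = 0;	/* declaration placed after a statement */

	for (int i = 0; i < n; i++)
		total += v[i];
	return total;
}
```

Since the flagged declarations sit in the Python headers themselves, the
option is relaxed for the objects that include them rather than the code
being rewritten.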
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/ZMpdKeO8gU%[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
With -Werror the build was failing on fedora rawhide:
[perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.2.1-20230728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC)
[perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$
In file included from /usr/include/python3.12/Python.h:44,
from /git/perf-6.5.0-rc2/tools/perf/util/python.c:2:
/usr/include/python3.12/object.h: In function ‘Py_SIZE’:
/usr/include/python3.12/object.h:217:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
217 | PyVarObject *var_ob = _PyVarObject_CAST(ob);
| ^~~~~~~~~~~
LD /tmp/build/perf/arch/perf-in.o
In file included from /usr/include/python3.12/Python.h:53:
/usr/include/python3.12/cpython/longintrepr.h: In function ‘_PyLong_CompactValue’:
/usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
121 | Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK);
| ^~~~~~~~~~
So add -Wno-declaration-after-statement to the python binding CFLAGS.
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Since @symbol variable access is not supported by uprobe events, perf must
warn the user about that instead of suggesting a kernel version update.
Committer testing:
With/without the patch:
[root@quaco ~]# perf probe -x ~/bin/perf -L sigtrap_handler
<sigtrap_handler@/home/acme/git/perf-tools-next/tools/perf/tests/sigtrap.c:0>
0 sigtrap_handler(int signum __maybe_unused, siginfo_t *info, void *ucontext __maybe_unused)
1 {
2 if (!__atomic_fetch_add(&ctx.signal_count, 1, __ATOMIC_RELAXED))
3 ctx.first_siginfo = *info;
4 __atomic_fetch_sub(&ctx.tids_want_signal, syscall(SYS_gettid), __ATOMIC_RELAXED);
5 }
static void *test_thread(void *arg)
{
[root@quaco ~]# perf probe -x ~/bin/perf sigtrap_handler:4 "ctx.signal_count"
Without the patch:
[root@quaco ~]# perf probe -x ~/bin/perf sigtrap_handler:4 "ctx.signal_count"
Failed to write event: Invalid argument
Please upgrade your kernel to at least 3.14 to have access to feature @ctx
Error: Failed to add events.
[root@quaco ~]#
With the patch:
[root@quaco ~]#
Failed to write event: Invalid argument
@ctx accesses a variable by symbol name, but that is not supported for user application probe.
Error: Failed to add events.
[root@quaco ~]#
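The change in wording shown above could be sketched as follows. This is a
simplified illustration, not the actual perf code: format_symbol_hint() and
its 'uprobe' parameter are hypothetical, and only the two message texts are
taken from the outputs above.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: pick the hint printed after a failed write of a
 * probe definition that uses @SYMBOL.
 */
static int format_symbol_hint(char *buf, size_t len, const char *sym, bool uprobe)
{
	if (uprobe)
		return snprintf(buf, len,
				"@%s accesses a variable by symbol name, but that is not supported for user application probe.",
				sym);
	return snprintf(buf, len,
			"Please upgrade your kernel to at least 3.14 to have access to feature @%s",
			sym);
}
```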
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/r/169055397023.67089.12693645664676964310.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
parse_events_array was set up by event term parsing, which no longer
exists. Remove this struct and references to it.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wang ShaoBo <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
This reverts commit e571e029bdbf ("perf tools: Enable indices setting
syntax for BPF map").
The reverted commit added a notion of arrays that could be set as
event terms for BPF events. The parsing hasn't worked over multiple
Linux releases. Given the broken nature of the parsing it appears the
code isn't in use, nor could I find a way for it to be used to add a
test.
The original commit contains a test in the commit message,
however, running it yields:
```
$ perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
event syntax error: '..pf_map_3.c/map:channel.value[0,1,2,3...5]=101/'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
```
Given the code can't be used this commit reverts and removes it.
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Wang ShaoBo <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|