aboutsummaryrefslogtreecommitdiff
path: root/tools/testing/selftests/bpf/progs/test_autoload.c
diff options
context:
space:
mode:
authorAndrii Nakryiko <[email protected]>2024-08-05 21:29:35 -0700
committerAlexei Starovoitov <[email protected]>2024-08-23 10:00:37 -0700
commitf727b13dbea16c5e117e263aa8aea59d632d5660 (patch)
treeefb4470118c3eee32c78482f8a24f351955d58f5 /tools/testing/selftests/bpf/progs/test_autoload.c
parent4e9e07603ecdfd0816838132f354239771c51edd (diff)
selftests/bpf: add multi-uprobe benchmarks
Add multi-uprobe and multi-uretprobe benchmarks to bench tool. Multi- and classic uprobes/uretprobes have different low-level triggering code paths, so it's sometimes important to be able to benchmark both flavors of uprobes/uretprobes. Sample examples from my dev machine below. Single-threaded peformance almost doesn't differ, but with more parallel CPUs triggering the same uprobe/uretprobe the difference grows. This might be due to [0], but given the code is slightly different, there could be other sources of slowdown. Note, all these numbers will change due to ongoing work to improve uprobe/uretprobe scalability (e.g., [1]), but having benchmark like this is useful for measurements and debugging nevertheless. \#!/bin/bash set -eufo pipefail for p in 1 8 16 32; do for i in uprobe-nop uretprobe-nop uprobe-multi-nop uretprobe-multi-nop; do summary=$(sudo ./bench -w1 -d3 -p$p -a trig-$i | tail -n1) total=$(echo "$summary" | cut -d'(' -f1 | cut -d' ' -f3-) percpu=$(echo "$summary" | cut -d'(' -f2 | cut -d')' -f1 | cut -d'/' -f1) printf "%-21s (%2d cpus): %s (%s/s/cpu)\n" $i $p "$total" "$percpu" done echo done uprobe-nop ( 1 cpus): 1.020 ± 0.005M/s ( 1.020M/s/cpu) uretprobe-nop ( 1 cpus): 0.515 ± 0.009M/s ( 0.515M/s/cpu) uprobe-multi-nop ( 1 cpus): 1.036 ± 0.004M/s ( 1.036M/s/cpu) uretprobe-multi-nop ( 1 cpus): 0.512 ± 0.005M/s ( 0.512M/s/cpu) uprobe-nop ( 8 cpus): 3.481 ± 0.030M/s ( 0.435M/s/cpu) uretprobe-nop ( 8 cpus): 2.222 ± 0.008M/s ( 0.278M/s/cpu) uprobe-multi-nop ( 8 cpus): 3.769 ± 0.094M/s ( 0.471M/s/cpu) uretprobe-multi-nop ( 8 cpus): 2.482 ± 0.007M/s ( 0.310M/s/cpu) uprobe-nop (16 cpus): 2.968 ± 0.011M/s ( 0.185M/s/cpu) uretprobe-nop (16 cpus): 1.870 ± 0.002M/s ( 0.117M/s/cpu) uprobe-multi-nop (16 cpus): 3.541 ± 0.037M/s ( 0.221M/s/cpu) uretprobe-multi-nop (16 cpus): 2.123 ± 0.026M/s ( 0.133M/s/cpu) uprobe-nop (32 cpus): 2.524 ± 0.026M/s ( 0.079M/s/cpu) uretprobe-nop (32 cpus): 1.572 ± 0.003M/s ( 0.049M/s/cpu) uprobe-multi-nop (32 cpus): 2.717 ± 0.003M/s ( 0.085M/s/cpu) uretprobe-multi-nop (32 cpus): 1.687 ± 0.007M/s ( 0.053M/s/cpu) [0] https://lore.kernel.org/linux-trace-kernel/[email protected]/ [1] https://lore.kernel.org/linux-trace-kernel/[email protected]/ Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
Diffstat (limited to 'tools/testing/selftests/bpf/progs/test_autoload.c')
0 files changed, 0 insertions, 0 deletions