aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/scripts/python/stackcollapse.py
diff options
context:
space:
mode:
authorEduard Zingerman <[email protected]>2022-06-21 02:53:42 +0300
committerAlexei Starovoitov <[email protected]>2022-06-20 17:40:51 -0700
commit1ade23711971b0eececf0d7fedc29d3c1d2fce01 (patch)
tree8e32aa6e6109037ea5020965f237ae3821b17999 /tools/perf/scripts/python/stackcollapse.py
parent7a42008ca5c700819e4b3003025e5e1695fd1f86 (diff)
bpf: Inline calls to bpf_loop when callback is known
Calls to `bpf_loop` are replaced with direct loops to avoid indirection. E.g. the following: bpf_loop(10, foo, NULL, 0); Is replaced by equivalent of the following: for (int i = 0; i < 10; ++i) foo(i, NULL); This transformation could be applied when: - callback is known and does not change during program execution; - flags passed to `bpf_loop` are always zero. Inlining logic works as follows: - During execution simulation function `update_loop_inline_state` tracks the following information for each `bpf_loop` call instruction: - is callback known and constant? - are flags constant and zero? - Function `optimize_bpf_loop` increases stack depth for functions where `bpf_loop` calls can be inlined and invokes `inline_bpf_loop` to apply the inlining. The additional stack space is used to spill registers R6, R7 and R8. These registers are used as loop counter, loop maximal bound and callback context parameter; Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op. Signed-off-by: Eduard Zingerman <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
Diffstat (limited to 'tools/perf/scripts/python/stackcollapse.py')
0 files changed, 0 insertions, 0 deletions