diff options
author | Eduard Zingerman <[email protected]> | 2022-06-21 02:53:42 +0300 |
---|---|---|
committer | Alexei Starovoitov <[email protected]> | 2022-06-20 17:40:51 -0700 |
commit | 1ade23711971b0eececf0d7fedc29d3c1d2fce01 (patch) | |
tree | 8e32aa6e6109037ea5020965f237ae3821b17999 /tools/perf/scripts/python/stackcollapse.py | |
parent | 7a42008ca5c700819e4b3003025e5e1695fd1f86 (diff) |
bpf: Inline calls to bpf_loop when callback is known
Calls to `bpf_loop` are replaced with direct loops to avoid
indirection. E.g. the following:
bpf_loop(10, foo, NULL, 0);
Is replaced by equivalent of the following:
for (int i = 0; i < 10; ++i)
foo(i, NULL);
This transformation could be applied when:
- callback is known and does not change during program execution;
- flags passed to `bpf_loop` are always zero.
Inlining logic works as follows:
- During execution simulation function `update_loop_inline_state`
tracks the following information for each `bpf_loop` call
instruction:
- is callback known and constant?
- are flags constant and zero?
- Function `optimize_bpf_loop` increases stack depth for functions
where `bpf_loop` calls can be inlined and invokes `inline_bpf_loop`
to apply the inlining. The additional stack space is used to spill
registers R6, R7 and R8. These registers are used as loop counter,
loop maximal bound and callback context parameter;
Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on
i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op.
Signed-off-by: Eduard Zingerman <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Diffstat (limited to 'tools/perf/scripts/python/stackcollapse.py')
0 files changed, 0 insertions, 0 deletions