| author | Alexei Starovoitov <[email protected]> | 2022-06-20 17:40:52 -0700 |
|---|---|---|
| committer | Alexei Starovoitov <[email protected]> | 2022-06-20 17:40:52 -0700 |
| commit | b40b414ec8d971be0b2f6485c3a039b0fa7f078c (patch) | |
| tree | 08ba017f0e2d9b36b1ed98b1e324536be50e83a5 /kernel | |
| parent | aca80dd95e20f1fa0daa212afc83c9fa0ad239e5 (diff) | |
| parent | 0e1bf9ed2000c16fa8e0703e255a23d64a4adb27 (diff) | |
Merge branch 'bpf_loop inlining'
Eduard Zingerman says:
====================
Hi Everyone,
This is the next iteration of the patch. It includes changes suggested
by Song, Joanne and Alexei. Please find the updated intro message and
change log below.
This patch implements inlining of calls to bpf_loop helper function
when bpf_loop's callback is statically known. E.g. the rewrite does
the following transformation during BPF program processing:
    bpf_loop(10, foo, NULL, 0);

 ->

    for (int i = 0; i < 10; ++i)
        foo(i, NULL);
The transformation leads to a measurable latency reduction for simple
loops. Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU /
KVM on i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op.
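For readers who want to reason about the equivalence, the following is a
minimal user-space sketch, not kernel code. The names `model_bpf_loop`,
`model_inlined_loop`, `loop_cb`, `stop_at_three` and `MODEL_MAX_LOOPS` are
hypothetical. The limit is assumed to equal the `BIT(23)` value of the
`MAX_LOOPS` define removed in the diff below. The first function follows the
helper checks visible in `bpf_iter.c`, the second follows the control flow of
the instruction sequence in `inline_bpf_loop`.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: same value as the MAX_LOOPS define removed by the diff. */
#define MODEL_MAX_LOOPS (1u << 23)

/* Hypothetical callback type: a non-zero return value stops the loop. */
typedef long (*loop_cb)(uint32_t index, void *ctx);

/* Model of the helper call bpf_loop(nr_loops, cb, ctx, flags). */
static long model_bpf_loop(uint32_t nr_loops, loop_cb cb, void *ctx,
			   uint64_t flags)
{
	uint32_t i;

	if (flags)
		return -EINVAL;
	if (nr_loops > MODEL_MAX_LOOPS)
		return -E2BIG;

	for (i = 0; i < nr_loops; i++) {
		if (cb(i, ctx))
			return i + 1;
	}
	return i;
}

/* Model of the inlined sequence. There is no flags check here because
 * inlining is only attempted when the verifier has shown flags == 0.
 */
static long model_inlined_loop(uint32_t nr_loops, loop_cb cb, void *ctx)
{
	uint32_t cnt = 0;
	long ret;

	if (nr_loops > MODEL_MAX_LOOPS)
		return -E2BIG;

	while (cnt < nr_loops) {
		ret = cb(cnt, ctx);
		cnt++;			/* counter is incremented first */
		if (ret != 0)		/* then the callback result is tested */
			break;
	}
	return cnt;			/* number of iterations performed */
}

/* Example callback: counts invocations and stops at index 3. */
static long stop_at_three(uint32_t index, void *ctx)
{
	(*(int *)ctx)++;
	return index == 3;
}
```

Both models return the number of callback invocations, which is why the
inlined sequence can increment the counter before testing the callback result.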
The change is split into five parts:
* Update to test_verifier.c to specify expected and unexpected
instruction sequences. This allows checking BPF program rewrites
applied by e.g. the do_misc_fixups function.
* Update to test_verifier.c to specify BTF function infos and types
per test case. This is necessary for tests that load sub-program
addresses to a variable because of the checks applied by
check_ld_imm function.
* The update to verifier.c that tracks state of the parameters for
each bpf_loop call in a program and decides whether it could be
replaced by a loop.
* A set of test cases for `test_verifier` that use capabilities added
by the first two patches to verify instructions produced by inlining
logic.
* Two test cases for `test_progs` to check that possible corner cases
behave as expected.
Additional details are available in commit messages for each patch.
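As an illustration of the third part, the per-call tracking can be modelled in
user space. This is a sketch under assumptions: `loop_inline_state_model` and
`model_update_state` are hypothetical names, and the `flags_is_zero` argument
stands in for the result of `loop_flag_is_zero`, which in the kernel inspects
R4 and marks it precise. The decision logic follows
`update_loop_inline_state` from the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct bpf_loop_inline_state. */
struct loop_inline_state_model {
	unsigned int initialized:1;
	unsigned int fit_for_inline:1;
	uint32_t callback_subprogno;
};

/* Called once per verified path that reaches the same bpf_loop call.
 * The call stays fit for inlining only if every path sees flags == 0
 * and the same callback sub-program.
 */
static void model_update_state(struct loop_inline_state_model *state,
			       bool flags_is_zero, uint32_t subprogno)
{
	if (!state->initialized) {
		state->initialized = 1;
		state->fit_for_inline = flags_is_zero;
		state->callback_subprogno = subprogno;
		return;
	}

	/* Once a path has disqualified the call, it stays disqualified. */
	if (!state->fit_for_inline)
		return;

	state->fit_for_inline = flags_is_zero &&
				state->callback_subprogno == subprogno;
}
```

The state is sticky in one direction only: a later path can clear
`fit_for_inline`, but no path can set it again.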
Changes since v7:
- Call to `mark_chain_precision` is added in `loop_flag_is_zero` to
avoid potential issues with state pruning and precision tracking.
- `flags non-zero` test_verifier test case is updated to have two
execution paths reaching `bpf_loop` call, one with flags = 0,
another with flags = 1. Potentially this test case should be able
to show that call to `mark_chain_precision` is necessary in
`loop_flag_is_zero` but not at the moment. Please refer to
discussion for [PATCH bpf-next v7 3/5] for additional details.
- `stack_depth_extra` computation is updated to guarantee that R6, R7
and R8 offsets are always aligned on 8 byte boundary.
- `stack locations for loop vars` test_verifier test case updated to
show that R6, R7, R8 offsets are indeed aligned when function stack
depth is not a multiple of 8.
- I removed Song Liu's ACK from commit message for [PATCH bpf-next v8
4/5] because I updated the patch. (Please let me know if I should have
kept the ACK tag).
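The alignment guarantee for the R6, R7 and R8 spill slots can be checked with a
small user-space sketch. The names `spill_offsets`, `model_spill_offsets` and
`MODEL_REG_SIZE` are hypothetical, and `MODEL_REG_SIZE` is assumed to match
`BPF_REG_SIZE` (8 bytes). The arithmetic follows `optimize_bpf_loop` and
`inline_bpf_loop` from the diff below.

```c
#include <stdint.h>

#define MODEL_REG_SIZE 8	/* assumption: same as BPF_REG_SIZE */

struct spill_offsets {
	int32_t r6, r7, r8;	/* offsets relative to the frame pointer R10 */
	uint16_t extra;		/* bytes added to the subprog stack depth */
};

static struct spill_offsets model_spill_offsets(uint16_t stack_depth)
{
	/* Padding needed to bring stack_depth up to a multiple of 8. */
	uint16_t roundup =
		(uint16_t)(((stack_depth + 7u) / 8u) * 8u - stack_depth);
	uint16_t extra = (uint16_t)(MODEL_REG_SIZE * 3 + roundup);
	int32_t base = -((int32_t)stack_depth + (int32_t)extra);
	struct spill_offsets o = {
		.r6 = base + 0 * MODEL_REG_SIZE,
		.r7 = base + 1 * MODEL_REG_SIZE,
		.r8 = base + 2 * MODEL_REG_SIZE,
		.extra = extra,
	};

	return o;
}
```

For example, a stack depth of 12 gives 4 bytes of padding, 28 extra bytes and
offsets of -40, -32 and -24, all of which are multiples of 8.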
Changes since v6:
- Return value of the `optimize_bpf_loop` function is no longer
ignored. This is necessary to properly propagate -ENOMEM error.
Changes since v5:
- Added function `loop_flag_is_zero` to skip a few checks in
`update_loop_inline_state` when loop instruction is not fit for
inline.
Changes since v4:
- Added missing `static` modifier for `update_loop_inline_state` and
`inline_bpf_loop` functions.
- `update_loop_inline_state` updated for better readability.
- Fields `initialized` and `fit_for_inline` of `struct
bpf_loop_inline_state` are changed back from `bool` to bitfields.
- Acks from Song Liu added to comments for patches 1/5, 2/5, 4/5,
5/5.
Changes since v3:
- Function `adjust_stack_depth_for_loop_inlining` is replaced by
function `optimize_bpf_loop`. Function `optimize_bpf_loop` is
responsible for both stack depth adjustment and call instruction
replacement.
- Changes in `do_misc_fixups` are reverted.
- Changes in `adjust_subprog_starts_after_remove` are reverted and
function `adjust_loop_inline_subprogno` is removed. This is
possible because call to `optimize_bpf_loop` is placed before the
dead code removal in `opt_remove_dead_code` (in contrast to the
position of `do_misc_fixups` where inlining was done in v3).
- Field `bpf_insn_aux_data.loop_inline_state` is now a part of
anonymous union at the start of the `bpf_insn_aux_data`.
- Data structure `bpf_loop_inline_state` is simplified to use single
flag field `fit_for_inline` instead of separate fields
`flags_is_zero` & `callback_is_constant`.
- Macro definition `BPF_MAX_LOOPS` is moved from
`include/linux/bpf_verifier.h` to `include/linux/bpf.h` to avoid
include of `include/linux/bpf_verifier.h` in `bpf_iter.c`.
- `inline_bpf_loop` changed back to use array initialization and hard
coded offsets as in v2.
- Style / formatting updates.
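The hard coded offsets mentioned above can be sanity checked with a short
user-space sketch. `model_jump_target`, `model_call_imm` and
`MODEL_CALL_INDEX` are hypothetical names. The sketch assumes the usual BPF
convention that a jump or relative call lands at the index of the instruction
plus one plus the encoded offset, and it uses index 12 for the `BPF_CALL_REL`
instruction, as noted in `inline_bpf_loop` in the diff below.

```c
#include <stdint.h>

/* Index of BPF_CALL_REL inside insn_buf, per the comment in the diff. */
#define MODEL_CALL_INDEX 12

/* Target instruction index of a jump located at insn_index. */
static int32_t model_jump_target(int32_t insn_index, int32_t off)
{
	return insn_index + 1 + off;
}

/* Immediate patched into the call instruction once the callback start
 * is known, for a patch inserted at instruction index `position`.
 */
static int32_t model_call_imm(uint32_t callback_start, uint32_t position)
{
	uint32_t call_insn_offset = position + MODEL_CALL_INDEX;

	return (int32_t)(callback_start - call_insn_offset - 1);
}
```

With these definitions the `BPF_JA` at index 2 with offset 16 lands at index
19, one past the 19-instruction patch. The loop header jump at index 9 with
offset 5 lands on the `R0` assignment at index 15, and the back edge at index
14 with offset -6 returns to the loop header at index 9.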
Changes since v2:
- fix for `stack_check` test case in `test_progs-no_alu32`, all tests
are passing now;
- v2 3/3 patch is split in three parts:
- kernel changes
- test_verifier changes
- test_progs changes
- updated `inline_bpf_loop` in `verifier.c` to calculate each offset
used in instructions to avoid "magic" numbers;
- removed newline handling logic in `fail_log` branch of
`do_single_test` in `test_verifier.c` to simplify the patch set;
- styling fixes suggested in review for v2 of this patch set.
Changes since v1:
- allow the use of SKIP_INSNS in instruction pattern specification in
test_verifier tests;
- fix for a bug in spill offset assignment for loop vars when
bpf_loop is located in a non-main function.
====================
Signed-off-by: Alexei Starovoitov <[email protected]>
Diffstat (limited to 'kernel')
| -rw-r--r-- | kernel/bpf/bpf_iter.c | 9 |
| -rw-r--r-- | kernel/bpf/verifier.c | 180 |
2 files changed, 180 insertions, 9 deletions
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index d5d96ceca105..7e8fd49406f6 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -723,9 +723,6 @@ const struct bpf_func_proto bpf_for_each_map_elem_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
-/* maximum number of loops */
-#define MAX_LOOPS	BIT(23)
-
 BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
 	   u64, flags)
 {
@@ -733,9 +730,13 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
 	u64 ret;
 	u32 i;
 
+	/* Note: these safety checks are also verified when bpf_loop
+	 * is inlined, be careful to modify this code in sync. See
+	 * function verifier.c:inline_bpf_loop.
+	 */
 	if (flags)
 		return -EINVAL;
-	if (nr_loops > MAX_LOOPS)
+	if (nr_loops > BPF_MAX_LOOPS)
 		return -E2BIG;
 
 	for (i = 0; i < nr_loops; i++) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2859901ffbe3..bf72dc511df6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7124,6 +7124,41 @@ static int check_get_func_ip(struct bpf_verifier_env *env)
 	return -ENOTSUPP;
 }
 
+static struct bpf_insn_aux_data *cur_aux(struct bpf_verifier_env *env)
+{
+	return &env->insn_aux_data[env->insn_idx];
+}
+
+static bool loop_flag_is_zero(struct bpf_verifier_env *env)
+{
+	struct bpf_reg_state *regs = cur_regs(env);
+	struct bpf_reg_state *reg = &regs[BPF_REG_4];
+	bool reg_is_null = register_is_null(reg);
+
+	if (reg_is_null)
+		mark_chain_precision(env, BPF_REG_4);
+
+	return reg_is_null;
+}
+
+static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno)
+{
+	struct bpf_loop_inline_state *state = &cur_aux(env)->loop_inline_state;
+
+	if (!state->initialized) {
+		state->initialized = 1;
+		state->fit_for_inline = loop_flag_is_zero(env);
+		state->callback_subprogno = subprogno;
+		return;
+	}
+
+	if (!state->fit_for_inline)
+		return;
+
+	state->fit_for_inline = (loop_flag_is_zero(env) &&
+				 state->callback_subprogno == subprogno);
+}
+
 static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx_p)
 {
@@ -7276,6 +7311,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		err = check_bpf_snprintf_call(env, regs);
 		break;
 	case BPF_FUNC_loop:
+		update_loop_inline_state(env, meta.subprogno);
 		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
 					set_loop_callback_state);
 		break;
@@ -7682,11 +7718,6 @@ static bool check_reg_sane_offset(struct bpf_verifier_env *env,
 	return true;
 }
 
-static struct bpf_insn_aux_data *cur_aux(struct bpf_verifier_env *env)
-{
-	return &env->insn_aux_data[env->insn_idx];
-}
-
 enum {
 	REASON_BOUNDS	= -1,
 	REASON_TYPE	= -2,
@@ -14315,6 +14346,142 @@ patch_call_imm:
 	return 0;
 }
 
+static struct bpf_prog *inline_bpf_loop(struct bpf_verifier_env *env,
+					int position,
+					s32 stack_base,
+					u32 callback_subprogno,
+					u32 *cnt)
+{
+	s32 r6_offset = stack_base + 0 * BPF_REG_SIZE;
+	s32 r7_offset = stack_base + 1 * BPF_REG_SIZE;
+	s32 r8_offset = stack_base + 2 * BPF_REG_SIZE;
+	int reg_loop_max = BPF_REG_6;
+	int reg_loop_cnt = BPF_REG_7;
+	int reg_loop_ctx = BPF_REG_8;
+
+	struct bpf_prog *new_prog;
+	u32 callback_start;
+	u32 call_insn_offset;
+	s32 callback_offset;
+
+	/* This represents an inlined version of bpf_iter.c:bpf_loop,
+	 * be careful to modify this code in sync.
+	 */
+	struct bpf_insn insn_buf[] = {
+		/* Return error and jump to the end of the patch if
+		 * expected number of iterations is too big.
+		 */
+		BPF_JMP_IMM(BPF_JLE, BPF_REG_1, BPF_MAX_LOOPS, 2),
+		BPF_MOV32_IMM(BPF_REG_0, -E2BIG),
+		BPF_JMP_IMM(BPF_JA, 0, 0, 16),
+		/* spill R6, R7, R8 to use these as loop vars */
+		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_6, r6_offset),
+		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_7, r7_offset),
+		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_8, r8_offset),
+		/* initialize loop vars */
+		BPF_MOV64_REG(reg_loop_max, BPF_REG_1),
+		BPF_MOV32_IMM(reg_loop_cnt, 0),
+		BPF_MOV64_REG(reg_loop_ctx, BPF_REG_3),
+		/* loop header,
+		 * if reg_loop_cnt >= reg_loop_max skip the loop body
+		 */
+		BPF_JMP_REG(BPF_JGE, reg_loop_cnt, reg_loop_max, 5),
+		/* callback call,
+		 * correct callback offset would be set after patching
+		 */
+		BPF_MOV64_REG(BPF_REG_1, reg_loop_cnt),
+		BPF_MOV64_REG(BPF_REG_2, reg_loop_ctx),
+		BPF_CALL_REL(0),
+		/* increment loop counter */
+		BPF_ALU64_IMM(BPF_ADD, reg_loop_cnt, 1),
+		/* jump to loop header if callback returned 0 */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, -6),
+		/* return value of bpf_loop,
+		 * set R0 to the number of iterations
+		 */
+		BPF_MOV64_REG(BPF_REG_0, reg_loop_cnt),
+		/* restore original values of R6, R7, R8 */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_10, r6_offset),
+		BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_10, r7_offset),
+		BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_10, r8_offset),
+	};
+
+	*cnt = ARRAY_SIZE(insn_buf);
+	new_prog = bpf_patch_insn_data(env, position, insn_buf, *cnt);
+	if (!new_prog)
+		return new_prog;
+
+	/* callback start is known only after patching */
+	callback_start = env->subprog_info[callback_subprogno].start;
+	/* Note: insn_buf[12] is an offset of BPF_CALL_REL instruction */
+	call_insn_offset = position + 12;
+	callback_offset = callback_start - call_insn_offset - 1;
+	env->prog->insnsi[call_insn_offset].imm = callback_offset;
+
+	return new_prog;
+}
+
+static bool is_bpf_loop_call(struct bpf_insn *insn)
+{
+	return insn->code == (BPF_JMP | BPF_CALL) &&
+	       insn->src_reg == 0 &&
+	       insn->imm == BPF_FUNC_loop;
+}
+
+/* For all sub-programs in the program (including main) check
+ * insn_aux_data to see if there are bpf_loop calls that require
+ * inlining. If such calls are found the calls are replaced with a
+ * sequence of instructions produced by `inline_bpf_loop` function and
+ * subprog stack_depth is increased by the size of 3 registers.
+ * This stack space is used to spill values of the R6, R7, R8. These
+ * registers are used to store the loop bound, counter and context
+ * variables.
+ */
+static int optimize_bpf_loop(struct bpf_verifier_env *env)
+{
+	struct bpf_subprog_info *subprogs = env->subprog_info;
+	int i, cur_subprog = 0, cnt, delta = 0;
+	struct bpf_insn *insn = env->prog->insnsi;
+	int insn_cnt = env->prog->len;
+	u16 stack_depth = subprogs[cur_subprog].stack_depth;
+	u16 stack_depth_roundup = round_up(stack_depth, 8) - stack_depth;
+	u16 stack_depth_extra = 0;
+
+	for (i = 0; i < insn_cnt; i++, insn++) {
+		struct bpf_loop_inline_state *inline_state =
+			&env->insn_aux_data[i + delta].loop_inline_state;
+
+		if (is_bpf_loop_call(insn) && inline_state->fit_for_inline) {
+			struct bpf_prog *new_prog;
+
+			stack_depth_extra = BPF_REG_SIZE * 3 + stack_depth_roundup;
+			new_prog = inline_bpf_loop(env,
+						   i + delta,
+						   -(stack_depth + stack_depth_extra),
+						   inline_state->callback_subprogno,
+						   &cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta     += cnt - 1;
+			env->prog  = new_prog;
+			insn       = new_prog->insnsi + i + delta;
+		}
+
+		if (subprogs[cur_subprog + 1].start == i + delta + 1) {
+			subprogs[cur_subprog].stack_depth += stack_depth_extra;
+			cur_subprog++;
+			stack_depth = subprogs[cur_subprog].stack_depth;
+			stack_depth_roundup = round_up(stack_depth, 8) - stack_depth;
+			stack_depth_extra = 0;
+		}
+	}
+
+	env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
+
+	return 0;
+}
+
 static void free_states(struct bpf_verifier_env *env)
 {
 	struct bpf_verifier_state_list *sl, *sln;
@@ -15052,6 +15219,9 @@ skip_full_check:
 		ret = check_max_stack_depth(env);
 
 	/* instruction rewrites happen after this point */
+	if (ret == 0)
+		ret = optimize_bpf_loop(env);
+
 	if (is_priv) {
 		if (ret == 0)
 			opt_hard_wire_dead_code_branches(env);