Age  Commit message  Author  Files  Lines
2024-08-30  bpf: Remove custom build rule  Alexey Gladkov  4  -6/+6
According to the documentation, when building a kernel with the C=2 parameter, all source files should be checked. But this does not happen for the kernel/bpf/ directory. $ touch kernel/bpf/core.o $ make C=2 CHECK=true kernel/bpf/core.o Outputs: CHECK scripts/mod/empty.c CALL scripts/checksyscalls.sh DESCEND objtool INSTALL libsubcmd_headers CC kernel/bpf/core.o As can be seen the compilation is done, but CHECK is not executed. This happens because kernel/bpf/Makefile has defined its own rule for compilation and forgotten the macro that does the check. There is no need to duplicate the build code, and this rule can be removed to use generic rules. Acked-by: Masahiro Yamada <[email protected]> Tested-by: Oleg Nesterov <[email protected]> Tested-by: Alan Maguire <[email protected]> Signed-off-by: Alexey Gladkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  selftests/bpf: Add tests for iter next method returning valid pointer  Juntong Deng  4  -1/+154
This patch adds test cases for the iter next method returning a valid pointer, which can also be used as usage examples. Currently the iter next method should return a valid pointer. iter_next_trusted is the correct usage and tests that the iter next method returns a valid pointer. bpf_iter_task_vma_next has the KF_RET_NULL flag, so the returned pointer may be NULL. We need to check if the pointer is NULL before using it. iter_next_trusted_or_null is the incorrect usage. There is no check before using the pointer, so it will be rejected by the verifier. iter_next_rcu and iter_next_rcu_or_null are similar test cases for KF_RCU_PROTECTED iterators. iter_next_rcu_not_trusted is used to test that the pointer returned by the iter next method of a KF_RCU_PROTECTED iterator cannot be passed to KF_TRUSTED_ARGS kfuncs. iter_next_ptr_mem_not_trusted is used to test that base type PTR_TO_MEM should not be combined with type flag PTR_TRUSTED. Signed-off-by: Juntong Deng <[email protected]> Link: https://lore.kernel.org/r/AM6PR03MB5848709758F6922F02AF9F1F99962@AM6PR03MB5848.eurprd03.prod.outlook.com Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  bpf: Make the pointer returned by iter next method valid  Juntong Deng  1  -4/+22
Currently we cannot pass the pointer returned by iter next method as argument to KF_TRUSTED_ARGS or KF_RCU kfuncs, because the pointer returned by iter next method is not "valid". This patch sets the pointer returned by iter next method to be valid. This is based on the fact that if the iterator is implemented correctly, then the pointer returned from the iter next method should be valid. This does not make NULL pointer valid. If the iter next method has KF_RET_NULL flag, then the verifier will ask the ebpf program to check NULL pointer. KF_RCU_PROTECTED iterator is a special case, the pointer returned by iter next method should only be valid within RCU critical section, so it should be with MEM_RCU, not PTR_TRUSTED. Another special case is bpf_iter_num_next, which returns a pointer with base type PTR_TO_MEM. PTR_TO_MEM should not be combined with type flag PTR_TRUSTED (PTR_TO_MEM already means the pointer is valid). The pointer returned by iter next method of other types of iterators is with PTR_TRUSTED. In addition, this patch adds get_iter_from_state to help us get the current iterator from the current state. Signed-off-by: Juntong Deng <[email protected]> Link: https://lore.kernel.org/r/AM6PR03MB584869F8B448EA1C87B7CDA399962@AM6PR03MB5848.eurprd03.prod.outlook.com Signed-off-by: Alexei Starovoitov <[email protected]>
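The flag selection described in this message can be sketched as a small userspace model. The constants and the function name below are hypothetical stand-ins, not the verifier's real definitions, and modeling KF_RET_NULL as a "maybe NULL" bit is an assumption made only to keep the sketch self-contained.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for verifier type bits; the values are illustrative. */
#define MODEL_PTR_TRUSTED    (1u << 8)
#define MODEL_MEM_RCU        (1u << 9)
#define MODEL_PTR_MAYBE_NULL (1u << 10)

enum model_base_type { MODEL_PTR_TO_BTF_ID, MODEL_PTR_TO_MEM };

/*
 * Pick the type flags for the pointer returned by an iterator's next method:
 *  - PTR_TO_MEM already means "valid", so it gets neither trust flag;
 *  - a KF_RCU_PROTECTED iterator yields an RCU-protected pointer;
 *  - every other iterator yields a trusted pointer.
 * A KF_RET_NULL next method additionally forces a NULL check in the program.
 */
static unsigned int model_iter_next_flags(enum model_base_type base,
					  bool rcu_protected, bool ret_null)
{
	unsigned int flags = 0;

	if (base != MODEL_PTR_TO_MEM)
		flags |= rcu_protected ? MODEL_MEM_RCU : MODEL_PTR_TRUSTED;
	if (ret_null)
		flags |= MODEL_PTR_MAYBE_NULL;
	return flags;
}
```

The point of the model is only the three-way split; how the real verifier locates the iterator state (get_iter_from_state in the message) is not reproduced here.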
2024-08-29  Merge branch 'bpf-add-gen_epilogue-to-bpf_verifier_ops'  Alexei Starovoitov  14  -8/+813
Martin KaFai Lau says: ==================== bpf: Add gen_epilogue to bpf_verifier_ops From: Martin KaFai Lau <[email protected]> This set allows the subsystem to patch codes before BPF_EXIT. The verifier ops, .gen_epilogue, is added for this purpose. One of the use case will be in the bpf qdisc, the bpf qdisc subsystem can ensure the skb->dev is in the correct value. The bpf qdisc subsystem can either inline fixing it in the epilogue or call another kernel function to handle it (e.g. drop) in the epilogue. Another use case could be in bpf_tcp_ca.c to enforce snd_cwnd has valid value (e.g. positive value). v5: * Removed the skip_cnt argument from adjust_jmp_off() in patch 2. Instead, reuse the delta argument and skip the [tgt_idx, tgt_idx + delta) instructions. * Added a BPF_JMP32_A macro in patch 3. * Removed pro_epilogue_subprog.c in patch 6. The pro_epilogue_kfunc.c has covered the subprog case. Renamed the file pro_epilogue_kfunc.c to pro_epilogue.c. Some of the SEC names and function names are changed accordingly (mainly shorten them by removing the _kfunc suffix). * Added comments to explain the tail_call result in patch 7. * Fixed the following bpf CI breakages. I ran it in CI manually to confirm: https://github.com/kernel-patches/bpf/actions/runs/10590714532 * s390 zext added "w3 = w3". Adjusted the test to use all ALU64 and BPF_DW to avoid zext. Also changed the "int a" in the "struct st_ops_args" to "u64 a". * llvm17 does not take: *(u64 *)(r1 +0) = 0; so it is changed to: r3 = 0; *(u64 *)(r1 +0) = r3; v4: * Fixed a bug in the memcpy in patch 3 The size in the memcpy should be epilogue_cnt * sizeof(*epilogue_buf) v3: * Moved epilogue_buf[16] to env. Patch 1 is added to move the existing insn_buf[16] to env. * Fixed a case that the bpf prog has a BPF_JMP that goes back to the first instruction of the main prog. The jump back to 1st insn case also applies to the prologue. Patch 2 is added to handle it. 
* If the bpf main prog has multiple BPF_EXIT, use a BPF_JA to goto the earlier patched epilogue. Note that there are (BPF_JMP32 | BPF_JA) vs (BPF_JMP | BPF_JA) details in the patch 3 commit message. * There are subtle changes in patch 3, so I reset the Reviewed-by. * Added patch 8 and patch 9 to cover the changes in patch 2 and patch 3. * Dropped the kfunc call from pro/epilogue and its selftests. v2: * Remove the RFC tag. Keep the ordering at where .gen_epilogue is called in the verifier relative to the check_max_stack_depth(). This will be consistent with the other extra stack_depth usage like optimize_bpf_loop(). * Use __xlated check provided by the test_loader to check the patched instructions after gen_pro/epilogue (Eduard). * Added Patch 3 by Eduard (Thanks!). ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  selftests/bpf: Test epilogue patching when the main prog has multiple BPF_EXIT  Martin KaFai Lau  2  -0/+84
This patch tests the epilogue patching when the main prog has multiple BPF_EXIT. The verifier should have patched the 2nd (and later) BPF_EXIT with a BPF_JA that goes back to the earlier patched epilogue instructions. Acked-by: Eduard Zingerman <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  selftests/bpf: A pro/epilogue test when the main prog jumps back to the 1st insn  Martin KaFai Lau  2  -0/+151
This patch adds a pro/epilogue test when the main prog has a goto insn that goes back to the very first instruction of the prog. It is to test the correctness of the adjust_jmp_off(prog, 0, delta) after the verifier has applied the prologue and/or epilogue patch. Acked-by: Eduard Zingerman <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  selftests/bpf: Add tailcall epilogue test  Martin KaFai Lau  2  -0/+104
This patch adds a gen_epilogue test to test a main prog using a bpf_tail_call. A non test_loader test is used. The tailcall target program, "test_epilogue_subprog", needs to be used in a struct_ops map before it can be loaded. Another struct_ops map is also needed to host the actual "test_epilogue_tailcall" struct_ops program that does the bpf_tail_call. The earlier test_loader patch will attach all struct_ops maps but the bpf_testmod.c does not support >1 attached struct_ops. The earlier patch used the test_loader which has already covered checking for the patched pro/epilogue instructions. This is done by the __xlated tag. This patch goes for the regular skel load and syscall test to do the tailcall test, which also allows passing the "struct st_ops_args *args" directly as ctx_in to the SEC("syscall") program. Acked-by: Eduard Zingerman <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  selftests/bpf: Test gen_prologue and gen_epilogue  Martin KaFai Lau  5  -0/+371
This test adds a new struct_ops "bpf_testmod_st_ops" in bpf_testmod. The ops of the bpf_testmod_st_ops are triggered by new kfunc calls "bpf_kfunc_st_ops_test_*logue". These new kfunc calls are primarily used by the SEC("syscall") program. The test triggering sequence is like: SEC("syscall") syscall_prologue(struct st_ops_args *args) bpf_kfunc_st_ops_test_prologue(args) st_ops->test_prologue(args) .gen_prologue adds 1000 to args->a .gen_epilogue adds 10000 to args->a .gen_epilogue will also set the r0 to 2 * args->a. The .gen_prologue and .gen_epilogue of the bpf_testmod_st_ops will test the prog->aux->attach_func_name to decide if it needs to generate codes. The main programs of the pro_epilogue.c will call a new kfunc bpf_kfunc_st_ops_inc10 which does "args->a += 10". It will also call a subprog() which does "args->a += 1". This patch uses the test_loader infra to check the __xlated instructions patched after gen_prologue and/or gen_epilogue. The __xlated check is based on Eduard's example (Thanks!) in v1. args->a is returned by the struct_ops prog (either the main prog or the epilogue). Thus, the __retval of the SEC("syscall") prog is checked. For example, when triggering the ops in the 'SEC("struct_ops/test_epilogue") int test_epilogue', the expected args->a is +1 (subprog call) + 10 (kfunc call) + 10000 (.gen_epilogue) = 10011. The expected return value is 2 * 10011 (.gen_epilogue). Suggested-by: Eduard Zingerman <[email protected]> Acked-by: Eduard Zingerman <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
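The test_epilogue arithmetic above can be replayed with a plain C model. This is only a sketch: the struct is reduced to the single field the message mentions, and the function names prefixed with model_ are hypothetical, standing in for the subprog, the kfunc, and the instructions that .gen_epilogue emits.

```c
#include <stdint.h>

/* Reduced to the one field used in the commit message. */
struct model_st_ops_args {
	uint64_t a;
};

/* subprog(): "args->a += 1" */
static void model_subprog(struct model_st_ops_args *args)
{
	args->a += 1;
}

/* bpf_kfunc_st_ops_inc10: "args->a += 10" */
static void model_inc10(struct model_st_ops_args *args)
{
	args->a += 10;
}

/* What the generated epilogue does: add 10000, then r0 = 2 * args->a. */
static uint64_t model_gen_epilogue(struct model_st_ops_args *args)
{
	args->a += 10000;
	return 2 * args->a;
}

/* Main prog body followed by the epilogue, as in the test_epilogue case. */
static uint64_t model_test_epilogue(struct model_st_ops_args *args)
{
	model_subprog(args);
	model_inc10(args);
	return model_gen_epilogue(args);
}
```

Starting from args->a == 0, the model leaves args->a at 10011 and returns 2 * 10011, matching the numbers quoted in the message.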
2024-08-29  selftests/bpf: attach struct_ops maps before test prog runs  Eduard Zingerman  1  -0/+27
In test_loader based tests, call bpf_map__attach_struct_ops() before the call to bpf_prog_test_run_opts() in order to trigger bpf_struct_ops->reg() callbacks on the kernel side. This allows using the __retval macro for struct_ops tests. Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  bpf: Export bpf_base_func_proto  Martin KaFai Lau  1  -0/+1
The bpf_testmod needs to use the bpf_tail_call helper in a later selftest patch. This patch exports bpf_base_func_proto with EXPORT_SYMBOL_GPL. Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  bpf: Add gen_epilogue to bpf_verifier_ops  Martin KaFai Lau  4  -1/+58
This patch adds a .gen_epilogue to the bpf_verifier_ops. It is similar to the existing .gen_prologue. Instead of allowing a subsystem to run code at the beginning of a bpf prog, it allows the subsystem to run code just before the bpf prog exits. One use case is to allow the upcoming bpf qdisc to ensure that the skb->dev is the same as the qdisc->dev_queue->dev. The bpf qdisc struct_ops implementation could either fix it up or drop the skb. Another use case could be in bpf_tcp_ca.c to enforce that snd_cwnd has a sane value (e.g. non zero). The epilogue can do the useful thing (like checking skb->dev) if it can access the bpf prog's ctx. Unlike the prologue, r1 may not hold the ctx pointer. This patch saves the r1 in the stack if the .gen_epilogue has returned some instructions in the "epilogue_buf". The existing .gen_prologue is done in convert_ctx_accesses(). The new .gen_epilogue is also done in convert_ctx_accesses(). When it sees the (BPF_JMP | BPF_EXIT) instruction, it will be patched with the earlier generated "epilogue_buf". The epilogue patching is only done for the main prog. Only one epilogue will be patched to the main program. When the bpf prog has multiple BPF_EXIT instructions, a BPF_JA is used to goto the earlier patched epilogue. The majority of the archs support (BPF_JMP32 | BPF_JA): x86, arm, s390, riscv64, loongarch, powerpc and arc. This patch keeps it simple and always uses (BPF_JMP32 | BPF_JA). A new macro BPF_JMP32_A is added to generate the (BPF_JMP32 | BPF_JA) insn. Acked-by: Eduard Zingerman <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
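The "one epilogue, later exits jump back to it" rule can be illustrated with a toy rewriter. This is a deliberately simplified model, not the verifier's code: the instruction type, opcode names, and function name are hypothetical, and the input program is assumed to contain no jumps of its own so that no existing offsets need fixing up.

```c
/* Toy instruction set; a JA at index i jumps to i + 1 + off. */
enum epi_op { EPI_OP_NOP, EPI_OP_EPILOGUE, EPI_OP_EXIT, EPI_OP_JA };

struct epi_insn {
	enum epi_op op;
	int off;
};

/*
 * Copy src[0..n) into dst, expanding the first EXIT into
 * "epilogue_cnt epilogue insns + EXIT" and replacing every later EXIT
 * with a JA back to the start of that expansion.
 * Returns the new length, or -1 if dst (capacity cap) is too small.
 */
static int patch_epilogue_model(const struct epi_insn *src, int n,
				struct epi_insn *dst, int cap,
				int epilogue_cnt)
{
	int out = 0, epilogue_start = -1;

	for (int i = 0; i < n; i++) {
		if (src[i].op != EPI_OP_EXIT) {
			if (out >= cap)
				return -1;
			dst[out++] = src[i];
		} else if (epilogue_start < 0) {
			if (out + epilogue_cnt + 1 > cap)
				return -1;
			epilogue_start = out;
			for (int k = 0; k < epilogue_cnt; k++)
				dst[out++] = (struct epi_insn){ EPI_OP_EPILOGUE, 0 };
			dst[out++] = (struct epi_insn){ EPI_OP_EXIT, 0 };
		} else {
			if (out >= cap)
				return -1;
			/* Relative offset so that out + 1 + off == epilogue_start. */
			dst[out] = (struct epi_insn){ EPI_OP_JA, epilogue_start - (out + 1) };
			out++;
		}
	}
	return out;
}
```

With the input NOP, EXIT, NOP, EXIT and a two-instruction epilogue, the model produces six instructions, and the final JA targets index 1, where the single copy of the epilogue begins.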
2024-08-29  bpf: Adjust BPF_JMP that jumps to the 1st insn of the prologue  Martin KaFai Lau  1  -0/+6
The next patch will add a ctx ptr saving instruction "*(u64 *)(r10 -8) = r1" at the beginning of the main prog when there is an epilogue patch (by the .gen_epilogue() verifier ops added in the next patch). There is one corner case if the bpf prog has a BPF_JMP that jumps to the 1st instruction. It needs an adjustment such that those BPF_JMP instructions won't jump to the newly added ctx saving instruction. The commit 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and jump to the 1st insn.") has the details on this case. Note that the jump back to the 1st instruction is not limited to the ctx ptr saving instruction. The same also applies to the prologue. A later test, pro_epilogue_goto_start.c, has a test for the prologue only case. Thus, this patch does one adjustment after gen_prologue and the future ctx ptr saving. It is done by adjust_jmp_off(env->prog, 0, delta) where delta is the total number of instructions in the prologue and the future ctx ptr saving instruction. The adjust_jmp_off(env->prog, 0, delta) assumes that the prologue does not have a goto 1st instruction itself. To accommodate a prologue that might have a goto 1st insn itself, this patch changes adjust_jmp_off() to skip the instructions in [tgt_idx, tgt_idx + delta). Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
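A reduced model may make the skip range easier to follow. The sketch below assumes a simplified instruction with only a "is a jump" marker and a relative offset; the struct and function names are hypothetical, and overflow handling and the BPF_JMP32 imm-based encoding are left out.

```c
#include <stdbool.h>

/* Simplified instruction: a jump at index i targets i + 1 + off. */
struct jmp_insn {
	bool is_jmp;
	int off;
};

/*
 * After delta instructions were inserted at tgt_idx, jumps that used to
 * target the original instruction now point at tgt_idx. Move them forward
 * by delta, but leave the inserted range [tgt_idx, tgt_idx + delta) alone:
 * a jump inside the prologue that targets tgt_idx really means it.
 */
static void model_adjust_jmp_off(struct jmp_insn *insns, int cnt,
				 int tgt_idx, int delta)
{
	for (int i = 0; i < cnt; i++) {
		if (i >= tgt_idx && i < tgt_idx + delta)
			continue;	/* inserted instructions are skipped */
		if (!insns[i].is_jmp)
			continue;
		if (i + 1 + insns[i].off != tgt_idx)
			continue;	/* jump goes somewhere else */
		insns[i].off += delta;
	}
}
```

For a five-instruction program whose first two instructions are a prologue, a jump at index 4 with offset -5 is rewritten to offset -3 (it now lands on index 2, the original first instruction), while a jump at index 1 that targets index 0 is left untouched.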
2024-08-29  bpf: Move insn_buf[16] to bpf_verifier_env  Martin KaFai Lau  2  -7/+11
This patch moves the 'struct bpf_insn insn_buf[16]' stack usage to the bpf_verifier_env. A '#define INSN_BUF_SIZE 16' is also added to replace the ARRAY_SIZE(insn_buf) usages. Both convert_ctx_accesses() and do_misc_fixup() are changed to use the env->insn_buf. It is a refactoring work for adding the epilogue_buf[16] in a later patch. With this patch, the stack size usage decreased. Before: ./kernel/bpf/verifier.c:22133:5: warning: stack frame size (2584) After: ./kernel/bpf/verifier.c:22184:5: warning: stack frame size (2264) Reviewed-by: Eduard Zingerman <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  bpf: Use kvmemdup to simplify the code  Hongbo Li  1  -2/+1
Use kvmemdup instead of kvmalloc() + memcpy() to simplify the code. No functional change intended. Acked-by: Yonghong Song <[email protected]> Signed-off-by: Hongbo Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
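The pattern being folded can be shown with a userspace analogue. The helper below is hypothetical and uses malloc() in place of the kernel allocator; it only illustrates why a single duplicate call is equivalent to the allocate-then-copy pair it replaces.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of a "memdup" helper: allocate len bytes and copy
 * src into them. Replaces the open-coded pair
 *
 *     p = malloc(len);
 *     if (p)
 *             memcpy(p, src, len);
 *
 * Returns NULL if the allocation fails; the caller frees the result.
 */
static void *model_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```

A caller then writes `copy = model_memdup(orig, size);` and keeps a single NULL check, which is the simplification the message refers to.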
2024-08-29  docs/bpf: Fix a typo in verifier.rst  Yiming Xiang  1  -1/+1
In verifier.rst, there is a typo in section 'Register parentage chains'. Caller saved registers are r0-r5, callee saved registers are r6-r9. Here by context it means callee saved registers rather than caller saved registers. This may confuse users. Signed-off-by: Yiming Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
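The register split stated in this message is small enough to write down directly. The two predicates below are hypothetical helpers, not kernel functions; they only encode the ranges given above.

```c
#include <stdbool.h>

/* r0-r5: caller saved (may be clobbered across a call). */
static bool model_reg_is_caller_saved(int regno)
{
	return regno >= 0 && regno <= 5;
}

/* r6-r9: callee saved (preserved across a call). */
static bool model_reg_is_callee_saved(int regno)
{
	return regno >= 6 && regno <= 9;
}
```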
2024-08-29  selftests/bpf: Make sure stashed kptr in local kptr is freed recursively  Amery Hung  1  -1/+29
When dropping a local kptr, any kptr stashed into it is supposed to be freed through bpf_obj_free_fields->__bpf_obj_drop_impl recursively. Add a test to make sure it happens. The test first stashes a referenced kptr to "struct task" into a local kptr and gets the reference count of the task. Then, it drops the local kptr and reads the reference count of the task again. Since bpf_obj_free_fields and __bpf_obj_drop_impl will go through the local kptr recursively during bpf_obj_drop, the dtor of the stashed task kptr should eventually be called. The second reference count should be one less than the first one. Signed-off-by: Amery Hung <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-29  libbpf: Fix bpf_object__open_skeleton()'s mishandling of options  Andrii Nakryiko  1  -33/+19
We do an ugly copying of options in bpf_object__open_skeleton() just to be able to set object name from skeleton's recorded name (while still allowing user to override it through opts->object_name). This is not just ugly, but it also is broken due to memcpy() that doesn't take into account potential size differences between skel_opts and user-provided opts due to backward and forward compatibility. This leads to copying over extra bytes and then failing to validate options properly. It could, technically, lead also to SIGSEGV, if we are unlucky. So just get rid of that memory copy completely and instead pass default object name into bpf_object_open() directly, simplifying all this significantly. The rule now is that obj_name should be non-NULL for bpf_object_open() when called with in-memory buffer, so validate that explicitly as well. We adapt bpf_object__open_mem() to this as well and generate default name (based on buffer memory address and size) outside of bpf_object_open(). Fixes: d66562fba1ce ("libbpf: Add BPF object skeleton support") Reported-by: Daniel Müller <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Daniel Müller <[email protected]> Acked-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
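The hazard described here, a fixed-size memcpy() between option structs whose sizes differ across versions, can be illustrated in isolation. The two structs and the helper below are hypothetical and are not libbpf's types. Note also that the patch itself removes the copy altogether; the size-aware copy is shown only as a contrast to the unsafe pattern.

```c
#include <stddef.h>
#include <string.h>

/* An "old" caller's view of the options: size field first, then data. */
struct model_opts_v1 {
	size_t sz;
	const char *object_name;
};

/* A "newer" library's view, with one more field appended. */
struct model_opts_v2 {
	size_t sz;
	const char *object_name;
	int extra;
};

/*
 * Copying sizeof(*dst) bytes from src would read past the end of an old,
 * smaller struct. Copy only what the caller declared via src->sz, and
 * zero everything the caller does not know about.
 */
static void model_copy_opts(struct model_opts_v2 *dst,
			    const struct model_opts_v1 *src)
{
	size_t n = src->sz < sizeof(*dst) ? src->sz : sizeof(*dst);

	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, n);
	dst->sz = sizeof(*dst);
}
```

Fields beyond what the old caller provided end up zeroed rather than filled with whatever bytes follow the caller's struct in memory.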
2024-08-28  selftests/bpf: Add test for zero offset or non-zero offset pointers as KF_ACQUIRE kfuncs argument  Juntong Deng  4  -0/+58
This patch adds test cases for zero offset (implicit cast) or non-zero offset pointer as KF_ACQUIRE kfuncs argument. Currently KF_ACQUIRE kfuncs should support passing in pointers like &sk->sk_write_queue (non-zero offset) or &sk->__sk_common (zero offset) and not be rejected by the verifier. Signed-off-by: Juntong Deng <[email protected]> Link: https://lore.kernel.org/r/AM6PR03MB5848CB6F0D4D9068669A905B99952@AM6PR03MB5848.eurprd03.prod.outlook.com Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-28  bpf: Relax KF_ACQUIRE kfuncs strict type matching constraint  Juntong Deng  1  -2/+1
Currently we cannot pass zero offset (implicit cast) or non-zero offset pointers to KF_ACQUIRE kfuncs. This is because KF_ACQUIRE kfuncs requires strict type matching, but zero offset or non-zero offset does not change the type of pointer, which causes the ebpf program to be rejected by the verifier. This can cause some problems, one example is that bpf_skb_peek_tail kfunc [0] cannot be implemented by just passing in non-zero offset pointers. We cannot pass pointers like &sk->sk_write_queue (non-zero offset) or &sk->__sk_common (zero offset) to KF_ACQUIRE kfuncs. This patch makes KF_ACQUIRE kfuncs not require strict type matching. [0]: https://lore.kernel.org/bpf/AM6PR03MB5848CA39CB4B7A4397D380B099B12@AM6PR03MB5848.eurprd03.prod.outlook.com/ Signed-off-by: Juntong Deng <[email protected]> Link: https://lore.kernel.org/r/AM6PR03MB5848FD2BD89BF0B6B5AA3B4C99952@AM6PR03MB5848.eurprd03.prod.outlook.com Signed-off-by: Alexei Starovoitov <[email protected]>
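The difference between the two kinds of pointer can be seen with offsetof() on a mock layout. The structs below are hypothetical miniatures, not the kernel's struct sock; the member sk_common stands in for __sk_common, renamed because identifiers starting with a double underscore are reserved in userspace C.

```c
#include <stddef.h>

struct mock_sock_common {
	int skc_family;
};

struct mock_queue {
	int qlen;
};

/* The first member sits at offset 0; every later member has a non-zero offset. */
struct mock_sock {
	struct mock_sock_common sk_common;	/* stands in for __sk_common */
	struct mock_queue sk_write_queue;
};

/* Zero offset: the member pointer has the same address as the object. */
static size_t mock_common_offset(void)
{
	return offsetof(struct mock_sock, sk_common);
}

/* Non-zero offset: the member pointer points inside the object. */
static size_t mock_queue_offset(void)
{
	return offsetof(struct mock_sock, sk_write_queue);
}
```

In both cases the member pointer has a different type from a pointer to the containing struct, which is why strict type matching rejected such arguments.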
2024-08-28  selftests/bpf: Fix incorrect parameters in NULL pointer checking  Hao Ge  1  -1/+1
Smatch reported the following warning: ./tools/testing/selftests/bpf/testing_helpers.c:455 get_xlated_program() warn: variable dereferenced before check 'buf' (see line 454) It seems correct, so let's modify it based on its suggestion. Actually, commit b23ed4d74c4d ("selftests/bpf: Fix invalid pointer check in get_xlated_program()") fixed an issue in the test_verifier.c once, but it was reverted this time. Let's solve this issue with the minimal changes possible. Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Fixes: b4b7a4099b8c ("selftests/bpf: Factor out get_xlated_program() helper") Signed-off-by: Hao Ge <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
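The class of bug behind this warning can be reproduced in a few lines. The function below is a hypothetical reduction, not the selftest's get_xlated_program(): it stores an allocation through an out-parameter and then has to check the stored pointer, not the out-parameter itself.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Allocate cnt zeroed ints and hand them back through *buf.
 * Checking "!buf" here would test the out-parameter, which was already
 * dereferenced on the line above and is never NULL in a valid call;
 * the allocation result lives in *buf.
 */
static int model_alloc_buf(int **buf, size_t cnt)
{
	*buf = calloc(cnt, sizeof(**buf));
	if (!*buf)
		return -ENOMEM;
	return 0;
}
```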
2024-08-28  Merge branch 'bpf-arm64-simplify-jited-prologue-epilogue'  Alexei Starovoitov  1  -202/+192
Xu Kuohai says: ==================== bpf, arm64: Simplify jited prologue/epilogue From: Xu Kuohai <[email protected]> The arm64 jit blindly saves/restores all callee-saved registers, making the jited result look a bit too complicated. For example, for an empty prog, the jited result is:
    0:   bti jc
    4:   mov x9, lr
    8:   nop
    c:   paciasp
    10:  stp fp, lr, [sp, #-16]!
    14:  mov fp, sp
    18:  stp x19, x20, [sp, #-16]!
    1c:  stp x21, x22, [sp, #-16]!
    20:  stp x26, x25, [sp, #-16]!
    24:  mov x26, #0
    28:  stp x26, x25, [sp, #-16]!
    2c:  mov x26, sp
    30:  stp x27, x28, [sp, #-16]!
    34:  mov x25, sp
    38:  bti j // tailcall target
    3c:  sub sp, sp, #0
    40:  mov x7, #0
    44:  add sp, sp, #0
    48:  ldp x27, x28, [sp], #16
    4c:  ldp x26, x25, [sp], #16
    50:  ldp x26, x25, [sp], #16
    54:  ldp x21, x22, [sp], #16
    58:  ldp x19, x20, [sp], #16
    5c:  ldp fp, lr, [sp], #16
    60:  mov x0, x7
    64:  autiasp
    68:  ret
Clearly, there is no need to save/restore unused callee-saved registers. This patch makes this change, so that the jited image only saves/restores the callee-saved registers it uses. Now the jited result of an empty prog is:
    0:   bti jc
    4:   mov x9, lr
    8:   nop
    c:   paciasp
    10:  stp fp, lr, [sp, #-16]!
    14:  mov fp, sp
    18:  stp xzr, x26, [sp, #-16]!
    1c:  mov x26, sp
    20:  bti j // tailcall target
    24:  mov x7, #0
    28:  ldp xzr, x26, [sp], #16
    2c:  ldp fp, lr, [sp], #16
    30:  mov x0, x7
    34:  autiasp
    38:  ret
==================== Acked-by: Puranjay Mohan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-28  bpf, arm64: Avoid blindly saving/restoring all callee-saved registers  Xu Kuohai  1  -111/+183
The arm64 jit blindly saves/restores all callee-saved registers, making the jited result look a bit too complicated. For example, for an empty prog, the jited result is:
    0:   bti jc
    4:   mov x9, lr
    8:   nop
    c:   paciasp
    10:  stp fp, lr, [sp, #-16]!
    14:  mov fp, sp
    18:  stp x19, x20, [sp, #-16]!
    1c:  stp x21, x22, [sp, #-16]!
    20:  stp x26, x25, [sp, #-16]!
    24:  mov x26, #0
    28:  stp x26, x25, [sp, #-16]!
    2c:  mov x26, sp
    30:  stp x27, x28, [sp, #-16]!
    34:  mov x25, sp
    38:  bti j // tailcall target
    3c:  sub sp, sp, #0
    40:  mov x7, #0
    44:  add sp, sp, #0
    48:  ldp x27, x28, [sp], #16
    4c:  ldp x26, x25, [sp], #16
    50:  ldp x26, x25, [sp], #16
    54:  ldp x21, x22, [sp], #16
    58:  ldp x19, x20, [sp], #16
    5c:  ldp fp, lr, [sp], #16
    60:  mov x0, x7
    64:  autiasp
    68:  ret
Clearly, there is no need to save/restore unused callee-saved registers. This patch makes this change, so that the jited image only saves/restores the callee-saved registers it uses. Now the jited result of an empty prog is:
    0:   bti jc
    4:   mov x9, lr
    8:   nop
    c:   paciasp
    10:  stp fp, lr, [sp, #-16]!
    14:  mov fp, sp
    18:  stp xzr, x26, [sp, #-16]!
    1c:  mov x26, sp
    20:  bti j // tailcall target
    24:  mov x7, #0
    28:  ldp xzr, x26, [sp], #16
    2c:  ldp fp, lr, [sp], #16
    30:  mov x0, x7
    34:  autiasp
    38:  ret
Since a bpf prog saves/restores its own callee-saved registers as needed, to make tailcall work correctly, the caller needs to restore its saved registers before tailcall, and the callee needs to save its callee-saved registers after tailcall. These extra restoring/saving instructions increase performance overhead. [1] provides 2 benchmarks for tailcall scenarios. Below are the perf numbers measured in an arm64 KVM guest. The result indicates that the performance difference before and after the patch in typical tailcall scenarios is negligible.
- Before: Performance counter stats for './test_progs -t tailcalls' (5 runs): 4313.43 msec task-clock # 0.874 CPUs utilized ( +- 0.16% ) 574 context-switches # 133.073 /sec ( +- 1.14% ) 0 cpu-migrations # 0.000 /sec 538 page-faults # 124.727 /sec ( +- 0.57% ) 10697772784 cycles # 2.480 GHz ( +- 0.22% ) (61.19%) 25511241955 instructions # 2.38 insn per cycle ( +- 0.08% ) (66.70%) 5108910557 branches # 1.184 G/sec ( +- 0.08% ) (72.38%) 2800459 branch-misses # 0.05% of all branches ( +- 0.51% ) (72.36%) TopDownL1 # 0.60 retiring ( +- 0.09% ) (66.84%) # 0.21 frontend_bound ( +- 0.15% ) (61.31%) # 0.12 bad_speculation ( +- 0.08% ) (50.11%) # 0.07 backend_bound ( +- 0.16% ) (33.30%) 8274201819 L1-dcache-loads # 1.918 G/sec ( +- 0.18% ) (33.15%) 468268 L1-dcache-load-misses # 0.01% of all L1-dcache accesses ( +- 4.69% ) (33.16%) 385383 LLC-loads # 89.345 K/sec ( +- 5.22% ) (33.16%) 38296 LLC-load-misses # 9.94% of all LL-cache accesses ( +- 42.52% ) (38.69%) 6886576501 L1-icache-loads # 1.597 G/sec ( +- 0.35% ) (38.69%) 1848585 L1-icache-load-misses # 0.03% of all L1-icache accesses ( +- 4.52% ) (44.23%) 9043645883 dTLB-loads # 2.097 G/sec ( +- 0.10% ) (44.33%) 416672 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 5.15% ) (49.89%) 6925626111 iTLB-loads # 1.606 G/sec ( +- 0.35% ) (55.46%) 66220 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 1.88% ) (55.50%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 4.9372 +- 0.0526 seconds time elapsed ( +- 1.07% ) Performance counter stats for './test_progs -t flow_dissector' (5 runs): 10924.50 msec task-clock # 0.945 CPUs utilized ( +- 0.08% ) 603 context-switches # 55.197 /sec ( +- 1.13% ) 0 cpu-migrations # 0.000 /sec 566 page-faults # 51.810 /sec ( +- 0.42% ) 27381270695 cycles # 2.506 GHz ( +- 0.18% ) (60.46%) 56996583922 instructions # 2.08 insn per cycle ( +- 0.21% ) (66.11%) 10321647567 branches # 944.816 M/sec ( +- 0.17% ) (71.79%) 3347735 branch-misses # 0.03% of all 
branches ( +- 3.72% ) (72.15%) TopDownL1 # 0.52 retiring ( +- 0.13% ) (66.74%) # 0.27 frontend_bound ( +- 0.14% ) (61.27%) # 0.14 bad_speculation ( +- 0.19% ) (50.36%) # 0.07 backend_bound ( +- 0.42% ) (33.89%) 18740797617 L1-dcache-loads # 1.715 G/sec ( +- 0.43% ) (33.71%) 13715669 L1-dcache-load-misses # 0.07% of all L1-dcache accesses ( +- 32.85% ) (33.34%) 4087551 LLC-loads # 374.164 K/sec ( +- 29.53% ) (33.26%) 267906 LLC-load-misses # 6.55% of all LL-cache accesses ( +- 23.90% ) (38.76%) 15811864229 L1-icache-loads # 1.447 G/sec ( +- 0.12% ) (38.73%) 2976833 L1-icache-load-misses # 0.02% of all L1-icache accesses ( +- 9.73% ) (44.22%) 20138907471 dTLB-loads # 1.843 G/sec ( +- 0.18% ) (44.15%) 732850 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 11.18% ) (49.64%) 15895726702 iTLB-loads # 1.455 G/sec ( +- 0.15% ) (55.13%) 152075 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 4.71% ) (54.98%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 11.5613 +- 0.0317 seconds time elapsed ( +- 0.27% ) - After: Performance counter stats for './test_progs -t tailcalls' (5 runs): 4278.78 msec task-clock # 0.871 CPUs utilized ( +- 0.15% ) 569 context-switches # 132.982 /sec ( +- 0.58% ) 0 cpu-migrations # 0.000 /sec 539 page-faults # 125.970 /sec ( +- 0.43% ) 10588986432 cycles # 2.475 GHz ( +- 0.20% ) (60.91%) 25303825043 instructions # 2.39 insn per cycle ( +- 0.08% ) (66.48%) 5110756256 branches # 1.194 G/sec ( +- 0.07% ) (72.03%) 2719569 branch-misses # 0.05% of all branches ( +- 2.42% ) (72.03%) TopDownL1 # 0.60 retiring ( +- 0.22% ) (66.31%) # 0.22 frontend_bound ( +- 0.21% ) (60.83%) # 0.12 bad_speculation ( +- 0.26% ) (50.25%) # 0.06 backend_bound ( +- 0.17% ) (33.52%) 8163648527 L1-dcache-loads # 1.908 G/sec ( +- 0.33% ) (33.52%) 694979 L1-dcache-load-misses # 0.01% of all L1-dcache accesses ( +- 30.53% ) (33.52%) 1902347 LLC-loads # 444.600 K/sec ( +- 48.84% ) (33.69%) 96677 LLC-load-misses # 5.08% of all LL-cache 
accesses ( +- 43.48% ) (39.30%) 6863517589 L1-icache-loads # 1.604 G/sec ( +- 0.37% ) (39.17%) 1871519 L1-icache-load-misses # 0.03% of all L1-icache accesses ( +- 6.78% ) (44.56%) 8927782813 dTLB-loads # 2.087 G/sec ( +- 0.14% ) (44.37%) 438237 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 6.00% ) (49.75%) 6886906831 iTLB-loads # 1.610 G/sec ( +- 0.36% ) (55.08%) 67568 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 3.27% ) (54.86%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 4.9114 +- 0.0309 seconds time elapsed ( +- 0.63% ) Performance counter stats for './test_progs -t flow_dissector' (5 runs): 10948.40 msec task-clock # 0.942 CPUs utilized ( +- 0.05% ) 615 context-switches # 56.173 /sec ( +- 1.65% ) 1 cpu-migrations # 0.091 /sec ( +- 31.62% ) 567 page-faults # 51.788 /sec ( +- 0.44% ) 27334194328 cycles # 2.497 GHz ( +- 0.08% ) (61.05%) 56656528828 instructions # 2.07 insn per cycle ( +- 0.08% ) (66.67%) 10270389422 branches # 938.072 M/sec ( +- 0.10% ) (72.21%) 3453837 branch-misses # 0.03% of all branches ( +- 3.75% ) (72.27%) TopDownL1 # 0.52 retiring ( +- 0.16% ) (66.55%) # 0.27 frontend_bound ( +- 0.09% ) (60.91%) # 0.14 bad_speculation ( +- 0.08% ) (49.85%) # 0.07 backend_bound ( +- 0.16% ) (33.33%) 18982866028 L1-dcache-loads # 1.734 G/sec ( +- 0.24% ) (33.34%) 8802454 L1-dcache-load-misses # 0.05% of all L1-dcache accesses ( +- 52.30% ) (33.31%) 2612962 LLC-loads # 238.661 K/sec ( +- 29.78% ) (33.45%) 264107 LLC-load-misses # 10.11% of all LL-cache accesses ( +- 18.34% ) (39.07%) 15793205997 L1-icache-loads # 1.443 G/sec ( +- 0.15% ) (39.09%) 3930802 L1-icache-load-misses # 0.02% of all L1-icache accesses ( +- 3.72% ) (44.66%) 20097828496 dTLB-loads # 1.836 G/sec ( +- 0.09% ) (44.68%) 961757 dTLB-load-misses # 0.00% of all dTLB cache accesses ( +- 3.32% ) (50.15%) 15838728506 iTLB-loads # 1.447 G/sec ( +- 0.09% ) (55.62%) 167652 iTLB-load-misses # 0.00% of all iTLB cache accesses ( +- 1.28% ) 
(55.52%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 11.6173 +- 0.0268 seconds time elapsed ( +- 0.23% ) [1] https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Xu Kuohai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-28  bpf, arm64: Get rid of fpb  Xu Kuohai  1  -93/+11
A bpf prog accesses the stack using BPF_FP as the base address and a negative immediate number as offset. But arm64 ldr/str instructions only support a non-negative immediate number as offset. To simplify the jited result, commit 5b3d19b9bd40 ("bpf, arm64: Adjust the offset of str/ldr(immediate) to positive number") introduced FPB to represent the lowest stack address that the bpf prog being jited may access, and with this address as the baseline, it converts BPF_FP plus negative immediate offset to FPB plus non-negative immediate offset. Since, for a given bpf prog, the jited stack space is fixed with A64_SP as the lowest address and BPF_FP as the highest address, we can get rid of FPB and convert BPF_FP plus negative immediate offset to A64_SP plus non-negative immediate offset. Signed-off-by: Xu Kuohai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
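The conversion described here is a single addition once the stack size is known. The sketch below assumes that BPF_FP sits exactly stack_size bytes above A64_SP, as the message states; the function name and the -1 error convention are hypothetical, and instruction encoding limits on the immediate are ignored.

```c
/*
 * Convert "BPF_FP + fp_off" (fp_off negative) into "A64_SP + sp_off"
 * (sp_off non-negative), assuming BPF_FP == A64_SP + stack_size.
 * Returns -1 if fp_off does not fall inside the prog's stack.
 */
static int model_fp_off_to_sp_off(int stack_size, int fp_off)
{
	if (fp_off >= 0 || fp_off < -stack_size)
		return -1;
	return stack_size + fp_off;
}
```

For a 64-byte stack, an access at BPF_FP - 8 becomes A64_SP + 56, and the lowest slot at BPF_FP - 64 becomes A64_SP + 0.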
2024-08-27samples/bpf: tracex4: Fix failed to create kretprobe 'kmem_cache_alloc_node+0x0'Rong Tao1-2/+2
Commit 7bd230a26648 ("mm/slab: enable slab allocation tagging for kmalloc and friends") [1] replaced kmem_cache_alloc_node() with kmem_cache_alloc_node_noprof(), so attaching the kretprobe by the old name fails: linux/samples/bpf$ sudo ./tracex4 libbpf: prog 'bpf_prog2': failed to create kretprobe 'kmem_cache_alloc_node+0x0' perf event: No such file or directory ERROR: bpf_program__attach failed Signed-off-by: Rong Tao <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://github.com/torvalds/linux/commit/7bd230a26648ac68ab3731ebbc449090f0ac6a37 Link: https://lore.kernel.org/bpf/[email protected]
2024-08-23selftests/bpf: Add tests for bpf_copy_from_user_str kfunc.Jordan Rome4-7/+75
This adds tests for both the happy path and the error path. Signed-off-by: Jordan Rome <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23bpf: Add bpf_copy_from_user_str kfuncJordan Rome3-0/+60
This adds a kfunc wrapper around strncpy_from_user, which can be called from sleepable BPF programs. This matches the non-sleepable 'bpf_probe_read_user_str' helper except it includes an additional 'flags' param, which allows consumers to clear the entire destination buffer on success or failure. Signed-off-by: Jordan Rome <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23selftests/bpf: use simply-expanded variables for libpcap flagsEduard Zingerman1-3/+4
Save pkg-config output for libpcap as simply-expanded variables. For an obscure reason, a 'shell' call in the recursively expanded LDLIBS/CFLAGS variables makes compilation of *.test.o files non-parallel when make is executed with the -j option. While at it, reuse the 'pkg-config --cflags' call to define the -DTRAFFIC_MONITOR=1 option; its exit status is the same as for 'pkg-config --exists'. Fixes: f52403b6bfea ("selftests/bpf: Add traffic monitor functions.") Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2024-08-23Merge branch 'support-bpf_kptr_xchg-into-local-kptr'Alexei Starovoitov8-48/+151
Amery Hung says: ==================== Support bpf_kptr_xchg into local kptr This revision adds substantial changes to patch 2 to support structures with kptr as the only special btf type. The test is split into local_kptr_stash and task_kfunc_success to remove dependencies on bpf_testmod that would break veristat results. This series allows stashing kptr into local kptr. Currently, kptrs are only allowed to be stashed into map value with bpf_kptr_xchg(). A motivating use case of this series is to enable adding referenced kptr to bpf_rbtree or bpf_list by using allocated object as graph node and the storage of referenced kptr. For example, a bpf qdisc [0] enqueuing a referenced kptr to a struct sk_buff* to a bpf_list serving as a fifo: struct skb_node { struct sk_buff __kptr *skb; struct bpf_list_node node; }; private(A) struct bpf_spin_lock fifo_lock; private(A) struct bpf_list_head fifo __contains(skb_node, node); /* In Qdisc_ops.enqueue */ struct skb_node *skbn; skbn = bpf_obj_new(typeof(*skbn)); if (!skbn) goto drop; /* skb is a referenced kptr to struct sk_buff acquired earlier * but not shown in this code snippet. */ skb = bpf_kptr_xchg(&skbn->skb, skb); if (skb) /* should not happen; do something below releasing skb to * satisfy the verifier */ ... bpf_spin_lock(&fifo_lock); bpf_list_push_back(&fifo, &skbn->node); bpf_spin_unlock(&fifo_lock); The implementation first searches for BPF_KPTR when generating program BTF. Then, we teach the verifier that the destination argument of bpf_kptr_xchg() can be local kptr, and use the btf_record in program BTF to check against the source argument. This series is mostly developed by Dave, who kindly helped and sent me the patchset. The selftests in bpf qdisc (WIP) rely on this series to work. 
[0] https://lore.kernel.org/netdev/[email protected]/ --- v3 -> v4 - Allow struct in prog btf w/ kptr as the only special field type - Split tests of stashing referenced kptr and local kptr - v3: https://lore.kernel.org/bpf/[email protected]/ v2 -> v3 - Fix prog btf memory leak - Test stashing kptr in prog btf - Test unstashing kptrs after stashing into local kptrs - v2: https://lore.kernel.org/bpf/[email protected]/ v1 -> v2 - Fix the document for bpf_kptr_xchg() - Add a comment explaining changes in the verifier - v1: https://lore.kernel.org/bpf/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23selftests/bpf: Test bpf_kptr_xchg stashing into local kptrDave Marchevsky2-3/+53
Test stashing both referenced kptr and local kptr into local kptrs. Then, test unstashing them. Acked-by: Martin KaFai Lau <[email protected]> Acked-by: Hou Tao <[email protected]> Signed-off-by: Dave Marchevsky <[email protected]> Signed-off-by: Amery Hung <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23bpf: Support bpf_kptr_xchg into local kptrDave Marchevsky3-20/+37
Currently, users can only stash kptr into map values with bpf_kptr_xchg(). This patch further supports stashing kptr into local kptr by adding local kptr as a valid destination type. When stashing into local kptr, btf_record in program BTF is used instead of btf_record in map to search for the btf_field of the local kptr. The local kptr specific checks in check_reg_type() only apply when the source argument of bpf_kptr_xchg() is local kptr. Therefore, we make the scope of the check explicit as the destination now can also be local kptr. Acked-by: Martin KaFai Lau <[email protected]> Signed-off-by: Dave Marchevsky <[email protected]> Signed-off-by: Amery Hung <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23bpf: Rename ARG_PTR_TO_KPTR -> ARG_KPTR_XCHG_DESTDave Marchevsky3-5/+5
ARG_PTR_TO_KPTR is currently only used by the bpf_kptr_xchg helper. Although it limits reg types for that helper's first arg to PTR_TO_MAP_VALUE, any arbitrary mapval won't do: further custom verification logic ensures that the mapval reg being xchgd-into is pointing to a kptr field. If this is not the case, it's not safe to xchg into that reg's pointee. Let's rename the bpf_arg_type to more accurately describe the fairly specific expectations that this arg type encodes. This is a nonfunctional change. Acked-by: Martin KaFai Lau <[email protected]> Signed-off-by: Dave Marchevsky <[email protected]> Signed-off-by: Amery Hung <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23bpf: Search for kptrs in prog BTF structsDave Marchevsky1-18/+52
Currently btf_parse_fields is used in two places to create struct btf_record's for structs: when looking at mapval type, and when looking at any struct in program BTF. The former looks for kptr fields while the latter does not. This patch modifies the btf_parse_fields call made when looking at prog BTF struct types to search for kptrs as well. Before this series there was no reason to search for kptrs in non-mapval types: a referenced kptr needs some owner to guarantee resource cleanup, and map values were the only owner that supported this. If a struct with a kptr field were to have some non-kptr-aware owner, the kptr field might not be properly cleaned up and result in resources leaking. Only searching for kptr fields in mapval was a simple way to avoid this problem. In practice, though, searching for BPF_KPTR when populating struct_meta_tab does not expose us to this risk, as struct_meta_tab is only accessed through btf_find_struct_meta helper, and that helper is only called in contexts where recognizing the kptr field is safe: * PTR_TO_BTF_ID reg w/ MEM_ALLOC flag * Such a reg is a local kptr and must be free'd via bpf_obj_drop, which will correctly handle kptr field * When handling specific kfuncs which either expect MEM_ALLOC input or return MEM_ALLOC output (obj_{new,drop}, percpu_obj_{new,drop}, list+rbtree funcs, refcount_acquire) * Will correctly handle kptr field for same reasons as above * When looking at kptr pointee type * Called by functions which implement "correct kptr resource handling" * In btf_check_and_fixup_fields * Helper that ensures no ownership loops for lists and rbtrees, doesn't care about kptr field existence So we should be able to find BPF_KPTR fields in all prog BTF structs without leaking resources. Further patches in the series will build on this change to support kptr_xchg into non-mapval local kptr. Without this change there would be no kptr field found in such a type. 
Acked-by: Martin KaFai Lau <[email protected]> Acked-by: Hou Tao <[email protected]> Signed-off-by: Dave Marchevsky <[email protected]> Signed-off-by: Amery Hung <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23bpf: Let callers of btf_parse_kptr() track life cycle of prog btfAmery Hung2-3/+5
btf_parse_kptr() and btf_record_free() do btf_get() and btf_put() respectively when working on btf_record in program and map if there are kptr fields. If the kptr is from program BTF, since both callers have already tracked the life cycle of program BTF, it is safe to remove the btf_get() and btf_put(). This change prevents a memory leak of program BTF later when we start searching for kptr fields when building btf_record for program. It can happen when the btf fd is closed. The btf_put() corresponding to the btf_get() in btf_parse_kptr() was supposed to be called by btf_record_free() in btf_free_struct_meta_tab() in btf_free(). However, it will never happen since the invocation of btf_free() depends on the refcount of the btf to become 0 in the first place. Acked-by: Martin KaFai Lau <[email protected]> Acked-by: Hou Tao <[email protected]> Signed-off-by: Amery Hung <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23selftests/bpf: add multi-uprobe benchmarksAndrii Nakryiko3-15/+85
Add multi-uprobe and multi-uretprobe benchmarks to bench tool. Multi- and classic uprobes/uretprobes have different low-level triggering code paths, so it's sometimes important to be able to benchmark both flavors of uprobes/uretprobes. Sample examples from my dev machine below. Single-threaded performance almost doesn't differ, but with more parallel CPUs triggering the same uprobe/uretprobe the difference grows. This might be due to [0], but given the code is slightly different, there could be other sources of slowdown. Note, all these numbers will change due to ongoing work to improve uprobe/uretprobe scalability (e.g., [1]), but having benchmark like this is useful for measurements and debugging nevertheless. #!/bin/bash set -eufo pipefail for p in 1 8 16 32; do for i in uprobe-nop uretprobe-nop uprobe-multi-nop uretprobe-multi-nop; do summary=$(sudo ./bench -w1 -d3 -p$p -a trig-$i | tail -n1) total=$(echo "$summary" | cut -d'(' -f1 | cut -d' ' -f3-) percpu=$(echo "$summary" | cut -d'(' -f2 | cut -d')' -f1 | cut -d'/' -f1) printf "%-21s (%2d cpus): %s (%s/s/cpu)\n" $i $p "$total" "$percpu" done echo done uprobe-nop ( 1 cpus): 1.020 ± 0.005M/s ( 1.020M/s/cpu) uretprobe-nop ( 1 cpus): 0.515 ± 0.009M/s ( 0.515M/s/cpu) uprobe-multi-nop ( 1 cpus): 1.036 ± 0.004M/s ( 1.036M/s/cpu) uretprobe-multi-nop ( 1 cpus): 0.512 ± 0.005M/s ( 0.512M/s/cpu) uprobe-nop ( 8 cpus): 3.481 ± 0.030M/s ( 0.435M/s/cpu) uretprobe-nop ( 8 cpus): 2.222 ± 0.008M/s ( 0.278M/s/cpu) uprobe-multi-nop ( 8 cpus): 3.769 ± 0.094M/s ( 0.471M/s/cpu) uretprobe-multi-nop ( 8 cpus): 2.482 ± 0.007M/s ( 0.310M/s/cpu) uprobe-nop (16 cpus): 2.968 ± 0.011M/s ( 0.185M/s/cpu) uretprobe-nop (16 cpus): 1.870 ± 0.002M/s ( 0.117M/s/cpu) uprobe-multi-nop (16 cpus): 3.541 ± 0.037M/s ( 0.221M/s/cpu) uretprobe-multi-nop (16 cpus): 2.123 ± 0.026M/s ( 0.133M/s/cpu) uprobe-nop (32 cpus): 2.524 ± 0.026M/s ( 0.079M/s/cpu) uretprobe-nop (32 cpus): 1.572 ± 0.003M/s ( 0.049M/s/cpu) uprobe-multi-nop (32 cpus): 2.717 ± 0.003M/s ( 
0.085M/s/cpu) uretprobe-multi-nop (32 cpus): 1.687 ± 0.007M/s ( 0.053M/s/cpu) [0] https://lore.kernel.org/linux-trace-kernel/[email protected]/ [1] https://lore.kernel.org/linux-trace-kernel/[email protected]/ Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23selftests/bpf: make use of PROCMAP_QUERY ioctl if availableAndrii Nakryiko3-15/+94
Instead of parsing text-based /proc/<pid>/maps file, try to use PROCMAP_QUERY ioctl() to simplify and speed up data fetching. This logic is used to do uprobe file offset calculation, so any bugs in this logic would manifest as failing uprobe BPF selftests. This also serves as a simple demonstration of one of the intended uses. Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
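The uprobe file offset calculation mentioned in this commit can be sketched as follows. This is a minimal illustration rather than the selftest's code: struct vma_info and addr_to_file_offset() are hypothetical names, and the struct stands in for whatever a real program would fill in from /proc/<pid>/maps or from a PROCMAP_QUERY ioctl() result.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified view of one file-backed mapping. The field
 * names are assumptions made for this sketch.
 */
struct vma_info {
	uint64_t start;   /* first address of the mapping */
	uint64_t end;     /* one past the last address of the mapping */
	uint64_t offset;  /* file offset at which the mapping starts */
};

/*
 * Return the file offset that backs addr, or -1 if addr lies outside
 * the mapping.
 */
int64_t addr_to_file_offset(const struct vma_info *vma, uint64_t addr)
{
	if (addr < vma->start || addr >= vma->end)
		return -1;
	return (int64_t)(addr - vma->start + vma->offset);
}
```

Whichever way the mapping is obtained, the arithmetic is the same, which is why a bug in the lookup would show up as failing uprobe selftests.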
2024-08-23Merge branch 'follow-up-for-__jited-test-tag'Alexei Starovoitov3-9/+21
Eduard Zingerman says: ==================== follow up for __jited test tag This patch-set is a collection of follow-ups for "__jited test tag to check disassembly after jit" series (see [1]). First patch is most important: as it turns out, I broke all test_loader based tests for s390 CI. E.g. see log [2] for s390 execution of test_progs, note all 'verifier_*' tests being skipped. This happens because of incorrect handling of corner case when get_current_arch() does not know which architecture to return. Second patch makes matching of function return sequence in verifier_tailcall_jit more flexible: -__jited(" retq") +__jited(" {{(retq|jmp 0x)}}") The difference could be seen with and w/o mitigations=off boot parameter for test VM (CI runs with mitigations=off, hence it generates retq). Third patch addresses Alexei's request to add #define and a comment in jit_disasm_helpers.c. [1] https://lore.kernel.org/bpf/[email protected]/ [2] https://github.com/kernel-patches/bpf/actions/runs/10518445973/job/29144511595 ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23selftests/bpf: #define LOCAL_LABEL_LEN for jit_disasm_helpers.cEduard Zingerman1-3/+14
Extract local label length as a #define directive and elaborate why 'i % MAX_LOCAL_LABELS' expression is needed for local labels array initialization. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23selftests/bpf: match both retq/rethunk in verifier_tailcall_jitEduard Zingerman1-2/+2
Depending on kernel parameters, x86 jit generates either retq or jump to rethunk for 'exit' instruction. The difference could be seen when kernel is booted with and without mitigations=off parameter. Relax the verifier_tailcall_jit test case to match both variants. Fixes: e5bdd6a8be78 ("selftests/bpf: validate jit behaviour for tail calls") Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-23selftests/bpf: test_loader.c:get_current_arch() should not return 0Eduard Zingerman1-4/+5
At the moment, when test_loader.c:get_current_arch() can't determine the arch, it returns 0. The arch check in run_subtest() looks as follows: if ((get_current_arch() & spec->arch_mask) == 0) { test__skip(); return; } Which means that all test_loader based tests would be skipped if arch could not be determined. get_current_arch() recognizes x86_64, arm64 and riscv64. Which means that CI skips test_loader tests for s390. Fix this by making sure that get_current_arch() always returns non-zero value. In combination with default spec->arch_mask == -1 this should cover all possibilities. Fixes: f406026fefa7 ("selftests/bpf: by default use arch mask allowing all archs") Fixes: 7d743e4c759c ("selftests/bpf: __jited test tag to check disassembly after jit") Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
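The effect of the mask check quoted in this commit can be reproduced with a small standalone sketch. The enum values below, including ARCH_UNKNOWN, are hypothetical bit assignments chosen for illustration and are not copied from test_loader.c; should_skip() is likewise a made-up name that mirrors the quoted condition.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real definitions live in test_loader.c. */
enum {
	ARCH_UNKNOWN = 0x1,
	ARCH_X86_64  = 0x2,
	ARCH_ARM64   = 0x4,
	ARCH_RISCV64 = 0x8,
};

/* Mirrors the quoted check: skip when the two values share no bit. */
bool should_skip(int current_arch, int arch_mask)
{
	return (current_arch & arch_mask) == 0;
}
```

A current arch of 0 shares no bit with any mask, so even the default mask of -1 leads to a skip. Any non-zero value for an unrecognized architecture passes the default mask while still being rejected by a test that names a specific architecture.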
2024-08-22selftests/bpf: Add testcase for updating attached freplace prog to prog_array mapLeon Hwang3-1/+109
Add a selftest to confirm that the issue of getting -EINVAL when updating an attached freplace prog in a prog_array map has been fixed. cd tools/testing/selftests/bpf; ./test_progs -t tailcalls 328/25 tailcalls/tailcall_freplace:OK 328 tailcalls:OK Summary: 1/25 PASSED, 0 SKIPPED, 0 FAILED Acked-by: Yonghong Song <[email protected]> Signed-off-by: Leon Hwang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfAlexei Starovoitov1334-9273/+19542
Cross-merge bpf fixes after downstream PR including important fixes (from bpf-next point of view): commit 41c24102af7b ("selftests/bpf: Filter out _GNU_SOURCE when compiling test_cpp") commit fdad456cbcca ("bpf: Fix updating attached freplace prog in prog_array map") No conflicts. Adjacent changes in: include/linux/bpf_verifier.h kernel/bpf/verifier.c tools/testing/selftests/bpf/Makefile Link: https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22Merge branch 'support-bpf_fastcall-patterns-for-calls-to-kfuncs'Alexei Starovoitov7-102/+192
Eduard Zingerman says: ==================== support bpf_fastcall patterns for calls to kfuncs As an extension of [1], allow bpf_fastcall patterns for kfuncs: - pattern rules are the same as for helpers; - spill/fill removal is allowed only for kfuncs listed in the is_fastcall_kfunc_call (under assumption that such kfuncs would always be members of special_kfunc_list). Allow bpf_fastcall rewrite for bpf_cast_to_kern_ctx() and bpf_rdonly_cast() in order to conjure selftests for this feature. After this patch-set verifier would rewrite the program below: r2 = 1 *(u64 *)(r10 - 32) = r2 call %[bpf_cast_to_kern_ctx] r2 = *(u64 *)(r10 - 32) r0 = r2 As follows: r2 = 1 /* spill/fill at r10[-32] is removed */ r0 = r1 /* replacement for bpf_cast_to_kern_ctx() */ r0 = r2 exit Also, attribute used by LLVM implementation of the feature had been changed from no_caller_saved_registers to bpf_fastcall (see [2]). This patch-set replaces references to nocsr by references to bpf_fastcall to keep LLVM and Kernel parts in sync. [1] no_caller_saved_registers attribute for helper calls https://lore.kernel.org/bpf/[email protected]/ [2] [BPF] introduce __attribute__((bpf_fastcall)) https://github.com/llvm/llvm-project/pull/105417 Changes v2->v3: - added a patch fixing arch_mask handling in test_loader, otherwise newly added tests for the feature were skipped (a fix for regression introduced by a recent commit); - fixed warning regarding unused 'params' variable; - applied stylistic fixes suggested by Yonghong; - added acks from Yonghong; Changes v1->v2: - added two patches replacing all mentions of nocsr by bpf_fastcall (suggested by Andrii); - removed KF_NOCSR flag (suggested by Yonghong). v1: https://lore.kernel.org/bpf/[email protected]/ v2: https://lore.kernel.org/bpf/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22selftests/bpf: check if bpf_fastcall is recognized for kfuncsEduard Zingerman1-0/+55
Use kfunc_bpf_cast_to_kern_ctx() and kfunc_bpf_rdonly_cast() to verify that bpf_fastcall pattern is recognized for kfunc calls. Acked-by: Yonghong Song <[email protected]> Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22selftests/bpf: by default use arch mask allowing all archsEduard Zingerman1-1/+1
If test case does not specify architecture via __arch_* macro consider that it should be run for all architectures. Fixes: 7d743e4c759c ("selftests/bpf: __jited test tag to check disassembly after jit") Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22bpf: allow bpf_fastcall for bpf_cast_to_kern_ctx and bpf_rdonly_castEduard Zingerman1-0/+3
do_misc_fixups() replaces bpf_cast_to_kern_ctx() and bpf_rdonly_cast() by a single instruction "r0 = r1". This follows bpf_fastcall contract. This commit allows bpf_fastcall pattern rewrite for these two functions in order to use them in bpf_fastcall selftests. Acked-by: Yonghong Song <[email protected]> Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22bpf: support bpf_fastcall patterns for kfuncsEduard Zingerman1-1/+34
Recognize bpf_fastcall patterns around kfunc calls. For example, suppose bpf_cast_to_kern_ctx() follows bpf_fastcall contract (which it does), in such a case allow verifier to rewrite BPF program below: r2 = 1; *(u64 *)(r10 - 32) = r2; call %[bpf_cast_to_kern_ctx]; r2 = *(u64 *)(r10 - 32); r0 = r2; By removing the spill/fill pair: r2 = 1; call %[bpf_cast_to_kern_ctx]; r0 = r2; Acked-by: Yonghong Song <[email protected]> Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22selftests/bpf: rename nocsr -> bpf_fastcall in selftestsEduard Zingerman2-15/+15
Attribute used by LLVM implementation of the feature had been changed from no_caller_saved_registers to bpf_fastcall (see [1]). This commit replaces references to nocsr by references to bpf_fastcall to keep LLVM and selftests parts in sync. [1] https://github.com/llvm/llvm-project/pull/105417 Acked-by: Yonghong Song <[email protected]> Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22bpf: rename nocsr -> bpf_fastcall in verifierEduard Zingerman4-85/+84
Attribute used by LLVM implementation of the feature had been changed from no_caller_saved_registers to bpf_fastcall (see [1]). This commit replaces references to nocsr by references to bpf_fastcall to keep LLVM and Kernel parts in sync. [1] https://github.com/llvm/llvm-project/pull/105417 Acked-by: Yonghong Song <[email protected]> Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22bpf: Fix percpu address space issuesUros Bizjak4-16/+17
In arraymap.c: In bpf_array_map_seq_start() and bpf_array_map_seq_next() cast return values from the __percpu address space to the generic address space via uintptr_t [1]. Correct the declaration of pptr pointer in __bpf_array_map_seq_show() to void __percpu * and cast the value from the generic address space to the __percpu address space via uintptr_t [1]. In hashtab.c: Assign the return value from bpf_mem_cache_alloc() to void pointer and cast the value to void __percpu ** (void pointer to percpu void pointer) before dereferencing. In memalloc.c: Explicitly declare __percpu variables. Cast obj to void __percpu **. In helpers.c: Cast ptr in BPF_CALL_1 and BPF_CALL_2 from generic address space to __percpu address space via const uintptr_t [1]. Found by GCC's named address space checks. There were no changes in the resulting object files. [1] https://sparse.docs.kernel.org/en/latest/annotations.html#address-space-name Signed-off-by: Uros Bizjak <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Eduard Zingerman <[email protected]> Cc: Song Liu <[email protected]> Cc: Yonghong Song <[email protected]> Cc: John Fastabend <[email protected]> Cc: KP Singh <[email protected]> Cc: Stanislav Fomichev <[email protected]> Cc: Hao Luo <[email protected]> Cc: Jiri Olsa <[email protected]> Acked-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-08-22Merge branch 'bpf-fix-null-pointer-access-for-malformed-bpf_core_type_id_local-relos'Alexei Starovoitov2-0/+133
Eduard Zingerman says: ==================== bpf: fix null pointer access for malformed BPF_CORE_TYPE_ID_LOCAL relos Liu RuiTong reported an in-kernel null pointer dereference when processing BPF_CORE_TYPE_ID_LOCAL relocations referencing non-existing BTF types. Fix this by adding proper id checks. Changes v2->v3: - selftest update suggested by Andrii: avoid memset(0) for log buffer and do memset(0) for bpf_attr. Changes v1->v2: - moved check from bpf_core_calc_relo_insn() to bpf_core_apply() now both in kernel and in libbpf relocation type id is guaranteed to exist when bpf_core_calc_relo_insn() is called; - added a test case. v1: https://lore.kernel.org/bpf/[email protected]/ v2: https://lore.kernel.org/bpf/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>