Age | Commit message | Author | Files | Lines
2022-12-10 | bpf: use check_ids() for active_lock comparison | Eduard Zingerman | 1 | -3/+13
An update for verifier.c:states_equal()/regsafe() to use check_ids() for active spin lock comparisons. This fixes the issue reported by Kumar Kartikeya Dwivedi in [1] using technique suggested by Edward Cree.

W/o this commit the verifier might be tricked to accept the following program working with a map containing spin locks:

  0: r9 = map_lookup_elem(...)  ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1.
  1: r8 = map_lookup_elem(...)  ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2.
  2: if r9 == 0 goto exit       ; r9 -> PTR_TO_MAP_VALUE.
  3: if r8 == 0 goto exit       ; r8 -> PTR_TO_MAP_VALUE.
  4: r7 = ktime_get_ns()        ; Unbound SCALAR_VALUE.
  5: r6 = ktime_get_ns()        ; Unbound SCALAR_VALUE.
  6: bpf_spin_lock(r8)          ; active_lock.id == 2.
  7: if r6 > r7 goto +1         ; No new information about the state
                                ; is derived from this check, thus
                                ; produced verifier states differ only
                                ; in 'insn_idx'.
  8: r9 = r8                    ; Optionally make r9.id == r8.id.
  --- checkpoint ---            ; Assume is_state_visited() creates a
                                ; checkpoint here.
  9: bpf_spin_unlock(r9)        ; (a,b) active_lock.id == 2.
                                ; (a) r9.id == 2, (b) r9.id == 1.
 10: exit(0)

Consider two verification paths:
  (a) 0-10
  (b) 0-7,9-10

The path (a) is verified first. If a checkpoint is created at (9), path (b) would assume that (9) is safe because regsafe() does not compare register ids for registers of type PTR_TO_MAP_VALUE.

[1] https://lore.kernel.org/bpf/[email protected]/

Reported-by: Kumar Kartikeya Dwivedi <[email protected]>
Suggested-by: Edward Cree <[email protected]>
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-10 | selftests/bpf: verify states_equal() maintains idmap across all frames | Eduard Zingerman | 1 | -0/+82
A test case that would erroneously pass verification if verifier.c:states_equal() maintains separate register ID mappings for call frames. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-10 | bpf: states_equal() must build idmap for all function frames | Eduard Zingerman | 2 | -3/+4
verifier.c:states_equal() must maintain register ID mapping across all function frames. Otherwise the following example might be erroneously marked as safe:

main:
    fp[-24] = map_lookup_elem(...)  ; frame[0].fp[-24].id == 1
    fp[-32] = map_lookup_elem(...)  ; frame[0].fp[-32].id == 2
    r1 = &fp[-24]
    r2 = &fp[-32]
    call foo()
    r0 = 0
    exit

foo:
  0: r9 = r1
  1: r8 = r2
  2: r7 = ktime_get_ns()
  3: r6 = ktime_get_ns()
  4: if (r6 > r7) goto skip_assign
  5: r9 = r8

skip_assign:                ; <--- checkpoint
  6: r9 = *r9               ; (a) frame[1].r9.id == 2
                            ; (b) frame[1].r9.id == 1

  7: if r9 == 0 goto exit:  ; mark_ptr_or_null_regs() transfers != 0 info
                            ; for all regs sharing ID:
                            ;   (a) r9 != 0 => &frame[0].fp[-32] != 0
                            ;   (b) r9 != 0 => &frame[0].fp[-24] != 0

  8: r8 = *r8               ; (a) r8 == &frame[0].fp[-32]
                            ; (b) r8 == &frame[0].fp[-32]
  9: r0 = *r8               ; (a) safe
                            ; (b) unsafe

exit:
 10: exit

While processing call to foo() verifier considers the following execution paths:

  (a) 0-10
  (b) 0-4,6-10

(There is also path 0-7,10 but it is not interesting for the issue at hand. (a) is verified first.)

Suppose that checkpoint is created at (6) when path (a) is verified, next path (b) is verified and (6) is reached.

If states_equal() maintains separate 'idmap' for each frame the mapping at (6) for frame[1] would be empty and regsafe(r9)::check_ids() would add a pair 2->1 and return true, which is an error.

If states_equal() maintains single 'idmap' for all frames the mapping at (6) would be { 1->1, 2->2 } and regsafe(r9)::check_ids() would return false when trying to add a pair 2->1.

This issue was suggested in the following discussion:
https://lore.kernel.org/bpf/CAEf4BzbFB5g4oUfyxk9rHy-PJSLQ3h8q9mV=rVoXfr_JVm8+1Q@mail.gmail.com/

Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
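The difference between a per-frame and a shared 'idmap' can be sketched in plain C. This is a simplified model, not the kernel's check_ids(): the struct layout, ID_MAP_SIZE and the two compare_* helpers are assumptions made for illustration, and only the old-ID to current-ID mapping rule is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ID_MAP_SIZE 8 /* hypothetical size, chosen for the example */

struct id_pair {
	unsigned int old; /* ID seen in the checkpointed state, 0 = free slot */
	unsigned int cur; /* ID it was matched with in the current state */
};

/* Simplified model: old_id may map to exactly one cur_id. */
bool check_ids(unsigned int old_id, unsigned int cur_id, struct id_pair *idmap)
{
	assert(idmap != NULL);
	for (size_t i = 0; i < ID_MAP_SIZE; i++) {
		if (idmap[i].old == 0) {
			/* First time old_id is seen: record the pair. */
			idmap[i].old = old_id;
			idmap[i].cur = cur_id;
			return true;
		}
		if (idmap[i].old == old_id)
			return idmap[i].cur == cur_id;
	}
	return false; /* map exhausted: treat as not equivalent */
}

/* One map for all frames: frame[0] records 1->1 and 2->2 first. */
bool compare_shared_idmap(void)
{
	struct id_pair idmap[ID_MAP_SIZE];

	memset(idmap, 0, sizeof(idmap));
	if (!check_ids(1, 1, idmap) || !check_ids(2, 2, idmap))
		return false;
	/* frame[1].r9: checkpoint id 2, current id 1 */
	return check_ids(2, 1, idmap);
}

/* Separate map per frame: frame[1] starts with an empty map. */
bool compare_per_frame_idmap(void)
{
	struct id_pair frame0[ID_MAP_SIZE], frame1[ID_MAP_SIZE];

	memset(frame0, 0, sizeof(frame0));
	memset(frame1, 0, sizeof(frame1));
	if (!check_ids(1, 1, frame0) || !check_ids(2, 2, frame0))
		return false;
	return check_ids(2, 1, frame1);
}
```

In this model compare_per_frame_idmap() accepts the 2->1 pair (the erroneous outcome described in the commit message), while compare_shared_idmap() rejects it.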
2022-12-10 | selftests/bpf: test cases for regsafe() bug skipping check_id() | Eduard Zingerman | 2 | -0/+103
Under certain conditions it was possible for verifier.c:regsafe() to skip the check_ids() call. This commit adds negative test cases that were previously erroneously accepted as safe. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-10 | bpf: regsafe() must not skip check_ids() | Eduard Zingerman | 1 | -21/+8
The verifier.c:regsafe() has the following shortcut:

  equal = memcmp(rold, rcur, offsetof(struct bpf_reg_state, parent)) == 0;
  ...
  if (equal)
      return true;

Which is executed regardless of the old register type. This is incorrect for register types that might have an ID checked by check_ids(), namely:
 - PTR_TO_MAP_KEY
 - PTR_TO_MAP_VALUE
 - PTR_TO_PACKET_META
 - PTR_TO_PACKET

The following pattern could be used to exploit this:

  0: r9 = map_lookup_elem(...)  ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1.
  1: r8 = map_lookup_elem(...)  ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2.
  2: r7 = ktime_get_ns()        ; Unbound SCALAR_VALUE.
  3: r6 = ktime_get_ns()        ; Unbound SCALAR_VALUE.
  4: if r6 > r7 goto +1         ; No new information about the state
                                ; is derived from this check, thus
                                ; produced verifier states differ only
                                ; in 'insn_idx'.
  5: r9 = r8                    ; Optionally make r9.id == r8.id.
  --- checkpoint ---            ; Assume is_state_visited() creates a
                                ; checkpoint here.
  6: if r9 == 0 goto <exit>     ; Nullness info is propagated to all
                                ; registers with matching ID.
  7: r1 = *(u64 *) r8           ; Not always safe.

Verifier first visits path 1-7 where r8 is verified to be not null at (6). Later the jump from 4 to 6 is examined. The checkpoint for (6) looks as follows:

  R8_rD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0)
  R9_rwD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0)
  R10=fp0

The current state is:

  R0=... R6=... R7=... fp-8=...
  R8=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0)
  R9=map_value_or_null(id=1,off=0,ks=4,vs=8,imm=0)
  R10=fp0

Note that R8 states are byte-to-byte identical, so regsafe() would exit early and skip the call to check_ids(), thus ID mapping 2->2 will not be added to 'idmap'. Next, states for R9 are compared: these are not identical and check_ids() is executed, but 'idmap' is empty, so check_ids() adds mapping 2->1 to 'idmap' and returns success.

This commit pushes the 'equal' down to register types that don't need check_ids().
Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
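The effect of the early exit can be reproduced with a toy model of the R8/R9 comparison. Everything below is an assumption for illustration: struct reg has only two fields, regsafe() takes a hypothetical 'shortcut' flag to switch between old and new behaviour, and check_ids() is a reduced version of the real mapping check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ID_MAP_SIZE 8 /* hypothetical size */

struct id_pair { unsigned int old, cur; };

/* Reduced register state: just a type tag and an ID. */
struct reg { int type; unsigned int id; };

bool check_ids(unsigned int old_id, unsigned int cur_id, struct id_pair *idmap)
{
	for (size_t i = 0; i < ID_MAP_SIZE; i++) {
		if (idmap[i].old == 0) {
			idmap[i].old = old_id;
			idmap[i].cur = cur_id;
			return true;
		}
		if (idmap[i].old == old_id)
			return idmap[i].cur == cur_id;
	}
	return false;
}

/* shortcut == true models the early return on byte-identical registers. */
bool regsafe(const struct reg *rold, const struct reg *rcur,
	     struct id_pair *idmap, bool shortcut)
{
	assert(rold != NULL && rcur != NULL);
	if (shortcut && memcmp(rold, rcur, sizeof(*rold)) == 0)
		return true; /* idmap is never updated on this path */
	return rold->type == rcur->type &&
	       check_ids(rold->id, rcur->id, idmap);
}

/* Checkpoint: R8 id=2, R9 id=2. Current state: R8 id=2, R9 id=1. */
bool states_equal(bool shortcut)
{
	struct id_pair idmap[ID_MAP_SIZE] = {0};
	struct reg old_r8 = {1, 2}, old_r9 = {1, 2};
	struct reg cur_r8 = {1, 2}, cur_r9 = {1, 1};

	return regsafe(&old_r8, &cur_r8, idmap, shortcut) &&
	       regsafe(&old_r9, &cur_r9, idmap, shortcut);
}
```

With the shortcut enabled the states are wrongly reported as equivalent, because the 2->2 pair for R8 is never recorded; without it the R9 comparison fails on 2->1.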
2022-12-09 | docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE | Donald Hunter | 1 | -0/+155
Add documentation for the BPF_MAP_TYPE_SK_STORAGE including kernel version introduced, usage and examples. Signed-off-by: Donald Hunter <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | Merge branch 'Dynptr refactorings' | Alexei Starovoitov | 13 | -202/+369
Kumar Kartikeya Dwivedi says: ==================== This is part 1 of https://lore.kernel.org/bpf/[email protected]. This thread also gives some background on why the refactor is being done: https://lore.kernel.org/bpf/CAEf4Bzb4beTHgVo+G+jehSj8oCeAjRbRcm6MRe=Gr+cajRBwEw@mail.gmail.com As requested in patch 6 by Alexei, it only includes patches which refactors the code, on top of which further fixes will be made in part 2. The refactor itself fixes another issue as a side effect. No functional change is intended (except a few modified log messages). Changelog: ---------- v1 -> v2 v1: https://lore.kernel.org/bpf/[email protected] * Address feedback from Joanne and David, add acks Fixes v1 -> v1 Fixes v1: https://lore.kernel.org/bpf/[email protected] * Collect acks from Joanne and David * Fix misc nits pointed out by Joanne, David * Split move of reg->off alignment check for dynptr into separate change (Alexei) ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | selftests/bpf: Add test for dynptr reinit in user_ringbuf callback | Kumar Kartikeya Dwivedi | 2 | -8/+45
The original support for bpf_user_ringbuf_drain callbacks simply short-circuited checks for the dynptr state, allowing users to pass PTR_TO_DYNPTR (now CONST_PTR_TO_DYNPTR) to helpers that initialize a dynptr. This bug would have also surfaced with other dynptr helpers in the future that changed dynptr view or modified it in some way. Include test cases for all cases, i.e. both bpf_dynptr_from_mem and bpf_ringbuf_reserve_dynptr, and ensure verifier rejects both of them. Without the fix, both of these programs load and pass verification. While at it, remove sys_nanosleep target from failure cases' SEC definition, as there is no such tracepoint. Acked-by: David Vernet <[email protected]> Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | bpf: Use memmove for bpf_dynptr_{read,write} | Kumar Kartikeya Dwivedi | 1 | -2/+10
It may happen that destination buffer memory overlaps with memory dynptr points to. Hence, we must use memmove to correctly copy from dynptr to destination buffer, or source buffer to dynptr. This actually isn't a problem right now, as memcpy implementation falls back to memmove on detecting overlap and warns about it, but we shouldn't be relying on that. Acked-by: Joanne Koong <[email protected]> Acked-by: David Vernet <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
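A small userspace sketch of why overlap matters: the C standard leaves memcpy() undefined for overlapping regions, while memmove() is specified to behave as if the bytes were first copied to a temporary buffer. The helper name shift_right() is hypothetical and is not part of the dynptr code.

```c
#include <assert.h>
#include <string.h>

/*
 * Move the first n bytes of buf forward by 'shift' bytes, in place.
 * Source and destination overlap whenever shift < n, so memmove()
 * is required; memcpy() would be undefined behaviour here.
 * The caller must ensure buf holds at least n + shift bytes.
 */
void shift_right(char *buf, size_t n, size_t shift)
{
	assert(buf != NULL);
	memmove(buf + shift, buf, n);
}
```

For example, shifting the six bytes "abcdef" by two inside a 16-byte buffer yields "ababcdef": the first two bytes are left untouched and the original six follow them intact.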
2022-12-08 | bpf: Move PTR_TO_STACK alignment check to process_dynptr_func | Kumar Kartikeya Dwivedi | 1 | -5/+8
After previous commit, we are minimizing helper specific assumptions from check_func_arg_reg_off, making it generic, and offloading checks for a specific argument type to their respective functions called after check_func_arg_reg_off has been called. This allows relying on a consistent set of guarantees after that call and then relying on them in code that deals with registers for each argument type later. This is in line with how process_spin_lock, process_timer_func, process_kptr_func check reg->var_off to be constant. The same reasoning is used here to move the alignment check into process_dynptr_func. Note that it also needs to check for constant var_off, and accumulate the constant var_off when computing the spi in get_spi, but that fix will come in later changes. Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | bpf: Rework check_func_arg_reg_off | Kumar Kartikeya Dwivedi | 3 | -27/+40
While check_func_arg_reg_off is the place which performs generic checks needed by various candidates of reg->type, there is some handling for special cases, like ARG_PTR_TO_DYNPTR, OBJ_RELEASE, and ARG_PTR_TO_RINGBUF_MEM. This commit aims to streamline these special cases and instead leave other things up to argument type specific code to handle. The function will be restrictive by default, and cover all possible cases when OBJ_RELEASE is set, without having to update the function again (and missing to do that being a bug). This is done primarily for two reasons: associating back reg->type to its argument leaves room for the list getting out of sync when a new reg->type is supported by an arg_type. The other case is ARG_PTR_TO_RINGBUF_MEM. The problem there is something we already handle, whenever a release argument is expected, it should be passed as the pointer that was received from the acquire function. Hence zero fixed and variable offset. There is nothing special about ARG_PTR_TO_RINGBUF_MEM, where technically its target register type PTR_TO_MEM | MEM_RINGBUF can already be passed with non-zero offset to other helper functions, which makes sense. Hence, lift the arg_type_is_release check for reg->off and cover all possible register types, instead of duplicating the same kind of check twice for current OBJ_RELEASE arg_types (alloc_mem and ptr_to_btf_id). For the release argument, arg_type_is_dynptr is the special case, where we go to actual object being freed through the dynptr, so the offset of the pointer still needs to allow fixed and variable offset and process_dynptr_func will verify them later for the release argument case as well. This is not specific to ARG_PTR_TO_DYNPTR though, we will need to make this exception for any future object on the stack that needs to be released. 
In this sense, PTR_TO_STACK as a candidate for object on stack argument is a special case for release offset checks, and they need to be done by the helper releasing the object on stack. Since the check has been lifted above all register type checks, remove the duplicated check that is being done for PTR_TO_BTF_ID. Acked-by: Joanne Koong <[email protected]> Acked-by: David Vernet <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | bpf: Rework process_dynptr_func | Kumar Kartikeya Dwivedi | 7 | -79/+191
Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type for use in callback state, because in case of user ringbuf helpers, there is no dynptr on the stack that is passed into the callback. To reflect such a state, a special register type was created. However, some checks have been bypassed incorrectly during the addition of this feature. First, for arg_type with MEM_UNINIT flag which initialize a dynptr, they must be rejected for such register type. Secondly, in the future, there are plans to add dynptr helpers that operate on the dynptr itself and may change its offset and other properties. In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed to such helpers, however the current code simply returns 0. The rejection for helpers that release the dynptr is already handled. For fixing this, we take a step back and rework existing code in a way that will allow fitting in all classes of helpers and have a coherent model for dealing with the variety of use cases in which dynptr is used. First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together with a DYNPTR_TYPE_* constant that denotes the only type it accepts. Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this fact. To make the distinction clear, use MEM_RDONLY flag to indicate that the helper only operates on the memory pointed to by the dynptr, not the dynptr itself. In C parlance, it would be equivalent to taking the dynptr as a pointer to const argument. When neither of these flags is present, the helper is allowed to mutate both the dynptr itself and also the memory it points to. Currently, the read only status of the memory is not tracked in the dynptr, but it would be trivial to add this support inside dynptr state of the register. With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to better reflect its usage, it can no longer be passed to helpers that initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr. 
A note to reviewers is that in code that does mark_stack_slots_dynptr, and unmark_stack_slots_dynptr, we implicitly rely on the fact that PTR_TO_STACK reg is the only case that can reach that code path, as one cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In both cases such helpers won't be setting that flag. The next patch will add a couple of selftest cases to make sure this doesn't break. Fixes: 205715673844 ("bpf: Add bpf_user_ringbuf_drain() helper") Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | bpf: Propagate errors from process_* checks in check_func_arg | Kumar Kartikeya Dwivedi | 1 | -10/+15
Currently, we simply ignore the errors in process_spin_lock, process_timer_func, process_kptr_func, process_dynptr_func. Instead, bubble up the error by storing and checking err variable. Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func | Kumar Kartikeya Dwivedi | 4 | -86/+75
ARG_PTR_TO_DYNPTR is akin to ARG_PTR_TO_TIMER, ARG_PTR_TO_KPTR, where the underlying register type is subjected to more special checks to determine the type of object represented by the pointer and its state consistency. Move dynptr checks to their own 'process_dynptr_func' function so that is consistent and in-line with existing code. This also makes it easier to reuse this code for kfunc handling. Then, reuse this consolidated function in kfunc dynptr handling too. Note that for kfuncs, the arg_type constraint of DYNPTR_TYPE_LOCAL has been lifted. Acked-by: David Vernet <[email protected]> Acked-by: Joanne Koong <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | Merge branch 'Misc optimizations for bpf mem allocator' | Alexei Starovoitov | 1 | -4/+27
Hou Tao says: ==================== From: Hou Tao <[email protected]> Hi, The patchset is just misc optimizations for bpf mem allocator. Patch 1 fixes the OOM problem found during running hash-table update benchmark from qp-trie patchset [0]. The benchmark will add htab elements in batch and then delete elements in batch, so freed objects will stack on free_by_rcu and wait for the expiration of RCU grace period. There can be tens of thousands of freed objects and these objects are not available for new allocation, so adding htab element will continue to do new allocation. For the benchmark command: "./bench -w3 -d10 -a htab-update -p 16", even though the maximum entries of htab is 16384, key_size is 255 and value_size is 4, the peak memory usage will reach 14GB or more. Increasing rcupdate.rcu_task_enqueue_lim will decrease the peak memory to 860MB, but that is still too much. Although the above case is contrived, it is better to fix it and the fix is simple: just reuse the freed objects in free_by_rcu during allocation. After the fix, the peak memory usage will decrease to 26MB. Besides the above case, the memory blow-up problem is also possible when allocation and freeing are done on totally different CPUs. I'm trying to fix the blow-up problem by using a global per-cpu work to free these objects in free_by_rcu timely, but it doesn't work very well and I am still digging into it. Patch 2 is a left-over patch from rcu_trace_implies_rcu_gp() patchset [1]. After discussing with Paul [2], I think it is also safe to skip rcu_barrier() when rcu_trace_implies_rcu_gp() returns true. Comments are always welcome. 
Change Log: v2: * Patch 1: rephrase the commit message (Suggested by Yonghong & Alexei) * Add Acked-by for both patch 1 and 2 v1: https://lore.kernel.org/bpf/[email protected] [0]: https://lore.kernel.org/bpf/[email protected]/ [1]: https://lore.kernel.org/bpf/[email protected]/ [2]: https://lore.kernel.org/bpf/20221021185002.GP5600@paulmck-ThinkPad-P17-Gen-1/ ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true | Hou Tao | 1 | -1/+9
If there are pending rcu callback, free_mem_alloc() will use rcu_barrier_tasks_trace() and rcu_barrier() to wait for the pending __free_rcu_tasks_trace() and __free_rcu() callback. If rcu_trace_implies_rcu_gp() is true, there will be no pending __free_rcu(), so it will be OK to skip rcu_barrier() as well. Acked-by: Yonghong Song <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | bpf: Reuse freed element in free_by_rcu during allocation | Hou Tao | 1 | -3/+18
When there are batched freeing operations on a specific CPU, part of the freed elements ((high_watermark - lower_watermark) / 2 + 1) will be indirectly moved into waiting_for_gp list through free_by_rcu list. After call_rcu_in_progress becomes false again, the remaining elements in free_by_rcu list will be moved to waiting_for_gp list by the next invocation of free_bulk(). However, if the expiration of the RCU tasks trace grace period is relatively slow, no element in the free_by_rcu list will be moved. So instead of invoking __alloc_percpu_gfp() or kmalloc_node() to allocate a new object, alloc_bulk() first checks whether there is a freed element in the free_by_rcu list and reuses it if available. Acked-by: Yonghong Song <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
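The reuse-before-allocate idea can be sketched with an ordinary singly linked free list. This is a single-threaded userspace model under stated assumptions: struct cache, pop(), free_to_rcu_list() and the fresh_allocs counter are hypothetical, malloc() stands in for the kernel allocators, and no RCU grace period or per-CPU handling is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	struct node *next;
};

struct cache {
	struct node *free_by_rcu; /* freed objects not yet handed to RCU */
	int fresh_allocs;         /* objects obtained from malloc() */
};

/* Detach and return the head of the list, or NULL if it is empty. */
static struct node *pop(struct node **head)
{
	struct node *obj = *head;

	if (obj)
		*head = obj->next;
	return obj;
}

/* Queue a freed object on the free_by_rcu list. */
void free_to_rcu_list(struct cache *c, struct node *obj)
{
	obj->next = c->free_by_rcu;
	c->free_by_rcu = obj;
}

/*
 * Obtain up to cnt objects. A freed object is reused when one is
 * available; malloc() is the fallback. Returns the number obtained.
 */
size_t alloc_bulk(struct cache *c, struct node **out, size_t cnt)
{
	size_t i;

	assert(c != NULL && out != NULL);
	for (i = 0; i < cnt; i++) {
		struct node *obj = pop(&c->free_by_rcu);

		if (!obj) {
			obj = malloc(sizeof(*obj));
			if (!obj)
				break;
			c->fresh_allocs++;
		}
		out[i] = obj;
	}
	return i;
}
```

In this model, a batch of frees followed by a batch of allocations only falls back to malloc() for the objects the free list cannot supply, which is the memory-usage effect the commit message describes.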
2022-12-08 | selftests/bpf: Bring test_offload.py back to life | Stanislav Fomichev | 1 | -3/+5
Bpftool has new extra libbpf_det_bind probing map we need to exclude. Also skip trying to load netdevsim modules if it's already loaded (builtin). v2: - drop iproute2->bpftool changes (Toke) Signed-off-by: Stanislav Fomichev <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-12-08 | bpf: Fix comment error in fixup_kfunc_call function | Yang Jihong | 1 | -1/+1
insn->imm for kfunc is the relative address of __bpf_call_base, not __bpf_base_call. Fix the comment error. Signed-off-by: Yang Jihong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-08 | bpf: Do not zero-extend kfunc return values | Björn Töpel | 1 | -0/+4
In BPF all global functions, and BPF helpers return a 64-bit value. For kfunc calls, this is not the case, and they can return e.g. 32-bit values.

The return register R0 for kfuncs calls can therefore be marked as subreg_def != DEF_NOT_SUBREG. In general, if a register is marked with subreg_def != DEF_NOT_SUBREG, some archs (where bpf_jit_needs_zext() returns true) require the verifier to insert explicit zero-extension instructions.

For kfuncs calls, however, the caller should do sign/zero extension for return values. In other words, the compiler is responsible to insert proper instructions, not the verifier.

An example, provided by Yonghong Song:

$ cat t.c
extern unsigned foo(void);
unsigned bar1(void) {
     return foo();
}
unsigned bar2(void) {
     if (foo()) return 10; else return 20;
}

$ clang -target bpf -mcpu=v3 -O2 -c t.c && llvm-objdump -d t.o
t.o:    file format elf64-bpf

Disassembly of section .text:

0000000000000000 <bar1>:
       0:       85 10 00 00 ff ff ff ff call -0x1
       1:       95 00 00 00 00 00 00 00 exit

0000000000000010 <bar2>:
       2:       85 10 00 00 ff ff ff ff call -0x1
       3:       bc 01 00 00 00 00 00 00 w1 = w0
       4:       b4 00 00 00 14 00 00 00 w0 = 0x14
       5:       16 01 01 00 00 00 00 00 if w1 == 0x0 goto +0x1 <LBB1_2>
       6:       b4 00 00 00 0a 00 00 00 w0 = 0xa

0000000000000038 <LBB1_2>:
       7:       95 00 00 00 00 00 00 00 exit

If the return value of 'foo()' is used in the BPF program, the proper zero-extension will be done.

Currently, the verifier correctly marks, say, a 32-bit return value as subreg_def != DEF_NOT_SUBREG, but will fail performing the actual zero-extension, due to a verifier bug in opt_subreg_zext_lo32_rnd_hi32(). load_reg is not properly set to R0, and the following path will be taken:

  if (WARN_ON(load_reg == -1)) {
          verbose(env, "verifier bug. zext_dst is set, but no reg is defined\n");
          return -EFAULT;
  }

A longer discussion from v1 can be found in the link below.

Correct the verifier by avoiding doing explicit zero-extension of R0 for kfunc calls.
Note that R0 will still be marked as a sub-register for return values smaller than 64-bit. Fixes: 83a2881903f3 ("bpf: Account for BPF_FETCH in insn_has_def32()") Link: https://lore.kernel.org/bpf/[email protected]/ Suggested-by: Yonghong Song <[email protected]> Signed-off-by: Björn Töpel <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-07 | Merge branch 'Document some recent core kfunc additions' | Alexei Starovoitov | 2 | -2/+200
David Vernet says: ==================== A series of recent patch sets introduced kfuncs that allowed struct task_struct and struct cgroup objects to be used as kptrs. These were introduced in [0], [1], and [2]. [0]: https://lore.kernel.org/lkml/[email protected]/ [1]: https://lore.kernel.org/lkml/[email protected]/T/ [2]: https://lore.kernel.org/lkml/[email protected]/ These are "core" kfuncs, in that they may be used by a wide variety of possible BPF tracepoint or struct_ops programs, and are defined in kernel/bpf/helpers.c. Even though as kfuncs they have no ABI stability guarantees, they should still be properly documented. This patch set adds that documentation. Some other kfuncs were added recently as well, such as bpf_rcu_read_lock() and bpf_rcu_read_unlock(). Those could and should be added to this "Core kfuncs" section as well in subsequent patch sets. Note that this patch set does not contain documentation for bpf_task_acquire_not_zero(), or bpf_task_kptr_get(). As discussed in [3], those kfuncs currently always return NULL pending resolution on how to properly protect their arguments using RCU. [3]: https://lore.kernel.org/all/[email protected]/ --- Changelog: v2 -> v3: - Don't document bpf_task_kptr_get(), and instead provide a more substantive example for bpf_cgroup_kptr_get(). - Further clarify expected behavior of bpf_task_from_pid() in comments (Alexei) v1 -> v2: - Expand comment to specify that a map holds a reference to a task kptr if we don't end up releasing it (Alexei) - Just read task->pid instead of using a probed read (Alexei) ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-07 | bpf/docs: Document struct cgroup * kfuncs | David Vernet | 2 | -1/+116
bpf_cgroup_acquire(), bpf_cgroup_release(), bpf_cgroup_kptr_get(), and bpf_cgroup_ancestor(), are kfuncs that were recently added to kernel/bpf/helpers.c. These are "core" kfuncs in that they're available for use in any tracepoint or struct_ops BPF program. Though they have no ABI stability guarantees, we should still document them. This patch adds a struct cgroup * subsection to the Core kfuncs section which describes each of these kfuncs. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-07 | bpf/docs: Document struct task_struct * kfuncs | David Vernet | 2 | -1/+84
bpf_task_acquire(), bpf_task_release(), and bpf_task_from_pid() are kfuncs that were recently added to kernel/bpf/helpers.c. These are "core" kfuncs in that they're available for use for any tracepoint or struct_ops BPF program. Though they have no ABI stability guarantees, we should still document them. This patch adds a new Core kfuncs section to the BPF kfuncs doc, and adds entries for all of these task kfuncs. Note that bpf_task_kptr_get() is not documented, as it still returns NULL while we're working to resolve how it can use RCU to ensure struct task_struct * lifetime. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-07 | selftests/bpf: convert dynptr_fail and map_kptr_fail subtests to generic tester | Andrii Nakryiko | 5 | -155/+64
Convert big chunks of dynptr and map_kptr subtests to use generic verification_tester. They are switched from using manually maintained tables of test cases, specifying program name and expected error verifier message, to btf_decl_tag-based annotations directly on corresponding BPF programs: __failure to specify that BPF program is expected to fail verification, and __msg() to specify expected log message. Acked-by: John Fastabend <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-07 | selftests/bpf: add generic BPF program tester-loader | Andrii Nakryiko | 4 | -1/+272
It's become a common pattern to have a collection of small BPF programs in one BPF object file, each representing one test case. On user-space side of such tests we maintain a table of program names and expected failure or success, along with optional expected verifier log message. This works, but each set of tests reimplements this mundane code over and over again, which is a waste of time for anyone trying to add a new set of tests. Furthermore, it's quite error prone as it's way too easy to miss some entries in these manually maintained test tables (as evidenced by dynptr_fail tests, in which ringbuf_release_uninit_dynptr subtest was accidentally missed; this is fixed in next patch). So this patch implements generic test_loader, which accepts skeleton name and handles the rest of details: opens and loads BPF object file, making sure each program is tested in isolation. Optionally each test case can specify expected BPF verifier log message. In case of failure, tester makes sure to report verifier log, but it also reports verifier log in verbose mode unconditionally. Now, the interesting deviation from existing custom implementations is the use of btf_decl_tag attribute to specify expected-to-fail vs expected-to-succeed markers and, optionally, expected log message directly next to BPF program source code, eliminating the need to manually create and update table of tests. We define a few macros wrapping btf_decl_tag with a convention that all values of btf_decl_tag start with "comment:" prefix, and then utilizing a very simple "just_some_text_tag" or "some_key_name=<value>" pattern to define things like expected success/failure, expected verifier message, extra verifier log level (if necessary). This approach is demonstrated by next patch in which two existing sets of failure tests are converted. 
Tester supports both expected-to-fail and expected-to-succeed programs, though this patch set didn't convert any existing expected-to-succeed programs yet, as existing tests couple BPF program loading with their further execution through attach or test_prog_run. One way to allow testing scenarios like this would be ability to specify custom callback, executed for each successfully loaded BPF program. This is left for follow up patches, after some more analysis of existing test cases. This test_loader is, hopefully, a start of a test_verifier-like runner, but integrated into test_progs infrastructure. It will allow much better "user experience" of defining low-level verification tests that can take advantage of all the libbpf-provided nicety features on BPF side: global variables, declarative maps, etc. All while having a choice of defining it in C or as BPF assembly (through __attribute__((naked)) functions and using embedded asm), depending on what makes most sense in each particular case. This will be explored in follow up patches as well. Acked-by: John Fastabend <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
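The "comment:" tag convention described above lends itself to a very small parser. The sketch below is a userspace illustration, not the selftests' test_loader code: struct test_spec, parse_tag() and the specific tag strings ("test_expect_failure", "test_expect_success", "test_expect_msg=") are assumptions chosen to match the "just_some_text_tag" and "some_key_name=<value>" patterns from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* What a test case expects, as collected from its tags. */
struct test_spec {
	bool expect_failure;
	const char *expect_msg; /* points into the tag string, or NULL */
};

/*
 * Parse one tag value. Returns true if the tag was recognised and
 * applied to spec, false for tags without the "comment:" prefix or
 * with an unknown key (the tag names here are hypothetical).
 */
bool parse_tag(const char *tag, struct test_spec *spec)
{
	static const char prefix[] = "comment:";
	static const char msg_key[] = "test_expect_msg=";

	assert(tag != NULL && spec != NULL);
	if (strncmp(tag, prefix, strlen(prefix)) != 0)
		return false;
	tag += strlen(prefix);

	if (strcmp(tag, "test_expect_failure") == 0) {
		spec->expect_failure = true;
		return true;
	}
	if (strcmp(tag, "test_expect_success") == 0) {
		spec->expect_failure = false;
		return true;
	}
	if (strncmp(tag, msg_key, strlen(msg_key)) == 0) {
		spec->expect_msg = tag + strlen(msg_key);
		return true;
	}
	return false;
}
```

Keeping a fixed prefix means unrelated btf_decl_tag values on the same program can simply be skipped, which is why parse_tag() returns false instead of treating them as errors.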
2022-12-07 | bpf: Remove unused insn_cnt argument from visit_[func_call_]insn() | Andrii Nakryiko | 1 | -6/+5
Number of total instructions in BPF program (including subprogs) can and is accessed from env->prog->len. visit_func_call_insn() doesn't do any checks against insn_cnt anymore, relying on push_insn() to do this check internally. So remove unnecessary insn_cnt input argument from visit_func_call_insn() and visit_insn() functions. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-12-07Merge "do not rely on ALLOW_ERROR_INJECTION for fmod_ret" into bpf-nextAlexei Starovoitov4-10/+53
Merge commit 5b481acab4ce ("bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret") from hid tree into bpf-next. Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-07bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_retBenjamin Tissoires4-10/+53
The current way of expressing that a non-bpf kernel component is willing to accept that bpf programs can be attached to it and that they can change the return value is to abuse ALLOW_ERROR_INJECTION. This is debated in the link below, and the result is that it is not a reasonable thing to do. Reuse the kfunc declaration structure to also tag the kernel functions we want to be fmodret. This way we can control from any subsystem which functions are being modified by bpf without touching the verifier. Link: https://lore.kernel.org/all/[email protected]/ Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Benjamin Tissoires <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-06net: xsk: Don't include <linux/rculist.h>Christophe JAILLET1-1/+1
There is no need to include <linux/rculist.h> here. Prefer the less invasive <linux/types.h> which is needed for 'hlist_head'. Signed-off-by: Christophe JAILLET <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/r/88d6a1d88764cca328610854f890a9ca1f4b029e.1670086246.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-06Merge branch 'Refactor verifier prune and jump point handling'Alexei Starovoitov2-45/+64
Andrii Nakryiko says:
====================
Disentangle prune and jump points in BPF verifier code. They are conceptually independent but currently coupled together. This small patch set refactors related code and makes it possible to have some instruction marked as pruning or jump point independently.

Besides just conceptual cleanliness, this allows to remove unnecessary jump points (saving a tiny bit of performance and memory usage, potentially), and even more importantly it allows for clean extension of special pruning points, similarly to how it's done for BPF_FUNC_timer_set_callback. This will be used by future patches implementing open-coded BPF iterators.

v1->v2:
- clarified patch #3 commit message and a comment in the code (John);
- added back mark_jmp_point() to right after subprog call to record non-linear implicit jump from BPF_EXIT to right after CALL <subprog>.
====================
Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-06bpf: remove unnecessary prune and jump pointsAndrii Nakryiko1-24/+10
Don't mark some instructions as jump points when there are actually no jumps and instructions are just processed sequentially. Such a case is handled naturally by precision backtracking logic without the need to update jump history. See get_prev_insn_idx(). It goes back linearly by one instruction, unless the current top of jmp_history is pointing to the current instruction. In such a case we use `st->jmp_history[cnt - 1].prev_idx` to find the instruction from which we jumped to the current instruction non-linearly.

Also remove both jump and prune point marking for the instruction right after an unconditional jump, as program flow can get to the instruction right after an unconditional jump instruction only if there is a jump to that instruction from somewhere else in the program. In such a case we'll mark that instruction as a prune/jump point because it's a destination of a jump.

This change causes no changes in the number of instructions or states processed across Cilium and selftests programs.

Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
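The backward step described above can be sketched as follows. This is a simplified model, not the kernel source: struct state, struct jmp_entry and prev_insn_idx() are stand-ins with only the fields the description mentions.

```c
#include <assert.h>

/* Simplified stand-ins for the verifier's types; only the fields used here. */
struct jmp_entry {
	int idx;	/* instruction that was reached by a non-linear jump */
	int prev_idx;	/* instruction the jump came from */
};

struct state {
	const struct jmp_entry *jmp_history;
	unsigned int jmp_history_cnt;
};

/*
 * Step back one instruction from i. *cnt is the number of history entries
 * not yet consumed by the backwards walk.
 */
static int prev_insn_idx(const struct state *st, int i, unsigned int *cnt)
{
	assert(st && cnt && *cnt <= st->jmp_history_cnt);

	if (*cnt && st->jmp_history[*cnt - 1].idx == i) {
		/* i was reached by a recorded jump: follow it back. */
		i = st->jmp_history[*cnt - 1].prev_idx;
		(*cnt)--;
	} else {
		/* Sequential flow: the previous instruction is i - 1. */
		i--;
	}
	return i;
}
```

Because the sequential case falls out of the else branch, only instructions reached non-linearly need a history entry, which is why the extra jump-point marks can be dropped.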
2022-12-06bpf: mostly decouple jump history management from is_state_visited()Andrii Nakryiko1-23/+26
Jump history updating and state equivalence checks are conceptually independent, so move push_jmp_history() out of is_state_visited(). Also make a decision whether to perform state equivalence checks or not one layer higher in do_check(), keeping is_state_visited() unconditionally performing state checks. push_jmp_history() should be performed after state checks. There is just one small non-uniformity. When is_state_visited() finds already validated equivalent state, it propagates precision marks to current state's parent chain. For this to work correctly, jump history has to be updated, so is_state_visited() is doing that internally. But if no equivalent verified state is found, jump history has to be updated in a newly cloned child state, so is_jmp_point() + push_jmp_history() is performed after is_state_visited() exited with zero result, which means "proceed with validation". This change has no functional changes. It's not strictly necessary, but feels right to decouple these two processes. Acked-by: John Fastabend <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-06bpf: decouple prune and jump pointsAndrii Nakryiko2-14/+44
BPF verifier marks some instructions as prune points. Currently these prune points serve two purposes. It's a point where verifier tries to find previously verified state and check current state's equivalence to short circuit verification for current code path. But also currently it's a point where jump history, used for precision backtracking, is updated. This is done so that non-linear flow of execution could be properly backtracked. Such coupling is coincidental and unnecessary. Some prune points are not part of some non-linear jump path, so don't need update of jump history. On the other hand, not all instructions which have to be recorded in jump history necessarily are good prune points. This patch splits prune and jump points into independent flags. Currently all prune points are marked as jump points to minimize amount of changes in this patch, but next patch will perform some optimization of prune vs jmp point placement. No functional changes are intended. Acked-by: John Fastabend <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
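A minimal sketch of the split into two independent per-instruction flags, assuming a simplified auxiliary array. The array, its size and the field names are illustrative assumptions; only the idea of separate prune and jump marks comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSNS 16	/* arbitrary bound for this sketch */

/* Per-instruction marks; two flags that can now be set independently. */
struct insn_aux {
	bool prune_point;	/* look for an equivalent verified state here */
	bool jmp_point;		/* record jump history here */
};

static struct insn_aux aux[MAX_INSNS];

static void mark_prune_point(int idx)
{
	assert(idx >= 0 && idx < MAX_INSNS);
	aux[idx].prune_point = true;
}

static void mark_jmp_point(int idx)
{
	assert(idx >= 0 && idx < MAX_INSNS);
	aux[idx].jmp_point = true;
}

static bool is_prune_point(int idx)
{
	assert(idx >= 0 && idx < MAX_INSNS);
	return aux[idx].prune_point;
}

static bool is_jmp_point(int idx)
{
	assert(idx >= 0 && idx < MAX_INSNS);
	return aux[idx].jmp_point;
}
```

With this shape, an instruction can be a prune point only, a jump point only, or both, which is what the follow-up optimization relies on.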
2022-12-06bpf: Loosen alloc obj test in verifier's reg_btf_recordDave Marchevsky1-1/+6
btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c. There, a BTF record is created for any type containing a spin_lock or any next-gen datastructure node/head. Currently, for non-MAP_VALUE types, reg_btf_record will only search for a record using struct_meta_tab if the reg->type exactly matches (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an "allocated obj" type - returned from bpf_obj_new - might pick up other flags while working its way through the program. Loosen the check to be exact for base_type and just use MEM_ALLOC mask for type_flag. This patch is marked Fixes as the original intent of reg_btf_record was unlikely to have been to fail finding btf_record for valid alloc obj types with additional flags, some of which (e.g. PTR_UNTRUSTED) are valid register type states for alloc obj independent of this series. However, I didn't find a specific broken repro case outside of this series' added functionality, so it's possible that nothing was triggering this logic error before. Signed-off-by: Dave Marchevsky <[email protected]> cc: Kumar Kartikeya Dwivedi <[email protected]> Fixes: 4e814da0d599 ("bpf: Allow locking bpf_spin_lock in allocated objects") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
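The difference between the exact match and the loosened check can be sketched as below. The bit layout and the numeric values are assumptions for this example only (they are not the kernel's definitions); the point is that the base type is compared exactly while MEM_ALLOC is tested as a mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed encoding for this sketch: the low 8 bits hold the base type and
 * the higher bits hold flags. Values are illustrative, not the kernel's.
 */
#define BASE_TYPE_BITS	8
#define BASE_TYPE_MASK	((1u << BASE_TYPE_BITS) - 1)

enum { PTR_TO_MAP_VALUE = 1, PTR_TO_BTF_ID = 2 };

#define MEM_ALLOC	(1u << (BASE_TYPE_BITS + 0))
#define PTR_UNTRUSTED	(1u << (BASE_TYPE_BITS + 1))

static uint32_t base_type(uint32_t type)
{
	return type & BASE_TYPE_MASK;
}

static uint32_t type_flag(uint32_t type)
{
	return type & ~BASE_TYPE_MASK;
}

/* Old behaviour: only the exact combination matches. */
static bool is_alloc_obj_strict(uint32_t type)
{
	return type == (PTR_TO_BTF_ID | MEM_ALLOC);
}

/* Loosened: exact base type, MEM_ALLOC present among the flags. */
static bool is_alloc_obj_loose(uint32_t type)
{
	return base_type(type) == PTR_TO_BTF_ID &&
	       (type_flag(type) & MEM_ALLOC) != 0;
}
```

An allocated object that has picked up an extra flag such as PTR_UNTRUSTED fails the strict test but passes the loosened one.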
2022-12-06bpf: Don't use rcu_users to refcount in task kfuncsDavid Vernet3-30/+60
A series of prior patches added some kfuncs that allow struct task_struct * objects to be used as kptrs. These kfuncs leveraged the 'refcount_t rcu_users' field of the task for performing refcounting. This field was used instead of 'refcount_t usage', as we wanted to leverage the safety provided by RCU for ensuring a task's lifetime.

A struct task_struct is refcounted by two different refcount_t fields:

1. p->usage: The "true" refcount field which tracks task lifetime. The task is freed as soon as this refcount drops to 0.

2. p->rcu_users: An "RCU users" refcount field which is statically initialized to 2, and is co-located in a union with a struct rcu_head field (p->rcu). p->rcu_users essentially encapsulates a single p->usage refcount, and when p->rcu_users goes to 0, an RCU callback is scheduled on the struct rcu_head which decrements the p->usage refcount.

Our logic was that by using p->rcu_users, we would be able to use RCU to safely issue refcount_inc_not_zero() on a task's rcu_users field to determine if a task could still be acquired, or was exiting. Unfortunately, this does not work due to p->rcu_users and p->rcu sharing a union. When p->rcu_users goes to 0, an RCU callback is scheduled to drop a single p->usage refcount, and because the fields share a union, the refcount immediately becomes nonzero again after the callback is scheduled.

If we were to split the fields out of the union, this wouldn't be a problem. Doing so should also be rather non-controversial, as there are a number of places in struct task_struct that have padding which we could use to avoid growing the structure by splitting up the fields.

For now, so as to fix the kfuncs to be correct, this patch instead updates bpf_task_acquire() and bpf_task_release() to use the p->usage field for refcounting via the get_task_struct() and put_task_struct() functions. Because we can no longer rely on RCU, the change also guts the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions pending a resolution on the above problem. In addition, the patch fixes the kfunc and rcu_read_lock selftests to expect this new behavior.

Fixes: 90660309b0c7 ("bpf: Add kfuncs for storing struct task_struct * as a kptr")
Fixes: fca1aa75518c ("bpf: Handle MEM_RCU type properly")
Reported-by: Matus Jokay <[email protected]>
Signed-off-by: David Vernet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-06Merge branch 'BPF selftests fixes'Andrii Nakryiko2-4/+6
Daan De Meyer says: ==================== This patch series fixes a few issues I've found while integrating the bpf selftests into systemd's mkosi development environment. ==================== Signed-off-by: Andrii Nakryiko <[email protected]>
2022-12-06selftests/bpf: Use CONFIG_TEST_BPF=m instead of CONFIG_TEST_BPF=yDaan De Meyer1-1/+1
CONFIG_TEST_BPF can only be a module, so let's indicate it as such in the selftests config. Signed-off-by: Daan De Meyer <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-12-06selftests/bpf: Use "is not set" instead of "=n"Daan De Meyer1-1/+1
"=n" is not valid kconfig syntax. Use "is not set" instead to indicate the option should be disabled. Signed-off-by: Daan De Meyer <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-12-06selftests/bpf: Install all required files to run selftestsDaan De Meyer1-2/+4
When installing the selftests using "make -C tools/testing/selftests install", we need to make sure all the required files to run the selftests are installed. Let's make sure this is the case. Signed-off-by: Daan De Meyer <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-12-06libbpf: Parse usdt args without offset on x86 (e.g. 8@(%rsp))Timo Hunziker1-0/+8
Parse USDT arguments like "8@(%rsp)" on x86. These are emitted by SystemTap. The argument syntax is similar to the existing "memory dereference case", but the offset is left out as it's zero (i.e. read the value from the address in the register). We treat it the same as the "memory dereference case", but set the offset to 0. I've tested that this fixes the "unrecognized arg #N spec: 8@(%rsp).." error I've run into when attaching to a probe with such an argument. Attaching and reading the correct argument values works. Something similar might be needed for the other supported architectures. [0] Closes: https://github.com/libbpf/libbpf/issues/559 Signed-off-by: Timo Hunziker <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
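The two argument forms can be told apart with a pair of sscanf() patterns, as in the sketch below. The struct usdt_arg type and parse_mem_arg() are hypothetical names for this example, and the format strings are a simplified approximation rather than libbpf's exact code (for instance, trailing text is not checked here).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of parsing one memory-dereference argument. */
struct usdt_arg {
	int size;	/* signed size prefix, e.g. 8 or -4 */
	long off;	/* offset from the register, 0 if omitted */
	char reg[16];	/* register name without the leading '%' */
};

/* Returns 0 on success, -1 if s is not a memory-dereference argument. */
static int parse_mem_arg(const char *s, struct usdt_arg *arg)
{
	assert(s && arg);

	/* Form with an explicit offset, e.g. "-4@-20(%rbp)". */
	if (sscanf(s, " %d @ %ld ( %%%15[^)] )",
		   &arg->size, &arg->off, arg->reg) == 3)
		return 0;

	/* Form without an offset, e.g. "8@(%rsp)": treat it as offset 0. */
	if (sscanf(s, " %d @ ( %%%15[^)] )", &arg->size, arg->reg) == 2) {
		arg->off = 0;
		return 0;
	}
	return -1;
}
```

Trying the explicit-offset pattern first keeps the existing behaviour unchanged; the second pattern only matches when "(" follows "@" directly.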
2022-12-06selftests/bpf: Allow building bpf tests with CONFIG_XFRM_INTERFACE=[m|n]Martin KaFai Lau1-4/+9
It is useful to use vmlinux.h in the xfrm_info test like other kfunc tests do. In particular, it is common for a kfunc bpf prog to need other core kernel structures from vmlinux.h. Although vmlinux.h is preferred, it needs a ___local flavor of struct bpf_xfrm_info in order to build the bpf selftests when CONFIG_XFRM_INTERFACE=[m|n]. Cc: Eyal Birger <[email protected]> Fixes: 90a3a05eb33f ("selftests/bpf: add xfrm_info tests") Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-06bpftool: Fix memory leak in do_build_table_cbMiaoqian Lin1-0/+1
strdup() allocates memory for path. We need to release the memory in the following error path. Add free() to avoid memory leak. Fixes: 8f184732b60b ("bpftool: Switch to libbpf's hashmap for pinned paths of BPF objects") Signed-off-by: Miaoqian Lin <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
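The ownership rule behind this fix can be sketched as follows. The struct table type and its table_append() helper are hypothetical stand-ins for the real hashmap, used only to produce an append failure; the pattern shown is freeing the strdup() result on the error path.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict ISO C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_CAP 2	/* tiny capacity so that appends can fail */

/* Hypothetical table that owns the strings stored in it. */
struct table {
	char *paths[TABLE_CAP];
	int cnt;
};

/* Takes ownership of path on success; returns -1 when the table is full. */
static int table_append(struct table *t, char *path)
{
	if (t->cnt >= TABLE_CAP)
		return -1;
	t->paths[t->cnt++] = path;
	return 0;
}

/* Copy fpath and record it; never leaks the copy. */
static int record_path(struct table *t, const char *fpath)
{
	char *path;

	assert(t && fpath);

	path = strdup(fpath);
	if (!path)
		return -1;

	if (table_append(t, path)) {
		/* The table did not take ownership, so release the copy. */
		free(path);
		return -1;
	}
	return 0;
}
```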
2022-12-06riscv, bpf: Emit fixed-length instructions for BPF_PSEUDO_FUNCPu Lehui1-1/+28
For BPF_PSEUDO_FUNC instructions, the verifier refills imm with the correct addresses of bpf_calls and then runs the last pass of the JIT. Since emit_imm of RV64 is variable-length, emitting instructions of a length appropriate to the imm, it may break ctx->offset and lead to unpredictable problems, such as inaccurate jumps. So let's fix it with fixed-length instructions. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Suggested-by: Björn Töpel <[email protected]> Signed-off-by: Pu Lehui <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Björn Töpel <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-12-05Merge branch 'xfrm: interface: Add unstable helpers for XFRM metadata'Martin KaFai Lau13-2/+574
Eyal Birger says: ==================== This patch series adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows steering traffic towards different IPsec connections based on logic implemented in bpf programs. The helpers are integrated into the xfrm_interface module. For this purpose the main functionality of this module is moved to xfrm_interface_core.c. --- changes in v6: fix sparse warning in patch 2 changes in v5: - avoid cleanup of percpu dsts as detailed in patch 2 changes in v3: - tag bpf-next tree instead of ipsec-next - add IFLA_XFRM_COLLECT_METADATA sync patch ==================== Signed-off-by: Martin KaFai Lau <[email protected]>
2022-12-05selftests/bpf: add xfrm_info testsEyal Birger5-0/+403
Test the xfrm_info kfunc helpers. The test setup creates three namespaces: NS0, NS1, NS2. XFRM tunnels are set up between NS0 and the two other NSs. The kfunc helpers are used to steer traffic from NS0 to the other NSs based on a userspace-populated bpf global variable, and to validate that the return traffic arrived from the desired NS. Signed-off-by: Eyal Birger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-12-05tools: add IFLA_XFRM_COLLECT_METADATA to uapi/linux/if_link.hEyal Birger1-0/+1
Needed for XFRM metadata tests. Signed-off-by: Eyal Birger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-12-05xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPFEyal Birger7-2/+168
This change adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows steering traffic towards different IPsec connections based on logic implemented in bpf programs. This object is built based on the availability of BTF debug info. When setting the xfrm metadata, percpu metadata dsts are used in order to avoid allocating a metadata dst per packet. In order to guarantee safe module unload, the percpu dsts are allocated on first use and never freed. The percpu pointer is stored in net/core/filter.c so that it can be reused on module reload. The metadata percpu dsts take ownership of the original skb dsts so that they may be used as part of the xfrm transmission logic - e.g. for MTU calculations. Signed-off-by: Eyal Birger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-12-05xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.cEyal Birger2-0/+2
This change allows adding additional files to the xfrm_interface module. Signed-off-by: Eyal Birger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-12-04selftests/bpf: Fix conflicts with built-in functions in bpf_iter_ksymJames Hilliard1-3/+3
Both tolower and toupper are built-in C functions; we should not redefine them, as this can result in a build error.

Fixes the following errors:

progs/bpf_iter_ksym.c:10:20: error: conflicting types for built-in function 'tolower'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
   10 | static inline char tolower(char c)
      |                    ^~~~~~~
progs/bpf_iter_ksym.c:5:1: note: 'tolower' is declared in header '<ctype.h>'
    4 | #include <bpf/bpf_helpers.h>
  +++ |+#include <ctype.h>
    5 |
progs/bpf_iter_ksym.c:17:20: error: conflicting types for built-in function 'toupper'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
   17 | static inline char toupper(char c)
      |                    ^~~~~~~
progs/bpf_iter_ksym.c:17:20: note: 'toupper' is declared in header '<ctype.h>'

See background on this sort of issue:
https://stackoverflow.com/a/20582607
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12213

(C99, 7.1.3p1) "All identifiers with external linkage in any of the following subclauses (including the future library directions) are always reserved for use as identifiers with external linkage."

This is documented behavior in GCC:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-std-2

Signed-off-by: James Hilliard <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
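One way to avoid the clash is to give the local helpers names that are not reserved by <ctype.h>. The sketch below assumes the names to_lower and to_upper and keeps the char-based signature from the error output; the names are chosen for illustration and the bodies are a plain ASCII-only implementation.

```c
/* Local ASCII-only helpers; names chosen to avoid the <ctype.h> built-ins. */
static inline char to_lower(char c)
{
	if (c >= 'A' && c <= 'Z')
		c += ('a' - 'A');
	return c;
}

static inline char to_upper(char c)
{
	if (c >= 'a' && c <= 'z')
		c -= ('a' - 'A');
	return c;
}
```

Because these identifiers are not known to the compiler as built-ins, the 'char (char)' signature no longer conflicts with the expected 'int(int)'.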
2022-12-04bpf, sockmap: fix race in sock_map_free()Eric Dumazet1-0/+2
sock_map_free() calls release_sock(sk) without owning a reference on the socket. This can cause use-after-free as syzbot found [1].

Jakub Sitnicki already took care of a similar issue in sock_hash_free() in commit 75e68e5bf2c7 ("bpf, sockhash: Synchronize delete from bucket list on map free")

[1]
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 3785 at lib/refcount.c:31 refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 3785 Comm: kworker/u4:6 Not tainted 6.1.0-rc7-syzkaller-00103-gef4d3ea40565 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: events_unbound bpf_map_free_deferred
RIP: 0010:refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Code: 68 8b 31 c0 e8 75 71 15 fd 0f 0b e9 64 ff ff ff e8 d9 6e 4e fd c6 05 62 9c 3d 0a 01 48 c7 c7 80 bb 68 8b 31 c0 e8 54 71 15 fd <0f> 0b e9 43 ff ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a2 fe ff
RSP: 0018:ffffc9000456fb60 EFLAGS: 00010246
RAX: eae59bab72dcd700 RBX: 0000000000000004 RCX: ffff8880207057c0
RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
RBP: 0000000000000004 R08: ffffffff816fdabd R09: fffff520008adee5
R10: fffff520008adee5 R11: 1ffff920008adee4 R12: 0000000000000004
R13: dffffc0000000000 R14: ffff88807b1c6c00 R15: 1ffff1100f638dcf
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30c30000 CR3: 000000000d08e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
__sock_put include/net/sock.h:779 [inline]
tcp_release_cb+0x2d0/0x360 net/ipv4/tcp_output.c:1092
release_sock+0xaf/0x1c0 net/core/sock.c:3468
sock_map_free+0x219/0x2c0 net/core/sock_map.c:356
process_one_work+0x81c/0xd10 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Fixes: 7e81a3530206 ("bpf: Sockmap, ensure sock lock held during tear down")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Jakub Sitnicki <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Song Liu <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>