From b9b738eeafe54d960ccb240fc1b0c5031a0d76f5 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 2 Aug 2022 10:39:11 -0700 Subject: bpf: Update bpf_design_QA.rst to clarify that kprobes is not ABI This patch updates bpf_design_QA.rst to clarify that the ability to attach a BPF program to a given point in the kernel code via kprobes does not make that attachment point be part of the Linux kernel's ABI. Signed-off-by: Paul E. McKenney Link: https://lore.kernel.org/r/20220802173913.4170192-1-paulmck@kernel.org Signed-off-by: Alexei Starovoitov --- Documentation/bpf/bpf_design_QA.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst index 437de2a7a5de..2ed9128cfbec 100644 --- a/Documentation/bpf/bpf_design_QA.rst +++ b/Documentation/bpf/bpf_design_QA.rst @@ -214,6 +214,12 @@ A: NO. Tracepoints are tied to internal implementation details hence they are subject to change and can break with newer kernels. BPF programs need to change accordingly when this happens. +Q: Are places where kprobes can attach part of the stable ABI? +-------------------------------------------------------------- +A: NO. The places to which kprobes can attach are internal implementation +details, which means that they are subject to change and can break with +newer kernels. BPF programs need to change accordingly when this happens. + Q: How much stack space a BPF program uses? ------------------------------------------- A: Currently all program types are limited to 512 bytes of stack -- cgit From 62fc770d90ef5a92931deb477c384f82be122a18 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 2 Aug 2022 10:39:12 -0700 Subject: bpf: Update bpf_design_QA.rst to clarify that attaching to functions is not ABI This patch updates bpf_design_QA.rst to clarify that the ability to attach a BPF program to an arbitrary function in the kernel does not make that function become part of the Linux kernel's ABI. [ paulmck: Apply Daniel Borkmann feedback. ] Signed-off-by: Paul E. McKenney Link: https://lore.kernel.org/r/20220802173913.4170192-2-paulmck@kernel.org Signed-off-by: Alexei Starovoitov --- Documentation/bpf/bpf_design_QA.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst index 2ed9128cfbec..a06ae8a828e3 100644 --- a/Documentation/bpf/bpf_design_QA.rst +++ b/Documentation/bpf/bpf_design_QA.rst @@ -279,3 +279,15 @@ cc (congestion-control) implementations. If any of these kernel functions has changed, both the in-tree and out-of-tree kernel tcp cc implementations have to be changed. The same goes for the bpf programs and they have to be adjusted accordingly. + +Q: Attaching to arbitrary kernel functions is an ABI? +----------------------------------------------------- +Q: BPF programs can be attached to many kernel functions. Do these +kernel functions become part of the ABI? + +A: NO. + +The kernel function prototypes will change, and BPF programs attaching to +them will need to change. The BPF compile-once-run-everywhere (CO-RE) +should be used in order to make it easier to adapt your BPF programs to +different versions of the kernel. -- cgit From 8fcf19696a1bdbabb7774e8d063ee4af91833402 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 2 Aug 2022 10:39:13 -0700 Subject: bpf: Update bpf_design_QA.rst to clarify that BTF_ID does not ABIify a function This patch updates bpf_design_QA.rst to clarify that mentioning a function to the BTF_ID macro does not make that function become part of the Linux kernel's ABI. Suggested-by: Alexei Starovoitov Signed-off-by: Paul E. McKenney Link: https://lore.kernel.org/r/20220802173913.4170192-3-paulmck@kernel.org Signed-off-by: Alexei Starovoitov --- Documentation/bpf/bpf_design_QA.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst index a06ae8a828e3..a210b8a4df00 100644 --- a/Documentation/bpf/bpf_design_QA.rst +++ b/Documentation/bpf/bpf_design_QA.rst @@ -291,3 +291,10 @@ The kernel function prototypes will change, and BPF programs attaching to them will need to change. The BPF compile-once-run-everywhere (CO-RE) should be used in order to make it easier to adapt your BPF programs to different versions of the kernel. + +Q: Marking a function with BTF_ID makes that function an ABI? +------------------------------------------------------------- +A: NO. + +The BTF_ID macro does not cause a function to become part of the ABI +any more than does the EXPORT_SYMBOL_GPL macro. -- cgit From e2dcac2f58f5a95ab092d1da237ffdc0da1832cf Mon Sep 17 00:00:00 2001 From: Jinghao Jia Date: Fri, 29 Jul 2022 20:17:13 +0000 Subject: BPF: Fix potential bad pointer dereference in bpf_sys_bpf() The bpf_sys_bpf() helper function allows an eBPF program to load another eBPF program from within the kernel. In this case the argument union bpf_attr pointer (as well as the insns and license pointers inside) is a kernel address instead of a userspace address (which is the case of a usual bpf() syscall). To make the memory copying process in the syscall work in both cases, bpfptr_t was introduced to wrap around the pointer and distinguish its origin. Specifically, when copying memory contents from a bpfptr_t, a copy_from_user() is performed in case of a userspace address and a memcpy() is performed for a kernel address. This can lead to problems because the in-kernel pointer is never checked for validity. The problem happens when an eBPF syscall program tries to call bpf_sys_bpf() to load a program but provides a bad insns pointer -- say 0xdeadbeef -- in the bpf_attr union. The helper calls __sys_bpf() which would then call bpf_prog_load() to load the program. bpf_prog_load() is responsible for copying the eBPF instructions to the newly allocated memory for the program; it creates a kernel bpfptr_t for insns and invokes copy_from_bpfptr(). Internally, all bpfptr_t operations are backed by the corresponding sockptr_t operations, which performs direct memcpy() on kernel pointers for copy_from/strncpy_from operations. Therefore, the code is always happy to dereference the bad pointer to trigger a un-handle-able page fault and in turn an oops. However, this is not supposed to happen because at that point the eBPF program is already verified and should not cause a memory error. Sample KASAN trace: [ 25.685056][ T228] ================================================================== [ 25.685680][ T228] BUG: KASAN: user-memory-access in copy_from_bpfptr+0x21/0x30 [ 25.686210][ T228] Read of size 80 at addr 00000000deadbeef by task poc/228 [ 25.686732][ T228] [ 25.686893][ T228] CPU: 3 PID: 228 Comm: poc Not tainted 5.19.0-rc7 #7 [ 25.687375][ T228] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014 [ 25.687991][ T228] Call Trace: [ 25.688223][ T228] [ 25.688429][ T228] dump_stack_lvl+0x73/0x9e [ 25.688747][ T228] print_report+0xea/0x200 [ 25.689061][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.689401][ T228] ? _printk+0x54/0x6e [ 25.689693][ T228] ? _raw_spin_lock_irqsave+0x70/0xd0 [ 25.690071][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.690412][ T228] kasan_report+0xb5/0xe0 [ 25.690716][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.691059][ T228] kasan_check_range+0x2bd/0x2e0 [ 25.691405][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.691734][ T228] memcpy+0x25/0x60 [ 25.692000][ T228] copy_from_bpfptr+0x21/0x30 [ 25.692328][ T228] bpf_prog_load+0x604/0x9e0 [ 25.692653][ T228] ? cap_capable+0xb4/0xe0 [ 25.692956][ T228] ? security_capable+0x4f/0x70 [ 25.693324][ T228] __sys_bpf+0x3af/0x580 [ 25.693635][ T228] bpf_sys_bpf+0x45/0x240 [ 25.693937][ T228] bpf_prog_f0ec79a5a3caca46_bpf_func1+0xa2/0xbd [ 25.694394][ T228] bpf_prog_run_pin_on_cpu+0x2f/0xb0 [ 25.694756][ T228] bpf_prog_test_run_syscall+0x146/0x1c0 [ 25.695144][ T228] bpf_prog_test_run+0x172/0x190 [ 25.695487][ T228] __sys_bpf+0x2c5/0x580 [ 25.695776][ T228] __x64_sys_bpf+0x3a/0x50 [ 25.696084][ T228] do_syscall_64+0x60/0x90 [ 25.696393][ T228] ? fpregs_assert_state_consistent+0x50/0x60 [ 25.696815][ T228] ? exit_to_user_mode_prepare+0x36/0xa0 [ 25.697202][ T228] ? syscall_exit_to_user_mode+0x20/0x40 [ 25.697586][ T228] ? do_syscall_64+0x6e/0x90 [ 25.697899][ T228] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 25.698312][ T228] RIP: 0033:0x7f6d543fb759 [ 25.698624][ T228] Code: 08 5b 89 e8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 a6 0e 00 f7 d8 64 89 01 48 [ 25.699946][ T228] RSP: 002b:00007ffc3df78468 EFLAGS: 00000287 ORIG_RAX: 0000000000000141 [ 25.700526][ T228] RAX: ffffffffffffffda RBX: 00007ffc3df78628 RCX: 00007f6d543fb759 [ 25.701071][ T228] RDX: 0000000000000090 RSI: 00007ffc3df78478 RDI: 000000000000000a [ 25.701636][ T228] RBP: 00007ffc3df78510 R08: 0000000000000000 R09: 0000000000300000 [ 25.702191][ T228] R10: 0000000000000005 R11: 0000000000000287 R12: 0000000000000000 [ 25.702736][ T228] R13: 00007ffc3df78638 R14: 000055a1584aca68 R15: 00007f6d5456a000 [ 25.703282][ T228] [ 25.703490][ T228] ================================================================== [ 25.704050][ T228] Disabling lock debugging due to kernel taint Update copy_from_bpfptr() and strncpy_from_bpfptr() so that: - for a kernel pointer, it uses the safe copy_from_kernel_nofault() and strncpy_from_kernel_nofault() functions. - for a userspace pointer, it performs copy_from_user() and strncpy_from_user(). Fixes: af2ac3e13e45 ("bpf: Prepare bpf syscall to be used from kernel and user space.") Link: https://lore.kernel.org/bpf/20220727132905.45166-1-jinghao@linux.ibm.com/ Signed-off-by: Jinghao Jia Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20220729201713.88688-1-jinghao@linux.ibm.com Signed-off-by: Alexei Starovoitov --- include/linux/bpfptr.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 46e1757d06a3..79b2f78eec1a 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -49,7 +49,9 @@ static inline void bpfptr_add(bpfptr_t *bpfptr, size_t val) static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src, size_t offset, size_t size) { - return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size); + if (!bpfptr_is_kernel(src)) + return copy_from_user(dst, src.user + offset, size); + return copy_from_kernel_nofault(dst, src.kernel + offset, size); } static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size) @@ -78,7 +80,9 @@ static inline void *kvmemdup_bpfptr(bpfptr_t src, size_t len) static inline long strncpy_from_bpfptr(char *dst, bpfptr_t src, size_t count) { - return strncpy_from_sockptr(dst, (sockptr_t) src, count); + if (bpfptr_is_kernel(src)) + return strncpy_from_kernel_nofault(dst, src.kernel, count); + return strncpy_from_user(dst, src.user, count); } #endif /* _LINUX_BPFPTR_H */ -- cgit From 62d468e5e10012e8b67d066ba7bac0a8afdc3cee Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 2 Aug 2022 15:56:51 +0200 Subject: bpf: Cleanup ftrace hash in bpf_trampoline_put We need to release possible hash from trampoline fops object before removing it, otherwise we leak it. Fixes: 00963a2e75a8 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)") Signed-off-by: Jiri Olsa Signed-off-by: Andrii Nakryiko Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20220802135651.1794015-1-jolsa@kernel.org --- kernel/bpf/trampoline.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index 0f532e6a717f..ff87e38af8a7 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -841,7 +841,10 @@ void bpf_trampoline_put(struct bpf_trampoline *tr) * multiple rcu callbacks. */ hlist_del(&tr->hlist); - kfree(tr->fops); + if (tr->fops) { + ftrace_free_filter(tr->fops); + kfree(tr->fops); + } kfree(tr); out: mutex_unlock(&trampoline_mutex); -- cgit From f1d41f7720c89705c20e4335a807b1c518c2e7be Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 2 Aug 2022 18:33:24 +0200 Subject: mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled The btf_sock_ids array needs struct mptcp_sock BTF ID for the bpf_skc_to_mptcp_sock helper. When CONFIG_MPTCP is disabled, the 'struct mptcp_sock' is not defined and resolve_btfids will complain with: [...] BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol mptcp_sock [...] Add an empty definition for struct mptcp_sock when CONFIG_MPTCP is disabled. Fixes: 3bc253c2e652 ("bpf: Add bpf_skc_to_mptcp_sock_proto") Signed-off-by: Jiri Olsa Signed-off-by: Daniel Borkmann Reviewed-by: Mat Martineau Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220802163324.1873044-1-jolsa@kernel.org --- include/net/mptcp.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/net/mptcp.h b/include/net/mptcp.h index ac9cf7271d46..412479ebf5ad 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -291,4 +291,8 @@ struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk); static inline struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk) { return NULL; } #endif +#if !IS_ENABLED(CONFIG_MPTCP) +struct mptcp_sock { }; +#endif + #endif /* __NET_MPTCP_H */ -- cgit From 6644aabbd8973a9f8008cabfd054a36b69a3a3f5 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Thu, 4 Aug 2022 13:11:39 -0700 Subject: bpf: Use proper target btf when exporting attach_btf_obj_id When attaching to program, the program itself might not be attached to anything (and, hence, might not have attach_btf), so we can't unconditionally use 'prog->aux->dst_prog->aux->attach_btf'. Instead, use bpf_prog_get_target_btf to pick proper target BTF: * when attached to dst_prog, use dst_prog->aux->btf * when attached to kernel btf, use prog->aux->attach_btf Fixes: b79c9fc9551b ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP") Signed-off-by: Stanislav Fomichev Signed-off-by: Daniel Borkmann Acked-by: Hao Luo Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220804201140.1340684-1-sdf@google.com --- kernel/bpf/syscall.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 83c7136c5788..7dc3f8003631 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3886,6 +3886,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, union bpf_attr __user *uattr) { struct bpf_prog_info __user *uinfo = u64_to_user_ptr(attr->info.info); + struct btf *attach_btf = bpf_prog_get_target_btf(prog); struct bpf_prog_info info; u32 info_len = attr->info.info_len; struct bpf_prog_kstats stats; @@ -4088,10 +4089,8 @@ static int bpf_prog_get_info_by_fd(struct file *file, if (prog->aux->btf) info.btf_id = btf_obj_id(prog->aux->btf); info.attach_btf_id = prog->aux->attach_btf_id; - if (prog->aux->attach_btf) - info.attach_btf_obj_id = btf_obj_id(prog->aux->attach_btf); - else if (prog->aux->dst_prog) - info.attach_btf_obj_id = btf_obj_id(prog->aux->dst_prog->aux->attach_btf); + if (attach_btf) + info.attach_btf_obj_id = btf_obj_id(attach_btf); ulen = info.nr_func_info; info.nr_func_info = prog->aux->func_info_cnt; -- cgit From ffd5cfca5388e9d7c8386343763df41315ac1dd2 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Thu, 4 Aug 2022 13:11:40 -0700 Subject: selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf Apparently, no existing selftest covers it. Add a new one where we load cgroup/bind4 program and attach fentry to it. Calling bpf_obj_get_info_by_fd on the fentry program should return non-zero btf_id/btf_obj_id instead of crashing the kernel. Signed-off-by: Stanislav Fomichev Signed-off-by: Daniel Borkmann Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220804201140.1340684-2-sdf@google.com --- .../selftests/bpf/prog_tests/fexit_bpf2bpf.c | 95 ++++++++++++++++++++++ 1 file changed, 95 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c b/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c index 02bb8cbf9194..da860b07abb5 100644 --- a/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c +++ b/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c @@ -3,6 +3,7 @@ #include #include #include +#include "bind4_prog.skel.h" typedef int (*test_cb)(struct bpf_object *obj); @@ -407,6 +408,98 @@ static void test_func_replace_global_func(void) prog_name, false, NULL); } +static int find_prog_btf_id(const char *name, __u32 attach_prog_fd) +{ + struct bpf_prog_info info = {}; + __u32 info_len = sizeof(info); + struct btf *btf; + int ret; + + ret = bpf_obj_get_info_by_fd(attach_prog_fd, &info, &info_len); + if (ret) + return ret; + + if (!info.btf_id) + return -EINVAL; + + btf = btf__load_from_kernel_by_id(info.btf_id); + ret = libbpf_get_error(btf); + if (ret) + return ret; + + ret = btf__find_by_name_kind(btf, name, BTF_KIND_FUNC); + btf__free(btf); + return ret; +} + +static int load_fentry(int attach_prog_fd, int attach_btf_id) +{ + LIBBPF_OPTS(bpf_prog_load_opts, opts, + .expected_attach_type = BPF_TRACE_FENTRY, + .attach_prog_fd = attach_prog_fd, + .attach_btf_id = attach_btf_id, + ); + struct bpf_insn insns[] = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }; + + return bpf_prog_load(BPF_PROG_TYPE_TRACING, + "bind4_fentry", + "GPL", + insns, + ARRAY_SIZE(insns), + &opts); +} + +static void test_fentry_to_cgroup_bpf(void) +{ + struct bind4_prog *skel = NULL; + struct bpf_prog_info info = {}; + __u32 info_len = sizeof(info); + int cgroup_fd = -1; + int fentry_fd = -1; + int btf_id; + + cgroup_fd = test__join_cgroup("/fentry_to_cgroup_bpf"); + if (!ASSERT_GE(cgroup_fd, 0, "cgroup_fd")) + return; + + skel = bind4_prog__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel")) + goto cleanup; + + skel->links.bind_v4_prog = bpf_program__attach_cgroup(skel->progs.bind_v4_prog, cgroup_fd); + if (!ASSERT_OK_PTR(skel->links.bind_v4_prog, "bpf_program__attach_cgroup")) + goto cleanup; + + btf_id = find_prog_btf_id("bind_v4_prog", bpf_program__fd(skel->progs.bind_v4_prog)); + if (!ASSERT_GE(btf_id, 0, "find_prog_btf_id")) + goto cleanup; + + fentry_fd = load_fentry(bpf_program__fd(skel->progs.bind_v4_prog), btf_id); + if (!ASSERT_GE(fentry_fd, 0, "load_fentry")) + goto cleanup; + + /* Make sure bpf_obj_get_info_by_fd works correctly when attaching + * to another BPF program. + */ + + ASSERT_OK(bpf_obj_get_info_by_fd(fentry_fd, &info, &info_len), + "bpf_obj_get_info_by_fd"); + + ASSERT_EQ(info.btf_id, 0, "info.btf_id"); + ASSERT_EQ(info.attach_btf_id, btf_id, "info.attach_btf_id"); + ASSERT_GT(info.attach_btf_obj_id, 0, "info.attach_btf_obj_id"); + +cleanup: + if (cgroup_fd >= 0) + close(cgroup_fd); + if (fentry_fd >= 0) + close(fentry_fd); + bind4_prog__destroy(skel); +} + /* NOTE: affect other tests, must run in serial mode */ void serial_test_fexit_bpf2bpf(void) { @@ -430,4 +523,6 @@ void serial_test_fexit_bpf2bpf(void) test_fmod_ret_freplace(); if (test__start_subtest("func_replace_global_func")) test_func_replace_global_func(); + if (test__start_subtest("fentry_to_cgroup_bpf")) + test_fentry_to_cgroup_bpf(); } -- cgit From 19f68ed6dc90c93daf7e54d3350ea67fead7cbbf Mon Sep 17 00:00:00 2001 From: Aijun Sun Date: Thu, 4 Aug 2022 10:54:42 +0800 Subject: bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc It is not necessary to allocate contiguous physical memory for BPF program buffer using kcalloc. When the BPF program is large more than memory page size, kcalloc allocates multiple memory pages from buddy system. If the device can not provide sufficient memory, for example in low-end android devices [0], memory allocation for BPF program is likely to fail. Test cases in lib/test_bpf.c all pass on ARM64 QEMU. [0] AndroidTestSuit: page allocation failure: order:4, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=foreground,mems_allowed=0 Call trace: dump_stack+0xa4/0x114 warn_alloc+0xf8/0x14c __alloc_pages_slowpath+0xac8/0xb14 __alloc_pages_nodemask+0x194/0x3d0 kmalloc_order_trace+0x44/0x1e8 __kmalloc+0x29c/0x66c bpf_int_jit_compile+0x17c/0x568 bpf_prog_select_runtime+0x4c/0x1b0 bpf_prepare_filter+0x5fc/0x6bc bpf_prog_create_from_user+0x118/0x1c0 seccomp_set_mode_filter+0x1c4/0x7cc __do_sys_prctl+0x380/0x1424 __arm64_sys_prctl+0x20/0x2c el0_svc_common+0xc8/0x22c el0_svc_handler+0x1c/0x28 el0_svc+0x8/0x100 Signed-off-by: Aijun Sun Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220804025442.22524-1-aijun.sun@unisoc.com --- arch/arm64/net/bpf_jit_comp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 7ca8779ae34f..40aa3e744beb 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -1496,7 +1496,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) memset(&ctx, 0, sizeof(ctx)); ctx.prog = prog; - ctx.offset = kcalloc(prog->len + 1, sizeof(int), GFP_KERNEL); + ctx.offset = kvcalloc(prog->len + 1, sizeof(int), GFP_KERNEL); if (ctx.offset == NULL) { prog = orig_prog; goto out_off; @@ -1601,7 +1601,7 @@ skip_init_ctx: ctx.offset[i] *= AARCH64_INSN_SIZE; bpf_prog_fill_jited_linfo(prog, ctx.offset + 1); out_off: - kfree(ctx.offset); + kvfree(ctx.offset); kfree(jit_data); prog->aux->jit_data = NULL; } -- cgit From 1f0752628e76bb3a02109dbd980381f27060ba92 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Tue, 9 Aug 2022 23:30:31 +0200 Subject: bpf: Allow calling bpf_prog_test kfuncs in tracing programs In addition to TC hook, enable these in tracing programs so that they can be used in selftests. Acked-by: Yonghong Song Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220809213033.24147-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- net/bpf/test_run.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index cbc9cd5058cb..d11209367dd0 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1628,6 +1628,7 @@ static int __init bpf_prog_test_run_init(void) int ret; ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_prog_test_kfunc_set); return ret ?: register_btf_id_dtor_kfuncs(bpf_prog_test_dtor_kfunc, ARRAY_SIZE(bpf_prog_test_dtor_kfunc), THIS_MODULE); -- cgit From 275c30bcee66a27d1aa97a215d607ad6d49804cb Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Tue, 9 Aug 2022 23:30:32 +0200 Subject: bpf: Don't reinit map value in prealloc_lru_pop The LRU map that is preallocated may have its elements reused while another program holds a pointer to it from bpf_map_lookup_elem. Hence, only check_and_free_fields is appropriate when the element is being deleted, as it ensures proper synchronization against concurrent access of the map value. After that, we cannot call check_and_init_map_value again as it may rewrite bpf_spin_lock, bpf_timer, and kptr fields while they can be concurrently accessed from a BPF program. This is safe to do as when the map entry is deleted, concurrent access is protected against by check_and_free_fields, i.e. an existing timer would be freed, and any existing kptr will be released by it. The program can create further timers and kptrs after check_and_free_fields, but they will eventually be released once the preallocated items are freed on map destruction, even if the item is never reused again. Hence, the deleted item sitting in the free list can still have resources attached to it, and they would never leak. With spin_lock, we never touch the field at all on delete or update, as we may end up modifying the state of the lock. Since the verifier ensures that a bpf_spin_lock call is always paired with bpf_spin_unlock call, the program will eventually release the lock so that on reuse the new user of the value can take the lock. Essentially, for the preallocated case, we must assume that the map value may always be in use by the program, even when it is sitting in the freelist, and handle things accordingly, i.e. use proper synchronization inside check_and_free_fields, and never reinitialize the special fields when it is reused on update. Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.") Acked-by: Yonghong Song Signed-off-by: Kumar Kartikeya Dwivedi Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220809213033.24147-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- kernel/bpf/hashtab.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index da7578426a46..4d793a92301b 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -311,12 +311,8 @@ static struct htab_elem *prealloc_lru_pop(struct bpf_htab *htab, void *key, struct htab_elem *l; if (node) { - u32 key_size = htab->map.key_size; - l = container_of(node, struct htab_elem, lru_node); - memcpy(l->key, key, key_size); - check_and_init_map_value(&htab->map, - l->key + round_up(key_size, 8)); + memcpy(l->key, key, htab->map.key_size); return l; } -- cgit From de7b9927105bd2afe940c6ad22de6938edd8b1c1 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Tue, 9 Aug 2022 23:30:33 +0200 Subject: selftests/bpf: Add test for prealloc_lru_pop bug Add a regression test to check against invalid check_and_init_map_value call inside prealloc_lru_pop. The kptr should not be reset to NULL once we set it after deleting the map element. Hence, we trigger a program that updates the element causing its reuse, and checks whether the unref kptr is reset or not. If it is, prealloc_lru_pop does an incorrect check_and_init_map_value call and the test fails. Acked-by: Yonghong Song Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220809213033.24147-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/lru_bug.c | 21 ++++++++++ tools/testing/selftests/bpf/progs/lru_bug.c | 49 ++++++++++++++++++++++++ 2 files changed, 70 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/lru_bug.c create mode 100644 tools/testing/selftests/bpf/progs/lru_bug.c diff --git a/tools/testing/selftests/bpf/prog_tests/lru_bug.c b/tools/testing/selftests/bpf/prog_tests/lru_bug.c new file mode 100644 index 000000000000..3c7822390827 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/lru_bug.c @@ -0,0 +1,21 @@ +// SPDX-License-Identifier: GPL-2.0 +#include + +#include "lru_bug.skel.h" + +void test_lru_bug(void) +{ + struct lru_bug *skel; + int ret; + + skel = lru_bug__open_and_load(); + if (!ASSERT_OK_PTR(skel, "lru_bug__open_and_load")) + return; + ret = lru_bug__attach(skel); + if (!ASSERT_OK(ret, "lru_bug__attach")) + goto end; + usleep(1); + ASSERT_OK(skel->data->result, "prealloc_lru_pop doesn't call check_and_init_map_value"); +end: + lru_bug__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/lru_bug.c b/tools/testing/selftests/bpf/progs/lru_bug.c new file mode 100644 index 000000000000..687081a724b3 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/lru_bug.c @@ -0,0 +1,49 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include + +struct map_value { + struct task_struct __kptr *ptr; +}; + +struct { + __uint(type, BPF_MAP_TYPE_LRU_HASH); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct map_value); +} lru_map SEC(".maps"); + +int pid = 0; +int result = 1; + +SEC("fentry/bpf_ktime_get_ns") +int printk(void *ctx) +{ + struct map_value v = {}; + + if (pid == bpf_get_current_task_btf()->pid) + bpf_map_update_elem(&lru_map, &(int){0}, &v, 0); + return 0; +} + +SEC("fentry/do_nanosleep") +int nanosleep(void *ctx) +{ + struct map_value val = {}, *v; + struct task_struct *current; + + bpf_map_update_elem(&lru_map, &(int){0}, &val, 0); + v = bpf_map_lookup_elem(&lru_map, &(int){0}); + if (!v) + return 0; + bpf_map_delete_elem(&lru_map, &(int){0}); + current = bpf_get_current_task_btf(); + v->ptr = current; + pid = current->pid; + bpf_ktime_get_ns(); + result = !v->ptr; + return 0; +} + +char _license[] SEC("license") = "GPL"; -- cgit From aada476655461a9ab491d8298a415430cdd10278 Mon Sep 17 00:00:00 2001 From: Xu Kuohai Date: Mon, 8 Aug 2022 00:07:35 -0400 Subject: bpf, arm64: Fix bpf trampoline instruction endianness The sparse tool complains as follows: arch/arm64/net/bpf_jit_comp.c:1684:16: warning: incorrect type in assignment (different base types) arch/arm64/net/bpf_jit_comp.c:1684:16: expected unsigned int [usertype] *branch arch/arm64/net/bpf_jit_comp.c:1684:16: got restricted __le32 [usertype] * arch/arm64/net/bpf_jit_comp.c:1700:52: error: subtraction of different types can't work (different base types) arch/arm64/net/bpf_jit_comp.c:1734:29: warning: incorrect type in assignment (different base types) arch/arm64/net/bpf_jit_comp.c:1734:29: expected unsigned int [usertype] * arch/arm64/net/bpf_jit_comp.c:1734:29: got restricted __le32 [usertype] * arch/arm64/net/bpf_jit_comp.c:1918:52: error: subtraction of different types can't work (different base types) This is because the variable branch in function invoke_bpf_prog and the variable branches in function prepare_trampoline are defined as type u32 *, which conflicts with ctx->image's type __le32 *, so sparse complains when assignment or arithmetic operation are performed on these two variables and ctx->image. Since arm64 instructions are always little-endian, change the type of these two variables to __le32 * and call cpu_to_le32() to convert instruction to little-endian before writing it to memory. This is also in line with emit() which internally does cpu_to_le32(), too. Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64") Reported-by: kernel test robot Signed-off-by: Xu Kuohai Signed-off-by: Daniel Borkmann Reviewed-by: Jean-Philippe Brucker Link: https://lore.kernel.org/bpf/20220808040735.1232002-1-xukuohai@huawei.com --- arch/arm64/net/bpf_jit_comp.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 40aa3e744beb..389623ae5a91 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -1643,7 +1643,7 @@ static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l, int args_off, int retval_off, int run_ctx_off, bool save_ret) { - u32 *branch; + __le32 *branch; u64 enter_prog; u64 exit_prog; struct bpf_prog *p = l->link.prog; @@ -1698,7 +1698,7 @@ static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l, if (ctx->image) { int offset = &ctx->image[ctx->idx] - branch; - *branch = A64_CBZ(1, A64_R(0), offset); + *branch = cpu_to_le32(A64_CBZ(1, A64_R(0), offset)); } /* arg1: prog */ @@ -1713,7 +1713,7 @@ static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l, static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl, int args_off, int retval_off, int run_ctx_off, - u32 **branches) + __le32 **branches) { int i; @@ -1784,7 +1784,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im, struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT]; struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN]; bool save_ret; - u32 **branches = NULL; + __le32 **branches = NULL; /* trampoline stack layout: * [ parent ip ] @@ -1892,7 +1892,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im, flags & BPF_TRAMP_F_RET_FENTRY_RET); if (fmod_ret->nr_links) { - branches = kcalloc(fmod_ret->nr_links, sizeof(u32 *), + branches = kcalloc(fmod_ret->nr_links, sizeof(__le32 *), GFP_KERNEL); if (!branches) return -ENOMEM; @@ -1916,7 +1916,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im, /* update the branches saved in invoke_bpf_mod_ret with cbnz */ for (i = 0; i < fmod_ret->nr_links && ctx->image != NULL; i++) { int offset = &ctx->image[ctx->idx] - branches[i]; - *branches[i] = A64_CBNZ(1, A64_R(10), offset); + *branches[i] = cpu_to_le32(A64_CBNZ(1, A64_R(10), offset)); } for (i = 0; i < fexit->nr_links; i++) -- cgit From 86f44fcec22ce2979507742bc53db8400e454f46 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Mon, 8 Aug 2022 20:58:09 -0700 Subject: bpf: Disallow bpf programs call prog_run command. The verifier cannot perform sufficient validation of bpf_attr->test.ctx_in pointer, therefore bpf programs should not be allowed to call BPF_PROG_RUN command from within the program. To fix this issue split bpf_sys_bpf() bpf helper into normal kern_sys_bpf() kernel function that can only be used by the kernel light skeleton directly. Reported-by: YiFei Zhu Fixes: b1d18a7574d0 ("bpf: Extend sys_bpf commands for bpf_syscall programs.") Signed-off-by: Alexei Starovoitov --- kernel/bpf/syscall.c | 20 ++++++++++++++------ tools/lib/bpf/skel_internal.h | 4 ++-- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 7dc3f8003631..a1cb0bdc5ad6 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -5071,9 +5071,6 @@ static bool syscall_prog_is_valid_access(int off, int size, BPF_CALL_3(bpf_sys_bpf, int, cmd, union bpf_attr *, attr, u32, attr_size) { - struct bpf_prog * __maybe_unused prog; - struct bpf_tramp_run_ctx __maybe_unused run_ctx; - switch (cmd) { case BPF_MAP_CREATE: case BPF_MAP_UPDATE_ELEM: @@ -5083,6 +5080,18 @@ BPF_CALL_3(bpf_sys_bpf, int, cmd, union bpf_attr *, attr, u32, attr_size) case BPF_LINK_CREATE: case BPF_RAW_TRACEPOINT_OPEN: break; + default: + return -EINVAL; + } + return __sys_bpf(cmd, KERNEL_BPFPTR(attr), attr_size); +} + +int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size) +{ + struct bpf_prog * __maybe_unused prog; + struct bpf_tramp_run_ctx __maybe_unused run_ctx; + + switch (cmd) { #ifdef CONFIG_BPF_JIT /* __bpf_prog_enter_sleepable used by trampoline and JIT */ case BPF_PROG_TEST_RUN: if (attr->test.data_in || attr->test.data_out || @@ -5113,11 +5122,10 @@ BPF_CALL_3(bpf_sys_bpf, int, cmd, union bpf_attr *, attr, u32, attr_size) return 0; #endif default: - return -EINVAL; + return ____bpf_sys_bpf(cmd, attr, size); } - return __sys_bpf(cmd, KERNEL_BPFPTR(attr), attr_size); } -EXPORT_SYMBOL(bpf_sys_bpf); +EXPORT_SYMBOL(kern_sys_bpf); static const struct bpf_func_proto bpf_sys_bpf_proto = { .func = bpf_sys_bpf, diff --git a/tools/lib/bpf/skel_internal.h b/tools/lib/bpf/skel_internal.h index bd6f4505e7b1..70adf7b119b9 100644 --- a/tools/lib/bpf/skel_internal.h +++ b/tools/lib/bpf/skel_internal.h @@ -66,13 +66,13 @@ struct bpf_load_and_run_opts { const char *errstr; }; -long bpf_sys_bpf(__u32 cmd, void *attr, __u32 attr_size); +long kern_sys_bpf(__u32 cmd, void *attr, __u32 attr_size); static inline int skel_sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size) { #ifdef __KERNEL__ - return bpf_sys_bpf(cmd, attr, size); + return kern_sys_bpf(cmd, attr, size); #else return syscall(__NR_bpf, cmd, attr, size); #endif -- cgit From f76fa6b338055054f80c72b29c97fb95c1becadc Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 10 Aug 2022 16:05:30 +0800 Subject: bpf: Acquire map uref in .init_seq_private for array map iterator bpf_iter_attach_map() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in bpf_iter_detach_map() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). Alternative fix is acquiring an extra bpf_link reference just like a pinned map iterator does, but it introduces unnecessary dependency on bpf_link instead of bpf_map. So choose another fix: acquiring an extra map uref in .init_seq_private for array map iterator. Fixes: d3cc2ab546ad ("bpf: Implement bpf iterator for array maps") Signed-off-by: Hou Tao Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20220810080538.1845898-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov --- kernel/bpf/arraymap.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index d3e734bf8056..624527401d4d 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -649,6 +649,11 @@ static int bpf_iter_init_array_map(void *priv_data, seq_info->percpu_value_buf = value_buf; } + /* bpf_iter_attach_map() acquires a map uref, and the uref may be + * released before or in the middle of iterating map elements, so + * acquire an extra map uref for iterator. + */ + bpf_map_inc_with_uref(map); seq_info->map = map; return 0; } @@ -657,6 +662,7 @@ static void bpf_iter_fini_array_map(void *priv_data) { struct bpf_iter_seq_array_map_info *seq_info = priv_data; + bpf_map_put_with_uref(seq_info->map); kfree(seq_info->percpu_value_buf); } -- cgit From ef1e93d2eeb58a1f08c37b22a2314b94bc045f15 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 10 Aug 2022 16:05:31 +0800 Subject: bpf: Acquire map uref in .init_seq_private for hash map iterator bpf_iter_attach_map() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in bpf_iter_detach_map() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). So acquiring an extra map uref in bpf_iter_init_hash_map() and releasing it in bpf_iter_fini_hash_map(). Fixes: d6c4503cc296 ("bpf: Implement bpf iterator for hash maps") Signed-off-by: Hou Tao Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20220810080538.1845898-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov --- kernel/bpf/hashtab.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 4d793a92301b..6c530a5e560a 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -2060,6 +2060,7 @@ static int bpf_iter_init_hash_map(void *priv_data, seq_info->percpu_value_buf = value_buf; } + bpf_map_inc_with_uref(map); seq_info->map = map; seq_info->htab = container_of(map, struct bpf_htab, map); return 0; @@ -2069,6 +2070,7 @@ static void bpf_iter_fini_hash_map(void *priv_data) { struct bpf_iter_seq_hash_map_info *seq_info = priv_data; + bpf_map_put_with_uref(seq_info->map); kfree(seq_info->percpu_value_buf); } -- cgit From 3c5f6e698b5c538bbb23cd453b22e1e4922cffd8 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 10 Aug 2022 16:05:32 +0800 Subject: bpf: Acquire map uref in .init_seq_private for sock local storage map iterator bpf_iter_attach_map() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in bpf_iter_detach_map() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). So acquiring an extra map uref in bpf_iter_init_sk_storage_map() and releasing it in bpf_iter_fini_sk_storage_map(). Fixes: 5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: Hou Tao Acked-by: Yonghong Song Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220810080538.1845898-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov --- net/core/bpf_sk_storage.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index a25ec93729b9..83b89ba824d7 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -875,10 +875,18 @@ static int bpf_iter_init_sk_storage_map(void *priv_data, { struct bpf_iter_seq_sk_storage_map_info *seq_info = priv_data; + bpf_map_inc_with_uref(aux->map); seq_info->map = aux->map; return 0; } +static void bpf_iter_fini_sk_storage_map(void *priv_data) +{ + struct bpf_iter_seq_sk_storage_map_info *seq_info = priv_data; + + bpf_map_put_with_uref(seq_info->map); +} + static int bpf_iter_attach_map(struct bpf_prog *prog, union bpf_iter_link_info *linfo, struct bpf_iter_aux_info *aux) @@ -924,7 +932,7 @@ static const struct seq_operations bpf_sk_storage_map_seq_ops = { static const struct bpf_iter_seq_info iter_seq_info = { .seq_ops = &bpf_sk_storage_map_seq_ops, .init_seq_private = bpf_iter_init_sk_storage_map, - .fini_seq_private = NULL, + .fini_seq_private = bpf_iter_fini_sk_storage_map, .seq_priv_size = sizeof(struct bpf_iter_seq_sk_storage_map_info), }; -- cgit From f0d2b2716d71778d0b0c8eaa433c073287d69d93 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 10 Aug 2022 16:05:33 +0800 Subject: bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator sock_map_iter_attach_target() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in sock_map_iter_detach_target() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). Fixing it by acquiring an extra map uref in .init_seq_private and releasing it in .fini_seq_private. Fixes: 0365351524d7 ("net: Allow iterating sockmap and sockhash") Signed-off-by: Hou Tao Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20220810080538.1845898-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov --- net/core/sock_map.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 028813dfecb0..9a9fb9487d63 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -783,13 +783,22 @@ static int sock_map_init_seq_private(void *priv_data, { struct sock_map_seq_info *info = priv_data; + bpf_map_inc_with_uref(aux->map); info->map = aux->map; return 0; } +static void sock_map_fini_seq_private(void *priv_data) +{ + struct sock_map_seq_info *info = priv_data; + + bpf_map_put_with_uref(info->map); +} + static const struct bpf_iter_seq_info sock_map_iter_seq_info = { .seq_ops = &sock_map_seq_ops, .init_seq_private = sock_map_init_seq_private, + .fini_seq_private = sock_map_fini_seq_private, .seq_priv_size = sizeof(struct sock_map_seq_info), }; @@ -1369,18 +1378,27 @@ static const struct seq_operations sock_hash_seq_ops = { }; static int sock_hash_init_seq_private(void *priv_data, - struct bpf_iter_aux_info *aux) + struct bpf_iter_aux_info *aux) { struct sock_hash_seq_info *info = priv_data; + bpf_map_inc_with_uref(aux->map); info->map = aux->map; info->htab = container_of(aux->map, struct bpf_shtab, map); return 0; } +static void sock_hash_fini_seq_private(void *priv_data) +{ + struct sock_hash_seq_info *info = priv_data; + + bpf_map_put_with_uref(info->map); +} + static const struct bpf_iter_seq_info sock_hash_iter_seq_info = { .seq_ops = &sock_hash_seq_ops, .init_seq_private = sock_hash_init_seq_private, + .fini_seq_private = sock_hash_fini_seq_private, .seq_priv_size = sizeof(struct sock_hash_seq_info), }; -- cgit From 52bd05eb7c88e1ad8541a48873188ccebca9da26 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 10 Aug 2022 16:05:34 +0800 Subject: bpf: Check the validity of max_rdwr_access for sock local storage map iterator The value of sock local storage map is writable in map iterator, so check max_rdwr_access instead of max_rdonly_access. Fixes: 5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: Hou Tao Acked-by: Yonghong Song Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220810080538.1845898-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov --- net/core/bpf_sk_storage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index 83b89ba824d7..1b7f385643b4 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -904,7 +904,7 @@ static int bpf_iter_attach_map(struct bpf_prog *prog, if (map->map_type != BPF_MAP_TYPE_SK_STORAGE) goto put_map; - if (prog->aux->max_rdonly_access > map->value_size) { + if (prog->aux->max_rdwr_access > map->value_size) { err = -EACCES; goto put_map; } -- cgit From d247049f4fd088e4e40294819a932a6057b3632c Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 10 Aug 2022 16:05:35 +0800 Subject: bpf: Only allow sleepable program for resched-able iterator When a sleepable program is attached to a hash map iterator, might_fault() will report "BUG: sleeping function called from invalid context..." if CONFIG_DEBUG_ATOMIC_SLEEP is enabled. The reason is that rcu_read_lock() is held in bpf_hash_map_seq_next() and won't be released until all elements are traversed or bpf_hash_map_seq_stop() is called. Fixing it by reusing BPF_ITER_RESCHED to indicate that only non-sleepable program is allowed for iterator without BPF_ITER_RESCHED. We can revise bpf_iter_link_attach() later if there are other conditions which may cause rcu_read_lock() or spin_lock() issues. Signed-off-by: Hou Tao Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20220810080538.1845898-7-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov --- kernel/bpf/bpf_iter.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index 2726a5950cfa..24b755eca0b3 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -68,13 +68,18 @@ static void bpf_iter_done_stop(struct seq_file *seq) iter_priv->done_stop = true; } +static inline bool bpf_iter_target_support_resched(const struct bpf_iter_target_info *tinfo) +{ + return tinfo->reg_info->feature & BPF_ITER_RESCHED; +} + static bool bpf_iter_support_resched(struct seq_file *seq) { struct bpf_iter_priv_data *iter_priv; iter_priv = container_of(seq->private, struct bpf_iter_priv_data, target_private); - return iter_priv->tinfo->reg_info->feature & BPF_ITER_RESCHED; + return bpf_iter_target_support_resched(iter_priv->tinfo); } /* maximum visited objects before bailing out */ @@ -537,6 +542,10 @@ int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr, if (!tinfo) return -ENOENT; + /* Only allow sleepable program for resched-able iterator */ + if (prog->aux->sleepable && !bpf_iter_target_support_resched(tinfo)) + return -EINVAL; + link = kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN); if (!link) return -ENOMEM; -- cgit From 5836d81e4b039429f409007ba7e0235391537397 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 10 Aug 2022 16:05:36 +0800 Subject: selftests/bpf: Add tests for reading a dangling map iter fd After closing both related link fd and map fd, reading the map iterator fd to ensure it is OK to do so. Signed-off-by: Hou Tao Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20220810080538.1845898-8-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 92 +++++++++++++++++++++++ 1 file changed, 92 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c index a33874b081b6..b690c9e9d346 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c @@ -28,6 +28,7 @@ #include "bpf_iter_test_kern6.skel.h" #include "bpf_iter_bpf_link.skel.h" #include "bpf_iter_ksym.skel.h" +#include "bpf_iter_sockmap.skel.h" static int duration; @@ -67,6 +68,50 @@ free_link: bpf_link__destroy(link); } +static void do_read_map_iter_fd(struct bpf_object_skeleton **skel, struct bpf_program *prog, + struct bpf_map *map) +{ + DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts); + union bpf_iter_link_info linfo; + struct bpf_link *link; + char buf[16] = {}; + int iter_fd, len; + + memset(&linfo, 0, sizeof(linfo)); + linfo.map.map_fd = bpf_map__fd(map); + opts.link_info = &linfo; + opts.link_info_len = sizeof(linfo); + link = bpf_program__attach_iter(prog, &opts); + if (!ASSERT_OK_PTR(link, "attach_map_iter")) + return; + + iter_fd = bpf_iter_create(bpf_link__fd(link)); + if (!ASSERT_GE(iter_fd, 0, "create_map_iter")) { + bpf_link__destroy(link); + return; + } + + /* Close link and map fd prematurely */ + bpf_link__destroy(link); + bpf_object__destroy_skeleton(*skel); + *skel = NULL; + + /* Try to let map free work to run first if map is freed */ + usleep(100); + /* Memory used by both sock map and sock local storage map are + * freed after two synchronize_rcu() calls, so wait for it + */ + kern_sync_rcu(); + kern_sync_rcu(); + + /* Read after both map fd and link fd are closed */ + while ((len = read(iter_fd, buf, sizeof(buf))) > 0) + ; + ASSERT_GE(len, 0, "read_iterator"); + + close(iter_fd); +} + static int read_fd_into_buffer(int fd, char *buf, int size) { int bufleft = size; @@ -827,6 +872,20 @@ out: bpf_iter_bpf_array_map__destroy(skel); } +static void test_bpf_array_map_iter_fd(void) +{ + struct bpf_iter_bpf_array_map *skel; + + skel = bpf_iter_bpf_array_map__open_and_load(); + if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_array_map__open_and_load")) + return; + + do_read_map_iter_fd(&skel->skeleton, skel->progs.dump_bpf_array_map, + skel->maps.arraymap1); + + bpf_iter_bpf_array_map__destroy(skel); +} + static void test_bpf_percpu_array_map(void) { DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts); @@ -1009,6 +1068,20 @@ out: bpf_iter_bpf_sk_storage_helpers__destroy(skel); } +static void test_bpf_sk_stoarge_map_iter_fd(void) +{ + struct bpf_iter_bpf_sk_storage_map *skel; + + skel = bpf_iter_bpf_sk_storage_map__open_and_load(); + if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_sk_storage_map__open_and_load")) + return; + + do_read_map_iter_fd(&skel->skeleton, skel->progs.dump_bpf_sk_storage_map, + skel->maps.sk_stg_map); + + bpf_iter_bpf_sk_storage_map__destroy(skel); +} + static void test_bpf_sk_storage_map(void) { DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts); @@ -1217,6 +1290,19 @@ out: bpf_iter_task_vma__destroy(skel); } +void test_bpf_sockmap_map_iter_fd(void) +{ + struct bpf_iter_sockmap *skel; + + skel = bpf_iter_sockmap__open_and_load(); + if (!ASSERT_OK_PTR(skel, "bpf_iter_sockmap__open_and_load")) + return; + + do_read_map_iter_fd(&skel->skeleton, skel->progs.copy, skel->maps.sockmap); + + bpf_iter_sockmap__destroy(skel); +} + void test_bpf_iter(void) { if (test__start_subtest("btf_id_or_null")) @@ -1267,10 +1353,14 @@ void test_bpf_iter(void) test_bpf_percpu_hash_map(); if (test__start_subtest("bpf_array_map")) test_bpf_array_map(); + if (test__start_subtest("bpf_array_map_iter_fd")) + test_bpf_array_map_iter_fd(); if (test__start_subtest("bpf_percpu_array_map")) test_bpf_percpu_array_map(); if (test__start_subtest("bpf_sk_storage_map")) test_bpf_sk_storage_map(); + if (test__start_subtest("bpf_sk_storage_map_iter_fd")) + test_bpf_sk_stoarge_map_iter_fd(); if (test__start_subtest("bpf_sk_storage_delete")) test_bpf_sk_storage_delete(); if (test__start_subtest("bpf_sk_storage_get")) @@ -1283,4 +1373,6 @@ void test_bpf_iter(void) test_link_iter(); if (test__start_subtest("ksym")) test_ksym_iter(); + if (test__start_subtest("bpf_sockmap_map_iter_fd")) + test_bpf_sockmap_map_iter_fd(); } -- cgit From 939a1a946d755c1565e18694e278e9ba7ba19ccc Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 10 Aug 2022 16:05:37 +0800 Subject: selftests/bpf: Add write tests for sk local storage map iterator Add test to validate the overwrite of sock local storage map value in map iterator and another one to ensure out-of-bound value writing is rejected. Signed-off-by: Hou Tao Acked-by: Yonghong Song Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220810080538.1845898-9-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 20 ++++++++++++++++++-- .../bpf/progs/bpf_iter_bpf_sk_storage_map.c | 22 ++++++++++++++++++++-- 2 files changed, 38 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c index b690c9e9d346..1571a6586b3b 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c @@ -1076,7 +1076,7 @@ static void test_bpf_sk_stoarge_map_iter_fd(void) if (!ASSERT_OK_PTR(skel, "bpf_iter_bpf_sk_storage_map__open_and_load")) return; - do_read_map_iter_fd(&skel->skeleton, skel->progs.dump_bpf_sk_storage_map, + do_read_map_iter_fd(&skel->skeleton, skel->progs.rw_bpf_sk_storage_map, skel->maps.sk_stg_map); bpf_iter_bpf_sk_storage_map__destroy(skel); @@ -1117,7 +1117,15 @@ static void test_bpf_sk_storage_map(void) linfo.map.map_fd = map_fd; opts.link_info = &linfo; opts.link_info_len = sizeof(linfo); - link = bpf_program__attach_iter(skel->progs.dump_bpf_sk_storage_map, &opts); + link = bpf_program__attach_iter(skel->progs.oob_write_bpf_sk_storage_map, &opts); + err = libbpf_get_error(link); + if (!ASSERT_EQ(err, -EACCES, "attach_oob_write_iter")) { + if (!err) + bpf_link__destroy(link); + goto out; + } + + link = bpf_program__attach_iter(skel->progs.rw_bpf_sk_storage_map, &opts); if (!ASSERT_OK_PTR(link, "attach_iter")) goto out; @@ -1125,6 +1133,7 @@ static void test_bpf_sk_storage_map(void) if (!ASSERT_GE(iter_fd, 0, "create_iter")) goto free_link; + skel->bss->to_add_val = time(NULL); /* do some tests */ while ((len = read(iter_fd, buf, sizeof(buf))) > 0) ; @@ -1138,6 +1147,13 @@ static void test_bpf_sk_storage_map(void) if (!ASSERT_EQ(skel->bss->val_sum, expected_val, "val_sum")) goto close_iter; + for (i = 0; i < num_sockets; i++) { + err = bpf_map_lookup_elem(map_fd, &sock_fd[i], &val); + if (!ASSERT_OK(err, "map_lookup") || + !ASSERT_EQ(val, i + 1 + skel->bss->to_add_val, "check_map_value")) + break; + } + close_iter: close(iter_fd); free_link: diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c index 6b70ccaba301..c7b8e006b171 100644 --- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c +++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c @@ -16,19 +16,37 @@ struct { __u32 val_sum = 0; __u32 ipv6_sk_count = 0; +__u32 to_add_val = 0; SEC("iter/bpf_sk_storage_map") -int dump_bpf_sk_storage_map(struct bpf_iter__bpf_sk_storage_map *ctx) +int rw_bpf_sk_storage_map(struct bpf_iter__bpf_sk_storage_map *ctx) { struct sock *sk = ctx->sk; __u32 *val = ctx->value; - if (sk == (void *)0 || val == (void *)0) + if (sk == NULL || val == NULL) return 0; if (sk->sk_family == AF_INET6) ipv6_sk_count++; val_sum += *val; + + *val += to_add_val; + + return 0; +} + +SEC("iter/bpf_sk_storage_map") +int oob_write_bpf_sk_storage_map(struct bpf_iter__bpf_sk_storage_map *ctx) +{ + struct sock *sk = ctx->sk; + __u32 *val = ctx->value; + + if (sk == NULL || val == NULL) + return 0; + + *(val + 1) = 0xdeadbeef; + return 0; } -- cgit From c5c0981fd81d35233d625631f13000544c108c53 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 10 Aug 2022 16:05:38 +0800 Subject: selftests/bpf: Ensure sleepable program is rejected by hash map iter Add a test to ensure sleepable program is rejected by hash map iterator. Signed-off-by: Hou Tao Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20220810080538.1845898-10-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 6 ++++++ tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c | 9 +++++++++ 2 files changed, 15 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c index 1571a6586b3b..e89685bd587c 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c @@ -679,6 +679,12 @@ static void test_bpf_hash_map(void) goto out; } + /* Sleepable program is prohibited for hash map iterator */ + linfo.map.map_fd = map_fd; + link = bpf_program__attach_iter(skel->progs.sleepable_dummy_dump, &opts); + if (!ASSERT_ERR_PTR(link, "attach_sleepable_prog_to_iter")) + goto out; + linfo.map.map_fd = map_fd; link = bpf_program__attach_iter(skel->progs.dump_bpf_hash_map, &opts); if (!ASSERT_OK_PTR(link, "attach_iter")) diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c index 0aa3cd34cbe3..d7a69217fb68 100644 --- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c +++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c @@ -112,3 +112,12 @@ int dump_bpf_hash_map(struct bpf_iter__bpf_map_elem *ctx) return 0; } + +SEC("iter.s/bpf_map_elem") +int sleepable_dummy_dump(struct bpf_iter__bpf_map_elem *ctx) +{ + if (ctx->meta->seq_num == 0) + BPF_SEQ_PRINTF(ctx->meta->seq, "map dump starts\n"); + + return 0; +} -- cgit