aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-10-03Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski17-377/+1215
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-03 We've added 143 non-merge commits during the last 27 day(s) which contain a total of 151 files changed, 8321 insertions(+), 1402 deletions(-). The main changes are: 1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu. 2) Add support for struct-based arguments for trampoline based BPF programs, from Yonghong Song. 3) Fix entry IP for kprobe-multi and trampoline probes under IBT enabled, from Jiri Olsa. 4) Batch of improvements to veristat selftest tool in particular to add CSV output, a comparison mode for CSV outputs and filtering, from Andrii Nakryiko. 5) Add preparatory changes needed for the BPF core for upcoming BPF HID support, from Benjamin Tissoires. 6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program types, from Daniel Xu. 7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler. 8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer / single-kernel-consumer semantics for BPF ring buffer, from David Vernet. 9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF hashtab's bucket lock, from Hou Tao. 10) Allow creating an iterator that loops through only the resources of one task/thread instead of all, from Kui-Feng Lee. 11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi. 12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT entry which is not yet inserted, from Lorenzo Bianconi. 13) Remove invalid recursion check for struct_ops for TCP congestion control BPF programs, from Martin KaFai Lau. 14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu. 15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa. 16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen. 17) Add invocation for cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu. 18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta. 19) Various libbpf fixes and cleanups including a libbpf NULL pointer deref, from Xin Liu. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits) net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c Documentation: bpf: Add implementation notes documentations to table of contents bpf, docs: Delete misformatted table. selftests/xsk: Fix double free bpftool: Fix error message of strerror libbpf: Fix overrun in netlink attribute iteration selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged" samples/bpf: Fix typo in xdp_router_ipv4 sample bpftool: Remove unused struct event_ring_info bpftool: Remove unused struct btf_attach_point bpf, docs: Add TOC and fix formatting. bpf, docs: Add Clang note about BPF_ALU bpf, docs: Move Clang notes to a separate file bpf, docs: Linux byteswap note bpf, docs: Move legacy packet instructions to a separate file selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-5/+6
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampolineMartin KaFai Lau1-0/+23
The struct_ops prog is to allow using bpf to implement the functions in a struct (eg. kernel module). The current usage is to implement the tcp_congestion. The kernel does not call the tcp-cc's ops (ie. the bpf prog) in a recursive way. The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It is needed for tracing prog. However, it turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. Skip running the '.ssthresh' will end up returning random value to the caller. The patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops trampoline. They do not track the prog->active to detect recursion. One exception is when the tcp_congestion's '.init' ops is doing bpf_setsockopt(TCP_CONGESTION) and then recurs to the same '.init' ops. This will be addressed in the following patches. Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism") Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-28bpf: Handle show_fdinfo for the parameterized task BPF iteratorsKui-Feng Lee1-0/+18
Show information of iterators in the respective files under /proc/<pid>/fdinfo/. For example, for a task file iterator with 1723 as the value of tid parameter, its fdinfo would look like the following lines. pos: 0 flags: 02000000 mnt_id: 14 ino: 38 link_type: iter link_id: 51 prog_tag: a590ac96db22b825 prog_id: 299 target_name: task_file task_type: TID tid: 1723 This patch add the last three fields. task_type is the type of the task parameter. TID means the iterator visit only the thread specified by tid. The value of tid in the above example is 1723. For the case of PID task_type, it means the iterator visits only threads of a process and will show the pid value of the process instead of a tid. Signed-off-by: Kui-Feng Lee <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Yonghong Song <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-28bpf: Handle bpf_link_info for the parameterized task BPF iterators.Kui-Feng Lee1-0/+18
Add new fields to bpf_link_info that users can query it through bpf_obj_get_info_by_fd(). Signed-off-by: Kui-Feng Lee <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Yonghong Song <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-28bpf: Parameterize task iterators.Kui-Feng Lee1-22/+166
Allow creating an iterator that loops through resources of one thread/process. People could only create iterators to loop through all resources of files, vma, and tasks in the system, even though they were interested in only the resources of a specific task or process. Passing the additional parameters, people can now create an iterator to go through all resources or only the resources of a task. Signed-off-by: Kui-Feng Lee <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Yonghong Song <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-26bpf: Enforce W^X for bpf trampolineSong Liu1-17/+5
Mark the trampoline as RO+X after arch_prepare_bpf_trampoline, so that the trampoine follows W^X rule strictly. This will turn off warnings like CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ... Also remove bpf_jit_alloc_exec_page(), since it is not used any more. Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-26bpf: use bpf_prog_pack for bpf_dispatcherSong Liu2-8/+28
Allocate bpf_dispatcher with bpf_prog_pack_alloc so that bpf_dispatcher can share pages with bpf programs. arch_prepare_bpf_dispatcher() is updated to provide a RW buffer as working area for arch code to write to. This also fixes CPA W^X warnning like: CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ... Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-26bpf: Return value in kprobe get_func_ip only for entry addressJiri Olsa1-1/+4
Changing return value of kprobe's version of bpf_get_func_ip to return zero if the attach address is not on the function's entry point. For kprobes attached in the middle of the function we can't easily get to the function address especially now with the CONFIG_X86_KERNEL_IBT support. If user cares about current IP for kprobes attached within the function body, they can get it with PT_REGS_IP(ctx). Suggested-by: Andrii Nakryiko <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Martynas Pumputis <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-26bpf: Adjust kprobe_multi entry_ip for CONFIG_X86_KERNEL_IBTJiri Olsa1-2/+18
Martynas reported bpf_get_func_ip returning +4 address when CONFIG_X86_KERNEL_IBT option is enabled. When CONFIG_X86_KERNEL_IBT is enabled we'll have endbr instruction at the function entry, which screws return value of bpf_get_func_ip() helper that should return the function address. There's short term workaround for kprobe_multi bpf program made by Alexei [1], but we need this fixup also for bpf_get_attach_cookie, that returns cookie based on the entry_ip value. Moving the fixup in the fprobe handler, so both bpf_get_func_ip and bpf_get_attach_cookie get expected function address when CONFIG_X86_KERNEL_IBT option is enabled. Also renaming kprobe_multi_link_handler entry_ip argument to fentry_ip so it's clearer this is an ftrace __fentry__ ip. [1] commit 7f0059b58f02 ("selftests/bpf: Fix kprobe_multi test.") Cc: Peter Zijlstra <[email protected]> Reported-by: Martynas Pumputis <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-26ftrace: Keep the resolved addr in kallsyms_callbackJiri Olsa1-2/+1
Keeping the resolved 'addr' in kallsyms_callback, instead of taking ftrace_location value, because we depend on symbol address in the cookie related code. With CONFIG_X86_KERNEL_IBT option the ftrace_location value differs from symbol address, which screwes the symbol address cookies matching. There are 2 users of this function: - bpf_kprobe_multi_link_attach for which this fix is for - get_ftrace_locations which is used by register_fprobe_syms this function needs to get symbols resolved to addresses, but does not need 'ftrace location addresses' at this point there's another ftrace location translation in the path done by ftrace_set_filter_ips call: register_fprobe_syms addrs = get_ftrace_locations register_fprobe_ips(addrs) ... ftrace_set_filter_ips ... __ftrace_match_addr ip = ftrace_location(ip); ... Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-26kprobes: Add new KPROBE_FLAG_ON_FUNC_ENTRY kprobe flagJiri Olsa1-1/+5
Adding KPROBE_FLAG_ON_FUNC_ENTRY kprobe flag to indicate that attach address is on function entry. This is used in following changes in get_func_ip helper to return correct function address. Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-24Merge tag 'cgroup-for-6.0-rc6-fixes' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Add Waiman Long as a cpuset maintainer - cgroup_get_from_id() could be fed a kernfs ID which doesn't point to a cgroup directory but a knob file and then crash. Error out if the lookup kernfs_node isn't a directory. * tag 'cgroup-for-6.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: cgroup_get_from_id() must check the looked-up kn is a directory cpuset: Add Waiman Long as a cpuset maintainer
2022-09-24Merge tag 'wq-for-6.0-rc6-fixes' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "Just one patch to improve flush lockdep coverage" * tag 'wq-for-6.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: don't skip lockdep work dependency in cancel_work_sync()
2022-09-23cgroup: cgroup_get_from_id() must check the looked-up kn is a directoryMing Lei1-1/+4
cgroup has to be one kernfs dir, otherwise kernel panic is caused, especially cgroup id is provide from userspace. Reported-by: Marco Patalano <[email protected]> Fixes: 6b658c4863c1 ("scsi: cgroup: Add cgroup_get_from_id()") Cc: Muneendra <[email protected]> Signed-off-by: Ming Lei <[email protected]> Acked-by: Mukesh Ojha <[email protected]> Cc: [email protected] # v5.14+ Signed-off-by: Tejun Heo <[email protected]>
2022-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski14-31/+27
drivers/net/ethernet/freescale/fec.h 7b15515fc1ca ("Revert "fec: Restart PPS after link state change"") 40c79ce13b03 ("net: fec: add stop mode support for imx8 platform") https://lore.kernel.org/all/[email protected]/ drivers/pinctrl/pinctrl-ocelot.c c297561bc98a ("pinctrl: ocelot: Fix interrupt controller") 181f604b33cd ("pinctrl: ocelot: add ability to be used in a non-mmio configuration") https://lore.kernel.org/all/[email protected]/ tools/testing/selftests/drivers/net/bonding/Makefile bbb774d921e2 ("net: Add tests for bonding and team address list management") 152e8ec77640 ("selftests/bonding: add a test for bonding lladdr target") https://lore.kernel.org/all/[email protected]/ drivers/net/can/usb/gs_usb.c 5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition") 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-21bpf: Tweak definition of KF_TRUSTED_ARGSKumar Kartikeya Dwivedi1-5/+13
Instead of forcing all arguments to be referenced pointers with non-zero reg->ref_obj_id, tweak the definition of KF_TRUSTED_ARGS to mean that only PTR_TO_BTF_ID (and socket types translated to PTR_TO_BTF_ID) have that constraint, and require their offset to be set to 0. The rest of pointer types are also accomodated in this definition of trusted pointers, but with more relaxed rules regarding offsets. The inherent meaning of setting this flag is that all kfunc pointer arguments have a guranteed lifetime, and kernel object pointers (PTR_TO_BTF_ID, PTR_TO_CTX) are passed in their unmodified form (with offset 0). In general, this is not true for PTR_TO_BTF_ID as it can be obtained using pointer walks. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/cdede0043c47ed7a357f0a915d16f9ce06a1d589.1663778601.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21bpf: Always use raw spinlock for hash bucket lockHou Tao1-52/+14
For a non-preallocated hash map on RT kernel, regular spinlock instead of raw spinlock is used for bucket lock. The reason is that on RT kernel memory allocation is forbidden under atomic context and regular spinlock is sleepable under RT. Now hash map has been fully converted to use bpf_map_alloc, and there will be no synchronous memory allocation for non-preallocated hash map, so it is safe to always use raw spinlock for bucket lock on RT. So removing the usage of htab_use_raw_lock() and updating the comments accordingly. Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21bpf: Prevent bpf program recursion for raw tracepoint probesJiri Olsa3-13/+19
We got report from sysbot [1] about warnings that were caused by bpf program attached to contention_begin raw tracepoint triggering the same tracepoint by using bpf_trace_printk helper that takes trace_printk_lock lock. Call Trace: <TASK> ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90 bpf_trace_printk+0x2b/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 __unfreeze_partials+0x5b/0x160 ... The can be reproduced by attaching bpf program as raw tracepoint on contention_begin tracepoint. The bpf prog calls bpf_trace_printk helper. Then by running perf bench the spin lock code is forced to take slow path and call contention_begin tracepoint. Fixing this by skipping execution of the bpf program if it's already running, Using bpf prog 'active' field, which is being currently used by trampoline programs for the same reason. Moving bpf_prog_inc_misses_counter to syscall.c because trampoline.c is compiled in just for CONFIG_BPF_JIT option. Reviewed-by: Stanislav Fomichev <[email protected]> Reported-by: [email protected] [1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t Signed-off-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21bpf: Add bpf_verify_pkcs7_signature() kfuncRoberto Sassu1-0/+45
Add the bpf_verify_pkcs7_signature() kfunc, to give eBPF security modules the ability to check the validity of a signature against supplied data, by using user-provided or system-provided keys as trust anchor. The new kfunc makes it possible to enforce mandatory policies, as eBPF programs might be allowed to make security decisions only based on data sources the system administrator approves. The caller should provide the data to be verified and the signature as eBPF dynamic pointers (to minimize the number of parameters) and a bpf_key structure containing a reference to the keyring with keys trusted for signature verification, obtained from bpf_lookup_user_key() or bpf_lookup_system_key(). For bpf_key structures obtained from the former lookup function, bpf_verify_pkcs7_signature() completes the permission check deferred by that function by calling key_validate(). key_task_permission() is already called by the PKCS#7 code. Signed-off-by: Roberto Sassu <[email protected]> Acked-by: KP Singh <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncsRoberto Sassu1-0/+135
Add the bpf_lookup_user_key(), bpf_lookup_system_key() and bpf_key_put() kfuncs, to respectively search a key with a given key handle serial number and flags, obtain a key from a pre-determined ID defined in include/linux/verification.h, and cleanup. Introduce system_keyring_id_check() to validate the keyring ID parameter of bpf_lookup_system_key(). Signed-off-by: Roberto Sassu <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21bpf: Export bpf_dynptr_get_size()Roberto Sassu1-1/+1
Export bpf_dynptr_get_size(), so that kernel code dealing with eBPF dynamic pointers can obtain the real size of data carried by this data structure. Signed-off-by: Roberto Sassu <[email protected]> Reviewed-by: Joanne Koong <[email protected]> Acked-by: KP Singh <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21btf: Allow dynamic pointer parameters in kfuncsRoberto Sassu2-5/+38
Allow dynamic pointers (struct bpf_dynptr_kern *) to be specified as parameters in kfuncs. Also, ensure that dynamic pointers passed as argument are valid and initialized, are a pointer to the stack, and of the type local. More dynamic pointer types can be supported in the future. To properly detect whether a parameter is of the desired type, introduce the stringify_struct() macro to compare the returned structure name with the desired name. In addition, protect against structure renames, by halting the build with BUILD_BUG_ON(), so that developers have to revisit the code. To check if a dynamic pointer passed to the kfunc is valid and initialized, and if its type is local, export the existing functions is_dynptr_reg_valid_init() and is_dynptr_type_expected(). Cc: Joanne Koong <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Roberto Sassu <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21bpf: Move dynptr type check to is_dynptr_type_expected()Roberto Sassu1-8/+27
Move dynptr type check to is_dynptr_type_expected() from is_dynptr_reg_valid_init(), so that callers can better determine the cause of a negative result (dynamic pointer not valid/initialized, dynamic pointer of the wrong type). It will be useful for example for BTF, to restrict which dynamic pointer types can be passed to kfuncs, as initially only the local type will be supported. Also, splitting makes the code more readable, since checking the dynamic pointer type is not necessarily related to validity and initialization. Split the validity/initialization and dynamic pointer type check also in the verifier, and adjust the expected error message in the test (a test for an unexpected dynptr type passed to a helper cannot be added due to missing suitable helpers, but this case has been tested manually). Cc: Joanne Koong <[email protected]> Cc: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Roberto Sassu <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21btf: Export bpf_dynptr definitionRoberto Sassu1-0/+2
eBPF dynamic pointers is a new feature recently added to upstream. It binds together a pointer to a memory area and its size. The internal kernel structure bpf_dynptr_kern is not accessible by eBPF programs in user space. They instead see bpf_dynptr, which is then translated to the internal kernel structure by the eBPF verifier. The problem is that it is not possible to include at the same time the uapi include linux/bpf.h and the vmlinux BTF vmlinux.h, as they both contain the definition of some structures/enums. The compiler complains saying that the structures/enums are redefined. As bpf_dynptr is defined in the uapi include linux/bpf.h, this makes it impossible to include vmlinux.h. However, in some cases, e.g. when using kfuncs, vmlinux.h has to be included. The only option until now was to include vmlinux.h and add the definition of bpf_dynptr directly in the eBPF program source code from linux/bpf.h. Solve the problem by using the same approach as for bpf_timer (which also follows the same scheme with the _kern suffix for the internal kernel structure). Add the following line in one of the dynamic pointer helpers, bpf_dynptr_from_mem(): BTF_TYPE_EMIT(struct bpf_dynptr); Cc: [email protected] Cc: Joanne Koong <[email protected]> Fixes: 97e03f521050c ("bpf: Add verifier support for dynptrs") Signed-off-by: Roberto Sassu <[email protected]> Acked-by: Yonghong Song <[email protected]> Tested-by: KP Singh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21bpf: Allow kfuncs to be used in LSM programsKP Singh1-0/+1
In preparation for the addition of new kfuncs, allow kfuncs defined in the tracing subsystem to be used in LSM programs by mapping the LSM program type to the TRACING hook. Signed-off-by: KP Singh <[email protected]> Signed-off-by: Roberto Sassu <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-21bpf: Add bpf_user_ringbuf_drain() helperDavid Vernet3-9/+235
In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which will allow user-space applications to publish messages to a ring buffer that is consumed by a BPF program in kernel-space. In order for this map-type to be useful, it will require a BPF helper function that BPF programs can invoke to drain samples from the ring buffer, and invoke callbacks on those samples. This change adds that capability via a new BPF helper function: bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags) BPF programs may invoke this function to run callback_fn() on a series of samples in the ring buffer. callback_fn() has the following signature: long callback_fn(struct bpf_dynptr *dynptr, void *context); Samples are provided to the callback in the form of struct bpf_dynptr *'s, which the program can read using BPF helper functions for querying struct bpf_dynptr's. In order to support bpf_ringbuf_drain(), a new PTR_TO_DYNPTR register type is added to the verifier to reflect a dynptr that was allocated by a helper function and passed to a BPF program. Unlike PTR_TO_STACK dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR dynptrs need not use reference tracking, as the BPF helper is trusted to properly free the dynptr before returning. The verifier currently only supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL. Note that while the corresponding user-space libbpf logic will be added in a subsequent patch, this patch does contain an implementation of the .map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This .map_poll() callback guarantees that an epoll-waiting user-space producer will receive at least one event notification whenever at least one sample is drained in an invocation of bpf_user_ringbuf_drain(), provided that the function is not invoked with the BPF_RB_NO_WAKEUP flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup notification is sent even if no sample was drained. Signed-off-by: David Vernet <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-21bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map typeDavid Vernet2-6/+59
We want to support a ringbuf map type where samples are published from user-space, to be consumed by BPF programs. BPF currently supports a kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to define a new map type for user-space -> kernel, as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer ring buffer, and we'll want to add one or more helper functions that would not apply for a kernel-producer ring buffer. This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type definition. The map type is useless in its current form, as there is no way to access or use it for anything until we one or more BPF helpers. A follow-on patch will therefore add a new helper function that allows BPF programs to run callbacks on samples that are published to the ring buffer. Signed-off-by: David Vernet <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-21bpf: simplify code in btf_parse_hdrWilliam Dean1-6/+1
It could directly return 'btf_check_sec_info' to simplify code. Signed-off-by: William Dean <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-09-20Merge tag 'execve-v6.0-rc7' of ↵Linus Torvalds2-6/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve reverts from Kees Cook: "The recent work to support time namespace unsharing turns out to have some undesirable corner cases, so rather than allowing the API to stay exposed for another release, it'd be best to remove it ASAP, with the replacement getting another cycle of testing. Nothing is known to use this yet, so no userspace breakage is expected. For more details, see: https://lore.kernel.org/lkml/[email protected] Summary: - Remove the recent 'unshare time namespace on vfork+exec' feature (Andrei Vagin)" * tag 'execve-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Revert "selftests/timens: add a test for vfork+exit"
2022-09-20bpf: Check whether or not node is NULL before free it in free_bulkHou Tao1-1/+2
llnode could be NULL if there are new allocations after the checking of c-free_cnt > c->high_watermark in bpf_mem_refill() and before the calling of __llist_del_first() in free_bulk (e.g. a PREEMPT_RT kernel or allocation in NMI context). And it will incur oops as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT_RT SMP CPU: 39 PID: 373 Comm: irq_work/39 Tainted: G W 6.0.0-rc6-rt9+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:bpf_mem_refill+0x66/0x130 ...... Call Trace: <TASK> irq_work_single+0x24/0x60 irq_work_run_list+0x24/0x30 run_irq_workd+0x18/0x20 smpboot_thread_fn+0x13f/0x2c0 kthread+0x121/0x140 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Simply fixing it by checking whether or not llnode is NULL in free_bulk(). Fixes: 8d5a8011b35d ("bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.") Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-16bpf/btf: Use btf_type_str() whenever possiblePeilin Ye1-9/+8
We have btf_type_str(). Use it whenever possible in btf.c, instead of "btf_kind_str[BTF_INFO_KIND(t->info)]". Signed-off-by: Peilin Ye <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-09-16ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLEPeter Zijlstra (Intel)1-0/+6
x86 will shortly start using -fpatchable-function-entry for purposes other than ftrace, make sure the __patchable_function_entry section isn't merged in the mcount_loc section. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-16bpf: use kvmemdup_bpfptr helperWang Yufen1-9/+4
Use kvmemdup_bpfptr helper instead of open-coding to simplify the code. Signed-off-by: Wang Yufen <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-09-15bpf: Add verifier check for BPF_PTR_POISON retval and argDave Marchevsky2-10/+26
BPF_PTR_POISON was added in commit c0a5a21c25f37 ("bpf: Allow storing referenced kptr in map") to denote a bpf_func_proto btf_id which the verifier will replace with a dynamically-determined btf_id at verification time. This patch adds verifier 'poison' functionality to BPF_PTR_POISON in order to prepare for expanded use of the value to poison ret- and arg-btf_id in ongoing work, namely rbtree and linked list patchsets [0, 1]. Specifically, when the verifier checks helper calls, it assumes that BPF_PTR_POISON'ed ret type will be replaced with a valid type before - or in lieu of - the default ret_btf_id logic. Similarly for arg btf_id. If poisoned btf_id reaches default handling block for either, consider this a verifier internal error and fail verification. Otherwise a helper w/ poisoned btf_id but no verifier logic replacing the type will cause a crash as the invalid pointer is dereferenced. Also move BPF_PTR_POISON to existing include/linux/posion.h header and remove unnecessary shift. [0]: lore.kernel.org/bpf/[email protected] [1]: lore.kernel.org/bpf/[email protected] Signed-off-by: Dave Marchevsky <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-13Revert "fs/exec: allow to unshare a time namespace on vfork+exec"Andrei Vagin2-6/+2
This reverts commit 133e2d3e81de5d9706cab2dd1d52d231c27382e5. Alexey pointed out a few undesirable side effects of the reverted change. First, it doesn't take into account that CLONE_VFORK can be used with CLONE_THREAD. Second, a child process doesn't enter a target time name-space, if its parent dies before the child calls exec. It happens because the parent clears vfork_done. Eric W. Biederman suggests installing a time namespace as a task gets a new mm. It includes all new processes cloned without CLONE_VM and all tasks that call exec(). This is an user API change, but we think there aren't users that depend on the old behavior. It is too late to make such changes in this release, so let's roll back this patch and introduce the right one in the next release. Cc: Alexey Izbyshev <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-10bpf: Add verifier support for custom callback return rangeDave Marchevsky1-1/+6
Verifier logic to confirm that a callback function returns 0 or 1 was added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper"). At the time, callback return value was only used to continue or stop iteration. In order to support callbacks with a broader return value range, such as those added in rbtree series[0] and others, add a callback_ret_range to bpf_func_state. Verifier's helpers which set in_callback_fn will also set the new field, which the verifier will later use to check return value bounds. Default to tnum_range(0, 0) instead of using tnum_unknown as a sentinel value as the latter would prevent the valid range (0, U64_MAX) being used. Previous global default tnum_range(0, 1) is explicitly set for extant callback helpers. The change to global default was made after discussion around this patch in rbtree series [1], goal here is to make it more obvious that callback_ret_range should be explicitly set. [0]: lore.kernel.org/bpf/[email protected]/ [1]: lore.kernel.org/bpf/[email protected]/ Signed-off-by: Dave Marchevsky <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-10bpf: Export btf_type_by_id() and bpf_log()Daniel Xu2-0/+2
These symbols will be used in nf_conntrack.ko to support direct writes to `nf_conn`. Signed-off-by: Daniel Xu <[email protected]> Link: https://lore.kernel.org/r/3c98c19dc50d3b18ea5eca135b4fc3a5db036060.1662568410.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-10bpf: Remove duplicate PTR_TO_BTF_ID RO checkDaniel Xu1-3/+0
Since commit 27ae7997a661 ("bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS") there has existed bpf_verifier_ops:btf_struct_access. When btf_struct_access is _unset_ for a prog type, the verifier runs the default implementation, which is to enforce read only: if (env->ops->btf_struct_access) { [...] } else { if (atype != BPF_READ) { verbose(env, "only read is supported\n"); return -EACCES; } [...] } When btf_struct_access is _set_, the expectation is that btf_struct_access has full control over accesses, including if writes are allowed. Rather than carve out an exception for each prog type that may write to BTF ptrs, delete the redundant check and give full control to btf_struct_access. Signed-off-by: Daniel Xu <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/962da2bff1238746589e332ff1aecc49403cd7ce.1662568410.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-10bpf: Simplify code by using for_each_cpu_wrap()Punit Agrawal1-32/+16
In the percpu freelist code, it is a common pattern to iterate over the possible CPUs mask starting with the current CPU. The pattern is implemented using a hand rolled while loop with the loop variable increment being open-coded. Simplify the code by using for_each_cpu_wrap() helper to iterate over the possible cpus starting with the current CPU. As a result, some of the special-casing in the loop also gets simplified. No functional change intended. Signed-off-by: Punit Agrawal <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-10bpf: add missing percpu_counter_destroy() in htab_map_alloc()Tetsuo Handa1-0/+2
syzbot is reporting ODEBUG bug in htab_map_alloc() [1], for commit 86fe28f7692d96d2 ("bpf: Optimize element count in non-preallocated hash map.") added percpu_counter_init() to htab_map_alloc() but forgot to add percpu_counter_destroy() to the error path. Link: https://syzkaller.appspot.com/bug?extid=5d1da78b375c3b5e6c2b [1] Reported-by: syzbot <[email protected]> Signed-off-by: Tetsuo Handa <[email protected]> Fixes: 86fe28f7692d96d2 ("bpf: Optimize element count in non-preallocated hash map.") Reviewed-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-10Merge tag 'dma-mapping-6.0-2022-09-10' of ↵Linus Torvalds3-13/+9
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - revert a panic on swiotlb initialization failure (Yu Zhao) - fix the lookup for partial syncs in dma-debug (Robin Murphy) - fix a shift overflow in swiotlb (Chao Gao) - fix a comment typo in swiotlb (Chao Gao) - mark a function static now that all abusers are gone (Christoph Hellwig) * tag 'dma-mapping-6.0-2022-09-10' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: mark dma_supported static swiotlb: fix a typo swiotlb: avoid potential left shift overflow dma-debug: improve search for partial syncs Revert "swiotlb: panic if nslabs is too small"
2022-09-09Merge tag 'driver-core-6.0-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some small driver core and debugfs fixes for 6.0-rc5. Included in here are: - multiple attempts to get the arch_topology code to work properly on non-cluster SMT systems. First attempt caused build breakages in linux-next and 0-day, second try worked. - debugfs fixes for a long-suffering memory leak. The pattern of debugfs_remove(debugfs_lookup(...)) turns out to leak dentries, so add debugfs_lookup_and_remove() to fix this problem. Also fix up the scheduler debug code that highlighted this problem. Fixes for other subsystems will be trickling in over the next few months for this same issue once the debugfs function is merged. All of these have been in linux-next since Wednesday with no reported problems" * tag 'driver-core-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: arch_topology: Make cluster topology span at least SMT CPUs sched/debug: fix dentry leak in update_sched_domain_debugfs debugfs: add debugfs_lookup_and_remove() driver core: fix driver_set_override() issue with empty strings Revert "arch_topology: Make cluster topology span at least SMT CPUs" arch_topology: Make cluster topology span at least SMT CPUs
2022-09-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-0/+1
Pull rdma fixes from Jason Gunthorpe: "Many bug fixes in several drivers: - Fix misuse of the DMA API in rtrs - Several irdma issues: hung task due to SQ flushing, incorrect capability reporting to userspace, improper error handling for MW corners, touching an uninitialized SGL for during invalidation. - hns was using the wrong page size limits for the HW, an incorrect calculation of wqe_shift causing WQE corruption, and mis computed a timer id. - Fix a crash in SRP triggered by blktests - Fix compiler errors by calling virt_to_page() with the proper type in siw - Userspace triggerable deadlock in ODP - mlx5 could use the wrong profile due to some driver loading races, counters were not working in some device configurations, and a crash on error unwind" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/irdma: Report RNR NAK generation in device caps RDMA/irdma: Use s/g array in post send only when its valid RDMA/irdma: Return correct WC error for bind operation failure RDMA/irdma: Return error on MR deregister CQP failure RDMA/irdma: Report the correct max cqes from query device MAINTAINERS: Update maintainers of HiSilicon RoCE RDMA/mlx5: Fix UMR cleanup on error flow of driver init RDMA/mlx5: Set local port to one when accessing counters RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile IB/core: Fix a nested dead lock as part of ODP flow RDMA/siw: Pass a pointer to virt_to_page() RDMA/srp: Set scmnd->result only when scmnd is not NULL RDMA/hns: Remove the num_qpc_timer variable RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shift RDMA/hns: Fix supported page size RDMA/cma: Fix arguments order in net device validation RDMA/irdma: Fix drain SQ hang with no completion RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGL RDMA/rtrs-clt: Use the right sg_cnt after ib_dma_map_sg
2022-09-08kprobes: Prohibit probes in gate areaChristian A. Ehrhardt1-0/+1
The system call gate area counts as kernel text but trying to install a kprobe in this area fails with an Oops later on. To fix this explicitly disallow the gate area for kprobes. Found by syzkaller with the following reproducer: perf_event_open$cgroup(&(0x7f00000001c0)={0x6, 0x80, 0x0, 0x0, 0x0, 0x0, 0x80ffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_config_ext={0x0, 0xffffffffff600000}}, 0xffffffffffffffff, 0x0, 0xffffffffffffffff, 0x0) Sample report: BUG: unable to handle page fault for address: fffffbfff3ac6000 PGD 6dfcb067 P4D 6dfcb067 PUD 6df8f067 PMD 6de4d067 PTE 0 Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 0 PID: 21978 Comm: syz-executor.2 Not tainted 6.0.0-rc3-00363-g7726d4c3e60b-dirty #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:__insn_get_emulate_prefix arch/x86/lib/insn.c:91 [inline] RIP: 0010:insn_get_emulate_prefix arch/x86/lib/insn.c:106 [inline] RIP: 0010:insn_get_prefixes.part.0+0xa8/0x1110 arch/x86/lib/insn.c:134 Code: 49 be 00 00 00 00 00 fc ff df 48 8b 40 60 48 89 44 24 08 e9 81 00 00 00 e8 e5 4b 39 ff 4c 89 fa 4c 89 f9 48 c1 ea 03 83 e1 07 <42> 0f b6 14 32 38 ca 7f 08 84 d2 0f 85 06 10 00 00 48 89 d8 48 89 RSP: 0018:ffffc900088bf860 EFLAGS: 00010246 RAX: 0000000000040000 RBX: ffffffff9b9bebc0 RCX: 0000000000000000 RDX: 1ffffffff3ac6000 RSI: ffffc90002d82000 RDI: ffffc900088bf9e8 RBP: ffffffff9d630001 R08: 0000000000000000 R09: ffffc900088bf9e8 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001 R13: ffffffff9d630000 R14: dffffc0000000000 R15: ffffffff9d630000 FS: 00007f63eef63640(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffbfff3ac6000 CR3: 0000000029d90005 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> insn_get_prefixes arch/x86/lib/insn.c:131 [inline] insn_get_opcode arch/x86/lib/insn.c:272 [inline] insn_get_modrm+0x64a/0x7b0 arch/x86/lib/insn.c:343 insn_get_sib+0x29a/0x330 arch/x86/lib/insn.c:421 insn_get_displacement+0x350/0x6b0 arch/x86/lib/insn.c:464 insn_get_immediate arch/x86/lib/insn.c:632 [inline] insn_get_length arch/x86/lib/insn.c:707 [inline] insn_decode+0x43a/0x490 arch/x86/lib/insn.c:747 can_probe+0xfc/0x1d0 arch/x86/kernel/kprobes/core.c:282 arch_prepare_kprobe+0x79/0x1c0 arch/x86/kernel/kprobes/core.c:739 prepare_kprobe kernel/kprobes.c:1160 [inline] register_kprobe kernel/kprobes.c:1641 [inline] register_kprobe+0xb6e/0x1690 kernel/kprobes.c:1603 __register_trace_kprobe kernel/trace/trace_kprobe.c:509 [inline] __register_trace_kprobe+0x26a/0x2d0 kernel/trace/trace_kprobe.c:477 create_local_trace_kprobe+0x1f7/0x350 kernel/trace/trace_kprobe.c:1833 perf_kprobe_init+0x18c/0x280 kernel/trace/trace_event_perf.c:271 perf_kprobe_event_init+0xf8/0x1c0 kernel/events/core.c:9888 perf_try_init_event+0x12d/0x570 kernel/events/core.c:11261 perf_init_event kernel/events/core.c:11325 [inline] perf_event_alloc.part.0+0xf7f/0x36a0 kernel/events/core.c:11619 perf_event_alloc kernel/events/core.c:12059 [inline] __do_sys_perf_event_open+0x4a8/0x2a00 kernel/events/core.c:12157 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f63ef7efaed Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f63eef63028 EFLAGS: 00000246 ORIG_RAX: 000000000000012a RAX: ffffffffffffffda RBX: 00007f63ef90ff80 RCX: 00007f63ef7efaed RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 00000000200001c0 RBP: 00007f63ef86019c R08: 0000000000000000 R09: 0000000000000000 R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000002 R14: 00007f63ef90ff80 R15: 00007f63eef43000 </TASK> Modules linked in: CR2: fffffbfff3ac6000 ---[ end trace 0000000000000000 ]--- RIP: 0010:__insn_get_emulate_prefix arch/x86/lib/insn.c:91 [inline] RIP: 0010:insn_get_emulate_prefix arch/x86/lib/insn.c:106 [inline] RIP: 0010:insn_get_prefixes.part.0+0xa8/0x1110 arch/x86/lib/insn.c:134 Code: 49 be 00 00 00 00 00 fc ff df 48 8b 40 60 48 89 44 24 08 e9 81 00 00 00 e8 e5 4b 39 ff 4c 89 fa 4c 89 f9 48 c1 ea 03 83 e1 07 <42> 0f b6 14 32 38 ca 7f 08 84 d2 0f 85 06 10 00 00 48 89 d8 48 89 RSP: 0018:ffffc900088bf860 EFLAGS: 00010246 RAX: 0000000000040000 RBX: ffffffff9b9bebc0 RCX: 0000000000000000 RDX: 1ffffffff3ac6000 RSI: ffffc90002d82000 RDI: ffffc900088bf9e8 RBP: ffffffff9d630001 R08: 0000000000000000 R09: ffffc900088bf9e8 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001 R13: ffffffff9d630000 R14: dffffc0000000000 R15: ffffffff9d630000 FS: 00007f63eef63640(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffbfff3ac6000 CR3: 0000000029d90005 CR4: 0000000000770ef0 PKRU: 55555554 ================================================================== Link: https://lkml.kernel.org/r/[email protected] cc: "Naveen N. Rao" <[email protected]> cc: Anil S Keshavamurthy <[email protected]> cc: "David S. Miller" <[email protected]> Cc: [email protected] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Christian A. Ehrhardt <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-09-07bpf: Add helper macro bpf_for_each_reg_in_vstateKumar Kartikeya Dwivedi1-107/+28
For a lot of use cases in future patches, we will want to modify the state of registers part of some same 'group' (e.g. same ref_obj_id). It won't just be limited to releasing reference state, but setting a type flag dynamically based on certain actions, etc. Hence, we need a way to easily pass a callback to the function that iterates over all registers in current bpf_verifier_state in all frames upto (and including) the curframe. While in C++ we would be able to easily use a lambda to pass state and the callback together, sadly we aren't using C++ in the kernel. The next best thing to avoid defining a function for each case seems like statement expressions in GNU C. The kernel already uses them heavily, hence they can passed to the macro in the style of a lambda. The statement expression will then be substituted in the for loop bodies. Variables __state and __reg are set to current bpf_func_state and reg for each invocation of the expression inside the passed in verifier state. Then, convert mark_ptr_or_null_regs, clear_all_pkt_pointers, release_reference, find_good_pkt_pointers, find_equal_scalars to use bpf_for_each_reg_in_vstate. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-07bpf: Support kptrs in percpu arraymapKumar Kartikeya Dwivedi2-10/+26
Enable support for kptrs in percpu BPF arraymap by wiring up the freeing of these kptrs from percpu map elements. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-07bpf: Fix resetting logic for unreferenced kptrsJules Irenge1-1/+1
Sparse reported a warning at bpf_map_free_kptrs() "warning: Using plain integer as NULL pointer" During the process of fixing this warning, it was discovered that the current code erroneously writes to the pointer variable instead of deferencing and writing to the actual kptr. Hence, Sparse tool accidentally helped to uncover this problem. Fix this by doing WRITE_ONCE(*p, 0) instead of WRITE_ONCE(p, 0). Note that the effect of this bug is that unreferenced kptrs will not be cleared during check_and_free_fields. It is not a problem if the clearing is not done during map_free stage, as there is nothing to free for them. Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr") Signed-off-by: Jules Irenge <[email protected]> Link: https://lore.kernel.org/r/Yxi3pJaK6UDjVJSy@playground Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-07bpf/verifier: allow kfunc to return an allocated memBenjamin Tissoires2-33/+113
For drivers (outside of network), the incoming data is not statically defined in a struct. Most of the time the data buffer is kzalloc-ed and thus we can not rely on eBPF and BTF to explore the data. This commit allows to return an arbitrary memory, previously allocated by the driver. An interesting extra point is that the kfunc can mark the exported memory region as read only or read/write. So, when a kfunc is not returning a pointer to a struct but to a plain type, we can consider it is a valid allocated memory assuming that: - one of the arguments is either called rdonly_buf_size or rdwr_buf_size - and this argument is a const from the caller point of view We can then use this parameter as the size of the allocated memory. The memory is either read-only or read-write based on the name of the size parameter. Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Benjamin Tissoires <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-07bpf/btf: bump BTF_KFUNC_SET_MAX_CNTBenjamin Tissoires1-1/+1
net/bpf/test_run.c is already presenting 20 kfuncs. net/netfilter/nf_conntrack_bpf.c is also presenting an extra 10 kfuncs. Given that all the kfuncs are regrouped into one unique set, having only 2 space left prevent us to add more selftests. Bump it to 256. Signed-off-by: Benjamin Tissoires <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>