Age | Commit message (Collapse) | Author | Files | Lines |
|
Fixes a compiler warning. The __bpf_free_used_btfs function
was taking an extra unused struct bpf_prog_aux *aux param
Signed-off-by: Rafael Passos <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Fixes a compiler warning. the bpf_jit_binary_pack_finalize function
was taking an extra bpf_prog parameter that went unused.
This removves it and updates the callers accordingly.
Signed-off-by: Rafael Passos <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
It's confusing to inspect 'prog->aux->tail_call_reachable' with drgn[0],
when bpf prog has tail call but 'tail_call_reachable' is false.
This patch corrects 'tail_call_reachable' when bpf prog has tail call.
Signed-off-by: Leon Hwang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
1e7962114c10 ("bnxt_en: Restore PTP tx_avail count in case of skb_pad() error")
165f87691a89 ("bnxt_en: add timestamping statistics support")
No adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, bpf and netfilter.
Happy summer solstice! The line count is a bit inflated by a selftest
and update to a driver's FW interface header, in reality this is
slightly below average for us. We are expecting one driver fix from
Intel, but there are no big known issues.
Current release - regressions:
- ipv6: bring NLM_DONE out to a separate recv() again
Current release - new code bugs:
- wifi: cfg80211: wext: set ssids=NULL for passive scans via old wext API
Previous releases - regressions:
- wifi: mac80211: fix monitor channel setting with chanctx emulation
(probably most awaited of the fixes in this PR, tracked by Thorsten)
- usb: ax88179_178a: bring back reset on init, if PHY is disconnected
- bpf: fix UML x86_64 compile failure with BPF
- bpf: avoid splat in pskb_pull_reason(), sanity check added can be hit
with malicious BPF
- eth: mvpp2: use slab_build_skb() for packets in slab, driver was
missed during API refactoring
- wifi: iwlwifi: add missing unlock of mvm mutex
Previous releases - always broken:
- ipv6: add a number of missing null-checks for in6_dev_get(), in case
IPv6 disabling races with the datapath
- bpf: fix reg_set_min_max corruption of fake_reg
- sched: act_ct: add netns as part of the key of tcf_ct_flow_table"
* tag 'net-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
net: usb: rtl8150 fix unintiatilzed variables in rtl8150_get_link_ksettings
selftests: virtio_net: add forgotten config options
bnxt_en: Restore PTP tx_avail count in case of skb_pad() error
bnxt_en: Set TSO max segs on devices with limits
bnxt_en: Update firmware interface to 1.10.3.44
net: stmmac: Assign configured channel value to EXTTS event
net: do not leave a dangling sk pointer, when socket creation fails
net/tcp_ao: Don't leak ao_info on error-path
ice: Fix VSI list rule with ICE_SW_LKUP_LAST type
ipv6: bring NLM_DONE out to a separate recv() again
selftests: add selftest for the SRv6 End.DX6 behavior with netfilter
selftests: add selftest for the SRv6 End.DX4 behavior with netfilter
netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core
seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 behaviors
netfilter: ipset: Fix suspicious rcu_dereference_protected()
selftests: openvswitch: Set value to nla flags.
octeontx2-pf: Fix linking objects into multiple modules
octeontx2-pf: Add error handling to VLAN unoffload handling
virtio_net: fixing XDP for fully checksummed packets handling
virtio_net: checksum offloading handling fix
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
- Restrict gen-API tests for synthetic and kprobe events to only be
built as modules, as they generate dynamic events that cannot be
removed, causing ftracetest and startup selftests to fail
* tag 'probes-fixes-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Build event generation tests only as modules
|
|
The bpf arena logic didn't account for mremap operation. Add a refcnt for
multiple mmap events to prevent use-after-free in arena_vm_close.
Fixes: 317460317a02 ("bpf: Introduce bpf_arena.")
Reported-by: Pengfei Xu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Barret Rhoden <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Closes: https://lore.kernel.org/bpf/[email protected]
Link: https://lore.kernel.org/bpf/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fix from Paul Moore:
"A single LSM/IMA patch to fix a problem caused by sleeping while in a
RCU critical section"
* tag 'lsm-pr-20240617' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
ima: Avoid blocking in RCU read-side critical section
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"
* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
kcov: don't lose track of remote references during softirqs
mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
mm: fix possible OOB in numa_rebuild_large_mapping()
mm/migrate: fix kernel BUG at mm/compaction.c:2761!
selftests: mm: make map_fixed_noreplace test names stable
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
gcov: add support for GCC 14
zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
MAINTAINERS: remove Lorenzo as vmalloc reviewer
Revert "mm: init_mlocked_on_free_v3"
mm/page_table_check: fix crash on ZONE_DEVICE
gcc: disable '-Warray-bounds' for gcc-9
ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
|
|
In coerce_subreg_to_size_sx(), for the case where upper
sign extension bits are the same for smax32 and smin32
values, we missed to setup properly. This is especially
problematic if both smax32 and smin32's sign extension
bits are 1.
The following is a simple example illustrating the inconsistent
verifier states due to missed var_off:
0: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
1: (bf) r3 = r0 ; R0_w=scalar(id=1) R3_w=scalar(id=1)
2: (57) r3 &= 15 ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
3: (47) r3 |= 128 ; R3_w=scalar(smin=umin=smin32=umin32=128,smax=umax=smax32=umax32=143,var_off=(0x80; 0xf))
4: (bc) w7 = (s8)w3
REG INVARIANTS VIOLATION (alu): range bounds violation u64=[0xffffff80, 0x8f] s64=[0xffffff80, 0x8f]
u32=[0xffffff80, 0x8f] s32=[0x80, 0xffffff8f] var_off=(0x80, 0xf)
The var_off=(0x80, 0xf) is not correct, and the correct one should
be var_off=(0xffffff80; 0xf) since from insn 3, we know that at
insn 4, the sign extension bits will be 1. This patch fixed this
issue by setting var_off properly.
Fixes: 8100928c8814 ("bpf: Support new sign-extension mov insns")
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Zac reported a verification failure and Alexei reproduced the issue
with a simple reproducer ([1]). The verification failure is due to missed
setting for var_off.
The following is the reproducer in [1]:
0: R1=ctx() R10=fp0
0: (71) r3 = *(u8 *)(r10 -387) ;
R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R10=fp0
1: (bc) w7 = (s8)w3 ;
R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=127,var_off=(0x0; 0x7f))
2: (36) if w7 >= 0x2533823b goto pc-3
mark_precise: frame0: last_idx 2 first_idx 0 subseq_idx -1
mark_precise: frame0: regs=r7 stack= before 1: (bc) w7 = (s8)w3
mark_precise: frame0: regs=r3 stack= before 0: (71) r3 = *(u8 *)(r10 -387)
2: R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=127,var_off=(0x0; 0x7f))
3: (b4) w0 = 0 ; R0_w=0
4: (95) exit
Note that after insn 1, the var_off for R7 is (0x0; 0x7f). This is not correct
since upper 24 bits of w7 could be 0 or 1. So correct var_off should be
(0x0; 0xffffffff). Missing var_off setting in set_sext32_default_val() caused later
incorrect analysis in zext_32_to_64(dst_reg) and reg_bounds_sync(dst_reg).
To fix the issue, set var_off correctly in set_sext32_default_val(). The correct
reg state after insn 1 becomes:
1: (bc) w7 = (s8)w3 ;
R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
R7_w=scalar(smin=0,smax=umax=0xffffffff,smin32=-128,smax32=127,var_off=(0x0; 0xffffffff))
and at insn 2, the verifier correctly determines either branch is possible.
[1] https://lore.kernel.org/bpf/CAADnVQLPU0Shz7dWV4bn2BgtGdxN3uFHPeobGBA72tpg5Xoykw@mail.gmail.com/
Fixes: 8100928c8814 ("bpf: Support new sign-extension mov insns")
Reported-by: Zac Ecob <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Commit 4205e4786d0b ("cpu/hotplug: Provide dynamic range for prepare
stage") added a dynamic range for the prepare states, but did not handle
the assignment of the dynstate variable in __cpuhp_setup_state_cpuslocked().
This causes the corresponding startup callback not to be invoked when
calling __cpuhp_setup_state_cpuslocked() with the CPUHP_BP_PREPARE_DYN
parameter, even though it should be.
Currently, the users of __cpuhp_setup_state_cpuslocked(), for one reason or
another, have not triggered this bug.
Fixes: 4205e4786d0b ("cpu/hotplug: Provide dynamic range for prepare stage")
Signed-off-by: Yuntao Wang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV
metadata of the current task into a per-CPU variable. However, the
kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV
coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote
KCOV objects.
If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens
to get interrupted and kcov_remote_start() is called, it ultimately leads
to kcov_remote_stop() NOT restoring the original KCOV reference. So when
the task exits, all registered remote KCOV handles remain active forever.
The most uncomfortable effect (at least for syzkaller) is that the bug
prevents the reuse of the same /sys/kernel/debug/kcov descriptor. If
we obtain it in the parent process and then e.g. drop some
capabilities and continuously fork to execute individual programs, at
some point current->kcov of the forked process is lost,
kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctls
calls from subsequent forks fail.
And, yes, the efficiency is also affected if we keep on losing remote
kcov objects.
a) kcov_remote_map keeps on growing forever.
b) (If I'm not mistaken), we're also not freeing the memory referenced
by kcov->area.
Fix it by introducing a special kcov_mode that is assigned to the task
that owns a KCOV remote object. It makes kcov_mode_enabled() return true
and yet does not trigger coverage collection in __sanitizer_cov_trace_pc()
and write_comp_data().
[[email protected]: replace WRITE_ONCE() with an ordinary assignment]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts")
Signed-off-by: Aleksandr Nogikh <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Using gcov on kernels compiled with GCC 14 results in truncated 16-byte
long .gcda files with no usable data. To fix this, update GCOV_COUNTERS
to match the value defined by GCC 14.
Tested with GCC versions 14.1.0 and 13.2.0.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Oberparleiter <[email protected]>
Reported-by: Allison Henderson <[email protected]>
Reported-by: Chuck Lever III <[email protected]>
Tested-by: Chuck Lever <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
kernel_wait4() doesn't sleep and returns -EINTR if there is no
eligible child and signal_pending() is true.
That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
return false and avoid a busy-wait loop.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
Signed-off-by: Oleg Nesterov <[email protected]>
Reported-by: Rachel Menge <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Reviewed-by: Boqun Feng <[email protected]>
Tested-by: Wei Fu <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Cc: Allen Pais <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Joel Granados <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Zqiang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-06-14
We've added 8 non-merge commits during the last 2 day(s) which contain
a total of 9 files changed, 92 insertions(+), 11 deletions(-).
The main changes are:
1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason()
triggered via __bpf_try_make_writable(), from Florian Westphal.
2) Fix removal of kfuncs during linking phase which then throws a kernel
build warning via resolve_btfids about unresolved symbols,
from Tony Ambardar.
3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol
is not available on User Mode Linux, from Maciej Żenczykowski.
4) Fix a register corruption in reg_set_min_max triggering an invariant
violation in BPF verifier, from Daniel Borkmann.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Harden __bpf_kfunc tag against linker kfunc removal
compiler_types.h: Define __retain for __attribute__((__retain__))
bpf: Avoid splat in pskb_pull_reason
bpf: fix UML x86_64 compile failure
selftests/bpf: Add test coverage for reg_set_min_max handling
bpf: Reduce stack consumption in check_stack_write_fixed_off
bpf: Fix reg_set_min_max corruption of fake_reg
MAINTAINERS: mailmap: Update Stanislav's email address
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Compilers can generate the code
r1 = r2
r1 += 0x1
if r2 < 1000 goto ...
use knowledge of r2 range in subsequent r1 operations
So remember constant delta between r2 and r1 and update r1 after 'if' condition.
Unfortunately LLVM still uses this pattern for loops with 'can_loop' construct:
for (i = 0; i < 1000 && can_loop; i++)
The "undo" pass was introduced in LLVM
https://reviews.llvm.org/D121937
to prevent this optimization, but it cannot cover all cases.
Instead of fighting middle end optimizer in BPF backend teach the verifier
about this pattern.
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Some ciphers do not require state and IV buffer, but with current
implementation 0-sized dynptr is always needed. With adjustment to
verifier we can provide NULL instead of 0-sized dynptr. Make crypto
kfuncs ready for this.
Reviewed-by: Eduard Zingerman <[email protected]>
Signed-off-by: Vadim Fedorenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Some arguments to kfuncs might be NULL in some cases. But currently it's
not possible to pass NULL to any BTF structures because the check for
the suffix is located after all type checks. Move it to earlier place
to allow nullable args.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Vadim Fedorenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts, no adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
A panic happens in ima_match_policy:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
PGD 42f873067 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 5 PID: 1286325 Comm: kubeletmonit.sh
Kdump: loaded Tainted: P
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 0.0.0 02/06/2015
RIP: 0010:ima_match_policy+0x84/0x450
Code: 49 89 fc 41 89 cf 31 ed 89 44 24 14 eb 1c 44 39
7b 18 74 26 41 83 ff 05 74 20 48 8b 1b 48 3b 1d
f2 b9 f4 00 0f 84 9c 01 00 00 <44> 85 73 10 74 ea
44 8b 6b 14 41 f6 c5 01 75 d4 41 f6 c5 02 74 0f
RSP: 0018:ff71570009e07a80 EFLAGS: 00010207
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000200
RDX: ffffffffad8dc7c0 RSI: 0000000024924925 RDI: ff3e27850dea2000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffabfce739
R10: ff3e27810cc42400 R11: 0000000000000000 R12: ff3e2781825ef970
R13: 00000000ff3e2785 R14: 000000000000000c R15: 0000000000000001
FS: 00007f5195b51740(0000)
GS:ff3e278b12d40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000626d24002 CR4: 0000000000361ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
ima_get_action+0x22/0x30
process_measurement+0xb0/0x830
? page_add_file_rmap+0x15/0x170
? alloc_set_pte+0x269/0x4c0
? prep_new_page+0x81/0x140
? simple_xattr_get+0x75/0xa0
? selinux_file_open+0x9d/0xf0
ima_file_check+0x64/0x90
path_openat+0x571/0x1720
do_filp_open+0x9b/0x110
? page_counter_try_charge+0x57/0xc0
? files_cgroup_alloc_fd+0x38/0x60
? __alloc_fd+0xd4/0x250
? do_sys_open+0x1bd/0x250
do_sys_open+0x1bd/0x250
do_syscall_64+0x5d/0x1d0
entry_SYSCALL_64_after_hwframe+0x65/0xca
Commit c7423dbdbc9e ("ima: Handle -ESTALE returned by
ima_filter_rule_match()") introduced call to ima_lsm_copy_rule within a
RCU read-side critical section which contains kmalloc with GFP_KERNEL.
This implies a possible sleep and violates limitations of RCU read-side
critical sections on non-PREEMPT systems.
Sleeping within RCU read-side critical section might cause
synchronize_rcu() returning early and break RCU protection, allowing a
UAF to happen.
The root cause of this issue could be described as follows:
| Thread A | Thread B |
| |ima_match_policy |
| | rcu_read_lock |
|ima_lsm_update_rule | |
| synchronize_rcu | |
| | kmalloc(GFP_KERNEL)|
| | sleep |
==> synchronize_rcu returns early
| kfree(entry) | |
| | entry = entry->next|
==> UAF happens and entry now becomes NULL (or could be anything).
| | entry->action |
==> Accessing entry might cause panic.
To fix this issue, we are converting all kmalloc that is called within
RCU read-side critical section to use GFP_ATOMIC.
Fixes: c7423dbdbc9e ("ima: Handle -ESTALE returned by ima_filter_rule_match()")
Cc: [email protected]
Signed-off-by: GUO Zihua <[email protected]>
Acked-by: John Johansen <[email protected]>
Reviewed-by: Mimi Zohar <[email protected]>
Reviewed-by: Casey Schaufler <[email protected]>
[PM: fixed missing comment, long lines, !CONFIG_IMA_LSM_RULES case]
Signed-off-by: Paul Moore <[email protected]>
|
|
pcpu_hot (defined in arch/x86) is not available on user mode linux (ARCH=um)
Cc: Andrii Nakryiko <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Fixes: 1ae6921009e5 ("bpf: inline bpf_get_smp_processor_id() helper")
Signed-off-by: Maciej Żenczykowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The function argument names were changed but the doc comment was not.
Fix htmldocs build warning by updating doc comments.
Fixes: cce4c40b9606 ("bpf: treewide: Align kfunc signatures to prog point-of-view")
Signed-off-by: Daniel Xu <[email protected]>
Link: https://lore.kernel.org/r/d0b0eb05f91e12e5795966153b11998d3fc1d433.1718295425.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The fake_reg moved into env->fake_reg given it consumes a lot of stack
space (120 bytes). Migrate the fake_reg in check_stack_write_fixed_off()
as well now that we have it.
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Juan reported that after doing some changes to buzzer [0] and implementing
a new fuzzing strategy guided by coverage, they noticed the following in
one of the probes:
[...]
13: (79) r6 = *(u64 *)(r0 +0) ; R0=map_value(ks=4,vs=8) R6_w=scalar()
14: (b7) r0 = 0 ; R0_w=0
15: (b4) w0 = -1 ; R0_w=0xffffffff
16: (74) w0 >>= 1 ; R0_w=0x7fffffff
17: (5c) w6 &= w0 ; R0_w=0x7fffffff R6_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0; 0x7fffffff))
18: (44) w6 |= 2 ; R6_w=scalar(smin=umin=smin32=umin32=2,smax=umax=umax32=0x7fffffff,var_off=(0x2; 0x7ffffffd))
19: (56) if w6 != 0x7ffffffd goto pc+1
REG INVARIANTS VIOLATION (true_reg2): range bounds violation u64=[0x7fffffff, 0x7ffffffd] s64=[0x7fffffff, 0x7ffffffd] u32=[0x7fffffff, 0x7ffffffd] s32=[0x7fffffff, 0x7ffffffd] var_off=(0x7fffffff, 0x0)
REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x7fffffff, 0x7ffffffd] s64=[0x7fffffff, 0x7ffffffd] u32=[0x7fffffff, 0x7ffffffd] s32=[0x7fffffff, 0x7ffffffd] var_off=(0x7fffffff, 0x0)
REG INVARIANTS VIOLATION (false_reg2): const tnum out of sync with range bounds u64=[0x0, 0xffffffffffffffff] s64=[0x8000000000000000, 0x7fffffffffffffff] u32=[0x0, 0xffffffff] s32=[0x80000000, 0x7fffffff] var_off=(0x7fffffff, 0x0)
19: R6_w=0x7fffffff
20: (95) exit
from 19 to 21: R0=0x7fffffff R6=scalar(smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x7ffffffe,var_off=(0x2; 0x7ffffffd)) R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
21: R0=0x7fffffff R6=scalar(smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x7ffffffe,var_off=(0x2; 0x7ffffffd)) R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
21: (14) w6 -= 2147483632 ; R6_w=scalar(smin=umin=umin32=2,smax=umax=0xffffffff,smin32=0x80000012,smax32=14,var_off=(0x2; 0xfffffffd))
22: (76) if w6 s>= 0xe goto pc+1 ; R6_w=scalar(smin=umin=umin32=2,smax=umax=0xffffffff,smin32=0x80000012,smax32=13,var_off=(0x2; 0xfffffffd))
23: (95) exit
from 22 to 24: R0=0x7fffffff R6_w=14 R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
24: R0=0x7fffffff R6_w=14 R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
24: (14) w6 -= 14 ; R6_w=0
[...]
What can be seen here is a register invariant violation on line 19. After
the binary-or in line 18, the verifier knows that bit 2 is set but knows
nothing about the rest of the content which was loaded from a map value,
meaning, range is [2,0x7fffffff] with var_off=(0x2; 0x7ffffffd). When in
line 19 the verifier analyzes the branch, it splits the register states
in reg_set_min_max() into the registers of the true branch (true_reg1,
true_reg2) and the registers of the false branch (false_reg1, false_reg2).
Since the test is w6 != 0x7ffffffd, the src_reg is a known constant.
Internally, the verifier creates a "fake" register initialized as scalar
to the value of 0x7ffffffd, and then passes it onto reg_set_min_max(). Now,
for line 19, it is mathematically impossible to take the false branch of
this program, yet the verifier analyzes it. It is impossible because the
second bit of r6 will be set due to the prior or operation and the
constant in the condition has that bit unset (hex(fd) == binary(1111 1101).
When the verifier first analyzes the false / fall-through branch, it will
compute an intersection between the var_off of r6 and of the constant. This
is because the verifier creates a "fake" register initialized to the value
of the constant. The intersection result later refines both registers in
regs_refine_cond_op():
[...]
t = tnum_intersect(tnum_subreg(reg1->var_off), tnum_subreg(reg2->var_off));
reg1->var_off = tnum_with_subreg(reg1->var_off, t);
reg2->var_off = tnum_with_subreg(reg2->var_off, t);
[...]
Since the verifier is analyzing the false branch of the conditional jump,
reg1 is equal to false_reg1 and reg2 is equal to false_reg2, i.e. the reg2
is the "fake" register that was meant to hold a constant value. The resulting
var_off of the intersection says that both registers now hold a known value
of var_off=(0x7fffffff, 0x0) or in other words: this operation manages to
make the verifier think that the "constant" value that was passed in the
jump operation now holds a different value.
Normally this would not be an issue since it should not influence the true
branch, however, false_reg2 and true_reg2 are pointers to the same "fake"
register. Meaning, the false branch can influence the results of the true
branch. In line 24, the verifier assumes R6_w=0, but the actual runtime
value in this case is 1. The fix is simply not passing in the same "fake"
register location as inputs to reg_set_min_max(), but instead making a
copy. Moving the fake_reg into the env also reduces stack consumption by
120 bytes. With this, the verifier successfully rejects invalid accesses
from the test program.
[0] https://github.com/google/buzzer
Fixes: 67420501e868 ("bpf: generalize reg_set_min_max() to handle non-const register comparisons")
Reported-by: Juan José López Jaimez <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Previously, kfunc declarations in bpf_kfuncs.h (and others) used "user
facing" types for kfuncs prototypes while the actual kfunc definitions
used "kernel facing" types. More specifically: bpf_dynptr vs
bpf_dynptr_kern, __sk_buff vs sk_buff, and xdp_md vs xdp_buff.
It wasn't an issue before, as the verifier allows aliased types.
However, since we are now generating kfunc prototypes in vmlinux.h (in
addition to keeping bpf_kfuncs.h around), this conflict creates
compilation errors.
Fix this conflict by using "user facing" types in kfunc definitions.
This results in more casts, but otherwise has no additional runtime
cost.
Note, similar to 5b268d1ebcdc ("bpf: Have bpf_rdonly_cast() take a const
pointer"), we also make kfuncs take const arguments where appropriate in
order to make the kfunc more permissive.
Signed-off-by: Daniel Xu <[email protected]>
Link: https://lore.kernel.org/r/b58346a63a0e66bc9b7504da751b526b0b189a67.1718207789.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Currently, if a kfunc accepts a projection type as an argument (eg
struct __sk_buff *), the caller must exactly provide exactly the same
type with provable provenance.
However in practice, kfuncs that accept projection types _must_ cast to
the underlying type before use b/c projection type layouts are
completely made up. Thus, it is ok to relax the verifier rules around
implicit conversions.
We will use this functionality in the next commit when we align kfuncs
to user-facing types.
Signed-off-by: Daniel Xu <[email protected]>
Link: https://lore.kernel.org/r/e2c025cb09ccfd4af1ec9e18284dc3cecff7514d.1718207789.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
We will soon be generating kfunc prototypes from BTF. As part of that,
we need to align the manual signatures in bpf_kfuncs.h with the actual
kfunc definitions. There is currently a conflicting signature for
bpf_session_cookie() w.r.t. return type.
The original intent was to return long * and not __u64 *. You can see
evidence of that intent in a3a5113393cc ("selftests/bpf: Add kprobe
session cookie test").
Fix conflict by changing kfunc definition.
Fixes: 5c919acef851 ("bpf: Add support for kprobe session cookie")
Signed-off-by: Daniel Xu <[email protected]>
Link: https://lore.kernel.org/r/7043e1c251ab33151d6e3830f8ea1902ed2604ac.1718207789.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The kprobes and synth event generation test modules add events and lock
(get a reference) those event file reference in module init function,
and unlock and delete it in module exit function. This is because those
are designed for playing as modules.
If we make those modules as built-in, those events are left locked in the
kernel, and never be removed. This causes kprobe event self-test failure
as below.
[ 97.349708] ------------[ cut here ]------------
[ 97.353453] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:2133 kprobe_trace_self_tests_init+0x3f1/0x480
[ 97.357106] Modules linked in:
[ 97.358488] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.9.0-g699646734ab5-dirty #14
[ 97.361556] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 97.363880] RIP: 0010:kprobe_trace_self_tests_init+0x3f1/0x480
[ 97.365538] Code: a8 24 08 82 e9 ae fd ff ff 90 0f 0b 90 48 c7 c7 e5 aa 0b 82 e9 ee fc ff ff 90 0f 0b 90 48 c7 c7 2d 61 06 82 e9 8e fd ff ff 90 <0f> 0b 90 48 c7 c7 33 0b 0c 82 89 c6 e8 6e 03 1f ff 41 ff c7 e9 90
[ 97.370429] RSP: 0000:ffffc90000013b50 EFLAGS: 00010286
[ 97.371852] RAX: 00000000fffffff0 RBX: ffff888005919c00 RCX: 0000000000000000
[ 97.373829] RDX: ffff888003f40000 RSI: ffffffff8236a598 RDI: ffff888003f40a68
[ 97.375715] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 97.377675] R10: ffffffff811c9ae5 R11: ffffffff8120c4e0 R12: 0000000000000000
[ 97.379591] R13: 0000000000000001 R14: 0000000000000015 R15: 0000000000000000
[ 97.381536] FS: 0000000000000000(0000) GS:ffff88807dcc0000(0000) knlGS:0000000000000000
[ 97.383813] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 97.385449] CR2: 0000000000000000 CR3: 0000000002244000 CR4: 00000000000006b0
[ 97.387347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 97.389277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 97.391196] Call Trace:
[ 97.391967] <TASK>
[ 97.392647] ? __warn+0xcc/0x180
[ 97.393640] ? kprobe_trace_self_tests_init+0x3f1/0x480
[ 97.395181] ? report_bug+0xbd/0x150
[ 97.396234] ? handle_bug+0x3e/0x60
[ 97.397311] ? exc_invalid_op+0x1a/0x50
[ 97.398434] ? asm_exc_invalid_op+0x1a/0x20
[ 97.399652] ? trace_kprobe_is_busy+0x20/0x20
[ 97.400904] ? tracing_reset_all_online_cpus+0x15/0x90
[ 97.402304] ? kprobe_trace_self_tests_init+0x3f1/0x480
[ 97.403773] ? init_kprobe_trace+0x50/0x50
[ 97.404972] do_one_initcall+0x112/0x240
[ 97.406113] do_initcall_level+0x95/0xb0
[ 97.407286] ? kernel_init+0x1a/0x1a0
[ 97.408401] do_initcalls+0x3f/0x70
[ 97.409452] kernel_init_freeable+0x16f/0x1e0
[ 97.410662] ? rest_init+0x1f0/0x1f0
[ 97.411738] kernel_init+0x1a/0x1a0
[ 97.412788] ret_from_fork+0x39/0x50
[ 97.413817] ? rest_init+0x1f0/0x1f0
[ 97.414844] ret_from_fork_asm+0x11/0x20
[ 97.416285] </TASK>
[ 97.417134] irq event stamp: 13437323
[ 97.418376] hardirqs last enabled at (13437337): [<ffffffff8110bc0c>] console_unlock+0x11c/0x150
[ 97.421285] hardirqs last disabled at (13437370): [<ffffffff8110bbf1>] console_unlock+0x101/0x150
[ 97.423838] softirqs last enabled at (13437366): [<ffffffff8108e17f>] handle_softirqs+0x23f/0x2a0
[ 97.426450] softirqs last disabled at (13437393): [<ffffffff8108e346>] __irq_exit_rcu+0x66/0xd0
[ 97.428850] ---[ end trace 0000000000000000 ]---
And also, since we can not cleanup dynamic_event file, ftracetest are
failed too.
To avoid these issues, build these tests only as modules.
Link: https://lore.kernel.org/all/171811263754.85078.5877446624311852525.stgit@devnote2/
Fixes: 9fe41efaca08 ("tracing: Add synth event generation test module")
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-06-06
We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 50 files changed, 1887 insertions(+), 527 deletions(-).
The main changes are:
1) Add a user space notification mechanism via epoll when a struct_ops
object is getting detached/unregistered, from Kui-Feng Lee.
2) Big batch of BPF selftest refactoring for sockmap and BPF congctl
tests, from Geliang Tang.
3) Add BTF field (type and string fields, right now) iterator support
to libbpf instead of using existing callback-based approaches,
from Andrii Nakryiko.
4) Extend BPF selftests for the latter with a new btf_field_iter
selftest, from Alan Maguire.
5) Add new kfuncs for a generic, open-coded bits iterator,
from Yafang Shao.
6) Fix BPF selftests' kallsyms_find() helper under kernels configured
with CONFIG_LTO_CLANG_THIN, from Yonghong Song.
7) Remove a bunch of unused structs in BPF selftests,
from David Alan Gilbert.
8) Convert test_sockmap section names into names understood by libbpf
so it can deduce program type and attach type, from Jakub Sitnicki.
9) Extend libbpf with the ability to configure log verbosity
via LIBBPF_LOG_LEVEL environment variable, from Mykyta Yatsenko.
10) Fix BPF selftests with regards to bpf_cookie and find_vma flakiness
in nested VMs, from Song Liu.
11) Extend riscv32/64 JITs to introduce shift/add helpers to generate Zba
optimization, from Xiao Wang.
12) Enable BPF programs to declare arrays and struct fields with kptr,
bpf_rb_root, and bpf_list_head, from Kui-Feng Lee.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca
selftests/bpf: Add start_test helper in bpf_tcp_ca
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
libbpf: Auto-attach struct_ops BPF maps in BPF skeleton
selftests/bpf: Add btf_field_iter selftests
selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT
libbpf: Remove callback-based type/string BTF field visitor helpers
bpftool: Use BTF field iterator in btfgen
libbpf: Make use of BTF field iterator in BTF handling code
libbpf: Make use of BTF field iterator in BPF linker code
libbpf: Add BTF field iterator
selftests/bpf: Ignore .llvm.<hash> suffix in kallsyms_find()
selftests/bpf: Fix bpf_cookie and find_vma in nested VM
selftests/bpf: Test global bpf_list_head arrays.
selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types.
selftests/bpf: Test kptr arrays and kptrs in nested struct fields.
bpf: limit the number of levels of a nested struct type.
bpf: look into the types of the fields of a struct type recursively.
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash
when the boot CPU is nohz_full") the kernel no longer crashes, but there is
another problem.
In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
in smp_call_function_single().
Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
comment explains why this is safe (thanks Thomas!).
Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
"Fix race between perf_event_free_task() and perf_event_release_kernel()
that can result in missed wakeups and hung tasks"
* tag 'perf-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix missing wakeup when waiting for context reference
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/pensando/ionic/ionic_txrx.c
d9c04209990b ("ionic: Mark error paths in the data path as unlikely")
491aee894a08 ("ionic: fix kernel panic in XDP_TX action")
net/ipv6/ip6_fib.c
b4cb4a1391dc ("net: use unrcu_pointer() helper")
b01e1c030770 ("ipv6: fix possible race in __fib6_drop_pcpu_from()")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from BPF and big collection of fixes for WiFi core and
drivers.
Current release - regressions:
- vxlan: fix regression when dropping packets due to invalid src
addresses
- bpf: fix a potential use-after-free in bpf_link_free()
- xdp: revert support for redirect to any xsk socket bound to the
same UMEM as it can result in a corruption
- virtio_net:
- add missing lock protection when reading return code from
control_buf
- fix false-positive lockdep splat in DIM
- Revert "wifi: wilc1000: convert list management to RCU"
- wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config
Previous releases - regressions:
- rtnetlink: make the "split" NLM_DONE handling generic, restore the
old behavior for two cases where we started coalescing those
messages with normal messages, breaking sloppily-coded userspace
- wifi:
- cfg80211: validate HE operation element parsing
- cfg80211: fix 6 GHz scan request building
- mt76: mt7615: add missing chanctx ops
- ath11k: move power type check to ASSOC stage, fix connecting to
6 GHz AP
- ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
- rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
- iwlwifi: mvm: fix a crash on 7265
Previous releases - always broken:
- ncsi: prevent multi-threaded channel probing, a spec violation
- vmxnet3: disable rx data ring on dma allocation failure
- ethtool: init tsinfo stats if requested, prevent unintentionally
reporting all-zero stats on devices which don't implement any
- dst_cache: fix possible races in less common IPv6 features
- tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED
- ax25: fix two refcounting bugs
- eth: ionic: fix kernel panic in XDP_TX action
Misc:
- tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB"
* tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
selftests: net: lib: set 'i' as local
selftests: net: lib: avoid error removing empty netns name
selftests: net: lib: support errexit with busywait
net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
ipv6: fix possible race in __fib6_drop_pcpu_from()
af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
af_unix: Annotate data-races around sk->sk_sndbuf.
af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
af_unix: Annotate data-race of sk->sk_state in unix_accept().
af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
af_unix: Annodate data-races around sk->sk_state for writers.
af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
...
|
|
In our production environment, we found many hung tasks which are
blocked for more than 18 hours. Their call traces are like this:
[346278.191038] __schedule+0x2d8/0x890
[346278.191046] schedule+0x4e/0xb0
[346278.191049] perf_event_free_task+0x220/0x270
[346278.191056] ? init_wait_var_entry+0x50/0x50
[346278.191060] copy_process+0x663/0x18d0
[346278.191068] kernel_clone+0x9d/0x3d0
[346278.191072] __do_sys_clone+0x5d/0x80
[346278.191076] __x64_sys_clone+0x25/0x30
[346278.191079] do_syscall_64+0x5c/0xc0
[346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
[346278.191086] ? do_syscall_64+0x69/0xc0
[346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
[346278.191092] ? irqentry_exit+0x19/0x30
[346278.191095] ? exc_page_fault+0x89/0x160
[346278.191097] ? asm_exc_page_fault+0x8/0x30
[346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
The task was waiting for the refcount become to 1, but from the vmcore,
we found the refcount has already been 1. It seems that the task didn't
get woken up by perf_event_release_kernel() and got stuck forever. The
below scenario may cause the problem.
Thread A Thread B
... ...
perf_event_free_task perf_event_release_kernel
...
acquire event->child_mutex
...
get_ctx
... release event->child_mutex
acquire ctx->mutex
...
perf_free_event (acquire/release event->child_mutex)
...
release ctx->mutex
wait_var_event
acquire ctx->mutex
acquire event->child_mutex
# move existing events to free_list
release event->child_mutex
release ctx->mutex
put_ctx
... ...
In this case, all events of the ctx have been freed, so we couldn't
find the ctx in free_list and Thread A will miss the wakeup. It's thus
necessary to add a wakeup after dropping the reference.
Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")
Signed-off-by: Haifeng Xu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Limit the number of levels looking into struct types to avoid running out
of stack space.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The verifier has field information for specific special types, such as
kptr, rbtree root, and list head. These types are handled
differently. However, we did not previously examine the types of fields of
a struct type variable. Field information records were not generated for
the kptrs, rbtree roots, and linked_list heads that are not located at the
outermost struct type of a variable.
For example,
struct A {
struct task_struct __kptr * task;
};
struct B {
struct A mem_a;
}
struct B var_b;
It did not examine "struct A" so as not to generate field information for
the kptr in "struct A" for "var_b".
This patch enables BPF programs to define fields of these special types in
a struct type other than the direct type of a variable or in a struct type
that is the type of a field in the value type of a map.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The verifier uses field information for certain special types, such as
kptr, rbtree root, and list head. These types are treated
differently. However, we did not previously support these types in
arrays. This update examines arrays and duplicates field information the
same number of times as the length of the array if the element type is one
of the special types.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Move common code of the two functions to btf_find_field_one().
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
field->size has been initialized by bpf_parse_fields() with the value
returned by btf_field_type_size(). Use it instead of calling
btf_field_type_size() again.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
reg_find_field_offset() always return a btf_field with a matching offset
value. Checking the offset of the returned btf_field is unnecessary.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
After commit 1a80dbcb2dba, bpf_link can be freed by
link->ops->dealloc_deferred, but the code still tests and uses
link->ops->dealloc afterward, which leads to a use-after-free as
reported by syzbot. Actually, one of them should be sufficient, so
just call one of them instead of both. Also add a WARN_ON() in case
of any problematic implementation.
Fixes: 1a80dbcb2dba ("bpf: support deferring bpf_link dealloc to after RCU grace period")
Reported-by: [email protected]
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The iterator variable dst cannot be NULL and the if check can be removed.
Remove it and fix the following Coccinelle/coccicheck warning reported
by itnull.cocci:
ERROR: iterator variable bound on line 762 cannot be NULL
Signed-off-by: Thorsten Blum <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix a Kconfig bug regarding comparisons to 'm' or 'n'
- Replace missed $(srctree)/$(src)
- Fix unneeded kallsyms step 3
- Remove incorrect "compatible" properties from image nodes in
image.fit
- Improve gen_kheaders.sh
- Fix 'make dt_binding_check'
- Clean up unnecessary code
* tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
dt-bindings: kbuild: Fix dt_binding_check on unconfigured build
kheaders: use `command -v` to test for existence of `cpio`
kheaders: explicitly define file modes for archived headers
scripts/make_fit: Drop fdt image entry compatible string
kbuild: remove a stale comment about cleaning in link-vmlinux.sh
kbuild: fix short log for AS in link-vmlinux.sh
kbuild: change scripts/mksysmap into sed script
kbuild: avoid unneeded kallsyms step 3
kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src)
kconfig: remove redundant check in expr_join_or()
kconfig: fix comparison to constant symbols, 'm', 'n'
kconfig: remove unused expr_is_no()
|
|
The bpf_session_cookie is unavailable for !CONFIG_FPROBE as reported
by Sebastian [1].
To fix that we remove CONFIG_FPROBE ifdef for session kfuncs, which
is fine, because there's filter for session programs.
Then based on bpf_trace.o dependency:
obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
we add bpf_session_cookie BTF_ID in special_kfunc_set list dependency
on CONFIG_BPF_EVENTS.
[1] https://lore.kernel.org/bpf/[email protected]/T/#m71c6d5ec71db2967288cb79acedc15cc5dbfeec5
Reported-by: Sebastian Andrzej Siewior <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Suggested-by: Alexei Starovoitov <[email protected]>
Fixes: 5c919acef8514 ("bpf: Add support for kprobe session cookie")
Signed-off-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_classifier.c
abd5576b9c57 ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
56a5cf538c3f ("net: ti: icssg-prueth: Fix start counter for ft1 filter")
https://lore.kernel.org/all/[email protected]/
No other adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- dma-mapping benchmark error handling fixes (Fedor Pchelkin)
- correct a config symbol reference in the DMA API documentation (Lukas
Bulwahn)
* tag 'dma-mapping-6.10-2024-05-31' of git://git.infradead.org/users/hch/dma-mapping:
Documentation/core-api: correct reference to SWIOTLB_DYNAMIC
dma-mapping: benchmark: handle NUMA_NO_NODE correctly
dma-mapping: benchmark: fix node id validation
dma-mapping: benchmark: avoid needless copy_to_user if benchmark fails
dma-mapping: benchmark: fix up kthread-related error handling
|
|
bpf_link_inc_not_zero() will be used by kernel modules. We will use it in
bpf_testmod.c later.
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Add epoll support to bpf struct_ops links to trigger EPOLLHUP event upon
detachment.
This patch implements the "poll" of the "struct file_operations" for BPF
links and introduces a new "poll" operator in the "struct bpf_link_ops". By
implementing "poll" of "struct bpf_link_ops" for the links of struct_ops,
the file descriptor of a struct_ops link can be added to an epoll file
descriptor to receive EPOLLHUP events.
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Implement the detach callback in bpf_link_ops for struct_ops so that user
programs can detach a struct_ops link. The subsystems that struct_ops
objects are registered to can also use this callback to detach the links
being passed to them.
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|