aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-09-30Merge tag 'dma-mapping-6.6-2023-09-30' of ↵Linus Torvalds1-9/+22
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - fix the narea calculation in swiotlb initialization (Ross Lagerwall) - fix the check whether a device has used swiotlb (Petr Tesarik) * tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix the check whether a device has used software IO TLB swiotlb: use the calculated number of areas
2023-09-30bpf: Use kmalloc_size_roundup() to adjust size_indexHou Tao1-25/+19
Commit d52b59315bf5 ("bpf: Adjust size_index according to the value of KMALLOC_MIN_SIZE") uses KMALLOC_MIN_SIZE to adjust size_index, but as reported by Nathan, the adjustment is not enough, because __kmalloc_minalign() also decides the minimal alignment of slab object as shown in new_kmalloc_cache() and its value may be greater than KMALLOC_MIN_SIZE (e.g., 64 bytes vs 8 bytes under a riscv QEMU VM). Instead of invoking __kmalloc_minalign() in bpf subsystem to find the maximal alignment, just using kmalloc_size_roundup() directly to get the corresponding slab object size for each allocation size. If these two sizes are unmatched, adjust size_index to select a bpf_mem_cache with unit_size equal to the object_size of the underlying slab cache for the allocation size. Fixes: 822fb26bdb55 ("bpf: Add a hint to allocated objects.") Reported-by: Nathan Chancellor <[email protected]> Closes: https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Hou Tao <[email protected]> Tested-by: Emil Renner Berthing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-29Crash: add lock to serialize crash hotplug handlingBaoquan He1-0/+17
Eric reported that handling corresponding crash hotplug event can be failed easily when many memory hotplug event are notified in a short period. They failed because failing to take __kexec_lock. ======= [ 78.714569] Fallback order for Node 0: 0 [ 78.714575] Built 1 zonelists, mobility grouping on. Total pages: 1817886 [ 78.717133] Policy zone: Normal [ 78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate [ 78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate [ 80.056643] PEFILE: Unsigned PE binary ======= The memory hotplug events are notified very quickly and very many, while the handling of crash hotplug is much slower relatively. So the atomic variable __kexec_lock and kexec_trylock() can't guarantee the serialization of crash hotplug handling. Here, add a new mutex lock __crash_hotplug_lock to serialize crash hotplug handling specifically. This doesn't impact the usage of __kexec_lock. Link: https://lkml.kernel.org/r/[email protected] Fixes: 247262756121 ("crash: add generic infrastructure for crash hotplug support") Signed-off-by: Baoquan He <[email protected]> Tested-by: Eric DeVolder <[email protected]> Reviewed-by: Eric DeVolder <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Cc: Sourabh Jain <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-09-29bpf, mprog: Fix maximum program check on mprog attachmentDaniel Borkmann1-0/+3
After Paul's recent improvement to syzkaller to improve coverage for bpf_mprog and tcx, it hit a splat that the program limit was surpassed. What happened is that the maximum number of progs got added, followed by another prog add request which adds with BPF_F_BEFORE flag relative to the last program in the array. The idx >= bpf_mprog_max() check in bpf_mprog_attach() still passes because the index is below the maximum but the maximum will be surpassed. We need to add a check upfront for insertions to catch this situation. Fixes: 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs") Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Co-developed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: [email protected] Tested-by: [email protected] Link: https://github.com/google/syzkaller/pull/4207 Link: https://lore.kernel.org/bpf/[email protected]
2023-09-28sched/rt: Fix live lock between select_fallback_rq() and RT pushJoel Fernandes (Google)1-0/+1
During RCU-boost testing with the TREE03 rcutorture config, I found that after a few hours, the machine locks up. On tracing, I found that there is a live lock happening between 2 CPUs. One CPU has an RT task running, while another CPU is being offlined which also has an RT task running. During this offlining, all threads are migrated. The migration thread is repeatedly scheduled to migrate actively running tasks on the CPU being offlined. This results in a live lock because select_fallback_rq() keeps picking the CPU that an RT task is already running on only to get pushed back to the CPU being offlined. It is anyway pointless to pick CPUs for pushing tasks to if they are being offlined only to get migrated away to somewhere else. This could also add unwanted latency to this task. Fix these issues by not selecting CPUs in RT if they are not 'active' for scheduling, using the cpu_active_mask. Other parts in core.c already use cpu_active_mask to prevent tasks from being put on CPUs going offline. With this fix I ran the tests for days and could not reproduce the hang. Without the patch, I hit it in a few hours. Signed-off-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2023-09-27swiotlb: fix the check whether a device has used software IO TLBPetr Tesarik1-6/+20
When CONFIG_SWIOTLB_DYNAMIC=y, devices which do not use the software IO TLB can avoid swiotlb lookup. A flag is added by commit 1395706a1490 ("swiotlb: search the software IO TLB only if the device makes use of it"), the flag is correctly set, but it is then never checked. Add the actual check here. Note that this code is an alternative to the default pool check, not an additional check, because: 1. swiotlb_find_pool() also searches the default pool; 2. if dma_uses_io_tlb is false, the default swiotlb pool is not used. Tested in a KVM guest against a QEMU RAM-backed SATA disk over virtio and *not* using software IO TLB, this patch increases IOPS by approx 2% for 4-way parallel I/O. The write memory barrier in swiotlb_dyn_alloc() is not needed, because a newly allocated pool must always be observed by swiotlb_find_slots() before an address from that pool is passed to is_swiotlb_buffer(). Correctness was verified using the following litmus test: C swiotlb-new-pool (* * Result: Never * * Check that a newly allocated pool is always visible when the * corresponding swiotlb buffer is visible. *) { mem_pools = default; } P0(int **mem_pools, int *pool) { /* add_mem_pool() */ WRITE_ONCE(*pool, 999); rcu_assign_pointer(*mem_pools, pool); } P1(int **mem_pools, int *flag, int *buf) { /* swiotlb_find_slots() */ int *r0; int r1; rcu_read_lock(); r0 = READ_ONCE(*mem_pools); r1 = READ_ONCE(*r0); rcu_read_unlock(); if (r1) { WRITE_ONCE(*flag, 1); smp_mb(); } /* device driver (presumed) */ WRITE_ONCE(*buf, r1); } P2(int **mem_pools, int *flag, int *buf) { /* device driver (presumed) */ int r0 = READ_ONCE(*buf); /* is_swiotlb_buffer() */ int r1; int *r2; int r3; smp_rmb(); r1 = READ_ONCE(*flag); if (r1) { /* swiotlb_find_pool() */ rcu_read_lock(); r2 = READ_ONCE(*mem_pools); r3 = READ_ONCE(*r2); rcu_read_unlock(); } } exists (2:r0<>0 /\ 2:r3=0) (* Not found. *) Fixes: 1395706a1490 ("swiotlb: search the software IO TLB only if the device makes use of it") Reported-by: Jonathan Corbet <[email protected]> Closes: https://lore.kernel.org/linux-iommu/[email protected]/ Signed-off-by: Petr Tesarik <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-09-26Merge tag 'wq-for-6.6-rc3-fixes' of ↵Linus Torvalds1-6/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: - Remove double allocation of wq_update_pod_attrs_buf - Fix missing allocation of pwq_release_worker when wq_cpu_intensive_thresh_us is set to a custom value * tag 'wq-for-6.6-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix missed pwq_release_worker creation in wq_cpu_intensive_thresh_init() workqueue: Removed double allocation of wq_update_pod_attrs_buf
2023-09-25bpf: Count missed stats in trace_call_bpfJiri Olsa1-0/+3
Increase misses stats in case bpf array execution is skipped because of recursion check in trace_call_bpf. Adding bpf_prog_inc_misses_counters that increase misses counts for all bpf programs in bpf_prog_array. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Song Liu <[email protected]> Reviewed-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-09-25bpf: Add missed value to kprobe perf link infoJiri Olsa3-11/+22
Add missed value to kprobe attached through perf link info to hold the stats of missed kprobe handler execution. The kprobe's missed counter gets incremented when kprobe handler is not executed due to another kprobe running on the same cpu. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-09-25bpf: Add missed value to kprobe_multi link infoJiri Olsa1-0/+1
Add missed value to kprobe_multi link info to hold the stats of missed kprobe_multi probe. The missed counter gets incremented when fprobe fails the recursion check or there's no rethook available for return probe. In either case the attached bpf program is not executed. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Song Liu <[email protected]> Reviewed-by: Song Liu <[email protected]> Acked-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-09-25bpf: Count stats for kprobe_multi programsJiri Olsa1-0/+1
Adding support to gather missed stats for kprobe_multi programs due to bpf_prog_active protection. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Song Liu <[email protected]> Reviewed-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-09-24Merge tag 'trace-v6.6-rc2' of ↵Linus Torvalds1-13/+15
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix the "bytes" output of the per_cpu stat file The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the accounting was not accurate. It is suppose to show how many used bytes are still in the ring buffer, but even when the ring buffer was empty it would still show there were bytes used. - Fix a bug in eventfs where reading a dynamic event directory (open) and then creating a dynamic event that goes into that diretory screws up the accounting. On close, the newly created event dentry will get a "dput" without ever having a "dget" done for it. The fix is to allocate an array on dir open to save what dentries were actually "dget" on, and what ones to "dput" on close. * tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Remember what dentries were created on dir open ring-buffer: Fix bytes info in per_cpu buffer stats
2023-09-23Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of ↵Linus Torvalds2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes, 10 of which pertain to post-6.5 issues. The other three are cc:stable" * tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: proc: nommu: fix empty /proc/<pid>/maps filemap: add filemap_map_order0_folio() to handle order0 folio proc: nommu: /proc/<pid>/maps: release mmap read lock mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement pidfd: prevent a kernel-doc warning argv_split: fix kernel-doc warnings scatterlist: add missing function params to kernel-doc selftests/proc: fixup proc-empty-vm test after KSM changes revert "scripts/gdb/symbols: add specific ko module load command" selftests: link libasan statically for tests with -fsanitize=address task_work: add kerneldoc annotation for 'data' argument mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
2023-09-22ring-buffer: Fix bytes info in per_cpu buffer statsZheng Yejian1-13/+15
The 'bytes' info in file 'per_cpu/cpu<X>/stats' means the number of bytes in cpu buffer that have not been consumed. However, currently after consuming data by reading file 'trace_pipe', the 'bytes' info was not changed as expected. # cat per_cpu/cpu0/stats entries: 0 overrun: 0 commit overrun: 0 bytes: 568 <--- 'bytes' is problematical !!! oldest event ts: 8651.371479 now ts: 8653.912224 dropped events: 0 read events: 8 The root cause is incorrect stat on cpu_buffer->read_bytes. To fix it: 1. When stat 'read_bytes', account consumed event in rb_advance_reader(); 2. When stat 'entries_bytes', exclude the discarded padding event which is smaller than minimum size because it is invisible to reader. Then use rb_page_commit() instead of BUF_PAGE_SIZE at where accounting for page-based read/remove/overrun. Also correct the comments of ring_buffer_bytes_cpu() in this patch. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Fixes: c64e148a3be3 ("trace: Add ring buffer stats to measure rate of events") Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-09-22Merge tag 'sched-urgent-2023-09-22' of ↵Linus Torvalds2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a PF_IDLE initialization bug that generated warnings on tiny-RCU" * tag 'sched-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/sched: Modify initial boot task idle setup
2023-09-21bpf: Disable zero-extension for BPF_MEMSXIlya Leoshkevich1-1/+1
On the architectures that use bpf_jit_needs_zext(), e.g., s390x, the verifier incorrectly inserts a zero-extension after BPF_MEMSX, leading to miscompilations like the one below: 24: 89 1a ff fe 00 00 00 00 "r1 = *(s16 *)(r10 - 2);" # zext_dst set 0x3ff7fdb910e: lgh %r2,-2(%r13,%r0) # load halfword 0x3ff7fdb9114: llgfr %r2,%r2 # wrong! 25: 65 10 00 03 00 00 7f ff if r1 s> 32767 goto +3 <l0_1> # check_cond_jmp_op() Disable such zero-extensions. The JITs need to insert sign-extension themselves, if necessary. Suggested-by: Puranjay Mohan <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Puranjay Mohan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni11-36/+184
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <[email protected]>
2023-09-21Merge tag 'net-6.6-rc3' of ↵Linus Torvalds6-20/+142
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE - netfilter: fix entries val in rule reset audit log - eth: stmmac: fix incorrect rxq|txq_stats reference Previous releases - regressions: - ipv4: fix null-deref in ipv4_link_failure - netfilter: - fix several GC related issues - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP - eth: team: fix null-ptr-deref when team device type is changed - eth: i40e: fix VF VLAN offloading when port VLAN is configured - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB Previous releases - always broken: - core: fix ETH_P_1588 flow dissector - mptcp: fix several connection hang-up conditions - bpf: - avoid deadlock when using queue and stack maps from NMI - add override check to kprobe multi link attach - hsr: properly parse HSRv1 supervisor frames. - eth: igc: fix infinite initialization loop with early XDP redirect - eth: octeon_ep: fix tx dma unmap len values in SG - eth: hns3: fix GRE checksum offload issue" * tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast() igc: Expose tx-usecs coalesce setting to user octeontx2-pf: Do xdp_do_flush() after redirects. bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI net: ena: Flush XDP packets on error. net/handshake: Fix memory leak in __sock_create() and sock_alloc_file() net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev' netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP netfilter: nf_tables: fix memleak when more than 255 elements expired netfilter: nf_tables: disable toggling dormant table state more than once vxlan: Add missing entries to vxlan_get_size() net: rds: Fix possible NULL-pointer dereference team: fix null-ptr-deref when team device type is changed net: bridge: use DEV_STATS_INC() net: hns3: add 5ms delay before clear firmware reset irq source net: hns3: fix fail to delete tc flower rules during reset issue net: hns3: only enable unicast promisc when mac table full net: hns3: fix GRE checksum offload issue net: hns3: add cmdq check for vf periodic service task net: stmmac: fix incorrect rxq|txq_stats reference ...
2023-09-20bpf: unconditionally reset backtrack_state masks on global func exitAndrii Nakryiko1-5/+3
In mark_chain_precision() logic, when we reach the entry to a global func, it is expected that R1-R5 might be still requested to be marked precise. This would correspond to some integer input arguments being tracked as precise. This is all expected and handled as a special case. What's not expected is that we'll leave backtrack_state structure with some register bits set. This is because for subsequent precision propagations backtrack_state is reused without clearing masks, as all code paths are carefully written in a way to leave empty backtrack_state with zeroed out masks, for speed. The fix is trivial, we always clear register bit in the register mask, and then, optionally, set reg->precise if register is SCALAR_VALUE type. Reported-by: Chris Mason <[email protected]> Fixes: be2ef8161572 ("bpf: allow precision tracking for programs with subprogs") Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-19pidfd: prevent a kernel-doc warningRandy Dunlap1-1/+1
Change the comment to match the function name that the SYSCALL_DEFINE() macros generate to prevent a kernel-doc warning. kernel/pid.c:628: warning: expecting prototype for pidfd_open(). Prototype was for sys_pidfd_open() instead Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Cc: Christian Brauner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-09-19task_work: add kerneldoc annotation for 'data' argumentJens Axboe1-0/+1
A previous commit changed the arguments to task_work_cancel_match(), but didn't document all of them. Link: https://lkml.kernel.org/r/[email protected] Fixes: c7aab1a7c52b ("task_work: add helper for more targeted task_work canceling") Signed-off-by: Jens Axboe <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Acked-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-09-19bpf: Remove unused variables.Alexei Starovoitov1-5/+1
Remove unused prev_offset, min_size, krec_size variables. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: aaa619ebccb2 ("bpf: Refactor check_btf_func and split into two phases") Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-19bpf: Fix bpf_throw warning on 32-bit archKumar Kartikeya Dwivedi1-1/+1
On 32-bit architectures, the pointer width is 32-bit, while we try to cast from a u64 down to it, the compiler complains on mismatch in integer size. Fix this by first casting to long which should match the pointer width on targets supported by Linux. Fixes: ec5290a178b7 ("bpf: Prevent KASAN false positive with bpf_throw") Reported-by: Matthieu Baerts <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Tested-by: Matthieu Baerts <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-19kernel/sched: Modify initial boot task idle setupLiam R. Howlett2-1/+2
Initial booting is setting the task flag to idle (PF_IDLE) by the call path sched_init() -> init_idle(). Having the task idle and calling call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be set. Subsequent calls to any cond_resched() will enable IRQs, potentially earlier than the IRQ setup has completed. Recent changes have caused just this scenario and IRQs have been enabled early. This causes a warning later in start_kernel() as interrupts are enabled before they are fully set up. Fix this issue by setting the PF_IDLE flag later in the boot sequence. Although the boot task was marked as idle since (at least) d80e4fda576d, I am not sure that it is wrong to do so. The forced context-switch on idle task was introduced in the tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee. Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task") Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
2023-09-18workqueue: Fix missed pwq_release_worker creation in ↵Zqiang1-3/+3
wq_cpu_intensive_thresh_init() Currently, if the wq_cpu_intensive_thresh_us is set to specific value, will cause the wq_cpu_intensive_thresh_init() early exit and missed creation of pwq_release_worker. this commit therefore create the pwq_release_worker in advance before checking the wq_cpu_intensive_thresh_us. Signed-off-by: Zqiang <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Fixes: 967b494e2fd1 ("workqueue: Use a kthread_worker to release pool_workqueues")
2023-09-18workqueue: Removed double allocation of wq_update_pod_attrs_bufSteven Rostedt (Google)1-3/+0
First commit 2930155b2e272 ("workqueue: Initialize unbound CPU pods later in the boot") added the initialization of wq_update_pod_attrs_buf to workqueue_init_early(), and then latter on, commit 84193c07105c6 ("workqueue: Generalize unbound CPU pods") added it as well. This appeared in a kmemleak run where the second allocation made the first allocation leak. Fixes: 84193c07105c6 ("workqueue: Generalize unbound CPU pods") Signed-off-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-09-17Merge tag 'sched-urgent-2023-09-17' of ↵Linus Torvalds1-2/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Fix a performance regression on large SMT systems, an Intel SMT4 balancing bug, and a topology setup bug on (Intel) hybrid processors" * tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain sched/fair: Fix SMT4 group_smt_balance handling sched/fair: Optimize should_we_balance() for large SMT systems
2023-09-17Merge tag 'core-urgent-2023-09-17' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull WARN fix from Ingo Molnar: "Fix a missing preempt-enable in the WARN() slowpath" * tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: panic: Reenable preemption in WARN slowpath
2023-09-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller10-168/+733
Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your *net-next* tree. We've added 73 non-merge commits during the last 9 day(s) which contain a total of 79 files changed, 5275 insertions(+), 600 deletions(-). The main changes are: 1) Basic BTF validation in libbpf, from Andrii Nakryiko. 2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi. 3) next_thread cleanups, from Oleg Nesterov. 4) Add mcpu=v4 support to arm32, from Puranjay Mohan. 5) Add support for __percpu pointers in bpf progs, from Yonghong Song. 6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang. 7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probabablity, from Hou Tao. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman", Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle), Song Liu, Stanislav Fomichev, Yonghong Song ==================== Signed-off-by: David S. Miller <[email protected]>
2023-09-16bpf: Fix kfunc callback register type handlingKumar Kartikeya Dwivedi1-0/+4
The kfunc code to handle KF_ARG_PTR_TO_CALLBACK does not check the reg type before using reg->subprogno. This can accidently permit invalid pointers from being passed into callback helpers (e.g. silently from different paths). Likewise, reg->subprogno from the per-register type union may not be meaningful either. We need to reject any other type except PTR_TO_FUNC. Acked-by: Dave Marchevsky <[email protected]> Fixes: 5d92ddc3de1b ("bpf: Add callback validation to kfunc verifier logic") Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16bpf: Disallow fentry/fexit/freplace for exception callbacksKumar Kartikeya Dwivedi2-0/+7
During testing, it was discovered that extensions to exception callbacks had no checks, upon running a testcase, the kernel ended up running off the end of a program having final call as bpf_throw, and hitting int3 instructions. The reason is that while the default exception callback would have reset the stack frame to return back to the main program's caller, the replacing extension program will simply return back to bpf_throw, which will instead return back to the program and the program will continue execution, now in an undefined state where anything could happen. The way to support extensions to an exception callback would be to mark the BPF_PROG_TYPE_EXT main subprog as an exception_cb, and prevent it from calling bpf_throw. This would make the JIT produce a prologue that restores saved registers and reset the stack frame. But let's not do that until there is a concrete use case for this, and simply disallow this for now. Similar issues will exist for fentry and fexit cases, where trampoline saves data on the stack when invoking exception callback, which however will then end up resetting the stack frame, and on return, the fexit program will never will invoked as the return address points to the main program's caller in the kernel. Instead of additional complexity and back and forth between the two stacks to enable such a use case, simply forbid it. One key point here to note is that currently X86_TAIL_CALL_OFFSET didn't require any modifications, even though we emit instructions before the corresponding endbr64 instruction. This is because we ensure that a main subprog never serves as an exception callback, and therefore the exception callback (which will be a global subprog) can never serve as the tail call target, eliminating any discrepancies. However, once we support a BPF_PROG_TYPE_EXT to also act as an exception callback, it will end up requiring change to the tail call offset to account for the extra instructions. For simplicitly, tail calls could be disabled for such targets. Noting the above, it appears better to wait for a concrete use case before choosing to permit extension programs to replace exception callbacks. As a precaution, we disable fentry and fexit for exception callbacks as well. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16bpf: Detect IP == ksym.end as part of BPF programKumar Kartikeya Dwivedi1-1/+5
Now that bpf_throw kfunc is the first such call instruction that has noreturn semantics within the verifier, this also kicks in dead code elimination in unprecedented ways. For one, any instruction following a bpf_throw call will never be marked as seen. Moreover, if a callchain ends up throwing, any instructions after the call instruction to the eventually throwing subprog in callers will also never be marked as seen. The tempting way to fix this would be to emit extra 'int3' instructions which bump the jited_len of a program, and ensure that during runtime when a program throws, we can discover its boundaries even if the call instruction to bpf_throw (or to subprogs that always throw) is emitted as the final instruction in the program. An example of such a program would be this: do_something(): ... r0 = 0 exit foo(): r1 = 0 call bpf_throw r0 = 0 exit bar(cond): if r1 != 0 goto pc+2 call do_something exit call foo r0 = 0 // Never seen by verifier exit // main(ctx): r1 = ... call bar r0 = 0 exit Here, if we do end up throwing, the stacktrace would be the following: bpf_throw foo bar main In bar, the final instruction emitted will be the call to foo, as such, the return address will be the subsequent instruction (which the JIT emits as int3 on x86). This will end up lying outside the jited_len of the program, thus, when unwinding, we will fail to discover the return address as belonging to any program and end up in a panic due to the unreliable stack unwinding of BPF programs that we never expect. To remedy this case, make bpf_prog_ksym_find treat IP == ksym.end as part of the BPF program, so that is_bpf_text_address returns true when such a case occurs, and we are able to unwind reliably when the final instruction ends up being a call instruction. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16bpf: Prevent KASAN false positive with bpf_throwKumar Kartikeya Dwivedi1-0/+6
The KASAN stack instrumentation when CONFIG_KASAN_STACK is true poisons the stack of a function when it is entered and unpoisons it when leaving. However, in the case of bpf_throw, we will never return as we switch our stack frame to the BPF exception callback. Later, this discrepancy will lead to confusing KASAN splats when kernel resumes execution on return from the BPF program. Fix this by unpoisoning everything below the stack pointer of the BPF program, which should cover the range that would not be unpoisoned. An example splat is below: BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x14e/0x170 Write of size 8 at addr ffffc900013af958 by task test_progs/227 CPU: 0 PID: 227 Comm: test_progs Not tainted 6.5.0-rc2-g43f1c6c9052a-dirty #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-2.fc39 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 print_report+0xcf/0x670 ? arch_stack_walk+0x79/0x100 kasan_report+0xda/0x110 ? stack_trace_consume_entry+0x14e/0x170 ? stack_trace_consume_entry+0x14e/0x170 ? __pfx_stack_trace_consume_entry+0x10/0x10 stack_trace_consume_entry+0x14e/0x170 ? __sys_bpf+0xf2e/0x41b0 arch_stack_walk+0x8b/0x100 ? __sys_bpf+0xf2e/0x41b0 ? bpf_prog_test_run_skb+0x341/0x1c70 ? bpf_prog_test_run_skb+0x341/0x1c70 stack_trace_save+0x9b/0xd0 ? __pfx_stack_trace_save+0x10/0x10 ? __kasan_slab_free+0x109/0x180 ? bpf_prog_test_run_skb+0x341/0x1c70 ? __sys_bpf+0xf2e/0x41b0 ? __x64_sys_bpf+0x78/0xc0 ? do_syscall_64+0x3c/0x90 ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 kasan_save_stack+0x33/0x60 ? kasan_save_stack+0x33/0x60 ? kasan_set_track+0x25/0x30 ? kasan_save_free_info+0x2b/0x50 ? __kasan_slab_free+0x109/0x180 ? kmem_cache_free+0x191/0x460 ? bpf_prog_test_run_skb+0x341/0x1c70 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x50 __kasan_slab_free+0x109/0x180 kmem_cache_free+0x191/0x460 bpf_prog_test_run_skb+0x341/0x1c70 ? __pfx_bpf_prog_test_run_skb+0x10/0x10 ? __fget_light+0x51/0x220 __sys_bpf+0xf2e/0x41b0 ? __might_fault+0xa2/0x170 ? __pfx___sys_bpf+0x10/0x10 ? lock_release+0x1de/0x620 ? __might_fault+0xcd/0x170 ? __pfx_lock_release+0x10/0x10 ? __pfx_blkcg_maybe_throttle_current+0x10/0x10 __x64_sys_bpf+0x78/0xc0 ? syscall_enter_from_user_mode+0x20/0x50 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f0fbb38880d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 45 12 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe13907de8 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 00007ffe13908708 RCX: 00007f0fbb38880d RDX: 0000000000000050 RSI: 00007ffe13907e20 RDI: 000000000000000a RBP: 00007ffe13907e00 R08: 0000000000000000 R09: 00007ffe13907e20 R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000003 R13: 0000000000000000 R14: 00007f0fbb532000 R15: 0000000000cfbd90 </TASK> The buggy address belongs to stack of task test_progs/227 KASAN internal error: frame info validation failed; invalid marker: 0 The buggy address belongs to the virtual mapping at [ffffc900013a8000, ffffc900013b1000) created by: kernel_clone+0xcd/0x600 The buggy address belongs to the physical page: page:00000000b70f4332 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11418f flags: 0x2fffe0000000000(node=0|zone=2|lastcpupid=0x7fff) page_type: 0xffffffff() raw: 02fffe0000000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffffc900013af800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffc900013af880: 00 00 00 f1 f1 f1 f1 00 00 00 f3 f3 f3 f3 f3 00 >ffffc900013af900: 00 00 00 00 00 00 00 00 00 00 00 f1 00 00 00 00 ^ ffffc900013af980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffc900013afa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Disabling lock debugging due to kernel taint Cc: Andrey Ryabinin <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Vincenzo Frascino <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Acked-by: Andrey Konovalov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16bpf: Treat first argument as return value for bpf_throwKumar Kartikeya Dwivedi1-13/+24
In case of the default exception callback, change the behavior of bpf_throw, where the passed cookie value is no longer ignored, but is instead the return value of the default exception callback. As such, we need to place restrictions on the value being passed into bpf_throw in such a case, only allowing those permitted by the check_return_code function. Thus, bpf_throw can now control the return value of the program from each call site without having the user install a custom exception callback just to override the return value when an exception is thrown. We also modify the hidden subprog instructions to now move BPF_REG_1 to BPF_REG_0, so as to set the return value before exit in the default callback. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16bpf: Perform CFG walk for exception callbackKumar Kartikeya Dwivedi1-2/+13
Since exception callbacks are not referenced using bpf_pseudo_func and bpf_pseudo_call instructions, check_cfg traversal will never explore instructions of the exception callback. Even after adding the subprog, the program will then fail with a 'unreachable insn' error. We thus need to begin walking from the start of the exception callback again in check_cfg after a complete CFG traversal finishes, so as to explore the CFG rooted at the exception callback. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16bpf: Add support for custom exception callbacksKumar Kartikeya Dwivedi2-16/+126
By default, the subprog generated by the verifier to handle a thrown exception hardcodes a return value of 0. To allow user-defined logic and modification of the return value when an exception is thrown, introduce the 'exception_callback:' declaration tag, which marks a callback as the default exception handler for the program. The format of the declaration tag is 'exception_callback:<value>', where <value> is the name of the exception callback. Each main program can be tagged using this BTF declaratiion tag to associate it with an exception callback. In case the tag is absent, the default callback is used. As such, the exception callback cannot be modified at runtime, only set during verification. Allowing modification of the callback for the current program execution at runtime leads to issues when the programs begin to nest, as any per-CPU state maintaing this information will have to be saved and restored. We don't want it to stay in bpf_prog_aux as this takes a global effect for all programs. An alternative solution is spilling the callback pointer at a known location on the program stack on entry, and then passing this location to bpf_throw as a parameter. However, since exceptions are geared more towards a use case where they are ideally never invoked, optimizing for this use case and adding to the complexity has diminishing returns. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16bpf: Refactor check_btf_func and split into two phasesKumar Kartikeya Dwivedi1-28/+100
This patch splits the check_btf_info's check_btf_func check into two separate phases. The first phase sets up the BTF and prepares func_info, but does not perform any validation of required invariants for subprogs just yet. This is left to the second phase, which happens where check_btf_info executes currently, and performs the line_info and CO-RE relocation. The reason to perform this split is to obtain the userspace supplied func_info information before we perform the add_subprog call, where we would now require finding and adding subprogs that may not have a bpf_pseudo_call or bpf_pseudo_func instruction in the program. We require this as we want to enable userspace to supply exception callbacks that can override the default hidden subprogram generated by the verifier (which performs a hardcoded action). In such a case, the exception callback may never be referenced in an instruction, but will still be suitably annotated (by way of BTF declaration tags). For finding this exception callback, we would require the program's BTF information, and the supplied func_info information which maps BTF type IDs to subprograms. Since the exception callback won't actually be referenced through instructions, later checks in check_cfg and do_check_subprogs will not verify the subprog. This means that add_subprog needs to add them in the add_subprog_and_kfunc phase before we move forward, which is why the BTF and func_info are required at that point. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16bpf: Implement BPF exceptionsKumar Kartikeya Dwivedi3-15/+141
This patch implements BPF exceptions, and introduces a bpf_throw kfunc to allow programs to throw exceptions during their execution at runtime. A bpf_throw invocation is treated as an immediate termination of the program, returning back to its caller within the kernel, unwinding all stack frames. This allows the program to simplify its implementation, by testing for runtime conditions which the verifier has no visibility into, and assert that they are true. In case they are not, the program can simply throw an exception from the other branch. BPF exceptions are explicitly *NOT* an unlikely slowpath error handling primitive, and this objective has guided design choices of the implementation of the them within the kernel (with the bulk of the cost for unwinding the stack offloaded to the bpf_throw kfunc). The implementation of this mechanism requires use of add_hidden_subprog mechanism introduced in the previous patch, which generates a couple of instructions to move R1 to R0 and exit. The JIT then rewrites the prologue of this subprog to take the stack pointer and frame pointer as inputs and reset the stack frame, popping all callee-saved registers saved by the main subprog. The bpf_throw function then walks the stack at runtime, and invokes this exception subprog with the stack and frame pointers as parameters. Reviewers must take note that currently the main program is made to save all callee-saved registers on x86_64 during entry into the program. This is because we must do an equivalent of a lightweight context switch when unwinding the stack, therefore we need the callee-saved registers of the caller of the BPF program to be able to return with a sane state. Note that we have to additionally handle r12, even though it is not used by the program, because when throwing the exception the program makes an entry into the kernel which could clobber r12 after saving it on the stack. To be able to preserve the value we received on program entry, we push r12 and restore it from the generated subprogram when unwinding the stack. For now, bpf_throw invocation fails when lingering resources or locks exist in that path of the program. In a future followup, bpf_throw will be extended to perform frame-by-frame unwinding to release lingering resources for each stack frame, removing this limitation. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16bpf: Implement support for adding hidden subprogsKumar Kartikeya Dwivedi3-10/+40
Introduce support in the verifier for generating a subprogram and include it as part of a BPF program dynamically after the do_check phase is complete. The first user will be the next patch which generates default exception callbacks if none are set for the program. The phase of invocation will be do_misc_fixups. Note that this is an internal verifier function, and should be used with instruction blocks which uphold the invariants stated in check_subprogs. Since these subprogs are always appended to the end of the instruction sequence of the program, it becomes relatively inexpensive to do the related adjustments to the subprog_info of the program. Only the fake exit subprogram is shifted forward, making room for our new subprog. This is useful to insert a new subprogram, get it JITed, and obtain its function pointer. The next patch will use this functionality to insert a default exception callback which will be invoked after unwinding the stack. Note that these added subprograms are invisible to userspace, and never reported in BPF_OBJ_GET_INFO_BY_ID etc. For now, only a single subprogram is supported, but more can be easily supported in the future. To this end, two function counts are introduced now, the existing func_cnt, and real_func_cnt, the latter including hidden programs. This allows us to conver the JIT code to use the real_func_cnt for management of resources while syscall path continues working with existing func_cnt. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16arch/x86: Implement arch_bpf_stack_walkKumar Kartikeya Dwivedi1-0/+9
The plumbing for offline unwinding when we throw an exception in programs would require walking the stack, hence introduce a new arch_bpf_stack_walk function. This is provided when the JIT supports exceptions, i.e. bpf_jit_supports_exceptions is true. The arch-specific code is really minimal, hence it should be straightforward to extend this support to other architectures as well, as it reuses the logic of arch_stack_walk, but allowing access to unwind_state data. Once the stack pointer and frame pointer are known for the main subprog during the unwinding, we know the stack layout and location of any callee-saved registers which must be restored before we return back to the kernel. This handling will be added in the subsequent patches. Note that while we primarily unwind through BPF frames, which are effectively CONFIG_UNWINDER_FRAME_POINTER, we still need one of this or CONFIG_UNWINDER_ORC to be able to unwind through the bpf_throw frame from which we begin walking the stack. We also require both sp and bp (stack and frame pointers) from the unwind_state structure, which are only available when one of these two options are enabled. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller6-20/+142
Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your *net* tree. We've added 21 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 450 insertions(+), 36 deletions(-). The main changes are: 1) Adjust bpf_mem_alloc buckets to match ksize(), from Hou Tao. 2) Check whether override is allowed in kprobe mult, from Jiri Olsa. 3) Fix btf_id symbol generation with ld.lld, from Jiri and Nick. 4) Fix potential deadlock when using queue and stack maps from NMI, from Toke Høiland-Jørgensen. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Biju Das, Björn Töpel, Dan Carpenter, Daniel Borkmann, Eduard Zingerman, Hsin-Wei Hung, Marcus Seyfarth, Nathan Chancellor, Satya Durga Srinivasu Prabhala, Song Liu, Stephen Rothwell ==================== Signed-off-by: David S. Miller <[email protected]>
2023-09-15Merge tag 'pm-6.6-rc2' of ↵Linus Torvalds3-14/+16
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Fix the handling of block devices in the test_resume mode of hibernation (Chen Yu)" * tag 'pm-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: hibernate: Fix the exclusive get block device in test_resume mode PM: hibernate: Rename function parameter from snapshot_test to exclusive
2023-09-15bpf: Allow to use kfunc XDP hints and frags togetherLarysa Zaremba1-1/+8
There is no fundamental reason, why multi-buffer XDP and XDP kfunc RX hints cannot coexist in a single program. Allow those features to be used together by modifying the flags condition for dev-bound-only programs, segments are still prohibited for fully offloaded programs, hence additional check. Suggested-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/CAKH8qBuzgtJj=OKMdsxEkyML36VsAuZpcrsXcyqjdKXSJCBq=Q@mail.gmail.com/ Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-09-15bpf: expose information about supported xdp metadata kfuncStanislav Fomichev1-1/+1
Add new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers), the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink. Example output on veth: $ ip link add veth0 type veth peer name veth1 # ifndex == 12 $ ./tools/net/ynl/samples/netdev 12 Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12 veth1[12] xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0 Cc: [email protected] Cc: Willem de Bruijn <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-09-15bpf: make it easier to add new metadata kfuncStanislav Fomichev1-4/+5
No functional changes. Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc, move kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx. This way, any time new kfunc is added, we don't have to touch bpf_dev_bound_resolve_kfunc. Also document XDP_METADATA_KFUNC_xxx arguments since we now have more than two and it might be confusing what is what. Cc: [email protected] Cc: Willem de Bruijn <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-09-15bpf: Fix uprobe_multi get_pid_task error pathJiri Olsa1-1/+3
Dan reported Smatch static checker warning due to missing error value set in uprobe multi link's get_pid_task error path. Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Jiri Olsa <[email protected]> Reviewed-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-15bpf: Skip unit_size checking for global per-cpu allocatorHou Tao1-0/+7
For global per-cpu allocator, the size of free object in free list doesn't match with unit_size and now there is no way to get the size of per-cpu pointer saved in free object, so just skip the checking. Reported-by: Stephen Rothwell <[email protected]> Closes: https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Hou Tao <[email protected]> Tested-by: Biju Das <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-15panic: Reenable preemption in WARN slowpathLukas Wunner1-0/+1
Commit: 5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG") amended warn_slowpath_fmt() to disable preemption until the WARN splat has been emitted. However the commit neglected to reenable preemption in the !fmt codepath, i.e. when a WARN splat is emitted without additional format string. One consequence is that users may see more splats than intended. E.g. a WARN splat emitted in a work item results in at least two extra splats: BUG: workqueue leaked lock or atomic (emitted by process_one_work()) BUG: scheduling while atomic (emitted by worker_thread() -> schedule()) Ironically the point of the commit was to *avoid* extra splats. ;) Fix it. Fixes: 5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG") Signed-off-by: Lukas Wunner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Paul E. McKenney <[email protected]> Link: https://lore.kernel.org/r/3ec48fde01e4ee6505f77908ba351bad200ae3d1.1694763684.git.lukas@wunner.de
2023-09-14bpf: Charge modmem for struct_ops trampolineSong Liu1-4/+22
Current code charges modmem for regular trampoline, but not for struct_ops trampoline. Add bpf_jit_[charge|uncharge]_modmem() to struct_ops so the trampoline is charged in both cases. Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-09-13Merge tag 'trace-v6.6-rc1' of ↵Linus Torvalds6-30/+88
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Add missing LOCKDOWN checks for eventfs callers When LOCKDOWN is active for tracing, it causes inconsistent state when some functions succeed and others fail. - Use dput() to free the top level eventfs descriptor There was a race between accesses and freeing it. - Fix a long standing bug that eventfs exposed due to changing timings by dynamically creating files. That is, If a event file is opened for an instance, there's nothing preventing the instance from being removed which will make accessing the files cause use-after-free bugs. - Fix a ring buffer race that happens when iterating over the ring buffer while writers are active. Check to make sure not to read the event meta data if it's beyond the end of the ring buffer sub buffer. - Fix the print trigger that disappeared because the test to create it was looking for the event dir field being filled, but now it has the "ef" field filled for the eventfs structure. - Remove the unused "dir" field from the event structure. - Fix the order of the trace_dynamic_info as it had it backwards for the offset and len fields for which one was for which endianess. - Fix NULL pointer dereference with eventfs_remove_rec() If an allocation fails in one of the eventfs_add_*() functions, the caller of it in event_subsystem_dir() or event_create_dir() assigns the result to the structure. But it's assigning the ERR_PTR and not NULL. This was passed to eventfs_remove_rec() which expects either a good pointer or a NULL, not ERR_PTR. The fix is to not assign the ERR_PTR to the structure, but to keep it NULL on error. - Fix list_for_each_rcu() to use list_for_each_srcu() in dcache_dir_open_wrapper(). One iteration of the code used RCU but because it had to call sleepable code, it had to be changed to use SRCU, but one of the iterations was missed. - Fix synthetic event print function to use "as_u64" instead of passing in a pointer to the union. To fix big/little endian issues, the u64 that represented several types was turned into a union to define the types properly. * tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec() tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper() tracing/synthetic: Print out u64 values properly tracing/synthetic: Fix order of struct trace_dynamic_info selftests/ftrace: Fix dependencies for some of the synthetic event tests tracing: Remove unused trace_event_file dir field tracing: Use the new eventfs descriptor for print trigger ring-buffer: Do not attempt to read past "commit" tracefs/eventfs: Free top level files on removal ring-buffer: Avoid softlockup in ring_buffer_resize() tracing: Have event inject files inc the trace array ref count tracing: Have option files inc the trace array ref count tracing: Have current_trace inc the trace array ref count tracing: Have tracing_max_latency inc the trace array ref count tracing: Increase trace array ref count on enable and filter files tracefs/eventfs: Use dput to free the toplevel events directory tracefs/eventfs: Add missing lockdown checks tracefs: Add missing lockdown check to tracefs_create_dir()