path: root/kernel
Age | Commit message | Author | Files | Lines
2024-06-28 | rcu: Remove full memory barrier on boot time eqs sanity check | Frederic Weisbecker | 1 | -1/+1

When the boot CPU initializes the per-CPU data on behalf of all possible
CPUs, a sanity check is performed on each of them to make sure none is
initialized in an extended quiescent state. This check involves a full
memory barrier, which is useless at this early boot stage. Do a plain
access instead.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>

2024-06-28 | rcu/exp: Remove superfluous full memory barrier upon first EQS snapshot | Frederic Weisbecker | 1 | -1/+15

When the grace period kthread checks the extended quiescent state
counter of a CPU, full ordering is necessary to ensure that either:

* If the GP kthread observes the remote target in an extended quiescent
  state, then that target must observe all accesses prior to the
  current grace period, including the current grace period sequence
  number, once it exits that extended quiescent state.

or:

* If the GP kthread observes the remote target NOT in an extended
  quiescent state, then the target subsequently entering an extended
  quiescent state must observe all accesses prior to the current grace
  period, including the current grace period sequence number, once it
  enters that extended quiescent state.

This ordering is enforced through a full memory barrier placed right
before taking the first EQS snapshot. However this is superfluous,
because the snapshot is taken while holding the target's rnp lock, which
provides the necessary ordering through its chain of
smp_mb__after_unlock_lock().

Remove the needless explicit barrier before the snapshot and add a
comment about the implicit barrier newly relied upon here.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

2024-06-28 | rcu: Remove superfluous full memory barrier upon first EQS snapshot | Frederic Weisbecker | 1 | -1/+12

When the grace period kthread checks the extended quiescent state
counter of a CPU, full ordering is necessary to ensure that either:

* If the GP kthread observes the remote target in an extended quiescent
  state, then that target must observe all accesses prior to the
  current grace period, including the current grace period sequence
  number, once it exits that extended quiescent state.

or:

* If the GP kthread observes the remote target NOT in an extended
  quiescent state, then the target subsequently entering an extended
  quiescent state must observe all accesses prior to the current grace
  period, including the current grace period sequence number, once it
  enters that extended quiescent state.

This ordering is enforced through a full memory barrier placed right
before taking the first EQS snapshot. However this is superfluous,
because the snapshot is taken while holding the target's rnp lock, which
provides the necessary ordering through its chain of
smp_mb__after_unlock_lock().

Remove the needless explicit barrier before the snapshot and add a
comment about the implicit barrier newly relied upon here.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

2024-06-28 | rcu: Remove full ordering on second EQS snapshot | Frederic Weisbecker | 1 | -1/+9

When the grace period kthread checks the extended quiescent state
counter of a CPU, full ordering is necessary to ensure that either:

* If the GP kthread observes the remote target in an extended quiescent
  state, then that target must observe all accesses prior to the
  current grace period, including the current grace period sequence
  number, once it exits that extended quiescent state. Also the GP
  kthread must observe all accesses performed by the target prior to it
  entering the EQS.

or:

* If the GP kthread observes the remote target NOT in an extended
  quiescent state, then the target subsequently entering an extended
  quiescent state must observe all accesses prior to the current grace
  period, including the current grace period sequence number, once it
  enters that extended quiescent state. Also the GP kthread later
  observing that EQS must also observe all accesses performed by the
  target prior to it entering the EQS.

This ordering is explicitly performed both on the first EQS snapshot and
on the second one as well through the combination of a preceding full
barrier followed by an acquire read. However the second snapshot's full
memory barrier is redundant and not needed to enforce the above
guarantees:

    GP kthread                       Remote target
    ----------                       -------------
    // Access prior GP
    WRITE_ONCE(A, 1)
    // first snapshot
    smp_mb()
    x = smp_load_acquire(EQS)
                                     // Access prior GP
                                     WRITE_ONCE(B, 1)
                                     // EQS enter
                                     // implied full barrier by atomic_add_return()
                                     atomic_add_return(RCU_DYNTICKS_IDX, EQS)
                                     // implied full barrier by atomic_add_return()
                                     READ_ONCE(A)
    // second snapshot
    y = smp_load_acquire(EQS)
    z = READ_ONCE(B)

If the GP kthread above fails to observe the remote target in EQS
(x not in EQS), the remote target will observe A == 1 after further
entering the EQS. Then the second snapshot taken by the GP kthread only
needs to be an acquire read in order to observe z == 1.

Therefore remove the needless full memory barrier on the second
snapshot.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

2024-06-27 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 4 | -28/+131

Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:
  e3f02f32a050 ("ionic: fix kernel panic due to multi-buffer handling")
  d9c04209990b ("ionic: Mark error paths in the data path as unlikely")

Signed-off-by: Jakub Kicinski <[email protected]>

2024-06-27 | sched_ext: add CONFIG_DEBUG_INFO_BTF dependency | Andrea Righi | 1 | -1/+1

Without BTF, attempting to load any sched_ext scheduler will result in
an error like the following:

  libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?

This makes sched_ext pretty much unusable, so explicitly depend on
CONFIG_DEBUG_INFO_BTF to prevent these issues.

Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

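For reference, the fix boils down to a Kconfig dependency along these
lines (a sketch; the surrounding option text and its other dependencies
are abridged, not quoted from the patch):

  config SCHED_CLASS_EXT
          bool "Extensible Scheduling Class"
          depends on BPF_SYSCALL && BPF_JIT && DEBUG_INFO_BTF
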
2024-06-27 | sched_ext: fix typo in set_weight() description | Andrea Righi | 1 | -1/+1

Correct "eight" to "weight" in the description of the .set_weight()
operation in sched_ext_ops.

Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

2024-06-27 | Merge tag 'asm-generic-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic | Linus Torvalds | 5 | -37/+33

Pull asm-generic fixes from Arnd Bergmann:
 "These are some bugfixes for system call ABI issues I found while
  working on a cleanup series. None of these are urgent since these
  bugs have gone unnoticed for many years, but I think we probably want
  to backport them all to stable kernels, so it makes sense to have the
  fixes included as early as possible.

  One more fix addresses a compile-time warning in kallsyms that was
  uncovered by a patch I did to enable additional warnings in 6.10. I
  had mistakenly thought that this fix was already merged through the
  module tree, but as Geert pointed out it was still missing"

* tag 'asm-generic-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  kallsyms: rework symbol lookup return codes
  linux/syscalls.h: add missing __user annotations
  syscalls: mmap(): use unsigned offset type consistently
  s390: remove native mmap2() syscall
  hexagon: fix fadvise64_64 calling conventions
  csky, hexagon: fix broken sys_sync_file_range
  sh: rework sync_file_range ABI
  powerpc: restore some missing spu syscalls
  parisc: use generic sys_fanotify_mark implementation
  parisc: use correct compat recv/recvfrom syscalls
  sparc: fix compat recv/recvfrom syscalls
  sparc: fix old compat_sys_select()
  syscalls: fix compat_sys_io_pgetevents_time64 usage
  ftruncate: pass a signed offset

2024-06-27 | Merge tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds | 3 | -11/+97

Pull networking fixes from Paolo Abeni:
 "Including fixes from can, bpf and netfilter.

  There are a bunch of regressions addressed here, but hopefully nothing
  spectacular. We are still waiting for the driver fix from Intel,
  mentioned by Jakub in the previous networking pull.

  Current release - regressions:

   - core: add softirq safety to netdev_rename_lock

   - tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed
     TFO

   - batman-adv: fix RCU race at module unload time

  Previous releases - regressions:

   - openvswitch: get related ct labels from its master if it is not
     confirmed

   - eth: bonding: fix incorrect software timestamping report

   - eth: mlxsw: fix memory corruptions on spectrum-4 systems

   - eth: ionic: use dev_consume_skb_any outside of napi

  Previous releases - always broken:

   - netfilter: fully validate NFT_DATA_VALUE on store to data registers

   - unix: several fixes for OoB data

   - tcp: fix race for duplicate reqsk on identical SYN

   - bpf:
      - fix may_goto with negative offset
      - fix the corner case with may_goto and jump to the 1st insn
      - fix overrunning reservations in ringbuf

   - can:
      - j1939: recover socket queue on CAN bus error during BAM
        transmission
      - mcp251xfd: fix infinite loop when xmit fails

   - dsa: microchip: monitor potential faults in half-duplex mode

   - eth: vxlan: pull inner IP header in vxlan_xmit_one()

   - eth: ionic: fix kernel panic due to multi-buffer handling

  Misc:

   - selftest: unix tests refactor and a lot of new cases added"

* tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
  net: mana: Fix possible double free in error handling path
  selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c.
  af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head.
  selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c
  selftest: af_unix: Check SIGURG after every send() in msg_oob.c
  selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c
  af_unix: Don't stop recv() at consumed ex-OOB skb.
  selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c.
  af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.
  af_unix: Stop recv(MSG_PEEK) at consumed OOB skb.
  selftest: af_unix: Add msg_oob.c.
  selftest: af_unix: Remove test_unix_oob.c.
  tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset()
  netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
  net: usb: qmi_wwan: add Telit FN912 compositions
  tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
  ionic: use dev_consume_skb_any outside of napi
  net: dsa: microchip: fix wrong register write when masking interrupt
  Fix race for duplicate reqsk on identical SYN
  ibmvnic: Add tx check to prevent skb leak
  ...

2024-06-27 | kallsyms: rework symbol lookup return codes | Arnd Bergmann | 4 | -36/+32

Building with W=1 in some configurations produces a false positive
warning for kallsyms:

  kernel/kallsyms.c: In function '__sprint_symbol.isra':
  kernel/kallsyms.c:503:17: error: 'strcpy' source argument is the same as destination [-Werror=restrict]
    503 |                 strcpy(buffer, name);
        |                 ^~~~~~~~~~~~~~~~~~~~

This originally showed up while building with -O3, but later started
happening in other configurations as well, depending on inlining
decisions. The underlying issue is that the local 'name' variable is
always initialized to be the same as 'buffer' in the called functions
that fill the buffer, which gcc notices while inlining, even though it
could also see that the address check always skips the copy.

The calling conventions here are rather unusual, as all of the internal
lookup functions (bpf_address_lookup, ftrace_mod_address_lookup,
ftrace_func_address_lookup, module_address_lookup and
kallsyms_lookup_buildid) already use the provided buffer and either
return the address of that buffer to indicate success, or NULL for
failure, but the callers are written to also expect an arbitrary other
buffer to be returned.

Rework the calling conventions to return the length of the filled buffer
instead of its address, which is simpler and easier to follow as well as
avoiding the warning. Leave only the kallsyms_lookup() calling
conventions unchanged, since that is called from 16 different functions
and adapting this would be a much bigger change.

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
Tested-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

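As an illustration of the convention change (a hedged sketch with the
parameter lists trimmed; the real functions take several more
arguments):

  /* Before: return the buffer address on success, NULL on failure,
   * leaving callers to compare pointers against their own buffer. */
  const char *module_address_lookup(unsigned long addr, char *namebuf);

  /* After: return the length of the string written into namebuf,
   * or 0 on failure. */
  int module_address_lookup(unsigned long addr, char *namebuf);
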
2024-06-26 | Merge tag 'wq-for-6.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq | Linus Torvalds | 1 | -17/+34

Pull workqueue fixes from Tejun Heo:
 "Two patches to fix kworker name formatting"

* tag 'wq-for-6.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Increase worker desc's length to 32
  workqueue: Refactor worker ID formatting and make wq_worker_comm() use full ID string

2024-06-26 | bpf: add missing check_func_arg_reg_off() to prevent out-of-bounds memory accesses | Matt Bobrowski | 1 | -6/+11

Currently, it's possible to pass in a modified CONST_PTR_TO_DYNPTR to a
global function as an argument. The adverse effect of this is that BPF
helpers can continue to make use of this modified CONST_PTR_TO_DYNPTR
from within the context of the global function, which can
unintentionally result in out-of-bounds memory accesses and therefore
compromise overall system stability, i.e.:

  [ 244.157771] BUG: KASAN: slab-out-of-bounds in bpf_dynptr_data+0x137/0x140
  [ 244.161345] Read of size 8 at addr ffff88810914be68 by task test_progs/302
  [ 244.167151] CPU: 0 PID: 302 Comm: test_progs Tainted: G O E 6.10.0-rc3-00131-g66b586715063 #533
  [ 244.174318] Call Trace:
  [ 244.175787] <TASK>
  [ 244.177356] dump_stack_lvl+0x66/0xa0
  [ 244.179531] print_report+0xce/0x670
  [ 244.182314] ? __virt_addr_valid+0x200/0x3e0
  [ 244.184908] kasan_report+0xd7/0x110
  [ 244.187408] ? bpf_dynptr_data+0x137/0x140
  [ 244.189714] ? bpf_dynptr_data+0x137/0x140
  [ 244.192020] bpf_dynptr_data+0x137/0x140
  [ 244.194264] bpf_prog_b02a02fdd2bdc5fa_global_call_bpf_dynptr_data+0x22/0x26
  [ 244.198044] bpf_prog_b0fe7b9d7dc3abde_callback_adjust_bpf_dynptr_reg_off+0x1f/0x23
  [ 244.202136] bpf_user_ringbuf_drain+0x2c7/0x570
  [ 244.204744] ? 0xffffffffc0009e58
  [ 244.206593] ? __pfx_bpf_user_ringbuf_drain+0x10/0x10
  [ 244.209795] bpf_prog_33ab33f6a804ba2d_user_ringbuf_callback_const_ptr_to_dynptr_reg_off+0x47/0x4b
  [ 244.215922] bpf_trampoline_6442502480+0x43/0xe3
  [ 244.218691] __x64_sys_prlimit64+0x9/0xf0
  [ 244.220912] do_syscall_64+0xc1/0x1d0
  [ 244.223043] entry_SYSCALL_64_after_hwframe+0x77/0x7f
  [ 244.226458] RIP: 0033:0x7ffa3eb8f059
  [ 244.228582] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 1d 0d 00 f7 d8 64 89 01 48
  [ 244.241307] RSP: 002b:00007ffa3e9c6eb8 EFLAGS: 00000206 ORIG_RAX: 000000000000012e
  [ 244.246474] RAX: ffffffffffffffda RBX: 00007ffa3e9c7cdc RCX: 00007ffa3eb8f059
  [ 244.250478] RDX: 00007ffa3eb162b4 RSI: 0000000000000000 RDI: 00007ffa3e9c7fb0
  [ 244.255396] RBP: 00007ffa3e9c6ed0 R08: 00007ffa3e9c76c0 R09: 0000000000000000
  [ 244.260195] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffff80
  [ 244.264201] R13: 000000000000001c R14: 00007ffc5d6b4260 R15: 00007ffa3e1c7000
  [ 244.268303] </TASK>

Add a check_func_arg_reg_off() call to the path in which the BPF
verifier verifies the arguments of global functions, specifically those
which take an argument of type ARG_PTR_TO_DYNPTR | MEM_RDONLY. Also,
process_dynptr_func() doesn't appear to perform any explicit and strict
type matching on the supplied register type, so let's also enforce that
a register of type PTR_TO_STACK or CONST_PTR_TO_DYNPTR is supplied by
the caller.

Reported-by: Kumar Kartikeya Dwivedi <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Matt Bobrowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

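Conceptually, the added validation looks something like this (a hedged
sketch, not the exact verifier diff; check_func_arg_reg_off() and
verbose() are existing verifier helpers, the surrounding context is
assumed):

  /* When verifying a global subprog argument declared as
   * ARG_PTR_TO_DYNPTR | MEM_RDONLY, reject registers with a modified
   * offset and enforce the expected register types. */
  err = check_func_arg_reg_off(env, reg, regno, ARG_PTR_TO_DYNPTR);
  if (err)
          return err;

  if (reg->type != PTR_TO_STACK && reg->type != CONST_PTR_TO_DYNPTR) {
          verbose(env, "arg#%d expected pointer to stack or const ptr to dynptr\n",
                  regno);
          return -EINVAL;
  }
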
2024-06-25 | workqueue: Improve scalability of workqueue watchdog touch | Nicholas Piggin | 1 | -2/+8

On a ~2000 CPU powerpc system, hard lockups have been observed in the
workqueue code when stop_machine runs (in this case due to CPU hotplug).
This is due to lots of CPUs spinning in multi_cpu_stop, calling
touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
and that can find itself in the same cacheline as other important
workqueue data, which slows down operations to the point of lockups.

In the case of the following abridged trace, worker_pool_idr was in the
hot cacheline, causing the lockups to always appear at idr_find:

  watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
  Call Trace:
    get_work_pool
    __queue_work
    call_timer_fn
    run_timer_softirq
    __do_softirq
    do_softirq_own_stack
    irq_exit
    timer_interrupt
    decrementer_common_virt
  * interrupt: 900 (timer) at multi_cpu_stop
    multi_cpu_stop
    cpu_stopper_thread
    smpboot_thread_fn
    kthread

Fix this by having wq_watchdog_touch() only write to the line if the
time since the last recorded touch exceeds 1/4 of the watchdog
threshold.

Reported-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

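The idea, as a minimal sketch (variable names follow kernel/workqueue.c,
but the exact form of the patch may differ):

  void wq_watchdog_touch(int cpu)
  {
          unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
          unsigned long touch_ts = READ_ONCE(wq_watchdog_touched);
          unsigned long now = jiffies;

          if (cpu >= 0)
                  per_cpu(wq_watchdog_touched_cpu, cpu) = now;

          /* Don't hammer the shared cacheline from thousands of CPUs:
           * only publish a new global timestamp once the previous one
           * is at least a quarter of the watchdog threshold old. */
          if (now - touch_ts >= thresh / 4)
                  WRITE_ONCE(wq_watchdog_touched, now);
  }
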
2024-06-25 | workqueue: wq_watchdog_touch is always called with valid CPU | Nicholas Piggin | 1 | -0/+2

Warn in the case it is called with cpu == -1. This does not appear to
happen anywhere.

Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

2024-06-25 | hrtimer: Prevent queuing of hrtimer without a function callback | Phil Chang | 1 | -0/+2

The hrtimer function callback must not be NULL. It has to be specified
by the caller, but it is not validated by the hrtimer code. When a
hrtimer is queued without a function callback, the kernel crashes with a
null pointer dereference when trying to execute the callback in
__run_hrtimer().

Introduce a validation before queuing the hrtimer in
hrtimer_start_range_ns().

[anna-maria: Rephrase commit message]

Signed-off-by: Phil Chang <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Anna-Maria Behnsen <[email protected]>

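A minimal sketch of such a check (the placement at the top of the
function is an assumption, not quoted from the patch):

  void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
                              u64 delta_ns, const enum hrtimer_mode mode)
  {
          /* Refuse to queue a timer whose callback can never run. */
          if (WARN_ON_ONCE(!timer->function))
                  return;

          /* ... existing enqueue path ... */
  }
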
2024-06-25 | syscalls: fix compat_sys_io_pgetevents_time64 usage | Arnd Bergmann | 1 | -1/+1

Using sys_io_pgetevents() as the entry point for compat mode tasks works
almost correctly, but misses the sign extension for the min_nr and nr
arguments.

This was addressed on parisc by switching to
compat_sys_io_pgetevents_time64() in commit 6431e92fc827 ("parisc:
io_pgetevents_time64() needs compat syscall in 32-bit compat mode"), as
well as by using more sophisticated system call wrappers on x86 and
s390. However, arm64, mips, powerpc, sparc and riscv still have the same
bug.

Change all of them over to use compat_sys_io_pgetevents_time64() like
parisc already does. This was clearly the intention when the function
was originally added, but it got hooked up incorrectly in the tables.

Cc: [email protected]
Fixes: 48166e6ea47d ("y2038: add 64-bit time_t syscalls to all 32-bit architectures")
Acked-by: Heiko Carstens <[email protected]> # s390
Signed-off-by: Arnd Bergmann <[email protected]>

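On architectures that use syscall.tbl, the corrected hookup looks
roughly like the row below (treat the exact entry number and row layout
as an assumption; 416 is the generic io_pgetevents_time64 slot from the
y2038 series):

  # <number> <abi> <name>               <native entry>      <compat entry>
  416        32    io_pgetevents_time64 sys_io_pgetevents   compat_sys_io_pgetevents_time64
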
2024-06-25Revert "printk: Save console options for add_preferred_console_match()"Greg Kroah-Hartman4-164/+4
This reverts commit f03e8c1060f86c23eb49bafee99d9fcbd1c1bd77. Let's roll back all of the serial core and printk console changes that went into 6.10-rc1 as there still are problems with them that need to be sorted out. Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1 Reported-by: Petr Mladek <[email protected]> Reported-by: Tony Lindgren <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: John Ogness <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ilpo Järvinen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-06-25Revert "printk: Don't try to parse DEVNAME:0.0 console options"Greg Kroah-Hartman1-4/+0
This reverts commit 8a831c584e6e80cf68f79893dc395c16cdf47dc8. Let's roll back all of the serial core and printk console changes that went into 6.10-rc1 as there still are problems with them that need to be sorted out. Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1 Reported-by: Petr Mladek <[email protected]> Reported-by: Tony Lindgren <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: John Ogness <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ilpo Järvinen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-06-25Revert "printk: Flag register_console() if console is set on command line"Greg Kroah-Hartman1-4/+1
This reverts commit b73c9cbe4f1fc02645228aa575998dd54067f8ef. Let's roll back all of the serial core and printk console changes that went into 6.10-rc1 as there still are problems with them that need to be sorted out. Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1 Reported-by: Petr Mladek <[email protected]> Reported-by: Tony Lindgren <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: John Ogness <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ilpo Järvinen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-06-24 | hung_task: ignore hung_task_warnings when hung_task_panic is enabled | Yongliang Gao | 1 | -1/+1

If hung_task_panic is enabled, don't consider the value of
hung_task_warnings and display the information of the hung tasks.

In some cases, hung_task_panic might not be initially set up; after
several hung tasks occur, the hung_task_warnings count reaches zero. If
hung_task_panic is set up later, it may not display any helpful hung
task info in dmesg, only showing messages like:

  Kernel panic - not syncing: hung_task: blocked tasks
  CPU: 3 PID: 58 Comm: khungtaskd Not tainted 6.10.0-rc3 #19
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Call Trace:
   <TASK>
   panic+0x2f3/0x320
   watchdog+0x2dd/0x510
   ? __pfx_watchdog+0x10/0x10
   kthread+0xe0/0x110
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x2f/0x40
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yongliang Gao <[email protected]>
Reviewed-by: Huang Cun <[email protected]>
Cc: Joel Granados <[email protected]>
Cc: John Siddle <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

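The change amounts to widening one condition in check_hung_task(); a
hedged, abridged sketch (variable names as in kernel/hung_task.c):

  /* Keep printing task info when a panic has been requested, even
   * after the warning budget has been exhausted. */
  if (sysctl_hung_task_warnings || hung_task_call_panic) {
          if (sysctl_hung_task_warnings > 0)
                  sysctl_hung_task_warnings--;

          /* ... print the blocked task's state and stack trace ... */
  }
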
2024-06-24 | crash: remove header files which are included more than once | Wenchao Hao | 1 | -1/+0

The following warning is reported, so remove the duplicated header
include:

  ./kernel/crash_reserve.c: linux/kexec.h is included more than once.

This is just a code cleanup; no logic is changed.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wenchao Hao <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2024-06-24 | kernel/panic: add verbose logging of kernel taints in backtraces | Jani Nikula | 1 | -11/+34

With nearly 20 taint flags and respective characters, it's getting a bit
difficult to remember what each taint flag character means. Add verbose
logging of the set taints in the format:

  Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN

in dump_stack_print_info() when there are taints. Note that the
"negative flag" G is not included.

Link: https://lkml.kernel.org/r/7321e306166cb2ca2807ab8639e665baa2462e9c.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2024-06-24 | kernel/panic: initialize taint_flags[] using a macro | Jani Nikula | 1 | -19/+27

Make it easier to extend struct taint_flag in a follow-up.

Link: https://lkml.kernel.org/r/8a2498285d37953cfad9dce939ed3abef61051bd.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

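Roughly, the idea (a sketch; the macro shape is assumed and simplified,
while struct taint_flag's c_true/c_false/module fields and the
TAINT_* constants are existing kernel definitions):

  #define TAINT_FLAG(tag, _c_true, _c_false, _module)  \
          [ TAINT_##tag ] = {                           \
                  .c_true = _c_true,                    \
                  .c_false = _c_false,                  \
                  .module = _module,                    \
          }

  static const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
          TAINT_FLAG(PROPRIETARY_MODULE, 'P', 'G', true),
          TAINT_FLAG(FORCED_MODULE,      'F', ' ', true),
          /* ... */
  };
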
2024-06-24 | kernel/panic: convert print_tainted() to use struct seq_buf internally | Jani Nikula | 1 | -14/+24

Convert print_tainted() to use struct seq_buf internally in order to be
more aware of the buffer constraints, as well as to make it easier to
extend in follow-up work.

Link: https://lkml.kernel.org/r/cb6006fa7c0f82a6b6885e8eea2920fcdc4fc9d0.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

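The seq_buf pattern involved, as a minimal sketch (buffer size and exact
strings are assumptions; seq_buf_init()/seq_buf_puts()/seq_buf_putc()
are the standard linux/seq_buf.h helpers):

  static char buf[96];
  struct seq_buf s;
  int i;

  seq_buf_init(&s, buf, sizeof(buf));
  seq_buf_puts(&s, "Tainted: ");
  for (i = 0; i < TAINT_FLAGS_COUNT; i++)
          /* seq_buf silently stops at the buffer boundary instead of
           * overflowing, which is the point of the conversion. */
          seq_buf_putc(&s, test_bit(i, &tainted_mask) ?
                           taint_flags[i].c_true : taint_flags[i].c_false);
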
2024-06-24 | kernel/panic: return early from print_tainted() when not tainted | Jani Nikula | 1 | -12/+13

Reduce indent to make follow-up changes slightly easier on the eyes.

Link: https://lkml.kernel.org/r/01d6c03de1c9d1b52b59c652a3704a0a9886ed63.1717146197.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2024-06-24 | lib min_heap: rename min_heapify() to min_heap_sift_down() | Kuan-Wei Chiu | 1 | -1/+1

After the addition of min_heap_sift_up(), rename min_heapify() to
min_heap_sift_down() to keep the naming convention consistent.

Link: https://lkml.kernel.org/CAP-5=fVcBAxt8Mw72=NCJPRJfjDaJcqk4rjbadgouAEAHz_q1A@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Wei Chiu <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Ching-Chun (Jim) Huang <[email protected]>
Cc: Coly Li <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Sakai <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2024-06-24 | lib min_heap: add args for min_heap_callbacks | Kuan-Wei Chiu | 1 | -5/+5

Add a third parameter 'args' for the 'less' and 'swp' functions in
'struct min_heap_callbacks'. This additional parameter allows these
comparison and swap functions to handle extra arguments when necessary.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Wei Chiu <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Ching-Chun (Jim) Huang <[email protected]>
Cc: Coly Li <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Sakai <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

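After the change, the callback structure has this shape (consistent with
the commit description; a sketch rather than the verbatim diff):

  struct min_heap_callbacks {
          bool (*less)(const void *lhs, const void *rhs, void *args);
          void (*swp)(void *lhs, void *rhs, void *args);
  };
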
2024-06-24 | lib min_heap: add type safe interface | Kuan-Wei Chiu | 1 | -5/+6

Implement a type-safe interface for min_heap using strong type pointers
instead of void * in the data field. This change includes adding small
macro wrappers around functions, enabling the use of __minheap_cast and
__minheap_obj_size macros for type casting and obtaining element size.
This implementation removes the necessity of passing element size in
min_heap_callbacks.

Additionally, introduce the MIN_HEAP_PREALLOCATED macro for
preallocating some elements.

Link: https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Wei Chiu <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Ching-Chun (Jim) Huang <[email protected]>
Cc: Coly Li <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Sakai <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

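For illustration, the preallocating variant looks roughly like this
(field names per the series; the exact definition is recalled, not
quoted):

  #define MIN_HEAP_PREALLOCATED(_type, _name, _nr)  \
  struct _name {                                    \
          int nr;              /* elements in use */ \
          int size;            /* capacity */        \
          _type *data;         /* typed storage */   \
          _type preallocated[_nr];                   \
  }
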
2024-06-24 | perf/core: fix several typos | Kuan-Wei Chiu | 1 | -4/+4

Patch series "treewide: Refactor heap related implementation", v6.

This patch series focuses on several adjustments related to heap
implementation. Firstly, a type-safe interface has been added to the
min_heap, along with the introduction of several new functions to
enhance its functionality. Additionally, the heap implementation for
bcache and bcachefs has been replaced with the generic min_heap
implementation from include/linux. Furthermore, several typos have been
corrected.

Previous discussion with Kent Overstreet:
https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu

This patch (of 16):

Replace 'artifically' with 'artificially'.
Replace 'irrespecive' with 'irrespective'.
Replace 'futher' with 'further'.
Replace 'sufficent' with 'sufficient'.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Wei Chiu <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Brian Foster <[email protected]>
Cc: Ching-Chun (Jim) Huang <[email protected]>
Cc: Coly Li <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Sakai <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2024-06-24 | fork: use this_cpu_try_cmpxchg() in try_release_thread_stack_to_cache() | Uros Bizjak | 1 | -3/+4

Use this_cpu_try_cmpxchg() instead of
"this_cpu_cmpxchg(*ptr, old, new) == old" in
try_release_thread_stack_to_cache(). The x86 CMPXCHG instruction returns
success in the ZF flag, so this change saves a compare after cmpxchg
(and a related move instruction in front of cmpxchg).

No functional change intended.

[[email protected]: simplify the for loop a bit]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uros Bizjak <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

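The resulting pattern, as a hedged sketch of the function in
kernel/fork.c (NR_CACHED_STACKS and cached_stacks are the existing
per-CPU stack cache there):

  static bool try_release_thread_stack_to_cache(struct vm_struct *vm)
  {
          unsigned int i;

          for (i = 0; i < NR_CACHED_STACKS; i++) {
                  struct vm_struct *tmp = NULL;

                  /* Succeeds only if the slot is still empty; on x86 the
                   * compare result comes back in ZF, saving a CMP. */
                  if (this_cpu_try_cmpxchg(cached_stacks[i], &tmp, vm))
                          return true;
          }
          return false;
  }
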
2024-06-24 | backtracetest: add MODULE_DESCRIPTION() | Jeff Johnson | 1 | -0/+1

Fix the 'make W=1' warning:

  WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/backtracetest.o

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jeff Johnson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

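The fix is a one-liner of this shape (the description string is assumed,
not quoted from the patch):

  MODULE_DESCRIPTION("Simple stack backtrace test module");
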
2024-06-24 | Merge tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf | Jakub Kicinski | 3 | -11/+97

Daniel Borkmann says:

====================
pull-request: bpf 2024-06-24

We've added 12 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 412 insertions(+), 16 deletions(-).

The main changes are:

1) Fix a BPF verifier issue validating may_goto with a negative offset,
   from Alexei Starovoitov.

2) Fix a BPF verifier validation bug with may_goto combined with jump to
   the first instruction, also from Alexei Starovoitov.

3) Fix a bug with overrunning reservations in BPF ring buffer,
   from Daniel Borkmann.

4) Fix a bug in BPF verifier due to missing proper var_off setting
   related to movsx instruction, from Yonghong Song.

5) Silence unnecessary syzkaller-triggered warning in
   __xdp_reg_mem_model(), from Daniil Dulov.

* tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  xdp: Remove WARN() from __xdp_reg_mem_model()
  selftests/bpf: Add tests for may_goto with negative offset.
  bpf: Fix may_goto with negative offset.
  selftests/bpf: Add more ring buffer test coverage
  bpf: Fix overrunning reservations in ringbuf
  selftests/bpf: Tests with may_goto and jumps to the 1st insn
  bpf: Fix the corner case with may_goto and jump to the 1st insn.
  bpf: Update BPF LSM maintainer list
  bpf: Fix remap of arena.
  selftests/bpf: Add a few tests to cover
  bpf: Add missed var_off setting in coerce_subreg_to_size_sx()
  bpf: Add missed var_off setting in set_sext32_default_val()
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

2024-06-25 | perf,uprobes: fix user stack traces in the presence of pending uretprobes | Andrii Nakryiko | 2 | -1/+51

When the kernel has pending uretprobes installed, it hijacks the
original user function return address on the stack with a uretprobe
trampoline address. There could be multiple such pending uretprobes
(either on different user functions or on the same recursive one) at any
given time within the same task.

This approach interferes with the user stack trace capture logic, which
would report surprising addresses (like 0x7fffffffe000) that correspond
to a special "[uprobes]" section that the kernel installs in the target
process address space for uretprobe trampoline code, while logically it
should be an address somewhere within the calling function of another
traced user function.

This is easy to correct for, though. The uprobes subsystem keeps track
of pending uretprobes and records original return addresses. This patch
uses that to do a post-processing step and restore each trampoline
address entry to the correct original return address. This is done only
if there are pending uretprobes for the current task.

This is a similar approach to what the fprobe/kretprobe infrastructure
does when capturing kernel stack traces in the presence of pending
return probes.

Link: https://lore.kernel.org/all/[email protected]/
Reported-by: Riham Selim <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

2024-06-24 | net: Move per-CPU flush-lists to bpf_net_context on PREEMPT_RT. | Sebastian Andrzej Siewior | 2 | -24/+6

The per-CPU flush lists are accessed from within the NAPI callback
(xdp_do_flush() for instance). They are subject to the same problem as
struct bpf_redirect_info.

Add the per-CPU lists cpu_map_flush_list, dev_map_flush_list and
xskmap_map_flush_list to struct bpf_net_context. Add wrappers for the
access. The lists are initialized on first usage (similar to
bpf_net_ctx_get_ri()).

Cc: "Björn Töpel" <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Jonathan Lemon <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Maciej Fijalkowski <[email protected]>
Cc: Magnus Karlsson <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Yonghong Song <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

2024-06-24 | net: Reference bpf_redirect_info via task_struct on PREEMPT_RT. | Sebastian Andrzej Siewior | 3 | -1/+12

The XDP redirect process is two staged:
- bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
  packet and makes decisions. While doing that, the per-CPU variable
  bpf_redirect_info is used.
- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
  and it may also access other per-CPU variables like xskmap_flush_list.

At the very end of the NAPI callback, xdp_do_flush() is invoked which
does not access bpf_redirect_info but will touch the individual per-CPU
lists.

The per-CPU variables are only used in the NAPI callback, hence
disabling bottom halves is the only protection mechanism. Users from
preemptible context (like cpu_map_kthread_run()) explicitly disable
bottom halves for protection reasons. Without locking in
local_bh_disable() on PREEMPT_RT this data structure requires explicit
locking.

PREEMPT_RT has forced-threaded interrupts enabled and every
NAPI-callback runs in a thread. If each thread has its own data
structure then locking can be avoided.

Create a struct bpf_net_context which contains struct bpf_redirect_info.
Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
it, bpf_net_ctx_clear() removes it again. The bpf_net_ctx_set() may
nest. For instance a function can be used from within
NET_RX_SOFTIRQ/net_rx_action which uses bpf_net_ctx_set() and
NET_TX_SOFTIRQ which does not. Therefore only the first invocation
updates the pointer.

Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
bpf_redirect_info. The returned data structure is zero initialized to
ensure nothing is leaked from stack. This is done on first usage of the
struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
note that initialisation is required. First invocation of
bpf_net_ctx_get_ri() will memset() the data structure and update
bpf_redirect_info::kern_flags. bpf_redirect_info::nh is excluded from
memset because it is only used once BPF_F_NEIGH is set, which also sets
the nh member. The kern_flags is moved past nh to exclude it from
memset.

The pointer to bpf_net_context is saved in the task's task_struct.
Always using the bpf_net_context approach has the advantage that there
are almost zero differences between PREEMPT_RT and non-PREEMPT_RT
builds.

Cc: Andrii Nakryiko <[email protected]>
Cc: Eduard Zingerman <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Yonghong Song <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

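The shape of the construct, as a hedged sketch (simplified; the real
struct and helpers carry more state and error checking):

  struct bpf_net_context {
          struct bpf_redirect_info ri;
          struct list_head cpu_map_flush_list;
          struct list_head dev_map_flush_list;
          struct list_head xskmap_map_flush_list;
  };

  static inline struct bpf_net_context *
  bpf_net_ctx_set(struct bpf_net_context *bpf_net_ctx)
  {
          struct task_struct *tsk = current;

          /* Nesting: only the outermost caller installs the context. */
          if (tsk->bpf_net_context)
                  return NULL;
          bpf_net_ctx->ri.kern_flags = 0;   /* lazy init marker */
          tsk->bpf_net_context = bpf_net_ctx;
          return bpf_net_ctx;
  }

  static inline void bpf_net_ctx_clear(struct bpf_net_context *bpf_net_ctx)
  {
          if (bpf_net_ctx)
                  current->bpf_net_context = NULL;
  }
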
2024-06-24 | locking/local_lock: Add local nested BH locking infrastructure. | Sebastian Andrzej Siewior | 1 | -0/+8

Add local_lock_nested_bh() locking. It is based on local_lock_t and the
naming follows the preempt_disable_nested() example.

For !PREEMPT_RT + !LOCKDEP it is a per-CPU annotation for locking
assumptions based on local_bh_disable(). The macro is optimized away
during compilation.

For !PREEMPT_RT + LOCKDEP the local_lock_nested_bh() is reduced to the
usual lock-acquire plus lockdep_assert_in_softirq() - ensuring that BH
is disabled.

For PREEMPT_RT local_lock_nested_bh() acquires the specified per-CPU
lock. It does not disable CPU migration because it relies on
local_bh_disable() disabling CPU migration. With LOCKDEP it performs the
usual lockdep checks as with !PREEMPT_RT. Due to include hell the
softirq check has been moved to spinlock.c.

The intention is to use this locking in places where locking of a
per-CPU variable relies on BH being disabled. Instead of treating
disabled bottom halves as a big per-CPU lock, PREEMPT_RT can use this to
reduce the locking scope to what actually needs protecting. A side
effect is that it also documents the protection scope of the per-CPU
variables.

Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

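Intended usage, as a minimal sketch (the per-CPU struct and field names
here are illustrative, not from the patch):

  struct foo_pcpu {
          local_lock_t bh_lock;
          int counter;
  };

  static DEFINE_PER_CPU(struct foo_pcpu, foo_pcpu) = {
          .bh_lock = INIT_LOCAL_LOCK(bh_lock),
  };

  static void foo_update(void)
  {
          /* Caller runs with BH disabled (e.g. NAPI). On PREEMPT_RT this
           * takes the per-CPU lock; elsewhere it is a lockdep
           * annotation that BH is indeed disabled. */
          local_lock_nested_bh(&foo_pcpu.bh_lock);
          this_cpu_inc(foo_pcpu.counter);
          local_unlock_nested_bh(&foo_pcpu.bh_lock);
  }
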
2024-06-24 | bpf: Fix may_goto with negative offset. | Alexei Starovoitov | 1 | -3/+6

Zac's syzbot crafted a bpf prog that exposed two bugs in may_goto.

The 1st bug is the way may_goto is patched. When the offset is negative
it should be patched differently.

The 2nd bug is in the verifier: when the current state's may_goto_depth
is equal to the visited state's may_goto_depth it means there is an
actual infinite loop. It's not correct to prune exploration of the
program at this point.

Note that this check doesn't limit the program to only one may_goto
insn, since the 2nd and any further may_goto will increment
may_goto_depth only in the queued state pushed for future exploration.
The current state will have may_goto_depth == 0 regardless of the number
of may_goto insns, and the verifier has to explore the program until
bpf_exit.

Fixes: 011832b97b31 ("bpf: Introduce may_goto instruction")
Reported-by: Zac Ecob <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Closes: https://lore.kernel.org/bpf/CAADnVQL-15aNp04-cyHRn47Yv61NXfYyhopyZtUyxNojUZUXpA@mail.gmail.com/
Link: https://lore.kernel.org/bpf/[email protected]

2024-06-23 | bpf: fix build when CONFIG_DEBUG_INFO_BTF[_MODULES] is undefined | Alan Maguire | 1 | -2/+2

The kernel test robot reports that the kernel build fails with the
resilient split BTF changes. Examining the associated config and code,
we see that btf_relocate_id() is defined under
CONFIG_DEBUG_INFO_BTF_MODULES. Moving it outside the #ifdef solves the
issue.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Alan Maguire <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

2024-06-23 | cpu: Fix broken cmdline "nosmp" and "maxcpus=0" | Huacai Chen | 1 | -0/+3

After the rework of "Parallel CPU bringup", the cmdline "nosmp" and
"maxcpus=0" parameters are not working anymore. These parameters set
setup_max_cpus to zero and that's handed to bringup_nonboot_cpus(). The
code there does a decrement before checking for zero, which brings it
into the negative space and brings up all CPUs.

Add a zero check at the beginning of the function to prevent this.

[ tglx: Massaged change log ]

Fixes: 18415f33e2ac4ab382 ("cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE")
Fixes: 06c6796e0304234da6 ("cpu/hotplug: Fix off by one in cpuhp_bringup_mask()")
Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]

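The fix is a straightforward early return; roughly (a sketch of
bringup_nonboot_cpus() in kernel/cpu.c, body abridged):

  void __init bringup_nonboot_cpus(unsigned int max_cpus)
  {
          /* Respect "nosmp" / "maxcpus=0": don't let the decrement
           * below wrap the unsigned count into a huge value. */
          if (!max_cpus)
                  return;

          /* ... existing parallel/serial bringup ... */
  }
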
2024-06-23 | timekeeping: Add missing kernel-doc function comments | Yang Li | 1 | -0/+3

Fix up the incomplete kernel-doc style comments for do_adjtimex() and
hardpps() by documenting the function parameters.

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9301

2024-06-23 | sched_ext: Make scx_bpf_cpuperf_set() @cpu arg signed | David Vernet | 1 | -1/+1

The scx_bpf_cpuperf_set() kfunc allows a BPF program to set the relative
performance target of a specified CPU. Commit d86adb4fc065 ("sched_ext:
Add cpuperf support") defined the @cpu argument to be unsigned. Let's
update it to be signed to match the norm for the rest of ext.c and the
kernel.

Note that the kfunc declaration of scx_bpf_cpuperf_set() in the
common.bpf.h header in tools/sched_ext already listed the cpu as signed,
so this also fixes the build for tools/sched_ext and the sched_ext
selftests due to kfunc declarations now being emitted in vmlinux.h based
on BTF (thus causing the compiler to error due to observing conflicting
types).

Fixes: d86adb4fc065 ("sched_ext: Add cpuperf support")
Signed-off-by: David Vernet <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

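That is, the kfunc signature becomes (a sketch; the second parameter's
name is assumed):

  __bpf_kfunc void scx_bpf_cpuperf_set(s32 cpu, u32 perf);
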
2024-06-23 | irqdomain: Fix formatting irq_find_matching_fwspec() kerneldoc comment | Anna-Maria Behnsen | 1 | -1/+2

Modify the comment formatting in the irq_find_matching_fwspec() function
to enhance code readability and maintain consistency.

Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

2024-06-21 | sched_ext: Add cpuperf support | Tejun Heo | 4 | -3/+102

sched_ext currently does not integrate with schedutil. When schedutil is
the governor, frequencies are left unregulated and usually get stuck
close to the highest performance level from running RT tasks.

Add CPU performance monitoring and scaling support by integrating into
schedutil. The following kfuncs are added:

- scx_bpf_cpuperf_cap(): Query the relative performance capacity of
  different CPUs in the system.

- scx_bpf_cpuperf_cur(): Query the current performance level of a CPU
  relative to its max performance.

- scx_bpf_cpuperf_set(): Set the current target performance level of a
  CPU.

This gives direct control over CPU performance setting to the BPF
scheduler (see the usage sketch after this entry). The only changes on
the schedutil side are accounting for the utilization factor from
sched_ext and disabling the frequency-holding heuristics, as they may
not apply well to sched_ext schedulers, which may have a much weaker
connection between tasks and their current / last CPU.

With cpuperf support added, there is no reason to block uclamp. Enable
it while at it. A toy implementation of cpuperf is added to scx_qmap as
a demonstration of the feature.

v2: Ignore cpu_util_cfs_boost() when scx_switched_all() in
    sugov_get_util() to avoid factoring in a stale util metric.
    (Christian)

Signed-off-by: Tejun Heo <[email protected]>
Reviewed-by: David Vernet <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Christian Loehle <[email protected]>

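From a BPF scheduler these kfuncs would be used along these lines (a
sketch; SCX_CPUPERF_ONE is the maximum-scale constant from the series,
and the halving policy here is purely illustrative):

  /* Query capacity and current level, then pin the CPU to half of its
   * maximum performance. */
  u32 cap = scx_bpf_cpuperf_cap(cpu);  /* relative capacity vs. other CPUs */
  u32 cur = scx_bpf_cpuperf_cur(cpu);  /* current level, 0..SCX_CPUPERF_ONE */

  scx_bpf_cpuperf_set(cpu, SCX_CPUPERF_ONE / 2);
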
2024-06-21 | cpufreq_schedutil: Refactor sugov_cpu_is_busy() | Tejun Heo | 1 | -20/+18

sugov_cpu_is_busy() is used to avoid decreasing the performance level
while the CPU is busy and is called by sugov_update_single_freq() and
sugov_update_single_perf(). Both callers repeat the same pattern: first
test for uclamp, then for busyness. Let's refactor so that the tests
aren't repeated.

The new helper is named sugov_hold_freq() and tests both the uclamp
exception and CPU busyness. No functional changes.

This will make adding more exception conditions easier.

Signed-off-by: Tejun Heo <[email protected]>
Reviewed-by: David Vernet <[email protected]>
Reviewed-by: Christian Loehle <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>

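A hedged sketch of the combined helper, based on the commit description
(details abridged; uclamp_rq_is_capped() and
tick_nohz_get_idle_calls_cpu() are existing kernel helpers):

  static bool sugov_hold_freq(struct sugov_cpu *sg_cpu)
  {
          unsigned long idle_calls;
          bool ret;

          /* If capped by uclamp_max, always update to stay compliant. */
          if (uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)))
                  return false;

          /* Hold the frequency while the CPU hasn't idled recently, as
           * a reduction is likely to be premature. */
          idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
          ret = idle_calls == sg_cpu->saved_idle_calls;

          sg_cpu->saved_idle_calls = idle_calls;
          return ret;
  }
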
2024-06-21 | workqueue: Remove useless pool->dying_workers | Lai Jiangshan | 1 | -3/+0

A dying worker is first moved from pool->workers to pool->dying_workers
in set_worker_dying() and removed from pool->dying_workers in
detach_dying_workers(). The whole procedure happens within the same lock
context of wq_pool_attach_mutex.

So pool->dying_workers is useless. Just remove it and keep the dying
worker in pool->workers after set_worker_dying(), and remove it in
detach_dying_workers() with wq_pool_attach_mutex held.

Cc: Valentin Schneider <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

2024-06-21 | workqueue: Detach workers directly in idle_cull_fn() | Lai Jiangshan | 1 | -26/+19

The code to kick off the destruction of workers is now in a process
context (idle_cull_fn()), and the detaching of a worker is not required
to be inside the worker thread now, so just do the detaching directly in
idle_cull_fn().

wake_dying_workers() is renamed to detach_dying_workers() and the
unneeded wakeup in wake_dying_workers() is also removed.

Cc: Valentin Schneider <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

2024-06-21 | workqueue: Don't bind the rescuer in the last working cpu | Lai Jiangshan | 1 | -12/+12

Don't bind the rescuer to the last working CPU, so that when the rescuer
is woken up next time, it will not interrupt the last working CPU, which
might be busy on other crucial work that has nothing to do with the
rescuer's incoming work.

Cc: Valentin Schneider <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

2024-06-21 | workqueue: Reap workers via kthread_stop() and remove detach_completion | Lai Jiangshan | 1 | -16/+19

The code to kick off the destruction of workers is now in a process
context (idle_cull_fn()), so kthread_stop() can be used in the process
context to replace the work of pool->detach_completion.

The wakeup in wake_dying_workers() is unneeded after this change, but it
is harmless; just keep it here until the next patch renames
wake_dying_workers() rather than renaming it again and again.

Cc: Valentin Schneider <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

2024-06-21 | libbpf,bpf: Share BTF relocate-related code with kernel | Alan Maguire | 2 | -53/+131

Share the relocation implementation with the kernel. As part of this, we
also need the type/string iteration functions, so also share the
btf_iter.c file. Relocation code in the kernel and userspace is
identical save for the implementation of the reparenting of split BTF to
the relocated base BTF and retrieval of the BTF header from "struct
btf"; these small functions need separate user-space and kernel
implementations for the separate "struct btf"s they operate upon.

One other wrinkle on the kernel side is that we have to map .BTF.ids in
modules as they were generated with the type ids used at BTF encoding
time. btf_relocate() optionally returns an array mapping from old BTF
ids to relocated ids, so we use that to fix up these references where
needed for kfuncs.

Signed-off-by: Alan Maguire <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

2024-06-21 | module, bpf: Store BTF base pointer in struct module | Alan Maguire | 1 | -1/+4

...as this will allow split BTF modules with a base BTF representation
(rather than the full vmlinux BTF at the time of BTF encoding) to
resolve their references to kernel types in a way that is more resilient
to small changes in kernel types. This will allow modules that are not
rebuilt every time the kernel is to provide more resilient BTF, rather
than have it invalidated every time BTF ids for core kernel types
change.

Fields are ordered to avoid holes in struct module.

Signed-off-by: Alan Maguire <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

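In struct module this amounts to fields along these lines (a sketch;
the field ordering and config guard are assumptions consistent with the
commit text):

  #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
          /* Module's own split BTF, plus the base BTF it was encoded
           * against, so references can be re-resolved at load time. */
          void *btf_data;
          void *btf_base_data;
          size_t btf_data_size;
          size_t btf_base_data_size;
  #endif
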