path: root/kernel
Age | Commit message | Author | Files, lines changed
2021-03-31 | Merge tag 'trace-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace | Linus Torvalds | 1 file, -3/+6
Pull ftrace fix from Steven Rostedt:
 "Add a check for order < 0 before calling free_pages().

  The function addresses that are traced by ftrace are stored in pages,
  and the size is held in a variable. If there is some error in creating
  them, the allocated ones will be freed. In this case, it is possible
  that the order of pages to be freed ends up being negative due to a
  size of zero passed to get_count_order(), and then that negative
  number will cause free_pages() to free a very large section. Make sure
  that does not happen."

* tag 'trace-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Check if pages were allocated before calling free_pages()
2021-03-30 | seccomp: Fix "cacheable" typo in comments | Cui GaoSheng | 1 file, -1/+1
Do a trivial typo fix: s/cachable/cacheable/

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Cui GaoSheng <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-30 | bpf: Remove redundant assignment of variable id | Colin Ian King | 1 file, -1/+0
The variable id is assigned a value that is never read; the assignment
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2021-03-30 | ftrace: Check if pages were allocated before calling free_pages() | Steven Rostedt (VMware) | 1 file, -3/+6
It is possible that on error pg->size can be zero when getting its
order, which would return a -1 value. It is dangerous to pass in an
order of -1 to free_pages(). Check if order is greater than or equal to
zero before calling free_pages().

Link: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
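For illustration, a minimal sketch of the guard this fix describes,
assuming the ftrace_page bookkeeping names (pg->records, pg->size,
ENTRIES_PER_PAGE) from kernel/trace/ftrace.c rather than quoting the
literal diff:

  static void ftrace_free_pg(struct ftrace_page *pg)
  {
  	int order = get_count_order(pg->size / ENTRIES_PER_PAGE);

  	/*
  	 * If pg->size is 0, get_count_order() returns -1 and
  	 * free_pages() would free an enormous region. Only free
  	 * when pages were actually allocated (order >= 0).
  	 */
  	if (order >= 0)
  		free_pages((unsigned long)pg->records, order);
  }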
2021-03-30 | kernel/printk.c: Fixed mundane typos | Bhaskar Chowdhury | 1 file, -3/+3
s/sempahore/semaphore/
s/exacly/exactly/
s/unregistred/unregistered/
s/interation/iteration/

Signed-off-by: Bhaskar Chowdhury <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[[email protected]: Removed 4th hunk. The string has already been removed in the meantime.]
Signed-off-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-30 | printk: rename vprintk_func to vprintk | Rasmus Villemoes | 3 files, -11/+3
The printk code is already hard enough to understand. Remove an
unnecessary indirection by renaming vprintk_func to vprintk (adding the
asmlinkage annotation), and removing the vprintk definition from
printk.c. That way, printk is implemented in terms of vprintk as one
would expect, and there's no "vprintk_func, what's that? Some function
pointer that gets set where?"

The declaration of vprintk in linux/printk.h already has the
__printf(1,0) attribute; there's no point repeating that with the
definition - it's for diagnostics in callers.

linux/printk.h already contains a static inline {return 0;} definition
of vprintk when !CONFIG_PRINTK. Since the corresponding stub definition
of vprintk_func was not marked "static inline", any translation unit
including internal.h would get a definition of vprintk_func - it just
so happens that for !CONFIG_PRINTK, there is precisely one such TU,
namely printk.c. Had there been more, it would be a link error; now
it's just a silly waste of a few bytes of .text, which one must assume
are rather precious to anyone disabling PRINTK.

  $ objdump -dr kernel/printk/printk.o
  00000330 <vprintk_func>:
   330:	31 c0                	xor    %eax,%eax
   332:	c3                   	ret
   333:	8d b4 26 00 00 00 00 	lea    0x0(%esi,%eiz,1),%esi
   33a:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi

Signed-off-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
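A sketch of the "static inline stub" pattern the message relies on; the
static inline is what keeps the !CONFIG_PRINTK stub from being emitted
as a real symbol in every translation unit that includes the header:

  #ifdef CONFIG_PRINTK
  asmlinkage __printf(1, 0) int vprintk(const char *fmt, va_list args);
  #else
  static inline __printf(1, 0) int vprintk(const char *fmt, va_list args)
  {
  	return 0;
  }
  #endif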
2021-03-30 | genirq/irq_sim: Shrink devm_irq_domain_create_sim() | Bartosz Golaszewski | 1 file, -19/+12
The custom devres structure manages only a single pointer, which can be
achieved by using devm_add_action_or_reset() as well, which makes the
code simpler.

[ tglx: Fixed return value handling - found by smatch ]

Signed-off-by: Bartosz Golaszewski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
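A sketch of the pattern, assuming irq_domain_create_sim() and
irq_domain_remove_sim() as the underlying create/teardown pair; an
illustration, not the literal upstream diff:

  static void devm_irq_domain_remove_sim(void *data)
  {
  	irq_domain_remove_sim(data);
  }

  struct irq_domain *devm_irq_domain_create_sim(struct device *dev,
  						struct fwnode_handle *fwnode,
  						unsigned int num_irqs)
  {
  	struct irq_domain *domain;
  	int ret;

  	domain = irq_domain_create_sim(fwnode, num_irqs);
  	if (IS_ERR(domain))
  		return domain;

  	/* One devm action instead of a bespoke devres struct. */
  	ret = devm_add_action_or_reset(dev, devm_irq_domain_remove_sim,
  				       domain);
  	if (ret)
  		return ERR_PTR(ret);

  	return domain;
  }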
2021-03-30 | livepatch: Replace the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure | Miroslav Benes | 2 files, -6/+3
Livepatch sends a fake signal to all remaining blocking tasks of a
running transition after a set period of time. It uses the
TIF_SIGPENDING flag for the purpose. Commit 12db8b690010 ("entry: Add
support for TIF_NOTIFY_SIGNAL") added a generic infrastructure to
achieve the same. Replace our bespoke solution with the generic one.

Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Acked-by: Joe Lawrence <[email protected]>
Signed-off-by: Miroslav Benes <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
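A sketch of the swap, assuming the fake-signal loop in
kernel/livepatch/transition.c; set_notify_signal() sets
TIF_NOTIFY_SIGNAL and kicks the task without queueing a real signal:

  static void klp_wake_blocking_task(struct task_struct *task)
  {
  	/*
  	 * Before: pretend a signal is pending, i.e.
  	 * signal_wake_up(task, 0) under task->sighand->siglock.
  	 *
  	 * After: the generic TIF_NOTIFY_SIGNAL infrastructure,
  	 * no siglock needed.
  	 */
  	set_notify_signal(task);
  }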
2021-03-29 | timekeeping: Allow runtime PM from change_clocksource() | Niklas Söderlund | 1 file, -13/+23
The struct clocksource callbacks enable() and disable() are described
as a way to allow clock sources to enter a power save mode. See commit
4614e6adafa2 ("clocksource: add enable() and disable() callbacks").
But using runtime PM from these callbacks triggers a cyclic lockdep
warning when switching clock source using change_clocksource().

 # echo e60f0000.timer > /sys/devices/system/clocksource/clocksource0/current_clocksource

 ======================================================
 WARNING: possible circular locking dependency detected
 ------------------------------------------------------
 migration/0/11 is trying to acquire lock:
 ffff0000403ed220 (&dev->power.lock){-...}-{2:2}, at: __pm_runtime_resume+0x40/0x74

 but task is already holding lock:
 ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (tk_core.seq.seqcount){----}-{0:0}:
        ktime_get+0x28/0xa0
        hrtimer_start_range_ns+0x210/0x2dc
        generic_sched_clock_init+0x70/0x88
        sched_clock_init+0x40/0x64
        start_kernel+0x494/0x524

 -> #1 (hrtimer_bases.lock){-.-.}-{2:2}:
        hrtimer_start_range_ns+0x68/0x2dc
        rpm_suspend+0x308/0x5dc
        rpm_idle+0xc4/0x2a4
        pm_runtime_work+0x98/0xc0
        process_one_work+0x294/0x6f0
        worker_thread+0x70/0x45c
        kthread+0x154/0x160
        ret_from_fork+0x10/0x20

 -> #0 (&dev->power.lock){-...}-{2:2}:
        _raw_spin_lock_irqsave+0x7c/0xc4
        __pm_runtime_resume+0x40/0x74
        sh_cmt_start+0x1c4/0x260
        sh_cmt_clocksource_enable+0x28/0x50
        change_clocksource+0x9c/0x160
        multi_cpu_stop+0xa4/0x190
        cpu_stopper_thread+0x90/0x154
        smpboot_thread_fn+0x244/0x270
        kthread+0x154/0x160
        ret_from_fork+0x10/0x20

 other info that might help us debug this:

 Chain exists of:
   &dev->power.lock --> hrtimer_bases.lock --> tk_core.seq.seqcount

 Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(tk_core.seq.seqcount);
                                lock(hrtimer_bases.lock);
                                lock(tk_core.seq.seqcount);
   lock(&dev->power.lock);

  *** DEADLOCK ***

 2 locks held by migration/0/11:
  #0: ffff8000113c9278 (timekeeper_lock){-.-.}-{2:2}, at: change_clocksource+0x2c/0x160
  #1: ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190

Rework change_clocksource() so it enables the new clocksource and
disables the old clocksource outside of the timekeeper_lock and
seqcount write held region. There is no requirement that these
callbacks are invoked from the lock held region.

Signed-off-by: Niklas Söderlund <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Clean up signal handling in __rt_mutex_slowlock() | Thomas Gleixner | 1 file, -12/+7
The signal handling in __rt_mutex_slowlock() is open coded. Use
signal_pending_state() instead.

Aside from the cleanup, this also prepares for the RT lock
substitutions which require support for TASK_KILLABLE.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
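A sketch of the idea in a hypothetical helper (the wait-loop context is
assumed, not quoted): the open-coded version special-cased
TASK_INTERRUPTIBLE plus signal_pending(), while signal_pending_state()
folds the state check and the (fatal) signal check into one call and
also covers TASK_KILLABLE:

  static int __sched check_slowlock_signal(long state)
  {
  	/* Handles TASK_INTERRUPTIBLE and TASK_KILLABLE in one test. */
  	if (signal_pending_state(state, current))
  		return -EINTR;
  	return 0;
  }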
2021-03-29 | locking/rtmutex: Restrict the trylock WARN_ON() to debug | Thomas Gleixner | 1 file, -1/+1
The warning as written is expensive and not really required for a
production kernel. Make it depend on rt mutex debugging and use
!in_task() for the condition, which generates far better code and gives
the same answer.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
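A sketch of the debug-gated check, assuming it sits in
rt_mutex_trylock(); IS_ENABLED() lets the compiler drop the whole test
in production builds:

  int __sched rt_mutex_trylock(struct rt_mutex *lock)
  {
  	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
  		return 0;

  	return __rt_mutex_trylock(lock);
  }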
2021-03-29 | locking/rtmutex: Fix misleading comment in rt_mutex_postunlock() | Thomas Gleixner | 1 file, -1/+1
Preemption is disabled in mark_wakeup_next_waiter(), not in
rt_mutex_slowunlock().

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Consolidate the fast/slowpath invocation | Thomas Gleixner | 1 file, -85/+59
The indirection via a function pointer (which is at least optimized
into a tail call by the compiler) is making the code hard to read.
Clean it up and move the futex related trylock functions down to the
futex section.

Move the wake_q wakeup into rt_mutex_slowunlock(). No point in handing
it to the caller. The futex code uses a different function.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Make text section and inlining consistent | Thomas Gleixner | 1 file, -76/+76
rtmutex is half __sched and the other half is not. If the compiler
decides not to inline larger static functions then part of the code
ends up in the regular text section. There are also quite a few small
performance related helpers which are either static or plain inline.
Force inline those which make sense and mark the rest __sched.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Move debug functions as inlines into common header | Thomas Gleixner | 5 files, -119/+25
There is no value in having two header files providing just empty stubs
and a C file which implements trivial debug functions which can just be
inlined.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Decrapify __rt_mutex_init() | Thomas Gleixner | 2 files, -6/+11
The conditional debug handling is just another layer of obfuscation.
Split the function so rt_mutex_init_proxy_locked() can invoke the inner
init and __rt_mutex_init() gets the full treatment.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Remove pointless CONFIG_RT_MUTEXES=n stubs | Thomas Gleixner | 1 file, -42/+20
None of these functions are used when CONFIG_RT_MUTEXES=n. Remove the
gunk. Remove pointless comments and clean up the coding style mess
while at it.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Inline chainwalk depth check | Thomas Gleixner | 1 file, -8/+3
There is no point for this wrapper at all.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Move rt_mutex_debug_task_free() to rtmutex.c | Thomas Gleixner | 2 files, -6/+8
Prepare for removing the header maze.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Remove empty and unused debug stubs | Thomas Gleixner | 4 files, -32/+0
These stubs have no users or are otherwise useless, and are therefore
just ballast.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Remove output from deadlock detector | Sebastian Andrzej Siewior | 5 files, -123/+1
The rtmutex specific deadlock detector predates lockdep coverage of
rtmutex, and since commit f5694788ad8da ("rt_mutex: Add lockdep
annotations") it contains a lot of redundant functionality:

 - lockdep will detect a potential deadlock before rtmutex-debug has a
   chance to do so

 - the deadlock debugging is restricted to rtmutexes which are not
   associated to futexes and have an active waiter, which is covered by
   lockdep already

Remove the redundant functionality and move the actual deadlock WARN()
into the deadlock code path. The latter needs a separate cleanup.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Remove rtmutex deadlock tester leftovers | Sebastian Andrzej Siewior | 5 files, -14/+1
The following debug members of 'struct rtmutex' are unused:

 - save_state: No users

 - file,line: Printed if ::name is NULL. This is only used for
   non-futex locks so ::name is never NULL

 - magic: Assigned to NULL by rt_mutex_destroy(), no further usage

Remove them along with unused inline and macro leftovers related to the
long gone deadlock tester.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | locking/rtmutex: Remove rt_mutex_timed_lock() | Sebastian Andrzej Siewior | 1 file, -46/+0
rt_mutex_timed_lock() has no callers since:

  c051b21f71d1f ("rtmutex: Confine deadlock logic to futex")

Remove it.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2021-03-29 | Merge tag 'v5.12-rc5' into locking/core, to pick up fixes | Ingo Molnar | 23 files, -122/+402
Signed-off-by: Ingo Molnar <[email protected]>
2021-03-29 | module: treat exit sections the same as init sections when !CONFIG_MODULE_UNLOAD | Jessica Yu | 1 file, -5/+4
Dynamic code patching (alternatives, jump_label and static_call) can
have sites in __exit code, even if __exit is never executed. Therefore
__exit must be present at runtime, at least for as long as __init code
is.

Additionally, for jump_label and static_call, the __exit sites must
also identify as within_module_init(), such that the infrastructure is
aware to never touch them after module init -- alternatives are only
run once at init and hence don't have this particular constraint.

By making __exit identify as __init for MODULE_UNLOAD, the above is
satisfied. So, when !CONFIG_MODULE_UNLOAD, the section ordering should
look like the following, with the .exit sections moved to the init
region of the module.

Core section allocation order:
	.text
	.rodata
	__ksymtab_gpl
	__ksymtab_strings
	.note.* sections
	.bss
	.data
	.gnu.linkonce.this_module

Init section allocation order:
	.init.text
	.exit.text
	.symtab
	.strtab

[jeyu: thanks to Peter Zijlstra for most of changelog]

Link: https://lore.kernel.org/lkml/YFiuphGw0RKehWsQ@gunter/
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Jessica Yu <[email protected]>
2021-03-28 | Merge tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block | Linus Torvalds | 4 files, -18/+22
Pull io_uring fixes from Jens Axboe:

 - Use thread info versions of flag testing, as discussed last week.

 - The series enabling PF_IO_WORKER to just take signals, instead of
   needing to special case that they do not in a bunch of places. Ends
   up being pretty trivial to do, and then we can revert all the
   special casing we're currently doing.

 - Kill dead pointer assignment

 - Fix hashed part of async work queue trace

 - Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS

 - Fix a link completion ordering regression in this merge window

 - Cancellation fixes

* tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
  io_uring: remove unsued assignment to pointer io
  io_uring: don't cancel extra on files match
  io_uring: don't cancel-track common timeouts
  io_uring: do post-completion chore on t-out cancel
  io_uring: fix timeout cancel return code
  Revert "signal: don't allow STOP on PF_IO_WORKER threads"
  Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
  Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
  Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"
  kernel: stop masking signals in create_io_thread()
  io_uring: handle signals for IO threads like a normal thread
  kernel: don't call do_exit() for PF_IO_WORKER threads
  io_uring: maintain CQE order of a failed link
  io-wq: fix race around pending work on teardown
  io_uring: do ctx sqd ejection in a clear context
  io_uring: fix provide_buffers sign extension
  io_uring: don't skip file_end_write() on reissue
  io_uring: correct io_queue_async_work() traces
  io_uring: don't use {test,clear}_tsk_thread_flag() for current
2021-03-27Revert "signal: don't allow STOP on PF_IO_WORKER threads"Jens Axboe1-2/+1
This reverts commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597. The IO threads allow and handle SIGSTOP now, so don't special case them anymore in task_set_jobctl_pending(). Signed-off-by: Jens Axboe <[email protected]>
2021-03-27Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"Jens Axboe1-1/+1
This reverts commit 15b2219facadec583c24523eed40fa45865f859f. Before IO threads accepted signals, the freezer using take signals to wake up an IO thread would cause them to loop without any way to clear the pending signal. That is no longer the case, so stop special casing PF_IO_WORKER in the freezer. Signed-off-by: Jens Axboe <[email protected]>
2021-03-27Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"Jens Axboe2-3/+3
This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9. The IO threads do allow signals now, including SIGSTOP, and we can allow ptrace attach. Attaching won't reveal anything interesting for the IO threads, but it will allow eg gdb to attach to a task with io_urings and IO threads without complaining. And once attached, it will allow the usual introspection into regular threads. Signed-off-by: Jens Axboe <[email protected]>
2021-03-27Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"Jens Axboe1-3/+0
This reverts commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5. IO threads now take signals just fine, so there's no reason to limit them specifically. Revert the change that prevented that from happening. Signed-off-by: Jens Axboe <[email protected]>
2021-03-27 | kernel: stop masking signals in create_io_thread() | Jens Axboe | 1 file, -8/+8
This is racy; move the blocking to when the task is created and we're
marking it as PF_IO_WORKER anyway. The IO threads are now prepared to
handle signals like SIGSTOP as well, so clear that from the mask to
allow proper stopping of IO threads.

Acked-by: "Eric W. Biederman" <[email protected]>
Reported-by: Oleg Nesterov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
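A sketch of blocking signals at creation time inside copy_process(),
assuming the io_thread flag in kernel_clone_args; SIGKILL and SIGSTOP
are left unblocked so IO threads can still be killed and stopped:

  if (args->io_thread) {
  	/* Block everything except the fatal and stop signals. */
  	p->flags |= PF_IO_WORKER;
  	siginitsetinv(&p->blocked, sigmask(SIGKILL) | sigmask(SIGSTOP));
  }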
2021-03-26 | bpf: Support bpf program calling kernel function | Martin KaFai Lau | 5 files, -35/+430
This patch adds support to the BPF verifier to allow a bpf program to
call a kernel function directly. The use case included in this set is
to allow bpf-tcp-cc to directly call some tcp-cc helper functions
(e.g. "tcp_cong_avoid_ai()"). Those functions have already been used by
some kernel tcp-cc implementations.

This set will also allow the bpf-tcp-cc program to directly call the
kernel tcp-cc implementation. For example, a bpf_dctcp may only want to
implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
from the kernel tcp_dctcp.c instead of reimplementing (or
copy-and-pasting) them.

The tcp-cc kernel functions mentioned above will be white listed for
the struct_ops bpf-tcp-cc programs to use in a later patch. The white
listed functions are not bound to a fixed ABI contract. Those functions
have already been used by the existing kernel tcp-cc. If any of them
has changed, both in-tree and out-of-tree kernel tcp-cc implementations
have to be changed. The same goes for the struct_ops bpf-tcp-cc
programs which have to be adjusted accordingly.

This patch is to make the required changes in the bpf verifier.

The first change is in btf.c: it adds a case in
"btf_check_func_arg_match()". When the passed in "btf->kernel_btf ==
true", it means matching the verifier regs' states with a kernel
function. This will handle the PTR_TO_BTF_ID reg. It also maps
PTR_TO_SOCK_COMMON, PTR_TO_SOCKET, and PTR_TO_TCP_SOCK to their kernel
btf_ids.

In the later libbpf patch, the insn calling a kernel function will look
like:

  insn->code == (BPF_JMP | BPF_CALL)
  insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
  insn->imm == func_btf_id /* btf_id of the running kernel */

[ For the future calling function-in-kernel-module support, an array of
  module btf_fds can be passed at the load time and insn->off can be
  used to index into this array. ]

At the early stage of the verifier, the verifier will collect all
kernel function calls into "struct bpf_kfunc_desc". Those descriptors
are stored in "prog->aux->kfunc_tab" and will be available to the JIT.
Since this "add" operation is similar to the current "add_subprog()"
and looks for the same insn->code, they are done together in the new
"add_subprog_and_kfunc()".

In the "do_check()" stage, the new "check_kfunc_call()" is added to
verify the kernel function call instruction:

 1. Ensure the kernel function can be used by a particular
    BPF_PROG_TYPE. A new bpf_verifier_ops "check_kfunc_call" is added
    to do that. The bpf-tcp-cc struct_ops program will implement this
    function in a later patch.

 2. Call "btf_check_kfunc_args_match()" to ensure the regs can be used
    as the args of a kernel function.

 3. Mark the regs' type, subreg_def, and zext_dst.

At the later do_misc_fixups() stage, the new fixup_kfunc_call() will
replace the insn->imm with the function address (relative to
__bpf_call_base). If needed, the jit can find the btf_func_model by
calling the new bpf_jit_find_kfunc_model(prog, insn).

With the imm set to the function address, "bpftool prog dump xlated"
will be able to display the kernel function calls the same way as it
displays other bpf helper calls.

A gpl_compatible program is required to call a kernel function. This
feature currently requires JIT.

The verifier selftests are adjusted because of the changes in the
verbose log in add_subprog_and_kfunc().

Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2021-03-26 | bpf: Refactor btf_check_func_arg_match | Martin KaFai Lau | 2 files, -75/+88
This patch moves the subprog specific logic from
btf_check_func_arg_match() to the new btf_check_subprog_arg_match().
The core logic is left in btf_check_func_arg_match(), which will be
reused later to check the kernel function call.

The "if (!btf_type_is_ptr(t))" is checked first to improve the
indentation, which will be useful for a later patch.

Some of the "btf_kind_str[]" usages are replaced with the shortcut
"btf_type_str(t)".

Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2021-03-26 | bpf: Simplify freeing logic in linfo and jited_linfo | Martin KaFai Lau | 3 files, -26/+16
This patch simplifies the linfo freeing logic by combining
"bpf_prog_free_jited_linfo()" and "bpf_prog_free_unused_jited_linfo()"
into the new "bpf_prog_jit_attempt_done()". It is prep work for the
kernel function call support. In a later patch, freeing the kernel
function call descriptors will also be done in
"bpf_prog_jit_attempt_done()".

"bpf_prog_free_linfo()" is removed since it is only called by
"__bpf_prog_put_noref()". kvfree() is called directly instead.

It also takes this chance to s/kcalloc/kvcalloc/ for the jited_linfo
allocation.

Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2021-03-26 | bpf: Take module reference for trampoline in module | Jiri Olsa | 1 file, -0/+30
Currently a module can be unloaded even if there's a trampoline
registered in it. It's easily reproduced by running in parallel:

  # while :; do ./test_progs -t module_attach; done
  # while :; do rmmod bpf_testmod; sleep 0.5; done

Take the module reference when the trampoline's ip is within the module
code, and release it when the trampoline's ip is unregistered.

Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
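A sketch of pinning the module that contains the trampoline ip; the
helper name and the tr->mod field are assumptions, while the
__module_text_address() / try_module_get() pairing is the standard
idiom for this:

  static int bpf_trampoline_module_get(struct bpf_trampoline *tr)
  {
  	struct module *mod;
  	int err = 0;

  	preempt_disable();
  	mod = __module_text_address((unsigned long)tr->func.addr);
  	if (mod && !try_module_get(mod))
  		err = -ENOENT;
  	preempt_enable();

  	if (!err)
  		tr->mod = mod;	/* dropped with module_put() on unregister */
  	return err;
  }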
2021-03-26 | kernel: don't call do_exit() for PF_IO_WORKER threads | Jens Axboe | 1 file, -1/+9
Right now we're never calling get_signal() from PF_IO_WORKER threads,
but in preparation for doing so, don't handle a fatal signal for them.
The workers have state they need to clean up when exiting, so just
return instead of calling do_exit() on their behalf. The threads
themselves will detect a fatal signal and do proper shutdown.

Signed-off-by: Jens Axboe <[email protected]>
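A sketch of the fatal-signal path in get_signal() per the description
above; the surrounding code and labels are assumed, not quoted:

  	/*
  	 * PF_IO_WORKER threads catch and exit on fatal signals
  	 * themselves. They have cleanup that must be performed, so
  	 * do_exit() cannot be called on their behalf.
  	 */
  	if (current->flags & PF_IO_WORKER)
  		goto out;	/* take get_signal()'s normal return */

  	/* Non-workers: no return from this point on. */
  	do_group_exit(ksig->info.si_signo);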
2021-03-26 | Merge tag 'pm-5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm | Linus Torvalds | 1 file, -1/+1
Pull power management fixes from Rafael Wysocki:
 "These fix an issue related to device links in the runtime PM
  framework and debugfs usage in the Energy Model code.

  Specifics:

   - Modify the runtime PM device suspend to avoid suspending supplier
     devices before the consumer device's status changes to
     RPM_SUSPENDED (Rafael Wysocki)

   - Change the Energy Model code to prevent it from attempting to
     create its main debugfs directory too early (Lukasz Luba)"

* tag 'pm-5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: EM: postpone creating the debugfs dir till fs_initcall
  PM: runtime: Defer suspending suppliers
2021-03-26 | bpf: Fix a spelling typo in bpf_atomic_alu_string disasm | Xu Kuohai | 1 file, -1/+1
The name string for BPF_XOR is "xor", not "or". Fix it.

Fixes: 981f94c3e921 ("bpf: Add bitwise atomic instructions")
Signed-off-by: Xu Kuohai <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Brendan Jackman <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
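A sketch of the fixed table in kernel/bpf/disasm.c; the disassembler
indexes it by the ALU op nibble to pick the mnemonic:

  static const char *const bpf_atomic_alu_string[16] = {
  	[BPF_ADD >> 4]  = "add",
  	[BPF_AND >> 4]  = "and",
  	[BPF_OR >> 4]   = "or",
  	[BPF_XOR >> 4]  = "xor",	/* was mistakenly "or" */
  };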
2021-03-26 | bpf: Enforce that struct_ops programs be GPL-only | Toke Høiland-Jørgensen | 1 file, -0/+5
With the introduction of the struct_ops program type, it became
possible to implement kernel functionality in BPF, making it viable to
use BPF in place of a regular kernel module for these particular
operations.

Thus far, the only user of this mechanism is for implementing TCP
congestion control algorithms. These are clearly marked as GPL-only
when implemented as modules (as seen by the use of EXPORT_SYMBOL_GPL
for tcp_register_congestion_control()), so it seems like an oversight
that this was not carried over to BPF implementations. Since this is
the only user of the struct_ops mechanism, just enforcing GPL-only for
the struct_ops program type seems like the simplest way to fix this.

Fixes: 0baf26b0fcd7 ("bpf: tcp: Support tcp_congestion_ops in bpf")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
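A sketch of the verifier-side check; its placement inside the
struct_ops handling (e.g. check_struct_ops_btf_id()) is an assumption
based on the description above:

  	if (!prog->gpl_compatible) {
  		verbose(env,
  			"struct ops programs must have a GPL compatible license\n");
  		return -EINVAL;
  	}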
2021-03-26 | irqdomain: Introduce irq_domain_create_simple() API | Andy Shevchenko | 1 file, -10/+10
Linus Walleij pointed out that irq_domain_add_simple() gained
additional functionality and can no longer be replaced with a simple
conditional. In preparation for upgrading the GPIO library to use
fwnode, introduce the irq_domain_create_simple() API, which is the
functional equivalent of the existing irq_domain_add_simple() but takes
a pointer to struct fwnode_handle as a parameter.

While at it, amend the documentation to mention the
irq_domain_create_*() functions where it makes sense.

Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
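A sketch of the usual add_*/create_* pairing in irqdomain, where the OF
variant becomes a thin wrapper over the fwnode-based one; the signature
is inferred from irq_domain_add_simple(), so treat it as illustrative:

  static inline struct irq_domain *
  irq_domain_add_simple(struct device_node *of_node, unsigned int size,
  		      unsigned int first_irq,
  		      const struct irq_domain_ops *ops, void *host_data)
  {
  	return irq_domain_create_simple(of_node_to_fwnode(of_node), size,
  					first_irq, ops, host_data);
  }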
2021-03-25 | bpf: Add support for batched ops in LPM trie maps | Pedro Tammela | 1 file, -0/+3
Suggested-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2021-03-25 | bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper | Yonghong Song | 2 files, -6/+14
Jiri Olsa reported a bug ([1]) in the kernel where the cgroup local
storage pointer may be NULL in the bpf_get_local_storage() helper.
There are two issues uncovered by this bug:

 (1) a kprobe or tracepoint prog incorrectly sets the cgroup local
     storage before the prog run,
 (2) due to the change from preempt_disable to migrate_disable,
     preemption is possible and the percpu storage might be overwritten
     by other tasks.

Issue (1) is fixed in [2]. This patch tries to address issue (2). The
following shows how things can go wrong:

  task 1: bpf_cgroup_storage_set() for percpu local storage
	  preemption happens
  task 2: bpf_cgroup_storage_set() for percpu local storage
	  preemption happens
  task 1: run bpf program

Task 1 will effectively use the percpu local storage set by task 2,
which will be either NULL or incorrect.

Instead of just one common local storage per cpu, this patch fixes the
issue by permitting 8 local storages per cpu, each identified by a
task_struct pointer. This way, we allow at most 8 nested preemptions
between bpf_cgroup_storage_set() and bpf_cgroup_storage_unset(). The
percpu local storage slot is released (calling
bpf_cgroup_storage_unset()) by the same task after the bpf program
finished running.

bpf_test_run() is also fixed to use the new bpf_cgroup_storage_set()
interface.

The patch is tested on top of [2] with the reproducer in [1]. Without
this patch, the kernel will emit an error in 2-3 minutes. With this
patch, after one hour, still no error.

 [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
 [2] https://lore.kernel.org/bpf/[email protected]

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
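A sketch of the 8-slot scheme described above; the struct and constant
names follow the description but are assumptions, not the literal
patch:

  #define BPF_CGROUP_STORAGE_NEST_MAX	8

  struct bpf_cgroup_storage_info {
  	struct task_struct *task;
  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE];
  };

  DECLARE_PER_CPU(struct bpf_cgroup_storage_info,
  		bpf_cgroup_storage_info[BPF_CGROUP_STORAGE_NEST_MAX]);

  static inline int bpf_cgroup_storage_set(struct bpf_cgroup_storage
  					 *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
  {
  	enum bpf_cgroup_storage_type stype;
  	int i;

  	for (i = 0; i < BPF_CGROUP_STORAGE_NEST_MAX; i++) {
  		if (this_cpu_read(bpf_cgroup_storage_info[i].task))
  			continue;	/* taken by a preempted level */

  		/* Claim the slot for the current task. */
  		this_cpu_write(bpf_cgroup_storage_info[i].task, current);
  		for_each_cgroup_storage_type(stype)
  			this_cpu_write(bpf_cgroup_storage_info[i].storage[stype],
  				       storage[stype]);
  		return 0;	/* the unset side scans for current */
  	}
  	return -EBUSY;		/* more than 8 nested levels */
  }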
2021-03-25 | sysctl: add proc_dou8vec_minmax() | Eric Dumazet | 1 file, -0/+65
Networking has many sysctls that could fit in one u8. This patch adds
proc_dou8vec_minmax() for this purpose.

Note that the .extra1 and .extra2 fields are pointing to integers,
because it makes conversions easier.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
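A hypothetical ctl_table entry using the new handler: the backing
variable is a u8, while the min/max bounds stay plain ints as the
message notes; the knob name is made up for illustration:

  static u8 example_enabled = 1;
  static int zero = 0;
  static int one = 1;

  static struct ctl_table example_table[] = {
  	{
  		.procname	= "example_enabled",
  		.data		= &example_enabled,
  		.maxlen		= sizeof(u8),
  		.mode		= 0644,
  		.proc_handler	= proc_dou8vec_minmax,
  		.extra1		= &zero,	/* int, not u8 */
  		.extra2		= &one,
  	},
  	{ }
  };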
2021-03-26 | bpf: Undo ptr_to_map_key alu sanitation for now | Daniel Borkmann | 1 file, -14/+0
Remove PTR_TO_MAP_KEY for the time being from being sanitized on
pointer ALU through sanitize_ptr_alu(), mainly for 3 reasons:

 1) It's currently unused and not available from unprivileged. However,
    that by itself is not yet a strong reason to drop the code.

 2) Commit 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
    implemented the sanitation not fully correctly in that, unlike
    stack or map_value pointers, it doesn't probe whether the access to
    the map key /after/ the simulated ALU operation is still in bounds.
    This means that the generated mask can truncate the offset in the
    non-speculative domain whereas it should only truncate in the
    speculative domain. The verifier should instead reject such a
    program, as we do for other types.

 3) Given the recent fixes from f232326f6966 ("bpf: Prohibit alu ops
    for pointer types not defining ptr_limit"), 10d2bb2e6b1d ("bpf: Fix
    off-by-one for area size in creating mask to left"), b5871dca250c
    ("bpf: Simplify alu_limit masking for pointer arithmetic") as well
    as 1b1597e64e1a ("bpf: Add sanity check for upper ptr_limit"), the
    code changed quite a bit and the merge in efd13b71a3fa broke the
    PTR_TO_MAP_KEY case due to an incorrect merge conflict.

Remove the relevant pieces for the time being and we can rework the
PTR_TO_MAP_KEY case once everything settles.

Fixes: efd13b71a3fa ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Daniel Borkmann <[email protected]>
2021-03-25 | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller | 5 files, -23/+16
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-03-24

The following pull-request contains BPF updates for your *net-next*
tree.

We've added 37 non-merge commits during the last 15 day(s) which
contain a total of 65 files changed, 3200 insertions(+), 738
deletions(-).

The main changes are:

1) Static linking of multiple BPF ELF files, from Andrii.

2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo.

3) Spelling fixes from various folks.
====================

Signed-off-by: David S. Miller <[email protected]>
2021-03-25 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | David S. Miller | 28 files, -221/+557
Signed-off-by: David S. Miller <[email protected]>
2021-03-25 | tracing: Update create_system_filter() kernel-doc comment | Qiujun Huang | 1 file, -2/+3
commit f306cc82a93d ("tracing: Update event filters for multibuffer")
added the parameter @tr for create_system_filter().

commit bb9ef1cb7d86 ("tracing: Change apply_subsystem_event_filter()
paths to check file->system == dir") changed the parameter from @system
to @dir.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qiujun Huang <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-03-25 | tracing: A minor cleanup for create_system_filter() | Qiujun Huang | 1 file, -4/+3
The first two parameters should be reduced to one, as @tr is simply
@dir->tr.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Qiujun Huang <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-03-25 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 1 file, -0/+69
Merge misc fixes from Andrew Morton:
 "14 patches.

  Subsystems affected by this patch series: mm (hugetlb, kasan, gup,
  selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64,
  gcov, and mailmap"

* emailed patches from Andrew Morton <[email protected]>:
  mailmap: update Andrey Konovalov's email address
  mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
  mm: memblock: fix section mismatch warning again
  kfence: make compatible with kmemleak
  gcov: fix clang-11+ support
  ia64: fix format strings for err_inject
  ia64: mca: allocate early mca with GFP_ATOMIC
  squashfs: fix xattr id and id lookup sanity checks
  squashfs: fix inode lookup sanity checks
  z3fold: prevent reclaim/free race for headless pages
  selftests/vm: fix out-of-tree build
  mm/mmu_notifiers: ensure range_end() is paired with range_start()
  kasan: fix per-page tags for non-page_alloc pages
  hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
2021-03-25 | gcov: fix clang-11+ support | Nick Desaulniers | 1 file, -0/+69
LLVM changed the expected function signatures for
llvm_gcda_start_file() and llvm_gcda_emit_function() in the clang-11
release. Users of clang-11 or newer may have noticed their kernels
failing to boot due to a panic when enabling CONFIG_GCOV_KERNEL=y +
CONFIG_GCOV_PROFILE_ALL=y. Fix up the function signatures so calling
these functions doesn't panic the kernel.

Link: https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041
Link: https://reviews.llvm.org/rG13a633b438b6500ecad9e4f936ebadf3411d0f44
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nick Desaulniers <[email protected]>
Reported-by: Prasad Sodagudi <[email protected]>
Suggested-by: Nathan Chancellor <[email protected]>
Reviewed-by: Fangrui Song <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Acked-by: Peter Oberparleiter <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: <[email protected]> [5.4+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
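A sketch of version-gated prototypes in kernel/gcov/clang.c, keyed off
CONFIG_CLANG_VERSION; the exact parameter lists are an assumption based
on the commit description and the linked LLVM changes:

  #if CONFIG_CLANG_VERSION < 110000
  void llvm_gcda_start_file(const char *orig_filename,
  			  const char version[4], u32 checksum);
  void llvm_gcda_emit_function(u32 ident, const char *function_name,
  			     u32 func_checksum, u8 use_extra_checksum,
  			     u32 cfg_checksum);
  #else
  void llvm_gcda_start_file(const char *orig_filename, u32 version,
  			  u32 checksum);
  void llvm_gcda_emit_function(u32 ident, u32 func_checksum,
  			     u32 cfg_checksum);
  #endif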