aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2021-10-15sched: Disable -Wunused-but-set-variablePeter Zijlstra1-0/+4
The compilers can't deal with obvious DCE vs that warning, resulting in code like: if (0) { sched sched_statistics *stats; stats = __schedstats_from_se(se); ... } triggering the warning. Kill the warning to make the robots stop reporting this. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-10-15sched: Add wrapper for get_wchan() to keep task blockedKees Cook1-0/+19
Having a stable wchan means the process must be blocked and for it to stay that way while performing stack unwinding. Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Russell King (Oracle) <[email protected]> [arm] Tested-by: Mark Rutland <[email protected]> [arm64] Link: https://lkml.kernel.org/r/[email protected]
2021-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-29/+47
tools/testing/selftests/net/ioam6.sh 7b1700e009cc ("selftests: net: modify IOAM tests for undef bits") bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <[email protected]>
2021-10-14pid: add pidfd_get_task() helperChristian Brauner1-0/+36
The number of system calls making use of pidfds is constantly increasing. Some of those new system calls duplicate the code to turn a pidfd into task_struct it refers to. Give them a simple helper for this. Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Cc: Vlastimil Babka <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Matthew Bobrowski <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jan Kara <[email protected]> Cc: Minchan Kim <[email protected]> Reviewed-by: Matthew Bobrowski <[email protected]> Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2021-10-14kernel/sched: Fix sched_fork() access an invalid sched_task_groupZhang Qiao2-22/+23
There is a small race between copy_process() and sched_fork() where child->sched_task_group point to an already freed pointer. parent doing fork() | someone moving the parent | to another cgroup -------------------------------+------------------------------- copy_process() + dup_task_struct()<1> parent move to another cgroup, and free the old cgroup. <2> + sched_fork() + __set_task_cpu()<3> + task_fork_fair() + sched_slice()<4> In the worst case, this bug can lead to "use-after-free" and cause panic as shown above: (1) parent copy its sched_task_group to child at <1>; (2) someone move the parent to another cgroup and free the old cgroup at <2>; (3) the sched_task_group and cfs_rq that belong to the old cgroup will be accessed at <3> and <4>, which cause a panic: [] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [] PGD 8000001fa0a86067 P4D 8000001fa0a86067 PUD 2029955067 PMD 0 [] Oops: 0000 [#1] SMP PTI [] CPU: 7 PID: 648398 Comm: ebizzy Kdump: loaded Tainted: G OE --------- - - 4.18.0.x86_64+ #1 [] RIP: 0010:sched_slice+0x84/0xc0 [] Call Trace: [] task_fork_fair+0x81/0x120 [] sched_fork+0x132/0x240 [] copy_process.part.5+0x675/0x20e0 [] ? __handle_mm_fault+0x63f/0x690 [] _do_fork+0xcd/0x3b0 [] do_syscall_64+0x5d/0x1d0 [] entry_SYSCALL_64_after_hwframe+0x65/0xca [] RIP: 0033:0x7f04418cd7e1 Between cgroup_can_fork() and cgroup_post_fork(), the cgroup membership and thus sched_task_group can't change. So update child's sched_task_group at sched_post_fork() and move task_fork() and __set_task_cpu() (where accees the sched_task_group) from sched_fork() to sched_post_fork(). Fixes: 8323f26ce342 ("sched: Fix race in task_group") Signed-off-by: Zhang Qiao <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-10-14sched/topology: Remove unused numa_distance in cpu_attach_domain()Yicong Yang1-4/+0
numa_distance in cpu_attach_domain() is introduced in commit b5b217346de8 ("sched/topology: Warn when NUMA diameter > 2") to warn user when NUMA diameter > 2 as we'll misrepresent the scheduler topology structures at that time. This is fixed by Barry in commit 585b6d2723dc ("sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2") and numa_distance is unused now. So remove it. Signed-off-by: Yicong Yang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Barry Song <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-14sched/numa: Fix a few commentsBharata B Rao1-2/+2
Fix a few comments to help understand them better. Signed-off-by: Bharata B Rao <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mel Gorman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-10-14sched/numa: Remove the redundant member numa_group::fault_cpusBharata B Rao1-7/+5
numa_group::fault_cpus is actually a pointer to the region in numa_group::faults[] where NUMA_CPU stats are located. Remove this redundant member and use numa_group::faults[NUMA_CPU] directly like it is done for similar per-process numa fault stats. There is no functionality change due to this commit. Signed-off-by: Bharata B Rao <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mel Gorman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-10-14sched/numa: Replace hard-coded number by a define in numa_task_group()Bharata B Rao1-1/+2
While allocating group fault stats, task_numa_group() is using a hard coded number 4. Replace this by NR_NUMA_HINT_FAULT_STATS. No functionality change in this commit. Signed-off-by: Bharata B Rao <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mel Gorman <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-10-14sched,livepatch: Use wake_up_if_idle()Peter Zijlstra1-1/+4
Make sure to prod idle CPUs so they call klp_update_patch_state(). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Miroslav Benes <[email protected]> Acked-by: Vasily Gorbik <[email protected]> Tested-by: Petr Mladek <[email protected]> Tested-by: Vasily Gorbik <[email protected]> # on s390 Link: https://lkml.kernel.org/r/[email protected]
2021-10-13tracing: Fix event probe removal from dynamic eventsSteven Rostedt (VMware)1-3/+51
When an event probe is to be removed via the API that created it via the dynamic events, an -ENOENT error is returned. This is because the removal of the event probe does not expect to see the event system and name that the event probe is attached to, even though that's part of the API to create it. As the removal of probes is to use the same API as they are created. In fact, the removal is not consistent with the kprobes and uprobes removal. Fix that by allowing various ways to remove the eprobe. The eprobe is created with: e:[GROUP/]NAME SYSTEM/EVENT [OPTIONS] Have it get removed by echoing in the following into dynamic_events: # Remove all eprobes with NAME echo '-:NAME' >> dynamic_events # Remove a specific eprobe echo '-:GROUP/NAME' >> dynamic_events echo '-:GROUP/NAME SYSTEM/EVENT' >> dynamic_events echo '-:NAME SYSTEM/EVENT' >> dynamic_events echo '-:GROUP/NAME SYSTEM/EVENT OPTIONS' >> dynamic_events echo '-:NAME SYSTEM/EVENT OPTIONS' >> dynamic_events Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Masami Hiramatsu <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-10-13tracing: in_irq() cleanupChangbin Du2-2/+2
Replace the obsolete and ambiguos macro in_irq() with new macro in_hardirq(). Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Changbin Du <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-10-13Merge tag 'modules-for-v5.15-rc6' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules fix from Jessica Yu: - Build fix for cfi_init() when CONFIG_MODULE_UNLOAD=n * tag 'modules-for-v5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: fix clang CFI with MODULE_UNLOAD=n
2021-10-11Merge branch 'for-5.15-fixes' of ↵Linus Torvalds1-27/+29
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "All documentation / comment updates" * 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroupv2, docs: fix misinformation in "device controller" section cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem docs/cgroup: remove some duplicate words
2021-10-11Merge branch 'for-5.15-fixes' of ↵Linus Torvalds1-2/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "One patch to add a missing __printf annotation and the other to enable deferred printing for debug dumps to avoid deadlocks when triggered from some contexts (e.g. console drivers)" * 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix state-dump console deadlock workqueue: annotate alloc_workqueue() as printf
2021-10-11workqueue: fix state-dump console deadlockJohan Hovold1-2/+16
Console drivers often queue work while holding locks also taken in their console write paths, something which can lead to deadlocks on SMP when dumping workqueue state (e.g. sysrq-t or on suspend failures). For serial console drivers this could look like: CPU0 CPU1 ---- ---- show_workqueue_state(); lock(&pool->lock); <IRQ> lock(&port->lock); schedule_work(); lock(&pool->lock); printk(); lock(console_owner); lock(&port->lock); where workqueues are, for example, used to push data to the line discipline, process break signals and handle modem-status changes. Line disciplines and serdev drivers can also queue work on write-wakeup notifications, etc. Reworking every console driver to avoid queuing work while holding locks also taken in their write paths would complicate drivers and is neither desirable or feasible. Instead use the deferred-printk mechanism to avoid printing while holding pool locks when dumping workqueue state. Note that there are a few WARN_ON() assertions in the workqueue code which could potentially also trigger a deadlock. Hopefully the ongoing printk rework will provide a general solution for this eventually. This was originally reported after a lockdep splat when executing sysrq-t with the imx serial driver. Fixes: 3494fc30846d ("workqueue: dump workqueues on sysrq-t") Cc: [email protected] # 4.0 Reported-by: Fabio Estevam <[email protected]> Tested-by: Fabio Estevam <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Reviewed-by: John Ogness <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2021-10-11dma-debug: fix sg checks in debug_dma_map_sg()Gerald Schaefer1-6/+6
The following warning occurred sporadically on s390: DMA-API: nvme 0006:00:00.0: device driver maps memory from kernel text or rodata [addr=0000000048cc5e2f] [len=131072] WARNING: CPU: 4 PID: 825 at kernel/dma/debug.c:1083 check_for_illegal_area+0xa8/0x138 It is a false-positive warning, due to broken logic in debug_dma_map_sg(). check_for_illegal_area() checks for overlay of sg elements with kernel text or rodata. It is called with sg_dma_len(s) instead of s->length as parameter. After the call to ->map_sg(), sg_dma_len() will contain the length of possibly combined sg elements in the DMA address space, and not the individual sg element length, which would be s->length. The check will then use the physical start address of an sg element, and add the DMA length for the overlap check, which could result in the false warning, because the DMA length can be larger than the actual single sg element length. In addition, the call to check_for_illegal_area() happens in the iteration over mapped_ents, which will not include all individual sg elements if any of them were combined in ->map_sg(). Fix this by using s->length instead of sg_dma_len(s). Also put the call to check_for_illegal_area() in a separate loop, iterating over all the individual sg elements ("nents" instead of "mapped_ents"). While at it, as suggested by Robin Murphy, also move check_for_stack() inside the new loop, as it is similarly concerned with validating the individual sg elements. Link: https://lore.kernel.org/lkml/[email protected] Fixes: 884d05970bfb ("dma-debug: use sg_dma_len accessor") Signed-off-by: Gerald Schaefer <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2021-10-11dma-mapping: fix the kerneldoc for dma_map_sgtable()Logan Gunthorpe1-6/+6
htmldocs began producing the following warnings: kernel/dma/mapping.c:256: WARNING: Definition list ends without a blank line; unexpected unindent. kernel/dma/mapping.c:257: WARNING: Bullet list ends without a blank line; unexpected unindent. Reformatting the list without hyphens fixes the warnings and produces both a readable text and HTML output. Fixes: fffe3cc8c219 ("dma-mapping: allow map_sg() ops to return negative error code") Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2021-10-10tracing: Fix missing * in comment blockColin Ian King1-1/+1
There is a missing * in a comment block, add it in. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-10-10tracing: Fix memory leak in eprobe_register()Vamshi K Sthambamkadi1-0/+7
kmemleak report: unreferenced object 0xffff900a70ec7ec0 (size 32): comm "ftracetest", pid 2770, jiffies 4295042510 (age 311.464s) hex dump (first 32 bytes): c8 31 23 45 0a 90 ff ff 40 85 c7 6e 0a 90 ff ff .1#[email protected].... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440 [<0000000088b8124b>] eprobe_register+0x1e3/0x350 [<000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240 [<0000000019109321>] event_enable_write+0x93/0xe0 [<000000007d85b320>] vfs_write+0xb9/0x260 [<00000000b94c5e41>] ksys_write+0x67/0xe0 [<000000005a08c81d>] __x64_sys_write+0x1a/0x20 [<00000000240bf576>] do_syscall_64+0x3b/0xc0 [<0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae unreferenced object 0xffff900a56bbf280 (size 128): comm "ftracetest", pid 2770, jiffies 4295042510 (age 311.464s) hex dump (first 32 bytes): ff ff ff ff ff ff ff ff 00 00 00 00 01 00 00 00 ................ 80 69 3b b2 ff ff ff ff 20 69 3b b2 ff ff ff ff .i;..... i;..... backtrace: [<000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440 [<00000000c4e90fad>] eprobe_register+0x1fc/0x350 [<000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240 [<0000000019109321>] event_enable_write+0x93/0xe0 [<000000007d85b320>] vfs_write+0xb9/0x260 [<00000000b94c5e41>] ksys_write+0x67/0xe0 [<000000005a08c81d>] __x64_sys_write+0x1a/0x20 [<00000000240bf576>] do_syscall_64+0x3b/0xc0 [<0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae In new_eprobe_trigger(), allocated edata and trigger variables are never freed. To fix, free memory in disable_eprobe(). Link: https://lkml.kernel.org/r/20211008071802.GA2098@cosmos Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Vamshi K Sthambamkadi <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-10-10ftrace: Add unit test for removing trace functionCarles Pey1-0/+34
A self test is provided for the trace function removal functionality. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Carles Pey <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-10-08ftrace: Cleanup ftrace_dyn_arch_init()Weizhao Ouyang1-0/+5
Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common ftrace_dyn_arch_init() to cleanup them. Link: https://lkml.kernel.org/r/[email protected] Acked-by: Heiko Carstens <[email protected]> (s390) Acked-by: Helge Deller <[email protected]> (parisc) Signed-off-by: Weizhao Ouyang <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-10-08tracing: Disable "other" permission bits in the tracefs filesSteven Rostedt (VMware)15-96/+103
When building the files in the tracefs file system, do not by default set any permissions for OTH (other). This will make it easier for admins who want to define a group for accessing tracefs and not having to first disable all the permission bits for "other" in the file system. As tracing can leak sensitive information, it should never by default allowing all users access. An admin can still set the permission bits for others to have access, which may be useful for creating a honeypot and seeing who takes advantage of it and roots the machine. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-10-08bpftool: Add install-bin target to install binary onlyQuentin Monnet1-1/+1
With "make install", bpftool installs its binary and its bash completion file. Usually, this is what we want. But a few components in the kernel repository (namely, BPF iterators and selftests) also install bpftool locally before using it. In such a case, bash completion is not necessary and is just a useless build artifact. Let's add an "install-bin" target to bpftool, to offer a way to install the binary only. Signed-off-by: Quentin Monnet <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-10-08bpf: iterators: Install libbpf headers when buildingQuentin Monnet1-13/+25
API headers from libbpf should not be accessed directly from the library's source directory. Instead, they should be exported with "make install_headers". Let's make sure that bpf/preload/iterators/Makefile installs the headers properly when building. Signed-off-by: Quentin Monnet <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-10-08bpf: preload: Install libbpf headers when buildingQuentin Monnet1-5/+20
API headers from libbpf should not be accessed directly from the library's source directory. Instead, they should be exported with "make install_headers". Let's make sure that bpf/preload/Makefile installs the headers properly when building. Note that we declare an additional dependency for iterators/iterators.o: having $(LIBBPF_A) as a dependency to "$(obj)/bpf_preload_umd" is not sufficient, as it makes it required only at the linking step. But we need libbpf to be compiled, and in particular its headers to be exported, before we attempt to compile iterators.o. The issue would not occur before this commit, because libbpf's headers were not exported and were always available under tools/lib/bpf. Signed-off-by: Quentin Monnet <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-10-08coredump: Limit coredumps to a single thread groupEric W. Biederman2-10/+4
Today when a signal is delivered with a handler of SIG_DFL whose default behavior is to generate a core dump not only that process but every process that shares the mm is killed. In the case of vfork this looks like a real world problem. Consider the following well defined sequence. if (vfork() == 0) { execve(...); _exit(EXIT_FAILURE); } If a signal that generates a core dump is received after vfork but before the execve changes the mm the process that called vfork will also be killed (as the mm is shared). Similarly if the execve fails after the point of no return the kernel delivers SIGSEGV which will kill both the exec'ing process and because the mm is shared the process that called vfork as well. As far as I can tell this behavior is a violation of people's reasonable expectations, POSIX, and is unnecessarily fragile when the system is low on memory. Solve this by making a userspace visible change to only kill a single process/thread group. This is possible because Jann Horn recently modified[1] the coredump code so that the mm can safely be modified while the coredump is happening. With LinuxThreads long gone I don't expect anyone to have a notice this behavior change in practice. To accomplish this move the core_state pointer from mm_struct to signal_struct, which allows different thread groups to coredump simultatenously. In zap_threads remove the work to kill anything except for the current thread group. v2: Remove core_state from the VM_BUG_ON_MM print to fix compile failure when CONFIG_DEBUG_VM is enabled. Reported-by: Stephen Rothwell <[email protected]> [1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133 Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2021-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-7/+44
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2021-10-07Merge branches 'fixes.2021.10.07a', 'scftorture.2021.09.16a', ↵Paul E. McKenney6-120/+146
'tasks.2021.09.15a', 'torture.2021.09.13b' and 'torturescript.2021.09.16a' into HEAD fixes.2021.10.07a: Miscellaneous fixes. scftorture.2021.09.16a: smp_call_function torture-test updates. tasks.2021.09.15a: Tasks-trace RCU updates. torture.2021.09.13b: Other torture-test updates. torturescript.2021.09.16a: Torture-test scripting updates.
2021-10-07rcu: Fix rcu_dynticks_curr_cpu_in_eqs() vs noinstrPeter Zijlstra1-1/+1
vmlinux.o: warning: objtool: rcu_nmi_enter()+0x36: call to __kasan_check_read() leaves .noinstr.text section noinstr cannot have atomic_*() functions in because they're explicitly annotated, use arch_atomic_*(). Fixes: 2be57f732889 ("rcu: Weaken ->dynticks accesses and updates") Reported-by: Stephen Rothwell <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-10-07rcu: Always inline rcu_dynticks_task*_{enter,exit}()Peter Zijlstra1-4/+4
RCU managed to grow a few noinstr violations: vmlinux.o: warning: objtool: rcu_dynticks_eqs_enter()+0x0: call to rcu_dynticks_task_trace_enter() leaves .noinstr.text section vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0xe: call to rcu_dynticks_task_trace_exit() leaves .noinstr.text section Fix them by adding __always_inline to the relevant trivial functions. Also replace the noinstr with __always_inline for the existing rcu_dynticks_task_*() functions since noinstr would force noinline them, even when empty, which seems silly. Fixes: 7d0c9c50c5a1 ("rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built") Reported-by: Stephen Rothwell <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-10-07Merge tag 'net-5.15-rc5' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from xfrm, bpf, netfilter, and wireless. Current release - regressions: - xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting a new value in the middle of an enum - unix: fix an issue in unix_shutdown causing the other end read/write failures - phy: mdio: fix memory leak Current release - new code bugs: - mlx5e: improve MQPRIO resiliency against bad configs Previous releases - regressions: - bpf: fix integer overflow leading to OOB access in map element pre-allocation - stmmac: dwmac-rk: fix ethernet on rk3399 based devices - netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1 - brcmfmac: revert using ISO3166 country code and 0 rev as fallback - i40e: fix freeing of uninitialized misc IRQ vector - iavf: fix double unlock of crit_lock Previous releases - always broken: - bpf, arm: fix register clobbering in div/mod implementation - netfilter: nf_tables: correct issues in netlink rule change event notifications - dsa: tag_dsa: fix mask for trunked packets - usb: r8152: don't resubmit rx immediately to avoid soft lockup on device unplug - i40e: fix endless loop under rtnl if FW fails to correctly respond to capability query - mlx5e: fix rx checksum offload coexistence with ipsec offload - mlx5: force round second at 1PPS out start time and allow it only in supported clock modes - phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable sequence Misc: - xfrm: slightly rejig the new policy uAPI to make it less cryptic" * tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits) net: prefer socket bound to interface when not in VRF iavf: fix double unlock of crit_lock i40e: Fix freeing of uninitialized misc IRQ vector i40e: fix endless loop under rtnl dt-bindings: net: dsa: marvell: fix compatible in example ionic: move filter sync_needed bit set gve: report 64bit tx_bytes counter from gve_handle_report_stats() gve: fix gve_get_stats() rtnetlink: fix if_nlmsg_stats_size() under estimation gve: Properly handle errors in gve_assign_qpl gve: Avoid freeing NULL pointer gve: Correct available tx qpl check unix: Fix an issue in unix_shutdown causing the other end read/write failures net: stmmac: trigger PCS EEE to turn off on link down net: pcs: xpcs: fix incorrect steps on disable EEE netlink: annotate data races around nlk->bound net: pcs: xpcs: fix incorrect CL37 AN sequence net: sfp: Fix typo in state machine debug string net/sched: sch_taprio: properly cancel timer from taprio_destroy() net: bridge: fix under estimation in br_get_linkxstats_size() ...
2021-10-07Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski1-1/+2
Daniel Borkmann says: ==================== pull-request: bpf 2021-10-07 We've added 7 non-merge commits during the last 8 day(s) which contain a total of 8 files changed, 38 insertions(+), 21 deletions(-). The main changes are: 1) Fix ARM BPF JIT to preserve caller-saved regs for DIV/MOD JIT-internal helper call, from Johan Almbladh. 2) Fix integer overflow in BPF stack map element size calculation when used with preallocation, from Tatsuhiko Yasumatsu. 3) Fix an AF_UNIX regression due to added BPF sockmap support related to shutdown handling, from Jiang Wang. 4) Fix a segfault in libbpf when generating light skeletons from objects without BTF, from Kumar Kartikeya Dwivedi. 5) Fix a libbpf memory leak in strset to free the actual struct strset itself, from Andrii Nakryiko. 6) Dual-license bpf_insn.h similarly as we did for libbpf and bpftool, with ACKs from all contributors, from Luca Boccassi. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-10-07tracing: Initialize upper and lower vars in pid_list_refill_irq()Steven Rostedt (VMware)1-2/+2
The upper and lower variables are set as link lists to add into the sparse array. If they are NULL, after the needed allocations are done, then there is nothing to add. But they need to be initialized to NULL for this to work. Link: https://lore.kernel.org/all/[email protected]/ Fixes: 8d6e90983ade ("tracing: Create a sparse bitmask for pid filtering") Reported-by: Colin Ian King <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-10-07tracing: Fix missing osnoise tracer on max_latencyJackie Liu1-7/+4
The compiler warns when the data are actually unused: kernel/trace/trace.c:1712:13: error: ‘trace_create_maxlat_file’ defined but not used [-Werror=unused-function] 1712 | static void trace_create_maxlat_file(struct trace_array *tr, | ^~~~~~~~~~~~~~~~~~~~~~~~ [Why] CONFIG_HWLAT_TRACER=n, CONFIG_TRACER_MAX_TRACE=n, CONFIG_OSNOISE_TRACER=y gcc report warns. [How] Now trace_create_maxlat_file will only take effect when CONFIG_HWLAT_TRACER=y or CONFIG_TRACER_MAX_TRACE=y. In fact, after adding osnoise trace, it also needs to take effect. Link: https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: bce29ac9ce0b ("trace: Add osnoise tracer") Cc: Daniel Bristot de Oliveira <[email protected]> Suggested-by: Steven Rostedt <[email protected]> Reviewed-by: Daniel Bristot de Oliveira <[email protected]> Tested-by: Randy Dunlap <[email protected]> # build-tested Signed-off-by: Jackie Liu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-10-07sched: Simplify wake_up_*idle*()Peter Zijlstra2-12/+8
Simplify and make wake_up_if_idle() more robust, also don't iterate the whole machine with preempt_disable() in it's caller: wake_up_all_idle_cpus(). This prepares for another wake_up_if_idle() user that needs a full do_idle() cycle. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Vasily Gorbik <[email protected]> Tested-by: Vasily Gorbik <[email protected]> # on s390 Link: https://lkml.kernel.org/r/[email protected]
2021-10-07sched,livepatch: Use task_call_func()Peter Zijlstra1-46/+44
Instead of frobbing around with scheduler internals, use the shiny new task_call_func() interface. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Miroslav Benes <[email protected]> Acked-by: Vasily Gorbik <[email protected]> Tested-by: Petr Mladek <[email protected]> Tested-by: Vasily Gorbik <[email protected]> # on s390 Link: https://lkml.kernel.org/r/[email protected]
2021-10-07sched,rcu: Rework try_invoke_on_locked_down_task()Peter Zijlstra3-13/+13
Give try_invoke_on_locked_down_task() a saner name and have it return an int so that the caller might distinguish between different reasons of failure. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Acked-by: Vasily Gorbik <[email protected]> Tested-by: Vasily Gorbik <[email protected]> # on s390 Link: https://lkml.kernel.org/r/[email protected]
2021-10-07sched: Improve try_invoke_on_locked_down_task()Peter Zijlstra1-24/+39
Clarify and tighten try_invoke_on_locked_down_task(). Basically the function calls @func under task_rq_lock(), except it avoids taking rq->lock when possible. This makes calling @func unconditional (the function will get renamed in a later patch to remove the try). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Vasily Gorbik <[email protected]> Tested-by: Vasily Gorbik <[email protected]> # on s390 Link: https://lkml.kernel.org/r/[email protected]
2021-10-07futex: Implement sys_futex_waitv()André Almeida4-0/+336
Add support to wait on multiple futexes. This is the interface implemented by this syscall: futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes, unsigned int flags, struct timespec *timeout, clockid_t clockid) struct futex_waitv { __u64 val; __u64 uaddr; __u32 flags; __u32 __reserved; }; Given an array of struct futex_waitv, wait on each uaddr. The thread wakes if a futex_wake() is performed at any uaddr. The syscall returns immediately if any waiter has *uaddr != val. *timeout is an optional absolute timeout value for the operation. This syscall supports only 64bit sized timeout structs. The flags argument of the syscall should be empty, but it can be used for future extensions. Flags for shared futexes, sizes, etc. should be used on the individual flags of each waiter. __reserved is used for explicit padding and should be 0, but it might be used for future extensions. If the userspace uses 32-bit pointers, it should make sure to explicitly cast it when assigning to waitv::uaddr. Returns the array index of one of the woken futexes. There’s no given information of how many were woken, or any particular attribute of it (if it’s the first woken, if it is of the smaller index...). Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Simplify double_lock_hb()Peter Zijlstra1-8/+6
We need to make sure that all requeue operations take the hash bucket locks in the same order to avoid deadlock. Simplify the current double_lock_hb implementation by making sure hb1 is always the "smallest" bucket to avoid extra checks. [André: Add commit description] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Split out wait/wakePeter Zijlstra4-536/+543
Move the wait/wake bits into their own file. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Split out requeuePeter Zijlstra4-963/+979
Move all the requeue bits into their own file. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Rename mark_wake_futex()Peter Zijlstra1-5/+5
In order to prepare introducing these symbols into the global namespace; rename: s/mark_wake_futex/futex_wake_mark/g Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Rename: match_futex()Peter Zijlstra1-12/+12
In order to prepare introducing these symbols into the global namespace; rename: s/match_futex/futex_match/g Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Rename: hb_waiter_{inc,dec,pending}()Peter Zijlstra1-17/+17
In order to prepare introducing these symbols into the global namespace; rename them: s/hb_waiters_/futex_&/g Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Split out PI futexPeter Zijlstra4-1405/+1449
Move the PI futex implementation into it's own file. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Rename: {get,cmpxchg}_futex_value_locked()Peter Zijlstra1-15/+15
In order to prepare introducing these symbols into the global namespace; rename them: s/\<\([^_ ]*\)_futex_value_locked/futex_\1_value_locked/g Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Rename hash_futex()Peter Zijlstra1-11/+11
In order to prepare introducing these symbols into the global namespace; rename: s/hash_futex/futex_hash/g Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07futex: Rename __unqueue_futex()Peter Zijlstra1-7/+7
In order to prepare introducing these symbols into the global namespace; rename: s/__unqueue_futex/__futex_unqueue/g Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: André Almeida <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]