| Age | Commit message (Collapse) | Author | Files | Lines |
|
Add helpers to wrap the get_fs/set_fs magic for undoing any damange done
by set_fs(KERNEL_DS). There is no real functional benefit, but this
documents the intent of these calls better, and will allow stubbing the
functions out easily for kernels builds that do not allow address space
overrides in the future.
[[email protected]: drop two incorrect hunks, fix a commit log typo]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Acked-by: Greentime Hu <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Originally kthread_create_on_cpu() parked and woke up the new thread.
However, since commit a65d40961dc7 ("kthread/smpboot: do not park in
kthread_create_on_cpu()") this is no longer the case. This patch removes
the comment that has been left behind and is now incorrect / stale.
Fixes: a65d40961dc7 ("kthread/smpboot: do not park in kthread_create_on_cpu()")
Signed-off-by: Ilias Stamatis <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:
CPU0 CPU1
flush_tlb_mm(mm) use_mm(mm)
<send-IPI>
tsk->active_mm = mm;
<IPI>
if (tsk->active_mm == mm)
// flush TLBs
</IPI>
switch_mm(old_mm,mm,tsk);
Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.
Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().
Of the (SMP) architectures that have IPI based TLB invalidate:
Alpha - checks active_mm
ARC - ASID specific
IA64 - checks active_mm
MIPS - ASID specific flush
OpenRISC - shoots down world
PARISC - shoots down world
SH - ASID specific
SPARC - ASID specific
x86 - N/A
xtensa - checks active_mm
So at the very least Alpha, IA64 and Xtensa are suspect.
On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm(), which is
currently provided by task_lock(), but that's not sufficient for
PREEMPT_RT.
[[email protected]: add comment]
Reported-by: Andy Lutomirski <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
|
|
Better describe what these functions do.
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is a kernel enhancement that configures the cpu affinity of kernel
threads via kernel boot option nohz_full=.
When this option is specified, the cpumask is immediately applied upon
kthread launch. This does not affect kernel threads that specify cpu
and node.
This allows CPU isolation (that is not allowing certain threads
to execute on certain CPUs) without using the isolcpus=domain parameter,
making it possible to enable load balancing on such CPUs
during runtime (see kernel-parameters.txt).
Note-1: this is based off on Wind River's patch at
https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch
Difference being that this patch is limited to modifying kernel thread
cpumask. Behaviour of other threads can be controlled via cgroups or
sched_setaffinity.
Note-2: Wind River's patch was based off Christoph Lameter's patch at
https://lwn.net/Articles/565932/ with the only difference being
the kernel parameter changed from kthread to kthread_cpus.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Next patch will switch unbound kernel threads mask to
housekeeping_cpumask(), a subset of cpu_possible_mask. So in order to
ease bisection, lets first switch kthreads default affinity from
cpu_all_mask to cpu_possible_mask.
It looks safe to do so as cpu_possible_mask seem to be initialized
at setup_arch() time, way before kthreadd is created.
Suggested-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Merge some more updates from Andrew Morton:
- various hotfixes and minor things
- hch's use_mm/unuse_mm clearnups
Subsystems affected by this patch series: mm/hugetlb, scripts, kcov,
lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc.
* emailed patches from Andrew Morton <[email protected]>:
kernel: set USER_DS in kthread_use_mm
kernel: better document the use_mm/unuse_mm API contract
kernel: move use_mm/unuse_mm to kthread.c
kernel: move use_mm/unuse_mm to kthread.c
stacktrace: cleanup inconsistent variable type
lib: test get_count_order/long in test_bitops.c
mm: add comments on pglist_data zones
ocfs2: fix spelling mistake and grammar
mm/debug_vm_pgtable: fix kernel crash by checking for THP support
lib: fix bitmap_parse() on 64-bit big endian archs
checkpatch: correct check for kernel parameters doc
nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()
lib/lz4/lz4_decompress.c: document deliberate use of `&'
kcov: check kcov_softirq in kcov_remote_stop()
scripts/spelling: add a few more typos
khugepaged: selftests: fix timeout condition in wait_for_scan()
|
|
Some architectures like arm64 and s390 require USER_DS to be set for
kernel threads to access user address space, which is the whole purpose of
kthread_use_mm, but other like x86 don't. That has lead to a huge mess
where some callers are fixed up once they are tested on said
architectures, while others linger around and yet other like io_uring try
to do "clever" optimizations for what usually is just a trivial asignment
to a member in the thread_struct for most architectures.
Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore to the
previous value instead.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Jens Axboe <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Felipe Balbi <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Zhenyu Wang <[email protected]>
Cc: Zhi Wang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Switch the function documentation to kerneldoc comments, and add
WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
not have ->mm set (or has ->mm set in the case of unuse_mm).
Also give the functions a kthread_ prefix to better document the use case.
[[email protected]: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: powerpc/vas: fix up for {un}use_mm() rename]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Jens Axboe <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]> [usb]
Acked-by: Haren Myneni <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Felipe Balbi <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Zhenyu Wang <[email protected]>
Cc: Zhi Wang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "improve use_mm / unuse_mm", v2.
This series improves the use_mm / unuse_mm interface by better documenting
the assumptions, and my taking the set_fs manipulations spread over the
callers into the core API.
This patch (of 3):
Use the proper API instead.
Link: http://lkml.kernel.org/r/[email protected]
These helpers are only for use with kernel threads, and I will tie them
more into the kthread infrastructure going forward. Also move the
prototypes to kthread.h - mmu_context.h was a little weird to start with
as it otherwise contains very low-level MM bits.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Jens Axboe <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Felipe Balbi <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Zhenyu Wang <[email protected]>
Cc: Zhi Wang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It's handy to keep the kthread_fn just as a unique cookie to identify
classes of kthreads. E.g. if you can verify that a given task is
running your thread_fn, then you may know what sort of type kthread_data
points to.
We'll use this in nfsd to pass some information into the vfs. Note it
will need kthread_data() exported too.
Original-patch-by: Tejun Heo <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
when we create a kthread with ktrhead_create_on_cpu(),the child thread
entry is ktread.c:ktrhead() which will be preempted by the parent after
call complete(done) while schedule() is not called yet,then the parent
will call wait_task_inactive(child) but the child is still on the runqueue,
so the parent will schedule_hrtimeout() for 1 jiffy,it will waste a lot of
time,especially on startup.
parent child
ktrhead_create_on_cpu()
wait_fo_completion(&done) -----> ktread.c:ktrhead()
|----- complete(done);--wakeup and preempted by parent
kthread_bind() <------------| |-> schedule();--dequeue here
wait_task_inactive(child) |
schedule_hrtimeout(1 jiffy) -|
So we hope the child just wakeup parent but not preempted by parent, and the
child is going to call schedule() soon,then the parent will not call
schedule_hrtimeout(1 jiffy) as the child is already dequeue.
The same issue for ktrhead_park()&&kthread_parkme().
This patch can save 120ms on rk312x startup with CONFIG_HZ=300.
Signed-off-by: Liang Chen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The __kthread_queue_delayed_work is not exported so
make it static, to avoid the following sparse warning:
kernel/kthread.c:869:6: warning: symbol '__kthread_queue_delayed_work' was not declared. Should it be static?
Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
kthread.h can't be included in psi_types.h because it creates a circular
inclusion with kthread.h eventually including psi_types.h and
complaining on kthread structures not being defined because they are
defined further in the kthread.h. Resolve this by removing psi_types.h
inclusion from the headers included from kthread.h.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Merge misc updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <[email protected]>: (159 commits)
tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
proc: more robust bulk read test
proc: test /proc/*/maps, smaps, smaps_rollup, statm
proc: use seq_puts() everywhere
proc: read kernel cpu stat pointer once
proc: remove unused argument in proc_pid_lookup()
fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
fs/proc/self.c: code cleanup for proc_setup_self()
proc: return exit code 4 for skipped tests
mm,mremap: bail out earlier in mremap_to under map pressure
mm/sparse: fix a bad comparison
mm/memory.c: do_fault: avoid usage of stale vm_area_struct
writeback: fix inode cgroup switching comment
mm/huge_memory.c: fix "orig_pud" set but not used
mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
mm/memcontrol.c: fix bad line in comment
mm/cma.c: cma_declare_contiguous: correct err handling
mm/page_ext.c: fix an imbalance with kmemleak
mm/compaction: pass pgdat to too_many_isolated() instead of zone
mm: remove zone_lru_lock() function, access ->lru_lock directly
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- refcount conversions
- Solve the rq->leaf_cfs_rq_list can of worms for real.
- improve power-aware scheduling
- add sysctl knob for Energy Aware Scheduling
- documentation updates
- misc other changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
kthread: Do not use TIMER_IRQSAFE
kthread: Convert worker lock to raw spinlock
sched/fair: Use non-atomic cpumask_{set,clear}_cpu()
sched/fair: Remove unused 'sd' parameter from select_idle_smt()
sched/wait: Use freezable_schedule() when possible
sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment block
sched/fair: Explain LLC nohz kick condition
sched/fair: Simplify nohz_balancer_kick()
sched/topology: Fix percpu data types in struct sd_data & struct s_data
sched/fair: Simplify post_init_entity_util_avg() by calling it with a task_struct pointer argument
sched/fair: Fix O(nr_cgroups) in the load balancing path
sched/fair: Optimize update_blocked_averages()
sched/fair: Fix insertion in rq->leaf_cfs_rq_list
sched/fair: Add tmp_alone_branch assertion
sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity
sched/fair: Update scale invariance of PELT
sched/fair: Move the rq_of() helper function
sched/core: Convert task_struct.stack_refcount to refcount_t
...
|
|
Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
All these places for replacement were found by running the following
grep patterns on the entire kernel code. Please let me know if this
might have missed some instances. This might also have replaced some
false positives. I will appreciate suggestions, inputs and review.
1. git grep "nid == -1"
2. git grep "node == -1"
3. git grep "nid = -1"
4. git grep "node = -1"
This patch (of 2):
At present there are multiple places where invalid node number is
encoded as -1. Even though implicitly understood it is always better to
have macros in there. Replace these open encodings for an invalid node
number with the global macro NUMA_NO_NODE. This helps remove NUMA
related assumptions like 'invalid node' from various places redirecting
them to a common definition.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Jeff Kirsher <[email protected]> [ixgbe]
Acked-by: Jens Axboe <[email protected]> [mtip32xx]
Acked-by: Vinod Koul <[email protected]> [dmaengine.c]
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Acked-by: Doug Ledford <[email protected]> [drivers/infiniband]
Cc: Joseph Qi <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The TIMER_IRQSAFE usage was introduced in commit 22597dc3d97b1 ("kthread:
initial support for delayed kthread work") which modelled the delayed
kthread code after workqueue's code. The workqueue code requires the flag
TIMER_IRQSAFE for synchronisation purpose. This is not true for kthread's
delay timer since all operations occur under a lock.
Remove TIMER_IRQSAFE from the timer initialisation and use timer_setup()
for initialisation purpose which is the official function.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
In order to enable the queuing of kthread work items from hardirq context
even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a
raw_spin_lock.
This is only acceptable to do because the work performed under the lock is
well-bounded and minimal.
Reported-by: Steffen Trumtrar <[email protected]>
Reported-by: Tim Sander <[email protected]>
Signed-off-by: Julia Cartwright <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Steffen Trumtrar <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Guenter Roeck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
kthread_should_park() is used to check if the calling kthread ('current')
should park, but there is no function to check whether an arbitrary kthread
should be parked. The latter is required to plug a CPU hotplug race vs. a
parking ksoftirqd thread.
The new __kthread_should_park() receives a task_struct as parameter to
check if the corresponding kernel thread should be parked.
Call __kthread_should_park() from kthread_should_park() to avoid code
duplication.
Signed-off-by: Matthias Kaehlcke <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Douglas Anderson <[email protected]>
Cc: Stephen Boyd <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- Cleanup and improvement of NUMA balancing
- Refactoring and improvements to the PELT (Per Entity Load Tracking)
code
- Watchdog simplification and related cleanups
- The usual pile of small incremental fixes and improvements
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
watchdog: Reduce message verbosity
stop_machine: Reflow cpu_stop_queue_two_works()
sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
sched/numa: Use group_weights to identify if migration degrades locality
sched/numa: Update the scan period without holding the numa_group lock
sched/numa: Remove numa_has_capacity()
sched/numa: Modify migrate_swap() to accept additional parameters
sched/numa: Remove unused task_capacity from 'struct numa_stats'
sched/numa: Skip nodes that are at 'hoplimit'
sched/debug: Reverse the order of printing faults
sched/numa: Use task faults only if numa_group is not yet set up
sched/numa: Set preferred_node based on best_cpu
sched/numa: Simplify load_too_imbalanced()
sched/numa: Evaluate move once per node
sched/numa: Remove redundant field
sched/debug: Show the sum wait time of a task group
sched/fair: Remove #ifdefs from scale_rt_capacity()
sched/core: Remove get_cpu() from sched_fork()
sched/cpufreq: Clarify sugov_get_util()
sched/sysctl: Remove unused sched_time_avg_ms sysctl
...
|
|
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.
creator other
vsnprintf:
fill (not terminated)
count the rest trace_sched_waking(p):
... memcpy(comm, p->comm, TASK_COMM_LEN)
write \0
The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12"
...and a strcpy out of there would cause stack corruption:
[224761.522292] Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffff9bf9783c78
crash-arm64> kbt | grep 'comm\|trace_print_context'
#6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
comm (char [16]) = "irq/497-pwr_even"
crash-arm64> rd 0xffffffd4d0e17d14 8
ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_
ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16:
ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`..
ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`..........
The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.
Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Snild Dolkow <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Oleg explains the reason we could hit park+park is that
smpboot_update_cpumask_percpu_thread()'s
for_each_cpu_and(cpu, &tmp, cpu_online_mask)
smpboot_park_kthread();
turns into:
for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)and)
smpboot_park_kthread();
on UP, ignoring the mask. But since we just completely removed that
function, this is no longer relevant.
So revert commit:
b1f5b378e126 ("kthread: Allow kthread_park() on a parked kthread")
Suggested-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Gaurav reports that commit:
85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")
isn't working for him. Because of the following race:
> controller Thread CPUHP Thread
> takedown_cpu
> kthread_park
> kthread_parkme
> Set KTHREAD_SHOULD_PARK
> smpboot_thread_fn
> set Task interruptible
>
>
> wake_up_process
> if (!(p->state & state))
> goto out;
>
> Kthread_parkme
> SET TASK_PARKED
> schedule
> raw_spin_lock(&rq->lock)
> ttwu_remote
> waiting for __task_rq_lock
> context_switch
>
> finish_lock_switch
>
>
>
> Case TASK_PARKED
> kthread_park_complete
>
>
> SET Running
Furthermore, Oleg noticed that the whole scheduler TASK_PARKED
handling is buggered because the TASK_DEAD thing is done with
preemption disabled, the current code can still complete early on
preemption :/
So basically revert that earlier fix and go with a variant of the
alternative mentioned in the commit. Promote TASK_PARKED to special
state to avoid the store-store issue on task->state leading to the
WARN in kthread_unpark() -> __kthread_bind().
But in addition, add wait_task_inactive() to kthread_park() to ensure
the task really is PARKED when we return from kthread_park(). This
avoids the whole kthread still gets migrated nonsense -- although it
would be really good to get this done differently.
Reported-by: Gaurav Kohli <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The following commit:
85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")
added a WARN() in the case where we call kthread_park() on an already
parked thread, because the old code wasn't doing the right thing there
and it wasn't at all clear that would happen.
It turns out, this does in fact happen, so we have to deal with it.
Instead of potentially returning early, also wait for the completion.
This does however mean we have to use complete_all() and re-initialize
the completion on re-use.
Reported-by: LKP <[email protected]>
Tested-by: Meelis Roos <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Fixes: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Even with the wait-loop fixed, there is a further issue with
kthread_parkme(). Upon hotplug, when we do takedown_cpu(),
smpboot_park_threads() can return before all those threads are in fact
blocked, due to the placement of the complete() in __kthread_parkme().
When that happens, sched_cpu_dying() -> migrate_tasks() can end up
migrating such a still runnable task onto another CPU.
Normally the task will have hit schedule() and gone to sleep by the
time we do kthread_unpark(), which will then do __kthread_bind() to
re-bind the task to the correct CPU.
However, when we loose the initial TASK_PARKED store to the concurrent
wakeup issue described previously, do the complete(), get migrated, it
is possible to either:
- observe kthread_unpark()'s clearing of SHOULD_PARK and terminate
the park and set TASK_RUNNING, or
- __kthread_bind()'s wait_task_inactive() to observe the competing
TASK_RUNNING store.
Either way the WARN() in __kthread_bind() will trigger and fail to
correctly set the CPU affinity.
Fix this by only issuing the complete() when the kthread has scheduled
out. This does away with all the icky 'still running' nonsense.
The alternative is to promote TASK_PARKED to a special state, this
guarantees wait_task_inactive() cannot observe a 'stale' TASK_RUNNING
and we'll end up doing the right thing, but this preserves the whole
icky business of potentially migating the still runnable thing.
Reported-by: Gaurav Kohli <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Gaurav reported a problem with __kthread_parkme() where a concurrent
try_to_wake_up() could result in competing stores to ->state which,
when the TASK_PARKED store got lost bad things would happen.
The comment near set_current_state() actually mentions this competing
store, but only mentions the case against TASK_RUNNING. This same
store, with different timing, can happen against a subsequent !RUNNING
store.
This normally is not a problem, because as per that same comment, the
!RUNNING state store is inside a condition based wait-loop:
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (!need_sleep)
break;
schedule();
}
__set_current_state(TASK_RUNNING);
If we loose the (first) TASK_UNINTERRUPTIBLE store to a previous
(concurrent) wakeup, the schedule() will NO-OP and we'll go around the
loop once more.
The problem here is that the TASK_PARKED store is not inside the
KTHREAD_SHOULD_PARK condition wait-loop.
There is a genuine issue with sleeps that do not have a condition;
this is addressed in a subsequent patch.
Reported-by: Gaurav Kohli <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:
perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
$(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)
perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
$(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)
The now unused macros are also dropped from include/linux/timer.h.
Signed-off-by: Kees Cook <[email protected]>
|
|
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- BFQ updates (Paolo).
- blk-mq atomic flags memory ordering fixes (Peter Z).
- Loop cgroup support (Shaohua).
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...
|
|
kthread() could bail out early before we initialize blkcg_css (if the
kthread is killed very early. Please see xchg() statement in kthread()),
which confuses free_kthread_struct. Instead of moving the blkcg_css
initialization early, we simply zero the whole 'self' data structure,
which doesn't sound much overhead.
Reported-by: syzbot <[email protected]>
Fixes: 05e3db95ebfc ("kthread: add a mechanism to store cgroup info")
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In preparation for unconditionally passing the struct timer_list pointer
to all timer callbacks, switch kthread to use from_timer() and pass the
timer pointer explicitly.
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Len Brown <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Sebastian Reichel <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: [email protected]
Cc: Chris Metcalf <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Wim Van Sebroeck <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ursula Braun <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Harish Patil <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Manish Chopra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Cc: Heiko Carstens <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Julian Wiedmann <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Mark Gross <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: "Martin K. Petersen" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Stefan Richter <[email protected]>
Cc: Michael Reed <[email protected]>
Cc: [email protected]
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: Sudip Mukherjee <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The code is only for blkcg not for all cgroups
Fixes: d4478e92d618 ("block/loop: make loop cgroup aware")
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
kthread usually runs jobs on behalf of other threads. The jobs should be
charged to cgroup of original threads. But the jobs run in a kthread,
where we lose the cgroup context of original threads. The patch adds a
machanism to record cgroup info of original threads in kthread context.
Later we can retrieve the cgroup info and attach the cgroup info to jobs.
Since this mechanism is only required by kthread, we store the cgroup
info in kthread data instead of generic task_struct.
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If the worker thread continues getting work, it will hog the cpu and rcu
stall complains. Make it a good citizen. This is triggered in a loop
block device test.
Link: http://lkml.kernel.org/r/5de0a179b3184e1a2183fc503448b0269f24d75b.1503697127.git.shli@fb.com
Signed-off-by: Shaohua Li <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
non-root cgroups
Creation of a kthread goes through a couple interlocked stages between
the kthread itself and its creator. Once the new kthread starts
running, it initializes itself and wakes up the creator. The creator
then can further configure the kthread and then let it start doing its
job by waking it up.
In this configuration-by-creator stage, the creator is the only one
that can wake it up but the kthread is visible to userland. When
altering the kthread's attributes from userland is allowed, this is
fine; however, for cases where CPU affinity is critical,
kthread_bind() is used to first disable affinity changes from userland
and then set the affinity. This also prevents the kthread from being
migrated into non-root cgroups as that can affect the CPU affinity and
many other things.
Unfortunately, the cgroup side of protection is racy. While the
PF_NO_SETAFFINITY flag prevents further migrations, userland can win
the race before the creator sets the flag with kthread_bind() and put
the kthread in a non-root cgroup, which can lead to all sorts of
problems including incorrect CPU affinity and starvation.
This bug got triggered by userland which periodically tries to migrate
all processes in the root cpuset cgroup to a non-root one. Per-cpu
workqueue workers got caught while being created and ended up with
incorrected CPU affinity breaking concurrency management and sometimes
stalling workqueue execution.
This patch adds task->no_cgroup_migration which disallows the task to
be migrated by userland. kthreadd starts with the flag set making
every child kthread start in the root cgroup with migration
disallowed. The flag is cleared after the kthread finishes
initialization by which time PF_NO_SETAFFINITY is set if the kthread
should stay in the root cgroup.
It'd be better to wait for the initialization instead of failing but I
couldn't think of a way of implementing that without adding either a
new PF flag, or sleeping and retrying from waiting side. Even if
userland depends on changing cgroup membership of a kthread, it either
has to be synchronized with kthread_create() or periodically repeat,
so it's unlikely that this would break anything.
v2: Switch to a simpler implementation using a new task_struct bit
field suggested by Oleg.
Signed-off-by: Tejun Heo <[email protected]>
Suggested-by: Oleg Nesterov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Reported-and-debugged-by: Chris Mason <[email protected]>
Cc: [email protected] # v4.3+ (we can't close the race on < v4.3)
Signed-off-by: Tejun Heo <[email protected]>
|
|
<linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
<uapi/linux/sched/types.h>
We are going to move scheduler ABI details to <uapi/linux/sched/types.h>,
which will be used from a number of .c files.
Create empty placeholder header that maps to <linux/types.h>.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Currently CONFIG_TIMER_STATS exposes process information across namespaces:
kernel/time/timer_list.c print_timer():
SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);
/proc/timer_list:
#11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570
Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Acked-by: John Stultz <[email protected]>
Cc: Nicolas Pitre <[email protected]>
Cc: [email protected]
Cc: Lai Jiangshan <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Xing Gao <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Jessica Frazelle <[email protected]>
Cc: [email protected]
Cc: Nicolas Iooss <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: Arjan van de Ven <[email protected]>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
When commit fbae2d44aa1d ("kthread: add kthread_create_worker*()")
introduced some kthread_create_...() functions which were taking
printf-like parametter, it introduced __printf attributes to some
functions (e.g. kthread_create_worker()). Nevertheless some new
functions were forgotten (they have been detected thanks to
-Wmissing-format-attribute warning flag).
Add the missing __printf attributes to the newly-introduced functions in
order to detect formatting issues at build-time with -Wformat flag.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nicolas Iooss <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
kthread_create_on_cpu() sets KTHREAD_IS_PER_CPU and kthread->cpu, this
only makes sense if this kthread can be parked/unparked by cpuhp code.
kthread workers never call kthread_parkme() so this has no effect.
Change __kthread_create_worker() to simply call kthread_bind(task, cpu).
The very fact that kthread_create_on_cpu() doesn't accept a generic fmt
shows that it should not be used outside of smpboot.c.
Now, the only reason we can not unexport this helper and move it into
smpboot.c is that it sets kthread->cpu and struct kthread is not exported.
And the only reason we can not kill kthread->cpu is that kthread_unpark()
is used by drivers/gpu/drm/amd/scheduler/gpu_scheduler.c and thus we can
not turn _unpark into kthread_unpark(struct smp_hotplug_thread *, cpu).
Signed-off-by: Oleg Nesterov <[email protected]>
Tested-by: Petr Mladek <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Chunming Zhou <[email protected]>
Cc: Roman Pen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Now that to_kthread() is always validm change kthread_park() and
kthread_unpark() to use it and kill to_live_kthread().
The conversion of kthread_unpark() is trivial. If KTHREAD_IS_PARKED is set
then the task has called complete(&self->parked) and there the function
cannot race against a concurrent kthread_stop() and exit.
kthread_park() is more tricky, because its semantics are not well
defined. It returns -ENOSYS if the thread exited but this can never happen
and as Roman pointed out kthread_park() can obviously block forever if it
would race with the exiting kthread.
The usage of kthread_park() in cpuhp code (cpu.c, smpboot.c, stop_machine.c)
is fine. It can never see an exiting/exited kthread, smpboot_destroy_threads()
clears *ht->store, smpboot_park_thread() checks it is not NULL under the same
smpboot_threads_lock. cpuhp_threads and cpu_stop_threads never exit, so other
callers are fine too.
But it has two more users:
- watchdog_park_threads():
The code is actually correct, get_online_cpus() ensures that
kthread_park() can't race with itself (note that kthread_park() can't
handle this race correctly), but it should not use kthread_park()
directly.
- drivers/gpu/drm/amd/scheduler/gpu_scheduler.c should not use
kthread_park() either.
kthread_park() must not be called after amd_sched_fini() which does
kthread_stop(), otherwise even to_live_kthread() is not safe because
task_struct can be already freed and sched->thread can point to nowhere.
The usage of kthread_park/unpark should either be restricted to core code
which is properly protected against the exit race or made more robust so it
is safe to use it in drivers.
To catch eventual exit issues, add a WARN_ON(PF_EXITING) for now.
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Chunming Zhou <[email protected]>
Cc: Roman Pen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
kthread_stop() had to use to_live_kthread() simply because it was not
possible to access kthread->exited after the exiting task clears
task_struct->vfork_done. Now that to_kthread() is always valid,
wake_up_process() + wait_for_completion() can be done
ununconditionally. It's not an issue anymore if the task has already issued
complete_vfork_done() or died.
The exiting task can get the spurious wakeup after mm_release() but this is
possible without this change too and is fine; do_task_dead() ensures that
this can't make any harm.
As a further enhancement this could be converted to task_work_add() later,
so ->vfork_done can be avoided completely.
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Chunming Zhou <[email protected]>
Cc: Roman Pen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
to_live_kthread() function"
This reverts commit 23196f2e5f5d810578a772785807dcdc2b9fdce9.
Now that struct kthread is kmalloc'ed and not longer on the task stack
there is no need anymore to pin the stack.
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Chunming Zhou <[email protected]>
Cc: Roman Pen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
commit 23196f2e5f5d "kthread: Pin the stack via try_get_task_stack() /
put_task_stack() in to_live_kthread() function" is a workaround for the
fragile design of struct kthread being allocated on the task stack.
struct kthread in its current form should be removed, but this needs
cleanups outside of kthread.c.
As a first step move struct kthread away from the task stack by making it
kmalloc'ed. This allows to access kthread.exited without the magic of
trying to pin task stack and the try logic in to_live_kthread().
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Chunming Zhou <[email protected]>
Cc: Roman Pen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
This patch allows to make kthread worker freezable via a new @flags
parameter. It will allow to avoid an init work in some kthreads.
It currently does not affect the function of kthread_worker_fn()
but it might help to do some optimization or fixes eventually.
I currently do not know about any other use for the @flags
parameter but I believe that we will want more flags
in the future.
Finally, I hope that it will not cause confusion with @flags member
in struct kthread. Well, I guess that we will want to rework the
basic kthreads implementation once all kthreads are converted into
kthread workers or workqueues. It is possible that we will merge
the two structures.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are situations when we need to modify the delay of a delayed kthread
work. For example, when the work depends on an event and the initial delay
means a timeout. Then we want to queue the work immediately when the event
happens.
This patch implements kthread_mod_delayed_work() as inspired workqueues.
It cancels the timer, removes the work from any worker list and queues it
again with the given timeout.
A very special case is when the work is being canceled at the same time.
It might happen because of the regular kthread_cancel_delayed_work_sync()
or by another kthread_mod_delayed_work(). In this case, we do nothing and
let the other operation win. This should not normally happen as the caller
is supposed to synchronize these operations a reasonable way.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We are going to use kthread workers more widely and sometimes we will need
to make sure that the work is neither pending nor running.
This patch implements cancel_*_sync() operations as inspired by
workqueues. Well, we are synchronized against the other operations via
the worker lock, we use del_timer_sync() and a counter to count parallel
cancel operations. Therefore the implementation might be easier.
First, we check if a worker is assigned. If not, the work has newer been
queued after it was initialized.
Second, we take the worker lock. It must be the right one. The work must
not be assigned to another worker unless it is initialized in between.
Third, we try to cancel the timer when it exists. The timer is deleted
synchronously to make sure that the timer call back is not running. We
need to temporary release the worker->lock to avoid a possible deadlock
with the callback. In the meantime, we set work->canceling counter to
avoid any queuing.
Fourth, we try to remove the work from a worker list. It might be
the list of either normal or delayed works.
Fifth, if the work is running, we call kthread_flush_work(). It might
take an arbitrary time. We need to release the worker-lock again. In the
meantime, we again block any queuing by the canceling counter.
As already mentioned, the check for a pending kthread work is done under a
lock. In compare with workqueues, we do not need to fight for a single
PENDING bit to block other operations. Therefore we do not suffer from
the thundering storm problem and all parallel canceling jobs might use
kthread_flush_work(). Any queuing is blocked until the counter gets zero.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We are going to use kthread_worker more widely and delayed works
will be pretty useful.
The implementation is inspired by workqueues. It uses a timer to queue
the work after the requested delay. If the delay is zero, the work is
queued immediately.
In compare with workqueues, each work is associated with a single worker
(kthread). Therefore the implementation could be much easier. In
particular, we use the worker->lock to synchronize all the operations with
the work. We do not need any atomic operation with a flags variable.
In fact, we do not need any state variable at all. Instead, we add a list
of delayed works into the worker. Then the pending work is listed either
in the list of queued or delayed works. And the existing check of pending
works is the same even for the delayed ones.
A work must not be assigned to another worker unless reinitialized.
Therefore the timer handler might expect that dwork->work->worker is valid
and it could simply take the lock. We just add some sanity checks to help
with debugging a potential misuse.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|