aboutsummaryrefslogtreecommitdiff
path: root/kernel/kthread.c
AgeCommit message (Collapse)AuthorFilesLines
2019-10-16kthread: make __kthread_queue_delayed_work staticBen Dooks1-3/+3
The __kthread_queue_delayed_work is not exported so make it static, to avoid the following sparse warning: kernel/kthread.c:869:6: warning: symbol '__kthread_queue_delayed_work' was not declared. Should it be static? Signed-off-by: Ben Dooks <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner1-0/+1
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-05-14include/: refactor headers to allow kthread.h inclusion in psi_types.hSuren Baghdasaryan1-0/+1
kthread.h can't be included in psi_types.h because it creates a circular inclusion with kthread.h eventually including psi_types.h and complaining on kthread structures not being defined because they are defined further in the kthread.h. Resolve this by removing psi_types.h inclusion from the headers included from kthread.h. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Li Zefan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+2
Merge misc updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <[email protected]>: (159 commits) tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include proc: more robust bulk read test proc: test /proc/*/maps, smaps, smaps_rollup, statm proc: use seq_puts() everywhere proc: read kernel cpu stat pointer once proc: remove unused argument in proc_pid_lookup() fs/proc/thread_self.c: code cleanup for proc_setup_thread_self() fs/proc/self.c: code cleanup for proc_setup_self() proc: return exit code 4 for skipped tests mm,mremap: bail out earlier in mremap_to under map pressure mm/sparse: fix a bad comparison mm/memory.c: do_fault: avoid usage of stale vm_area_struct writeback: fix inode cgroup switching comment mm/huge_memory.c: fix "orig_pud" set but not used mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC mm/memcontrol.c: fix bad line in comment mm/cma.c: cma_declare_contiguous: correct err handling mm/page_ext.c: fix an imbalance with kmemleak mm/compaction: pass pgdat to too_many_isolated() instead of zone mm: remove zone_lru_lock() function, access ->lru_lock directly ...
2019-03-06Merge branch 'sched-core-for-linus' of ↵Linus Torvalds1-21/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - refcount conversions - Solve the rq->leaf_cfs_rq_list can of worms for real. - improve power-aware scheduling - add sysctl knob for Energy Aware Scheduling - documentation updates - misc other changes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) kthread: Do not use TIMER_IRQSAFE kthread: Convert worker lock to raw spinlock sched/fair: Use non-atomic cpumask_{set,clear}_cpu() sched/fair: Remove unused 'sd' parameter from select_idle_smt() sched/wait: Use freezable_schedule() when possible sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment block sched/fair: Explain LLC nohz kick condition sched/fair: Simplify nohz_balancer_kick() sched/topology: Fix percpu data types in struct sd_data & struct s_data sched/fair: Simplify post_init_entity_util_avg() by calling it with a task_struct pointer argument sched/fair: Fix O(nr_cgroups) in the load balancing path sched/fair: Optimize update_blocked_averages() sched/fair: Fix insertion in rq->leaf_cfs_rq_list sched/fair: Add tmp_alone_branch assertion sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock() sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity sched/fair: Update scale invariance of PELT sched/fair: Move the rq_of() helper function sched/core: Convert task_struct.stack_refcount to refcount_t ...
2019-03-05mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual1-1/+2
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Jeff Kirsher <[email protected]> [ixgbe] Acked-by: Jens Axboe <[email protected]> [mtip32xx] Acked-by: Vinod Koul <[email protected]> [dmaengine.c] Acked-by: Michael Ellerman <[email protected]> [powerpc] Acked-by: Doug Ledford <[email protected]> [drivers/infiniband] Cc: Joseph Qi <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-28kthread: Do not use TIMER_IRQSAFESebastian Andrzej Siewior1-2/+3
The TIMER_IRQSAFE usage was introduced in commit 22597dc3d97b1 ("kthread: initial support for delayed kthread work") which modelled the delayed kthread code after workqueue's code. The workqueue code requires the flag TIMER_IRQSAFE for synchronisation purpose. This is not true for kthread's delay timer since all operations occur under a lock. Remove TIMER_IRQSAFE from the timer initialisation and use timer_setup() for initialisation purpose which is the official function. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-02-28kthread: Convert worker lock to raw spinlockJulia Cartwright1-21/+21
In order to enable the queuing of kthread work items from hardirq context even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a raw_spin_lock. This is only acceptable to do because the work performed under the lock is well-bounded and minimal. Reported-by: Steffen Trumtrar <[email protected]> Reported-by: Tim Sander <[email protected]> Signed-off-by: Julia Cartwright <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Steffen Trumtrar <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Guenter Roeck <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-02-10kthread: Add __kthread_should_park()Matthias Kaehlcke1-1/+7
kthread_should_park() is used to check if the calling kthread ('current') should park, but there is no function to check whether an arbitrary kthread should be parked. The latter is required to plug a CPU hotplug race vs. a parking ksoftirqd thread. The new __kthread_should_park() receives a task_struct as parameter to check if the corresponding kernel thread should be parked. Call __kthread_should_park() from kthread_should_park() to avoid code duplication. Signed-off-by: Matthias Kaehlcke <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Douglas Anderson <[email protected]> Cc: Stephen Boyd <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-08-13Merge branch 'sched-core-for-linus' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Cleanup and improvement of NUMA balancing - Refactoring and improvements to the PELT (Per Entity Load Tracking) code - Watchdog simplification and related cleanups - The usual pile of small incremental fixes and improvements * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) watchdog: Reduce message verbosity stop_machine: Reflow cpu_stop_queue_two_works() sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() sched/numa: Use group_weights to identify if migration degrades locality sched/numa: Update the scan period without holding the numa_group lock sched/numa: Remove numa_has_capacity() sched/numa: Modify migrate_swap() to accept additional parameters sched/numa: Remove unused task_capacity from 'struct numa_stats' sched/numa: Skip nodes that are at 'hoplimit' sched/debug: Reverse the order of printing faults sched/numa: Use task faults only if numa_group is not yet set up sched/numa: Set preferred_node based on best_cpu sched/numa: Simplify load_too_imbalanced() sched/numa: Evaluate move once per node sched/numa: Remove redundant field sched/debug: Show the sum wait time of a task group sched/fair: Remove #ifdefs from scale_rt_capacity() sched/core: Remove get_cpu() from sched_fork() sched/cpufreq: Clarify sugov_get_util() sched/sysctl: Remove unused sched_time_avg_ms sysctl ...
2018-07-26kthread, tracing: Don't expose half-written comm when creating kthreadsSnild Dolkow1-1/+7
There is a window for racing when printing directly to task->comm, allowing other threads to see a non-terminated string. The vsnprintf function fills the buffer, counts the truncated chars, then finally writes the \0 at the end. creator other vsnprintf: fill (not terminated) count the rest trace_sched_waking(p): ... memcpy(comm, p->comm, TASK_COMM_LEN) write \0 The consequences depend on how 'other' uses the string. In our case, it was copied into the tracing system's saved cmdlines, a buffer of adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be): crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk' 0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12" ...and a strcpy out of there would cause stack corruption: [224761.522292] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffff9bf9783c78 crash-arm64> kbt | grep 'comm\|trace_print_context' #6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396) comm (char [16]) = "irq/497-pwr_even" crash-arm64> rd 0xffffffd4d0e17d14 8 ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_ ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16: ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`.. ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`.......... The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was likely needed because of this same bug. Solved by vsnprintf:ing to a local buffer, then using set_task_comm(). This way, there won't be a window where comm is not terminated. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure") Reviewed-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Snild Dolkow <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-07-03kthread: Simplify kthread_park() completionPeter Zijlstra1-2/+4
Oleg explains the reason we could hit park+park is that smpboot_update_cpumask_percpu_thread()'s for_each_cpu_and(cpu, &tmp, cpu_online_mask) smpboot_park_kthread(); turns into: for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)and) smpboot_park_kthread(); on UP, ignoring the mask. But since we just completely removed that function, this is no longer relevant. So revert commit: b1f5b378e126 ("kthread: Allow kthread_park() on a parked kthread") Suggested-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-07-03kthread, sched/core: Fix kthread_parkme() (again...)Peter Zijlstra1-6/+24
Gaurav reports that commit: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue") isn't working for him. Because of the following race: > controller Thread CPUHP Thread > takedown_cpu > kthread_park > kthread_parkme > Set KTHREAD_SHOULD_PARK > smpboot_thread_fn > set Task interruptible > > > wake_up_process > if (!(p->state & state)) > goto out; > > Kthread_parkme > SET TASK_PARKED > schedule > raw_spin_lock(&rq->lock) > ttwu_remote > waiting for __task_rq_lock > context_switch > > finish_lock_switch > > > > Case TASK_PARKED > kthread_park_complete > > > SET Running Furthermore, Oleg noticed that the whole scheduler TASK_PARKED handling is buggered because the TASK_DEAD thing is done with preemption disabled, the current code can still complete early on preemption :/ So basically revert that earlier fix and go with a variant of the alternative mentioned in the commit. Promote TASK_PARKED to special state to avoid the store-store issue on task->state leading to the WARN in kthread_unpark() -> __kthread_bind(). But in addition, add wait_task_inactive() to kthread_park() to ensure the task really is PARKED when we return from kthread_park(). This avoids the whole kthread still gets migrated nonsense -- although it would be really good to get this done differently. Reported-by: Gaurav Kohli <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue") Signed-off-by: Ingo Molnar <[email protected]>
2018-05-25kthread: Allow kthread_park() on a parked kthreadPeter Zijlstra1-4/+2
The following commit: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue") added a WARN() in the case where we call kthread_park() on an already parked thread, because the old code wasn't doing the right thing there and it wasn't at all clear that would happen. It turns out, this does in fact happen, so we have to deal with it. Instead of potentially returning early, also wait for the completion. This does however mean we have to use complete_all() and re-initialize the completion on re-use. Reported-by: LKP <[email protected]> Tested-by: Meelis Roos <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: kernel test robot <[email protected]> Cc: [email protected] Cc: Thomas Gleixner <[email protected]> Fixes: 85f1abe0019f ("kthread, sched/wait: Fix kthread_parkme() completion issue") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-05-03kthread, sched/wait: Fix kthread_parkme() completion issuePeter Zijlstra1-24/+19
Even with the wait-loop fixed, there is a further issue with kthread_parkme(). Upon hotplug, when we do takedown_cpu(), smpboot_park_threads() can return before all those threads are in fact blocked, due to the placement of the complete() in __kthread_parkme(). When that happens, sched_cpu_dying() -> migrate_tasks() can end up migrating such a still runnable task onto another CPU. Normally the task will have hit schedule() and gone to sleep by the time we do kthread_unpark(), which will then do __kthread_bind() to re-bind the task to the correct CPU. However, when we loose the initial TASK_PARKED store to the concurrent wakeup issue described previously, do the complete(), get migrated, it is possible to either: - observe kthread_unpark()'s clearing of SHOULD_PARK and terminate the park and set TASK_RUNNING, or - __kthread_bind()'s wait_task_inactive() to observe the competing TASK_RUNNING store. Either way the WARN() in __kthread_bind() will trigger and fail to correctly set the CPU affinity. Fix this by only issuing the complete() when the kthread has scheduled out. This does away with all the icky 'still running' nonsense. The alternative is to promote TASK_PARKED to a special state, this guarantees wait_task_inactive() cannot observe a 'stale' TASK_RUNNING and we'll end up doing the right thing, but this preserves the whole icky business of potentially migating the still runnable thing. Reported-by: Gaurav Kohli <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2018-05-03kthread, sched/wait: Fix kthread_parkme() wait-loopPeter Zijlstra1-3/+4
Gaurav reported a problem with __kthread_parkme() where a concurrent try_to_wake_up() could result in competing stores to ->state which, when the TASK_PARKED store got lost bad things would happen. The comment near set_current_state() actually mentions this competing store, but only mentions the case against TASK_RUNNING. This same store, with different timing, can happen against a subsequent !RUNNING store. This normally is not a problem, because as per that same comment, the !RUNNING state store is inside a condition based wait-loop: for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); if (!need_sleep) break; schedule(); } __set_current_state(TASK_RUNNING); If we loose the (first) TASK_UNINTERRUPTIBLE store to a previous (concurrent) wakeup, the schedule() will NO-OP and we'll go around the loop once more. The problem here is that the TASK_PARKED store is not inside the KTHREAD_SHOULD_PARK condition wait-loop. There is a genuine issue with sleeps that do not have a condition; this is addressed in a subsequent patch. Reported-by: Gaurav Kohli <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2017-11-21treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE castsKees Cook1-1/+1
With all callbacks converted, and the timer callback prototype switched over, the TIMER_FUNC_TYPE cast is no longer needed, so remove it. Conversion was done with the following scripts: perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \ $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u) perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \ $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u) The now unused macros are also dropped from include/linux/timer.h. Signed-off-by: Kees Cook <[email protected]>
2017-11-14Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-4/+62
Pull core block layer updates from Jens Axboe: "This is the main pull request for block storage for 4.15-rc1. Nothing out of the ordinary in here, and no API changes or anything like that. Just various new features for drivers, core changes, etc. In particular, this pull request contains: - A patch series from Bart, closing the whole on blk/scsi-mq queue quescing. - A series from Christoph, building towards hidden gendisks (for multipath) and ability to move bio chains around. - NVMe - Support for native multipath for NVMe (Christoph). - Userspace notifications for AENs (Keith). - Command side-effects support (Keith). - SGL support (Chaitanya Kulkarni) - FC fixes and improvements (James Smart) - Lots of fixes and tweaks (Various) - bcache - New maintainer (Michael Lyle) - Writeback control improvements (Michael) - Various fixes (Coly, Elena, Eric, Liang, et al) - lightnvm updates, mostly centered around the pblk interface (Javier, Hans, and Rakesh). - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph) - Writeback series that fix the much discussed hundreds of millions of sync-all units. This goes all the way, as discussed previously (me). - Fix for missing wakeup on writeback timer adjustments (Yafang Shao). - Fix laptop mode on blk-mq (me). - {mq,name} tupple lookup for IO schedulers, allowing us to have alias names. This means you can use 'deadline' on both !mq and on mq (where it's called mq-deadline). (me). - blktrace race fix, oopsing on sg load (me). - blk-mq optimizations (me). - Obscure waitqueue race fix for kyber (Omar). - NBD fixes (Josef). - Disable writeback throttling by default on bfq, like we do on cfq (Luca Miccio). - Series from Ming that enable us to treat flush requests on blk-mq like any other request. This is a really nice cleanup. - Series from Ming that improves merging on blk-mq with schedulers, getting us closer to flipping the switch on scsi-mq again. - BFQ updates (Paolo). - blk-mq atomic flags memory ordering fixes (Peter Z). - Loop cgroup support (Shaohua). - Lots of minor fixes from lots of different folks, both for core and driver code" * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits) nvme: fix visibility of "uuid" ns attribute blk-mq: fixup some comment typos and lengths ide: ide-atapi: fix compile error with defining macro DEBUG blk-mq: improve tag waiting setup for non-shared tags brd: remove unused brd_mutex blk-mq: only run the hardware queue if IO is pending block: avoid null pointer dereference on null disk fs: guard_bio_eod() needs to consider partitions xtensa/simdisk: fix compile error nvme: expose subsys attribute to sysfs nvme: create 'slaves' and 'holders' entries for hidden controllers block: create 'slaves' and 'holders' entries for hidden gendisks nvme: also expose the namespace identification sysfs files for mpath nodes nvme: implement multipath access to nvme subsystems nvme: track shared namespaces nvme: introduce a nvme_ns_ids structure nvme: track subsystems block, nvme: Introduce blk_mq_req_flags_t block, scsi: Make SCSI quiesce and resume work reliably block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag ...
2017-11-10kthread: zero the kthread data structureShaohua Li1-5/+1
kthread() could bail out early before we initialize blkcg_css (if the kthread is killed very early. Please see xchg() statement in kthread()), which confuses free_kthread_struct. Instead of moving the blkcg_css initialization early, we simply zero the whole 'self' data structure, which doesn't sound much overhead. Reported-by: syzbot <[email protected]> Fixes: 05e3db95ebfc ("kthread: add a mechanism to store cgroup info") Cc: Andrew Morton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Dmitry Vyukov <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-10-05kthread: Convert callback to use from_timer()Kees Cook1-6/+4
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch kthread to use from_timer() and pass the timer pointer explicitly. Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Len Brown <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Kalle Valo <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Pavel Machek <[email protected]> Cc: [email protected] Cc: Chris Metcalf <[email protected]> Cc: [email protected] Cc: [email protected] Cc: "James E.J. Bottomley" <[email protected]> Cc: Wim Van Sebroeck <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Ursula Braun <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Harish Patil <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Manish Chopra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: [email protected] Cc: Heiko Carstens <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Julian Wiedmann <[email protected]> Cc: John Stultz <[email protected]> Cc: Mark Gross <[email protected]> Cc: [email protected] Cc: [email protected] Cc: "Martin K. Petersen" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Stefan Richter <[email protected]> Cc: Michael Reed <[email protected]> Cc: [email protected] Cc: Tejun Heo <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Cc: Sudip Mukherjee <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2017-09-26block: fix a build errorShaohua Li1-4/+4
The code is only for blkcg not for all cgroups Fixes: d4478e92d618 ("block/loop: make loop cgroup aware") Reported-by: kbuild test robot <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-09-26kthread: add a mechanism to store cgroup infoShaohua Li1-2/+64
kthread usually runs jobs on behalf of other threads. The jobs should be charged to cgroup of original threads. But the jobs run in a kthread, where we lose the cgroup context of original threads. The patch adds a machanism to record cgroup info of original threads in kthread context. Later we can retrieve the cgroup info and attach the cgroup info to jobs. Since this mechanism is only required by kthread, we store the cgroup info in kthread data instead of generic task_struct. Acked-by: Tejun Heo <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-08-31kernel/kthread.c: kthread_worker: don't hog the cpuShaohua Li1-0/+1
If the worker thread continues getting work, it will hog the cpu and rcu stall complains. Make it a good citizen. This is triggered in a loop block device test. Link: http://lkml.kernel.org/r/5de0a179b3184e1a2183fc503448b0269f24d75b.1503697127.git.shli@fb.com Signed-off-by: Shaohua Li <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-03-17cgroup, kthread: close race window where new kthreads can be migrated to ↵Tejun Heo1-0/+3
non-root cgroups Creation of a kthread goes through a couple interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrected CPU affinity breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo <[email protected]> Suggested-by: Oleg Nesterov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Thomas Gleixner <[email protected]> Reported-and-debugged-by: Chris Mason <[email protected]> Cc: [email protected] # v4.3+ (we can't close the race on < v4.3) Signed-off-by: Tejun Heo <[email protected]>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/task.h> We are going to split <linux/sched/task.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<uapi/linux/sched/types.h> We are going to move scheduler ABI details to <uapi/linux/sched/types.h>, which will be used from a number of .c files. Create empty placeholder header that maps to <linux/types.h>. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-02-10time: Remove CONFIG_TIMER_STATSKees Cook1-1/+0
Currently CONFIG_TIMER_STATS exposes process information across namespaces: kernel/time/timer_list.c print_timer(): SEQ_printf(m, ", %s/%d", tmp, timer->start_pid); /proc/timer_list: #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570 Given that the tracer can give the same information, this patch entirely removes CONFIG_TIMER_STATS. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Kees Cook <[email protected]> Acked-by: John Stultz <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: [email protected] Cc: Lai Jiangshan <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Xing Gao <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Jessica Frazelle <[email protected]> Cc: [email protected] Cc: Nicolas Iooss <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Michal Marek <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Olof Johansson <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Cc: Arjan van de Ven <[email protected]> Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast Signed-off-by: Thomas Gleixner <[email protected]>
2016-12-12kthread: add __printf attributesNicolas Iooss1-2/+3
When commit fbae2d44aa1d ("kthread: add kthread_create_worker*()") introduced some kthread_create_...() functions which were taking printf-like parametter, it introduced __printf attributes to some functions (e.g. kthread_create_worker()). Nevertheless some new functions were forgotten (they have been detected thanks to -Wmissing-format-attribute warning flag). Add the missing __printf attributes to the newly-introduced functions in order to detect formatting issues at build-time with -Wformat flag. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nicolas Iooss <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-08kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker()Oleg Nesterov1-15/+8
kthread_create_on_cpu() sets KTHREAD_IS_PER_CPU and kthread->cpu, this only makes sense if this kthread can be parked/unparked by cpuhp code. kthread workers never call kthread_parkme() so this has no effect. Change __kthread_create_worker() to simply call kthread_bind(task, cpu). The very fact that kthread_create_on_cpu() doesn't accept a generic fmt shows that it should not be used outside of smpboot.c. Now, the only reason we can not unexport this helper and move it into smpboot.c is that it sets kthread->cpu and struct kthread is not exported. And the only reason we can not kill kthread->cpu is that kthread_unpark() is used by drivers/gpu/drm/amd/scheduler/gpu_scheduler.c and thus we can not turn _unpark into kthread_unpark(struct smp_hotplug_thread *, cpu). Signed-off-by: Oleg Nesterov <[email protected]> Tested-by: Petr Mladek <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Chunming Zhou <[email protected]> Cc: Roman Pen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-12-08kthread: Don't use to_live_kthread() in kthread_[un]park()Oleg Nesterov1-45/+24
Now that to_kthread() is always validm change kthread_park() and kthread_unpark() to use it and kill to_live_kthread(). The conversion of kthread_unpark() is trivial. If KTHREAD_IS_PARKED is set then the task has called complete(&self->parked) and there the function cannot race against a concurrent kthread_stop() and exit. kthread_park() is more tricky, because its semantics are not well defined. It returns -ENOSYS if the thread exited but this can never happen and as Roman pointed out kthread_park() can obviously block forever if it would race with the exiting kthread. The usage of kthread_park() in cpuhp code (cpu.c, smpboot.c, stop_machine.c) is fine. It can never see an exiting/exited kthread, smpboot_destroy_threads() clears *ht->store, smpboot_park_thread() checks it is not NULL under the same smpboot_threads_lock. cpuhp_threads and cpu_stop_threads never exit, so other callers are fine too. But it has two more users: - watchdog_park_threads(): The code is actually correct, get_online_cpus() ensures that kthread_park() can't race with itself (note that kthread_park() can't handle this race correctly), but it should not use kthread_park() directly. - drivers/gpu/drm/amd/scheduler/gpu_scheduler.c should not use kthread_park() either. kthread_park() must not be called after amd_sched_fini() which does kthread_stop(), otherwise even to_live_kthread() is not safe because task_struct can be already freed and sched->thread can point to nowhere. The usage of kthread_park/unpark should either be restricted to core code which is properly protected against the exit race or made more robust so it is safe to use it in drivers. To catch eventual exit issues, add a WARN_ON(PF_EXITING) for now. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Chunming Zhou <[email protected]> Cc: Roman Pen <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-12-08kthread: Don't use to_live_kthread() in kthread_stop()Oleg Nesterov1-7/+5
kthread_stop() had to use to_live_kthread() simply because it was not possible to access kthread->exited after the exiting task clears task_struct->vfork_done. Now that to_kthread() is always valid, wake_up_process() + wait_for_completion() can be done ununconditionally. It's not an issue anymore if the task has already issued complete_vfork_done() or died. The exiting task can get the spurious wakeup after mm_release() but this is possible without this change too and is fine; do_task_dead() ensures that this can't make any harm. As a further enhancement this could be converted to task_work_add() later, so ->vfork_done can be avoided completely. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Chunming Zhou <[email protected]> Cc: Roman Pen <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-12-08Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in ↵Oleg Nesterov1-6/+2
to_live_kthread() function" This reverts commit 23196f2e5f5d810578a772785807dcdc2b9fdce9. Now that struct kthread is kmalloc'ed and not longer on the task stack there is no need anymore to pin the stack. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Chunming Zhou <[email protected]> Cc: Roman Pen <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-12-08kthread: Make struct kthread kmalloc'edOleg Nesterov1-13/+45
commit 23196f2e5f5d "kthread: Pin the stack via try_get_task_stack() / put_task_stack() in to_live_kthread() function" is a workaround for the fragile design of struct kthread being allocated on the task stack. struct kthread in its current form should be removed, but this needs cleanups outside of kthread.c. As a first step move struct kthread away from the task stack by making it kmalloc'ed. This allows to access kthread.exited without the magic of trying to pin task stack and the try logic in to_live_kthread(). Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Chunming Zhou <[email protected]> Cc: Roman Pen <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-10-11kthread: better support freezable kthread workersPetr Mladek1-6/+15
This patch allows to make kthread worker freezable via a new @flags parameter. It will allow to avoid an init work in some kthreads. It currently does not affect the function of kthread_worker_fn() but it might help to do some optimization or fixes eventually. I currently do not know about any other use for the @flags parameter but I believe that we will want more flags in the future. Finally, I hope that it will not cause confusion with @flags member in struct kthread. Well, I guess that we will want to rework the basic kthreads implementation once all kthreads are converted into kthread workers or workqueues. It is possible that we will merge the two structures. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread: allow to modify delayed kthread workPetr Mladek1-0/+53
There are situations when we need to modify the delay of a delayed kthread work. For example, when the work depends on an event and the initial delay means a timeout. Then we want to queue the work immediately when the event happens. This patch implements kthread_mod_delayed_work() as inspired workqueues. It cancels the timer, removes the work from any worker list and queues it again with the given timeout. A very special case is when the work is being canceled at the same time. It might happen because of the regular kthread_cancel_delayed_work_sync() or by another kthread_mod_delayed_work(). In this case, we do nothing and let the other operation win. This should not normally happen as the caller is supposed to synchronize these operations a reasonable way. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread: allow to cancel kthread workPetr Mladek1-2/+130
We are going to use kthread workers more widely and sometimes we will need to make sure that the work is neither pending nor running. This patch implements cancel_*_sync() operations as inspired by workqueues. Well, we are synchronized against the other operations via the worker lock, we use del_timer_sync() and a counter to count parallel cancel operations. Therefore the implementation might be easier. First, we check if a worker is assigned. If not, the work has newer been queued after it was initialized. Second, we take the worker lock. It must be the right one. The work must not be assigned to another worker unless it is initialized in between. Third, we try to cancel the timer when it exists. The timer is deleted synchronously to make sure that the timer call back is not running. We need to temporary release the worker->lock to avoid a possible deadlock with the callback. In the meantime, we set work->canceling counter to avoid any queuing. Fourth, we try to remove the work from a worker list. It might be the list of either normal or delayed works. Fifth, if the work is running, we call kthread_flush_work(). It might take an arbitrary time. We need to release the worker-lock again. In the meantime, we again block any queuing by the canceling counter. As already mentioned, the check for a pending kthread work is done under a lock. In compare with workqueues, we do not need to fight for a single PENDING bit to block other operations. Therefore we do not suffer from the thundering storm problem and all parallel canceling jobs might use kthread_flush_work(). Any queuing is blocked until the counter gets zero. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread: initial support for delayed kthread workPetr Mladek1-0/+102
We are going to use kthread_worker more widely and delayed works will be pretty useful. The implementation is inspired by workqueues. It uses a timer to queue the work after the requested delay. If the delay is zero, the work is queued immediately. In compare with workqueues, each work is associated with a single worker (kthread). Therefore the implementation could be much easier. In particular, we use the worker->lock to synchronize all the operations with the work. We do not need any atomic operation with a flags variable. In fact, we do not need any state variable at all. Instead, we add a list of delayed works into the worker. Then the pending work is listed either in the list of queued or delayed works. And the existing check of pending works is the same even for the delayed ones. A work must not be assigned to another worker unless reinitialized. Therefore the timer handler might expect that dwork->work->worker is valid and it could simply take the lock. We just add some sanity checks to help with debugging a potential misuse. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread: detect when a kthread work is used by more workersPetr Mladek1-8/+20
Nothing currently prevents a work from queuing for a kthread worker when it is already running on another one. This means that the work might run in parallel on more than one worker. Also some operations are not reliable, e.g. flush. This problem will be even more visible after we add kthread_cancel_work() function. It will only have "work" as the parameter and will use worker->lock to synchronize with others. Well, normally this is not a problem because the API users are sane. But bugs might happen and users also might be crazy. This patch adds a warning when we try to insert the work for another worker. It does not fully prevent the misuse because it would make the code much more complicated without a big benefit. It adds the same warning also into kthread_flush_work() instead of the repeated attempts to get the right lock. A side effect is that one needs to explicitly reinitialize the work if it must be queued into another worker. This is needed, for example, when the worker is stopped and started again. It is a bit inconvenient. But it looks like a good compromise between the stability and complexity. I have double checked all existing users of the kthread worker API and they all seems to initialize the work after the worker gets started. Just for completeness, the patch adds a check that the work is not already in a queue. The patch also puts all the checks into a separate function. It will be reused when implementing delayed works. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread: add kthread_destroy_worker()Petr Mladek1-0/+23
The current kthread worker users call flush() and stop() explicitly. This function does the same plus it frees the kthread_worker struct in one call. It is supposed to be used together with kthread_create_worker*() that allocates struct kthread_worker. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread: add kthread_create_worker*()Petr Mladek1-10/+103
Kthread workers are currently created using the classic kthread API, namely kthread_run(). kthread_worker_fn() is passed as the @threadfn parameter. This patch defines kthread_create_worker() and kthread_create_worker_on_cpu() functions that hide implementation details. They enforce using kthread_worker_fn() for the main thread. But I doubt that there are any plans to create any alternative. In fact, I think that we do not want any alternative main thread because it would be hard to support consistency with the rest of the kthread worker API. The naming and function of kthread_create_worker() is inspired by the workqueues API like the rest of the kthread worker API. The kthread_create_worker_on_cpu() variant is motivated by the original kthread_create_on_cpu(). Note that we need to bind per-CPU kthread workers already when they are created. It makes the life easier. kthread_bind() could not be used later for an already running worker. This patch does _not_ convert existing kthread workers. The kthread worker API need more improvements first, e.g. a function to destroy the worker. IMPORTANT: kthread_create_worker_on_cpu() allows to use any format of the worker name, in compare with kthread_create_on_cpu(). The good thing is that it is more generic. The bad thing is that most users will need to pass the cpu number in two parameters, e.g. kthread_create_worker_on_cpu(cpu, "helper/%d", cpu). To be honest, the main motivation was to avoid the need for an empty va_list. The only legal way was to create a helper function that would be called with an empty list. Other attempts caused compilation warnings or even errors on different architectures. There were also other alternatives, for example, using #define or splitting __kthread_create_worker(). The used solution looked like the least ugly. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread: allow to call __kthread_create_on_node() with va_list argsPetr Mladek1-30/+42
kthread_create_on_node() implements a bunch of logic to create the kthread. It is already called by kthread_create_on_cpu(). We are going to extend the kthread worker API and will need to call kthread_create_on_node() with va_list args there. This patch does only a refactoring and does not modify the existing behavior. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread/smpboot: do not park in kthread_create_on_cpu()Petr Mladek1-2/+6
kthread_create_on_cpu() was added by the commit 2a1d446019f9a5983e ("kthread: Implement park/unpark facility"). It is currently used only when enabling new CPU. For this purpose, the newly created kthread has to be parked. The CPU binding is a bit tricky. The kthread is parked when the CPU has not been allowed yet. And the CPU is bound when the kthread is unparked. The function would be useful for more per-CPU kthreads, e.g. bnx2fc_thread, fcoethread. For this purpose, the newly created kthread should stay in the uninterruptible state. This patch moves the parking into smpboot. It binds the thread already when created. Then the function might be used universally. Also the behavior is consistent with kthread_create() and kthread_create_on_node(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread: kthread worker API cleanupPetr Mladek1-16/+17
A good practice is to prefix the names of functions by the name of the subsystem. The kthread worker API is a mix of classic kthreads and workqueues. Each worker has a dedicated kthread. It runs a generic function that process queued works. It is implemented as part of the kthread subsystem. This patch renames the existing kthread worker API to use the corresponding name from the workqueues API prefixed by kthread_: __init_kthread_worker() -> __kthread_init_worker() init_kthread_worker() -> kthread_init_worker() init_kthread_work() -> kthread_init_work() insert_kthread_work() -> kthread_insert_work() queue_kthread_work() -> kthread_queue_work() flush_kthread_work() -> kthread_flush_work() flush_kthread_worker() -> kthread_flush_worker() Note that the names of DEFINE_KTHREAD_WORK*() macros stay as they are. It is common that the "DEFINE_" prefix has precedence over the subsystem names. Note that INIT() macros and init() functions use different naming scheme. There is no good solution. There are several reasons for this solution: + "init" in the function names stands for the verb "initialize" aka "initialize worker". While "INIT" in the macro names stands for the noun "INITIALIZER" aka "worker initializer". + INIT() macros are used only in DEFINE() macros + init() functions are used close to the other kthread() functions. It looks much better if all the functions use the same scheme. + There will be also kthread_destroy_worker() that will be used close to kthread_cancel_work(). It is related to the init() function. Again it looks better if all functions use the same naming scheme. + there are several precedents for such init() function names, e.g. amd_iommu_init_device(), free_area_init_node(), jump_label_init_type(), regmap_init_mmio_clk(), + It is not an argument but it was inconsistent even before. [[email protected]: fix linux-next merge conflict] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Suggested-by: Andrew Morton <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-10-11kthread: rename probe_kthread_data() to kthread_probe_data()Petr Mladek1-2/+2
Patch series "kthread: Kthread worker API improvements" The intention of this patchset is to make it easier to manipulate and maintain kthreads. Especially, I want to replace all the custom main cycles with a generic one. Also I want to make the kthreads sleep in a consistent state in a common place when there is no work. This patch (of 11): A good practice is to prefix the names of functions by the name of the subsystem. This patch fixes the name of probe_kthread_data(). The other wrong functions names are part of the kthread worker API and will be fixed separately. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]> Suggested-by: Andrew Morton <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-09-16kthread: Pin the stack via try_get_task_stack()/put_task_stack() in ↵Oleg Nesterov1-2/+6
to_live_kthread() function get_task_struct(tsk) no longer pins tsk->stack so all users of to_live_kthread() should do try_get_task_stack/put_task_stack to protect "struct kthread" which lives on kthread's stack. TODO: Kill to_live_kthread(), perhaps we can even kill "struct kthread" too, and rework kthread_stop(), it can use task_work_add() to sync with the exiting kernel thread. Message-Id: <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jann Horn <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/cb9b16bbc19d4aea4507ab0552e4644c1211d130.1474003868.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2015-09-04kernel/kthread.c:kthread_create_on_node(): clarify documentationAndrew Morton1-3/+4
- Make it clear that the `node' arg refers to memory allocations only: kthread_create_on_node() does not pin the new thread to that node's CPUs. - Encourage the use of NUMA_NO_NODE. [[email protected]: use NUMA_NO_NODE in kthread_create() also] Cc: Nathan Zimmer <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-08-31Merge branch 'sched-core-for-linus' of ↵Linus Torvalds1-3/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest change in this cycle is the rewrite of the main SMP load balancing metric: the CPU load/utilization. The main goal was to make the metric more precise and more representative - see the changelog of this commit for the gory details: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") It is done in a way that significantly reduces complexity of the code: 5 files changed, 249 insertions(+), 494 deletions(-) and the performance testing results are encouraging. Nevertheless we need to keep an eye on potential regressions, since this potentially affects every SMP workload in existence. This work comes from Yuyang Du. Other changes: - SCHED_DL updates. (Andrea Parri) - Simplify architecture callbacks by removing finish_arch_switch(). (Peter Zijlstra et al) - cputime accounting: guarantee stime + utime == rtime. (Peter Zijlstra) - optimize idle CPU wakeups some more - inspired by Facebook server loads. (Mike Galbraith) - stop_machine fixes and updates. (Oleg Nesterov) - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra) - sched/numa tweaks. (Srikar Dronamraju) - misc fixes and small cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) sched/deadline: Fix comment in enqueue_task_dl() sched/deadline: Fix comment in push_dl_tasks() sched: Change the sched_class::set_cpus_allowed() calling context sched: Make sched_class::set_cpus_allowed() unconditional sched: Fix a race between __kthread_bind() and sched_setaffinity() sched: Ensure a task has a non-normalized vruntime when returning back to CFS sched/numa: Fix NUMA_DIRECT topology identification tile: Reorganize _switch_to() sched, sparc32: Update scheduler comments in copy_thread() sched: Remove finish_arch_switch() sched, tile: Remove finish_arch_switch sched, sh: Fold finish_arch_switch() into switch_to() sched, score: Remove finish_arch_switch() sched, avr32: Remove finish_arch_switch() sched, MIPS: Get rid of finish_arch_switch() sched, arm: Remove finish_arch_switch() sched/fair: Clean up load average references sched/fair: Provide runnable_load_avg back to cfs_rq sched/fair: Remove task and group entity load when they are dead sched/fair: Init cfs_rq's sched_entity load average ...
2015-08-12sched: Fix a race between __kthread_bind() and sched_setaffinity()Peter Zijlstra1-3/+17
Because sched_setscheduler() checks p->flags & PF_NO_SETAFFINITY without locks, a caller might observe an old value and race with the set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo it: __kthread_bind() do_set_cpus_allowed() <SYSCALL> sched_setaffinity() if (p->flags & PF_NO_SETAFFINITIY) set_cpus_allowed_ptr() p->flags |= PF_NO_SETAFFINITY Fix the bug by putting everything under the regular scheduler locks. This also closes a hole in the serialization of task_struct::{nr_,}cpus_allowed. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-08-07kthread: export kthread functionsDavid Kershner1-0/+4
The s-Par visornic driver, currently in staging, processes a queue being serviced by the an s-Par service partition. We can get a message that something has happened with the Service Partition, when that happens, we must not access the channel until we get a message that the service partition is back again. The visornic driver has a thread for processing the channel, when we get the message, we need to be able to park the thread and then resume it when the problem clears. We can do this with kthread_park and unpark but they are not exported from the kernel, this patch exports the needed functions. Signed-off-by: David Kershner <[email protected]> Acked-by: Ingo Molnar <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09kernel/kthread.c: partial revert of 81c98869faa5 ("kthread: ensure locality ↵Nishanth Aravamudan1-1/+1
of task_struct allocations") After discussions with Tejun, we don't want to spread the use of cpu_to_mem() (and thus knowledge of allocators/NUMA topology details) into callers, but would rather ensure the callees correctly handle memoryless nodes. With the previous patches ("topology: add support for node_to_mem_node() to determine the fallback node" and "slub: fallback to node_to_mem_node() node if allocating on memoryless node") adding and using node_to_mem_node(), we can safely undo part of the change to the kthread logic from 81c98869faa5. Signed-off-by: Nishanth Aravamudan <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: David Rientjes <[email protected]> Cc: Han Pingtian <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>