author     Linus Torvalds <[email protected]>   2023-08-28 16:43:39 -0700
committer  Linus Torvalds <[email protected]>   2023-08-28 16:43:39 -0700
commit     3ca9a836ff53db8eb76d559764c07fb3b015886a
tree       edbd56fb3069895e1d8306df53aaa3d742a9bec8  /include/linux/sched/task.h
parent     1a7c611546e552193180941ecf6b191e659db979
parent     2f88c8e802c8b128a155976631f4eb2ce4f3c805
Merge tag 'sched-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- The biggest change is introduction of a new iteration of the
  SCHED_FAIR interactivity code: the EEVDF ("Earliest Eligible Virtual
  Deadline First") scheduler

  EEVDF too is a virtual-time scheduler, with two parameters (weight
  and relative deadline), compared to CFS that had weight only. It
  completely reworks the base scheduler: placement, preemption, picking
  -- everything

  LWN.net, as usual, has a terrific writeup about EEVDF:

      https://lwn.net/Articles/925371/

  Preemption (both tick and wakeup) is driven by testing against a
  fresh pick. Because the tree is now effectively an interval tree, and
  the selection is no longer the 'leftmost' task, over-scheduling is
  less of a problem (a toy sketch of the pick rule follows this list).
  A lot of the CFS heuristics are removed or replaced by more natural
  latency-space parameters & constructs

  In terms of expected performance regressions: we will and can fix
  everything where a 'good' workload misbehaves with the new scheduler,
  but EEVDF inevitably changes workload scheduling in a binary fashion,
  hopefully for the better in the overwhelming majority of cases, but
  in some cases it won't, especially in adversarial loads that got
  lucky with the previous code, such as some variants of hackbench. We
  are trying hard to err on the side of fixing all performance
  regressions, but we expect some inevitable post-release iterations of
  that process

- Improve load-balancing on hybrid x86 systems: enable cluster
  scheduling (again)

- Improve & fix bandwidth-scheduling on nohz systems

- Improve bandwidth-throttling

- Use lock guards to simplify and de-goto-ify control flow

- Misc improvements, cleanups and fixes
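
To make the pick rule above concrete, here is a minimal, illustrative
sketch in C. It is not the kernel implementation (which keeps scheduling
entities in an augmented rbtree in kernel/sched/fair.c and maintains the
average vruntime incrementally); toy_entity, toy_eligible() and
toy_pick_eevdf() are names invented for this example, and eligibility is
reduced to a comparison against the weighted average vruntime:

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Toy model of the EEVDF pick rule -- a sketch, not kernel code.  Each
     * entity carries a virtual runtime (service received, scaled by weight)
     * and a virtual deadline (roughly vruntime + requested slice / weight).
     */
    struct toy_entity {
            uint64_t weight;     /* load weight derived from the nice value */
            uint64_t vruntime;   /* weight-scaled service already received  */
            uint64_t vdeadline;  /* vruntime + slice / weight               */
    };

    /*
     * "Eligible" means the entity has not yet received more than its fair
     * share, approximated here as vruntime <= weighted average vruntime.
     */
    static int toy_eligible(const struct toy_entity *se, uint64_t avg_vruntime)
    {
            return se->vruntime <= avg_vruntime;
    }

    /*
     * Earliest Eligible Virtual Deadline First: among the eligible entities,
     * run the one whose virtual deadline expires first.  A linear scan keeps
     * the idea visible; the kernel gets the same answer from the rbtree.
     */
    static struct toy_entity *toy_pick_eevdf(struct toy_entity *se, size_t nr,
                                             uint64_t avg_vruntime)
    {
            struct toy_entity *best = NULL;

            for (size_t i = 0; i < nr; i++) {
                    if (!toy_eligible(&se[i], avg_vruntime))
                            continue;
                    if (!best || se[i].vdeadline < best->vdeadline)
                            best = &se[i];
            }
            return best;
    }

Preemption follows the same test described in the summary: on tick or
wakeup, a fresh pick is made, and if it returns a different entity than
the one currently running, the current one is preempted.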
* tag 'sched-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
sched/eevdf/doc: Modify the documented knob to base_slice_ns as well
sched/eevdf: Curb wakeup-preemption
sched: Simplify sched_core_cpu_{starting,deactivate}()
sched: Simplify try_steal_cookie()
sched: Simplify sched_tick_remote()
sched: Simplify sched_exec()
sched: Simplify ttwu()
sched: Simplify wake_up_if_idle()
sched: Simplify: migrate_swap_stop()
sched: Simplify sysctl_sched_uclamp_handler()
sched: Simplify get_nohz_timer_target()
sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
sched/rt: Fix sysctl_sched_rr_timeslice initial value
sched/fair: Block nohz tick_stop when cfs bandwidth in use
sched, cgroup: Restore meaning to hierarchical_quota
MAINTAINERS: Add Peter explicitly to the psi section
sched/psi: Select KERNFS as needed
sched/topology: Align group flags when removing degenerate domain
sched/fair: remove util_est boosting
sched/fair: Propagate enqueue flags into place_entity()
...
Diffstat (limited to 'include/linux/sched/task.h')
-rw-r--r--  include/linux/sched/task.h  38
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index dd35ce28bb90..a23af225c898 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -118,11 +118,47 @@ static inline struct task_struct *get_task_struct(struct task_struct *t)
 }
 
 extern void __put_task_struct(struct task_struct *t);
+extern void __put_task_struct_rcu_cb(struct rcu_head *rhp);
 
 static inline void put_task_struct(struct task_struct *t)
 {
-	if (refcount_dec_and_test(&t->usage))
+	if (!refcount_dec_and_test(&t->usage))
+		return;
+
+	/*
+	 * In !RT, it is always safe to call __put_task_struct().
+	 * Under RT, we can only call it in preemptible context.
+	 */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
+		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
+
+		lock_map_acquire_try(&put_task_map);
 		__put_task_struct(t);
+		lock_map_release(&put_task_map);
+		return;
+	}
+
+	/*
+	 * under PREEMPT_RT, we can't call put_task_struct
+	 * in atomic context because it will indirectly
+	 * acquire sleeping locks.
+	 *
+	 * call_rcu() will schedule delayed_put_task_struct_rcu()
+	 * to be called in process context.
+	 *
+	 * __put_task_struct() is called when
+	 * refcount_dec_and_test(&t->usage) succeeds.
+	 *
+	 * This means that it can't "conflict" with
+	 * put_task_struct_rcu_user() which abuses ->rcu the same
+	 * way; rcu_users has a reference so task->usage can't be
+	 * zero after rcu_users 1 -> 0 transition.
+	 *
+	 * delayed_free_task() also uses ->rcu, but it is only called
+	 * when it fails to fork a process. Therefore, there is no
+	 * way it can conflict with put_task_struct().
+	 */
+	call_rcu(&t->rcu, __put_task_struct_rcu_cb);
 }
 
 DEFINE_FREE(put_task, struct task_struct *, if (_T) put_task_struct(_T))
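
The __put_task_struct_rcu_cb() declaration added above is only the header
side; the callback itself is expected to be defined in kernel/fork.c. The
following is a sketch of what that callback plausibly looks like, assuming
the conventional container_of() pattern on the rcu_head embedded in
task_struct, rather than a quote of the tree at this commit:

    /*
     * Sketch (assumed kernel/fork.c counterpart): recover the task_struct
     * from its embedded rcu_head and drop the final reference from process
     * context, where taking sleeping locks is allowed under PREEMPT_RT.
     */
    void __put_task_struct_rcu_cb(struct rcu_head *rhp)
    {
            struct task_struct *task = container_of(rhp, struct task_struct, rcu);

            __put_task_struct(task);
    }

The DEFINE_FREE(put_task, ...) context line at the bottom of the hunk ties
into the lock-guard/cleanup work mentioned in the summary: it lets a caller
mark a local task pointer with __free(put_task) so the reference is dropped
automatically on every exit path. A hypothetical caller (the function name
and the use of find_get_task_by_vpid() are for illustration only, kernel
context with <linux/cleanup.h> and <linux/sched/task.h> assumed) might look
like:

    /*
     * Hypothetical example: the reference taken by find_get_task_by_vpid()
     * is released automatically when 'p' goes out of scope, on every
     * return path.
     */
    static int print_task_comm(pid_t pid)
    {
            struct task_struct *p __free(put_task) = find_get_task_by_vpid(pid);

            if (!p)
                    return -ESRCH;

            pr_info("pid %d comm %s\n", pid, p->comm);
            return 0;       /* put_task_struct(p) runs here via the cleanup handler */
    }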