sched/core: Explain sleep/wakeup in a better way

There were a few questions wrt. how sleep-wakeup works. Try and explain
it more.

Requested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This commit is contained in:
Peter Zijlstra 2016-10-19 15:45:27 +02:00 committed by Ingo Molnar
parent 3c3fcb45d5
commit a225023828
2 changed files with 44 additions and 23 deletions

View file

@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!!(
#define set_task_state(tsk, state_value) \
do { \
(tsk)->task_state_change = _THIS_IP_; \
smp_store_mb((tsk)->state, (state_value)); \
smp_store_mb((tsk)->state, (state_value)); \
} while (0)
/*
* set_current_state() includes a barrier so that the write of current->state
* is correctly serialised wrt the caller's subsequent test of whether to
* actually sleep:
*
* set_current_state(TASK_UNINTERRUPTIBLE);
* if (do_i_need_to_sleep())
* schedule();
*
* If the caller does not need such serialisation then use __set_current_state()
*/
#define __set_current_state(state_value) \
do { \
current->task_state_change = _THIS_IP_; \
@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!!(
#define set_current_state(state_value) \
do { \
current->task_state_change = _THIS_IP_; \
smp_store_mb(current->state, (state_value)); \
smp_store_mb(current->state, (state_value)); \
} while (0)
#else
/*
* @tsk had better be current, or you get to keep the pieces.
*
* The only reason is that computing current can be more expensive than
* using a pointer that's already available.
*
* Therefore, see set_current_state().
*/
#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
#define set_task_state(tsk, state_value) \
@ -299,11 +296,34 @@ extern char ___assert_task_state[1 - 2*!!(
* is correctly serialised wrt the caller's subsequent test of whether to
* actually sleep:
*
* for (;;) {
* set_current_state(TASK_UNINTERRUPTIBLE);
* if (do_i_need_to_sleep())
* schedule();
* if (!need_sleep)
* break;
*
* If the caller does not need such serialisation then use __set_current_state()
* schedule();
* }
* __set_current_state(TASK_RUNNING);
*
* If the caller does not need such serialisation (because, for instance, the
* condition test and condition change and wakeup are under the same lock) then
* use __set_current_state().
*
* The above is typically ordered against the wakeup, which does:
*
* need_sleep = false;
* wake_up_state(p, TASK_UNINTERRUPTIBLE);
*
* Where wake_up_state() (and all other wakeup primitives) imply enough
* barriers to order the store of the variable against wakeup.
*
* Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
* once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
* TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING).
*
* This is obviously fine, since they both store the exact same value.
*
* Also see the comments of try_to_wake_up().
*/
#define __set_current_state(state_value) \
do { current->state = (state_value); } while (0)

View file

@ -1995,14 +1995,15 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
* @state: the mask of task states that can be woken
* @wake_flags: wake modifier flags (WF_*)
*
* Put it on the run-queue if it's not already there. The "current"
* thread is always on the run-queue (except when the actual
* re-schedule is in progress), and as such you're allowed to do
* the simpler "current->state = TASK_RUNNING" to mark yourself
* runnable without the overhead of this.
* If (@state & @p->state) @p->state = TASK_RUNNING.
*
* Return: %true if @p was woken up, %false if it was already running.
* or @state didn't match @p's state.
* If the task was not queued/runnable, also place it back on a runqueue.
*
* Atomic against schedule() which would dequeue a task, also see
* set_current_state().
*
* Return: %true if @p->state changes (an actual wakeup was done),
* %false otherwise.
*/
static int
try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)