diff options
| author | Frederic Weisbecker <[email protected]> | 2021-12-01 16:19:45 +0100 | 
|---|---|---|
| committer | Tejun Heo <[email protected]> | 2021-12-02 13:00:59 -1000 | 
| commit | 45c753f5f24d2d4717acb38ce35e604ff9abcb50 (patch) | |
| tree | 26a2bd00e9eccd018a879c9490e43c0ef46ebe92 /drivers/usb/cdns3/cdnsp-ring.c | |
| parent | 07edfece8bcb0580a1828d939e6f8d91a8603eb2 (diff) | |
workqueue: Fix unbind_workers() VS wq_worker_sleeping() race
At CPU-hotplug time, unbind_workers() may preempt a worker while it is
going to sleep. In that case the following scenario can happen:
    unbind_workers()                     wq_worker_sleeping()
    --------------                      -------------------
                                      if (worker->flags & WORKER_NOT_RUNNING)
                                          return;
                                      //PREEMPTED by unbind_workers
    worker->flags |= WORKER_UNBOUND;
    [...]
    atomic_set(&pool->nr_running, 0);
    //resume to worker
                                       atomic_dec_and_test(&pool->nr_running);
After unbind_worker() resets pool->nr_running, the value is expected to
remain 0 until the pool ever gets rebound in case cpu_up() is called on
the target CPU in the future. But here the race leaves pool->nr_running
with a value of -1, triggering the following warning when the worker goes
idle:
        WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0
        Modules linked in:
        CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
        Workqueue:  0x0 (rcu_par_gp)
        RIP: 0010:worker_enter_idle+0x95/0xc0
        Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93  0
        RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086
        RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140
        RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080
        R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20
        R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140
        FS:  0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
          <TASK>
          worker_thread+0x89/0x3d0
          ? process_one_work+0x400/0x400
          kthread+0x162/0x190
          ? set_kthread_struct+0x40/0x40
          ret_from_fork+0x22/0x30
          </TASK>
Also due to this incorrect "nr_running == -1", all sorts of hazards can
happen, starting with queued works being ignored because no workers are
awaken at insert_work() time.
Fix this with checking again the worker flags while pool->lock is locked.
Fixes: b945efcdd07d ("sched: Remove pointless preemption disable in sched_submit_work()")
Reviewed-by: Lai Jiangshan <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Diffstat (limited to 'drivers/usb/cdns3/cdnsp-ring.c')
0 files changed, 0 insertions, 0 deletions