diff options
author | Changwei Ge <[email protected]> | 2019-09-23 15:33:37 -0700 |
---|---|---|
committer | Linus Torvalds <[email protected]> | 2019-09-24 15:54:07 -0700 |
commit | 0a3775e4f883912944481cf2ef36eb6383a9cc74 (patch) | |
tree | c32030265f51ad8fd58f25dae5850b75cb0e826f | |
parent | a89bd89fae638965ca5a79a3467d79f926260882 (diff) |
ocfs2: wait for recovering done after direct unlock request
There is a scenario causing ocfs2 umount hang when multiple hosts are
rebooting at the same time.
NODE1 NODE2 NODE3
send unlock requset to NODE2
dies
become recovery master
recover NODE2
find NODE2 dead
mark resource RECOVERING
directly remove lock from grant list
calculate usage but RECOVERING marked
**miss the window of purging
clear RECOVERING
To reproduce this issue, crash a host and then umount ocfs2
from another node.
To solve this, just let unlock progress wait for recovery done.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Changwei Ge <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
-rw-r--r-- | fs/ocfs2/dlm/dlmunlock.c | 23 |
1 files changed, 19 insertions, 4 deletions
diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c index e78657742bd8..3883633e82eb 100644 --- a/fs/ocfs2/dlm/dlmunlock.c +++ b/fs/ocfs2/dlm/dlmunlock.c @@ -90,7 +90,8 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm, enum dlm_status status; int actions = 0; int in_use; - u8 owner; + u8 owner; + int recovery_wait = 0; mlog(0, "master_node = %d, valblk = %d\n", master_node, flags & LKM_VALBLK); @@ -193,9 +194,12 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm, } if (flags & LKM_CANCEL) lock->cancel_pending = 0; - else - lock->unlock_pending = 0; - + else { + if (!lock->unlock_pending) + recovery_wait = 1; + else + lock->unlock_pending = 0; + } } /* get an extra ref on lock. if we are just switching @@ -229,6 +233,17 @@ leave: spin_unlock(&res->spinlock); wake_up(&res->wq); + if (recovery_wait) { + spin_lock(&res->spinlock); + /* Unlock request will directly succeed after owner dies, + * and the lock is already removed from grant list. We have to + * wait for RECOVERING done or we miss the chance to purge it + * since the removement is much faster than RECOVERING proc. + */ + __dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_RECOVERING); + spin_unlock(&res->spinlock); + } + /* let the caller's final dlm_lock_put handle the actual kfree */ if (actions & DLM_UNLOCK_FREE_LOCK) { /* this should always be coupled with list removal */ |