path: root/kernel/locking
2014-10-13  Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)  6 files, -212/+250
Pull core locking updates from Ingo Molnar: "The main updates in this cycle were:

  - mutex MCS refactoring finishing touches: improve comments, refactor and clean up code, reduce debug data structure footprint, etc.
  - qrwlock finishing touches: remove old code, self-test updates.
  - small rwsem optimization
  - various smaller fixes/cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Revert qrwlock recursive stuff
  locking/rwsem: Avoid double checking before try acquiring write lock
  locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definition
  locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S
  locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code
  locking/semaphore: Resolve some shadow warnings
  locking/selftest: Support queued rwlock
  locking/lockdep: Restrict the use of recursive read_lock() with qrwlock
  locking/spinlocks: Always evaluate the second argument of spin_lock_nested()
  locking/Documentation: Update locking/mutex-design.txt disadvantages
  locking/Documentation: Move locking related docs into Documentation/locking/
  locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate
  locking/mutexes: Refactor optimistic spinning code
  locking/mcs: Remove obsolete comment
  locking/mutexes: Document quick lock release when unlocking
  locking/mutexes: Standardize arguments in lock/unlock slowpaths
  locking: Remove deprecated smp_mb__() barriers
2014-10-03  locking/lockdep: Revert qrwlock recursive stuff  (Peter Zijlstra)  1 file, -6/+0
Commit f0bab73cb539 ("locking/lockdep: Restrict the use of recursive read_lock() with qrwlock") changed lockdep to try and conform to the qrwlock semantics, which differ from the traditional rwlock semantics. In particular, qrwlock is fair outside of interrupt context, but in interrupt context readers will ignore all fairness. The problem with modeling this is that the read and write side have different lock state (interrupt) semantics, but we only have a single representation of these. Therefore lockdep gets confused, thinking the lock can cause interrupt lock inversions. So revert it for now; the old rwlock semantics were already imperfectly modeled and the qrwlock extras won't fit either. If we want to properly fix this, I think we need to resurrect the work Gautham did a few years ago that split the read and write state of locks:

  http://lwn.net/Articles/332801/

FWIW the locking selftest that would've failed (and was reported by Borislav earlier) is something like:

  RL(X1);       /* IRQ-ON */
  LOCK(A);
  UNLOCK(A);
  RU(X1);

  IRQ_ENTER();
  RL(X1);       /* IN-IRQ */
  RU(X1);
  IRQ_EXIT();

At which point it would report that, because A is an IRQ-unsafe lock, we can suffer the following inversion:

  CPU0                  CPU1
  lock(A)
                        lock(X1)
                        lock(A)
  <IRQ>
  lock(X1)

And this is 'wrong' because X1 can recurse (assuming the above locks are in fact read-locks), but lockdep doesn't know about this. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Waiman Long <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-10-03  locking/rwsem: Avoid double checking before try acquiring write lock  (Jason Low)  1 file, -9/+11
Commit 9b0fc9c09f1b ("rwsem: skip initial trylock in rwsem_down_write_failed") checks whether there are known active lockers in order to avoid write trylocking with an expensive cmpxchg() when it likely wouldn't get the lock. However, a subsequent patch added a direct check for sem->count == RWSEM_WAITING_BIAS right before trying that cmpxchg(), so commit 9b0fc9c09f1b now just adds overhead. This patch modifies things so that we only check whether count == RWSEM_WAITING_BIAS. Also, add a comment on why we do an "extra check" of count before the cmpxchg(). Signed-off-by: Jason Low <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Chegu Vinod <[email protected]> Cc: Peter Hurley <[email protected]> Cc: Tim Chen <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/1410913017.2447.22.camel@j-VirtualBox Signed-off-by: Ingo Molnar <[email protected]>
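For illustration, a sketch of the resulting check, following the rwsem-xadd code of that era (helper and constant names as upstream, waiter-list bookkeeping abridged) -- this is not the literal patch:

  static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
  {
          /*
           * Only when the count reads "waiters present, no active lockers"
           * is the expensive cmpxchg() worth attempting at all.
           */
          if (count == RWSEM_WAITING_BIAS &&
              cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
                      RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
                  if (!list_is_singular(&sem->wait_list))
                          rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
                  return true;
          }

          return false;
  }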
2014-09-30  locktorture: Cleanup header usage  (Davidlohr Bueso)  1 file, -13/+1
Remove some unnecessary ones and explicitly include rwsem.h Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-30  locktorture: Cannot hold read and write lock  (Davidlohr Bueso)  1 file, -0/+10
... trigger an error if so. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-30  locktorture: Fix __acquire annotation for spinlock irq  (Davidlohr Bueso)  1 file, -1/+1
It's quite easy to get mixed up with the names -- 'torture_spinlock_irq' is not actually a valid spinlock name. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-30  locktorture: Support rwlocks  (Davidlohr Bueso)  1 file, -3/+112
Add a "rw_lock" torture test to stress kernel rwlocks and their irq variant. Reader critical regions are 5x longer than writers. As such a similar ratio of lock acquisitions is seen in the statistics. In the case of massive contention, both hold the lock for 1/10 of a second. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-16  locktorture: Introduce torture context  (Davidlohr Bueso)  1 file, -79/+82
The number of global variables is getting pretty ugly. Group variables related to the execution (i.e. not parameters) in a new context structure. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-16  locktorture: Support rwsems  (Davidlohr Bueso)  1 file, -1/+67
We can easily do so with our new reader lock support. Just an arbitrary design default: readers have higher (5x) critical region latencies than writers: 50 ms and 10 ms, respectively. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-16  locktorture: Add infrastructure for torturing read locks  (Davidlohr Bueso)  1 file, -20/+156
Most of it is based on what we already have for writers. This allows readers to be very independent (and thus configurable), enabling future module parameters to control things such as rw distribution. Furthermore, readers have their own delaying function, allowing us to test different rw critical region latencies and stress locking internals. Similarly, statistics, for now, will only track the number of lock acquisitions -- as opposed to writers, readers have no failure detection. In addition, introduce a new nreaders_stress module parameter. The default number of readers is the same as the number of writer threads. Writer threads are interleaved with readers. Documentation is updated accordingly. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-16  torture: Address race in module cleanup  (Davidlohr Bueso)  1 file, -1/+2
When performing module cleanups by calling torture_cleanup(), the 'torture_type' string is nullified. However, callers are not necessarily done and might still need to reference the variable. This impacts both rcutorture and locktorture, causing printing of things like:

  [   94.226618] (null)-torture: Stopping lock_torture_writer task
  [   94.226624] (null)-torture: Stopping lock_torture_stats task

Thus delay this operation until the very end of the cleanup process. The consequence (which shouldn't matter for this kind of program) is, of course, that we lengthen the window between rmmod and a subsequent modprobe, for instance in module_torture_begin(). Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-16  locktorture: Make statistics generic  (Davidlohr Bueso)  1 file, -16/+16
The statistics structure can serve well for both reader and writer locks, thus simply rename some fields that mention 'write' and leave the declaration of lwsa. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-16  locktorture: Teach about lock debugging  (Davidlohr Bueso)  1 file, -2/+13
Regular locks are very different from locks with debugging. For instance, for mutexes, debugging forces taking only the slowpaths. As such, the locktorture module should take this into account when printing related information -- specifically when printing user-passed parameters, which seems the right place for such info. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-16  locktorture: Support mutexes  (Davidlohr Bueso)  1 file, -2/+39
Add a "mutex_lock" torture test. The main difference from the already existing spinlock tests is that the latency of the critical region is much larger. We randomly delay for (arbitrarily) either 500 ms or, otherwise, 25 ms. While this can considerably reduce the number of writes compared to non-blocking locks, if run long enough it can have the same torturous effect. Furthermore, it is more representative of mutex hold times and can better stress things like thrashing. A sketch of the described writer delay follows below. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
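The sketch referenced above (function and parameter names follow the locktorture module, but the probability scaling is illustrative rather than taken from the actual source):

  static void torture_mutex_delay(struct torture_random_state *trsp)
  {
          const unsigned long longdelay_ms  = 500;  /* rare, long hold */
          const unsigned long shortdelay_ms = 25;   /* common, short hold */

          /* Occasionally hold the mutex for a long time to pile up contention. */
          if (!(torture_random(trsp) % (nwriters_stress * 2000)))
                  mdelay(longdelay_ms);
          else
                  mdelay(shortdelay_ms);
  }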
2014-09-16  locktorture: Rename locktorture_runnable parameter  (Davidlohr Bueso)  1 file, -4/+4
... to just 'torture_runnable'. It follows other variable naming and is shorter. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2014-09-16  locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definition  (Davidlohr Bueso)  1 file, -4/+3
rw-semaphore is the only type of lock doing this ugliness of exporting at the end of the file. Signed-off-by: Davidlohr Bueso <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-09-04  locking/semaphore: Resolve some shadow warnings  (Mark Rustad)  1 file, -6/+6
Resolve some shadow warnings resulting from using the name jiffies, which is a well-known global. This is not a problem of course, but it could be a trap for someone copying and pasting code, and it just makes W=2 a little cleaner. Signed-off-by: Mark Rustad <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Paul E. McKenney <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
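The kind of rename involved looks like this (illustrative only; the actual change is confined to kernel/locking/semaphore.c):

  /* Before: the parameter shadows the kernel-wide 'jiffies' counter,
   * which trips -Wshadow under W=2. */
  int down_timeout(struct semaphore *sem, long jiffies);

  /* After: a neutral name, identical behaviour, no shadow warning. */
  int down_timeout(struct semaphore *sem, long timeout);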
2014-08-13  locking/lockdep: Restrict the use of recursive read_lock() with qrwlock  (Waiman Long)  1 file, -0/+6
Unlike the original unfair rwlock implementation, queued rwlock will grant the lock according to the chronological sequence of the lock requests, except when the lock requester is in interrupt context. Consequently, recursive read_lock calls will now hang the process if there is a write_lock call somewhere in between the read_lock calls. This patch updates the lockdep implementation to look for recursive read_lock calls. A new read state (3) is used to mark those read_lock calls that cannot be recursively called except in interrupt context. The new read state exhausts the 2 bits available in the held_lock::read bit field. The addition of any new read state in the future may require a redesign of how all those bits are squeezed together in the held_lock structure. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-08-13  locking/Documentation: Move locking related docs into Documentation/locking/  (Davidlohr Bueso)  2 files, -2/+2
Specifically: Documentation/locking/lockdep-design.txt Documentation/locking/lockstat.txt Documentation/locking/mutex-design.txt Documentation/locking/rt-mutex-design.txt Documentation/locking/rt-mutex.txt Documentation/locking/spinlocks.txt Documentation/locking/ww-mutex-design.txt Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Alexei Starovoitov <[email protected]> Cc: Al Viro <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Chris Mason <[email protected]> Cc: Dan Streetman <[email protected]> Cc: David Airlie <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David S. Miller <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Jason Low <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Lubomir Rintel <[email protected]> Cc: Masanari Iida <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Tim Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-08-13  locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate  (Davidlohr Bueso)  1 file, -1/+1
Commit 4badad35 ("locking/mutex: Disable optimistic spinning on some architectures") added an ARCH_SUPPORTS_ATOMIC_RMW flag to disable the mutex optimistic spinning feature on specific archs. Because CONFIG_MUTEX_SPIN_ON_OWNER only depended on DEBUG and SMP, it was OK to be a bit loose about when the ->owner field is built in. However, with the new flag in the mix, we can now waste space on an unused field, i.e.: CONFIG_SMP && (!CONFIG_MUTEX_SPIN_ON_OWNER && !CONFIG_DEBUG_MUTEX). Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: Davidlohr Bueso <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Jason Low <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Tim Chen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-08-13  locking/mutexes: Refactor optimistic spinning code  (Davidlohr Bueso)  1 file, -182/+214
When we fail to acquire the mutex in the fastpath, we end up calling __mutex_lock_common(). A *lot* goes on in this function. Move out the optimistic spinning code into mutex_optimistic_spin() and simplify the former a bit. Furthermore, this is similar to what we have in rwsems. No logical changes. Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-08-13  locking/mcs: Remove obsolete comment  (Davidlohr Bueso)  1 file, -3/+0
... as we clearly inline mcs_spin_lock() now. Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-08-13  locking/mutexes: Document quick lock release when unlocking  (Davidlohr Bueso)  1 file, -2/+9
When unlocking, we always want to reach the slowpath with the lock's counter indicating it is unlocked -- either as returned by the asm fastpath call or by explicitly setting it. While doing so, at least in theory, we can optimize and allow faster lock stealing. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
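Roughly what the documented unlock slowpath entry looks like (a sketch following mutex.c of that era; wait-queue handling abridged):

  __visible void __mutex_unlock_slowpath(atomic_t *lock_count)
  {
          struct mutex *lock = container_of(lock_count, struct mutex, count);

          /*
           * Make the counter read "unlocked" (1) before taking wait_lock,
           * so a spinning acquirer can steal the lock while we are busy
           * waking up waiters.  On architectures whose fastpath already
           * left the counter at 1 this is a no-op.
           */
          if (__mutex_slowpath_needs_to_unlock())
                  atomic_set(&lock->count, 1);

          /* ... take wait_lock, wake up the first waiter, drop wait_lock ... */
  }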
2014-08-13  locking/mutexes: Standardize arguments in lock/unlock slowpaths  (Davidlohr Bueso)  1 file, -3/+4
Just as on the locking end, when unlocking go ahead and obtain the proper data structure immediately after the previous (asm-end) call exits and there are (probably) pending waiters. This simplifies some of the layering a bit. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-17  arch, locking: Ciao arch_mutex_cpu_relax()  (Davidlohr Bueso)  5 files, -16/+13
The arch_mutex_cpu_relax() function, introduced by 34b133f, is hacky and ugly. It was added a few years ago to address the fact that common cpu_relax() calls include yielding on s390, and thus impact the optimistic spinning functionality of mutexes. Nowadays we use this function well beyond mutexes: rwsem, qrwlock, mcs and lockref. Since the macro that defines the call is in the mutex header, any users must include mutex.h and the naming is misleading as well. This patch (i) renames the call to cpu_relax_lowlatency ("relax, but only if you can do it with very low latency") and (ii) defines it in each arch's asm/processor.h local header, just like for regular cpu_relax functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax, and thus we can take it out of mutex.h. While this can seem redundant, I believe it is a good choice as it allows us to move out arch specific logic from generic locking primitives and enables future(?) archs to transparently define it, similarly to System Z. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Aurelien Jacquiot <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Bharat Bhushan <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Liqin <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Howells <[email protected]> Cc: David S. Miller <[email protected]> Cc: Deepthi Dharwar <[email protected]> Cc: Dominik Dingel <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hirokazu Takata <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Joe Perches <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Joseph Myers <[email protected]> Cc: Kees Cook <[email protected]> Cc: Koichi Yasutake <[email protected]> Cc: Lennox Wu <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Neuling <[email protected]> Cc: Michal Simek <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul E. 
McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Qais Yousef <[email protected]> Cc: Qiaowei Ren <[email protected]> Cc: Rafael Wysocki <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Russell King <[email protected]> Cc: Steven Miao <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Stratos Karafotis <[email protected]> Cc: Tim Chen <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vasily Kulikov <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: Wolfram Sang <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-17  locking/lockdep: Only ask for /proc/lock_stat output when available  (Andreas Gruenbacher)  1 file, -0/+2
When lockdep turns itself off, the following message is logged: Please attach the output of /proc/lock_stat to the bug report Omit this message when CONFIG_LOCK_STAT is off, and /proc/lock_stat doesn't exist. Signed-off-by: Andreas Gruenbacher <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
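A sketch of the guard (assuming the message is emitted from a helper such as print_lockdep_off(); abridged):

  static void print_lockdep_off(const char *bug_msg)
  {
          printk(KERN_DEBUG "%s\n", bug_msg);
          printk(KERN_DEBUG "turning off the locking correctness validator.\n");
  #ifdef CONFIG_LOCK_STAT
          printk(KERN_DEBUG "Please attach the output of /proc/lock_stat to the bug report\n");
  #endif
  }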
2014-07-17  Merge branch 'locking/urgent' into locking/core, before applying larger changes and to refresh the branch with fixes  (Ingo Molnar)  6 files, -44/+77
Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16  locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER  (Davidlohr Bueso)  2 files, -3/+3
Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER), encapsulate the dependencies for rwsem optimistic spinning. No logical changes here as it continues to depend on both SMP and the XADD algorithm variant. Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Jason Low <[email protected]> [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ] Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Chris Mason <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Waiman Long <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16  locking/rwsem: Rename 'activity' to 'count'  (Peter Zijlstra)  1 file, -14/+14
There are two definitions of struct rw_semaphore, one in linux/rwsem.h and one in linux/rwsem-spinlock.h. For some reason they have different names for the initial field. This makes it impossible to use C99 named initialization for __RWSEM_INITIALIZER() -- or we have to duplicate that entire thing along with the structure definitions. The simpler patch is renaming the rwsem-spinlock variant to match the regular rwsem. This allows us to switch to C99 named initialization. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
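With both variants agreeing on '.count', the initializer can use C99 designated initializers; a sketch (debug and spin-on-owner fields omitted; the unlocked value is RWSEM_UNLOCKED_VALUE in the xadd variant and plain 0 in the spinlock one):

  #define __RWSEM_INITIALIZER(name)                                     \
  {                                                                     \
          .count          = RWSEM_UNLOCKED_VALUE,                       \
          .wait_list      = LIST_HEAD_INIT((name).wait_list),           \
          .wait_lock      = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock),   \
  }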
2014-07-16  locking/spinlocks/mcs: Micro-optimize osq_unlock()  (Jason Low)  1 file, -2/+2
In the unlock function of the cancellable MCS spinlock, the first thing we do is retrieve the current CPU's osq node. However, due to the changes made in the previous patch, in the common case where the lock is not contended, we no longer need to access the current CPU's osq node. This patch optimizes this by only retrieving this CPU's osq node after we attempt the initial cmpxchg to unlock the osq and find that it is contended. Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
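The reordering reads roughly like this (a sketch following the kernel's osq implementation; memory-ordering comments elided):

  void osq_unlock(struct optimistic_spin_queue *lock)
  {
          struct optimistic_spin_node *node, *next;
          int curr = encode_cpu(smp_processor_id());

          /* Fast path: uncontended -- release without touching our node. */
          if (likely(atomic_cmpxchg(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr))
                  return;

          /* Contended: only now is it worth loading this CPU's node. */
          node = this_cpu_ptr(&osq_node);
          next = xchg(&node->next, NULL);
          if (!next)
                  next = osq_wait_next(lock, node, NULL);
          if (next)
                  ACCESS_ONCE(next->locked) = 1;
  }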
2014-07-16  locking/spinlocks/mcs: Introduce and use init macro and function for osq locks  (Jason Low)  2 files, -2/+2
Currently, we initialize the osq lock by directly setting the lock's values. It would be preferable if we use an init macro to do the initialization like we do with other locks. This patch introduces and uses a macro and function for initializing the osq lock. Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
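The helpers described are essentially the following (as in include/linux/osq_lock.h):

  #define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) }

  static inline void osq_lock_init(struct optimistic_spin_queue *lock)
  {
          atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
  }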
2014-07-16  locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead  (Jason Low)  4 files, -11/+44
The cancellable MCS spinlock is currently used to queue threads that are doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining the lock would access and queue the local node corresponding to the CPU that it's running on. Currently, the cancellable MCS lock is implemented by using pointers to these nodes. In this patch, instead of operating on pointers to the per-cpu nodes, we store the CPU numbers in which the per-cpu nodes correspond to in atomic_t. A similar concept is used with the qspinlock. By operating on the CPU # of the nodes using atomic_t instead of pointers to those nodes, this can reduce the overhead of the cancellable MCS spinlock by 32 bits (on 64 bit systems). Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Chris Mason <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Josef Bacik <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
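A sketch of the resulting handle and the CPU-number encoding (following include/linux/osq_lock.h and the osq code; osq_node is the per-CPU node array):

  struct optimistic_spin_queue {
          /*
           * Encoded CPU # of the tail node in the queue;
           * OSQ_UNLOCKED_VAL (0) means the queue is empty.
           */
          atomic_t tail;
  };

  static inline int encode_cpu(int cpu_nr)
  {
          return cpu_nr + 1;      /* 0 is reserved for "unlocked" */
  }

  static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
  {
          int cpu_nr = encoded_cpu_val - 1;

          return per_cpu_ptr(&osq_node, cpu_nr);
  }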
2014-07-16  locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node()  (Jason Low)  2 files, -16/+16
Currently, the per-cpu nodes structure for the cancellable MCS spinlock is named "optimistic_spin_queue". However, in a follow up patch in the series we will be introducing a new structure that serves as the new "handle" for the lock. It would make more sense if that structure is named "optimistic_spin_queue". Additionally, since the current use of the "optimistic_spin_queue" structure are "nodes", it might be better if we rename them to "node" anyway. This preparatory patch renames all current "optimistic_spin_queue" to "optimistic_spin_node". Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Chris Mason <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Josef Bacik <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16  locking/rwsem: Allow conservative optimistic spinning when readers have lock  (Jason Low)  1 file, -5/+5
Commit 4fc828e24cd9 ("locking/rwsem: Support optimistic spinning") introduced a major performance regression for workloads such as xfs_repair which mix read and write locking of the mmap_sem across many threads. The result was xfs_repair ran 5x slower on 3.16-rc2 than on 3.15 and using 20x more system CPU time. Perf profiles indicate in some workloads that significant time can be spent spinning on !owner. This is because we don't set the lock owner when readers(s) obtain the rwsem. In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll return false if there is no lock owner. The rationale is that if we just entered the slowpath, yet there is no lock owner, then there is a possibility that a reader has the lock. To be conservative, we'll avoid spinning in these situations. This patch reduced the total run time of the xfs_repair workload from about 4 minutes 24 seconds down to approximately 1 minute 26 seconds, back to close to the same performance as on 3.15. Retesting of AIM7, which were some of the workloads used to test the original optimistic spinning code, confirmed that we still get big performance gains with optimistic spinning, even with this additional regression fix. Davidlohr found that while the 'custom' workload took a performance hit of ~-14% to throughput for >300 users with this additional patch, the overall gain with optimistic spinning is still ~+45%. The 'disk' workload even improved by ~+15% at >1000 users. Tested-by: Dave Chinner <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Tim Chen <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox Signed-off-by: Ingo Molnar <[email protected]>
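The conservative check amounts to roughly this (abridged sketch of the rwsem-xadd spinning code of that era):

  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
  {
          struct task_struct *owner;
          bool on_cpu = false;

          if (need_resched())
                  return false;

          rcu_read_lock();
          owner = ACCESS_ONCE(sem->owner);
          if (owner)
                  on_cpu = owner->on_cpu;
          rcu_read_unlock();

          /*
           * sem->owner unset while we are already in the slowpath means
           * reader(s) may hold the lock: be conservative and do not spin.
           */
          return on_cpu;
  }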
2014-07-05  locking/mutexes: Optimize mutex trylock slowpath  (Jason Low)  1 file, -0/+4
The mutex_trylock() function calls into __mutex_trylock_fastpath() when trying to obtain the mutex. On 32 bit x86, in the !__HAVE_ARCH_CMPXCHG case, __mutex_trylock_fastpath() calls directly into __mutex_trylock_slowpath() regardless of whether or not the mutex is locked. In __mutex_trylock_slowpath(), we then acquire the wait_lock spinlock, xchg() lock->count with -1, then set lock->count back to 0 if there are no waiters, and return true if the prev lock count was 1. However, if the mutex is already locked, then there isn't much point in attempting all of the above expensive operations. In this patch, we only attempt the above trylock operations if the mutex is unlocked. Signed-off-by: Jason Low <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-05  locking/mutexes: Try to acquire mutex only if it is unlocked  (Jason Low)  1 file, -3/+4
Upon entering the slowpath in __mutex_lock_common(), we try once more to acquire the mutex. We only try to acquire if (lock->count >= 0). However, what we actually want here is to try to acquire if the mutex is unlocked (lock->count == 1). This patch changes it so that we only try-acquire the mutex upon entering the slowpath if it is unlocked, rather than if the lock count is non-negative. This helps further reduce unnecessary atomic xchg() operations. Furthermore, this patch uses !mutex_is_locked(lock) to do the initial checks for if the lock is free rather than directly calling atomic_read() on the lock->count, in order to improve readability. Signed-off-by: Jason Low <[email protected]> Acked-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
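The changed check amounts to roughly the following lines in __mutex_lock_common() (the xchg value and the skip_wait label are shown as in the mutex code of that era, for illustration only):

  /*
   * Once more, try to acquire the lock.  Only try-lock the mutex if it
   * is unlocked, to avoid issuing an atomic xchg() that is bound to fail.
   */
  if (!mutex_is_locked(lock) && (atomic_xchg(&lock->count, 0) == 1))
          goto skip_wait;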
2014-07-05  locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro  (Jason Low)  1 file, -10/+8
MUTEX_SHOW_NO_WAITER() is a macro which checks for if there are "no waiters" on a mutex by checking if the lock count is non-negative. Based on feedback from the discussion in the earlier version of this patchset, the macro is not very readable. Furthermore, checking lock->count isn't always the correct way to determine if there are "no waiters" on a mutex. For example, a negative count on a mutex really only means that there "potentially" are waiters. Likewise, there can be waiters on the mutex even if the count is non-negative. Thus, "MUTEX_SHOW_NO_WAITER" doesn't always do what the name of the macro suggests. So this patch deletes the MUTEX_SHOW_NO_WAITERS() macro, directly use atomic_read() instead of the macro, and adds comments which elaborate on how the extra atomic_read() checks can help reduce unnecessary xchg() operations. Signed-off-by: Jason Low <[email protected]> Acked-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-05  locking/mutexes: Correct documentation on mutex optimistic spinning  (Jason Low)  1 file, -6/+4
The mutex optimistic spinning documentation states that we spin for acquisition when we find that there are no pending waiters. However, in actuality, whether or not there are waiters for the mutex doesn't determine if we will spin for it. This patch removes that statement and also adds a comment which mentions that we spin for the mutex while we don't need to reschedule. Signed-off-by: Jason Low <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-06-21  rtmutex: Avoid pointless requeueing in the deadlock detection chain walk  (Thomas Gleixner)  1 file, -7/+70
In case the deadlock detector is enabled, we follow the lock chain to the end in rt_mutex_adjust_prio_chain, even if we could stop earlier due to the priority/waiter constellation. But once we are no longer the top priority waiter in a certain step, or the task holding the lock already has the same priority, then there is no point in dequeueing and enqueueing along the lock chain, as there is no change at all. So stop the queueing at this point. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Cc: Lai Jiangshan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2014-06-21  rtmutex: Cleanup deadlock detector debug logic  (Thomas Gleixner)  5 files, -28/+83
The conditions under which deadlock detection is conducted are unclear and undocumented. Add constants instead of using 0/1 and provide a selection function which hides the additional debug dependency from the calling code. Add comments where needed. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Cc: Lai Jiangshan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2014-06-21  rtmutex: Confine deadlock logic to futex  (Thomas Gleixner)  2 files, -33/+33
The deadlock logic is only required for futexes. Remove the extra arguments for the public functions and also for the futex specific ones which get always called with deadlock detection enabled. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]>
2014-06-21  rtmutex: Simplify remove_waiter()  (Thomas Gleixner)  1 file, -17/+19
Exit right away, when the removed waiter was not the top priority waiter on the lock. Get rid of the extra indent level. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Reviewed-by: Lai Jiangshan <[email protected]>
2014-06-21  rtmutex: Document pi chain walk  (Thomas Gleixner)  1 file, -9/+91
Add commentary to document the chain walk and the protection mechanisms and their scope. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]>
2014-06-21  rtmutex: Clarify the boost/deboost part  (Thomas Gleixner)  1 file, -10/+48
Add a separate local variable for the boost/deboost logic to make the code more readable. Add comments where appropriate. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]>
2014-06-21  rtmutex: No need to keep task ref for lock owner check  (Thomas Gleixner)  1 file, -2/+3
There is no point in keeping the task ref across the check for lock owner. Drop the ref before that, so the protection context is clear. Found while documenting the chain walk. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Reviewed-by: Lai Jiangshan <[email protected]>
2014-06-21  rtmutex: Simplify and document try_to_take_rtmutex()  (Thomas Gleixner)  1 file, -45/+88
The current implementation of try_to_take_rtmutex() is correct, but requires more than a single brain twist to understand the cleverly encoded conditionals. Untangle it and document the cases properly. Looks less efficient at first glance, but actually reduces the binary code size on x86-64 by 80 bytes. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]>
2014-06-21  rtmutex: Simplify rtmutex_slowtrylock()  (Thomas Gleixner)  1 file, -11/+20
Oleg noticed that rtmutex_slowtrylock() has a pointless check for rt_mutex_owner(lock) != current. To avoid calling try_to_take_rtmutex() we really want to check whether the lock has an owner at all or whether the trylock failed because the owner is NULL, but the RT_MUTEX_HAS_WAITERS bit is set. This covers the lock is owned by caller situation as well. We can actually do this check lockless. trylock is taking a chance whether we take lock->wait_lock to do the check or not. Add comments to the function while at it. Reported-by: Oleg Nesterov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Reviewed-by: Lai Jiangshan <[email protected]>
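A sketch of the simplified trylock slowpath (function names as in kernel/locking/rtmutex.c; abridged):

  static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
  {
          int ret;

          /*
           * If the lock already has an owner, fail right away.  This can be
           * checked without wait_lock: it is only a read, and a trylock is
           * allowed to be racy anyway.
           */
          if (rt_mutex_owner(lock))
                  return 0;

          /* Looks free (or only RT_MUTEX_HAS_WAITERS is set): try for real. */
          raw_spin_lock(&lock->wait_lock);

          ret = try_to_take_rt_mutex(lock, current, NULL);

          /* try_to_take_rt_mutex() sets the waiters bit unconditionally; clean up. */
          fixup_rt_mutex_waiters(lock);

          raw_spin_unlock(&lock->wait_lock);

          return ret;
  }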
2014-06-21  Merge branch 'locking/urgent' into locking/core  (Thomas Gleixner)  3 files, -35/+218
Reason: Required to add more rtmutex robustness changes on top of those already in mainline. Signed-off-by: Thomas Gleixner <[email protected]>
2014-06-21  Merge branch 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)  3 files, -35/+218
Pull rtmutex fixes from Thomas Gleixner: "Another three patches to make the rtmutex code more robust. That's the last urgent fallout from the big futex/rtmutex investigation"

* 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rtmutex: Plug slow unlock race
  rtmutex: Detect changes in the pi lock chain
  rtmutex: Handle deadlock detection smarter
2014-06-16  rtmutex: Plug slow unlock race  (Thomas Gleixner)  1 file, -6/+109
When the rtmutex fast path is enabled, the slow unlock function can create the following situation:

  spin_lock(foo->m->wait_lock);
  foo->m->owner = NULL;
                                rt_mutex_lock(foo->m);   <-- fast path
                                free = atomic_dec_and_test(foo->refcnt);
                                rt_mutex_unlock(foo->m); <-- fast path
                                if (free)
                                        kfree(foo);
  spin_unlock(foo->m->wait_lock);  <--- Use after free.

Plug the race by changing the slow unlock to the following scheme:

  while (!rt_mutex_has_waiters(m)) {
          /* Clear the waiters bit in m->owner */
          clear_rt_mutex_waiters(m);
          owner = rt_mutex_owner(m);
          spin_unlock(m->wait_lock);
          if (cmpxchg(m->owner, owner, 0) == owner)
                  return;
          spin_lock(m->wait_lock);
  }

So in case of a new waiter incoming while the owner tries the slow path unlock, we have two situations:

  unlock(wait_lock);
                                lock(wait_lock);
  cmpxchg(p, owner, 0) == owner
                                mark_rt_mutex_waiters(lock);
                                acquire(lock);

Or:

  unlock(wait_lock);
                                lock(wait_lock);
                                mark_rt_mutex_waiters(lock);
  cmpxchg(p, owner, 0) != owner
                                enqueue_waiter();
                                unlock(wait_lock);
  lock(wait_lock);
  wakeup_next_waiter();
  unlock(wait_lock);
                                lock(wait_lock);
                                acquire(lock);

If the fast path is disabled, then the simple

  m->owner = NULL;
  unlock(m->wait_lock);

is sufficient, as all access to m->owner is serialized via m->wait_lock. Also document and clarify the wakeup_next_waiter function as suggested by Oleg Nesterov. Reported-by: Steven Rostedt <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Signed-off-by: Thomas Gleixner <[email protected]>