aboutsummaryrefslogtreecommitdiff
path: root/kernel/locking
AgeCommit message (Collapse)AuthorFilesLines
2016-03-22kernel: add kcov code coverageDmitry Vyukov1-0/+3
kcov provides code coverage collection for coverage-guided fuzzing (randomized testing). Coverage-guided fuzzing is a testing technique that uses coverage feedback to determine new interesting inputs to a system. A notable user-space example is AFL (http://lcamtuf.coredump.cx/afl/). However, this technique is not widely used for kernel testing due to missing compiler and kernel support. kcov does not aim to collect as much coverage as possible. It aims to collect more or less stable coverage that is function of syscall inputs. To achieve this goal it does not collect coverage in soft/hard interrupts and instrumentation of some inherently non-deterministic or non-interesting parts of kernel is disbled (e.g. scheduler, locking). Currently there is a single coverage collection mode (tracing), but the API anticipates additional collection modes. Initially I also implemented a second mode which exposes coverage in a fixed-size hash table of counters (what Quentin used in his original patch). I've dropped the second mode for simplicity. This patch adds the necessary support on kernel side. The complimentary compiler support was added in gcc revision 231296. We've used this support to build syzkaller system call fuzzer, which has found 90 kernel bugs in just 2 months: https://github.com/google/syzkaller/wiki/Found-Bugs We've also found 30+ bugs in our internal systems with syzkaller. Another (yet unexplored) direction where kcov coverage would greatly help is more traditional "blob mutation". For example, mounting a random blob as a filesystem, or receiving a random blob over wire. Why not gcov. Typical fuzzing loop looks as follows: (1) reset coverage, (2) execute a bit of code, (3) collect coverage, repeat. A typical coverage can be just a dozen of basic blocks (e.g. an invalid input). In such context gcov becomes prohibitively expensive as reset/collect coverage steps depend on total number of basic blocks/edges in program (in case of kernel it is about 2M). Cost of kcov depends only on number of executed basic blocks/edges. On top of that, kernel requires per-thread coverage because there are always background threads and unrelated processes that also produce coverage. With inlined gcov instrumentation per-thread coverage is not possible. kcov exposes kernel PCs and control flow to user-space which is insecure. But debugfs should not be mapped as user accessible. Based on a patch by Quentin Casasnovas. [[email protected]: make task_struct.kcov_mode have type `enum kcov_mode'] [[email protected]: unbreak allmodconfig] [[email protected]: follow x86 Makefile layout standards] Signed-off-by: Dmitry Vyukov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: syzkaller <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Tavis Ormandy <[email protected]> Cc: Will Deacon <[email protected]> Cc: Quentin Casasnovas <[email protected]> Cc: Kostya Serebryany <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Kees Cook <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Sasha Levin <[email protected]> Cc: David Drysdale <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-03-15tags: Fix DEFINE_PER_CPU expansionsPeter Zijlstra1-2/+1
$ make tags GEN tags ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1" ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1" ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1" ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1" ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1" ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1" ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1" Which are all the result of the DEFINE_PER_CPU pattern: scripts/tags.sh:200: '/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/' scripts/tags.sh:201: '/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/' The below cures them. All except the workqueue one are within reasonable distance of the 80 char limit. TJ do you have any preference on how to fix the wq one, or shall we just not care its too long? Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: David S. Miller <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-02-29locking/lockdep: Detect chain_key collisionsIngo Molnar1-8/+51
Add detection for chain_key collision under CONFIG_DEBUG_LOCKDEP. When a collision is detected the problem is reported and all lock debugging is turned off. Tested using liblockdep and the added tests before and after applying the fix, confirming both that the code added for the detection correctly reports the problem and that the fix actually fixes it. Tested tweaking lockdep to generate false collisions and verified that the problem is reported and that lock debugging is turned off. Also tested with lockdep's test suite after applying the patch: [ 0.000000] Good, all 253 testcases passed! | Signed-off-by: Alfredo Alvarez Fernandez <[email protected]> Cc: Alfredo Alvarez Fernandez <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/1455864533-7536-4-git-send-email-alfredoalvarezernandez@gmail.com Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29locking/lockdep: Prevent chain_key collisionsAlfredo Alvarez Fernandez1-8/+6
The chain_key hashing macro iterate_chain_key(key1, key2) does not generate a new different value if both key1 and key2 are 0. In that case the generated value is again 0. This can lead to collisions which can result in lockdep not detecting deadlocks or circular dependencies. Avoid the problem by using class_idx (1-based) instead of class id (0-based) as an input for the hashing macro 'key2' in iterate_chain_key(key1, key2). The use of class id created collisions in cases like the following: 1.- Consider an initial state in which no class has been acquired yet. Under these circumstances an AA deadlock will not be detected by lockdep: lock [key1,key2]->new key (key1=old chain_key, key2=id) -------------------------- A [0,0]->0 A [0,0]->0 (collision) The newly generated chain_key collides with the one used before and as a result the check for a deadlock is skipped A simple test using liblockdep and a pthread mutex confirms the problem: (omitting stack traces) new class 0xe15038: 0x7ffc64950f20 acquire class [0xe15038] 0x7ffc64950f20 acquire class [0xe15038] 0x7ffc64950f20 hash chain already cached, key: 0000000000000000 tail class: [0xe15038] 0x7ffc64950f20 2.- Consider an ABBA in 2 different tasks and no class yet acquired. T1 [key1,key2]->new key T2[key1,key2]->new key -- -- A [0,0]->0 B [0,1]->1 B [0,1]->1 (collision) A In this case the collision prevents lockdep from creating the new dependency A->B. This in turn results in lockdep not detecting the circular dependency when T2 acquires A. Signed-off-by: Alfredo Alvarez Fernandez <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/1455147212-2389-4-git-send-email-alfredoalvarezernandez@gmail.com Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29locking/mutex: Allow next waiter lockless wakeupDavidlohr Bueso1-2/+3
Make use of wake-queues and enable the wakeup to occur after releasing the wait_lock. This is similar to what we do with rtmutex top waiter, slightly shortening the critical region and allow other waiters to acquire the wait_lock sooner. In low contention cases it can also help the recently woken waiter to find the wait_lock available (fastpath) when it continues execution. Reviewed-by: Waiman Long <[email protected]> Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ding Tianhong <[email protected]> Cc: Jason Low <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tim Chen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29locking/pvqspinlock: Enable slowpath locking count trackingWaiman Long2-0/+8
This patch enables the tracking of the number of slowpath locking operations performed. This can be used to compare against the number of lock stealing operations to see what percentage of locks are stolen versus acquired via the regular slowpath. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29locking/qspinlock: Use smp_cond_acquire() in pending codeWaiman Long1-4/+3
The newly introduced smp_cond_acquire() was used to replace the slowpath lock acquisition loop. Similarly, the new function can also be applied to the pending bit locking loop. This patch uses the new function in that loop. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29locking/pvqspinlock: Move lock stealing count tracking code into ↵Waiman Long2-20/+9
pv_queued_spin_steal_lock() This patch moves the lock stealing count tracking code into pv_queued_spin_steal_lock() instead of via a jacket function simplifying the code. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29locking/mcs: Fix mcs_spin_lock() orderingPeter Zijlstra1-1/+7
Similar to commit b4b29f94856a ("locking/osq: Fix ordering of node initialisation in osq_lock") the use of xchg_acquire() is fundamentally broken with MCS like constructs. Furthermore, it turns out we rely on the global transitivity of this operation because the unlock path observes the pointer with a READ_ONCE(), not an smp_load_acquire(). This is non-critical because the MCS code isn't actually used and mostly serves as documentation, a stepping stone to the more complex things we've build on top of the idea. Reported-by: Andrea Parri <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Fixes: 3552a07a9c4a ("locking/mcs: Use acquire/release semantics") Signed-off-by: Ingo Molnar <[email protected]>
2016-02-09locking/lockdep: Eliminate lockdep_init()Andrey Ryabinin1-59/+0
Lockdep is initialized at compile time now. Get rid of lockdep_init(). Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Krinkin <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-09locking/lockdep: Convert hash tables to hlistsAndrew Morton1-23/+19
Mike said: : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.e. : kernel with CONFIG_UBSAN_ALIGNMENT=y fails to load without even any error : message. : : The problem is that ubsan callbacks use spinlocks and might be called : before lockdep is initialized. Particularly this line in the : reserve_ebda_region function causes problem: : : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); : : If i put lockdep_init() before reserve_ebda_region call in : x86_64_start_reservations kernel loads well. Fix this ordering issue permanently: change lockdep so that it uses hlists for the hash tables. Unlike a list_head, an hlist_head is in its initialized state when it is all-zeroes, so lockdep is ready for operation immediately upon boot - lockdep_init() need not have run. The patch will also save some memory. Probably lockdep_init() and lockdep_initialized can be done away with now. Suggested-by: Mike Krinkin <[email protected]> Reported-by: Mike Krinkin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-09locking/lockdep: Fix stack trace caching logicDmitry Vyukov1-6/+10
check_prev_add() caches saved stack trace in static trace variable to avoid duplicate save_trace() calls in dependencies involving trylocks. But that caching logic contains a bug. We may not save trace on first iteration due to early return from check_prev_add(). Then on the second iteration when we actually need the trace we don't save it because we think that we've already saved it. Let check_prev_add() itself control when stack is saved. There is another bug. Trace variable is protected by graph lock. But we can temporary release graph lock during printing. Fix this by invalidating cached stack trace when we release graph lock. Signed-off-by: Dmitry Vyukov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-01-26rtmutex: Make wait_lock irq safeThomas Gleixner1-63/+72
Sasha reported a lockdep splat about a potential deadlock between RCU boosting rtmutex and the posix timer it_lock. CPU0 CPU1 rtmutex_lock(&rcu->rt_mutex) spin_lock(&rcu->rt_mutex.wait_lock) local_irq_disable() spin_lock(&timer->it_lock) spin_lock(&rcu->mutex.wait_lock) --> Interrupt spin_lock(&timer->it_lock) This is caused by the following code sequence on CPU1 rcu_read_lock() x = lookup(); if (x) spin_lock_irqsave(&x->it_lock); rcu_read_unlock(); return x; We could fix that in the posix timer code by keeping rcu read locked across the spinlocked and irq disabled section, but the above sequence is common and there is no reason not to support it. Taking rt_mutex.wait_lock irq safe prevents the deadlock. Reported-by: Sasha Levin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul McKenney <[email protected]>
2016-01-11Merge branch 'locking-core-for-linus' of ↵Linus Torvalds3-58/+576
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "So we have a laundry list of locking subsystem changes: - continuing barrier API and code improvements - futex enhancements - atomics API improvements - pvqspinlock enhancements: in particular lock stealing and adaptive spinning - qspinlock micro-enhancements" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op futex: Cleanup the goto confusion in requeue_pi() futex: Remove pointless put_pi_state calls in requeue() futex: Document pi_state refcounting in requeue code futex: Rename free_pi_state() to put_pi_state() futex: Drop refcount if requeue_pi() acquired the rtmutex locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation lcoking/barriers, arch: Use smp barriers in smp_store_release() locking/cmpxchg, arch: Remove tas() definitions locking/pvqspinlock: Queue node adaptive spinning locking/pvqspinlock: Allow limited lock stealing locking/pvqspinlock: Collect slowpath lock statistics sched/core, locking: Document Program-Order guarantees locking, sched: Introduce smp_cond_acquire() and use it locking/pvqspinlock, x86: Optimize the PV unlock code path locking/qspinlock: Avoid redundant read of next pointer locking/qspinlock: Prefetch the next node cacheline locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg() atomics: Add test for atomic operations with _relaxed variants
2015-12-17locking/osq: Fix ordering of node initialisation in osq_lockWill Deacon1-3/+5
The Cavium guys reported a soft lockup on their arm64 machine, caused by commit c55a6ffa6285 ("locking/osq: Relax atomic semantics"): mutex_optimistic_spin+0x9c/0x1d0 __mutex_lock_slowpath+0x44/0x158 mutex_lock+0x54/0x58 kernfs_iop_permission+0x38/0x70 __inode_permission+0x88/0xd8 inode_permission+0x30/0x6c link_path_walk+0x68/0x4d4 path_openat+0xb4/0x2bc do_filp_open+0x74/0xd0 do_sys_open+0x14c/0x228 SyS_openat+0x3c/0x48 el0_svc_naked+0x24/0x28 This is because in osq_lock we initialise the node for the current CPU: node->locked = 0; node->next = NULL; node->cpu = curr; and then publish the current CPU in the lock tail: old = atomic_xchg_acquire(&lock->tail, curr); Once the update to lock->tail is visible to another CPU, the node is then live and can be both read and updated by concurrent lockers. Unfortunately, the ACQUIRE semantics of the xchg operation mean that there is no guarantee the contents of the node will be visible before lock tail is updated. This can lead to lock corruption when, for example, a concurrent locker races to set the next field. Fixes: c55a6ffa6285 ("locking/osq: Relax atomic semantics"): Reported-by: David Daney <[email protected]> Reported-by: Andrew Pinski <[email protected]> Tested-by: Andrew Pinski <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2015-12-04locking/pvqspinlock: Queue node adaptive spinningWaiman Long3-4/+50
In an overcommitted guest where some vCPUs have to be halted to make forward progress in other areas, it is highly likely that a vCPU later in the spinlock queue will be spinning while the ones earlier in the queue would have been halted. The spinning in the later vCPUs is then just a waste of precious CPU cycles because they are not going to get the lock soon as the earlier ones have to be woken up and take their turn to get the lock. This patch implements an adaptive spinning mechanism where the vCPU will call pv_wait() if the previous vCPU is not running. Linux kernel builds were run in KVM guest on an 8-socket, 4 cores/socket Westmere-EX system and a 4-socket, 8 cores/socket Haswell-EX system. Both systems are configured to have 32 physical CPUs. The kernel build times before and after the patch were: Westmere Haswell Patch 32 vCPUs 48 vCPUs 32 vCPUs 48 vCPUs ----- -------- -------- -------- -------- Before patch 3m02.3s 5m00.2s 1m43.7s 3m03.5s After patch 3m03.0s 4m37.5s 1m43.0s 2m47.2s For 32 vCPUs, this patch doesn't cause any noticeable change in performance. For 48 vCPUs (over-committed), there is about 8% performance improvement. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-12-04locking/pvqspinlock: Allow limited lock stealingWaiman Long3-28/+155
This patch allows one attempt for the lock waiter to steal the lock when entering the PV slowpath. To prevent lock starvation, the pending bit will be set by the queue head vCPU when it is in the active lock spinning loop to disable any lock stealing attempt. This helps to reduce the performance penalty caused by lock waiter preemption while not having much of the downsides of a real unfair lock. The pv_wait_head() function was renamed as pv_wait_head_or_lock() as it was modified to acquire the lock before returning. This is necessary because of possible lock stealing attempts from other tasks. Linux kernel builds were run in KVM guest on an 8-socket, 4 cores/socket Westmere-EX system and a 4-socket, 8 cores/socket Haswell-EX system. Both systems are configured to have 32 physical CPUs. The kernel build times before and after the patch were: Westmere Haswell Patch 32 vCPUs 48 vCPUs 32 vCPUs 48 vCPUs ----- -------- -------- -------- -------- Before patch 3m15.6s 10m56.1s 1m44.1s 5m29.1s After patch 3m02.3s 5m00.2s 1m43.7s 3m03.5s For the overcommited case (48 vCPUs), this patch is able to reduce kernel build time by more than 54% for Westmere and 44% for Haswell. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-12-04locking/pvqspinlock: Collect slowpath lock statisticsWaiman Long2-5/+308
This patch enables the accumulation of kicking and waiting related PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration option is selected. It also enables the collection of data which enable us to calculate the kicking and wakeup latencies which have a heavy dependency on the CPUs being used. The statistical counters are per-cpu variables to minimize the performance overhead in their updates. These counters are exported via the debugfs filesystem under the qlockstat directory. When the corresponding debugfs files are read, summation and computing of the required data are then performed. The measured latencies for different CPUs are: CPU Wakeup Kicking --- ------ ------- Haswell-EX 63.6us 7.4us Westmere-EX 67.6us 9.3us The measured latencies varied a bit from run-to-run. The wakeup latency is much higher than the kicking latency. A sample of statistical counters after system bootup (with vCPU overcommit) was: pv_hash_hops=1.00 pv_kick_unlock=1148 pv_kick_wake=1146 pv_latency_kick=11040 pv_latency_wake=194840 pv_spurious_wakeup=7 pv_wait_again=4 pv_wait_head=23 pv_wait_node=1129 Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-12-04locking, sched: Introduce smp_cond_acquire() and use itPeter Zijlstra1-2/+1
Introduce smp_cond_acquire() which combines a control dependency and a read barrier to form acquire semantics. This primitive has two benefits: - it documents control dependencies, - its typically cheaper than using smp_load_acquire() in a loop. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2015-11-23locking/pvqspinlock, x86: Optimize the PV unlock code pathWaiman Long1-16/+27
The unlock function in queued spinlocks was optimized for better performance on bare metal systems at the expense of virtualized guests. For x86-64 systems, the unlock call needs to go through a PV_CALLEE_SAVE_REGS_THUNK() which saves and restores 8 64-bit registers before calling the real __pv_queued_spin_unlock() function. The thunk code may also be in a separate cacheline from __pv_queued_spin_unlock(). This patch optimizes the PV unlock code path by: 1) Moving the unlock slowpath code from the fastpath into a separate __pv_queued_spin_unlock_slowpath() function to make the fastpath as simple as possible.. 2) For x86-64, hand-coded an assembly function to combine the register saving thunk code with the fastpath code. Only registers that are used in the fastpath will be saved and restored. If the fastpath fails, the slowpath function will be called via another PV_CALLEE_SAVE_REGS_THUNK(). For 32-bit, it falls back to the C __pv_queued_spin_unlock() code as the thunk saves and restores only one 32-bit register. With a microbenchmark of 5M lock-unlock loop, the table below shows the execution times before and after the patch with different number of threads in a VM running on a 32-core Westmere-EX box with x86-64 4.2-rc1 based kernels: Threads Before patch After patch % Change ------- ------------ ----------- -------- 1 134.1 ms 119.3 ms -11% 2 1286 ms 953 ms -26% 3 3715 ms 3480 ms -6.3% 4 4092 ms 3764 ms -8.0% Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-11-23locking/qspinlock: Avoid redundant read of next pointerWaiman Long1-3/+6
With optimistic prefetch of the next node cacheline, the next pointer may have been properly inititalized. As a result, the reading of node->next in the contended path may be redundant. This patch eliminates the redundant read if the next pointer value is not NULL. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-11-23locking/qspinlock: Prefetch the next node cachelineWaiman Long1-0/+10
A queue head CPU, after acquiring the lock, will have to notify the next CPU in the wait queue that it has became the new queue head. This involves loading a new cacheline from the MCS node of the next CPU. That operation can be expensive and add to the latency of locking operation. This patch addes code to optmistically prefetch the next MCS node cacheline if the next pointer is defined and it has been spinning for the MCS lock for a while. This reduces the locking latency and improves the system throughput. The performance change will depend on whether the prefetch overhead can be hidden within the latency of the lock spin loop. On really short critical section, there may not be performance gain at all. With longer critical section, however, it was found to have a performance boost of 5-10% over a range of different queue depths with a spinlock loop microbenchmark. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-11-23locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()Waiman Long1-5/+24
This patch replaces the cmpxchg() and xchg() calls in the native qspinlock code with the more relaxed _acquire or _release versions of those calls to enable other architectures to adopt queued spinlocks with less memory barrier performance overhead. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-11-23treewide: Remove old email addressPeter Zijlstra2-2/+2
There were still a number of references to my old Red Hat email address in the kernel source. Remove these while keeping the Red Hat copyright notices intact. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2015-11-06mm, page_alloc: distinguish between being unable to sleep, unwilling to ↵Mel Gorman1-1/+1
sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-03Merge branch 'sched-core-for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle were: - sched/fair load tracking fixes and cleanups (Byungchul Park) - Make load tracking frequency scale invariant (Dietmar Eggemann) - sched/deadline updates (Juri Lelli) - stop machine fixes, cleanups and enhancements for bugs triggered by CPU hotplug stress testing (Oleg Nesterov) - scheduler preemption code rework: remove PREEMPT_ACTIVE and related cleanups (Peter Zijlstra) - Rework the sched_info::run_delay code to fix races (Peter Zijlstra) - Optimize per entity utilization tracking (Peter Zijlstra) - ... misc other fixes, cleanups and smaller updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop() sched: Start stopper early stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark() stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() stop_machine: Ensure that a queued callback will be called before cpu_stop_park() sched/x86: Fix typo in __switch_to() comments sched/core: Remove a parameter in the migrate_task_rq() function sched/core: Drop unlikely behind BUG_ON() sched/core: Fix task and run queue sched_info::run_delay inconsistencies sched/numa: Fix task_tick_fair() from disabling numa_balancing sched/core: Add preempt_count invariant check sched/core: More notrace annotations sched/core: Kill PREEMPT_ACTIVE sched/core, sched/x86: Kill thread_info::saved_preempt_count sched/core: Simplify preempt_count tests sched/core: Robustify preemption leak checks sched/core: Stop setting PREEMPT_ACTIVE ...
2015-11-03Merge branch 'locking-core-for-linus' of ↵Linus Torvalds7-29/+44
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking changes from Ingo Molnar: "The main changes in this cycle were: - More gradual enhancements to atomic ops: new atomic*_read_ctrl() ops, synchronize atomic_{read,set}() ordering requirements between architectures, add atomic_long_t bitops. (Peter Zijlstra) - Add _{relaxed|acquire|release}() variants for inc/dec atomics and use them in various locking primitives: mutex, rtmutex, mcs, rwsem. This enables weakly ordered architectures (such as arm64) to make use of more locking related optimizations. (Davidlohr Bueso) - Implement atomic[64]_{inc,dec}_relaxed() on ARM. (Will Deacon) - Futex kernel data cache footprint micro-optimization. (Rasmus Villemoes) - pvqspinlock runtime overhead micro-optimization. (Waiman Long) - misc smaller fixlets" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ARM, locking/atomics: Implement _relaxed variants of atomic[64]_{inc,dec} locking/rwsem: Use acquire/release semantics locking/mcs: Use acquire/release semantics locking/rtmutex: Use acquire/release semantics locking/mutex: Use acquire/release semantics locking/asm-generic: Add _{relaxed|acquire|release}() variants for inc/dec atomics atomic: Implement atomic_read_ctrl() atomic, arch: Audit atomic_{read,set}() atomic: Add atomic_long_t bitops futex: Force hot variables into a single cache line locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VAL locking/osq: Relax atomic semantics locking/qrwlock: Rename ->lock to ->wait_lock locking/Documentation/lockstat: Fix typo - lokcing -> locking locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h
2015-10-19Merge branch 'for-mingo' of ↵Ingo Molnar2-60/+194
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Miscellaneous fixes. (Paul E. McKenney, Boqun Feng, Oleg Nesterov, Patrick Marlier) - Improvements to expedited grace periods. (Paul E. McKenney) - Performance improvements to and locktorture tests for percpu-rwsem. (Oleg Nesterov, Paul E. McKenney) - Torture-test changes. (Paul E. McKenney, Davidlohr Bueso) - Documentation updates. (Paul E. McKenney) Signed-off-by: Ingo Molnar <[email protected]>
2015-10-07Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and ↵Paul E. McKenney2-60/+194
'torture.2015.10.06a' into HEAD doc.2015.10.06a: Documentation updates. percpu-rwsem.2015.10.06a: Optimization of per-CPU reader-writer semaphores. torture.2015.10.06a: Torture-test updates.
2015-10-06locktorture: Fix module unwind when bad torture_type specifiedPaul E. McKenney1-3/+3
The locktorture module has a list of torture types, and specifying a type not on this list is supposed to cleanly fail the module load. Unfortunately, the "fail" happens without the "cleanly". This commit therefore adds the needed clean-up after an incorrect torture_type. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2015-10-06locking/percpu-rwsem: Clean up the lockdep annotations in percpu_down_read()Oleg Nesterov1-5/+5
Based on Peter Zijlstra's earlier patch. Change percpu_down_read() to use __down_read(), this way we can do rwsem_acquire_read() unconditionally at the start to make this code more symmetric and clean. Originally-From: Peter Zijlstra <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2015-10-06locking/percpu-rwsem: Fix the comments outdated by rcu_syncOleg Nesterov1-39/+11
Update the comments broken by the previous change. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2015-10-06locking/percpu-rwsem: Make use of the rcu_sync infrastructureOleg Nesterov1-11/+7
Currently down_write/up_write calls synchronize_sched_expedited() twice, which is evil. Change this code to rely on rcu-sync primitives. This avoids the _expedited "big hammer", and this can be faster in the contended case or even in the case when a single thread does down_write/up_write in a loop. Of course, a single down_write() will take more time, but otoh it will be much more friendly to the whole system. To simplify the review this patch doesn't update the comments, fixed by the next change. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2015-10-06locking/percpu-rwsem: Make percpu_free_rwsem() after kzalloc() safeOleg Nesterov1-0/+7
This is the temporary ugly hack which will be reverted later. We only need it to ensure that the next patch will not break "change sb_writers to use percpu_rw_semaphore" patches routed via the VFS tree. The alloc_super()->destroy_super() error path assumes that it is safe to call percpu_free_rwsem() after kzalloc() without percpu_init_rwsem(), so let's not disappoint it. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2015-10-06locktorture: Add torture tests for percpu_rwsemPaul E. McKenney1-0/+44
This commit adds percpu_rwsem tests based on the earlier rwsem tests. Signed-off-by: Paul E. McKenney <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Davidlohr Bueso <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2015-10-06locking/percpu-rwsem: Export symbols for locktorturePaul E. McKenney1-0/+5
This commit exports percpu_down_read(), percpu_down_write(), __percpu_init_rwsem(), percpu_up_read(), and percpu_up_write() to allow locktorture to test them when built as a module. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2015-10-06locktorture: Support rtmutex torturingDavidlohr Bueso1-2/+112
Real time mutexes is one of the few general primitives that we do not have in locktorture. Address this -- a few considerations: o To spice things up, enable competing thread(s) to become rt, such that we can stress different prio boosting paths in the rtmutex code. Introduce a ->task_boost callback, only used by rtmutex-torturer. Tasks will boost/deboost around every 50k (arbitrarily) lock/unlock operations. o Hold times are similar to what we have for other locks: only occasionally having longer hold times (per ~200k ops). So we roughly do two full rt boost+deboosting ops with short hold times. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2015-10-06locking/rwsem: Use acquire/release semanticsDavidlohr Bueso1-2/+3
As of 654672d4ba1 (locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations) and 6d79ef2d30e (locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'), weakly ordered archs can benefit from more relaxed use of barriers when locking and unlocking, instead of regular full barrier semantics. While currently only arm64 supports such optimizations, updating corresponding locking primitives serves for other archs to immediately benefit as well, once the necessary machinery is implemented of course. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul E.McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-06locking/mcs: Use acquire/release semanticsDavidlohr Bueso1-2/+2
As of 654672d4ba1 (locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations) and 6d79ef2d30e (locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'), weakly ordered archs can benefit from more relaxed use of barriers when locking and unlocking, instead of regular full barrier semantics. While currently only arm64 supports such optimizations, updating corresponding locking primitives serves for other archs to immediately benefit as well, once the necessary machinery is implemented of course. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul E.McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-06locking/rtmutex: Use acquire/release semanticsDavidlohr Bueso1-9/+21
As of 654672d4ba1 (locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations) and 6d79ef2d30e (locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'), weakly ordered archs can benefit from more relaxed use of barriers when locking and unlocking, instead of regular full barrier semantics. While currently only arm64 supports such optimizations, updating corresponding locking primitives serves for other archs to immediately benefit as well, once the necessary machinery is implemented of course. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul E.McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-06locking/mutex: Use acquire/release semanticsDavidlohr Bueso1-4/+5
As of 654672d4ba1 (locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations) and 6d79ef2d30e (locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'), weakly ordered archs can benefit from more relaxed use of barriers when locking and unlocking, instead of regular full barrier semantics. While currently only arm64 supports such optimizations, updating corresponding locking primitives serves for other archs to immediately benefit as well, once the necessary machinery is implemented of course. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul E.McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-10-06Merge branch 'sched/urgent' into sched/core, to pick up fixes before ↵Ingo Molnar1-5/+5
applying new changes Signed-off-by: Ingo Molnar <[email protected]>
2015-09-23Merge branch 'locking/urgent' into locking/core, to pick up fixes before ↵Ingo Molnar1-5/+5
applying new changes Signed-off-by: Ingo Molnar <[email protected]>
2015-09-23sched/deadline, locking/rtmutex: Fix open coded check in rt_mutex_waiter_less()Juri Lelli1-1/+2
rt_mutex_waiter_less() check of task deadlines is open coded. Since this is subject to wraparound bugs, make it use the correct helper. Reported-by: Luca Abeni <[email protected]> Signed-off-by: Juri Lelli <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-09-23locking/lockdep: Fix hlock->pin_count reset on lock stack rebuildsPeter Zijlstra1-5/+5
Various people reported hitting the "unpinning an unpinned lock" warning. As it turns out there are 2 places where we take a lock out of the middle of a stack, and in those cases it would fail to preserve the pin_count when rebuilding the lock stack. Reported-by: Sasha Levin <[email protected]> Reported-by: Tim Spriggs <[email protected]> Tested-by: Sasha Levin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-09-18locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VALWaiman Long1-5/+1
If _Q_SLOW_VAL has been set, the vCPU state must have been vcpu_hashed. The extra check at the end of __pv_queued_spin_unlock() is unnecessary and can be removed. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-09-18locking/osq: Relax atomic semanticsDavidlohr Bueso1-3/+8
... by using acquire/release for ops around the lock->tail. As such, weakly ordered archs can benefit from more relaxed use of barriers when issuing atomics. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-09-18locking/qrwlock: Rename ->lock to ->wait_lockDavidlohr Bueso1-4/+4
... trivial, but reads a little nicer when we name our actual primitive 'lock'. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-09-17Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Spinlock performance regression fix, plus documentation fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/static_keys: Fix up the static keys documentation locking/qspinlock/x86: Only emit the test-and-set fallback when building guest support locking/qspinlock/x86: Fix performance regression under unaccelerated VMs locking/static_keys: Fix a silly typo
2015-09-13Merge tag 'v4.3-rc1' into locking/core, to refresh the treeIngo Molnar2-2/+14
Signed-off-by: Ingo Molnar <[email protected]>