aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2015-04-09locking/mutex: Further simplify mutex_spin_on_owner()Jason Low1-10/+4
Similar to what Linus suggested for rwsem_spin_on_owner(), in mutex_spin_on_owner() instead of having while (true) and breaking out of the spin loop on lock->owner != owner, we can have the loop directly check for while (lock->owner == owner) to improve the readability of the code. It also shrinks the code a bit: text data bss dec hex filename 3721 0 0 3721 e89 mutex.o.before 3705 0 0 3705 e79 mutex.o.after Signed-off-by: Jason Low <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tim Chen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Added code generation info. ] Signed-off-by: Ingo Molnar <[email protected]>
2015-03-25locking/rtmutex: Rename argument in the rt_mutex_adjust_prio_chain() ↵Tom(JeHyeon) Yeon1-1/+1
documentation as well The following commit changed "deadlock_detect" to "chwalk": 8930ed80f970 ("rtmutex: Cleanup deadlock detector debug logic") do that rename in the function's documentation as well. Signed-off-by: Tom(JeHyeon) Yeon <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-03-07locking/rwsem: Fix lock optimistic spinning when owner is not runningJason Low1-20/+11
Ming reported soft lockups occurring when running xfstest due to the following tip:locking/core commit: b3fd4f03ca0b ("locking/rwsem: Avoid deceiving lock spinners") When doing optimistic spinning in rwsem, threads should stop spinning when the lock owner is not running. While a thread is spinning on owner, if the owner reschedules, owner->on_cpu returns false and we stop spinning. However, this commit essentially caused the check to get ignored because when we break out of the spin loop due to !on_cpu, we continue spinning if sem->owner != NULL. This patch fixes this by making sure we stop spinning if the owner is not running. Furthermore, just like with mutexes, refactor the code such that we don't have separate checks for owner_running(). This makes it more straightforward in terms of why we exit the spin on owner loop and we would also avoid needing to "guess" why we broke out of the loop to make this more readable. Reported-and-tested-by: Ming Lei <[email protected]> Signed-off-by: Jason Low <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Dave Jones <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tim Chen <[email protected]> Link: http://lkml.kernel.org/r/1425714331.2475.388.camel@j-VirtualBox Signed-off-by: Ingo Molnar <[email protected]>
2015-02-24locking: Remove ACCESS_ONCE() usageDavidlohr Bueso4-19/+19
With the new standardized functions, we can replace all ACCESS_ONCE() calls across relevant locking - this includes lockref and seqlock while at it. ACCESS_ONCE() does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 Update the new calls regardless of if it is a scalar type, this is cleaner than having three alternatives. Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Paul E. McKenney <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-24Merge tag 'v4.0-rc1' into locking/core, to refresh the tree before merging ↵Ingo Molnar88-857/+2124
new changes Signed-off-by: Ingo Molnar <[email protected]>
2015-02-21Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds1-0/+12
Pull MIPS updates from Ralf Baechle: "This is the main pull request for MIPS: - a number of fixes that didn't make the 3.19 release. - a number of cleanups. - preliminary support for Cavium's Octeon 3 SOCs which feature up to 48 MIPS64 R3 cores with FPU and hardware virtualization. - support for MIPS R6 processors. Revision 6 of the MIPS architecture is a major revision of the MIPS architecture which does away with many of original sins of the architecture such as branch delay slots. This and other changes in R6 require major changes throughout the entire MIPS core architecture code and make up for the lion share of this pull request. - finally some preparatory work for eXtendend Physical Address support, which allows support of up to 40 bit of physical address space on 32 bit processors" [ Ahh, MIPS can't leave the PAE brain damage alone. It's like every CPU architect has to make that mistake, but pee in the snow by changing the TLA. But whether it's called PAE, LPAE or XPA, it's horrid crud - Linus ] * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (114 commits) MIPS: sead3: Corrected get_c0_perfcount_int MIPS: mm: Remove dead macro definitions MIPS: OCTEON: irq: add CIB and other fixes MIPS: OCTEON: Don't do acknowledge operations for level triggered irqs. MIPS: OCTEON: More OCTEONIII support MIPS: OCTEON: Remove setting of processor specific CVMCTL icache bits. MIPS: OCTEON: Core-15169 Workaround and general CVMSEG cleanup. MIPS: OCTEON: Update octeon-model.h code for new SoCs. MIPS: OCTEON: Implement DCache errata workaround for all CN6XXX MIPS: OCTEON: Add little-endian support to asm/octeon/octeon.h MIPS: OCTEON: Implement the core-16057 workaround MIPS: OCTEON: Delete unused COP2 saving code MIPS: OCTEON: Use correct instruction to read 64-bit COP0 register MIPS: OCTEON: Save and restore CP2 SHA3 state MIPS: OCTEON: Fix FP context save. MIPS: OCTEON: Save/Restore wider multiply registers in OCTEON III CPUs MIPS: boot: Provide more uImage options MIPS: Remove unneeded #ifdef __KERNEL__ from asm/processor.h MIPS: ip22-gio: Remove legacy suspend/resume support mips: pci: Add ifdef around pci_proc_domain ...
2015-02-21Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds1-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull ntp fix from Ingo Molnar: "An adjtimex interface regression fix for 32-bit systems" [ A check that was added in a previous commit is really only a concern for 64bit systems, but was applied to both 32 and 64bit systems, which results in breaking 32bit systems. Thus the fix here is to make the check only apply to 64bit systems ] * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ntp: Fixup adjtimex freq validation on 32-bit systems
2015-02-21Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two fixes: the paravirt spin_unlock() corruption/crash fix, and an rtmutex NULL dereference crash fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/spinlocks/paravirt: Fix memory corruption on unlock locking/rtmutex: Avoid a NULL pointer dereference on deadlock
2015-02-21Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds5-99/+148
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Thiscontains misc fixes: preempt_schedule_common() and io_schedule() recursion fixes, sched/dl fixes, a completion_done() revert, two sched/rt fixes and a comment update patch" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Avoid obvious configuration fail sched/autogroup: Fix failure to set cpu.rt_runtime_us sched/dl: Do update_rq_clock() in yield_task_dl() sched: Prevent recursion in io_schedule() sched/completion: Serialize completion_done() with complete() sched: Fix preempt_schedule_common() triggering tracing recursion sched/dl: Prevent enqueue of a sleeping task in dl_task_timer() sched: Make dl_task_time() use task_rq_lock() sched: Clarify ordering between task_rq_lock() and move_queued_task()
2015-02-21Merge branches 'core-urgent-for-linus' and 'irq-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rcu fix and x86 irq fix from Ingo Molnar: - Fix a bug that caused an RCU warning splat. - Two x86 irq related fixes: a hotplug crash fix and an ACPI IRQ registry fix. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Clear need_qs flag to prevent splat * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable() x86/irq: Fix regression caused by commit b568b8601f05
2015-02-20Merge tag 'for_linux-3.20-rc1' of ↵Linus Torvalds5-23/+64
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb Pull kgdb/kdb updates from Jason Wessel: "KGDB/KDB New: - KDB: improved searching - No longer enter debug core on panic if panic timeout is set KGDB/KDB regressions / cleanups - fix pdf doc build errors - prevent junk characters on kdb console from printk levels" * tag 'for_linux-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb: kgdb, docs: Fix <para> pdfdocs build errors debug: prevent entering debug mode on panic/exception. kdb: Const qualifier for kdb_getstr's prompt argument kdb: Provide forward search at more prompt kdb: Fix a prompt management bug when using | grep kdb: Remove stack dump when entering kgdb due to NMI kdb: Avoid printing KERN_ levels to consoles kdb: Fix off by one error in kdb_cpu() kdb: fix incorrect counts in KDB summary command output
2015-02-19debug: prevent entering debug mode on panic/exception.Colin Cross1-0/+17
On non-developer devices, kgdb prevents the device from rebooting after a panic. Incase of panics and exceptions, to allow the device to reboot, prevent entering debug mode to avoid getting stuck waiting for the user to interact with debugger. To avoid entering the debugger on panic/exception without any extra configuration, panic_timeout is being used which can be set via /proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the default value. Setting panic_timeout indicates that the user requested machine to perform unattended reboot after panic. We dont want to get stuck waiting for the user input incase of panic. Cc: Andrew Morton <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Android Kernel Team <[email protected]> Cc: John Stultz <[email protected]> Cc: Sumit Semwal <[email protected]> Signed-off-by: Colin Cross <[email protected]> [Kiran: Added context to commit message. panic_timeout is used instead of break_on_panic and break_on_exception to honor CONFIG_PANIC_TIMEOUT Modified the commit as per community feedback] Signed-off-by: Kiran Raparthy <[email protected]> Signed-off-by: Daniel Thompson <[email protected]> Signed-off-by: Jason Wessel <[email protected]>
2015-02-19kdb: Const qualifier for kdb_getstr's prompt argumentDaniel Thompson2-2/+2
All current callers of kdb_getstr() can pass constant pointers via the prompt argument. This patch adds a const qualification to make explicit the fact that this is safe. Signed-off-by: Daniel Thompson <[email protected]> Signed-off-by: Jason Wessel <[email protected]>
2015-02-19kdb: Provide forward search at more promptDaniel Thompson3-5/+26
Currently kdb allows the output of comamnds to be filtered using the | grep feature. This is useful but does not permit the output emitted shortly after a string match to be examined without wading through the entire unfiltered output of the command. Such a feature is particularly useful to navigate function traces because these traces often have a useful trigger string *before* the point of interest. This patch reuses the existing filtering logic to introduce a simple forward search to kdb that can be triggered from the more prompt. Signed-off-by: Daniel Thompson <[email protected]> Signed-off-by: Jason Wessel <[email protected]>
2015-02-19kdb: Fix a prompt management bug when using | grepDaniel Thompson1-2/+2
Currently when the "| grep" feature is used to filter the output of a command then the prompt is not displayed for the subsequent command. Likewise any characters typed by the user are also not echoed to the display. This rather disconcerting problem eventually corrects itself when the user presses Enter and the kdb_grepping_flag is cleared as kdb_parse() tries to make sense of whatever they typed. This patch resolves the problem by moving the clearing of this flag from the middle of command processing to the beginning. Signed-off-by: Daniel Thompson <[email protected]> Signed-off-by: Jason Wessel <[email protected]>
2015-02-19kdb: Remove stack dump when entering kgdb due to NMIDaniel Thompson1-1/+0
Issuing a stack dump feels ergonomically wrong when entering due to NMI. Entering due to NMI is normally a reaction to a user request, either the NMI button on a server or a "magic knock" on a UART. Therefore the backtrace behaviour on entry due to NMI should be like SysRq-g (no stack dump) rather than like oops. Note also that the stack dump does not offer any information that cannot be trivial retrieved using the 'bt' command. Signed-off-by: Daniel Thompson <[email protected]> Signed-off-by: Jason Wessel <[email protected]>
2015-02-19kdb: Avoid printing KERN_ levels to consolesDaniel Thompson2-10/+14
Currently when kdb traps printk messages then the raw log level prefix (consisting of '\001' followed by a numeral) does not get stripped off before the message is issued to the various I/O handlers supported by kdb. This causes annoying visual noise as well as causing problems grepping for ^. It is also a change of behaviour compared to normal usage of printk() usage. For example <SysRq>-h ends up with different output to that of kdb's "sr h". This patch addresses the problem by stripping log levels from messages before they are issued to the I/O handlers. printk() which can also act as an i/o handler in some cases is special cased; if the caller provided a log level then the prefix will be preserved when sent to printk(). The addition of non-printable characters to the output of kdb commands is a regression, albeit and extremely elderly one, introduced by commit 04d2c8c83d0e ("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern"). Note also that this patch does *not* restore the original behaviour from v3.5. Instead it makes printk() from within a kdb command display the message without any prefix (i.e. like printk() normally does). Signed-off-by: Daniel Thompson <[email protected]> Cc: Joe Perches <[email protected]> Cc: [email protected] Signed-off-by: Jason Wessel <[email protected]>
2015-02-19kdb: Fix off by one error in kdb_cpu()Jason Wessel2-2/+2
There was a follow on replacement patch against the prior "kgdb: Timeout if secondary CPUs ignore the roundup". See: https://lkml.org/lkml/2015/1/7/442 This patch is the delta vs the patch that was committed upstream: * Fix an off-by-one error in kdb_cpu(). * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we really want a static limit. * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c (kgdb-next contains a patch which introduced pr_fmt() to this file to the tag will now be applied automatically). Cc: Daniel Thompson <[email protected]> Cc: <[email protected]> Signed-off-by: Jason Wessel <[email protected]>
2015-02-19kdb: fix incorrect counts in KDB summary command outputJay Lan1-1/+1
The output of KDB 'summary' command should report MemTotal, MemFree and Buffers output in kB. Current codes report in unit of pages. A define of K(x) as is defined in the code, but not used. This patch would apply the define to convert the values to kB. Please include me on Cc on replies. I do not subscribe to linux-kernel. Signed-off-by: Jay Lan <[email protected]> Cc: <[email protected]> Signed-off-by: Jason Wessel <[email protected]>
2015-02-19Merge branch 'kbuild' of ↵Linus Torvalds1-31/+5
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - several cleanups in kbuild - serialize multiple *config targets so that 'make defconfig kvmconfig' works - The cc-ifversion macro got support for an else-branch * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: kbuild,gcov: simplify kernel/gcov/Makefile more kbuild: allow cc-ifversion to have the argument for false condition kbuild,gcov: simplify kernel/gcov/Makefile kbuild,gcov: remove unnecessary workaround kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion kbuild: fix cc-ifversion macro kbuild: drop $(version_h) from MRPROPER_FILES kbuild: use mixed-targets when two or more config targets are given kbuild: remove redundant line from bounds.h/asm-offsets.h kbuild: merge bounds.h and asm-offsets.h rules kbuild: Drop support for clean-rule
2015-02-18Merge branch 'rcu/next' of ↵Ingo Molnar1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull RCU fix from Paul E. McKenney. Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18locking/rwsem: Check for active lock before bailing on spinningDavidlohr Bueso1-10/+17
37e9562453b ("locking/rwsem: Allow conservative optimistic spinning when readers have lock") forced the default for optimistic spinning to be disabled if the lock owner was nil, which makes much sense for readers. However, while it is not our priority, we can make some optimizations for write-mostly workloads. We can bail the spinning step and still be conservative if there are any active tasks, otherwise there's really no reason not to spin, as the semaphore is most likely unlocked. This patch recovers most of a Unixbench 'execl' benchmark throughput by sleeping less and making better average system usage: before: CPU %user %nice %system %iowait %steal %idle all 0.60 0.00 8.02 0.00 0.00 91.38 after: CPU %user %nice %system %iowait %steal %idle all 1.22 0.00 70.18 0.00 0.00 28.60 Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Jason Low <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Tim Chen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18locking/rwsem: Avoid deceiving lock spinnersDavidlohr Bueso1-6/+15
When readers hold the semaphore, the ->owner is nil. As such, and unlike mutexes, '!owner' does not necessarily imply that the lock is free. This will cause writers to potentially spin excessively as they've been mislead to thinking they have a chance of acquiring the lock, instead of blocking. This patch therefore enhances the counter check when the owner is not set by the time we've broken out of the loop. Otherwise we can return true as a new owner has the lock and thus we want to continue spinning. While at it, we can make rwsem_spin_on_owner() less ambiguos and return right away under need_resched conditions. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Jason Low <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Tim Chen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18locking/rwsem: Set lock ownership ASAPDavidlohr Bueso4-24/+28
In order to optimize the spinning step, we need to set the lock owner as soon as the lock is acquired; after a successful counter cmpxchg operation, that is. This is particularly useful as rwsems need to set the owner to nil for readers, so there is a greater chance of falling out of the spinning. Currently we only set the owner much later in the game, in the more generic level -- latency can be specially bad when waiting for a node->next pointer when releasing the osq in up_write calls. As such, update the owner inside rwsem_try_write_lock (when the lock is obtained after blocking) and rwsem_try_write_lock_unqueued (when the lock is obtained while spinning). This requires creating a new internal rwsem.h header to share the owner related calls. Also cleanup some headers for mutex and rwsem. Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Jason Low <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Tim Chen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18locking/rwsem: Document barrier need when waking tasksDavidlohr Bueso2-0/+14
The need for the smp_mb() in __rwsem_do_wake() should be properly documented. Applies to both xadd and spinlock variants. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Jason Low <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Tim Chen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18locking/futex: Check PF_KTHREAD rather than !p->mm to filter out kthreadsOleg Nesterov1-1/+1
attach_to_pi_owner() checks p->mm to prevent attaching to kthreads and this looks doubly wrong: 1. It should actually check PF_KTHREAD, kthread can do use_mm(). 2. If this task is not kthread and it is actually the lock owner we can wrongly return -EPERM instead of -ESRCH or retry-if-EAGAIN. And note that this wrong EPERM is the likely case unless the exiting task is (auto)reaped quickly, we check ->mm before PF_EXITING. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Jerome Marchand <[email protected]> Cc: Larry Woodman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mateusz Guzik <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18locking/mutex: Refactor mutex_spin_on_owner()Jason Low1-25/+22
As suggested by Davidlohr, we could refactor mutex_spin_on_owner(). Currently, we split up owner_running() with mutex_spin_on_owner(). When the owner changes, we make duplicate owner checks which are not necessary. It also makes the code a bit obscure as we are using a second check to figure out why we broke out of the loop. This patch modifies it such that we remove the owner_running() function and the mutex_spin_on_owner() loop directly checks for if the owner changes, if the owner is not running, or if we need to reschedule. If the owner changes, we break out of the loop and return true. If the owner is not running or if we need to reschedule, then break out of the loop and return false. Suggested-by: Davidlohr Bueso <[email protected]> Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Tim Chen <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18locking/mutex: In mutex_spin_on_owner(), return true when owner changesJason Low1-4/+4
In the mutex_spin_on_owner(), we return true only if lock->owner == NULL. This was beneficial in situations where there were multiple threads simultaneously spinning for the mutex. If another thread got the lock while other spinner(s) were also doing mutex_spin_on_owner(), then the other spinners would stop spinning. This workaround helped reduce the chance that many spinners were simultaneously spinning for the mutex which can help reduce contention in highly contended cases. However, recent changes were made to the optimistic spinning code such that instead of having all spinners simultaneously spin for the mutex, we queue the spinners with an MCS lock such that only one thread spins for the mutex at a time. Furthermore, the OSQ optimizations ensure that spinners in the queue will stop waiting if it needs to reschedule. Now, we don't have to worry about multiple threads spinning on owner at the same time, and if lock->owner is not NULL at this point, it likely means another thread happens to obtain the lock in the fastpath. In this case, it would make sense for the spinner to continue spinning as long as the spinner doesn't need to schedule and the mutex owner is running. This patch changes this so that mutex_spin_on_owner() returns true when the lock owner changes, which means a thread will only stop spinning if it either needs to reschedule or if the lock owner is not running. We saw up to a 5% performance improvement in the fserver workload with this patch. Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Tim Chen <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched/rt: Avoid obvious configuration failPeter Zijlstra1-3/+11
Setting the root group's cpu.rt_runtime_us to 0 is a bad thing; it would disallow the kernel creating RT tasks. One can of course still set it to 1, which will (likely) still wreck your kernel, but at least make it clear that setting it to 0 is not good. Collect both sanity checks into the one place while we're there. Suggested-by: Zefan Li <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched/autogroup: Fix failure to set cpu.rt_runtime_usPeter Zijlstra2-5/+7
Because task_group() uses a cache of autogroup_task_group(), whose output depends on sched_class, switching classes can generate problems. In particular, when started as fair, the cache points to the autogroup, so when switching to RT the tg_rt_schedulable() test fails for every cpu.rt_{runtime,period}_us change because now the autogroup has tasks and no runtime. Furthermore, going back to the previous semantics of varying task_group() with sched_class has the down-side that the sched_debug output varies as well, even though the task really is in the autogroup. Therefore add an autogroup exception to tg_has_rt_tasks() -- such that both (all) task_group() usages in sched/core now have one. And remove all the remnants of the variable task_group() output. Reported-by: Zefan Li <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Stefan Bader <[email protected]> Fixes: 8323f26ce342 ("sched: Fix race in task_group()") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched/dl: Do update_rq_clock() in yield_task_dl()Kirill Tkhai1-0/+1
update_curr_dl() needs actual rq clock. Signed-off-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/1423040972.18770.10.camel@tkhai Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18ntp: Fixup adjtimex freq validation on 32-bit systemsJohn Stultz1-3/+7
Additional validation of adjtimex freq values to avoid potential multiplication overflows were added in commit 5e5aeb4367b (time: adjtimex: Validate the ADJ_FREQUENCY values) Unfortunately the patch used LONG_MAX/MIN instead of LLONG_MAX/MIN, which was fine on 64-bit systems, but being much smaller on 32-bit systems caused false positives resulting in most direct frequency adjustments to fail w/ EINVAL. ntpd only does direct frequency adjustments at startup, so the issue was not as easily observed there, but other time sync applications like ptpd and chrony were more effected by the bug. See bugs: https://bugzilla.kernel.org/show_bug.cgi?id=92481 https://bugzilla.redhat.com/show_bug.cgi?id=1188074 This patch changes the checks to use LLONG_MAX for clarity, and additionally the checks are disabled on 32-bit systems since LLONG_MAX/PPM_SCALE is always larger then the 32-bit long freq value, so multiplication overflows aren't possible there. Reported-by: Josh Boyer <[email protected]> Reported-by: George Joseph <[email protected]> Tested-by: George Joseph <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> # v3.19+ Cc: Linus Torvalds <[email protected]> Cc: Sasha Levin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Prettified the changelog and the comments a bit. ] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched: Prevent recursion in io_schedule()NeilBrown1-19/+12
io_schedule() calls blk_flush_plug() which, depending on the contents of current->plug, can initiate arbitrary blk-io requests. Note that this contrasts with blk_schedule_flush_plug() which requires all non-trivial work to be handed off to a separate thread. This makes it possible for io_schedule() to recurse, and initiating block requests could possibly call mempool_alloc() which, in times of memory pressure, uses io_schedule(). Apart from any stack usage issues, io_schedule() will not behave correctly when called recursively as delayacct_blkio_start() does not allow for repeated calls. So: - use ->in_iowait to detect recursion. Set it earlier, and restore it to the old value. - move the call to "raw_rq" after the call to blk_flush_plug(). As this is some sort of per-cpu thing, we want some chance that we are on the right CPU - When io_schedule() is called recurively, use blk_schedule_flush_plug() which cannot further recurse. - as this makes io_schedule() a lot more complex and as io_schedule() must match io_schedule_timeout(), but all the changes in io_schedule_timeout() and make io_schedule a simple wrapper for that. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> [ Moved the now rudimentary io_schedule() into sched.h. ] Cc: Jens Axboe <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Tony Battersby <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched/completion: Serialize completion_done() with complete()Oleg Nesterov1-2/+17
Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when reading completion state" was not correct, without lock/unlock the code like stop_machine_from_inactive_cpu() while (!completion_done()) cpu_relax(); can return before complete() finishes its spin_unlock() which writes to this memory. And spin_unlock_wait(). While at it, change try_wait_for_completion() to use READ_ONCE(). Reported-by: Paul E. McKenney <[email protected]> Reported-by: Davidlohr Bueso <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> [ Added a comment with the barrier. ] Cc: Linus Torvalds <[email protected]> Cc: Nicholas Mc Guire <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: de30ec47302c ("sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched: Fix preempt_schedule_common() triggering tracing recursionFrederic Weisbecker1-1/+1
Since the function graph tracer needs to disable preemption, it might call preempt_schedule() after reenabling it if something triggered the need for rescheduling in between. Therefore we can't trace preempt_schedule() itself because we would face a function tracing recursion otherwise as the tracer is always called before PREEMPT_ACTIVE gets set to prevent that recursion. This is why preempt_schedule() is tagged as "notrace". But the same issue applies to every function called by preempt_schedule() before PREEMPT_ACTIVE is actually set. And preempt_schedule_common() is one such example. Unfortunately we forgot to tag it as notrace as well and as a result we are encountering tracing recursion since it got introduced by: a18b5d0181923 ("sched: Fix missing preemption opportunity") Let's fix that by applying the appropriate function tag to preempt_schedule_common(). Reported-by: Huang Ying <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched/dl: Prevent enqueue of a sleeping task in dl_task_timer()Kirill Tkhai1-0/+20
A deadline task may be throttled and dequeued at the same time. This happens, when it becomes throttled in schedule(), which is called to go to sleep: current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() prev->on_rq = 0; Later the timer fires, but the task is still dequeued: dl_task_timer() enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */ Someone wakes it up: try_to_wake_up() enqueue_dl_entity() BUG_ON(on_dl_rq()) Patch fixes this problem, it prevents queueing !on_rq tasks on dl_rq. Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> [ Wrote comment. ] Cc: Juri Lelli <[email protected]> Fixes: 1019a359d3dc ("sched/deadline: Fix stale yield state") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched: Make dl_task_time() use task_rq_lock()Peter Zijlstra3-85/+79
Kirill reported that a dl task can be throttled and dequeued at the same time. This happens, when it becomes throttled in schedule(), which is called to go to sleep: current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() prev->on_rq = 0; This invalidates the assumption from commit 0f397f2c90ce ("sched/dl: Fix race in dl_task_timer()"): "The only reason we don't strictly need ->pi_lock now is because we're guaranteed to have p->state == TASK_RUNNING here and are thus free of ttwu races". And therefore we have to use the full task_rq_lock() here. This further amends the fact that we forgot to update the rq lock loop for TASK_ON_RQ_MIGRATE, from commit cca26e8009d1 ("sched: Teach scheduler to understand TASK_ON_RQ_MIGRATING state"). Reported-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Juri Lelli <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched: Clarify ordering between task_rq_lock() and move_queued_task()Peter Zijlstra1-0/+16
There was a wee bit of confusion around the exact ordering here; clarify things. Reported-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul E. McKenney <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18locking/rtmutex: Avoid a NULL pointer dereference on deadlockSebastian Andrzej Siewior1-1/+2
With task_blocks_on_rt_mutex() returning early -EDEADLK we never add the waiter to the waitqueue. Later, we try to remove it via remove_waiter() and go boom in rt_mutex_top_waiter() because rb_entry() gives a NULL pointer. ( Tested on v3.18-RT where rtmutex is used for regular mutex and I tried to get one twice in a row. ) Not sure when this started but I guess 397335f004f4 ("rtmutex: Fix deadlock detector for real") or commit 3d5c9340d194 ("rtmutex: Handle deadlock detection smarter"). Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: <[email protected]> # for v3.16 and later kernels Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-17Merge branch 'getname2' of ↵Linus Torvalds2-151/+37
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull getname/putname updates from Al Viro: "Rework of getname/getname_kernel/etc., mostly from Paul Moore. Gets rid of quite a pile of kludges between namei and audit..." * 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: audit: replace getname()/putname() hacks with reference counters audit: fix filename matching in __audit_inode() and __audit_inode_child() audit: enable filename recording via getname_kernel() simpler calling conventions for filename_mountpoint() fs: create proper filename objects using getname_kernel() fs: rework getname_kernel to handle up to PATH_MAX sized filenames cut down the number of do_path_lookup() callers
2015-02-17Merge branch 'for-linus' of ↵Linus Torvalds2-52/+47
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc VFS updates from Al Viro: "This cycle a lot of stuff sits on topical branches, so I'll be sending more or less one pull request per branch. This is the first pile; more to follow in a few. In this one are several misc commits from early in the cycle (before I went for separate branches), plus the rework of mntput/dput ordering on umount, switching to use of fs_pin instead of convoluted games in namespace_unlock()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch the IO-triggering parts of umount to fs_pin new fs_pin killing logics allow attaching fs_pin to a group not associated with some superblock get rid of the second argument of acct_kill() take count and rcu_head out of fs_pin dcache: let the dentry count go down to zero without taking d_lock pull bumping refcount into ->kill() kill pin_put() mode_t whack-a-mole: chelsio file->f_path.dentry is pinned down for as long as the file is open... get rid of lustre_dump_dentry() gut proc_register() a bit kill d_validate() ncpfs: get rid of d_validate() nonsense selinuxfs: don't open-code d_genocide()
2015-02-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-18/+23
Merge yet more updates from Andrew Morton: - a pile of minor fs fixes and cleanups - kexec updates - random misc fixes in various places: vmcore, rbtree, eventfd, ipc, seccomp. - a series of python-based kgdb helper scripts * emailed patches from Andrew Morton <[email protected]>: (58 commits) seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO samples/seccomp: improve label helper ipc,sem: use current->state helpers scripts/gdb: disable pagination while printing from breakpoint handler scripts/gdb: define maintainer scripts/gdb: convert CpuList to generator function scripts/gdb: convert ModuleList to generator function scripts/gdb: use a generator instead of iterator for task list scripts/gdb: ignore byte-compiled python files scripts/gdb: port to python3 / gdb7.7 scripts/gdb: add basic documentation scripts/gdb: add lx-lsmod command scripts/gdb: add class to iterate over CPU masks scripts/gdb: add lx_current convenience function scripts/gdb: add internal helper and convenience function for per-cpu lookup scripts/gdb: add get_gdbserver_type helper scripts/gdb: add internal helper and convenience function to retrieve thread_info scripts/gdb: add is_target_arch helper scripts/gdb: add helper and convenience function to look up tasks scripts/gdb: add task iteration class ...
2015-02-17seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNOKees Cook1-1/+3
The value resulting from the SECCOMP_RET_DATA mask could exceed MAX_ERRNO when setting errno during a SECCOMP_RET_ERRNO filter action. This makes sure we have a reliable value being set, so that an invalid errno will not be ignored by userspace. Signed-off-by: Kees Cook <[email protected]> Reported-by: Dmitry V. Levin <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Will Drewry <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-17kernel/module.c: do not inline do_init_module()Jan Kiszka1-2/+7
This provides a reliable breakpoint target, required for automatic symbol loading via the gdb helper command 'lx-symbols'. Signed-off-by: Jan Kiszka <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jason Wessel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Borislav Petkov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-17kexec: simplify conditionalGeoff Levand1-7/+10
Simplify the code around one of the conditionals in the kexec_load syscall routine. The original code was confusing with a redundant check on KEXEC_ON_CRASH and comments outside of the conditional block. This change switches the order of the conditional check, and cleans up the comments for the conditional. There is no functional change to the code. Signed-off-by: Geoff Levand <[email protected]> Acked-by: Vivek Goyal <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Maximilian Attems <[email protected]> Cc: Michal Marek <[email protected]> Cc: Paul Bolle <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-17kexec: fix a typo in commentAlexander Kuleshov1-1/+1
Signed-off-by: Alexander Kuleshov <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-17kexec: remove never used member destination in kimageBaoquan He1-4/+0
struct kimage has a member destination which is used to store the real destination address of each page when load segment from user space buffer to kernel. But we never retrieve the value stored in kimage->destination, so this member variable in kimage and its assignment operation are redundent code. I guess for_each_kimage_entry just does the work that kimage->destination is expected to do. So in this patch just make a cleanup to remove it. Signed-off-by: Baoquan He <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-17signal: use current->state helpersDavidlohr Bueso1-2/+2
Call __set_current_state() instead of assigning the new state directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping track of who changed the state. Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-17ptrace: remove linux/compat.h inclusion under CONFIG_COMPATFabian Frederick1-1/+0
Commit 84c751bd4aeb ("ptrace: add ability to retrieve signals without removing from a queue (v4)") includes <linux/compat.h> globally in ptrace.c This patch removes inclusion under if defined CONFIG_COMPAT. Signed-off-by: Fabian Frederick <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-17Merge tag 'suspend-to-idle-3.20-rc1' of ↵Linus Torvalds5-17/+142
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull suspend-to-idle updates from Rafael Wysocki: "Suspend-to-idle timer quiescing support for v3.20-rc1 Until now suspend-to-idle has not been able to save much more energy than runtime PM because of timer interrupts that periodically bring CPUs out of idle while they are waiting for a wakeup interrupt. Of course, the timer interrupts are not wakeup ones, so the handling of them can be deferred until a real wakeup interrupt happens, but at the same time we don't want to mass-expire timers at that point. The solution is to suspend the entire timekeeping when the last CPU is entering an idle state and resume it when the first CPU goes out of idle. That has to be done with care, though, so as to avoid accessing suspended clocksources etc. end we need extra support from idle drivers for that. This series of commits adds support for quiescing timers during suspend-to-idle and adds the requisite callbacks to intel_idle and the ACPI cpuidle driver" * tag 'suspend-to-idle-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / idle: Implement ->enter_freeze callback routine intel_idle: Add ->enter_freeze callbacks PM / sleep: Make it possible to quiesce timers during suspend-to-idle timekeeping: Make it safe to use the fast timekeeper while suspended timekeeping: Pass readout base to update_fast_timekeeper() PM / sleep: Re-implement suspend-to-idle handling