path: root/kernel
Age | Commit message | Author | Files | Lines
2012-07-13 | Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -18/+98
Pull the leap second fixes from Thomas Gleixner: "It's a rather large series, but well discussed, refined and reviewed. It got a massive testing by John, Prarit and tip. In theory we could split it into two parts. The first two patches f55a6faa3843: hrtimer: Provide clock_was_set_delayed() 4873fa070ae8: timekeeping: Fix leapsecond triggered load spike issue are merely preventing the stuff loops forever issues, which people have observed. But there is no point in delaying the other 4 commits which achieve full correctness into 3.6 as they are tagged for stable anyway. And I rather prefer to have the full fixes merged in bulk than a "prevent the observable wreckage and deal with the hidden fallout later" approach." * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Update hrtimer base offsets each hrtimer_interrupt timekeeping: Provide hrtimer update function hrtimers: Move lock held region in hrtimer_interrupt() timekeeping: Maintain ktime_t based offsets for hrtimers timekeeping: Fix leapsecond triggered load spike issue hrtimer: Provide clock_was_set_delayed()
2012-07-12 | workqueue: separate out worker_pool flags | Tejun Heo | 1 | -22/+25
GCWQ_MANAGE_WORKERS, GCWQ_MANAGING_WORKERS and GCWQ_HIGHPRI_PENDING are per-pool properties. Add worker_pool->flags and make the above three flags per-pool flags. The changes in this patch are mechanical and don't cause any functional difference. This is to prepare for multiple pools per gcwq. Signed-off-by: Tejun Heo <[email protected]>
2012-07-12 | workqueue: use @pool instead of @gcwq or @cpu where applicable | Tejun Heo | 1 | -107/+111
Modify all functions which deal with per-pool properties to pass around @pool instead of @gcwq or @cpu. The changes in this patch are mechanical and don't cause any functional difference. This is to prepare for multiple pools per gcwq. Signed-off-by: Tejun Heo <[email protected]>
2012-07-12 | workqueue: factor out worker_pool from global_cwq | Tejun Heo | 1 | -99/+117
Move worklist and all worker management fields from global_cwq into the new struct worker_pool. worker_pool points back to the containing gcwq. worker and cpu_workqueue_struct are updated to point to worker_pool instead of gcwq too. This change is mechanical and doesn't introduce any functional difference other than rearranging of fields and an added level of indirection in some places. This is to prepare for multiple pools per gcwq. v2: Comment typo fixes as suggested by Namhyung. Signed-off-by: Tejun Heo <[email protected]> Cc: Namhyung Kim <[email protected]>
2012-07-12 | workqueue: don't use WQ_HIGHPRI for unbound workqueues | Tejun Heo | 1 | -7/+11
Unbound wqs aren't concurrency-managed and try to execute work items as soon as possible. This is currently achieved by implicitly setting %WQ_HIGHPRI on all unbound workqueues; however, WQ_HIGHPRI implementation is about to be restructured and this usage won't be valid anymore. Add an explicit chain-wakeup path for unbound workqueues in process_one_work() instead of piggy backing on %WQ_HIGHPRI. Signed-off-by: Tejun Heo <[email protected]>
2012-07-11 | tracing: Check for allocation failure in __tracing_open() | Dan Carpenter | 1 | -0/+4
Clean up and return -ENOMEM if the kzalloc() fails. This also prevents a potential crash, as the pointer that failed to allocate would be later used. Link: http://lkml.kernel.org/r/[email protected] Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
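The check-and-clean-up pattern described above can be sketched in user-space C. This is only an illustration, not the actual __tracing_open() code: calloc() stands in for kzalloc(), and struct trace_iter_model, trace_iter_open_model() and trace_iter_close_model() are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the iterator allocated when a trace file is opened. */
struct trace_iter_model {
	void **buffer_iter;	/* one slot per CPU */
	size_t nr_cpus;
};

/*
 * Allocate an iterator with one zeroed slot per CPU.
 * Returns 0 on success, or -ENOMEM if either allocation fails.
 * On failure nothing is leaked and *out is left as NULL, so the
 * caller can never dereference a pointer that failed to allocate.
 */
int trace_iter_open_model(size_t nr_cpus, struct trace_iter_model **out)
{
	struct trace_iter_model *iter;

	*out = NULL;
	iter = calloc(1, sizeof(*iter));
	if (!iter)
		return -ENOMEM;

	iter->buffer_iter = calloc(nr_cpus, sizeof(*iter->buffer_iter));
	if (!iter->buffer_iter) {
		free(iter);	/* undo the first allocation before bailing out */
		return -ENOMEM;
	}

	iter->nr_cpus = nr_cpus;
	*out = iter;
	return 0;
}

/* Release everything trace_iter_open_model() allocated; NULL is a no-op. */
void trace_iter_close_model(struct trace_iter_model *iter)
{
	if (!iter)
		return;
	free(iter->buffer_iter);
	free(iter);
}
```

The point of the sketch is the ordering: the error is reported at the allocation site, so no later code path sees a NULL buffer_iter.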
2012-07-11 | Merge branch 'akpm' (Andrew's patch-bomb) | Linus Torvalds | 1 | -6/+10
Merge random patches from Andrew Morton. * Merge emailed patches from Andrew Morton <[email protected]>: (32 commits) memblock: free allocated memblock_reserved_regions later mm: sparse: fix usemap allocation above node descriptor section mm: sparse: fix section usemap placement calculation xtensa: fix incorrect memset shmem: cleanup shmem_add_to_page_cache shmem: fix negative rss in memcg memory.stat tmpfs: revert SEEK_DATA and SEEK_HOLE drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT fat: fix non-atomic NFS i_pos read MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section sgi-xp: nested calls to spin_lock_irqsave() fs: ramfs: file-nommu: add SetPageUptodate() drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails h8300/uaccess: add mising __clear_user() h8300/uaccess: remove assignment to __gu_val in unhandled case of get_user() h8300/time: add missing #include <asm/irq_regs.h> h8300/signal: fix typo "statis" h8300/pgtable: add missing #include <asm-generic/pgtable.h> drivers/rtc/rtc-ab8500.c: ensure correct probing of the AB8500 RTC when Device Tree is enabled ...
2012-07-11 | c/r: prctl: less paranoid prctl_set_mm_exe_file() | Konstantin Khlebnikov | 1 | -6/+10
The "no other files mapped" requirement from my previous patch (c/r: prctl: update prctl_set_mm_exe_file() after mm->num_exe_file_vmas removal) is too paranoid: it forbids the operation even if only a single shared-anon vma is mapped. Let's check instead that the current mm->exe_file is already unmapped; in that case the exe_file symlink is already outdated and changing it is reasonable. Plus, this patch fixes the exit code when the operation succeeds. Signed-off-by: Konstantin Khlebnikov <[email protected]> Reported-by: Cyrill Gorcunov <[email protected]> Tested-by: Cyrill Gorcunov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Matt Helsley <[email protected]> Cc: Kees Cook <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-07-11 | hrtimer: Update hrtimer base offsets each hrtimer_interrupt | John Stultz | 1 | -14/+14
The update of the hrtimer base offsets on all cpus cannot be made atomically from the timekeeper.lock held and interrupt disabled region as smp function calls are not allowed there. clock_was_set(), which enforces the update on all cpus, is called either from preemptible process context in case of do_settimeofday() or from the softirq context when the offset modification happened in the timer interrupt itself due to a leap second. In both cases there is a race window for an hrtimer interrupt between dropping timekeeper lock, enabling interrupts and clock_was_set() issuing the updates. Any interrupt which arrives in that window will see the new time but operate on stale offsets. So we need to make sure that an hrtimer interrupt always sees a consistent state of time and offsets. ktime_get_update_offsets() allows us to get the current monotonic time and update the per cpu hrtimer base offsets from hrtimer_interrupt() to capture a consistent state of monotonic time and the offsets. The function replaces the existing ktime_get() calls in hrtimer_interrupt(). The overhead of the new function vs. ktime_get() is minimal as it just adds two store operations. This ensures that any changes to realtime or boottime offsets are noticed and stored into the per-cpu hrtimer base structures, prior to any hrtimer expiration and guarantees that timers are not expired early. Signed-off-by: John Stultz <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Prarit Bhargava <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-11 | timekeeping: Provide hrtimer update function | Thomas Gleixner | 1 | -0/+34
To finally fix the infamous leap second issue and other race windows caused by functions which change the offsets between the various time bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a function which atomically gets the current monotonic time and updates the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic overhead. The previous patch which provides ktime_t offsets allows us to make this function almost as cheap as ktime_get() which is going to be replaced in hrtimer_interrupt(). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Prarit Bhargava <[email protected]> Cc: [email protected] Signed-off-by: John Stultz <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
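The idea of reading the monotonic time and the two offsets as one consistent snapshot can be modelled in plain C. The sketch below is a single-threaded illustration under stated assumptions, not the kernel function: struct timekeeper_model, ktime_model_t and get_update_offsets_model() are hypothetical names, and a real implementation would need the memory barriers of a proper seqlock.

```c
#include <stdint.h>

typedef int64_t ktime_model_t;	/* nanoseconds; stand-in for ktime_t */

/* Hypothetical model of the timekeeper state relevant to hrtimers. */
struct timekeeper_model {
	unsigned int seq;		/* even: stable, odd: writer in progress */
	ktime_model_t mono_ns;		/* current CLOCK_MONOTONIC time */
	ktime_model_t offs_real;	/* monotonic -> realtime offset */
	ktime_model_t offs_boot;	/* monotonic -> boottime offset */
};

/*
 * Return the monotonic time and store both offsets so that all three
 * values come from the same update generation. The loop retries when
 * a writer was active or the sequence changed during the read.
 * (Barriers are omitted: this is a single-threaded illustration only.)
 */
ktime_model_t get_update_offsets_model(const struct timekeeper_model *tk,
				       ktime_model_t *offs_real,
				       ktime_model_t *offs_boot)
{
	unsigned int seq;
	ktime_model_t now;

	do {
		seq = tk->seq;
		now = tk->mono_ns;
		*offs_real = tk->offs_real;
		*offs_boot = tk->offs_boot;
	} while ((seq & 1) || seq != tk->seq);

	return now;
}
```

Compared with a plain time read, the only extra work is the two stores through offs_real and offs_boot, which is why the commit message describes the function as almost as cheap as ktime_get().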
2012-07-11 | hrtimers: Move lock held region in hrtimer_interrupt() | Thomas Gleixner | 1 | -2/+3
We need to update the base offsets from this code and we need to do that under base->lock. Move the lock held region around the ktime_get() calls. The ktime_get() calls are going to be replaced with a function which gets the time and the offsets atomically. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Prarit Bhargava <[email protected]> Cc: [email protected] Signed-off-by: John Stultz <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-11 | timekeeping: Maintain ktime_t based offsets for hrtimers | Thomas Gleixner | 1 | -2/+23
We need to update the hrtimer clock offsets from the hrtimer interrupt context. To avoid conversions from timespec to ktime_t maintain a ktime_t based representation of those offsets in the timekeeper. This puts the conversion overhead into the code which updates the underlying offsets and provides fast accessible values in the hrtimer interrupt. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: John Stultz <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Prarit Bhargava <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-11 | timekeeping: Fix leapsecond triggered load spike issue | John Stultz | 1 | -0/+4
The timekeeping code misses an update of the hrtimer subsystem after a leap second happened. Due to that timers based on CLOCK_REALTIME are either expiring a second early or late depending on whether a leap second has been inserted or deleted until an operation is initiated which causes that update. Unless the update happens by some other means this discrepancy between the timekeeping and the hrtimer data stays forever and timers are expired either early or late. The reported immediate workaround - $ date -s "`date`" - is causing a call to clock_was_set() which updates the hrtimer data structures. See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix Add the missing clock_was_set() call to update_wall_time() in case of a leap second event. The actual update is deferred to softirq context as the necessary smp function call cannot be invoked from hard interrupt context. Signed-off-by: John Stultz <[email protected]> Reported-by: Jan Engelhardt <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Prarit Bhargava <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-11 | hrtimer: Provide clock_was_set_delayed() | John Stultz | 1 | -0/+20
clock_was_set() cannot be called from hard interrupt context because it calls on_each_cpu(). For fixing the widely reported leap seconds issue it is necessary to call it from hard interrupt context, i.e. the timer tick code, which does the timekeeping updates. Provide a new function which denotes it in the hrtimer cpu base structure of the cpu on which it is called and raise the hrtimer softirq. We then execute the clock_was_set() notification from softirq context in run_hrtimer_softirq(). The hrtimer softirq is rarely used, so polling the flag there is not a performance issue. [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get rid of all this ifdeffery ASAP ] Signed-off-by: John Stultz <[email protected]> Reported-by: Jan Engelhardt <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Prarit Bhargava <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
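The "set a flag in hard interrupt context, act on it in the softirq" scheme can be illustrated with a small user-space model. Everything here is an assumption made for illustration: struct cpu_base_model and both functions are hypothetical, and a counter stands in for the real clock_was_set() call.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a per-CPU hrtimer base. */
struct cpu_base_model {
	bool clock_was_set;	/* recorded from hard interrupt context */
	bool softirq_raised;	/* models raising the hrtimer softirq */
	int notifications;	/* counts deferred clock_was_set() runs */
};

/* Hard-irq side: only record the event and request the softirq. */
void clock_was_set_delayed_model(struct cpu_base_model *base)
{
	base->clock_was_set = true;
	base->softirq_raised = true;
}

/* Softirq side: poll the flag and run the expensive notification once. */
void run_hrtimer_softirq_model(struct cpu_base_model *base)
{
	base->softirq_raised = false;
	if (base->clock_was_set) {
		base->clock_was_set = false;
		base->notifications++;	/* stands in for clock_was_set() */
	}
}
```

In this model, several events recorded before the softirq runs collapse into a single notification, and a softirq run with the flag clear costs only one test.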
2012-07-11 | Merge tag 'driver-core-3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core | Linus Torvalds | 1 | -76/+126
Pull printk fixes from Greg Kroah-Hartman: "Here are some more printk fixes for 3.5-rc6. They resolve all known outstanding issues with the printk changes that have been happening. They have been tested by the people reporting the problems. This hopefully should be it for the printk stuff for 3.5-final. Signed-off-by: Greg Kroah-Hartman <[email protected]>" * tag 'driver-core-3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: kmsg: merge continuation records while printing kmsg: /proc/kmsg - support reading of partial log records kmsg: make sure all messages reach a newly registered boot console kmsg: properly handle concurrent non-blocking read() from /proc/kmsg kmsg: add the facility number to the syslog prefix kmsg: escape the backslash character while exporting data printk: replacing the raw_spin_lock/unlock with raw_spin_lock/unlock_irq
2012-07-09 | kmsg: merge continuation records while printing | Kay Sievers | 1 | -42/+78
In (the unlikely) case our continuation merge buffer is busy, we unfortunately can not merge further continuation printk()s into a single record and have to store them separately, which leads to split-up output of these lines when they are printed. Add some flags about newlines and prefix existence to these records and try to reconstruct the full line again, when the separated records are printed. Reported-By: Michael Neuling <[email protected]> Cc: Dave Jones <[email protected]> Cc: Linus Torvalds <[email protected]> Tested-By: Michael Neuling <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-09 | cgroup: cgroup_rm_files() was calling simple_unlink() with the wrong inode | Tejun Heo | 1 | -1/+1
While refactoring cgroup file removal path, 05ef1d7c4a "cgroup: introduce struct cfent" incorrectly changed the @dir argument of simple_unlink() to the inode of the file being deleted instead of that of the containing directory. The effect of this bug is minor - ctime and mtime of the parent weren't properly updated on file deletion. Fix it by using @cgrp->dentry->d_inode instead. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Al Viro <[email protected]> Acked-by: Li Zefan <[email protected]> Cc: [email protected]
2012-07-09 | kmsg: /proc/kmsg - support reading of partial log records | Kay Sievers | 1 | -8/+20
Restore support for partial reads of any size on /proc/kmsg, in case the supplied read buffer is smaller than the record size. Some people seem to think it is a good idea to run: $ dd if=/proc/kmsg bs=1 of=... as a klog bridge. Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44211 Reported-by: Jukka Ollila <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
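A partial read amounts to remembering how much of the current record has already been handed out. The following user-space sketch shows that bookkeeping under stated assumptions; struct record_reader_model and read_partial_model() are hypothetical names and do not reflect the kernel's actual data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state for one formatted log record. */
struct record_reader_model {
	const char *text;	/* the complete record */
	size_t len;		/* record length in bytes */
	size_t partial;		/* bytes already returned to the reader */
};

/*
 * Copy at most size bytes of the unread remainder into buf and advance
 * the partial offset. Returns the number of bytes copied; 0 means the
 * record has been consumed completely.
 */
size_t read_partial_model(struct record_reader_model *r, char *buf, size_t size)
{
	size_t left = r->len - r->partial;
	size_t n = left < size ? left : size;

	memcpy(buf, r->text + r->partial, n);
	r->partial += n;
	return n;
}
```

With a one-byte buffer, as in the dd bs=1 case from the commit message, a three-byte record is delivered over three calls without losing data.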
2012-07-07 | cgroup: fix cgroup hierarchy umount race | Tejun Heo | 1 | -1/+5
48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal optional" allowed a css to linger after the associated cgroup is removed. As a css holds a reference on the cgroup's dentry, it means that cgroup dentries may linger for a while. Destroying a superblock which has dentries with positive refcnts is a critical bug and triggers BUG() in vfs code. As each cgroup dentry holds an s_active reference, any lingering cgroup has both its dentry and the superblock pinned and thus preventing premature release of superblock. Unfortunately, after 48ddbe1946, there's a small window while releasing a cgroup which is directly under the root of the hierarchy. When a cgroup directory is released, vfs layer first deletes the corresponding dentry and then invokes dput() on the parent, which may recurse further, so when a cgroup directly below root cgroup is released, the cgroup is first destroyed - which releases the s_active it was holding - and then the dentry for the root cgroup is dput(). This creates a window where the root dentry's refcnt isn't zero but superblock's s_active is. If umount happens before or during this window, vfs will see the root dentry with non-zero refcnt and trigger BUG(). Before 48ddbe1946, this problem didn't exist because the last dentry reference was guaranteed to be put synchronously from rmdir(2) invocation which holds s_active around the whole process. Fix it by holding an extra superblock->s_active reference across dput() from css release, which is the dput() path added by 48ddbe1946 and the only one which doesn't hold an extra s_active ref across the final cgroup dput(). Signed-off-by: Tejun Heo <[email protected]> LKML-Reference: <[email protected]> Reported-by: shyju pv <[email protected]> Tested-by: shyju pv <[email protected]> Cc: Sasha Levin <[email protected]> Acked-by: Li Zefan <[email protected]>
2012-07-07 | Revert "cgroup: superblock can't be released with active dentries" | Tejun Heo | 1 | -14/+3
This reverts commit fa980ca87d15bb8a1317853f257a505990f3ffde. The commit was an attempt to fix a race condition where a cgroup hierarchy may be unmounted with positive dentry reference on root cgroup. While the commit made the race condition slightly more difficult to trigger, the race was still there and could be reliably triggered using a different test case. Revert the incorrect fix. The next commit will describe the race and fix it correctly. Signed-off-by: Tejun Heo <[email protected]> LKML-Reference: <[email protected]> Reported-by: shyju pv <[email protected]> Cc: Sasha Levin <[email protected]> Acked-by: Li Zefan <[email protected]>
2012-07-06 | kmsg: make sure all messages reach a newly registered boot console | Kay Sievers | 1 | -0/+6
We suppress printing kmsg records to the console, which are already printed immediately while we have received their fragments. Newly registered boot consoles print the entire kmsg buffer during registration. Clear the console-suppress flag after we skipped the record during its first storage, so any later print will see these records as usual. Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-06 | kmsg: properly handle concurrent non-blocking read() from /proc/kmsg | Kay Sievers | 1 | -8/+1
The /proc/kmsg read() interface is internally simply wired up to a sequence of syslog() syscalls, which might be racy between their checks and actions, regarding concurrency. In the (very uncommon) case of concurrent readers of /dev/kmsg, relying on usual O_NONBLOCK behavior, the recently introduced mutex might block an O_NONBLOCK reader in read(), when poll() returns for it, but another process has already read the data in the meantime. We've seen that while running artificial test setups and tools that "fight" about /proc/kmsg data. This restores the original /proc/kmsg behavior, where in case of concurrent read()s, poll() might wake up but the read() syscall will just return 0 to the caller, while another process has "stolen" the data. This is in the general case not the expected behavior, but it is the exact same one, that can easily be triggered with a 3.4 kernel, and some tools might just rely on it. The mutex is not needed, the original integrity issue which introduced it, is in the meantime covered by: "fill buffer with more than a single message for SYSLOG_ACTION_READ" 116e90b23f74d303e8d607c7a7d54f60f14ab9f2 Cc: Yuanhan Liu <[email protected]> Acked-by: Jan Beulich <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-06 | kmsg: add the facility number to the syslog prefix | Kay Sievers | 1 | -4/+7
After the recent split of facility and level into separate variables, we miss the facility value (always 0 for kernel-originated messages) in the syslog prefix. On Tue, Jul 3, 2012 at 12:45 PM, Dan Carpenter <[email protected]> wrote: > Static checkers complain about the impossible condition here. > > In 084681d14e ('printk: flush continuation lines immediately to > console'), we changed msg->level from being a u16 to being an unsigned > 3 bit bitfield. Cc: Dan Carpenter <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-06 | kmsg: escape the backslash character while exporting data | Kay Sievers | 1 | -2/+2
Non-printable characters in the log data are hex-escaped to ensure safe post processing. We need to escape a backslash we find in the data, to be able to distinguish it from a backslash we add for the escaping. Also escape the non-printable character 127. Thanks to Miloslav Trmac for the heads up. Reported-by: Michael Neuling <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
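The escaping rule described above can be sketched as a small user-space function: control characters, byte 127 and above, and the backslash itself are written as \xHH, so a backslash in the output always introduces an escape. The function name kmsg_escape_model() and the buffer handling are assumptions made for this illustration, not the kernel's export code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy len bytes from src into dst, hex-escaping control characters,
 * bytes >= 127 and the backslash itself as \xHH. Stops early if the
 * next item would not fit. dst is always NUL-terminated when size > 0.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
size_t kmsg_escape_model(const char *src, size_t len, char *dst, size_t size)
{
	size_t out = 0;
	size_t i;

	if (size == 0)
		return 0;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)src[i];

		if (c < ' ' || c >= 127 || c == '\\') {
			if (out + 4 >= size)	/* need 4 bytes plus the NUL */
				break;
			out += (size_t)snprintf(dst + out, size - out,
						"\\x%02x", (unsigned int)c);
		} else {
			if (out + 1 >= size)	/* need 1 byte plus the NUL */
				break;
			dst[out++] = (char)c;
		}
	}
	dst[out] = '\0';
	return out;
}
```

Because a literal backslash becomes \x5c, a post-processor can decode every \xHH sequence without guessing whether the backslash came from the data or from the escaping.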
2012-07-06 | printk: replacing the raw_spin_lock/unlock with raw_spin_lock/unlock_irq | liu chuansheng | 1 | -12/+12
The functions devkmsg_read/writev/llseek/poll/open()... use raw_spin_lock/unlock, which allows a potential deadlock. CPU1: thread1 doing the cat /dev/kmsg: raw_spin_lock(&logbuf_lock); while (user->seq == log_next_seq) { If an interrupt arrives on CPU1 while thread1 is running here and the interrupt handler calls printk(), which also needs the logbuf_lock spinlock, the result is a deadlock. So we should use raw_spin_lock/unlock_irq here. Acked-by: Kay Sievers <[email protected]> Signed-off-by: liu chuansheng <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-06 | Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu | Ingo Molnar | 10 | -521/+569
Pull the RCU tree from Paul E. McKenney: "The major features of this series are: 1. Preventing latency spikes of more than 200 microseconds for kernels built with NR_CPUS=4096, which is reportedly becoming the default for some distros. This is a first step, as it does not help with systems that actually -have- 4096 CPUs (work on this case is in progress, but is not yet ready for mainline). This category also includes improving concurrency of rcu_barrier(), placed here due to conflicts. Posted to LKML at: https://lkml.org/lkml/2012/6/22/381. Note that patches 18-22 of that series have been deferred to 3.7, as they have not yet proven themselves to be mainline-ready (and yes, these are the ones intended to get rid of RCU's latency spikes for systems that actually have 4096 CPUs). 2. Updates to documentation and rcutorture fixes, the latter category including improvements to rcu_barrier() testing. Posted to LKML at: http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/04094.html. 3. Miscellaneous fixes posted to LKML at: https://lkml.org/lkml/2012/6/22/500, with the exception of the last commit, which was posted here: http://www.gossamer-threads.com/lists/linux/kernel/1561830 4. RCU_FAST_NO_HZ fixes and improvements. Posted to LKML at: http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/00006.html and http://www.gossamer-threads.com/lists/linux/kernel/1561833. The first four patches of the first series went into 3.5 to fix a regression. 5. Code-style fixes. These were posted to LKML at http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/01180.html and http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/01181.html. " Signed-off-by: Ingo Molnar <[email protected]>
2012-07-06 | rcu: Fix broken strings in RCU's source code. | Paul E. McKenney | 2 | -32/+26
Although the C language allows you to break strings across lines, doing this makes it hard for people to find the Linux kernel code corresponding to a given console message. This commit therefore fixes broken strings throughout RCU's source code. Suggested-by: Josh Triplett <[email protected]> Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2012-07-06 | rcu: Fix code-style issues involving "else" | Paul E. McKenney | 4 | -13/+18
The Linux kernel coding style says that single-statement blocks should omit curly braces unless the other leg of the "if" statement has multiple statements, in which case the curly braces should be included. This commit fixes RCU's violations of this rule. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-07-06 | Merge branches 'bigrtm.2012.07.04a', 'doctorture.2012.07.02a', 'fixes.2012.07.06a' and 'fnh.2012.07.02a' into HEAD | Paul E. McKenney | 8 | -190/+191
bigrtm: First steps towards getting RCU out of the way of tens-of-microseconds real-time response on systems compiled with NR_CPUS=4096. Also cleanups for and increased concurrency of rcu_barrier() family of primitives. doctorture: rcutorture and documentation improvements. fixes: Miscellaneous fixes. fnh: RCU_FAST_NO_HZ fixes and improvements.
2012-07-06 | rcu: Introduce check for callback list/count mismatch | Paul E. McKenney | 1 | -0/+1
The recent bug that introduced the RCU callback list/count mismatch showed the need for a diagnostic to check for this, which this commit adds. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-07-06 | devicetree: add helper inline for retrieving a node's full name | Grant Likely | 1 | -4/+4
The pattern (np ? np->full_name : "<none>") is rather common in the kernel, but can also make for quite long lines. This patch adds a new inline function, of_node_full_name() so that the test for a valid node pointer doesn't need to be open coded at all call sites. Signed-off-by: Grant Likely <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Rob Herring <[email protected]>
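A helper of the kind described above could look like the following user-space sketch. The struct is a minimal stand-in with only the one field used here, and the names struct device_node_model and node_full_name_model() are hypothetical; the kernel's own helper is of_node_full_name().

```c
#include <stddef.h>

/* Minimal stand-in for struct device_node; only the field used here. */
struct device_node_model {
	const char *full_name;
};

/* Return the node's full name, or "<none>" when np is NULL. */
static inline const char *node_full_name_model(const struct device_node_model *np)
{
	return np ? np->full_name : "<none>";
}
```

Call sites can then pass a possibly NULL node straight into a format string argument instead of repeating the ternary expression.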
2012-07-06 | Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent | Ingo Molnar | 4 | -4/+13
Pull low probability CONFIG_RCU_BOOST=y deadlock fix from Paul E. McKenney. Signed-off-by: Ingo Molnar <[email protected]>
2012-07-06 | Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core | Ingo Molnar | 4 | -10/+33
Pull tracing updates from Steve Rostedt. Signed-off-by: Ingo Molnar <[email protected]>
2012-07-05 | Merge branch 'perf/urgent' into perf/core | Ingo Molnar | 12 | -218/+591
Merge this branch to pick up a fixlet and to update to a more recent base. Signed-off-by: Ingo Molnar <[email protected]>
2012-07-05 | sched/nohz: Rewrite and fix load-avg computation -- again | Peter Zijlstra | 4 | -75/+205
Thanks to Charles Wang for spotting the defects in the current code: - If we go idle during the sample window -- after sampling, we get a negative bias because we can negate our own sample. - If we wake up during the sample window we get a positive bias because we push the sample to a known active period. So rewrite the entire nohz load-avg muck once again, now adding copious documentation to the code. Reported-and-tested-by: Doug Smythies <[email protected]> Reported-and-tested-by: Charles Wang <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins [ minor edits ] Signed-off-by: Ingo Molnar <[email protected]>
2012-07-05 | sched: Fix fork() error path to not crash | Salman Qazi | 1 | -3/+8
In dup_task_struct(), if arch_dup_task_struct() fails, the clean up code fails to clean up correctly. That's because the clean up code depends on the uninitialized ti->task pointer. We fix this by making sure that the task and thread_info know about each other before we attempt to take the error path. Signed-off-by: Salman Qazi <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2012-07-05 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 4 | -96/+228
2012-07-03 | Merge branch 'for-linus' of git://git.kernel.dk/linux-block | Linus Torvalds | 2 | -4/+7
Pull block bits from Jens Axboe: "As vacation is coming up, thought I'd better get rid of my pending changes in my for-linus branch for this iteration. It contains: - Two patches for mtip32xx. Killing a non-compliant sysfs interface and moving it to debugfs, where it belongs. - A few patches from Asias. Two legit bug fixes, and one killing an interface that is no longer in use. - A patch from Jan, making the annoying partition ioctl warning a bit less annoying, by restricting it to !CAP_SYS_RAWIO only. - Three bug fixes for drbd from Lars Ellenberg. - A fix for an old regression for umem, it hasn't really worked since the plugging scheme was changed in 3.0. - A few fixes from Tejun. - A splice fix from Eric Dumazet, fixing an issue with pipe resizing." * 'for-linus' of git://git.kernel.dk/linux-block: scsi: Silence unnecessary warnings about ioctl to partition block: Drop dead function blk_abort_queue() block: Mitigate lock unbalance caused by lock switching block: Avoid missed wakeup in request waitqueue umem: fix up unplugging splice: fix racy pipe->buffers uses drbd: fix null pointer dereference with on-congestion policy when diskless drbd: fix list corruption by failing but already aborted reads drbd: fix access of unallocated pages and kernel panic xen/blkfront: Add WARN to deal with misbehaving backends. blkcg: drop local variable @q from blkg_destroy() mtip32xx: Create debugfs entries for troubleshooting mtip32xx: Remove 'registers' and 'flags' from sysfs blkcg: fix blkg_alloc() failure path block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED block: fix return value on cfq_init() failure mtip32xx: Remove version.h header file inclusion xen/blkback: Copy id field when doing BLKIF_DISCARD.
2012-07-02 | rcu: Make RCU_FAST_NO_HZ respect nohz= boot parameter | Paul E. McKenney | 3 | -1/+17
If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to also disable itself. This commit therefore checks for tick_nohz_enabled being zero, disabling rcu_prepare_for_idle() if so. This commit assumes that tick_nohz_enabled can change at runtime: If this is not the case, then a simpler approach suffices. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2012-07-02 | rcu: Round FAST_NO_HZ lazy timeout to nearest second | Paul E. McKenney | 1 | -7/+11
Currently, if several CPUs in the same package have all lazy RCU callbacks, their wakeups will be uncorrelated. If all the CPUs are in the same power domain (as is often the case), this will result in unnecessary power-ups of the package. This commit therefore uses round_jiffies() to round the timeouts to a second boundary, increasing the odds that they can be coalesced with each other or with other timeouts. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
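Why rounding helps coalescing can be seen in a simplified model: once timeouts are snapped to whole-second boundaries, nearby expiry times collapse onto the same tick. The sketch below is not the kernel's round_jiffies(), which handles further details omitted here; HZ_MODEL and round_to_second_model() are hypothetical, and 1000 ticks per second is an assumed value.

```c
#define HZ_MODEL 1000UL	/* assumed number of ticks per second */

/*
 * Round an absolute tick count to the nearest whole-second boundary.
 * Values exactly half way between two boundaries round up.
 */
unsigned long round_to_second_model(unsigned long j)
{
	unsigned long rem = j % HZ_MODEL;

	if (rem < HZ_MODEL / 2)
		return j - rem;
	return j - rem + HZ_MODEL;
}
```

In this model, two lazy timeouts at ticks 11800 and 12340 both land on tick 12000, so a single wakeup can serve both instead of powering up the package twice.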
2012-07-02 | rcu: The rcu_needs_cpu() function is not a quiescent state | Paul E. McKenney | 1 | -2/+0
The TINY_PREEMPT_RCU() function rcu_preempt_needs_cpu(), which is called from rcu_needs_cpu(), assumes that it is in a quiescent state with respect to the CPU. This is no longer the case. This commit therefore updates rcu_preempt_needs_cpu() to make it aware that it is not running in a quiescent state. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Heiko Carstens <[email protected]> Tested-by: Pascal Chapperon <[email protected]>
2012-07-02rcu: Dump only the current CPU's buffers for idle-entry/exit warningsPaul E. McKenney1-2/+2
Problems in RCU idle entry and exit are almost always confined to the offending CPU. This commit therefore switches ftrace_dump() from DUMP_ALL to DUMP_ORIG. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Heiko Carstens <[email protected]> Tested-by: Pascal Chapperon <[email protected]>
2012-07-02rcu: Add check for CPUs going offline with callbacks queuedPaul E. McKenney1-0/+3
If a CPU goes offline with callbacks queued, those callbacks might be indefinitely postponed, which can result in a system hang. This commit therefore inserts warnings for this condition. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2012-07-02rcu: Disable preemption in rcu_blocking_is_gp()Paul E. McKenney1-18/+6
It is time to optimize CONFIG_TREE_PREEMPT_RCU's synchronize_rcu() for the uniprocessor case, which means that rcu_blocking_is_gp() can no longer rely on RCU read-side critical sections having disabled preemption. This commit therefore disables preemption across rcu_blocking_is_gp()'s scan of the cpu_online_mask. (Updated from previous version to fix embarrassing bug spotted by Wu Fengguang.) Signed-off-by: Paul E. McKenney <[email protected]>
2012-07-02rcu: Prevent uninitialized string in RCU CPU stall infoCarsten Emde1-0/+1
An uninitialized string may be displayed at the end of the rcu_preempt detected stall info such as 0: (1 GPs behind) idle=075/140000000000000/0 =8?^D=8?^D ^^^^^^^^^^ if CONFIG_RCU_FAST_NO_HZ is not defined. This trivial patch clears the string in this case. Signed-off-by: Carsten Emde <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2012-07-02rcu: Fix rcu_is_cpu_idle() #ifdef in TINY_RCUPaul E. McKenney1-2/+2
The rcu_is_cpu_idle() function is used if CONFIG_DEBUG_LOCK_ALLOC, but TINY_RCU defines it only when CONFIG_PROVE_RCU. This causes build failures when CONFIG_DEBUG_LOCK_ALLOC=y but CONFIG_PROVE_RCU=n. This commit therefore adjusts the #ifdefs for rcu_is_cpu_idle() so that it is defined when CONFIG_DEBUG_LOCK_ALLOC=y. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2012-07-02rcu: Split RCU core processing out of __call_rcu()Paul E. McKenney1-41/+49
The __call_rcu() function is a bit overweight, so this commit splits it into actual enqueuing of and accounting for the callback (__call_rcu()) and associated RCU-core processing (__call_rcu_core()). Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
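The shape of the split described above can be sketched as follows: one function only links the callback into the queue and updates the count, and a second function holds the follow-up decision about whether the queue needs attention. All names here (cb_head, cb_queue, enqueue_cb_sketch(), enqueue_core_sketch(), qhimark) are hypothetical stand-ins for illustration, not the kernel's actual structures or the code in this commit.

```c
#include <stddef.h>

/* Hypothetical, simplified callback header; not the kernel's struct rcu_head. */
struct cb_head {
    struct cb_head *next;
    void (*func)(struct cb_head *head);
};

/* Hypothetical per-CPU callback queue. */
struct cb_queue {
    struct cb_head *head;
    struct cb_head **tail;  /* points at the last ->next, or at head when empty */
    long qlen;              /* number of queued callbacks */
    long qhimark;           /* assumed threshold for "too many callbacks" */
};

static void cb_queue_init_sketch(struct cb_queue *q, long qhimark)
{
    q->head = NULL;
    q->tail = &q->head;
    q->qlen = 0;
    q->qhimark = qhimark;
}

/*
 * Core-processing half: decide whether the queue has grown past the
 * threshold. Returns 1 if further processing would be warranted.
 */
static int enqueue_core_sketch(const struct cb_queue *q)
{
    return q->qlen > q->qhimark;
}

/*
 * Enqueue half: link the callback at the tail and account for it,
 * then hand off to the core-processing half.
 */
static int enqueue_cb_sketch(struct cb_queue *q, struct cb_head *cb,
                             void (*func)(struct cb_head *head))
{
    cb->func = func;
    cb->next = NULL;
    *q->tail = cb;
    q->tail = &cb->next;
    q->qlen++;
    return enqueue_core_sketch(q);
}
```

Keeping the threshold check in its own function means later changes to that policy, such as skipping it in some situations, touch only the core half and leave the enqueue path alone.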
2012-07-02rcu: Prevent __call_rcu() from invoking RCU core on offline CPUsPaul E. McKenney1-3/+3
The __call_rcu() function will invoke the RCU core, for example, if it detects that the current CPU has too many callbacks. However, this can happen on an offline CPU that is on its way to the idle loop, in which case it is an error to invoke the RCU core, and the excess callbacks will be adopted in any case. This commit therefore adds checks to __call_rcu() for running on an offline CPU, refraining from invoking the RCU core in this case. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-07-02rcu: Make __call_rcu() handle invocation from idlePaul E. McKenney1-6/+9
Although __call_rcu() is handled correctly when called from a momentary non-idle period, if it is called on a CPU that RCU believes to be idle on RCU_FAST_NO_HZ kernels, the callback might be indefinitely postponed. This commit therefore ensures that RCU is aware of the new callback and has a chance to force the CPU out of dyntick-idle mode when a new callback is posted. Reported-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-07-02rcu: Consolidate tree/tiny __rcu_read_{,un}lock() implementationsPaul E. McKenney3-92/+46
The CONFIG_TREE_PREEMPT_RCU and CONFIG_TINY_PREEMPT_RCU versions of __rcu_read_lock() and __rcu_read_unlock() are identical, so this commit consolidates them into kernel/rcupdate.c. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>