path: root/kernel
Age | Commit message | Author | Files | Lines
2014-02-21 | sched/deadline: Remove useless dl_nr_total | Kirill Tkhai | 2 | -4/+1
In the deadline class we do not have group scheduling as we do in RT, so dl_nr_total is always equal to dl_nr_running and one of them should be removed. Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Signed-off-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2014-02-21 | sched/deadline: Test for CPU's presence explicitly | Boris Ostrovsky | 1 | -3/+3
A hot-removed CPU may have an ID that is numerically larger than the number of existing CPUs in the system (e.g. we can unplug CPU 4 from a system that has CPUs 0, 1 and 4). Thus the WARN_ONs should check whether the CPU in question is currently present, not whether its ID value is less than num_present_cpus(). Cc: Ingo Molnar <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Steven Rostedt <[email protected]> Reported-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Boris Ostrovsky <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
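The shape of the fix, as a minimal sketch (illustrative, not the literal diff):

	/* before: only catches IDs beyond the count, misses holes in the map */
	WARN_ON(cpu >= num_present_cpus());

	/* after: asks the right question for a sparse CPU map */
	WARN_ON(!cpu_present(cpu));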
2014-02-21 | sched: Add 'flags' argument to sched_{set,get}attr() syscalls | Peter Zijlstra | 1 | -5/+6
Because of a recent syscall design debate, it is deemed appropriate for each syscall to have a flags argument for future extension, without immediately requiring new syscalls. Cc: [email protected] Cc: Ingo Molnar <[email protected]> Suggested-by: Michael Kerrisk <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
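The resulting syscall shape, sketched from the description; until an extension defines a flag, the value is simply required to be zero:

	SYSCALL_DEFINE3(sched_setattr, pid_t, pid,
			struct sched_attr __user *, uattr,
			unsigned int, flags)
	{
		if (flags)		/* no flags defined yet */
			return -EINVAL;
		/* ... unchanged body ... */
	}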
2014-02-21 | sched: Fix information leak in sys_sched_getattr() | Vegard Nossum | 1 | -1/+1
We're copying the on-stack structure to userspace, but forgot to give the right number of bytes to copy. This allows the calling process to obtain up to PAGE_SIZE bytes from the stack (and possibly adjacent kernel memory). This fix copies only as much as we actually have on the stack (attr->size defaults to the size of the struct) and leaves the rest of the userspace-provided buffer untouched. Found using kmemcheck + trinity. Fixes: d50dde5a10f30 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Cc: Dario Faggioli <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
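A sketch of the corrected copy-out; the helper and variable names here are assumptions close to, but not necessarily identical to, the real ones:

	static int sched_read_attr(struct sched_attr __user *uattr,
				   struct sched_attr *attr, unsigned int usize)
	{
		/* attr->size is how much of the on-stack struct is valid;
		 * copying usize bytes would leak adjacent stack memory */
		if (copy_to_user(uattr, attr, attr->size))
			return -EFAULT;
		return 0;
	}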
2014-02-21 | sched,numa: add cond_resched to task_numa_work | Rik van Riel | 1 | -0/+2
Normally task_numa_work scans over a fairly small amount of memory, but it is possible to run into a large unpopulated part of virtual memory, with no pages mapped. In that case, task_numa_work can run for a while, and it may make sense to reschedule as required. Cc: [email protected] Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Reported-by: Xing Gang <[email protected]> Tested-by: Chegu Vinod <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
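Conceptually the change drops a cond_resched() into the scan loop; a hedged sketch of the idea:

	/* somewhere in task_numa_work()'s walk over the address space */
	for (; vma; vma = vma->vm_next) {
		/* a huge unpopulated VMA maps no pages but still costs
		 * time to walk; let other tasks run in the meantime */
		cond_resched();
		/* ... change protections to trigger NUMA hinting faults ... */
	}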
2014-02-21 | sched/core: Make dl_b->lock IRQ safe | Juri Lelli | 1 | -4/+6
Fix this lockdep warning:

[ 44.804600] =========================================================
[ 44.805746] [ INFO: possible irq lock inversion dependency detected ]
[ 44.805746] 3.14.0-rc2-test+ #14 Not tainted
[ 44.805746] ---------------------------------------------------------
[ 44.805746] bash/3674 just changed the state of lock:
[ 44.805746]  (&dl_b->lock){+.....}, at: [<ffffffff8106ad15>] sched_rt_handler+0x132/0x248
[ 44.805746] but this lock was taken by another, HARDIRQ-safe lock in the past:
[ 44.805746]  (&rq->lock){-.-.-.}
and interrupts could create inverse lock ordering between them.
[ 44.805746]
[ 44.805746] other info that might help us debug this:
[ 44.805746]  Possible interrupt unsafe locking scenario:
[ 44.805746]
[ 44.805746]        CPU0                    CPU1
[ 44.805746]        ----                    ----
[ 44.805746]   lock(&dl_b->lock);
[ 44.805746]                                local_irq_disable();
[ 44.805746]                                lock(&rq->lock);
[ 44.805746]                                lock(&dl_b->lock);
[ 44.805746]   <Interrupt>
[ 44.805746]     lock(&rq->lock);

by making dl_b->lock acquisition always IRQ safe. Cc: Ingo Molnar <[email protected]> Signed-off-by: Juri Lelli <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
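The cure for this class of inversion is the irqsave lock variants; a minimal sketch:

	unsigned long flags;

	raw_spin_lock_irqsave(&dl_b->lock, flags);
	/* touch the deadline bandwidth accounting */
	raw_spin_unlock_irqrestore(&dl_b->lock, flags);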
2014-02-21 | sched/core: Fix sched_rt_global_validate | Juri Lelli | 1 | -1/+2
Don't compare sysctl_sched_rt_runtime against sysctl_sched_rt_period if the former is equal to RUNTIME_INF, otherwise disabling -rt bandwidth management (with CONFIG_RT_GROUP_SCHED=n) fails. Cc: Ingo Molnar <[email protected]> Signed-off-by: Juri Lelli <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
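A sketch of the corrected check, with names taken from the description:

	if (sysctl_sched_rt_runtime != RUNTIME_INF &&
	    sysctl_sched_rt_runtime > sysctl_sched_rt_period)
		return -EINVAL;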
2014-02-21 | sched/deadline: Fix overflow to handle period==0 and deadline!=0 | Steven Rostedt | 1 | -1/+1
While debugging the crash with the bad nr_running accounting, I hit another bug where, after running my sched deadline test, I was getting failures to take a CPU offline. It was giving me a -EBUSY error. Adding a bunch of trace_printk()s around, I found that the cpu notifier that called sched_cpu_inactive() was returning a failure. The overflow value was coming up negative? Talking this over with Juri, the problem is that the total_bw update was supposed to be made by dl_overflow() which, during my tests, seemed to not be called. Adding more trace_printk()s, it wasn't that it wasn't called, but it exited out right away with the check of new_bw being equal to p->dl.dl_bw. new_bw is calculated as the ratio between runtime and period. The bug is that if you set a deadline, you do not need to set a period if you plan on the period being equal to the deadline. That is, if period is zero and deadline is not, then the system call should set the period to be equal to the deadline. This is done elsewhere in the code. The fix is easy: check whether the period is set, and if it is not, use the deadline. Cc: Juri Lelli <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
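A sketch of the idea using the GCC ?: shorthand common in kernel code; the local variable is mine, the field names are assumed from the sched_attr ABI:

	/* period == 0 means "same as deadline" */
	u64 period = attr->sched_period ?: attr->sched_deadline;
	u64 new_bw = to_ratio(period, attr->sched_runtime);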
2014-02-21 | sched/deadline: Fix bad accounting of nr_running | Juri Lelli | 1 | -4/+2
Rostedt writes:

My test suite was locking up hard when enabling mmiotracer. This was due to the mmiotracer placing all but one CPU offline. I found this out when I was able to reproduce the bug with just my stress-cpu-hotplug test. This bug baffled me because it would not always trigger, and would only trigger on the first run after boot up. The stress-cpu-hotplug test would crash hard the first run, or never crash at all. But a new reboot may cause it to crash on the first run again.

I spent all week bisecting this, as I couldn't find a consistent reproducer. I finally narrowed it down to the sched deadline patches, and even more peculiar, to the commit that added the sched deadline boot up self test to the latency tracer. Then it dawned on me what the bug was.

All it took was to run a task under sched deadline to screw up the CPU hot plugging. This explained why it would lock up only on the first run of the stress-cpu-hotplug test. The bug happened when the boot up self test of the schedule latency tracer would test a deadline task. The deadline task would corrupt something that would cause CPU hotplug to fail. If it didn't corrupt it, the stress test would always work (there are no other sched deadline tasks that would run to cause problems). If it did corrupt on boot up, the first test would lock up hard.

I proved this theory by running my deadline test program on another box, and then running the stress-cpu-hotplug test, and it would now consistently lock up. I could run stress-cpu-hotplug over and over with no problem, but once I ran the deadline test, the next run of the stress-cpu-hotplug would lock hard.

After adding lots of tracing to the code, I found the cause. The function tracer showed that migrate_tasks() was stuck in an infinite loop, where rq->nr_running never equaled 1 to break out of it. When I added a trace_printk() to see what that number was, it was 335 and never decrementing!

Looking at the deadline code I found:

	static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
	{
		dequeue_dl_entity(&p->dl);
		dequeue_pushable_dl_task(rq, p);
	}

	static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
	{
		update_curr_dl(rq);
		__dequeue_task_dl(rq, p, flags);
		dec_nr_running(rq);
	}

And this:

	if (dl_runtime_exceeded(rq, dl_se)) {
		__dequeue_task_dl(rq, curr, 0);
		if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
			dl_se->dl_throttled = 1;
		else
			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);

		if (!is_leftmost(curr, &rq->dl))
			resched_task(curr);
	}

Notice how we call __dequeue_task_dl() and in the else case we call enqueue_task_dl()? Also notice that dequeue_task_dl() has underscores where enqueue_task_dl() does not. The enqueue_task_dl() calls inc_nr_running(rq), but __dequeue_task_dl() does not. This is where we get nr_running out of sync.

[snip]

Another point where nr_running can get out of sync is when the dl_timer fires:

	dl_se->dl_throttled = 0;
	if (p->on_rq) {
		enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
		if (task_has_dl_policy(rq->curr))
			check_preempt_curr_dl(rq, p, 0);
		else
			resched_task(rq->curr);

This patch does two things:

- correctly accounts for throttled tasks (that are now considered !running);

- fixes the bug, updating nr_running from {inc,dec}_dl_tasks(), since we risk updating it twice in some situations (e.g., a task is dequeued while it has exceeded its budget).
Cc: [email protected] Cc: [email protected] Cc: [email protected] Reported-by: Steven Rostedt <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Tested-by: Steven Rostedt <[email protected]> Signed-off-by: Juri Lelli <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
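A sketch of the accounting move described above, so nr_running is updated in exactly one place per direction; the helper signatures are assumed from the quoted code:

	static inline void inc_dl_tasks(struct sched_dl_entity *dl_se,
					struct dl_rq *dl_rq)
	{
		dl_rq->dl_nr_running++;
		inc_nr_running(rq_of_dl_rq(dl_rq));	/* single source of truth */
	}

	static inline void dec_dl_tasks(struct sched_dl_entity *dl_se,
					struct dl_rq *dl_rq)
	{
		dl_rq->dl_nr_running--;
		dec_nr_running(rq_of_dl_rq(dl_rq));
	}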
2014-02-21 | sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm | Martin Schwidefsky | 1 | -1/+3
finish_arch_post_lock_switch() is called at the end of the task switch, after all locks have been released. Conceptually it is paired with the switch_mm function, but the current code only makes the call in finish_task_switch. Add the call to idle_task_exit and use_mm. One use case for the additional calls is s390, which will use finish_arch_post_lock_switch to wait for the completion of TLB flush operations. Signed-off-by: Martin Schwidefsky <[email protected]>
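A rough sketch of one of the added call sites, abbreviated per the description:

	void idle_task_exit(void)
	{
		struct mm_struct *mm = current->active_mm;

		BUG_ON(cpu_online(smp_processor_id()));

		if (mm != &init_mm) {
			switch_mm(mm, &init_mm, current);
			finish_arch_post_lock_switch();	/* pair with switch_mm */
		}
		mmdrop(mm);
	}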
2014-02-20 | Merge branch 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup | Linus Torvalds | 2 | -28/+33
Pull cgroup fixes from Tejun Heo:
 "Quite a few fixes this time. Three locking fixes, all marked for -stable. A couple error path fixes and some misc fixes. Hugh found a bug in memcg offlining sequence and we thought we could fix that from cgroup core side but that turned out to be insufficient and got reverted. A different fix has been applied to -mm"

* 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: update cgroup_enable_task_cg_lists() to grab siglock
  Revert "cgroup: use an ordered workqueue for cgroup destruction"
  cgroup: protect modifications to cgroup_idr with cgroup_mutex
  cgroup: fix locking in cgroup_cfts_commit()
  cgroup: fix error return from cgroup_create()
  cgroup: fix error return value in cgroup_mount()
  cgroup: use an ordered workqueue for cgroup destruction
  nfs: include xattr.h from fs/nfs/nfs3proc.c
  cpuset: update MAINTAINERS entry
  arm, pm, vmpressure: add missing slab.h includes
2014-02-20 | Merge branch 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq | Linus Torvalds | 1 | -0/+7
Pull workqueue fixes from Tejun Heo:
 "Two workqueue fixes. One for an unlikely but possible critical bug during kworker shutdown and the other to make lockdep names a bit more descriptive"

* 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: ensure @task is valid across kthread_stop()
  workqueue: add args to workqueue lockdep name
2014-02-20 | user_namespace.c: Remove duplicated word in comment | Brian Campbell | 1 | -1/+1
Signed-off-by: Brian Campbell <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-02-20 | ftrace: Have static function trace clear ENABLED flag on unregister | Steven Rostedt (Red Hat) | 1 | -1/+7
The ENABLED flag needs to be cleared when an ftrace_ops is unregistered, otherwise it won't be able to be registered again. This only applies to static tracing and does not affect DYNAMIC_FTRACE at all. Signed-off-by: Steven Rostedt <[email protected]>
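A hedged sketch of the unregister-side fix:

	/* static (non-dynamic) ftrace: allow this ops to be registered again */
	ops->flags &= ~FTRACE_OPS_FL_ENABLED;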
2014-02-20 | tracing: Add trace_clock=<clock> kernel parameter | Steven Rostedt | 1 | -16/+45
Being able to change the trace clock at boot can be advantageous if you need a better source of when things happen across CPUs. The default trace clock is the fastest, but it uses local clocks which may not be synced across CPUs and it does not let you know when events took place with respect to events on other CPUs. The global trace clock can help in this case, and if you do not care about timings, the counter "clock" is the best, as that is just a simple atomic counter that is incremented for every event. Usage is to add "trace_clock=counter" on the kernel command line. You can replace counter with "global" or any of the clocks listed in /sys/kernel/debug/tracing/trace_clock Suggested-by: Thomas Gleixner <[email protected]> Tested-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Appreciated-by: Thomas Gleixner <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
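A sketch of how such a boot parameter is typically wired up; the function and buffer names here are assumptions, not necessarily the patch's own:

	static char trace_boot_clock_buf[64] __initdata;

	static int __init set_trace_boot_clock(char *str)
	{
		strlcpy(trace_boot_clock_buf, str, sizeof(trace_boot_clock_buf));
		return 1;	/* parameter handled */
	}
	__setup("trace_clock=", set_trace_boot_clock);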
2014-02-20 | tracing/uprobes: Support mix of ftrace and perf | Namhyung Kim | 1 | -8/+1
It seems there's no reason to prevent mixed use of ftrace and perf for a single uprobe event. At least kprobes already support it. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: zhangwei(Jovi) <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | tracing/uprobes: Support event triggering | Namhyung Kim | 1 | -2/+4
Add support for event triggering to uprobes. This is the same as the kprobes support added by Tom (plus the cleanup by Steven). Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: zhangwei(Jovi) <[email protected]> Cc: Tom Zanussi <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | tracing/uprobes: Support ftrace_event_file based multibuffer | zhangwei(Jovi) | 3 | -38/+101
Support multi-buffer on uprobe-based dynamic events by using ftrace_event_file. This patch is based on the initial multibuffer support work for kprobe-based dynamic events committed by Masami (commit 41a7dd420c), but revised as below: Oleg changed the kprobe-based multibuffer design from array-pointers of ftrace_event_file to a simple list, so this patch also switches to the list design. rcu_read_lock/unlock are added to uprobe_trace_func/uretprobe_trace_func to synchronize with ftrace_event_file list addition and deletion. Even though we allow multiple uprobe instances now, TP_FLAG_PROFILE/TP_FLAG_TRACE are still mutually exclusive in probe_event_enable, which means one user cannot use the uprobe tracer while another uses perf-probe on the same uprobe concurrently. (Perhaps this will be fixed in the future; kprobes do not have this limitation now.) Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Srikar Dronamraju <[email protected]> Signed-off-by: zhangwei(Jovi) <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | tracing/uprobes: Move argument fetching to uprobe_dispatcher() | Namhyung Kim | 1 | -37/+56
A single uprobe event might serve different users, like ftrace and perf, and this is especially important for the upcoming multi-buffer support. But in this case it'll fetch the (same) data from userspace multiple times. So move the fetching to the beginning of the dispatcher function and reuse the result for each user. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: zhangwei(Jovi) <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | tracing/uprobes: Rename uprobe_{trace,perf}_print() functions | Namhyung Kim | 1 | -6/+6
The uprobe_{trace,perf}_print functions are misnomers since what they do is not printing. There's also a real print function named print_uprobe_event(), so they'll only increase confusion IMHO. Rename them with double underscores to follow the kprobes convention. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Srikar Dronamraju <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | ftrace: Allow function tracing instances to filter functions | Steven Rostedt (Red Hat) | 4 | -14/+94
Create "set_ftrace_filter" and "set_ftrace_notrace" files in the instance directories to let users choose which functions to trace for the given instance. Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | ftrace: Pass in global_ops for use with filtering files | Steven Rostedt (Red Hat) | 1 | -4/+8
In preparation for having the function tracing instances be able to filter on functions, the generic filter functions must first be converted to take in the global_ops as a parameter. Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | ftrace: Allow instances to use function tracing | Steven Rostedt (Red Hat) | 2 | -40/+81
Allow instances (sub-buffers) to enable function tracing. Each instance will have its own function tracing capability. For now, instances will not have function stack tracing, nor will they be able to pick and choose which functions they trace. Picking and choosing their own functions will come later. Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | tracing: Convert tracer->enabled to counter | Steven Rostedt (Red Hat) | 2 | -4/+4
As tracers will soon be used by instances, the tracer enabled field needs to be converted to a counter instead of a boolean. This counter is protected by the trace_types_lock mutex. Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | tracing: Disable tracers before deletion of instance | Steven Rostedt (Red Hat) | 1 | -0/+18
When an instance is about to be deleted, make sure its tracer is set to nop. If it isn't, reset the tracer and set it to the nop tracer; otherwise memory leaks and bad pointers may result. Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | ftrace: Copy ops private to global_ops private | Steven Rostedt (Red Hat) | 1 | -9/+8
If the global_ops function is being called directly, instead of the global_ops list function, set the global_ops private to be the same as the private of the ops that is being called directly. Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | tracing: Only let top level have option files | Steven Rostedt (Red Hat) | 1 | -3/+5
Currently, only the top level instance can have tracing options. Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | tracing: Set up infrastructure to allow tracers for instances | Steven Rostedt (Red Hat) | 3 | -16/+60
Currently the tracers (function, function_graph, irqsoff, etc) can only be used by the top level tracing directory (not by instances). This sets up the infrastructure to allow instances to run a separate tracer apart from what the top level tracing is doing. As tracers need to adapt to being used by instances, each tracer must flag whether it can be used by instances or not. Currently only the 'nop' tracer can be used by all instances. Signed-off-by: Steven Rostedt <[email protected]>
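As an illustration of what the opt-in might look like, a sketch of a tracer flagging itself as instance-safe; the flag field name is an assumption, the nop tracer callbacks are real:

	static struct tracer nop_trace __read_mostly = {
		.name		= "nop",
		.init		= nop_trace_init,
		.reset		= nop_trace_reset,
		/* assumed flag: marks the tracer as safe for instances */
		.allow_instances = true,
	};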
2014-02-20 | tracing: Pass trace_array to flag_changed callback | Steven Rostedt (Red Hat) | 4 | -4/+8
As options (flags) may affect instances instead of being global the flag_changed() callbacks need to receive the trace_array descriptor of the instance they will be modifying. Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | tracing: Pass trace_array to set_flag callback | Steven Rostedt (Red Hat) | 8 | -17/+27
As options (flags) may affect instances instead of being global the set_flag() callbacks need to receive the trace_array descriptor of the instance they will be modifying. Signed-off-by: Steven Rostedt <[email protected]>
2014-02-20 | Merge branch 'master' into for-next | Jiri Kosina | 98 | -2990/+8298
2014-02-19 | genirq: Fix a comment typo | Chuansheng Liu | 1 | -1/+1
Change the comment "chasnge" to "change". Signed-off-by: Chuansheng Liu <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2014-02-19 | genirq: Provide irq_wake_thread() | Thomas Gleixner | 3 | -2/+30
In the course of the sdhci/sdio discussion with Russell about killing the sdio kthread hackery, we discovered the need to be able to wake an interrupt thread from software. The rationale for this is that sdio hardware can lack proper interrupt support for certain features. So the driver needs to poll the status registers, but at the same time it needs to be woken up by a hardware interrupt. To get rid of the home-brewed kthread construct of sdio, we need a way to wake an irq thread independent of an actual hardware interrupt. Provide an irq_wake_thread() function which wakes up the thread associated with a given dev_id. This allows sdio to invoke the irq thread from the hardware irq handler via the IRQ_WAKE_THREAD return value and provides a possibility to wake it via a timer for the polling scenarios. That allows the sdio logic to be simplified significantly. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Russell King <[email protected]> Cc: Chris Ball <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
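A usage sketch combining both wakeup paths; struct my_dev and the handlers are hypothetical driver code, not part of the patch:

	struct my_dev {
		int irq;
		/* ... driver state ... */
	};

	static irqreturn_t my_hardirq(int irq, void *dev_id)
	{
		/* ack the hardware, then defer the real work */
		return IRQ_WAKE_THREAD;
	}

	static void my_poll_timer(unsigned long data)
	{
		struct my_dev *dev = (struct my_dev *)data;

		/* no hardware interrupt exists for this event:
		 * kick the very same irq thread from the timer */
		irq_wake_thread(dev->irq, dev);
	}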
2014-02-19 | genirq: Provide synchronize_hardirq() | Thomas Gleixner | 1 | -20/+50
synchronize_irq() waits for hard irq and threaded handlers to complete before returning. For some special cases we only need to make sure that the hard interrupt part of the irq line is not in progress when we have disabled the (possibly shared) interrupt at the device level. A proper use case for this was provided by Russell. The sdhci driver requires some irq-triggered functions to be run in thread context. The current implementation of the thread context is an sdio-private kthread construct, which has quite some shortcomings. These can be avoided when the thread is directly associated with the device interrupt via the generic threaded irq infrastructure. There is, however, a corner case related to runtime power management, where one side disables the device interrupts at the device level and needs to make sure that an already running hard interrupt handler has completed before proceeding further. That hard interrupt handler might wake the associated thread, which in turn can request the runtime PM to reenable the device. Using synchronize_irq() leads to an immediate deadlock: the irq thread waits for the PM lock while synchronize_irq() waits for the irq thread to complete. Since it is sufficient for this case to ensure that no hard irq handler is executing, a new function which avoids the check for the thread is required. Add a function which just monitors the hard irq parts and ignores the threaded handlers. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Russell King <[email protected]> Cc: Chris Ball <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
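A hedged sketch of the runtime PM pattern this enables; the device helper is hypothetical:

	/* interrupts are already masked at the device level; wait only
	 * for a hard handler that may still be running. synchronize_irq()
	 * would also wait for our own irq thread and deadlock here. */
	my_dev_mask_irqs(dev);
	synchronize_hardirq(dev->irq);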
2014-02-19 | sched_clock: Prevent callers from seeing half-updated data | Stephen Boyd | 1 | -17/+29
The generic sched_clock registration function was previously done locklessly, due to the fact that it was expected to be called only once. However, now there are systems that may register multiple sched_clock sources, for which the lack of locking has caused problems: if two sched_clock sources are registered we may end up in a situation where a call to sched_clock() accesses the epoch cycle count for the old counter and the cycle count for the new counter. This can lead to confusing results where sched_clock() values jump and are then reset to 0 (due to the way the registration function forces epoch_ns to be 0). Fix this by reorganizing the registration function to hold the seqlock for as short a time as possible while we update the clock_data structure for a new counter. We also put any accumulated time into epoch_ns instead of resetting the time to 0, so that the clock doesn't reset after each successful registration. [jstultz: Added extra context to the commit message] Reported-by: Will Deacon <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Will Deacon <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Josh Cartwright <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Stultz <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
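The shape of the fix as a sketch with assumed names (cd, read, the epoch fields): do the expensive work outside the lock and publish the new state in one short write-side critical section:

	/* inside a sched_clock registration function */
	u64 new_epoch_cyc = read();		/* sample the new counter */
	u64 new_epoch_ns = sched_clock();	/* carry accumulated time, no reset to 0 */

	write_seqcount_begin(&cd.seq);		/* hold for as short as possible */
	cd.read_sched_clock = read;
	cd.epoch_cyc = new_epoch_cyc;
	cd.epoch_ns = new_epoch_ns;
	write_seqcount_end(&cd.seq);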
2014-02-19 | user_namespace.c: Remove duplicated word in comment | Brian Campbell | 1 | -1/+1
Signed-off-by: Brian Campbell <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2014-02-19 | treewide: Fix typo in Documentation/DocBook | Masanari Iida | 2 | -2/+2
This patch fixes spelling typos in Documentation/DocBook. Because the .html and .xml files are generated by make htmldocs, the typos have to be fixed within the source files. Signed-off-by: Masanari Iida <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2014-02-18 | cgroup: update cgroup_enable_task_cg_lists() to grab siglock | Tejun Heo | 1 | -0/+5
Currently, there's nothing preventing cgroup_enable_task_cg_lists() from missing a set PF_EXITING and racing against cgroup_exit(). Depending on the timing, cgroup_exit() may finish with the task still linked on a css_set, leading to list corruption. Fix it by grabbing siglock in cgroup_enable_task_cg_lists() so that PF_EXITING is guaranteed to be visible. This whole on-demand cg_list optimization is extremely fragile and has ample possibility to lead to bugs which can cause things like a once-a-year oops during boot. I'm wondering whether the better approach would be just adding "cgroup_disable=all" handling which disables the whole cgroup, rather than tempting fate with this on-demand craziness. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]> Cc: [email protected]
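A sketch of the guarded check, abbreviated; the surrounding thread walk and css_set locking are omitted:

	/* inside the walk over all threads */
	spin_lock_irq(&p->sighand->siglock);
	if (!(p->flags & PF_EXITING))		/* now guaranteed visible */
		list_add(&p->cg_list, &task_css_set(p)->tasks);
	spin_unlock_irq(&p->sighand->siglock);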
2014-02-18 | cgroup: add a validation check to cgroup_add_cftypes() | Li Zefan | 1 | -0/+3
Fengguang reported this bug:

BUG: unable to handle kernel NULL pointer dereference at 0000003c
IP: [<cc90b4ad>] cgroup_cfts_commit+0x27/0x1c1
...
Call Trace:
 [<cc9d1129>] ? kmem_cache_alloc_trace+0x33f/0x3b7
 [<cc90c6fc>] cgroup_add_cftypes+0x8f/0xca
 [<cd78b646>] cgroup_init+0x6a/0x26a
 [<cd764d7d>] start_kernel+0x4d7/0x57a
 [<cd7642ef>] i386_start_kernel+0x92/0x96

This happens in a corner case. If CGROUP_SCHED=y but CFS_BANDWIDTH=n && FAIR_GROUP_SCHED=n && RT_GROUP_SCHED=n, we have:

	cpu_files[] = {
		{ }	/* terminate */
	}

When we pass cpu_files to cgroup_apply_cftypes(), as cpu_files[0].ss is NULL, we'll access a NULL pointer. The bug was introduced by commit de00ffa56ea3132c6013fc8f07133b8a1014cf53 ("cgroup: make cgroup_subsys->base_cftypes use cgroup_add_cftypes()"). Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2014-02-18 | workqueue: ensure @task is valid across kthread_stop() | Lai Jiangshan | 1 | -0/+7
When a kworker should die, the kworker is notified through the WORKER_DIE flag instead of kthread_should_stop(). This, IIRC, is primarily to keep the test synchronized inside the worker_pool lock. WORKER_DIE is first set while holding pool->lock, then the lock is dropped and kthread_stop() is called. Unfortunately, this means that there's a slight chance that the target kworker may see WORKER_DIE before kthread_stop() finishes, exit, and be freed before or during kthread_stop(). Fix it by pinning the target task before setting WORKER_DIE and putting it after kthread_stop() is done. tj: Improved patch description and comment. Moved pinning above WORKER_DIE to better signify what it's protecting. CC: [email protected] Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
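A sketch of the ordering described; worker and pool are the usual workqueue structures, abbreviated here:

	struct task_struct *task = worker->task;

	get_task_struct(task);		/* pin: must outlive the worker's exit */
	spin_lock_irq(&pool->lock);
	worker->flags |= WORKER_DIE;
	spin_unlock_irq(&pool->lock);

	kthread_stop(task);		/* task cannot be freed under us */
	put_task_struct(task);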
2014-02-18 | inotify: Fix reporting of cookies for inotify events | Jan Kara | 2 | -2/+2
My rework of handling of notification events (namely commit 7053aee26a35 "fsnotify: do not share events between notification groups") broke sending of cookies with inotify events. We didn't propagate the value passed to fsnotify() properly and passed 4 uninitialized bytes to userspace instead (so it is also an information leak). Sadly I didn't notice this during my testing because inotify cookies aren't used very much and LTP inotify tests ignore them. Fix the problem by passing the cookie value properly. Fixes: 7053aee26a3548ebaba046ae2e52396ccf56ac6c Reported-by: Vegard Nossum <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2014-02-17 | rcu: Optimize RCU_FAST_NO_HZ for RCU_NOCB_CPU_ALL | Paul E. McKenney | 1 | -2/+5
If CONFIG_RCU_NOCB_CPU_ALL=y, then no CPU will ever have RCU callbacks because these callbacks will instead be handled by the rcuo kthreads. However, the current version of RCU_FAST_NO_HZ nevertheless checks for RCU callbacks. This commit therefore creates static inline implementations of rcu_prepare_for_idle() and rcu_cleanup_after_idle() that are no-ops when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2014-02-17 | rcu: Optimize rcu_needs_cpu() for RCU_NOCB_CPU_ALL | Paul E. McKenney | 2 | -1/+5
If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_needs_cpu() will always return false, however, the current version nevertheless checks for RCU callbacks. This commit therefore creates a static inline implementation of rcu_needs_cpu() that unconditionally returns false when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2014-02-17 | rcu: Optimize rcu_is_nocb_cpu() for RCU_NOCB_CPU_ALL | Paul E. McKenney | 1 | -0/+2
If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_is_nocb_cpu() will always return true, however, the current version nevertheless checks rcu_nocb_mask. This commit therefore creates a static inline implementation of rcu_is_nocb_cpu() that unconditionally returns true when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
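The whole optimization reduces to a compile-time constant; a sketch:

	#ifdef CONFIG_RCU_NOCB_CPU_ALL
	static inline bool rcu_is_nocb_cpu(int cpu)
	{
		return true;	/* every CPU's callbacks go to rcuo kthreads */
	}
	#else
	bool rcu_is_nocb_cpu(int cpu);	/* runtime check against rcu_nocb_mask */
	#endif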
2014-02-17 | rcu: Move SRCU grace period work to power efficient workqueue | Shaibal Dutta | 1 | -2/+3
For better use of CPU idle time, allow the scheduler to select the CPU on which the SRCU grace period work would be scheduled. This improves idle residency time and conserves power. This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected. Cc: Lai Jiangshan <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dipankar Sarma <[email protected]> Signed-off-by: Shaibal Dutta <[email protected]> [[email protected]: Rebased to latest kernel version. Added commit message. Fixed code alignment.] Signed-off-by: Zoran Markovic <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
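A sketch of this style of change; the SRCU work item name and delay are assumptions, the substantive part is the target workqueue:

	/* was (assumed): schedule_delayed_work(&sp->work, 0);
	 * which queues on system_wq, bound to the local CPU.
	 * Instead let an already-awake CPU pick the work up: */
	queue_delayed_work(system_power_efficient_wq, &sp->work, 0);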
2014-02-17 | rcu: Disambiguate CONFIG_RCU_NOCB_CPUs | Paul Bolle | 1 | -1/+1
This commit fixes a grammar issue in the rcu_nohz_full_cpu() comment header, so that it is clear that the plural is CPUs not Kconfig options. Signed-off-by: Paul Bolle <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2014-02-17 | rcu: Remove ACCESS_ONCE() from jiffies | Paul E. McKenney | 2 | -5/+5
Because jiffies is one of a very few variables marked "volatile", there is no need to use ACCESS_ONCE() when accessing it. This commit therefore removes the redundant ACCESS_ONCE() wrappers. Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2014-02-17 | rcu: Stop tracking FSF's postal address | Paul E. McKenney | 10 | -20/+20
All of the RCU source files have the usual GPL header, which contains a long-obsolete postal address for FSF. To avoid the need to track the FSF office's movements, this commit substitutes the URL where GPL may be found. Reported-by: Greg KH <[email protected]> Reported-by: Steven Rostedt <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2014-02-17 | rcu: Add ACCESS_ONCE() to ->n_force_qs_lh accesses | Paul E. McKenney | 2 | -3/+3
The ->n_force_qs_lh field is accessed without the benefit of any synchronization, so this commit adds the needed ACCESS_ONCE() wrappers. Yes, increments to ->n_force_qs_lh can be lost, but contention should be low and the field is strictly statistical in nature, so this is not a problem. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
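A sketch of the annotated accesses; the field name is from the description, the read side is illustrative:

	/* writer side: a lost increment is acceptable for a statistic */
	ACCESS_ONCE(rsp->n_force_qs_lh)++;

	/* reader side (e.g. a stats dump): a sane snapshot without locking */
	unsigned long snap = ACCESS_ONCE(rsp->n_force_qs_lh);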
2014-02-17 | printk: fix syslog() overflowing user buffer | Linus Torvalds | 1 | -2/+0
This is not a buffer overflow in the traditional sense: we don't overflow any *kernel* buffers, but we do mis-count the amount of data we copy back to user space for the SYSLOG_ACTION_READ_ALL case. In particular, if the user buffer is too small to hold everything, and *if* there is a continuation line at just the right place, we can end up giving the user more data than he asked for. The reason is that we first count up the number of bytes all the log records contains, then we walk the records again until we've skipped the records at the beginning that won't fit, and then we walk the rest of the records and copy them to the user space buffer. And in between that "skip the initial records that won't fit" and the "copy the records that *will* fit to user space", we reset the 'prev' variable that contained the record information for the last record not copied. That meant that when we started copying to user space, we now had a different character count than what we had originally calculated in the first record walk-through. The fix is to simply not clear the 'prev' flags value (in both cases where we had the same logic: syslog_print_all and kmsg_dump_get_buffer: the latter is used for pstore-like dumping) Reported-and-tested-by: Debabrata Banerjee <[email protected]> Acked-by: Kay Sievers <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jeff Mahoney <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>