path: root/kernel
Age | Commit message | Author | Files | Lines
2013-02-05 | Merge tag 'full-dynticks-cputime-for-mingo' of ↵ | Ingo Molnar | 13 | -85/+385
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core

Pull full-dynticks (user-space execution is undisturbed and receives no timer IRQs) preparation changes that convert the cputime accounting code to be full-dynticks ready, from Frederic Weisbecker:

"This implements the cputime accounting on full dynticks CPUs. Typical cputime stats infrastructure relies on the timer tick and its periodic polling on the CPU to account the amount of time spent by the CPUs and the tasks per high level domains such as userspace, kernelspace, guest, ...

Now we are preparing to implement full dynticks capability on Linux for Real Time and HPC users who want full CPU isolation. This feature requires a cputime accounting that doesn't depend on the timer tick.

To implement it, this new cputime infrastructure plugs into kernel/user/guest boundaries to take snapshots of cputime and flush these to the stats when needed. This performs pretty much like CONFIG_VIRT_CPU_ACCOUNTING, except that context location and cputime snapshots are synchronized between the write and read sides such that the latter can safely retrieve the pending tickless cputime of a task and add it to its latest cputime snapshot to return the correct result to the user."

Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2013-02-05 | sched: Fix signedness bug in yield_to() | Dan Carpenter | 1 | -1/+1
In commit 7b270f6099 ("sched: Bail out of yield_to when source and target runqueue has one task") we changed this variable to store -ESRCH, so it needs to be signed; see the sketch below. Signed-off-by: Dan Carpenter <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: Steven Rostedt <[email protected]> Cc: Mike Galbraith <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
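A minimal userspace sketch of the pitfall being fixed (the variable names are illustrative, not the kernel's):

    #include <stdbool.h>
    #include <stdio.h>

    #define ESRCH 3

    int main(void)
    {
        bool yielded_bool = -ESRCH; /* collapses to true (1); the error code is lost */
        int  yielded_int  = -ESRCH; /* faithfully holds the negative error value */

        printf("bool: %d, int: %d\n", yielded_bool, yielded_int); /* bool: 1, int: -3 */
        return 0;
    }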
2013-02-05 | hrtimer: Prevent hrtimer_enqueue_reprogram race | Leonid Shatz | 1 | -18/+18
hrtimer_enqueue_reprogram contains a race which could result in a timer.base switch during the unlock/lock sequence. hrtimer_enqueue_reprogram releases the lock protecting the timer base in order to call raise_softirq_irqoff(), due to a lock ordering issue versus rq->lock. If during that time another CPU calls __hrtimer_start_range_ns() on the same hrtimer, the timer base might switch before the current CPU can take base->lock again, and therefore the unlock_timer_base() call will unlock the wrong lock. [ tglx: Added comment and massaged changelog ] Signed-off-by: Leonid Shatz <[email protected]> Signed-off-by: Izik Eidus <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2013-02-05 | Merge branch 'nohz/printk-v8' into irq/core | Frederic Weisbecker | 4 | -46/+110
Conflicts:
    kernel/irq_work.c

Add support for printk on full dynticks CPUs.

* Don't stop the tick with irq works pending. This fix is generally useful and concerns archs that can't raise self-IPIs.
* Flush irq works before CPU offlining.
* Introduce "lazy" irq works that can wait for the next tick to be executed, unless it's stopped.
* Implement the klogd wakeup using irq work. This removes the ad-hoc printk_tick()/printk_needs_cpu() hooks and makes it work even in dynticks mode.

Signed-off-by: Frederic Weisbecker <[email protected]>
2013-02-05 | Merge branch 'sched-urgent-for-linus' of ↵ | Linus Torvalds | 3 | -4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar: "Three small fixlets"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Fix format string for 32-bit platforms
  sched: Fix warning in kernel/sched/fair.c
  sched/rt: Use root_domain of rt_rq not current processor
2013-02-05 | Merge branch 'perf-urgent-for-linus' of ↵ | Linus Torvalds | 1 | -2/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar: "Three fixlets and two small (and low risk) hw-enablement changes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix event group context move
  x86/perf: Add IvyBridge EP support
  perf/x86: Fix P6 driver section warning
  arch/x86/tools/insn_sanity.c: Identify source of messages
  perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support
2013-02-05 | Merge branch 'core-urgent-for-linus' of ↵ | Linus Torvalds | 1 | -3/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull two small RCU fixlets from Ingo Molnar.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Make rcu_nocb_poll an early_param instead of module_param
  rcu: Prevent soft-lockup complaints about no-CBs CPUs
2013-02-04 | rcu: Allow rcutorture to be built at low optimization levels | Steven Rostedt | 1 | -7/+16
The uses of trace_clock_local() are dead code when CONFIG_RCU_TRACE=n, but some compilers might nevertheless generate code calling this function. This commit therefore ensures that trace_clock_local() is invoked only when CONFIG_RCU_TRACE=y. Reported-by: Tetsuo Handa <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2013-02-04 | sched: Fix select_idle_sibling() bouncing cow syndrome | Mike Galbraith | 1 | -14/+7
If the previous CPU is cache affine and idle, select it. The current implementation simply traverses the sd_llc domain, taking the first idle CPU encountered, which walks buddy pairs hand in hand over the package, inflicting excruciating pain. 1 tbench pair (worst case) in a 10 core + SMT package:

  pre   15.22 MB/sec  1 procs
  post 252.01 MB/sec  1 procs

Signed-off-by: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-02-04 | Merge branch 'rcu/next' of ↵ | Ingo Molnar | 10 | -117/+431
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Pull RCU updates from Paul E. McKenney:

1. Changes to rcutorture and to RCU documentation. Posted to LKML at https://lkml.org/lkml/2013/1/26/188.
2. Enhancements to uniprocessor handling in tiny RCU. Posted to LKML at https://lkml.org/lkml/2013/1/27/2.
3. Tag RCU callbacks with grace-period number to simplify callback advancement. Posted to LKML at https://lkml.org/lkml/2013/1/26/203.
4. Miscellaneous fixes. Posted to LKML at https://lkml.org/lkml/2013/1/26/204.

Signed-off-by: Ingo Molnar <[email protected]>
2013-02-04 | irq_work: Remove return value from the irq_work_queue() function | anish kumar | 1 | -21/+10
As no one is using the return value of irq_work_queue(), it is better to just make it void; the interface change is sketched below. Signed-off-by: anish kumar <[email protected]> Acked-by: Steven Rostedt <[email protected]> [ Fix stale comments, remove now unnecessary __irq_work_queue() intermediate function ] Signed-off-by: Frederic Weisbecker <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
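A before/after sketch of the interface change (the pre-change bool return type is an assumption inferred from the "no one is using the return value" rationale above):

    /* before: returned whether the work item was newly queued */
    bool irq_work_queue(struct irq_work *work);

    /* after: no caller inspected the result, so it becomes void */
    void irq_work_queue(struct irq_work *work);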
2013-02-04 | Merge branch 'fortglx/3.9/time' of git://git.linaro.org/people/jstultz/linux ↵ | Thomas Gleixner | 4 | -15/+53
into timers/core Trivial conflict in arch/x86/Kconfig Signed-off-by: Thomas Gleixner <[email protected]>
2013-02-03 | sched/rt: Further simplify pick_rt_task() | Kirill Tkhai | 1 | -2/+1
Function next_prio() has been removed, and pull_rt_task() is the only user of pick_next_highest_task_rt() at the moment. pull_rt_task() is not interested in p->nr_cpus_allowed; its only interest is whether cpu is allowed to execute p. If nr_cpus_allowed == 1, cpu != task_cpu(p), and cpu is allowed, then task p is in the middle of a migration: it waits until it is moved by the migration thread. So let's pull it earlier. Signed-off-by: Kirill V Tkhai <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: Peter Zijlstra <[email protected]> CC: linux-rt-users <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-02-03 | perf: Fix event group context move | Jiri Olsa | 1 | -2/+18
When we have a group with mixed events (hw/sw) we want to end up with the group leader being in the hw context. So if the group leader is initially a sw event, we move all the events under the hw context. The move is done for each event by removing it from its context and adding it back into the proper one. As part of the removal the event is automatically disabled, which is not what we want at this stage of creating groups. The fix is to initialize the event state after removal from the sw context. This fix resulted from the following discussion: http://thread.gmane.org/gmane.linux.kernel.perf.user/1144 Reported-by: Andreas Hollmann <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Corey Ashford <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-02-03 | Merge branch 'tip/perf/core' of ↵ | Ingo Molnar | 6 | -61/+200
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull tracing updates from Steve Rostedt. Signed-off-by: Ingo Molnar <[email protected]>
2013-02-01 | tracing: Init current_trace to nop_trace and remove NULL checks | Steven Rostedt (Red Hat) | 1 | -18/+12
On early boot up, when the ftrace ring buffer is initialized, the static variable current_trace is initialized to &nop_trace. Before this initialization, current_trace is NULL and will never become NULL again; it is always reassigned to a ftrace tracer. Several places check if current_trace is NULL before using it. These checks are frivolous, because at the point in time when the checks are made the only way current_trace could be NULL is if ftrace failed its allocations at boot up, and the paths to these locations would probably not be possible. By initializing current_trace to &nop_trace where it is declared, current_trace will never be NULL, and we can remove all these checks, which never needed to be made in the first place. Cc: Dan Carpenter <[email protected]> Cc: Hiraku Toyooka <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
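The shape of the change, as a sketch (the actual declaration lives in kernel/trace/trace.c):

    /* before: starts out NULL until early boot assigns &nop_trace */
    static struct tracer *current_trace;

    /* after: never NULL, so the NULL checks at the call sites can go */
    static struct tracer *current_trace = &nop_trace;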
2013-01-31 | clockevents: Add generic timer broadcast function | Mark Rutland | 2 | -0/+17
Currently, the timer broadcast mechanism is defined by a function pointer on struct clock_event_device. As the fundamental mechanism for broadcast is architecture-specific, this means that clock_event_device drivers cannot be shared across multiple architectures. This patch adds an (optional) architecture-specific function for timer tick broadcast, allowing drivers which may require broadcast functionality to be shared across multiple architectures. Signed-off-by: Mark Rutland <[email protected]> Reviewed-by: Santosh Shilimkar <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Tested-by: Santosh Shilimkar <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2013-01-31 | clockevents: Add generic timer broadcast receiver | Mark Rutland | 1 | -0/+17
Currently the broadcast mechanism used for timers is abstracted by a function pointer on struct clock_event_device. As the fundamental mechanism for broadcast is architecture-specific, this ties each clock_event_device driver to a single architecture, even where the driver is otherwise generic. This patch adds a standard path for the receipt of timer broadcasts, so drivers and/or architecture backends need not manage redundant lists of timers for the purpose of routing broadcast timer ticks. [tglx: Made the implementation depend on the config switch as well ] Signed-off-by: Mark Rutland <[email protected]> Reviewed-by: Santosh Shilimkar <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Tested-by: Santosh Shilimkar <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
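A hedged sketch of what such a common receive path could look like (the function and field names follow the kernel's tick-device conventions, but this is an assumption-laden illustration, not the commit's code):

    /* Called from the arch's broadcast-IPI handler: run the tick
     * device's event handler on the receiving CPU. */
    int tick_receive_broadcast(void)
    {
        struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
        struct clock_event_device *evt = td->evtdev;

        if (!evt)
            return -ENODEV;
        if (!evt->event_handler)
            return -EINVAL;

        evt->event_handler(evt);
        return 0;
    }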
2013-01-31 | sched/rt: Do not account zero delta_exec in update_curr_rt() | Kirill Tkhai | 1 | -2/+2
There are several places with consecutive calls of dequeue_task_rt() and put_prev_task_rt() in the scheduler; for example, rt_mutex_setprio() does this. Both calls lead to update_curr_rt(), and the second of them receives a zeroed delta_exec. The only effective action in this case is the call to sched_rt_avg_update(), which can change rq->age_stamp and rq->rt_avg. But that is only possible in the case of a "floating" rq->clock, and it is not reasonable to account it. The other actions do nothing. Signed-off-by: Kirill V Tkhai <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: Peter Zijlstra <[email protected]> CC: linux-rt-users <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
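The fix plausibly takes the following shape in update_curr_rt() (a sketch based on the description above, not the verbatim hunk):

    delta_exec = rq->clock_task - curr->se.exec_start;
    -    if (unlikely((s64)delta_exec < 0))
    -        delta_exec = 0;
    +    if (unlikely((s64)delta_exec <= 0))
    +        return;    /* zero delta: nothing to account, skip the update */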
2013-01-31 | Merge branch 'x86-urgent-for-linus' of ↵ | Linus Torvalds | 1 | -1/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin: "This is a collection of miscellaneous fixes, the most important one is the fix for the Samsung laptop bricking issue (auto-blacklisting the samsung-laptop driver); the efi_enabled() changes you see below are prerequisites for that fix. The other issues fixed are booting on OLPC XO-1.5, an UV fix, NMI debugging, and requiring CAP_SYS_RAWIO for MSR references, just as with I/O port references."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  samsung-laptop: Disable on EFI hardware
  efi: Make 'efi_enabled' a function to query EFI facilities
  smp: Fix SMP function call empty cpu mask race
  x86/msr: Add capabilities check
  x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
  x86/olpc: Fix olpc-xo1-sci.c build errors
  arch/x86/platform/uv: Fix incorrect tlb flush all issue
  x86-64: Fix unwind annotations in recent NMI changes
  x86-32: Start out cr0 clean, disable paging before modifying cr3/4
2013-01-31 | Revert "console: implement lockdep support for console_lock" | Dave Airlie | 1 | -9/+0
This reverts commit daee779718a319ff9f83e1ba3339334ac650bb22. I'll requeue this after the console locking fixes, so lockdep is useful again for people until fbcon is fixed. Signed-off-by: Dave Airlie <[email protected]>
2013-01-30 | tracing: Make a snapshot feature available from userspace | Hiraku Toyooka | 3 | -26/+151
Ftrace has a snapshot feature available from kernel space, and latency tracers (e.g. irqsoff) are using it. This patch enables user applications to take a snapshot via debugfs. It adds a "snapshot" debugfs file in the "tracing" directory.

snapshot: This is used to take a snapshot and to read the output of the snapshot.

  # echo 1 > snapshot

This will allocate the spare buffer for the snapshot (if it is not already allocated), and take a snapshot.

  # cat snapshot

This will show the contents of the snapshot.

  # echo 0 > snapshot

This will free the snapshot if it is allocated. Any other positive value will clear the snapshot contents if the snapshot is allocated, or return EINVAL if it is not allocated.

Link: http://lkml.kernel.org/r/20121226025300.3252.86850.stgit@liselsia Cc: Jiri Olsa <[email protected]> Cc: David Sharp <[email protected]> Signed-off-by: Hiraku Toyooka <[email protected]> [ Fixed irqsoff selftest and also a conflict with a change that fixes the update_max_tr. ] Signed-off-by: Steven Rostedt <[email protected]>
2013-01-30 | tracing: Replace static old_tracer check of tracer name | Hiraku Toyooka | 1 | -13/+9
Currently the trace buffer read functions use a static variable "old_tracer" for detecting whether the current tracer has changed. This was suitable for a single trace file ("trace"), but to add a snapshot feature that will use the same function for its file, a check against a static variable is not sufficient. To use the output functions for two different files, the current tracer is no longer cached in a static variable; instead, since the trace iterator descriptor contains a pointer to the current tracer's name, that pointer is used to check whether the tracer has changed between different reads of the trace file. Link: http://lkml.kernel.org/r/20121226025252.3252.9276.stgit@liselsia Signed-off-by: Hiraku Toyooka <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2013-01-30 | tracing: Use sched_clock_cpu for trace_clock_global | Namhyung Kim | 1 | -1/+1
For systems with an unstable sched_clock, all cpu_clock() does is enable/disable the local irq during the call to sched_clock_cpu(); on stable systems the two are the same. trace_clock_global() already disables interrupts, so it can call sched_clock_cpu() directly. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
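The change plausibly reduces to a one-line substitution in trace_clock_global(), roughly like this (a sketch of the described change; local variable names are assumptions):

    local_irq_save(flags);
    this_cpu = raw_smp_processor_id();
    -    now = cpu_clock(this_cpu);
    +    now = sched_clock_cpu(this_cpu);    /* irqs are already disabled here */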
2013-01-30 | ring-buffer: Add stats field for amount read from trace ring buffer | Steven Rostedt (Red Hat) | 2 | -0/+21
Add a stat about the number of events read from the ring buffer:

  # cat /debug/tracing/per_cpu/cpu0/stats
  entries: 39869
  overrun: 870512
  commit overrun: 0
  bytes: 1449912
  oldest event ts: 6561.368690
  now ts: 6565.246426
  dropped events: 0
  read events: 112   <-- Added

Signed-off-by: Steven Rostedt <[email protected]>
2013-01-29 | timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK | John Stultz | 1 | -1/+1
Jason pointed out the HAS_PERSISTENT_CLOCK name isn't quite accurate for the config, as some systems may have the persistent_clock in some cases, but not always. So change the config name to the more clear ALWAYS_USE_PERSISTENT_CLOCK. Signed-off-by: John Stultz <[email protected]>
2013-01-29 | tracing/fgraph: Adjust fgraph depth before calling trace return callback | Steven Rostedt (Red Hat) | 1 | -1/+7
While debugging the virtual cputime with the function graph tracer with a max_depth of 1 (the most common use of max_depth so far), I found that I was missing kernel execution because of a race condition. The code for the return side of the function has a slight race:

    ftrace_pop_return_trace(&trace, &ret, frame_pointer);
    trace.rettime = trace_clock_local();
    ftrace_graph_return(&trace);
    barrier();
    current->curr_ret_stack--;

The ftrace_pop_return_trace() initializes the trace structure for the callback. The ftrace_graph_return() uses the trace structure for its own use, as that structure is on the stack and is local to this function. Then curr_ret_stack is decremented, which is what trace.depth is set to. If an interrupt comes in after ftrace_graph_return() but before the curr_ret_stack decrement, then the called function will get a depth of 2. If max_depth is set to 1, this function will be ignored. The problem is that the trace has already been called, and the timestamp for that trace will not reflect the time the function was about to re-enter userspace. Calls to the interrupt will not be traced because the max_depth has prevented this. To solve this issue, ftrace_graph_return() can safely be moved after the current->curr_ret_stack update. This way the timestamp for the return callback will reflect the actual time. If an interrupt comes in between the curr_ret_stack update and ftrace_graph_return(), it will be traced. It may look a little confusing to see it within the other function, but at least it will not be lost. Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
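The reordered return path then looks roughly like this (a sketch derived from the description above, not the verbatim patch):

    ftrace_pop_return_trace(&trace, &ret, frame_pointer);
    trace.rettime = trace_clock_local();
    barrier();
    current->curr_ret_stack--;
    /* Run the callback only after the depth has been adjusted, so an
     * interrupt arriving here is traced rather than silently dropped. */
    ftrace_graph_return(&trace);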
2013-01-29 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 10 | -98/+262
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug fixes that some net-next work will build upon. Signed-off-by: David S. Miller <[email protected]>
2013-01-29 | tracing: Remove second iterator initializer | Jovi Zhang | 1 | -4/+1
The trace iterator is already initialized by trace_init_global_iter(), so there is no need to initialize it again. Link: http://lkml.kernel.org/r/CACV3sb+G1YnO6168JhY3dEadmJi58pA5-2cSZT8E0WVHJNFt9Q@mail.gmail.com Signed-off-by: Jovi Zhang <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2013-01-28 | Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', ↵ | Paul E. McKenney | 8 | -100/+370
'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD

doctorture.2013.01.11a: Changes to rcutorture and to RCU documentation.
fixes.2013.01.26a: Miscellaneous fixes.
tagcb.2013.01.24a: Tag RCU callbacks with grace-period number to simplify callback advancement.
tiny.2013.01.29b: Enhancements to uniprocessor handling in tiny RCU.
2013-01-28 | rcu: Make rcutorture's shuffler task shuffle recently added tasks | Paul E. McKenney | 1 | -4/+20
A number of kthreads have been added to rcutorture, but the shuffler task was not informed of them and thus did not shuffle them. This commit therefore adds the requisite shuffling and, while in the area, fixes up some whitespace issues. However, the shuffling is intended to keep randomly selected CPUs idle, which means that the RCU priority boosting kthreads need to avoid waking up every jiffy. This commit also makes that fix. Signed-off-by: Paul E. McKenney <[email protected]>
2013-01-28 | rcu: Provide RCU CPU stall warnings for tiny RCU | Paul E. McKenney | 6 | -50/+121
Tiny RCU has historically omitted RCU CPU stall warnings in order to reduce memory requirements; however, the lack of these warnings caused Thomas Gleixner some debugging pain recently. Therefore, this commit adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y. This keeps the memory footprint small, while still enabling CPU stall warnings in kernels built to enable them. Updated to include Josh Triplett's suggested use of the RCU_STALL_COMMON config variable to simplify #if expressions. Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2013-01-28 | smp: Fix SMP function call empty cpu mask race | Wang YanQing | 1 | -1/+12
I get the following warning every day with v3.7, once or twice a day:

  [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()

As explained by Linus as well:

 | Once we've done the "list_add_rcu()" to add it to the queue, we can
 | have (another) IPI to the target CPU that can now see it and clear
 | the mask.
 |
 | So by the time we get to actually send the IPI, the mask might have
 | been cleared by another IPI.

This patch also fixes a system hang problem, if the data->cpumask gets cleared after passing this point:

    if (WARN_ONCE(!mask, "empty IPI mask"))
        return;

then the problem in commit 83d349f35e1a ("x86: don't send an IPI to the empty set of CPU's") will happen again. Signed-off-by: Wang YanQing <[email protected]> Acked-by: Linus Torvalds <[email protected]> Acked-by: Jan Beulich <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: <[email protected]> Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight [ Tidied up the changelog and the comment in the code. ] Signed-off-by: Ingo Molnar <[email protected]>
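A hedged sketch of the kind of fix the changelog implies: take a private copy of the cpumask before the call-function entry is published, and send the IPIs from that copy. The cpumask_ipi field name and the surrounding structure are this sketch's assumptions, not quoted kernel code:

    /* Snapshot the mask before other CPUs can see the entry and start
     * clearing data->cpumask as they complete the function call. */
    cpumask_copy(data->cpumask_ipi, data->cpumask);

    raw_spin_lock_irqsave(&call_function.lock, flags);
    list_add_rcu(&data->csd.list, &call_function.queue);
    raw_spin_unlock_irqrestore(&call_function.lock, flags);

    /* Use the stable private copy for the IPI, not data->cpumask. */
    arch_send_call_function_ipi_mask(data->cpumask_ipi);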
2013-01-27 | cputime: Safely read cputime of full dynticks CPUs | Frederic Weisbecker | 5 | -16/+211
While remotely reading the cputime of a task running in a full dynticks CPU, the values stored in utime/stime fields of struct task_struct may be stale. Its values may be those of the last kernel <-> user transition time snapshot and we need to add the tickless time spent since this snapshot. To fix this, flush the cputime of the dynticks CPUs on kernel <-> user transition and record the time / context where we did this. Then on top of this snapshot and the current time, perform the fixup on the reader side from task_times() accessors. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Li Zhong <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> [fixed kvm module related build errors] Signed-off-by: Sedat Dilek <[email protected]>
2013-01-27 | kvm: Prepare to add generic guest entry/exit callbacks | Frederic Weisbecker | 1 | -10/+0
Do some ground preparatory work before adding guest_enter() and guest_exit() context tracking callbacks. Those will be later used to read the guest cputime safely when we run in full dynticks mode. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Li Zhong <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]>
2013-01-27 | cputime: Use accessors to read task cputime stats | Frederic Weisbecker | 8 | -35/+89
This is in preparation for the full dynticks feature. While remotely reading the cputime of a task running in a full dynticks CPU, we'll need to do some extra-computation. This way we can account the time it spent tickless in userspace since its last cputime snapshot. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Li Zhong <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]>
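The accessor pattern introduced here, in sketch form (task_cputime() is the interface this commit is about; the surrounding lines are illustrative):

    cputime_t utime, stime;

    /* before: direct field access, which can be stale for a task
     * running on a full-dynticks CPU */
    utime = tsk->utime;
    stime = tsk->stime;

    /* after: the accessor can fold in the pending tickless time */
    task_cputime(tsk, &utime, &stime);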
2013-01-27 | cputime: Allow dynamic switch between tick/virtual based cputime accounting | Frederic Weisbecker | 2 | -11/+35
Allow dynamically switching between tick and virtual based cputime accounting. This way we can provide a kind of "on-demand" virtual based cputime accounting. In this mode, the kernel relies on the context tracking subsystem to dynamically probe kernel boundaries. This is in preparation for being able to stop the timer tick in more places than just the idle state. Doing so will depend on CONFIG_VIRT_CPU_ACCOUNTING_GEN, which makes it possible to account the cputime without the tick by hooking into kernel/user boundaries. Depending on whether the tick is stopped or not, we can switch between tick and vtime based accounting at any time, in order to minimize the overhead associated with the user hooks. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Li Zhong <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]>
2013-01-27 | cputime: Generic on-demand virtual cputime accounting | Frederic Weisbecker | 2 | -6/+61
If we want to stop the tick beyond idle, we need to be able to account the cputime without using the tick. Virtual based cputime accounting solves that problem by hooking into kernel/user boundaries. However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires low level hooks and involves more overhead. But we already have a generic context tracking subsystem that is required for RCU's needs by archs which plan to shut down the tick outside idle. This patch implements a generic virtual based cputime accounting that relies on these generic kernel/user hooks. There are some upsides of doing this:

- This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING if context tracking is already built (already necessary for RCU in full tickless mode).
- We can rely on the generic context tracking subsystem to dynamically (de)activate the hooks, so that we can switch anytime between virtual and tick based accounting. This way we don't have the overhead of the virtual accounting when the tick is running periodically.

And one downside:

- There is probably more overhead than a native virtual based cputime accounting. But this relies on hooks that are already set anyway.

Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Li Zhong <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]>
2013-01-27 | cputime: Move default nsecs_to_cputime() to jiffies based cputime file | Frederic Weisbecker | 1 | -4/+0
If the architecture doesn't provide an implementation of nsecs_to_cputime(), the cputime accounting core uses a default one that converts the nanoseconds to jiffies. However this only makes sense if we use the jiffies based cputime. For now it doesn't matter much, because this API is only called from code that uses jiffies based cputime accounting. But the code may evolve and this API may be used more broadly in the future. Keeping this default implementation around is very error prone, as it may introduce a bug and hide it on architectures that don't override this API. Fix this by moving this definition to the jiffies based cputime headers, as that is the only place where it belongs. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Li Zhong <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]>
2013-01-27 | cputime: Avoid multiplication overflow on utime scaling | Frederic Weisbecker | 1 | -9/+9
We scale the stime and utime values based on rtime (sum_exec_runtime converted to jiffies). During scaling we multiply rtime * utime, which seems fine since both values are converted to u64, but it is not. Assume HZ is 1000, i.e. a 1ms tick. A process consists of 64 threads, runs for 1 day, and the threads utilize 100% cpu in user space. The machine has 64 cpus. The process rtime = utime will be 64 * 24 * 60 * 60 * 1000 jiffies, which is 0x149970000. The multiplication rtime * utime results in 0x1a855771100000000, which cannot be represented in 64 bits. The result of the overflow is a stall of the utime values visible in user space (prev_utime in the kernel), even though the application still consumes a lot of CPU time. A solution is to perform the multiplication on stime instead of utime. It's easy to grow the utime value fast with a CPU bound thread in userspace, for example; we now assume that doing so with stime is much harder. In most cases a task shouldn't ever spend much time in kernel space, as it tends to sleep waiting for job completion when jobs take long to achieve; IO is the typical example of that. Hence scaling the cputime by performing the multiplication on stime instead of utime should considerably reduce the chances of an overflow on most workloads. This is largely inspired by a patch from Stanislaw Gruszka: http://lkml.kernel.org/r/[email protected] Inspired-by: Stanislaw Gruszka <[email protected]> Reported-by: Stanislaw Gruszka <[email protected]> Acked-by: Stanislaw Gruszka <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
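The overflow from the changelog, reproduced as a minimal userspace demonstration (requires a compiler with unsigned __int128 support, e.g. GCC or Clang on a 64-bit target):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* 64 threads * 24h * 60m * 60s * 1000 jiffies/s (HZ=1000) */
        uint64_t rtime = 64ULL * 24 * 60 * 60 * 1000;  /* 0x149970000 */
        uint64_t utime = rtime;                        /* 100% user time */

        /* u64 multiplication silently wraps modulo 2^64 ... */
        printf("wrapped: %#llx\n", (unsigned long long)(rtime * utime));

        /* ... while the true product needs 65 bits: 0x1a855771100000000 */
        unsigned __int128 p = (unsigned __int128)rtime * utime;
        printf("true:    %#llx%016llx\n",
               (unsigned long long)(p >> 64), (unsigned long long)p);
        return 0;
    }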
2013-01-26 | context_tracking: Add comments on interface and internals | Frederic Weisbecker | 1 | -10/+65
This subsystem lacks many explanations on its purpose and design. Add these missing comments. v4: Document function parameter to be more kernel-doc friendly, as per Namhyung suggestion. Reported-by: Andrew Morton <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Alessio Igor Bogani <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Geoff Levand <[email protected]> Cc: Gilad Ben Yossef <[email protected]> Cc: Hakan Akkan <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Li Zhong <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2013-01-26 | rcu: Remove unused code originally used for context tracking | Li Zhong | 2 | -7/+0
As the context tracking subsystem evolved, it stopped using ignore_user_qs and in_user, which are defined in the rcu_dynticks structure. This commit therefore removes them. Signed-off-by: Li Zhong <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Acked-by: Frederic Weisbecker <[email protected]>
2013-01-26 | rcu: Correct 'optimized' to 'optimize' in header comment | Cody P Schafer | 1 | -1/+1
Small grammar fix in rcutree comment regarding 'rcu_scheduler_active' var. Signed-off-by: Cody P Schafer <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2013-01-26 | context_tracking: Export context state for generic vtime | Frederic Weisbecker | 1 | -15/+1
Export the context state: whether we run in user / kernel from the context tracking subsystem point of view. This is going to be used by the generic virtual cputime accounting subsystem that is needed to implement the full dynticks. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Li Zhong <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]>
2013-01-25 | tracing: Use __this_cpu_inc/dec operation instead of __get_cpu_var | Shan Wei | 1 | -2/+2
__this_cpu_inc_return() or __this_cpu_dec() generates a single instruction, which is faster than the __get_cpu_var() operation. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Christoph Lameter <[email protected]> Signed-off-by: Shan Wei <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
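A sketch of the pattern being replaced (the per-cpu variable name is hypothetical, not the one touched by this commit):

    DEFINE_PER_CPU(int, my_counter);

    /* before: computes the per-cpu address, then dereferences and
     * modifies it as separate steps */
    __get_cpu_var(my_counter)++;

    /* after: a single segment-relative increment instruction on x86 */
    __this_cpu_inc(my_counter);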
2013-01-26 | PM / tracing: remove deprecated power trace API | Paul Gortmaker | 2 | -18/+0
The text in Documentation said it would be removed in 2.6.41; the text in the Kconfig said removal in the 3.1 release. Either way you look at it, we are well past both, so push it off a cliff. Note that the POWER_CSTATE and the POWER_PSTATE are part of the legacy tracing API. Remove all tracepoints which use these flags. As can be seen from context, most already have a trace entry via trace_cpu_idle anyways. Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as compared to the CSTATE ones which all have a clear start/stop. As part of this, the trace_power_frequency also becomes orphaned, so it too is deleted. Signed-off-by: Paul Gortmaker <[email protected]> Acked-by: Steven Rostedt <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2013-01-26 | PM: don't use [delayed_]work_pending() | Tejun Heo | 2 | -7/+4
There's no need to test whether a (delayed) work item is pending before queueing, flushing or cancelling it, so remove work_pending() tests used in those cases. Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
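A sketch of the simplification (names are illustrative): queue_work() already returns false and does nothing if the item is pending, so the guard is redundant, and racy besides.

    /* before */
    if (!work_pending(&my_work))
        queue_work(my_wq, &my_work);

    /* after */
    queue_work(my_wq, &my_work);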
2013-01-25 | async: initialise list heads to fix crash | James Hogan | 1 | -0/+2
9fdb04cdc55 ("async: replace list of active domains with global list of pending items") added a struct list_head global_list in struct async_entry, which isn't initialised. This means that if !domain->registered at __async_schedule(), then list_del_init() will be called on the list head in async_run_entry_fn with both pointers NULL, causing a crash. This is fixed by initialising both the global_list and domain_list list_heads after kzalloc'ing the entry. This was noticed due to dapm_power_widgets(), which uses ASYNC_DOMAIN_EXCLUSIVE and therefore initialises domain->registered to 0. Signed-off-by: James Hogan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Reported-by: James Hogan <[email protected]> Reported-by: Stephen Warren <[email protected]>
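The fix in sketch form: kzalloc() zeroes the next/prev pointers, but an empty list_head must point at itself before list_del_init() may safely run on it.

    struct async_entry *entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
    if (!entry)
        return; /* error handling elided in this sketch */

    /* Make both list heads self-referencing so list_del_init() is
     * safe even if the entry is never added to a list. */
    INIT_LIST_HEAD(&entry->domain_list);
    INIT_LIST_HEAD(&entry->global_list);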
2013-01-25 | sched/debug: Fix format string for 32-bit platforms | Arnd Bergmann | 1 | -2/+2
The type returned by atomic64_read() can be either unsigned long or unsigned long long, depending on the architecture. Using a cast to unsigned long long lets us use the same format string on all architectures. Without this patch, building with scheduler debugging enabled results in:

  kernel/sched/debug.c: In function 'print_cfs_rq':
  kernel/sched/debug.c:225:2: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Wformat]
  kernel/sched/debug.c:225:2: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Wformat]

Signed-off-by: Arnd Bergmann <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Turner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
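The portable pattern, in sketch form (the variable is illustrative, not the one printed by kernel/sched/debug.c):

    atomic64_t counter;

    /* atomic64_read() may yield long or long long depending on the
     * arch; the explicit cast makes one format string fit all. */
    pr_info("counter: %llu\n",
            (unsigned long long)atomic64_read(&counter));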
2013-01-25 | sched: Fix warning in kernel/sched/fair.c | Arnd Bergmann | 1 | -1/+1
a4c96ae319 ("sched: Unthrottle rt runqueues in __disable_runtime()") turned the unthrottle_offline_cfs_rqs() function into a static symbol, which now triggers a warning about it being potentially unused:

  kernel/sched/fair.c:2055:13: warning: 'unthrottle_offline_cfs_rqs' defined but not used [-Wunused-function]

Marking it __maybe_unused shuts up the gcc warning and lets the compiler safely drop the function body when it's not being used. To reproduce, build the ARM bcm2835_defconfig. Signed-off-by: Arnd Bergmann <[email protected]> Cc: Peter Boonstoppel <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Turner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
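The annotation in sketch form (the function body is elided):

    /* __maybe_unused suppresses -Wunused-function; gcc still drops
     * the body in configs where the function has no callers. */
    static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
    {
        /* ... */
    }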