path: root/kernel/sched.c
2010-02-16  sched: Fix SMT scheduler regression in find_busiest_queue()  (Suresh Siddha)  [1 file, -2/+13]
Fix an SMT scheduler performance regression that is leading to a scenario where SMT threads in one core are completely idle while both the SMT threads in another core (on the same socket) are busy.

This is caused by this commit (with the problematic code highlighted):

  commit bdb94aa5dbd8b55e75f5a50b61312fe589e2c2d1
  Author: Peter Zijlstra <[email protected]>
  Date:   Tue Sep 1 10:34:38 2009 +0200

      sched: Try to deal with low capacity

  @@ -4203,15 +4223,18 @@ find_busiest_queue()
  ...
        for_each_cpu(i, sched_group_cpus(group)) {
  +         unsigned long power = power_of(i);
  ...
  -         wl = weighted_cpuload(i);
  +         wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
  +         wl /= power;

  -         if (rq->nr_running == 1 && wl > imbalance)
  +         if (capacity && rq->nr_running == 1 && wl > imbalance)
                continue;

On an SMT system, the power of the HT logical cpu will be 589 and the scheduler load imbalance (for scenarios like the one mentioned above) can be approximately 1024 (SCHED_LOAD_SCALE). The above change of scaling the weighted load with the power will result in "wl > imbalance" and ultimately in find_busiest_queue() returning NULL, causing load_balance() to think that the load is well balanced. But in fact one of the tasks can be moved to the idle core for optimal performance.

We don't need to use the weighted load (wl) scaled by the cpu power to compare with the imbalance. In that condition we already know there is only a single task ("rq->nr_running == 1") and the comparison between the imbalance and wl is there to make sure that we select the correct priority thread which matches the imbalance. So we really need to compare the imbalance with the original weighted load of the cpu, not the scaled load.

But in other conditions where we want the most hammered (busiest) cpu, we can use the scaled load to ensure that we consider the cpu power in addition to the actual load on that cpu, so that we can move the load away from the cpu that is getting most hammered with respect to the actual capacity, as compared with the rest of the cpus in that busiest group.

Fix it.

Reported-by: Ma Ling <[email protected]>
Initial-Analysis-by: Zhang, Yanmin <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Cc: [email protected] [2.6.32.x]
Signed-off-by: Thomas Gleixner <[email protected]>
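The arithmetic behind the regression is easy to reproduce in a standalone sketch (plain userspace C; the 589 and 1024 figures come from the changelog above, everything else is illustrative):

  #include <stdio.h>

  #define SCHED_LOAD_SCALE 1024UL

  int main(void)
  {
      /* One nice-0 task (weighted load ~1024) sits on an SMT sibling whose
       * cpu power is 589, and the computed imbalance is ~SCHED_LOAD_SCALE. */
      unsigned long weighted_load = 1024;   /* weighted_cpuload(i) */
      unsigned long power = 589;            /* power_of(i) for an HT sibling */
      unsigned long imbalance = 1024;

      unsigned long wl_scaled = weighted_load * SCHED_LOAD_SCALE / power;

      printf("scaled wl = %lu, imbalance = %lu\n", wl_scaled, imbalance);
      /* scaled wl is ~1780 > imbalance, so the nr_running == 1 check skips
       * this runqueue even though its single task could move to the idle
       * core; the unscaled load (1024 > 1024 is false) keeps it a candidate. */
      return 0;
  }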
2010-02-08  sched: cpuacct: Use bigger percpu counter batch values for stats counters  (Anton Blanchard)  [1 file, -1/+19]
When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can call cpuacct_update_stats with values much larger than percpu_counter_batch. This means the call to percpu_counter_add will always add to the global count which is protected by a spinlock and we end up with a global spinlock in the scheduler.

Based on an idea by KOSAKI Motohiro, this patch scales the batch value by cputime_one_jiffy such that we have the same batch limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot but that initialisation happened too early on PowerPC (before time_init) and it was never updated at runtime as a result of a hotplug cpu add/remove. This patch instead scales percpu_counter_batch by cputime_one_jiffy at runtime, which keeps the batch correct even after cpu hotplug operations. We cap it at INT_MAX in case of overflow.

For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING, cputime_one_jiffy is the constant 1 and gcc is smart enough to optimise min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch at least on x86 and PowerPC. So there is no need to add an #ifdef.

On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:

  CONFIG_CGROUP_CPUACCT disabled:  16906698 ctx switches/sec
  CONFIG_CGROUP_CPUACCT enabled:      61720 ctx switches/sec
  CONFIG_CGROUP_CPUACCT + patch:   16663217 ctx switches/sec

Tested with:

  wget http://ozlabs.org/~anton/junkcode/context_switch.c
  make context_switch
  for i in `seq 0 63`; do taskset -c $i ./context_switch & done
  vmstat 1

Signed-off-by: Anton Blanchard <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Tested-by: Balbir Singh <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
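The shape of the fix, as a rough sketch rather than the literal patch (the helper name below is made up; the real change lives in the cpuacct stats update path):

  /* Sketch: scale the percpu_counter batch by cputime_one_jiffy so that a
   * jiffy's worth of cputime still fits in the per-cpu part of the counter,
   * and cap it at INT_MAX to avoid overflow. */
  static void cpuacct_add_cputime(struct percpu_counter *counter, s64 cputime)
  {
      s32 batch = min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX);

      /* With the scaled batch, typical per-tick updates stay on the local
       * cpu instead of taking the counter's global spinlock every time. */
      __percpu_counter_add(counter, cputime, batch);
  }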
2010-02-08  Merge branch 'sched/urgent' into sched/core  (Ingo Molnar)  [1 file, -13/+31]
Merge reason: Merge dependent fix, update to latest -rc. Signed-off-by: Ingo Molnar <[email protected]>
2010-02-08  kernel/sched.c: Suppress unused var warning  (Andrew Morton)  [1 file, -1/+1]
On UP:

  kernel/sched.c: In function 'wake_up_new_task':
  kernel/sched.c:2631: warning: unused variable 'cpu'

Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2010-02-04  sched: Remove member rt_se from struct rt_rq  (Yong Zhang)  [1 file, -2/+0]
It's a duplicate of tg->rt_se[cpu] and its only users are sched_rt_rq_dequeue() and sched_rt_rq_enqueue(). After the first patch to those two functions, rt_se can be removed. Signed-off-by: Yong Zhang <[email protected]> Cc: Rusty Russell <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-02-02  sched: Remove unused update_shares_locked()  (Peter Zijlstra)  [1 file, -14/+0]
Commit f492e12ef050e02bf0185b6b57874992591b9be1 ("sched: Remove load_balance_newidle()") removed the only user of this function, so remove it too. Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1265019219.24455.128.camel@laptop> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-29  Merge branch 'perf/urgent' into perf/core  (Ingo Molnar)  [1 file, -1/+4]
Merge reason: We want to queue up a dependent patch. Also update to later -rc's. Signed-off-by: Ingo Molnar <[email protected]>
2010-01-22  sched: Queue a deboosted task to the head of the RT prio queue  (Thomas Gleixner)  [1 file, -1/+1]
rtmutex_set_prio() is used to implement priority inheritance for futexes. When a task is deboosted it gets enqueued at the tail of its RT priority list. This is violating the POSIX scheduling semantics:

  rt priority list X contains two runnable tasks A and B

  task A   runs with priority X and holds mutex M
  task C   preempts A and is blocked on mutex M
           -> task A is boosted to priority of task C (Y)
  task A   unlocks the mutex M and deboosts itself
           -> A is dequeued from rt priority list Y
           -> A is enqueued to the tail of rt priority list X
  task C   schedules away
  task B   runs

This is wrong as task A did not schedule away and therefore violates the POSIX scheduling semantics.

Enqueue the task to the head of the priority list instead.

Reported-by: Mathias Weber <[email protected]>
Reported-by: Carsten Emde <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Carsten Emde <[email protected]>
Tested-by: Mathias Weber <[email protected]>
LKML-Reference: <[email protected]>
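A standalone toy (plain userspace C, not kernel code) shows why tail queueing breaks the scenario above:

  #include <stdio.h>
  #include <string.h>

  #define MAX 8
  static const char *prio_list[MAX];   /* toy "rt priority list X" */
  static int n;

  static void enqueue(const char *task, int head)
  {
      if (head) {                      /* like list_add() */
          memmove(&prio_list[1], &prio_list[0], n * sizeof(prio_list[0]));
          prio_list[0] = task;
      } else {                         /* like list_add_tail() */
          prio_list[n] = task;
      }
      n++;
  }

  int main(void)
  {
      enqueue("B", 0);   /* B was already runnable at priority X */
      /* A unlocks M and is deboosted back to X.  Tail queueing would put it
       * behind B although A never scheduled away; head queueing keeps A in
       * front, matching the POSIX semantics. */
      enqueue("A", 1);
      for (int i = 0; i < n; i++)
          printf("%s ", prio_list[i]);
      printf("\n");   /* prints: A B */
      return 0;
  }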
2010-01-22  sched: Extend enqueue_task to allow head queueing  (Thomas Gleixner)  [1 file, -6/+7]
The ability of enqueueing a task to the head of a SCHED_FIFO priority list is required to fix some violations of POSIX scheduling policy. Extend the related functions with a "head" argument. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Tested-by: Carsten Emde <[email protected]> Tested-by: Mathias Weber <[email protected]> LKML-Reference: <[email protected]>
2010-01-21  sched: Fix fork vs hotplug vs cpuset namespaces  (Peter Zijlstra)  [1 file, -12/+27]
There are a number of issues:

1) TASK_WAKING vs cgroup_clone (cpusets)

  copy_process():

    sched_fork()
      child->state = TASK_WAKING; /* waiting for wake_up_new_task() */
    if (current->nsproxy != p->nsproxy)
      ns_cgroup_clone()
        cgroup_clone()
          mutex_lock(inode->i_mutex)
          mutex_lock(cgroup_mutex)
          cgroup_attach_task()
            ss->can_attach()
            ss->attach() [ -> cpuset_attach() ]
              cpuset_attach_task()
                set_cpus_allowed_ptr();
                  while (child->state == TASK_WAKING)
                    cpu_relax();

  will deadlock the system.

2) cgroup_clone (cpusets) vs copy_process

  So even if the above would work we still have:

  copy_process():

    if (current->nsproxy != p->nsproxy)
      ns_cgroup_clone()
        cgroup_clone()
          mutex_lock(inode->i_mutex)
          mutex_lock(cgroup_mutex)
          cgroup_attach_task()
            ss->can_attach()
            ss->attach() [ -> cpuset_attach() ]
              cpuset_attach_task()
                set_cpus_allowed_ptr();
    ...

    p->cpus_allowed = current->cpus_allowed

  over-writing the modified cpus_allowed.

3) fork() vs hotplug

  If we unplug the child's cpu after the sanity check, when the child gets attached to the task_list but before wake_up_new_task(), shit will meet with fan.

Solve all these issues by moving fork cpu selection into wake_up_new_task().

Reported-by: Serge E. Hallyn <[email protected]>
Tested-by: Serge E. Hallyn <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <1264106190.4283.1314.camel@laptop>
Signed-off-by: Thomas Gleixner <[email protected]>
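The resulting wake_up_new_task() looks roughly like this (a simplified sketch, not the exact hunk; the elided parts keep their old behaviour):

  /* Fork balancing is done here rather than in sched_fork(), because
   * ->cpus_allowed can change in the fork path and any previously selected
   * cpu may have been unplugged in the meantime. */
  void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
  {
      unsigned long flags;
      struct rq *rq;
      int cpu = get_cpu();

  #ifdef CONFIG_SMP
      /* the child is still TASK_WAKING, so nobody else can migrate it */
      cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
      set_task_cpu(p, cpu);
  #endif

      rq = task_rq_lock(p, &flags);
      /* ... activate the task and check for preemption as before ... */
      task_rq_unlock(rq, &flags);
      put_cpu();
  }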
2010-01-21  sched: Remove USER_SCHED  (Dhaval Giani)  [1 file, -107/+7]
Remove the USER_SCHED feature. It has been scheduled to be removed in 2.6.34 as per http://marc.info/?l=linux-kernel&m=125728479022976&w=2 Signed-off-by: Dhaval Giani <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1263990378.24844.3.camel@localhost> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-21  sched: Remove the sched_class load_balance methods  (Peter Zijlstra)  [1 file, -26/+0]
Take out the sched_class methods for load-balancing. Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-21  sched: Move load balance code into sched_fair.c  (Peter Zijlstra)  [1 file, -1840/+79]
Straightforward code movement. Since none of the load-balance abstractions are used anymore, do away with them and simplify the code some. In preparation, move the code around. Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-21  sched: Reassign prev and switch_count when reacquire_kernel_lock() fail  (Yong Zhang)  [1 file, -1/+4]
Assume an A->B schedule is in progress. If B had acquired the BKL earlier and needs to reschedule this time, then in B's context it will go to need_resched_nonpreemptible to reschedule. But at that point prev and switch_count still refer to A. This is wrong and leads to incorrect scheduler statistics. Signed-off-by: Yong Zhang <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-13  sched/perf: Make sure irqs are disabled for perf_event_task_sched_in()  (Jamie Iles)  [1 file, -0/+6]
perf_event_task_sched_in() expects interrupts to be disabled, but on architectures with __ARCH_WANT_INTERRUPTS_ON_CTXSW defined, this isn't true. If this is defined, disable irqs around the call in finish_task_switch(). Signed-off-by: Jamie Iles <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Russell King - ARM Linux <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
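On the affected architectures the call site in finish_task_switch() ends up looking roughly like this (a sketch, not the verbatim hunk):

  #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
      /* These architectures run the context switch with interrupts enabled,
       * so turn them off around the perf hook, which expects them off. */
      local_irq_disable();
  #endif
      perf_event_task_sched_in(current);
  #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
      local_irq_enable();
  #endif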
2010-01-13  Merge branch 'perf/urgent' into perf/core  (Ingo Molnar)  [1 file, -143/+164]
Merge reason: queue up dependent patch, update to -rc4 Signed-off-by: Ingo Molnar <[email protected]>
2009-12-28  sched: might_sleep(): Make file parameter const char *  (Simon Kagstrom)  [1 file, -1/+1]
Fixes a warning when building with g++:

  warning: deprecated conversion from string constant to 'char*'

The file parameter is only ever passed string constants, so mark it const.

Signed-off-by: Simon Kagstrom <[email protected]>
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-12-28  perf events: Remove arg from perf sched hooks  (Peter Zijlstra)  [1 file, -3/+3]
Since we only ever schedule the local cpu, there is no need to pass the cpu number to the perf sched hooks. This micro-optimizes things a bit. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-23  sched: Revert 738d2be, simplify set_task_cpu()  (Peter Zijlstra)  [1 file, -5/+4]
Effectively reverts 738d2be4301007f054541c5c4bf7fb6a361c9b3a. As demonstrated by Eric, we really need to call __set_task_cpu() early in the fork() path to properly initialize the various task state -- specifically the cgroup state through set_task_rq(). [ we could probably fix this by explicitly calling __set_task_cpu() from sched_fork(), but lets try that for the next cycle and simply revert to the old behaviour for now. ] Reported-by: Eric Paris <[email protected]> Tested-by: Eric Paris <[email protected]>, Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] LKML-Reference: <1261492999.4937.36.camel@laptop> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-20  sched: Fix hotplug hang  (Peter Zijlstra)  [1 file, -1/+1]
The hot-unplug kstopmachine usage does a wakeup after deactivating the cpu, hence we cannot use cpu_active() here but must rely on the good olde online. Reported-by: Sachin Sant <[email protected]> Reported-by: Jens Axboe <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Tested-by: Jens Axboe <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> LKML-Reference: <1261326987.4314.24.camel@laptop> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-20  sched: Restore printk sanity  (Peter Zijlstra)  [1 file, -40/+49]
Revert the braindead pr_* crap. (Commit 663997d "sched: Use pr_fmt() and pr_<level>()") It's dumb and causes stupid "sched: " strings all over the place. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Mike Galbraith <[email protected]> Cc: Joe Perches <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> LKML-Reference: <1261315437.4314.6.camel@laptop> [ i dont mind the pr_*() patterns that much - but Peter dislikes them with a vengeance. ] [ - v2: remove spurious diffstat from changelog :-/ ] Signed-off-by: Ingo Molnar <[email protected]>
2009-12-17  sched: Fix broken assertion  (Peter Zijlstra)  [1 file, -1/+2]
There's a preemption race in the set_task_cpu() debug check in that when we get preempted after setting task->state we'd still be on the rq proper, but fail the test. Check for preempted tasks, since those are always on the RQ. Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-17  sched: Teach might_sleep() about preemptible RCU  (Frederic Weisbecker)  [1 file, -1/+1]
In practice, it is harmless to voluntarily sleep in a rcu_read_lock() section if we are running under preemptible RCU, but it is illegal if we build a kernel running non-preemptible RCU. Currently, might_sleep() doesn't notice sleepable operations under rcu_read_lock() sections if we are running under preemptible RCU because preempt_count() is left untouched after rcu_read_lock() in this case. But we want developers who test their changes under such a config to notice the "sleeping while atomic" issues. So we add rcu_read_lock_nesting to the preempt_count() value used in the might_sleep() checks. [ v2: Handle rcu-tiny ] Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
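The resulting check is approximately the following (close to, but not necessarily verbatim, the upstream helper):

  /* Treat RCU read-side nesting like preemption nesting when deciding
   * whether might_sleep() should complain, so sleeping inside
   * rcu_read_lock() is flagged even with preemptible RCU (where
   * rcu_read_lock() does not touch preempt_count()). */
  static int preempt_count_equals(int preempt_offset)
  {
      int nested = (preempt_count() & ~PREEMPT_ACTIVE) + rcu_preempt_depth();

      return (nested == preempt_offset);
  }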
2009-12-17  sched: Make warning less noisy  (Ingo Molnar)  [1 file, -1/+1]
Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-16  sched: Simplify set_task_cpu()  (Peter Zijlstra)  [1 file, -8/+5]
Rearrange code a bit now that it's a simpler function. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-16  sched: Remove the cfs_rq dependency from set_task_cpu()  (Peter Zijlstra)  [1 file, -5/+1]
In order to remove the cfs_rq dependency from set_task_cpu() we need to ensure the task is cfs_rq invariant for all callsites.

The simple approach is to subtract cfs_rq->min_vruntime from se->vruntime on dequeue, and add cfs_rq->min_vruntime on enqueue. However, this has the downside of breaking FAIR_SLEEPERS since we lose the old vruntime as we only maintain the relative position.

To solve this, we observe that we only migrate runnable tasks, we do this using deactivate_task(.sleep=0) and activate_task(.wakeup=0), therefore we can restrain the min_vruntime invariance to that state.

The only other case is wakeup balancing: since we want to maintain the old vruntime we cannot make it relative on dequeue, but since we don't migrate inactive tasks, we can do so right before we activate it again. This is where we need the new pre-wakeup hook, we need to call this while still holding the old rq->lock. We could fold it into ->select_task_rq(), but since that has multiple callsites and would obfuscate the locking requirements, that seems like a fudge.

This leaves the fork() case: simply make sure that ->task_fork() leaves the ->vruntime in a relative state.

This covers all cases where set_task_cpu() gets called, and ensures it sees a relative vruntime.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
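In sketch form (the function names here are illustrative; the real hooks are the fair-class dequeue/enqueue paths and the new pre-wakeup hook):

  /* Keep se->vruntime relative while a runnable task is between runqueues,
   * so set_task_cpu() never needs to look at either cfs_rq. */
  static void vruntime_make_relative(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
      se->vruntime -= cfs_rq->min_vruntime;   /* on dequeue for migration */
  }

  static void vruntime_make_absolute(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
      se->vruntime += cfs_rq->min_vruntime;   /* on enqueue on the new rq */
  }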
2009-12-16  sched: Add pre and post wakeup hooks  (Peter Zijlstra)  [1 file, -4/+8]
As will be apparent in the next patch, we need a pre wakeup hook for sched_fair task migration, hence rename the post wakeup hook and add a pre wakeup hook. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-16  sched: Move kthread_bind() back to kthread.c  (Peter Zijlstra)  [1 file, -26/+0]
Since kthread_bind() lost its dependencies on sched.c, move it back where it came from. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-16  sched: Fix select_task_rq() vs hotplug issues  (Peter Zijlstra)  [1 file, -35/+40]
Since select_task_rq() is now responsible for guaranteeing ->cpus_allowed and cpu_active_mask, we need to verify this. select_task_rq_rt() can blindly return smp_processor_id()/task_cpu() without checking the valid masks, select_task_rq_fair() can do the same in the rare case that all SD_flags are disabled. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-16  sched: Fix sched_exec() balancing  (Peter Zijlstra)  [1 file, -22/+23]
Since we access ->cpus_allowed without holding rq->lock we need a retry loop to validate the result, this comes for near free when we merge sched_migrate_task() into sched_exec() since that already does the needed check. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-16  sched: Ensure set_task_cpu() is never called on blocked tasks  (Peter Zijlstra)  [1 file, -19/+66]
In order to clean up the set_task_cpu() rq dependencies we need to ensure it is never called on blocked tasks because such usage does not pair with consistent rq->lock usage. This puts the migration burden on ttwu(). Furthermore we need to close a race against changing ->cpus_allowed, since select_task_rq() runs with only preemption disabled. For sched_fork() this is safe because the child isn't in the tasklist yet; for wakeup we fix this by synchronizing set_cpus_allowed_ptr() against TASK_WAKING, which leaves sched_exec as a remaining problem. This also closes a hole in (6ad4c1888 sched: Fix balance vs hotplug race) where ->select_task_rq() doesn't validate the result against the sched_domain/root_domain. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-16  sched: Use TASK_WAKING for fork wakups  (Peter Zijlstra)  [1 file, -9/+9]
For later convenience use TASK_WAKING for fresh tasks. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-16  sched: Fix task_hot() test order  (Peter Zijlstra)  [1 file, -3/+3]
Make sure not to access sched_fair fields before verifying it is indeed a sched_fair task. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> CC: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-16  Merge branch 'linus' into sched/urgent  (Ingo Molnar)  [1 file, -114/+117]
Conflicts: kernel/sched_idletask.c Merge reason: resolve the conflicts, pick up latest changes. Signed-off-by: Ingo Molnar <[email protected]>
2009-12-14  sched: Convert pi_lock to raw_spinlock  (Thomas Gleixner)  [1 file, -6/+6]
Convert locks which cannot be sleeping locks in preempt-rt to raw_spinlocks. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Ingo Molnar <[email protected]>
2009-12-14  sched: Convert rt_runtime_lock to raw_spinlock  (Thomas Gleixner)  [1 file, -14/+14]
Convert locks which cannot be sleeping locks in preempt-rt to raw_spinlocks. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Ingo Molnar <[email protected]>
2009-12-14  sched: Convert rq->lock to raw_spinlock  (Thomas Gleixner)  [1 file, -90/+93]
Convert locks which cannot be sleeping locks in preempt-rt to raw_spinlocks. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Ingo Molnar <[email protected]>
2009-12-14  locking: Further name space cleanups  (Thomas Gleixner)  [1 file, -1/+1]
The name space hierarchy for the internal lock functions is now a bit backwards. raw_spin* functions map to _spin* which use __spin*, while we would like to have _raw_spin* and __raw_spin*. _raw_spin* is already used by lock debugging, so rename those functions to do_raw_spin* to free up the _raw_spin* name space. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Ingo Molnar <[email protected]>
2009-12-14  locking: Implement new raw_spinlock  (Thomas Gleixner)  [1 file, -1/+1]
Now that the raw_spin name space is freed up, we can implement raw_spinlock and the related functions which are used to annotate the locks which are not converted to sleeping spinlocks in preempt-rt. A side effect is that only such locks can be used with the low level lock functions which circumvent lockdep. For !rt, spin_* functions are mapped to the raw_spin* implementations. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Ingo Molnar <[email protected]>
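For !PREEMPT_RT builds that mapping is a thin wrapper, roughly (a sketch of the include/linux/spinlock.h arrangement):

  /* spinlock_t wraps a raw_spinlock_t; the spin_* API simply forwards to
   * the raw_spin_* implementation. */
  static inline void spin_lock(spinlock_t *lock)
  {
      raw_spin_lock(&lock->rlock);
  }

  static inline void spin_unlock(spinlock_t *lock)
  {
      raw_spin_unlock(&lock->rlock);
  }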
2009-12-14  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu  (Linus Torvalds)  [1 file, -4/+4]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
  m68k: rename global variable vmalloc_end to m68k_vmalloc_end
  percpu: add missing per_cpu_ptr_to_phys() definition for UP
  percpu: Fix kdump failure if booted with percpu_alloc=page
  percpu: make misc percpu symbols unique
  percpu: make percpu symbols in ia64 unique
  percpu: make percpu symbols in powerpc unique
  percpu: make percpu symbols in x86 unique
  percpu: make percpu symbols in xen unique
  percpu: make percpu symbols in cpufreq unique
  percpu: make percpu symbols in oprofile unique
  percpu: make percpu symbols in tracer unique
  percpu: make percpu symbols under kernel/ and mm/ unique
  percpu: remove some sparse warnings
  percpu: make alloc_percpu() handle array types
  vmalloc: fix use of non-existent percpu variable in put_cpu_var()
  this_cpu: Use this_cpu_xx in trace_functions_graph.c
  this_cpu: Use this_cpu_xx for ftrace
  this_cpu: Use this_cpu_xx in nmi handling
  this_cpu: Use this_cpu operations in RCU
  this_cpu: Use this_cpu ops for VM statistics
  ...

Fix up trivial (famous last words) global per-cpu naming conflicts in
  arch/x86/kvm/svm.c
  mm/slab.c
2009-12-14  sched: Use rcu in sched_get_rr_param()  (Thomas Gleixner)  [1 file, -3/+3]
read_lock(&tasklist_lock) does not protect sys_sched_get_rr_param() against a concurrent update of the policy or scheduler parameters, as do_sched_setscheduler() does not take the tasklist_lock. The access to task->sched_class->get_rr_interval is protected by task_rq_lock(task). Use rcu_read_lock() to protect find_task_by_vpid() and prevent the task struct from going away. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-14  sched: Use rcu in sched_get/set_affinity()  (Thomas Gleixner)  [1 file, -10/+6]
tasklist_lock is held read locked to protect the find_task_by_vpid() call and to prevent the task going away. sched_setaffinity acquires a task struct ref and drops tasklist lock right away. The access to the cpus_allowed mask is protected by rq->lock. rcu_read_lock() provides the same protection here. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
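The sched_setaffinity()-style lookup these rcu conversions switch to looks roughly like this (the surrounding function is illustrative, not actual kernel code):

  static int act_on_task(pid_t pid)
  {
      struct task_struct *p;

      rcu_read_lock();
      p = find_task_by_vpid(pid);
      if (!p) {
          rcu_read_unlock();
          return -ESRCH;
      }
      get_task_struct(p);   /* keep the task alive after the unlock */
      rcu_read_unlock();

      /* operate on p under its own locks (task_rq_lock() etc.) */

      put_task_struct(p);
      return 0;
  }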
2009-12-14  sched: Use rcu in sys_sched_getscheduler/sys_sched_getparam()  (Thomas Gleixner)  [1 file, -5/+5]
read_lock(&tasklist_lock) does not protect sys_sched_getscheduler and sys_sched_getparam() against a concurrent update of the policy or scheduler parameters as do_sched_setscheduler() does not take the tasklist_lock. The accessed integers can be retrieved w/o locking and are snapshots anyway. Using rcu_read_lock() to protect find_task_by_vpid() and prevent the task struct from going away is not changing the above situation. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-13  sched: Use pr_fmt() and pr_<level>()  (Joe Perches)  [1 file, -52/+42]
- Convert printk(KERN_<level> to pr_<level> (not KERN_DEBUG)
- Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
- Coalesce long format strings
- Add missing \n to "ERROR: !SD_LOAD_BALANCE domain has parent"

Signed-off-by: Joe Perches <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
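The pr_fmt() pattern introduced here (and reverted again by "sched: Restore printk sanity" further up this log) works like this:

  /* Must be defined before the first include so the pr_<level>() macros
   * pick it up; every pr_err()/pr_warning() in the file then gets the
   * "sched: " prefix automatically. */
  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

  #include <linux/kernel.h>

      pr_err("ERROR: !SD_LOAD_BALANCE domain has parent\n");
      /* printed as: "sched: ERROR: !SD_LOAD_BALANCE domain has parent" */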
2009-12-13  sched: Make wakeup side and atomic variants of completion API irq safe  (Rafael J. Wysocki)  [1 file, -4/+6]
Alan Stern noticed that all the wakeup side (and atomic) variants of the completion APIs should be irq safe, but the newly introduced completion_done() and try_wait_for_completion() aren't. The use of the irq unsafe variants in IRQ contexts can cause crashes/hangs. Fix the problem by making them use spin_lock_irqsave() and spin_unlock_irqrestore(). Reported-by: Alan Stern <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Zhang Rui <[email protected]> Cc: pm list <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: David Chinner <[email protected]> Cc: Lachlan McIlroy <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
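The fixed helpers are small; completion_done(), for instance, ends up roughly as:

  bool completion_done(struct completion *x)
  {
      unsigned long flags;
      int ret = 1;

      /* irqsave/irqrestore so this is safe from IRQ context, like the rest
       * of the wakeup-side completion API. */
      spin_lock_irqsave(&x->wait.lock, flags);
      if (!x->done)
          ret = 0;
      spin_unlock_irqrestore(&x->wait.lock, flags);
      return ret;
  }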
2009-12-12  Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)  [1 file, -104/+114]
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
  sched: Remove forced2_migrations stats
  sched: Fix memory leak in two error corner cases
  sched: Fix build warning in get_update_sysctl_factor()
  sched: Update normalized values on user updates via proc
  sched: Make tunable scaling style configurable
  sched: Fix missing sched tunable recalculation on cpu add/remove
  sched: Fix task priority bug
  sched: cgroup: Implement different treatment for idle shares
  sched: Remove unnecessary RCU exclusion
  sched: Discard some old bits
  sched: Clean up check_preempt_wakeup()
  sched: Move update_curr() in check_preempt_wakeup() to avoid redundant call
  sched: Sanitize fork() handling
  sched: Clean up ttwu() rq locking
  sched: Remove rq->clock coupling from set_task_cpu()
  sched: Consolidate select_task_rq() callers
  sched: Remove sysctl.sched_features
  sched: Protect sched_rr_get_param() access to task->sched_class
  sched: Protect task->cpus_allowed access in sched_getaffinity()
  sched: Fix balance vs hotplug race
  ...

Fixed up conflicts in kernel/sysctl.c (due to sysctl cleanup)
2009-12-10  sched: Remove forced2_migrations stats  (Ingo Molnar)  [1 file, -6/+0]
This build warning:

  kernel/sched.c: In function 'set_task_cpu':
  kernel/sched.c:2070: warning: unused variable 'old_rq'

Made me realize that the forced2_migrations stat looks pretty pointless (and a misnomer) - remove it.

Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-12-10  sched: Fix memory leak in two error corner cases  (Phil Carmody)  [1 file, -2/+6]
If the second in each of these pairs of allocations fails, then the first one will not be freed in the error route out. Found by a static code analysis tool. Signed-off-by: Phil Carmody <[email protected]> Acked-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-12-10  sched: Fix build warning in get_update_sysctl_factor()  (Mike Galbraith)  [1 file, -1/+1]
Signed-off-by: Mike Galbraith <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> LKML-Reference: <new-submission>
2009-12-09  sched: Update normalized values on user updates via proc  (Christian Ehrhardt)  [1 file, -2/+10]
The normalized values are also recalculated in case the scaling factor changes. This patch updates the internally used scheduler tuning values that are normalized to one cpu whenever a user sets new values via sysfs. Together with patch 2 of this series, this allows user-configured values to scale (or not) with cpu add/remove events taking place later. Signed-off-by: Christian Ehrhardt <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> [ v2: fix warning ] Signed-off-by: Ingo Molnar <[email protected]>