path: root/kernel/sched.c
Age | Commit message | Author | Files | Lines
2009-03-04 | Merge commit 'v2.6.29-rc7' into perfcounters/core | Ingo Molnar | 1 | -3/+12
Conflicts: arch/x86/mm/iomap_32.c
2009-03-04 | Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core | Ingo Molnar | 1 | -3/+12
2009-03-02 | sched: kill unused parameter of pick_next_task() | Wang Chen | 1 | -3/+3
Impact: micro-optimization

The "prev" parameter is not actually used.

Signed-off-by: Wang Chen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
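For illustration, a minimal sketch of this kind of cleanup (signatures simplified, not the literal sched.c hunks):

    /* Before: "prev" is accepted but never read. */
    static struct task_struct *pick_next_task(struct rq *rq, struct task_struct *prev);

    /* After: the unused parameter is gone and every caller passes one argument less. */
    static struct task_struct *pick_next_task(struct rq *rq);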
2009-03-02 | Merge branches 'sched/clock', 'sched/urgent' and 'linus' into sched/core | Ingo Molnar | 1 | -3/+12
2009-02-27 | sched: don't allow setuid to succeed if the user does not have rt bandwidth | Dhaval Giani | 1 | -2/+11
Impact: fix hung task with certain (non-default) rt-limit settings

Corey Hickey reported that on using setuid to change the uid of an rt
process, the process would be unkillable and not be running. This is
because there was no rt runtime for that user group.

Add a check to see if a user can attach an rt task to its task group.
On failure, return EINVAL, which is also what is returned under
CONFIG_CGROUP_SCHED.

Reported-by: Corey Hickey <[email protected]>
Signed-off-by: Dhaval Giani <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-02-26 | sched_rt: don't start timer when rt bandwidth disabled | Hiroshi Shimamoto | 1 | -1/+1
Impact: fix incorrect condition check

No need to start the rt bandwidth timer when rt bandwidth is disabled.
If the timer is started anyway, it simply stops itself in
sched_rt_period_timer() the first time it fires.

Signed-off-by: Hiroshi Shimamoto <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
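A sketch of the kind of early return described, with the guard reconstructed from the changelog (the exact condition in start_rt_bandwidth() may differ in detail):

    static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
    {
        /*
         * Don't arm the period timer when RT bandwidth enforcement is
         * disabled or the runtime is unlimited; the timer would only
         * cancel itself again on its first expiry.
         */
        if (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF)
            return;

        /* ... arm rt_b->rt_period_timer as before ... */
    }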
2009-02-26 | cpuacct: add a branch prediction | Li Zefan | 1 | -1/+1
cpuacct_charge() is on the fast path, and the !cpuacct_subsys.active check
always evaluates to false once cpuacct has been initialized at system boot,
so give the compiler a branch-prediction hint.

Signed-off-by: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Balbir Singh <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
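The pattern in question, shown as a small self-contained userspace program (this mirrors the kernel's unlikely() annotation; it is not the actual cpuacct code, and the names are stand-ins):

    #include <stdio.h>

    /* Userspace stand-in for the kernel's unlikely() hint. */
    #define unlikely(x) __builtin_expect(!!(x), 0)

    static int subsys_active;   /* stands in for cpuacct_subsys.active */

    static void charge(unsigned long long cputime)
    {
        /*
         * After boot-time initialization this test is practically always
         * false, so tell the compiler to optimize for the fall-through path.
         */
        if (unlikely(!subsys_active))
            return;

        printf("charging %llu\n", cputime);
    }

    int main(void)
    {
        subsys_active = 1;
        charge(100);
        return 0;
    }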
2009-02-26 | Merge branch 'x86/core' into perfcounters/core | Ingo Molnar | 1 | -3/+12
Conflicts:
    arch/x86/kernel/apic/apic.c
    arch/x86/kernel/irqinit_32.c

Signed-off-by: Ingo Molnar <[email protected]>
2009-02-25 | generic-ipi: remove CSD_FLAG_WAIT | Peter Zijlstra | 1 | -1/+1
Oleg noticed that we don't strictly need CSD_FLAG_WAIT; rework the code so
that CSD_FLAG_LOCK can be used for both purposes.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Rusty Russell <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-02-24 | Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu | Ingo Molnar | 1 | -3/+3
Conflicts:
    arch/x86/include/asm/pgtable.h
2009-02-22 | Merge branch 'linus' into x86/apic | Ingo Molnar | 1 | -3/+12
Conflicts:
    arch/x86/mach-default/setup.c

Semantic conflict resolution:
    arch/x86/kernel/setup.c

Signed-off-by: Ingo Molnar <[email protected]>
2009-02-20 | alloc_percpu: change percpu_ptr to per_cpu_ptr | Rusty Russell | 1 | -3/+3
Impact: cleanup

There are two allocated per-cpu accessor macros with almost identical
spelling. The original and far more popular is per_cpu_ptr (44 files),
so change over the other 4 files.

tj: kill percpu_ptr() and update UP too

Signed-off-by: Rusty Russell <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Tejun Heo <[email protected]>
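The rename in a single illustrative line (the surrounding variable names are hypothetical; only the accessor spelling changes, the (pointer, cpu) arguments stay the same):

    /* before */
    u64 *usage = percpu_ptr(ca->cpuusage, cpu);

    /* after */
    u64 *usage = per_cpu_ptr(ca->cpuusage, cpu);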
2009-02-19 | Merge branch 'linus' into tracing/blktrace | Ingo Molnar | 1 | -3/+12
Conflicts:
    block/blktrace.c

Semantic merge:
    kernel/trace/blktrace.c

Signed-off-by: Ingo Molnar <[email protected]>
2009-02-16 | sched: use TASK_NICE for task_struct | Américo Wang | 1 | -1/+1
    #define TASK_NICE(p)    PRIO_TO_NICE((p)->static_prio)

So it's better to use TASK_NICE here.

Signed-off-by: WANG Cong <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
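In other words, where sched.c open-codes the conversion on a task_struct it can use the macro directly (a one-line sketch, not the exact hunk):

    /* open-coded */
    int nice = PRIO_TO_NICE(p->static_prio);

    /* equivalent, and clearer about intent */
    int nice = TASK_NICE(p);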
2009-02-15 | sched: idle_at_tick is only used when CONFIG_SMP is set | Henrik Austad | 1 | -1/+1
Impact: struct rq size optimization

The idle_at_tick field in struct rq is only used in SMP configurations,
so it makes no sense to carry it in the rq on a UP setup.

Signed-off-by: Henrik Austad <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
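Conceptually the member just moves under the existing CONFIG_SMP guard (field placement shown schematically, not the literal struct layout):

    struct rq {
        /* ... */
    #ifdef CONFIG_SMP
        /* Only the SMP load balancer looks at this; UP kernels
         * no longer pay for the field. */
        unsigned char idle_at_tick;
        /* ... other SMP-only members ... */
    #endif
        /* ... */
    };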
2009-02-15 | Merge branch 'sched/urgent'; commit 'v2.6.29-rc5' into sched/core | Ingo Molnar | 1 | -13/+12
2009-02-13 | Merge branches 'tracing/ftrace', 'tracing/ring-buffer', 'tracing/sysprof', 'tracing/urgent' and 'linus' into tracing/core | Ingo Molnar | 1 | -16/+11
2009-02-13 | Merge branch 'linus' into x86/apic | Ingo Molnar | 1 | -18/+13
Conflicts:
    arch/x86/kernel/acpi/boot.c
    arch/x86/mm/fault.c
2009-02-13 | Merge branch 'linus' into perfcounters/core | Ingo Molnar | 1 | -16/+11
Conflicts: arch/x86/kernel/acpi/boot.c
2009-02-12 | sched: cpu hotplug fix | Ingo Molnar | 1 | -3/+12
rq_attach_root() does a kfree() with the runqueue lock held. That's not
a very wise move; fix it.

Signed-off-by: Ingo Molnar <[email protected]>
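A minimal sketch of the usual fix pattern for this class of bug (names and structure illustrative, not the actual rq_attach_root() diff): remember the object while the lock is held and free it only after the lock has been dropped.

    struct root_domain *old_rd = NULL;
    unsigned long flags;

    spin_lock_irqsave(&rq->lock, flags);
    if (rq->rd) {
        old_rd = rq->rd;        /* defer the free */
        /* ... detach the runqueue from old_rd ... */
    }
    rq->rd = rd;
    /* ... attach to the new root domain ... */
    spin_unlock_irqrestore(&rq->lock, flags);

    if (old_rd)
        kfree(old_rd);          /* safe: the runqueue lock is no longer held */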
2009-02-11 | Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -10/+0
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: revert recent sync wakeup changes
2009-02-11 | Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -6/+11
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers: fix TIMER_ABSTIME for process wide cpu timers
  timers: split process wide cpu clocks/timers, fix
  x86: clean up hpet timer reinit
  timers: split process wide cpu clocks/timers, remove spurious warning
  timers: split process wide cpu clocks/timers
  signal: re-add dead task accumulation stats.
  x86: fix hpet timer reinit for x86_64
  sched: fix nohz load balancer on cpu offline
2009-02-11 | sched: revert recent sync wakeup changes | Peter Zijlstra | 1 | -10/+0
Intel reported a 10% regression (mysql+sysbench) on a 16-way machine
with these patches:

  1596e29: sched: symmetric sync vs avg_overlap
  d942fb6: sched: fix sync wakeups

Revert them.

Reported-by: "Zhang, Yanmin" <[email protected]>
Bisected-by: Lin Ming <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-02-11 | Merge commit 'v2.6.29-rc4' into sched/core | Ingo Molnar | 1 | -2/+2
2009-02-11 | Merge commit 'v2.6.29-rc4' into perfcounters/core | Ingo Molnar | 1 | -2/+12
Conflicts:
    arch/x86/kernel/setup_percpu.c
    arch/x86/mm/fault.c
    drivers/acpi/processor_idle.c
    kernel/irq/handle.c
2009-02-09 | Merge commit 'v2.6.29-rc4' into core/percpu | Ingo Molnar | 1 | -2/+12
Conflicts:
    arch/x86/mach-voyager/voyager_smp.c
    arch/x86/mm/fault.c
2009-02-09 | perf_counters: make software counters work as per-cpu counters | Paul Mackerras | 1 | -0/+17
Impact: kernel crash fix

Yanmin Zhang reported that using a PERF_COUNT_TASK_CLOCK software counter
as a per-cpu counter would reliably crash the system, because it calls
__task_delta_exec with a null pointer. The page fault, context switch and
cpu migration counters also won't function correctly as per-cpu counters
since they reference the current task.

This fixes the problem by redirecting the task_clock counter to the
cpu_clock counter when used as a per-cpu counter, and by implementing
per-cpu page fault, context switch and cpu migration counters.

Along the way, this:

 - Initializes counter->ctx earlier, in perf_counter_alloc, so that
   sw_perf_counter_init can use it

 - Adds code to kernel/sched.c to count task migrations into each cpu,
   in rq->nr_migrations_in

 - Exports the per-cpu context switch and task migration counts via
   new functions added to kernel/sched.c

 - Makes sure that if sw_perf_counter_init fails, we don't try to
   initialize the counter as a hardware counter. Since the user has
   passed a negative, non-raw event type, they clearly don't intend
   for it to be interpreted as a hardware event.

Reported-by: "Zhang Yanmin" <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-02-09 | Merge commit 'v2.6.29-rc4' into tracing/core | Ingo Molnar | 1 | -2/+2
2009-02-08 | Merge branches 'sched/rt' and 'sched/urgent' into sched/core | Ingo Molnar | 1 | -35/+128
2009-02-07 | Merge branch 'linus' into core/locking | Ingo Molnar | 1 | -20/+37
Conflicts: fs/btrfs/locking.c
2009-02-05 | Merge branch 'x86/urgent' into x86/apic | Ingo Molnar | 1 | -0/+10
Conflicts:
    arch/x86/mach-default/setup.c

Semantic merge:
    arch/x86/kernel/irqinit_32.c

Signed-off-by: Ingo Molnar <[email protected]>
2009-02-05 | wait: prevent exclusive waiter starvation | Johannes Weiner | 1 | -2/+2
With exclusive waiters, every process woken up through the wait queue must
ensure that the next waiter down the line is woken when it has finished.

Interruptible waiters don't do that when aborting due to a signal. And if
an aborting waiter is concurrently woken up through the waitqueue, no one
will ever wake up the next waiter.

This has been observed with __wait_on_bit_lock() used by
lock_page_killable(): the first contender on the queue was aborting when
the actual lock holder woke it up concurrently. The aborted contender
didn't acquire the lock and therefore never did an unlock followed by
waking up the next waiter.

Add abort_exclusive_wait() which removes the process' wait descriptor from
the waitqueue, iff still queued, or wakes up the next waiter otherwise. It
does so under the waitqueue lock. Racing with a wake up means the aborting
process is either already woken (removed from the queue) and will wake up
the next waiter, or it will remove itself from the queue and the concurrent
wake up will apply to the next waiter after it.

Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
__wait_on_bit_lock() when they were interrupted by other means than a wake
up through the queue.

[[email protected]: coding-style fixes]
Reported-by: Chris Mason <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Mentored-by: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]> ["after some testing"]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2009-02-04 | sched: fix nohz load balancer on cpu offline | Suresh Siddha | 1 | -6/+11
Christian Borntraeger reports:

> After a logical cpu offline, even on a completely idle system, there
> is one cpu with full ticks. It turns out that nohz.cpu_mask has the
> offlined cpu still set.
>
> In select_nohz_load_balancer() we check if the system is completely
> idle to turn off load balancing. We compare cpu_online_map with
> nohz.cpu_mask. Since cpu_online_map is updated on cpu unplug,
> but nohz.cpu_mask is not, the check fails and the scheduler believes
> that we need an "idle load balancer" even on a fully idle system.
> Since the ilb cpu does not deactivate the timer tick this breaks NOHZ.

Fix select_nohz_load_balancer() so that it does not set the nohz.cpu_mask
while a cpu is going offline.

Reported-by: Christian Borntraeger <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
Tested-by: Christian Borntraeger <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
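Conceptually the entry path of select_nohz_load_balancer() gains an early bail-out for CPUs that are on their way out (a hedged sketch; the bookkeeping in the real function is more involved):

    int select_nohz_load_balancer(int stop_tick)
    {
        int cpu = smp_processor_id();

        if (stop_tick) {
            /*
             * A CPU that is going offline must not be added to
             * nohz.cpu_mask and must not volunteer as the idle load
             * balancer, otherwise a fully idle system keeps one CPU
             * ticking forever.
             */
            if (!cpu_active(cpu))
                return 0;

            cpumask_set_cpu(cpu, nohz.cpu_mask);
            /* ... pick or keep an idle load balancer as before ... */
        }
        /* ... */
        return 0;
    }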
2009-02-03 | Merge branches 'tracing/ftrace', 'tracing/kmemtrace' and 'linus' into tracing/core | Ingo Molnar | 1 | -0/+10
2009-02-01 | sched: symmetric sync vs avg_overlap | Peter Zijlstra | 1 | -3/+9
Reinstate the weakening of the sync hint if set. This yields a more
symmetric usage of avg_overlap.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-02-01 | sched: fix sync wakeups | Peter Zijlstra | 1 | -0/+4
Pawel Dziekonski reported that the openssl benchmark and his quantum
chemistry application both show slowdowns due to the scheduler
under-parallelizing execution.

The reason is that pipe wakeups still do 'sync' wakeups, which override
the normal buddy wakeup logic - even if waker and wakee are only loosely
coupled.

Fix an inversion of logic in the buddy wakeup code.

Reported-by: Pawel Dziekonski <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-01-23 | trace, lockdep: manual preempt count adding for local_bh_disable | Steven Rostedt | 1 | -4/+4
Impact: fix to preempt trace triggering lockdep check_flag failure

In local_bh_disable, the use of add_preempt_count causes the preempt
tracer to start recording the time preemption is off. But because it has
already modified the preempt_count to show softirqs disabled, and has not
yet called the lockdep code to handle this, it creates a state that
lockdep cannot handle.

The preempt tracer resets the ring buffer at the start of a trace, and the
ring buffer reset code does a spin_lock_irqsave. This calls into lockdep,
and lockdep fails when it detects the invalid state of having softirqs
disabled while the internal current->softirqs_enabled is still set.

The fix is to manually add the SOFTIRQ_OFFSET to the preempt count and to
call the preempt tracer code outside the lockdep critical area.

Thanks to Peter Zijlstra for suggesting this solution.

Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-01-23 | Merge branch 'core/percpu' into perfcounters/core | Ingo Molnar | 1 | -6/+1
Conflicts:
    arch/x86/include/asm/hardirq_32.h
    arch/x86/include/asm/hardirq_64.h

Semantic merge:
    arch/x86/include/asm/hardirq.h
    [ added apic_perf_irqs field. ]

Signed-off-by: Ingo Molnar <[email protected]>
2009-01-21 | Merge commit 'v2.6.29-rc2' into perfcounters/core | Ingo Molnar | 1 | -18/+25
Conflicts: include/linux/syscalls.h
2009-01-21 | Merge branch 'x86/mm' into core/percpu | Ingo Molnar | 1 | -18/+25
Conflicts: arch/x86/mm/fault.c
2009-01-18 | Merge branch 'core/percpu' into stackprotector | Ingo Molnar | 1 | -479/+639
Conflicts:
    arch/x86/include/asm/pda.h
    arch/x86/include/asm/system.h

Also, moved include/asm-x86/stackprotector.h to arch/x86/include/asm.

Signed-off-by: Ingo Molnar <[email protected]>
2009-01-18 | Merge branch 'core/percpu' into perfcounters/core | Ingo Molnar | 1 | -5/+8
Conflicts:
    arch/x86/include/asm/pda.h

We merge tip/core/percpu into tip/perfcounters/core because of a semantic
and contextual conflict: the former eliminates the PDA, while the latter
extends it with the apic_perf_irqs field.

Resolve the conflict by moving the new field to the irq_cpustat structure
on 64-bit too.

Signed-off-by: Ingo Molnar <[email protected]>
2009-01-15 | Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -3/+10
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: sched_slice() fixlet
  sched: fix update_min_vruntime
  sched: SCHED_OTHER vs SCHED_IDLE isolation
  sched: SCHED_IDLE weight change
  sched: fix bandwidth validation for UID grouping
  Revert "sched: improve preempt debugging"
2009-01-15 | sched: SCHED_IDLE weight change | Peter Zijlstra | 1 | -2/+2
Increase the SCHED_IDLE weight from 2 to 3; this gives much more stable
vruntime numbers.

Time advanced in 100ms:

  weight=2:  64765.988352  67012.881408  88501.412352
  weight=3:  35496.181411  34130.971298  35497.411573

Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
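In sched.c the SCHED_IDLE weight is a compile-time constant, so the change itself is essentially a one-liner (shown assuming the constant is the WEIGHT_IDLEPRIO define; its inverse-weight counterpart is adjusted to match):

    /*
     * SCHED_IDLE tasks get a minuscule weight; raising it from 2 to 3
     * keeps the resulting vruntime numbers far more stable.
     */
    #define WEIGHT_IDLEPRIO    3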
2009-01-15 | sched: fix bandwidth validation for UID grouping | Peter Zijlstra | 1 | -0/+7
Impact: make rt-limit tunables work again

Mark Glines reported:

> I've got an issue on x86-64 where I can't configure the system to allow
> RT tasks for a non-root user.
>
> In 2.6.26.5, I was able to do the following to set things up nicely:
> echo 450000 >/sys/kernel/uids/0/cpu_rt_runtime
> echo 450000 >/sys/kernel/uids/1000/cpu_rt_runtime
>
> Seems like every value I try to echo into the /sys files returns EINVAL.

For UID grouping we initialize the root group with infinite bandwidth,
which by default is actually more than the global limit, therefore the
bandwidth check always fails.

Because the root group is a phantom group (for UID grouping) we cannot
runtime adjust it, therefore we let it reflect the global bandwidth
settings.

Reported-by: Mark Glines <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-01-15 | sched: introduce avg_wakeup | Peter Zijlstra | 1 | -6/+30
Introduce a new avg_wakeup statistic.

avg_wakeup is a measure of how frequently a task wakes up other tasks: it
represents the average time between wakeups, with a limit of avg_runtime
for when it doesn't wake up anybody.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
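A hedged sketch of how such a per-entity running average is typically maintained in sched.c (the helper and field names follow the style of the existing avg_overlap code; the exact update sites in the patch may differ):

    static inline void update_avg(u64 *avg, u64 sample)
    {
        s64 diff = sample - *avg;

        *avg += diff >> 3;      /* cheap exponential moving average */
    }

    /*
     * On each wakeup the waker folds the time since its own last wakeup
     * into se.avg_wakeup; a task that never wakes anyone is capped at
     * its average runtime instead.
     */
    update_avg(&current->se.avg_wakeup, delta);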
2009-01-14 | mutex: implement adaptive spinning | Peter Zijlstra | 1 | -0/+61
Change mutex contention behaviour such that it will sometimes busy wait on
acquisition - moving its behaviour closer to that of spinlocks.

This concept got ported to mainline from the -rt tree, where it was
originally implemented for rtmutexes by Steven Rostedt, based on work by
Gregory Haskins.

Testing with Ingo's test-mutex application
(http://lkml.org/lkml/2006/1/8/50) gave a 345% boost for VFS scalability
on my testbox:

  # ./test-mutex-shm V 16 10 | grep "^avg ops"
  avg ops/sec: 296604

  # ./test-mutex-shm V 16 10 | grep "^avg ops"
  avg ops/sec: 85870

The key criteria for the busy wait is that the lock owner has to be
running on a (different) cpu. The idea is that as long as the owner is
running, there is a fair chance it'll release the lock soon, and thus
we'll be better off spinning instead of blocking/scheduling.

Since regular mutexes (as opposed to rtmutexes) do not atomically track
the owner, we add the owner in a non-atomic fashion and deal with the
races in the slowpath.

Furthermore, to ease testing of the performance impact of this new code,
there is a means to disable this behaviour at runtime (without having to
reboot the system), when scheduler debugging is enabled
(CONFIG_SCHED_DEBUG=y), by issuing the following command:

  # echo NO_OWNER_SPIN > /debug/sched_features

This command re-enables spinning again (this is also the default):

  # echo OWNER_SPIN > /debug/sched_features

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
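The heart of the optimistic spin, as a heavily simplified conceptual sketch (the real mutex slowpath also re-checks the lock word, handles owner changes and falls back to sleeping; the two helpers marked hypothetical are stand-ins, not real kernel functions):

    for (;;) {
        struct thread_info *owner = ACCESS_ONCE(lock->owner);

        /* Owner went to sleep (or there is none): stop spinning
         * and take the normal blocking path. */
        if (owner && !owner_still_running(owner))   /* hypothetical helper */
            break;

        if (try_acquire(lock))                      /* hypothetical helper */
            return 1;       /* got the lock while spinning */

        if (need_resched())
            break;          /* someone else needs this CPU: stop spinning */

        cpu_relax();        /* be friendly to SMT siblings and power */
    }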
2009-01-14 | mutex: preemption fixes | Peter Zijlstra | 1 | -3/+7
The problem is that dropping the spinlock right before schedule is a
voluntary preemption point and can cause a schedule, right after which we
schedule again.

Fix this inefficiency by keeping preemption disabled until we schedule:
explicitly disable preemption and provide a schedule() variant that
assumes preemption is already disabled.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
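The shape of the idea in the mutex slowpath, as a hedged sketch (the name __schedule_nopreempt() is made up here to stand for the "preemption already disabled" schedule variant the changelog describes):

    preempt_disable();
    /* Dropping the wait-lock can no longer become an extra
     * preemption point now that preemption is disabled. */
    spin_unlock_mutex(&lock->wait_lock, flags);

    __schedule_nopreempt();     /* hypothetical name for the new variant */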
2009-01-14 | sched: fix build error in kernel/sched_rt.c when RT_GROUP_SCHED && !SMP | Gregory Haskins | 1 | -0/+4
Ingo found a build error in the scheduler when RT_GROUP_SCHED was enabled,
but SMP was not.

This patch rearranges the code such that it is a little more streamlined
and compiles under all permutations of SMP, UP and RT_GROUP_SCHED. It was
boot tested on my 4-way x86_64 and it still passes preempt-test.

Signed-off-by: Gregory Haskins <[email protected]>
2009-01-14 | [CVE-2009-0029] System call wrappers part 08 | Heiko Carstens | 1 | -1/+1
Signed-off-by: Heiko Carstens <[email protected]>