aboutsummaryrefslogtreecommitdiff
path: root/kernel/sched.c
AgeCommit message (Collapse)AuthorFilesLines
2009-06-17sched: Add SCHED_RESET_ON_FORK functionality for nice < 0 tasksMike Galbraith1-0/+5
Signed-off-by: Mike Galbraith <[email protected]> Acked-by: Lennart Poettering <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-06-17sched: Clean up SCHED_RESET_ON_FORKMike Galbraith1-16/+18
Make SCHED_RESET_ON_FORK sched_fork() bits a self-contained unlikely code path. Signed-off-by: Mike Galbraith <[email protected]> Acked-by: Lennart Poettering <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-06-17sched: Remove unneeded __ref tagLi Zefan1-1/+1
Those two functions no longer call alloc_bootmmem_cpumask_var(), so no need to tag them with __init_refok. Signed-off-by: Li Zefan <[email protected]> Acked-by: Pekka Enberg <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-06-15Merge branch 'timers-for-linus-migration' of ↵Linus Torvalds1-2/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: Logic to move non pinned timers timers: /proc/sys sysctl hook to enable timer migration timers: Identifying the existing pinned timers timers: Framework for identifying pinned timers timers: allow deferrable timers for intervals tv2-tv5 to be deferred Fix up conflicts in kernel/sched.c and kernel/timer.c manually
2009-06-15sched: Introduce SCHED_RESET_ON_FORK scheduling policy flagLennart Poettering1-9/+40
This patch introduces a new flag SCHED_RESET_ON_FORK which can be passed to the kernel via sched_setscheduler(), ORed in the policy parameter. If set this will make sure that when the process forks a) the scheduling priority is reset to DEFAULT_PRIO if it was higher and b) the scheduling policy is reset to SCHED_NORMAL if it was either SCHED_FIFO or SCHED_RR. Why have this? Currently, if a process is real-time scheduled this will 'leak' to all its child processes. For security reasons it is often (always?) a good idea to make sure that if a process acquires RT scheduling this is confined to this process and only this process. More specifically this makes the per-process resource limit RLIMIT_RTTIME useful for security purposes, because it makes it impossible to use a fork bomb to circumvent the per-process RLIMIT_RTTIME accounting. This feature is also useful for tools like 'renice' which can then change the nice level of a process without having this spill to all its child processes. Why expose this via sched_setscheduler() and not other syscalls such as prctl() or sched_setparam()? prctl() does not take a pid parameter. Due to that it would be impossible to modify this flag for other processes than the current one. The struct passed to sched_setparam() can unfortunately not be extended without breaking compatibility, since sched_setparam() lacks a size parameter. How to use this from userspace? In your RT program simply replace this: sched_setscheduler(pid, SCHED_FIFO, &param); by this: sched_setscheduler(pid, SCHED_FIFO|SCHED_RESET_ON_FORK, &param); Signed-off-by: Lennart Poettering <[email protected]> Acked-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-06-12sched: export kick_processRusty Russell1-0/+1
lguest needs kick_process: wake_up_process() does nothing if a process is running, which isn't sufficient (we need it in the kernel). And lguest support is usually modular. Signed-off-by: Rusty Russell <[email protected]> Cc: Ingo Molnar <[email protected]>
2009-06-11Merge branch 'perfcounters-for-linus' of ↵Linus Torvalds1-6/+51
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits) perf_counter: Turn off by default perf_counter: Add counter->id to the throttle event perf_counter: Better align code perf_counter: Rename L2 to LL cache perf_counter: Standardize event names perf_counter: Rename enums perf_counter tools: Clean up u64 usage perf_counter: Rename perf_counter_limit sysctl perf_counter: More paranoia settings perf_counter: powerpc: Implement generalized cache events for POWER processors perf_counters: powerpc: Add support for POWER7 processors perf_counter: Accurate period data perf_counter: Introduce struct for sample data perf_counter tools: Normalize data using per sample period data perf_counter: Annotate exit ctx recursion perf_counter tools: Propagate signals properly perf_counter tools: Small frequency related fixes perf_counter: More aggressive frequency adjustment perf_counter/x86: Fix the model number of Intel Core2 processors perf_counter, x86: Correct some event and umask values for Intel processors ...
2009-06-11sched: use slab in cpupri_init()Pekka Enberg1-1/+1
Lets not use the bootmem allocator in cpupri_init() as slab is already up when it is run. Cc: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2009-06-11sched: use alloc_cpumask_var() instead of alloc_bootmem_cpumask_var()Pekka Enberg1-4/+4
Slab is initialized when sched_init() runs now so lets use alloc_cpumask_var(). Cc: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2009-06-11sched: use kzalloc() instead of the bootmem allocatorPekka Enberg1-12/+8
Now that kmem_cache_init() happens before sched_init(), we should use kzalloc() and not the bootmem allocator. Signed-off-by: Pekka Enberg <[email protected]>
2009-06-11Merge branch 'linus' into perfcounters/coreIngo Molnar1-44/+315
Conflicts: arch/x86/kernel/irqinit.c arch/x86/kernel/irqinit_64.c arch/x86/kernel/traps.c arch/x86/mm/fault.c include/linux/sched.h kernel/exit.c
2009-06-10Merge branch 'tracing-for-linus' of ↵Linus Torvalds1-8/+47
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ...
2009-06-10Merge branch 'x86-xen-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits) xen: cache cr0 value to avoid trap'n'emulate for read_cr0 xen/x86-64: clean up warnings about IST-using traps xen/x86-64: fix breakpoints and hardware watchpoints xen: reserve Xen start_info rather than e820 reserving xen: add FIX_TEXT_POKE to fixmap lguest: update lazy mmu changes to match lguest's use of kvm hypercalls xen: honour VCPU availability on boot xen: add "capabilities" file xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet xen/sys/hypervisor: change writable_pt to features xen: add /sys/hypervisor support xen/xenbus: export xenbus_dev_changed xen: use device model for suspending xenbus devices xen: remove suspend_cancel hook xen/dev-evtchn: clean up locking in evtchn xen: export ioctl headers to userspace xen: add /dev/xen/evtchn driver xen: add irq_from_evtchn xen: clean up gate trap/interrupt constants xen: set _PAGE_NX in __supported_pte_mask before pagetable construction ...
2009-06-10Merge branch 'sched-docs-for-linus' of ↵Linus Torvalds1-0/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-docs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Document memory barriers implied by sleep/wake-up primitives
2009-06-02perf_counter: Fix cpu migration counterPaul Mackerras1-0/+1
This fixes the cpu migration software counter to count correctly even when contexts get swapped from one task to another. Previously the cpu migration counts reported by perf stat were bogus, ranging from negative to several thousand for a single "lat_ctx 2 8 32" run. With this patch the cpu migration count reported for "lat_ctx 2 8 32" is almost always between 35 and 44. This fixes the problem by adding a call into the perf_counter code from set_task_cpu when tasks are migrated. This enables us to use the generic swcounter code (with some modifications) for the cpu migration counter. This modifies the swcounter code to allow a NULL regs pointer to be passed in to perf_swcounter_ctx_event() etc. The cpu migration counter does this because there isn't necessarily a pt_regs struct for the task available. In this case, the counter will not have interrupt capability - but the migration counter didn't have interrupt capability before, so this is no loss. Signed-off-by: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Corey Ashford <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: John Kacur <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-06-02perf_counter: Initialize per-cpu context earlier on cpu upPaul Mackerras1-2/+4
This arranges for perf_counter's notifier for cpu hotplug operations to be called earlier than the migration notifier in sched.c by increasing its priority to 20, compared to the 10 for the migration notifier. The reason for doing this is that a subsequent commit to convert the cpu migration counter to use the generic swcounter infrastructure will add a call into the perf_counter subsystem when tasks get migrated. Therefore the perf_counter subsystem needs a chance to initialize its per-cpu data for the new cpu before it can get called from the migration code. This also adds a comment to the migration notifier noting that its priority needs to be lower than that of the perf_counter notifier. Signed-off-by: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-05-23perf_counter: Fix dynamic irq_period loggingPeter Zijlstra1-1/+2
We call perf_adjust_freq() from perf_counter_task_tick() which is is called under the rq->lock causing lock recursion. However, it's no longer required to be called under the rq->lock, so remove it from under it. Also, fix up some related comments. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Corey Ashford <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: John Kacur <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-05-22perf_counter: Optimize context switch between identical inherited contextsPaul Mackerras1-1/+1
When monitoring a process and its descendants with a set of inherited counters, we can often get the situation in a context switch where both the old (outgoing) and new (incoming) process have the same set of counters, and their values are ultimately going to be added together. In that situation it doesn't matter which set of counters are used to count the activity for the new process, so there is really no need to go through the process of reading the hardware counters and updating the old task's counters and then setting up the PMU for the new task. This optimizes the context switch in this situation. Instead of scheduling out the perf_counter_context for the old task and scheduling in the new context, we simply transfer the old context to the new task and keep using it without interruption. The new context gets transferred to the old task. This means that both tasks still have a valid perf_counter_context, so no special case is introduced when the old task gets scheduled in again, either on this CPU or another CPU. The equivalence of contexts is detected by keeping a pointer in each cloned context pointing to the context it was cloned from. To cope with the situation where a context is changed by adding or removing counters after it has been cloned, we also keep a generation number on each context which is incremented every time a context is changed. When a context is cloned we take a copy of the parent's generation number, and two cloned contexts are equivalent only if they have the same parent and the same generation number. In order that the parent context pointer remains valid (and is not reused), we increment the parent context's reference count for each context cloned from it. Since we don't have individual fds for the counters in a cloned context, the only thing that can make two clones of a given parent different after they have been cloned is enabling or disabling all counters with prctl. To account for this, we keep a count of the number of enabled counters in each context. Two contexts must have the same number of enabled counters to be considered equivalent. Here are some measurements of the context switch time as measured with the lat_ctx benchmark from lmbench, comparing the times obtained with and without this patch series: -----Unmodified----- With this patch series Counters: none 2 HW 4H+4S none 2 HW 4H+4S 2 processes: Average 3.44 6.45 11.24 3.12 3.39 3.60 St dev 0.04 0.04 0.13 0.05 0.17 0.19 8 processes: Average 6.45 8.79 14.00 5.57 6.23 7.57 St dev 1.27 1.04 0.88 1.42 1.46 1.42 32 processes: Average 5.56 8.43 13.78 5.28 5.55 7.15 St dev 0.41 0.47 0.53 0.54 0.57 0.81 The numbers are the mean and standard deviation of 20 runs of lat_ctx. The "none" columns are lat_ctx run directly without any counters. The "2 HW" columns are with lat_ctx run under perfstat, counting cycles and instructions. The "4H+4S" columns are lat_ctx run under perfstat with 4 hardware counters and 4 software counters (cycles, instructions, cache references, cache misses, task clock, context switch, cpu migrations, and page faults). [ Impact: performance optimization of counter context-switches ] Signed-off-by: Paul Mackerras <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Corey Ashford <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-05-19sched: properly define the sched_group::cpumask and sched_domain::span fieldsIngo Molnar1-2/+3
Properly document the variable-size structure tricks we are doing wrt. struct sched_group and sched_domain, and use the field[0] GCC extension instead of defining a vla array. Dont use unions for this, as pointed out by Linus. [ Impact: cleanup, un-confuse Sparse and LLVM ] Reported-by: Jeff Garzik <[email protected]> Acked-by: Linus Torvalds <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-05-18Merge commit 'v2.6.30-rc6' into perfcounters/coreIngo Molnar1-1/+1
Merge reason: this branch was on an -rc4 base, merge it up to -rc6 to get the latest upstream fixes. Signed-off-by: Ingo Molnar <[email protected]>
2009-05-15sched, timers: cleanup avenrun usersThomas Gleixner1-0/+15
avenrun is an rough estimate so we don't have to worry about consistency of the three avenrun values. Remove the xtime lock dependency and provide a function to scale the values. Cleanup the users. [ Impact: cleanup ] Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]>
2009-05-15sched, timers: move calc_load() to schedulerThomas Gleixner1-10/+74
Dimitri Sivanich noticed that xtime_lock is held write locked across calc_load() which iterates over all online CPUs. That can cause long latencies for xtime_lock readers on large SMP systems. The load average calculation is an rough estimate anyway so there is no real need to protect the readers vs. the update. It's not a problem when the avenrun array is updated while a reader copies the values. Instead of iterating over all online CPUs let the scheduler_tick code update the number of active tasks shortly before the avenrun update happens. The avenrun update itself is handled by the CPU which calls do_timer(). [ Impact: reduce xtime_lock write locked section ] Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]>
2009-05-13timers: Logic to move non pinned timersArun R Bharadwaj1-0/+5
* Arun R Bharadwaj <[email protected]> [2009-04-16 12:11:36]: This patch migrates all non pinned timers and hrtimers to the current idle load balancer, from all the idle CPUs. Timers firing on busy CPUs are not migrated. While migrating hrtimers, care should be taken to check if migrating a hrtimer would result in a latency or not. So we compare the expiry of the hrtimer with the next timer interrupt on the target cpu and migrate the hrtimer only if it expires *after* the next interrupt on the target cpu. So, added a clockevents_get_next_event() helper function to return the next_event on the target cpu's clock_event_device. [ tglx: cleanups and simplifications ] Signed-off-by: Arun R Bharadwaj <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2009-05-13timers: /proc/sys sysctl hook to enable timer migrationArun R Bharadwaj1-0/+2
* Arun R Bharadwaj <[email protected]> [2009-04-16 12:11:36]: This patch creates the /proc/sys sysctl interface at /proc/sys/kernel/timer_migration Timer migration is enabled by default. To disable timer migration, when CONFIG_SCHED_DEBUG = y, echo 0 > /proc/sys/kernel/timer_migration Signed-off-by: Arun R Bharadwaj <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2009-05-13timers: Identifying the existing pinned timersArun R Bharadwaj1-2/+2
* Arun R Bharadwaj <[email protected]> [2009-04-16 12:11:36]: The following pinned hrtimers have been identified and marked: 1)sched_rt_period_timer 2)tick_sched_timer 3)stack_trace_timer_fn [ tglx: fixup the hrtimer pinned mode ] Signed-off-by: Arun R Bharadwaj <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2009-05-11Merge commit 'v2.6.30-rc5' into sched/coreIngo Molnar1-4/+8
Merge reason: sched/core was on .30-rc1 before, update to latest fixes Signed-off-by: Ingo Molnar <[email protected]>
2009-05-08Merge branch 'x86/urgent' into x86/xenIngo Molnar1-16/+150
Conflicts: arch/frv/include/asm/pgtable.h arch/x86/include/asm/required-features.h arch/x86/xen/mmu.c Merge reason: x86/xen was on a .29 base still, move it to a fresher branch and pick up Xen fixes as well, plus resolve conflicts Signed-off-by: Ingo Molnar <[email protected]>
2009-05-07Merge branch 'tracing/hw-branch-tracing' into tracing/coreIngo Molnar1-0/+43
Merge reason: this topic is ready for upstream now. It passed Oleg's review and Andrew had no further mm/* objections/observations either. Signed-off-by: Ingo Molnar <[email protected]>
2009-05-07Merge branch 'linus' into tracing/coreIngo Molnar1-4/+8
Merge reason: tracing/core was on a .30-rc1 base and was missing out on on a handful of tracing fixes present in .30-rc5-almost. Signed-off-by: Ingo Molnar <[email protected]>
2009-05-07sched: emit thread info flags with stack traceDavid Rientjes1-2/+3
When a thread is oom killed and fails to exit, it's helpful to know which threads have access to memory reserves if the machine livelocks. This is done by testing for the TIF_MEMDIE thread info flag and should be displayed alongside stack traces to identify tasks that have access to such reserves but are still stuck allocating pages, for instance. It would probably be helpful in other cases as well, so all thread info flags are emitted when showing a task. ( v2: fix warning reported by Stephen Rothwell ) [ Impact: extend debug printout info ] Signed-off-by: David Rientjes <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephen Rothwell <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-05-06tracepoint: trace_sched_migrate_task(): remove parameterMathieu Desnoyers1-1/+1
The orig_cpu parameter in trace_sched_migrate_task() is not necessary, it can be got by using task_cpu(p) in the probe. [ Impact: micro-optimization ] Signed-off-by: Mathieu Desnoyers <[email protected]> [ modified from Mathieu's patch. The original patch is at: http://marc.info/?l=linux-kernel&m=123791201716239&w=2 ] Signed-off-by: Xiao Guangrong <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Li Zefan <[email protected]> Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-05-05sched: rt: document the risk of small values in the bandwidth settingsPeter Zijlstra1-0/+7
Thomas noted that we should disallow sysctl_sched_rt_runtime == 0 for (!RT_GROUP) since the root group always has some RT tasks in it. Further, update the documentation to inspire clue. [ Impact: exclude corner-case sysctl_sched_rt_runtime value ] Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-05-04perf_counter: initialize the per-cpu context earlierIngo Molnar1-1/+4
percpu scheduling for perfcounters wants to take the context lock, but that lock first needs to be initialized. Currently it is an early_initcall() - but that is too late, the task tick runs much sooner than that. Call it explicitly from the scheduler init sequence instead. [ Impact: fix access-before-init crash ] LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-29sched: account system time properlyEric Dumazet1-1/+1
Andrew Gallatin reported that IRQ and SOFTIRQ times were sometime not reported correctly on recent kernels, and even bisected to commit 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 ([PATCH] fix scaled & unscaled cputime accounting) as the first bad commit. Further analysis pointed that commit 79741dd35713ff4f6fd0eafd59fa94e8a4ba922d ([PATCH] idle cputime accounting) was the real cause of the problem. account_process_tick() was not taking into account timer IRQ interrupting the idle task servicing a hard or soft irq. On mostly idle cpu, irqs were thus not accounted and top or mpstat could tell user/admin that cpu was 100 % idle, 0.00 % irq, 0.00 % softirq, while it was not. [ Impact: fix occasionally incorrect CPU statistics in top/mpstat ] Reported-by: Andrew Gallatin <[email protected]> Re-reported-by: Andrew Morton <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Martin Schwidefsky <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Paul Mackerras <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-29Merge branch 'linus' into perfcounters/coreIngo Molnar1-15/+149
Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <[email protected]>
2009-04-29sched: Document memory barriers implied by sleep/wake-up primitivesDavid Howells1-0/+23
Add a section to the memory barriers document to note the implied memory barriers of sleep primitives (set_current_state() and wrappers) and wake-up primitives (wake_up() and co.). Also extend the in-code comments on the wake_up() functions to note these implied barriers. [ Impact: add documentation ] Signed-off-by: David Howells <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-24Merge commit 'v2.6.30-rc3' into tracing/hw-branch-tracingIngo Molnar1-15/+149
Conflicts: arch/x86/kernel/ptrace.c Merge reason: fix the conflict above, and also pick up the CONFIG_BROKEN dependency change from upstream so that we can remove it here. Signed-off-by: Ingo Molnar <[email protected]>
2009-04-21sched: Replace first_cpu() with cpumask_first() in ILB nomination codeGautham R Shenoy1-1/+1
Stephen Rothwell reported this build warning: > kernel/sched.c: In function 'find_new_ilb': > kernel/sched.c:4355: warning: passing argument 1 of '__first_cpu' from incompatible pointer type > > Possibly caused by commit f711f6090a81cbd396b63de90f415d33f563af9b > ("sched: Nominate idle load balancer from a semi-idle package") from > the sched tree. Should this call to first_cpu be cpumask_first? For !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT), find_new_ilb() nominates the Idle load balancer as the first cpu from the nohz.cpu_mask. This code uses the older API first_cpu(). Replace it with cpumask_first(), which is the correct API here. [ Impact: cleanup, address build warning ] Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Gautham R Shenoy <[email protected]> Cc: Rusty Russell <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-20sched: remove extra call overhead for schedule()Peter Zijlstra1-8/+4
Lai Jiangshan's patch reminded me that I promised Nick to remove that extra call overhead in schedule(). Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-20perfcounters, sched: remove __task_delta_exec()Ingo Molnar1-23/+0
This function was left orphan by the latest round of sw-counter cleanups. [ Impact: remove unused kernel function ] Signed-off-by: Ingo Molnar <[email protected]>
2009-04-17sched: Avoid printing sched_group::__cpu_power for default caseGautham R Shenoy1-2/+6
Commit 46e0bb9c12f4 ("sched: Print sched_group::__cpu_power in sched_domain_debug") produces a messy dmesg output while attempting to print the sched_group::__cpu_power for each group in the sched_domain hierarchy. Fix this by avoid printing the __cpu_power for default cases. (i.e, __cpu_power == SCHED_LOAD_SCALE). [ Impact: reduce syslog clutter ] Reported-by: Tony Luck <[email protected]> Signed-off-by: Gautham R Shenoy <[email protected]> Fixed-by: Tony Luck <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-15sched: use group_first_cpu() instead of cpumask_first(sched_group_cpus())Miao Xie1-2/+2
Impact: cleanup This patch changes cpumask_first(sched_group_cpus()) to group_first_cpu() for maintainability. Signed-off-by: Miao Xie <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-14tracing/events: move trace point headers into include/trace/eventsSteven Rostedt1-1/+1
Impact: clean up Create a sub directory in include/trace called events to keep the trace point headers in their own separate directory. Only headers that declare trace points should be defined in this directory. Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Neil Horman <[email protected]> Cc: Zhao Lei <[email protected]> Cc: Eduard - Gabriel Munteanu <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2009-04-14tracing: create automated trace definesSteven Rostedt1-7/+3
This patch lowers the number of places a developer must modify to add new tracepoints. The current method to add a new tracepoint into an existing system is to write the trace point macro in the trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or DECLARE_TRACE, then they must add the same named item into the C file with the macro DEFINE_TRACE(name) and then add the trace point. This change cuts out the needing to add the DEFINE_TRACE(name). Every file that uses the tracepoint must still include the trace/<type>.h file, but the one C file must also add a define before the including of that file. #define CREATE_TRACE_POINTS #include <trace/mytrace.h> This will cause the trace/mytrace.h file to also produce the C code necessary to implement the trace point. Note, if more than one trace/<type>.h is used to create the C code it is best to list them all together. #define CREATE_TRACE_POINTS #include <trace/foo.h> #include <trace/bar.h> #include <trace/fido.h> Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with the cleaner solution of the define above the includes over my first design to have the C code include a "special" header. This patch converts sched, irq and lockdep and skb to use this new method. Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Neil Horman <[email protected]> Cc: Zhao Lei <[email protected]> Cc: Eduard - Gabriel Munteanu <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2009-04-14wait: don't use __wake_up_common()Johannes Weiner1-1/+1
'777c6c5 wait: prevent exclusive waiter starvation' made __wake_up_common() global to be used from abort_exclusive_wait(). It was needed to do a wake-up with the waitqueue lock held while passing down a key to the wake-up function. Since '4ede816 epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key()' there is an appropriate wrapper for this case: __wake_up_locked_key(). Use it here and make __wake_up_common() private to the scheduler again. Signed-off-by: Johannes Weiner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-14sched: Nominate a power-efficient ilb in select_nohz_balancer()Gautham R Shenoy1-1/+17
The CPU that first goes idle becomes the idle-load-balancer and remains that until either it picks up a task or till all the CPUs of the system goes idle. Optimize this further to allow it to relinquish it's post once all it's siblings in the power-aware sched_domain go idle, thereby allowing the whole package-core to go idle. While relinquising the post, nominate another an idle-load balancer from a semi-idle core/package. Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-14sched: Nominate idle load balancer from a semi-idle package.Gautham R Shenoy1-9/+118
Currently the nomination of idle-load balancer is done by choosing the first idle cpu in the nohz.cpu_mask. This may not be power-efficient, since such an idle cpu could come from a completely idle core/package thereby preventing the whole core/package from being in a low-power state. For eg, consider a quad-core dual package system. The cpu numbering need not be sequential and can something like [0, 2, 4, 6] and [1, 3, 5, 7]. With sched_mc/smt_power_savings and the power-aware IRQ balance, we try to keep as fewer Packages/Cores active. But the current idle load balancer logic goes against this by choosing the first_cpu in the nohz.cpu_mask and not taking the system topology into consideration. Improve the algorithm to nominate the idle load balancer from a semi idle cores/packages thereby increasing the probability of the cores/packages being in deeper sleep states for longer duration. The algorithm is activated only when sched_mc/smt_power_savings != 0. Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-14tracing, sched: mark get_parent_ip() notraceLai Jiangshan1-1/+1
Impact: remove overly redundant tracing entries When tracer is "function" or "function_graph", way too much "get_parent_ip" entries are recorded in ring_buffer. Signed-off-by: Lai Jiangshan <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Acked-by: Steven Rostedt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-08Merge commit 'v2.6.30-rc1' into sched/urgentIngo Molnar1-353/+788
Merge reason: update to latest upstream to queue up fix Signed-off-by: Ingo Molnar <[email protected]>
2009-04-07Merge commit 'origin/master' into for-linus/xen/masterJeremy Fitzhardinge1-36/+125
* commit 'origin/master': (4825 commits) Fix build errors due to CONFIG_BRANCH_TRACER=y parport: Use the PCI IRQ if offered tty: jsm cleanups Adjust path to gpio headers KGDB_SERIAL_CONSOLE check for module Change KCONFIG name tty: Blackin CTS/RTS Change hardware flow control from poll to interrupt driven Add support for the MAX3100 SPI UART. lanana: assign a device name and numbering for MAX3100 serqt: initial clean up pass for tty side tty: Use the generic RS485 ioctl on CRIS tty: Correct inline types for tty_driver_kref_get() splice: fix deadlock in splicing to file nilfs2: support nanosecond timestamp nilfs2: introduce secondary super block nilfs2: simplify handling of active state of segments nilfs2: mark minor flag for checkpoint created by internal operation nilfs2: clean up sketch file nilfs2: super block operations fix endian bug ... Conflicts: arch/x86/include/asm/thread_info.h arch/x86/lguest/boot.c drivers/xen/manage.c