path: root/kernel/sched.c
Date | Commit message | Author | Files | Lines (-/+)
2009-04-07 | sched, hw-branch-tracer: add wait_task_context_switch() function to sched.h | Markus Metzger | 1 | -0/+43
Add a function to wait until some other task has been switched out at least once. This differs from wait_task_inactive() subtly, in that the latter will wait until the task has left the CPU. Signed-off-by: Markus Metzger <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
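A minimal sketch of how such a wait can be structured, watching the task's voluntary/involuntary context-switch counters (nvcsw/nivcsw) until either one moves. This is illustrative only; the real function also has to cope with the task running or exiting while we wait.

    /* Illustrative sketch only -- not the exact kernel implementation. */
    #include <linux/sched.h>

    static void wait_task_context_switch_sketch(struct task_struct *p)
    {
            unsigned long nvcsw = p->nvcsw;     /* voluntary switches so far   */
            unsigned long nivcsw = p->nivcsw;   /* involuntary switches so far */

            /*
             * Either counter moving means p has been switched out at
             * least once since we sampled them.
             */
            while (p->nvcsw == nvcsw && p->nivcsw == nivcsw)
                    cpu_relax();
    }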
2009-04-07 | Merge branch 'linus' into perfcounters/core | Ingo Molnar | 1 | -3/+11
Merge reason: need the upstream facility added by: 7f1e2ca: hrtimer: fix rq->lock inversion (again) Signed-off-by: Ingo Molnar <[email protected]>
2009-04-07 | perf_counter: remove rq->lock usage | Peter Zijlstra | 1 | -20/+0
Now that all the task runtime clock users are gone, remove the ugly rq->lock usage from perf counters, which solves the nasty deadlock seen when a software task clock counter was read from an NMI overflow context. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Corey Ashford <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-06 | Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -3/+11
* 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: add stack dumps to asserts
  hrtimer: fix rq->lock inversion (again)
2009-04-06 | perf_counter: generic context switch event | Peter Zijlstra | 1 | -6/+0
Impact: cleanup Use the generic software events for context switches. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Steven Rostedt <[email protected]> Orig-LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-06 | Merge branch 'linus' into perfcounters/core-v2 | Ingo Molnar | 1 | -325/+809
Merge reason: we have gathered quite a few conflicts, need to merge upstream Conflicts: arch/powerpc/kernel/Makefile arch/x86/ia32/ia32entry.S arch/x86/include/asm/hardirq.h arch/x86/include/asm/unistd_32.h arch/x86/include/asm/unistd_64.h arch/x86/kernel/cpu/common.c arch/x86/kernel/irq.c arch/x86/kernel/syscall_table_32.S arch/x86/mm/iomap_32.c include/linux/sched.h kernel/Makefile Signed-off-by: Ingo Molnar <[email protected]>
2009-04-05 | Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -4/+4
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
  tracing, net: fix net tree and tracing tree merge interaction
  tracing, powerpc: fix powerpc tree and tracing tree interaction
  ring-buffer: do not remove reader page from list on ring buffer free
  function-graph: allow unregistering twice
  trace: make argument 'mem' of trace_seq_putmem() const
  tracing: add missing 'extern' keywords to trace_output.h
  tracing: provide trace_seq_reserve()
  blktrace: print out BLK_TN_MESSAGE properly
  blktrace: extract duplidate code
  blktrace: fix memory leak when freeing struct blk_io_trace
  blktrace: fix blk_probes_ref chaos
  blktrace: make classic output more classic
  blktrace: fix off-by-one bug
  blktrace: fix the original blktrace
  blktrace: fix a race when creating blk_tree_root in debugfs
  blktrace: fix timestamp in binary output
  tracing, Text Edit Lock: cleanup
  tracing: filter fix for TRACE_EVENT_FORMAT events
  ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
  x86: kretprobe-booster interrupt emulation code fix
  ...
Fix up trivial conflicts in arch/parisc/include/asm/ftrace.h include/linux/memory.h kernel/extable.c kernel/module.c
2009-04-05 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask | Linus Torvalds | 1 | -21/+22
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits)
  cpumask: remove cpumask allocation from idle_balance, fix
  numa, cpumask: move numa_node_id default implementation to topology.h, fix
  cpumask: remove cpumask allocation from idle_balance
  x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus
  x86: cpumask: update 32-bit APM not to mug current->cpus_allowed
  x86: microcode: cleanup
  x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c
  cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash
  numa, cpumask: move numa_node_id default implementation to topology.h
  cpumask: convert node_to_cpumask_map[] to cpumask_var_t
  cpumask: remove x86 cpumask_t uses.
  cpumask: use cpumask_var_t in uv_flush_tlb_others.
  cpumask: remove cpumask_t assignment from vector_allocation_domain()
  cpumask: make Xen use the new operators.
  cpumask: clean up summit's send_IPI functions
  cpumask: use new cpumask functions throughout x86
  x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask
  cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t
  cpumask: convert node_to_cpumask_map[] to cpumask_var_t
  x86: unify 32 and 64-bit node_to_cpumask_map
  ...
2009-04-03 | Merge branch 'ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -1/+1
* 'ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  s390: remove arch specific smp_send_stop()
  panic: clean up kernel/panic.c
  panic, smp: provide smp_send_stop() wrapper on UP too
  panic: decrease oops_in_progress only after having done the panic
  generic-ipi: eliminate WARN_ON()s during oops/panic
  generic-ipi: cleanups
  generic-ipi: remove CSD_FLAG_WAIT
  generic-ipi: remove kmalloc()
  generic IPI: simplify barriers and locking
2009-04-02 | Merge branch 'tracing/core-v2' into tracing-for-linus | Ingo Molnar | 1 | -4/+4
Conflicts: include/linux/slub_def.h lib/Kconfig.debug mm/slob.c mm/slub.c
2009-04-01 | epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key() | Davide Libenzi | 1 | -4/+19
This patchset introduces wakeup hints for some of the most popular (from epoll's POV) devices, so that epoll code can avoid spurious wakeups on its waiters. The problem with epoll is that the callback-based wakeups do not, ATM, carry any information about the events the wakeup is related to. So the only choice epoll has (not being able to call f_op->poll() from inside the callback) is to add the file* to a ready-list and resolve the real events later on, at epoll_wait() (or its own f_op->poll()) time. This can cause spurious wakeups, since the wake_up() itself might be for an event the caller is not interested in. The rate of these spurious wakeups can be pretty high in case of many network sockets being monitored. By allowing devices to report the events the wakeups refer to (at least the two major classes - POLLIN/POLLOUT), we are able to spare useless wakeups by proper handling inside epoll's poll callback. Epoll will in any case have to call f_op->poll() on the file* later on, since the change needed to send the full event set via wakeup is too invasive for the way our f_op->poll() system works (the full event set is calculated inside the poll function - there are too many of them to even start thinking about the change - and poll/select would need changes too). Epoll is changed in a way that both devices which send event hints and the ones that don't are correctly handled; the former will gain some efficiency though. As a general rule, devices should add an event mask by using the key-aware wakeup macros when waking up their poll wait queues. I tested it (together with the epoll poll fix patch Andrew has in -mm) and wakeups for the supported devices are correctly filtered. Test program available here: http://www.xmailserver.org/epoll_test.c This patch: Nothing revolutionary here. Just using the available "key" that our wakeup core already supports. The __wake_up_locked_key() was a no-brainer, since both __wake_up_locked() and __wake_up_locked_key() are thin wrappers around __wake_up_common(). The __wake_up_sync() function had a body, so the choice was between borrowing the body for __wake_up_sync_key() and calling it from __wake_up_sync(), or making an inline and calling it from both. I chose the former since on most archs it all resolves to "mov $0, REG; jmp ADDR". Signed-off-by: Davide Libenzi <[email protected]> Cc: Alan Cox <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Miller <[email protected]> Cc: William Lee Irwin III <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
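A hedged sketch of the wrapper pattern the changelog describes: the keyed and unkeyed sync wakeups share one body, with the unkeyed variant simply passing a NULL key down to __wake_up_common(). This is close to, but not necessarily identical to, the actual kernel source; a device would then pass a POLLIN/POLLOUT mask as the key when waking its poll wait queue.

    void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
                            int nr_exclusive, void *key)
    {
            unsigned long flags;
            int sync = 1;

            if (unlikely(!q))
                    return;

            spin_lock_irqsave(&q->lock, flags);
            __wake_up_common(q, mode, nr_exclusive, sync, key);
            spin_unlock_irqrestore(&q->lock, flags);
    }

    void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
    {
            /* unkeyed variant: no event hint, wake everything as before */
            __wake_up_sync_key(q, mode, nr_exclusive, NULL);
    }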
2009-04-01 | sched: Print sched_group::__cpu_power in sched_domain_debug | Gautham R Shenoy | 1 | -1/+2
Impact: extend debug info /proc/sched_debug If the user changes the value of the sched_mc/smt_power_savings sysfs tunable, it'll trigger a rebuilding of the whole sched_domain tree, with the SD_POWERSAVINGS_BALANCE flag set at certain levels. As a result, there would be a change in the __cpu_power of sched_groups in the sched_domain hierarchy. Print the __cpu_power values for each sched_group in sched_domain_debug to help verify this change and correlate it with the change in the load-balancing behavior. Signed-off-by: Gautham R Shenoy <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-04-01 | cpuacct: add per-cgroup utime/stime statistics | Bharata B Rao | 1 | -6/+81
Add per-cgroup cpuacct controller statistics like the system and user time consumed by the group of tasks. Changelog: v7 - Changed the name of the statistic from utime to user and from stime to system so that in future we could easily add other statistics like irq, softirq, steal times etc easily. v6 - Fixed a bug in the error path of cpuacct_create() (pointed by Li Zefan). v5 - In cpuacct_stats_show(), use cputime64_to_clock_t() since we are operating on a 64bit variable here. v4 - Remove comments in cpuacct_update_stats() which explained why rcu_read_lock() was needed (as per Peter Zijlstra's review comments). - Don't say that percpu_counter_read() is broken in Documentation/cpuacct.txt as per KAMEZAWA Hiroyuki's review comments. v3 - Fix a small race in the cpuacct hierarchy walk. v2 - stime and utime now exported in clock_t units instead of msecs. - Addressed the code review comments from Balbir and Li Zefan. - Moved to -tip tree. v1 - Moved the stime/utime accounting to cpuacct controller. Earlier versions - http://lkml.org/lkml/2009/2/25/129 Signed-off-by: Bharata B Rao <[email protected]> Signed-off-by: Balaji Rao <[email protected]> Cc: Dhaval Giani <[email protected]> Cc: Paul Menage <[email protected]> Cc: Andrew Morton <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Li Zefan <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Balbir Singh <[email protected]> Tested-by: Balbir Singh <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
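The statistics end up in a per-cgroup cpuacct.stat file with two lines, "user" and "system", expressed in clock_t units. A small hedged userspace example that reads and converts them to seconds; the cgroup mount point and group name below are assumptions, they depend on where and how the cpuacct controller is mounted.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* path is an assumption; adjust to your cpuacct mount point */
            FILE *f = fopen("/cgroups/mygroup/cpuacct.stat", "r");
            char name[32];
            unsigned long long ticks;
            long hz = sysconf(_SC_CLK_TCK);   /* clock_t ticks per second */

            if (!f)
                    return 1;
            while (fscanf(f, "%31s %llu", name, &ticks) == 2)
                    printf("%s: %.2f s\n", name, (double)ticks / hz);
            fclose(f);
            return 0;
    }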
2009-04-01 | posixtimers, sched: Fix posix clock monotonicity | Hidetoshi Seto | 1 | -8/+57
Impact: regression fix (clock_gettime() going backwards) This patch re-introduces a couple of functions, task_sched_runtime and thread_group_sched_runtime, which were removed at the time of 2.6.28-rc1. These functions protect the sampling of the thread/process clock with the rq lock. The rq lock is required so that rq->clock is not updated during the sampling; otherwise clock_gettime() may return ((accounted runtime before update) + (delta after update)), which is less than what it should be. v2 -> v3: - Rename static helper function __task_delta_exec() to do_task_delta_exec() since the -tip tree already has a __task_delta_exec() of a different version. v1 -> v2: - Revise the function comments and patch description. - Add a note about the accuracy of the thread group's runtime. Signed-off-by: Hidetoshi Seto <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: [email protected] [2.6.28.x][2.6.29.x] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
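A hedged sketch of the pattern being re-introduced: sample the accounted runtime plus the not-yet-accounted delta under the runqueue lock, so the two stay consistent. do_task_delta_exec() is the helper named in the changelog; the exact body of the real function may differ.

    unsigned long long task_sched_runtime(struct task_struct *p)
    {
            unsigned long flags;
            struct rq *rq;
            u64 ns;

            rq = task_rq_lock(p, &flags);    /* keeps rq->clock stable for us */
            ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
            task_rq_unlock(rq, &flags);

            return ns;
    }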
2009-03-31 | cpuacct: make cpuacct hierarchy walk in cpuacct_charge() safe when rcupreempt is used -v2 | Bharata B Rao | 1 | -0/+5
Impact: fix cgroups race under rcu-preempt cpuacct_charge() obtains the task's ca and does a hierarchy walk upwards. This can race with the task's movement between cgroups. This race can cause an access to a freed ca pointer in cpuacct_charge() or an access to an invalid cgroups pointer of the task. This will not happen with rcu or tree rcu as cpuacct_charge() is called with preemption disabled. However if rcupreempt is used, the race is seen. Thanks to Li Zefan for explaining this. Fix this race by explicitly protecting ca and the hierarchy walk with rcu_read_lock(). Changes for v2: - Update patch description (as per Li Zefan's review comments). - Remove comments in cpuacct_charge() which explained why rcu_read_lock() was needed (as per Peter Zijlstra's review comments). Signed-off-by: Bharata B Rao <[email protected]> Cc: Dhaval Giani <[email protected]> Cc: Li Zefan <[email protected]> Cc: Paul Menage <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Balbir Singh <[email protected]> Tested-by: Balbir Singh <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
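A hedged sketch of the fix described above: the ca pointer and the walk up the cpuacct hierarchy are done entirely under rcu_read_lock(), so a concurrent cgroup move cannot free anything we are still looking at. Field names are approximate.

    static void cpuacct_charge(struct task_struct *tsk, u64 cputime)
    {
            struct cpuacct *ca;
            int cpu = task_cpu(tsk);

            rcu_read_lock();          /* protects ca and the parent walk */
            for (ca = task_ca(tsk); ca; ca = ca->parent) {
                    u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);

                    *cpuusage += cputime;
            }
            rcu_read_unlock();
    }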
2009-03-31 | hrtimer: fix rq->lock inversion (again) | Peter Zijlstra | 1 | -3/+11
It appears I inadvertently introduced rq->lock recursion into the hrtimer_start() path when I delegated running already-expired timers to softirq context. This patch fixes it by introducing a __hrtimer_start_range_ns() method that will not use raise_softirq_irqoff() but __raise_softirq_irqoff(), which avoids the wakeup. It then also changes schedule() to check for pending softirqs and do the wakeup then; I'm not quite sure I like this last bit, nor am I convinced it's really needed. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
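The distinction the fix relies on, sketched roughly from the generic softirq code: raise_softirq_irqoff() may wake ksoftirqd, and a wakeup acquires rq->lock, which is exactly what recurses when the caller already holds that lock; __raise_softirq_irqoff() only marks the softirq pending. Treat this as an approximation, not the literal source.

    void raise_softirq_irqoff(unsigned int nr)
    {
            __raise_softirq_irqoff(nr);       /* just set the pending bit */

            /*
             * If not in interrupt context, wake ksoftirqd to run the
             * softirq -- but that wakeup takes rq->lock, so this must not
             * be reached from a path that already holds rq->lock.
             */
            if (!in_interrupt())
                    wakeup_softirqd();
    }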
2009-03-31 | Merge branch 'cpumask-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Rusty Russell | 1 | -21/+22
Conflicts: arch/x86/include/asm/topology.h drivers/oprofile/buffer_sync.c (Both cases: changed in Linus' tree, removed in Ingo's).
2009-03-30 | Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -3/+68
* 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits)
  lockdep: fix deadlock in lockdep_trace_alloc
  lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB
  lockdep: annotate reclaim context (__GFP_NOFS), fix
  lockdep: build fix for !PROVE_LOCKING
  lockstat: warn about disabled lock debugging
  lockdep: use stringify.h
  lockdep: simplify check_prev_add_irq()
  lockdep: get_user_chars() redo
  lockdep: simplify get_user_chars()
  lockdep: add comments to mark_lock_irq()
  lockdep: remove macro usage from mark_held_locks()
  lockdep: fully reduce mark_lock_irq()
  lockdep: merge the !_READ mark_lock_irq() helpers
  lockdep: merge the _READ mark_lock_irq() helpers
  lockdep: simplify mark_lock_irq() helpers #3
  lockdep: further simplify mark_lock_irq() helpers
  lockdep: simplify the mark_lock_irq() helpers
  lockdep: split up mark_lock_irq()
  lockdep: generate usage strings
  lockdep: generate the state bit definitions
  ...
2009-03-30 | Merge branch 'linus' into cpumask-for-linus | Ingo Molnar | 1 | -289/+692
Conflicts: arch/x86/kernel/cpu/common.c
2009-03-29 | x86/paravirt: finish change from lazy cpu to context switch start/end | Jeremy Fitzhardinge | 1 | -1/+1
Impact: fix lazy context switch API Pass the previous and next tasks into the context switch start/end calls, so that the called functions can properly access the task state (especially in end_context_switch(), where the next task is not yet completely current). Signed-off-by: Jeremy Fitzhardinge <[email protected]> Acked-by: Peter Zijlstra <[email protected]>
2009-03-29 | x86/pvops: replace arch_enter_lazy_cpu_mode with arch_start_context_switch | Jeremy Fitzhardinge | 1 | -1/+1
Impact: simplification, prepare for later changes Make lazy cpu mode more specific to context switching, so that it makes sense to do more context-switch specific things in the callbacks. Signed-off-by: Jeremy Fitzhardinge <[email protected]> Acked-by: Peter Zijlstra <[email protected]>
2009-03-29 | sched: fix errors in struct & function comments | Randy Dunlap | 1 | -7/+8
Fix kernel-doc errors in sched.c: the structs don't have kernel-doc notation and the short function description needs to be one line only. Error(kernel/sched.c:3197): cannot understand prototype: 'struct sd_lb_stats ' Error(kernel/sched.c:3228): cannot understand prototype: 'struct sg_lb_stats ' Error(kernel/sched.c:3375): duplicate section name 'Description' Signed-off-by: Randy Dunlap <[email protected]> cc: Ingo Molnar <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
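For reference, a hedged example of the kernel-doc shape these errors point at: a struct comment needs the "struct" keyword in the header line, one @member entry per documented field, and a short description that fits on a single line; repeating a section name such as "Description" is an error. The field list below is abridged and illustrative.

    /**
     * struct sd_lb_stats - sched_domain statistics needed by load balancing
     * @total_load: total load of all groups in the domain
     * @total_pwr:  total cpu power of all groups in the domain
     *
     * Further free-form detail can follow here, after a blank comment line.
     */
    struct sd_lb_stats {
            unsigned long total_load;
            unsigned long total_pwr;
    };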
2009-03-27 | Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2 | Ingo Molnar | 1 | -9/+4
Conflicts: arch/parisc/kernel/irq.c arch/x86/include/asm/fixmap_64.h arch/x86/include/asm/setup.h kernel/irq/handle.c Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <[email protected]>
2009-03-25 | sched: Add comments to find_busiest_group() function | Gautham R Shenoy | 1 | -8/+42
Impact: cleanup Add /** style comments around find_busiest_group(). Also add a few explanatory comments. This concludes the find_busiest_group() cleanup. The function is now down to 72 lines from the original 313 lines. Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: "Vaidyanathan Srinivasan" <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-25 | sched: Refactor the power savings balance code | Gautham R Shenoy | 1 | -83/+153
Impact: cleanup Create separate helper functions to initialize the power-savings-balance related variables, to update them, and to check if we have scope for performing power-savings balance. Add no-op inline functions for the !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT) case. This will eliminate all the #ifdef jungle in find_busiest_group() and the other helper functions. Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: "Vaidyanathan Srinivasan" <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-25 | sched: Optimize the !power_savings_balance during fbg() | Gautham R Shenoy | 1 | -9/+14
Impact: cleanup, micro-optimization We don't need to perform power_savings balance if either the cpu is NOT_IDLE or the sched_domain doesn't have the SD_POWERSAVINGS_BALANCE flag set. Currently, we check these conditions multiple times, even though they don't change over the scope of find_busiest_group(). Check once, and store the value in the already existing "power_savings_balance" variable. Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: "Vaidyanathan Srinivasan" <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-25 | sched: Create a helper function to calculate imbalance | Gautham R Shenoy | 1 | -33/+45
Move all the imbalance calculation out of find_busiest_group() through this helper function. With this change, the structure of find_busiest_group() will be as follows (see the sketch below): - update sched_domain statistics. - check if an imbalance exists. - update the imbalance and return busiest. Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: "Vaidyanathan Srinivasan" <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
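A hedged skeleton of that resulting structure. The real function signature carries more parameters, and calculate_imbalance() stands in here for the new imbalance helper; field names on sd_lb_stats are approximate.

    static struct sched_group *
    find_busiest_group(struct sched_domain *sd, int this_cpu,
                       unsigned long *imbalance, enum cpu_idle_type idle)
    {
            struct sd_lb_stats sds;

            memset(&sds, 0, sizeof(sds));

            /* 1) gather all sched_domain / sched_group statistics */
            update_sd_lb_stats(sd, this_cpu, idle, &sds);

            /* 2) bail out if there is nothing worth moving */
            if (!sds.busiest || sds.busiest_nr_running == 0)
                    goto out_balanced;

            /* 3) compute how much to move and return the busiest group */
            calculate_imbalance(&sds, this_cpu, imbalance);
            return sds.busiest;

    out_balanced:
            *imbalance = 0;
            return NULL;
    }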
2009-03-25 | sched: Create helper to calculate small_imbalance in fbg() | Gautham R Shenoy | 1 | -61/+70
Impact: cleanup We have two places in find_busiest_group() where we need to calculate the minor imbalance before returning the busiest group. Encapsulate this functionality into a separate helper function. Credit: Vaidyanathan Srinivasan <[email protected]> Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-25 | sched: Create a helper function to calculate sched_domain stats for fbg() | Gautham R Shenoy | 1 | -44/+73
Impact: cleanup Create a helper function named update_sd_lb_stats() to update the various sched_domain related statistics in find_busiest_group(). With this we would have moved all the statistics computation out of find_busiest_group(). Credit: Vaidyanathan Srinivasan <[email protected]> Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-25 | sched: Define structure to store the sched_domain statistics for fbg() | Gautham R Shenoy | 1 | -86/+121
Impact: cleanup Currently we use a lot of local variables in find_busiest_group() to capture the various statistics related to the sched_domain. Group them together into a single data structure. This will help us to offload the job of updating the sched_domain statistics to a helper function. Credit: Vaidyanathan Srinivasan <[email protected]> Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-25 | sched: Create a helper function to calculate sched_group stats for fbg() | Gautham R Shenoy | 1 | -75/+100
Impact: cleanup Create a helper function named update_sg_lb_stats() which can be invoked to calculate the individual group's statistics in find_busiest_group(). This reduces the length of find_busiest_group() considerably. Credit: Vaidyanathan Srinivasan <[email protected]> Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-25 | sched: Define structure to store the sched_group statistics for fbg() | Gautham R Shenoy | 1 | -33/+46
Impact: cleanup Currently a whole bunch of variables are used to store the various statistics pertaining to the groups we iterate over in find_busiest_group(). Group them together in a single data structure and add appropriate comments. This will be useful later on when we create helper functions to calculate the sched_group statistics. Credit: Vaidyanathan Srinivasan <[email protected]> Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
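A hedged sketch of what such a grouped structure looks like; the field set below is abridged and the names are approximate, chosen to match the statistics discussed in this series.

    /* statistics of a sched_group, gathered once per balancing pass */
    struct sg_lb_stats {
            unsigned long avg_load;           /* load per weighted cpu power     */
            unsigned long group_load;         /* total load of the group         */
            unsigned long sum_nr_running;     /* nr of running tasks             */
            unsigned long sum_weighted_load;  /* weighted load of those tasks    */
            unsigned long group_capacity;     /* nr of tasks the group can hold  */
            int group_imb;                    /* group has an internal imbalance */
    };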
2009-03-25 | sched: Fix indentations in find_busiest_group() using gotos | Gautham R Shenoy | 1 | -15/+17
Impact: cleanup Some indentations in find_busiest_group() can be minimized by using early exits with the help of gotos. This improves readability in a couple of cases. Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: "Vaidyanathan Srinivasan" <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-25 | sched: Simple helper functions for find_busiest_group() | Gautham R Shenoy | 1 | -12/+43
Impact: cleanup Currently the load idx calculation code is in find_busiest_group(). Move that to a static inline helper function. Similarly, to find the first cpu of a sched_group we use cpumask_first(sched_group_cpus(group)); use a helper for that too (see the sketch below). It improves readability in some cases. Signed-off-by: Gautham R Shenoy <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: "Balbir Singh" <[email protected]> Cc: Nick Piggin <[email protected]> Cc: "Dhaval Giani" <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: "Vaidyanathan Srinivasan" <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
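Hedged sketches of the two helpers described above; the per-idle-state load index fields (busy_idx, newidle_idx, idle_idx) are existing sched_domain members, but treat the bodies as illustrative rather than verbatim.

    static inline unsigned int group_first_cpu(struct sched_group *group)
    {
            return cpumask_first(sched_group_cpus(group));
    }

    static inline int get_sd_load_idx(struct sched_domain *sd,
                                      enum cpu_idle_type idle)
    {
            switch (idle) {
            case CPU_NOT_IDLE:
                    return sd->busy_idx;
            case CPU_NEWLY_IDLE:
                    return sd->newidle_idx;
            default:
                    return sd->idle_idx;
            }
    }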
2009-03-24 | sched: remove unused fields from struct rq | Luis Henriques | 1 | -3/+0
Impact: cleanup, new schedstat ABI Since they are only used in statistics and are always set to zero, the following fields have been removed from struct rq: yld_exp_empty, yld_act_empty and yld_both_empty. Both the Sched Debug and SCHEDSTAT_VERSION versions have also been incremented, since the ABIs have changed. The schedtop tool has been updated to properly handle the new version of schedstat: http://rt.wiki.kernel.org/index.php/Schedtop_utility Signed-off-by: Luis Henriques <[email protected]> Acked-by: Gregory Haskins <[email protected]> Acked-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-19 | cpumask: remove cpumask allocation from idle_balance, fix | Rusty Russell | 1 | -1/+1
Impact: fix boot crash Fix typo in the size calculation. Reported-by: Ingo Molnar <[email protected]> Signed-off-by: Rusty Russell <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-19 | cpumask: remove cpumask allocation from idle_balance | Rusty Russell | 1 | -17/+18
Impact: fix circular locking Steven reports a circular locking from alloc_cpumask_var doing a wakeup. We get rid of this using the tried-and-true technique of using a per-cpu cpumask_var_t rather than doing an alloc every time. Simpler and more robust than a rare, implicit allocation within an atomic codepath. Reported-by: Steven Rostedt <[email protected]> Signed-off-by: Rusty Russell <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
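A hedged sketch of the technique: a per-CPU cpumask_var_t allocated once at boot, so the idle-balance path never allocates (and never wakes anything) in atomic context. The variable and function names below are illustrative.

    static DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);

    /* boot-time, process context: allocate one mask per possible CPU */
    static void __init alloc_balance_masks(void)
    {
            int cpu;

            for_each_possible_cpu(cpu)
                    alloc_cpumask_var_node(&per_cpu(load_balance_tmpmask, cpu),
                                           GFP_KERNEL, cpu_to_node(cpu));
    }

    /* idle_balance()/rebalance path, atomic context: reuse this CPU's mask */
    static void rebalance_on(int this_cpu)
    {
            struct cpumask *tmpmask = per_cpu(load_balance_tmpmask, this_cpu);

            cpumask_copy(tmpmask, cpu_active_mask);
            /* use tmpmask for the balancing decision, no allocation needed */
    }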
2009-03-17 | sched: small optimisation of can_migrate_task() | Luis Henriques | 1 | -4/+6
There were three invocations of task_hot() in can_migrate_task(). Replace them with a single invocation, cached in a local variable. Signed-off-by: Luis Henriques <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
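A hedged fragment showing the shape of the change: task_hot() is evaluated once into a local variable, and that cached value feeds the later checks. The surrounding affinity and currently-running checks of the real function are omitted, and the helper name is marked as a fragment.

    static int can_migrate_task_fragment(struct task_struct *p, struct rq *rq,
                                         struct sched_domain *sd,
                                         enum cpu_idle_type idle)
    {
            int tsk_cache_hot = task_hot(p, rq->clock, sd);   /* single call */

            /* aggressive migration after repeated balance failures */
            if (!tsk_cache_hot || sd->nr_balance_failed > sd->cache_nice_tries) {
                    if (tsk_cache_hot)
                            schedstat_inc(sd, lb_hot_gained[idle]);
                    return 1;
            }

            /* reaching here implies the task is cache hot: do not migrate */
            return 0;
    }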
2009-03-17 | sched: fix typos in documentation | Luis Henriques | 1 | -2/+2
Fixed typos in function documentation. Signed-off-by: Luis Henriques <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-13 | Merge branch 'x86/core' into core/ipi | Ingo Molnar | 1 | -9/+4
2009-03-13 | cpumask: use topology_core_cpumask/topology_thread_cpumask instead of cpu_core_map/cpu_sibling_map | Rusty Russell | 1 | -4/+4
Impact: cleanup This is presumably what those definitions are for, and while all archs define cpu_core_map/cpu_sibling_map, that's changing (e.g. x86 wants to change it to a pointer). Signed-off-by: Rusty Russell <[email protected]>
2009-03-13 | Merge branch 'linus' into core/ipi | Ingo Molnar | 1 | -3/+12
2009-03-11 | sched: add avg_overlap decay | Mike Galbraith | 1 | -1/+23
Impact: more precise avg_overlap metric - better load-balancing avg_overlap is used to measure the runtime overlap of the waker and wakee. However, when a process changes behaviour, e.g. a pipe becomes un-congested and we don't need to go to sleep after a wakeup for a while, the avg_overlap value grows stale. While running, we use the average runtime between preemptions as a measure for avg_overlap, since the amount of runtime can be correlated with cache footprint. The longer we run, the less likely we'll want to be migrated to another CPU. Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1236709131.25234.576.camel@laptop> Signed-off-by: Ingo Molnar <[email protected]>
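The metric is maintained as a simple exponential moving average; a hedged sketch of the update step, where each new runtime sample pulls the stored average one eighth of the way towards itself:

    static void update_avg(u64 *avg, u64 sample)
    {
            s64 diff = sample - *avg;

            /* move the average 1/8th of the way towards the new sample */
            *avg += diff >> 3;
    }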
2009-03-10 | sched: optimize ttwu vs group scheduling | Peter Zijlstra | 1 | -1/+15
Impact: micro-optimization We can avoid the sched domain walk on try_to_wake_up() when we know there are no groups. Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1236603381.8389.455.camel@laptop> Signed-off-by: Ingo Molnar <[email protected]>
2009-03-06 | Merge branch 'x86/core' into tracing/textedit | Ingo Molnar | 1 | -9/+4
Conflicts: arch/x86/Kconfig block/blktrace.c kernel/irq/handle.c Semantic conflict: kernel/trace/blktrace.c Signed-off-by: Ingo Molnar <[email protected]>
2009-03-06 | sched: TIF_NEED_RESCHED -> need_resched() cleanup | Lai Jiangshan | 1 | -5/+5
Impact: cleanup Use test_tsk_need_resched(), set_tsk_need_resched(), need_resched() instead of using TIF_NEED_RESCHED. Signed-off-by: Lai Jiangshan <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
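The cleanup amounts to replacing open-coded thread-flag manipulation with the existing wrappers, roughly as below; do_something() is a hypothetical stand-in for whatever the call site actually does.

    /* before: open-coded thread-flag access */
    set_tsk_thread_flag(curr, TIF_NEED_RESCHED);
    if (test_tsk_thread_flag(curr, TIF_NEED_RESCHED))
            do_something();   /* hypothetical action */

    /* after: dedicated helpers */
    set_tsk_need_resched(curr);
    if (test_tsk_need_resched(curr))
            do_something();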
2009-03-06 | Merge branch 'sched/core' into sched/cleanups | Ingo Molnar | 1 | -1/+8
2009-03-05 | Merge branch 'x86/urgent' into x86/core | Ingo Molnar | 1 | -3/+12
Conflicts: arch/x86/include/asm/fixmap_64.h Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <[email protected]>
2009-03-05 | sched: don't rebalance if attached on NULL domain | Frederic Weisbecker | 1 | -1/+8
Impact: fix function graph trace hang / drop pointless softirq on UP While debugging a function graph trace hang on an old PII, I saw that it consumed most of its time on the timer interrupt, and the domain rebalancing softirq was the biggest consumer. The timer interrupt calls trigger_load_balance(), which decides whether it is worth scheduling a rebalancing softirq. In case of a builtin UP kernel, no problem arises because there is no domain question. In case of a builtin SMP kernel running on an SMP box, still no problem: the softirq will be raised each time we reach the next_balance time. In case of a builtin SMP kernel running on a UP box (most distros provide default SMP kernels, whatever the box you have), the CPU is attached to the NULL sched domain. So a kind of unexpected behaviour happens: trigger_load_balance() -> raises the rebalancing softirq later on softirq: run_rebalance_domains() -> rebalance_domains() where the for_each_domain(cpu, sd) is not taken because of the NULL domain we are attached to. Which means rq->next_balance is never updated. So on the next timer tick, we will enter trigger_load_balance() which will always reschedule the rebalancing softirq: if (time_after_eq(jiffies, rq->next_balance)) raise_softirq(SCHED_SOFTIRQ); So for each tick, we process this pointless softirq. This patch fixes it by checking if we are attached to the NULL domain before raising the softirq; another possible fix would be to set rq->next_balance to the maximal possible jiffies value if we are attached to the NULL domain. v2: build fix on UP Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
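A hedged sketch of the fix described above: check whether this CPU sits on the NULL domain before raising SCHED_SOFTIRQ. The nohz/idle-balance details of the real trigger_load_balance() are elided.

    static inline int on_null_domain(int cpu)
    {
            return !rcu_dereference(cpu_rq(cpu)->sd);
    }

    static void trigger_load_balance(struct rq *rq, int cpu)
    {
            /* only raise the rebalance softirq if there is a domain to walk */
            if (time_after_eq(jiffies, rq->next_balance) &&
                likely(!on_null_domain(cpu)))
                    raise_softirq(SCHED_SOFTIRQ);
    }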
2009-03-04 | Merge branch 'core/locking' into tracing/ftrace | Ingo Molnar | 1 | -3/+68