path: root/kernel/sched.c
2009-10-25  sched, cpuacct: Fix niced guest time accounting  (Ryota Ozaki, 1 file, -2/+7)
CPU time of a guest is always accounted as 'user' time, without regard
for the nice value of its counterpart process, even though the guest is
scheduled under that nice value.

This patch fixes the defect and accounts the cpu time of a niced guest
as 'nice' time, the same as for a niced process. The patch also adds
'guest_nice' to cpuacct; the value provides niced guest cpu time, which
is to 'guest' what 'nice' is to 'user'.

The original discussions can be found here:
  http://www.mail-archive.com/[email protected]/msg23982.html
  http://www.mail-archive.com/[email protected]/msg23860.html

Signed-off-by: Ryota Ozaki <[email protected]>
Acked-by: Avi Kivity <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
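For reference, the accounting logic after this change looks roughly like
the sketch below (field and helper names follow that era's
kernel/sched.c; treat it as an approximation rather than the exact hunk):

	/* Sketch: account guest time, honouring the task's nice value. */
	static void account_guest_time(struct task_struct *p, cputime_t cputime)
	{
		struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
		cputime64_t tmp = cputime_to_cputime64(cputime);

		/* Guest time is also user time as far as the task is concerned. */
		p->utime = cputime_add(p->utime, cputime);
		p->gtime = cputime_add(p->gtime, cputime);

		if (TASK_NICE(p) > 0) {
			/* Niced guest: account as 'nice' plus 'guest_nice'. */
			cpustat->nice = cputime64_add(cpustat->nice, tmp);
			cpustat->guest_nice = cputime64_add(cpustat->guest_nice, tmp);
		} else {
			cpustat->user = cputime64_add(cpustat->user, tmp);
			cpustat->guest = cputime64_add(cpustat->guest, tmp);
		}
	}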
2009-10-25  Merge branch 'linus' into sched/core  (Ingo Molnar, 1 file, -34/+76)
Conflicts:
	fs/proc/array.c

Merge reason: resolve conflict and queue up dependent patch.

Signed-off-by: Ingo Molnar <[email protected]>
2009-10-23  sched: Strengthen buddies and mitigate buddy induced latencies  (Mike Galbraith, 1 file, -1/+1)
This patch restores the effectiveness of LAST_BUDDY in preventing
pgsql+oltp from collapsing due to wakeup preemption. It also switches
LAST_BUDDY to exclusively do what it does best, namely mitigate the
effects of aggressive wakeup preemption, which improves vmark throughput
markedly and restores mysql+oltp scalability.

Since buddies are about scalability, enable them beginning at the point
where we begin expanding sched_latency, namely sched_nr_latency.
Previously, buddies were cleared aggressively, which seriously reduced
their effectiveness. Not clearing aggressively, however, produces a
small drop in mysql+oltp throughput immediately after peak, indicating
that LAST_BUDDY is actually doing some harm. This is right at the point
where X on the desktop, competing with another load, wants low latency
service. Ergo, do not enable until we need to scale.

To mitigate latency induced by buddies, or by a task just missing wakeup
preemption, check latency at tick time.

The last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
CACHE_HOT_BUDDY.

Supporting performance tests:

  tip   = v2.6.32-rc5-1497-ga525b32
  tipx  = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
  tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs

(Three run averages except where noted.)

vmark:
------
  tip        108466 messages per second
  tip+       125307 messages per second
  tip+x      125335 messages per second
  tipx       117781 messages per second
  2.6.31.3   122729 messages per second

mysql+oltp:
-----------
clients          1        2        4        8       16       32       64      128      256
..........................................................................................
  tip      9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
  tip+    10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
  tipx     9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
  2.6.31.3 8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86

pgsql+oltp:
-----------
clients           1        2        4        8       16       32       64      128      256
...........................................................................................
  tip       13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
  tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
  tip+x     13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
  tipx      13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
  2.6.31.3  13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25

Signed-off-by: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-10-14  Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds, 1 file, -2/+8)

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix missing kernel-doc notation
  Revert "x86, timers: Check for pending timers after (device) interrupts"
  sched: Update the clock of runqueue select_task_rq() selected
2009-10-13  Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block  (Linus Torvalds, 1 file, -3/+0)
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cciss: Add cciss_allow_hpsa module parameter
  cciss: Fix multiple calls to pci_release_regions
  blk-settings: fix function parameter kernel-doc notation
  writeback: kill space in debugfs item name
  writeback: account IO throttling wait as iowait
  elv_iosched_store(): fix strstrip() misuse
  cfq-iosched: avoid probable slice overrun when idling
  cfq-iosched: apply bool value where we return 0/1
  cfq-iosched: fix think time allowed for seekers
  cfq-iosched: fix the slice residual sign
  cfq-iosched: abstract out the 'may this cfqq dispatch' logic
  block: use proper BLK_RW_ASYNC in blk_queue_start_tag()
  block: Separate read and write statistics of in_flight requests v2
  block: get rid of kblock_schedule_delayed_work()
  cfq-iosched: fix possible problem with jiffies wraparound
  cfq-iosched: fix issue with rq-rq merging and fifo list ordering
2009-10-12  sched: Fix missing kernel-doc notation  (Randy Dunlap, 1 file, -0/+2)
The following htmldocs warnings:

  Warning(kernel/sched.c:685): No description found for parameter 'cpu'
  Warning(kernel/sched.c:3676): No description found for parameter 'sd'

trigger because new parameters were added to update_rq_clock() and
update_group_power() without updating the kernel-doc notation.

Signed-off-by: Jaswinder Singh Rajput <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-10-09  sched: Update the clock of runqueue select_task_rq() selected  (Mike Galbraith, 1 file, -2/+6)
In try_to_wake_up(), we update the runqueue clock, but select_task_rq()
may select a different runqueue than the one we updated, leaving the new
runqueue's clock stale for a bit.

This patch cures occasional huge latencies reported by latencytop when
coming out of idle on a mostly idle NO_HZ box.

Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
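The shape of the fix is roughly the following sketch of the
try_to_wake_up() path (not the literal hunk; names follow that era's
scheduler):

	/* Sketch: after select_task_rq() picks a different runqueue,
	 * freshen that runqueue's clock too before using it. */
	cpu = p->sched_class->select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
	if (cpu != orig_cpu) {
		rq = cpu_rq(cpu);
		update_rq_clock(rq);
	}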
2009-10-09  writeback: account IO throttling wait as iowait  (Wu Fengguang, 1 file, -3/+0)
It makes sense to do IOWAIT when someone is blocked due to IO throttling,
as suggested by Kame and Peter. There is an old comment arguing against
doing IOWAIT on throttle; however, it has been mismatching the code for
a long time.

If we stop accounting IOWAIT for 2.6.32, it could be an undesirable
behavior change. So restore the io_schedule.

CC: KAMEZAWA Hiroyuki <[email protected]>
CC: Peter Zijlstra <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
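io_schedule() is what makes the wait show up as iowait; it looked
roughly like this in kernels of that era (a sketch for context, not part
of the patch):

	void __sched io_schedule(void)
	{
		struct rq *rq = raw_rq();

		delayacct_blkio_start();
		atomic_inc(&rq->nr_iowait);	/* counted as iowait ... */
		current->in_iowait = 1;
		schedule();
		current->in_iowait = 0;
		atomic_dec(&rq->nr_iowait);	/* ... until we wake up */
		delayacct_blkio_end();
	}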
2009-10-05  sched: Remove obsolete comment in sched_init()  (Jayson R. King, 1 file, -4/+0)
Remove the comment about calling alloc_bootmem(), as it has not been
called here since commit 36b7b6d465489c4754c4fd66fcec6086eba87896.

Signed-off-by: Jayson R. King <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jiri Kosina <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-10-05  sched: Set correct normal_prio and prio values in sched_fork()  (Peter Williams, 1 file, -11/+9)
normal_prio should be updated if the policy changes from RT to
SCHED_NORMAL, or if static_prio/nice is changed.

Some paths through sched_fork() ignore this requirement and may result
in normal_prio having an invalid value. Fixing this issue allows the
call to effective_prio() in wake_up_new_task() to be removed.

Signed-off-by: Peter Williams <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
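For context, normal_prio is derived from the scheduling policy and the
static priority; kernel/sched.c of that era computes it roughly like
this (a sketch of the invariant the fix keeps consistent):

	static inline int normal_prio(struct task_struct *p)
	{
		int prio;

		if (task_has_rt_policy(p))
			prio = MAX_RT_PRIO - 1 - p->rt_priority;
		else
			prio = __normal_prio(p);	/* based on static_prio */
		return prio;
	}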
2009-10-01  const: constify remaining file_operations  (Alexey Dobriyan, 1 file, -1/+1)
[[email protected]: fix KVM]
Signed-off-by: Alexey Dobriyan <[email protected]>
Acked-by: Mike Frysinger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2009-09-24  sysctl: remove "struct file *" argument of ->proc_handler  (Alexey Dobriyan, 1 file, -2/+2)
It's unused. It isn't needed -- the read or write flag is already
passed, and sysctl shouldn't care about the rest. It _was_ used in two
places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: James Morris <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
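Concretely, the handler prototype loses its struct file * argument;
roughly (a sketch of the sysctl API of that era, using proc_dointvec as
an example; exact spellings may differ slightly):

	/* Before: */
	int proc_dointvec(struct ctl_table *table, int write, struct file *filp,
			  void __user *buffer, size_t *lenp, loff_t *ppos);

	/* After: */
	int proc_dointvec(struct ctl_table *table, int write,
			  void __user *buffer, size_t *lenp, loff_t *ppos);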
2009-09-24  cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time  (Ben Blum, 1 file, -3/+32)
Alter the ss->can_attach and ss->attach functions to be able to deal
with a whole threadgroup at a time, for use in cgroup_attach_proc. (This
is a pre-patch to cgroup-procs-writable.patch.)

Currently, the new mode of the attach function can only tell the
subsystem about the old cgroup of the threadgroup leader. No subsystem
currently needs that information for each thread that's being moved, but
if one were to be added (for example, one that counts tasks within a
group) this bit would need to be reworked a bit to tell the subsystem
the right information.

[[email protected]: fix build]
Signed-off-by: Ben Blum <[email protected]>
Signed-off-by: Paul Menage <[email protected]>
Acked-by: Li Zefan <[email protected]>
Reviewed-by: Matt Helsley <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dave Young <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
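The interface change amounts to a threadgroup flag on the callbacks;
roughly (a sketch of that era's struct cgroup_subsys; treat the exact
parameter lists as an approximation):

	struct cgroup_subsys {
		/* ... */
		int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
				  struct task_struct *tsk, bool threadgroup);
		void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
			       struct cgroup *old_cgrp, struct task_struct *tsk,
			       bool threadgroup);
		/* ... */
	};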
2009-09-23  Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds, 1 file, -5/+4)

* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  itimers: Add tracepoints for itimer
  hrtimer: Add tracepoint for hrtimers
  timers: Add tracepoints for timer_list timers
  cputime: Optimize jiffies_to_cputime(1)
  itimers: Simplify arm_timer() code a bit
  itimers: Fix periodic tics precision
  itimers: Merge ITIMER_VIRT and ITIMER_PROF

Trivial header file include conflicts in kernel/fork.c
2009-09-22  cpuidle: fix the menu governor to boost IO performance  (Arjan van de Ven, 1 file, -0/+13)
Fix the menu idle governor, which balances power savings, energy
efficiency and performance impact.

The reason for a reworked governor is that there have been serious
performance issues reported with the existing code on Nehalem server
systems. To show this, I'm sure Andrew wants to see benchmark results
(the benchmark is "fio"; "no cstates" is using "idle=poll"):

             no cstates   current linux   new algorithm
  1 disk     107 Mb/s      85 Mb/s        105 Mb/s
  2 disks    215 Mb/s     123 Mb/s        209 Mb/s
  12 disks   590 Mb/s     320 Mb/s        585 Mb/s

In various power benchmark measurements, no degradation was found by our
measurement & diagnostics team. Obviously a small percentage more power
was used in the "fio" benchmark, due to the much higher performance.

While it would be a novel idea to describe the new algorithm in this
commit message, I cheaped out and described it in comments in the code
instead.

[changes since first post: spelling fixes from akpm, review feedback,
folded menu-tng into menu.c]

Signed-off-by: Arjan van de Ven <[email protected]>
Cc: Venkatesh Pallipadi <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yanmin Zhang <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2009-09-21  Merge branch 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds, 1 file, -7/+7)

* 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Tidy up after the big rename
  perf: Do the big rename: Performance Counters -> Performance Events
  perf_counter: Rename 'event' to event_id/hw_event
  perf_counter: Rename list_entry -> group_entry, counter_list -> group_list

Manually resolved some fairly trivial conflicts with the tracing tree in
include/trace/ftrace.h and kernel/trace/trace_syscalls.c.
2009-09-21  perf: Do the big rename: Performance Counters -> Performance Events  (Ingo Molnar, 1 file, -7/+7)
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has outgrown its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending code
like hw-breakpoints support the 'counter' name is less and less
appropriate.

All in all, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables and
API names (in an ABI-compatible fashion).

The word 'event' is also a bit shorter than 'counter' - which makes it
slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be used by
anyone who has pending perfcounters patches - it converts a Linux kernel
tree over to the new naming. We tried to time this change to the point
in time where the amount of pending patches is the smallest: the end of
the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal with
  hardware registers - and these sed scripts are a bit over-eager in
  renaming them. I've undone some of that, but in case there's something
  left where 'counter' would be better than 'event' we can undo that on
  an individual basis instead of touching an otherwise nicely automated
  patch. )

Suggested-by: Stephane Eranian <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Paul Mackerras <[email protected]>
Reviewed-by: Arjan van de Ven <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Howells <[email protected]>
Cc: Kyle McMartin <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-21  sched: Simplify sys_sched_rr_get_interval() system call  (Peter Williams, 1 file, -16/+1)
Simplify the system call by removing the need for it to know details of
scheduling classes. This allows PlugSched to define orthogonal
scheduling classes.

Signed-off-by: Peter Williams <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-20  sched: Fix potential NULL dereference of doms_cur  (Yong Zhang, 1 file, -1/+1)
If CONFIG_CPUMASK_OFFSTACK is enabled but the doms_cur allocation fails
in arch_init_sched_domains(), doms_cur falls back to fallback_doms. But
at this point, fallback_doms has not been initialized yet.

Signed-off-by: Yong Zhang <[email protected]>
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-20  sched: Fix raciness in runqueue_is_locked()  (Andrew Morton, 1 file, -8/+2)
runqueue_is_locked() is unavoidably racy due to a poor interface design.
It does:

	cpu = get_cpu();
	ret = some_percpu_thing(cpu);
	put_cpu(cpu);
	return ret;

Its return value is unreliable: by the time the caller looks at it, the
task may have been preempted and migrated to another CPU. Fix the
interface so the caller names the CPU it is asking about.

Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
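The fixed interface pushes the CPU choice to the caller; roughly (a
sketch consistent with the -8/+2 diffstat, not guaranteed verbatim):

	/* The caller decides which CPU it cares about, so there is no
	 * get_cpu()/put_cpu() window in which the answer can go stale. */
	int runqueue_is_locked(int cpu)
	{
		return spin_is_locked(&cpu_rq(cpu)->lock);
	}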
2009-09-17  Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds, 1 file, -301/+143)

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
  sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE
  sched: Stop buddies from hogging the system
  sched: Add new wakeup preemption mode: WAKEUP_RUNNING
  sched: Fix TASK_WAKING & loadaverage breakage
  sched: Disable wakeup balancing
  sched: Rename flags to wake_flags
  sched: Clean up the load_idx selection in select_task_rq_fair
  sched: Optimize cgroup vs wakeup a bit
  sched: x86: Name old_perf in a unique way
  sched: Implement a gentler fair-sleepers feature
  sched: Add SD_PREFER_LOCAL
  sched: Add a few SYNC hint knobs to play with
  sched: Fix sync wakeups again
  sched: Add WF_FORK
  sched: Rename sync arguments
  sched: Rename select_task_rq() argument
  sched: Feature to disable APERF/MPERF cpu_power
  x86: sched: Provide arch implementations using aperf/mperf
  x86: Add generic aperf/mperf code
  x86: Move APERF/MPERF into a X86_FEATURE
  ...

Fix up trivial conflict in arch/x86/include/asm/processor.h due to
nearby addition of amd_get_nb_id() declaration from the EDAC merge.
2009-09-17  sched: Add new wakeup preemption mode: WAKEUP_RUNNING  (Peter Zijlstra, 1 file, -7/+10)
Create a new wakeup preemption mode: preempt towards tasks that run
shorter on avg. It sets the next buddy to be sure we actually run the
task we preempted for.

Test results:

  root@twins:~# while :; do :; done &
  [1] 6537
  root@twins:~# while :; do :; done &
  [2] 6538
  root@twins:~# while :; do :; done &
  [3] 6539
  root@twins:~# while :; do :; done &
  [4] 6540

  root@twins:/home/peter# ./latt -c4 sleep 4
  Entries: 48 (clients=4)

  Averages:
  ------------------------------
          Max          4750 usec
          Avg           497 usec
          Stdev         737 usec

  root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features

  root@twins:/home/peter# ./latt -c4 sleep 4
  Entries: 48 (clients=4)

  Averages:
  ------------------------------
          Max            14 usec
          Avg             5 usec
          Stdev           3 usec

Disabled by default - needs more testing.

Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Mike Galbraith <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
LKML-Reference: <new-submission>
2009-09-17  sched: Fix TASK_WAKING & loadaverage breakage  (Ingo Molnar, 1 file, -0/+4)
Fix this:

  top - 21:54:00 up 2:59, 1 user, load average: 432512.33, 426421.74, 417432.74

which happens because we now set TASK_WAKING before activate_task().

Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-16  sched: Optimize cgroup vs wakeup a bit  (Peter Zijlstra, 1 file, -7/+0)
We don't need to call update_shares() for each domain we iterate, just
for the largest one.

However, we should call it before wake_affine() as well, so that it can
use up-to-date values too.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-15  sched: Fix sync wakeups again  (Peter Zijlstra, 1 file, -4/+4)
The sync-argument rename that introduced WF_* broke stuff by missing a
local alias for an argument in __wake_up_common; fix it by using the
more descriptive wake_flags name.

This restores WF_SYNC propagation, which fixes wake_affine() behaviour,
which fixes pipe-test.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-15  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu  (Linus Torvalds, 1 file, -2/+2)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
  powerpc64: convert to dynamic percpu allocator
  sparc64: use embedding percpu first chunk allocator
  percpu: kill lpage first chunk allocator
  x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
  percpu: update embedding first chunk allocator to handle sparse units
  percpu: use group information to allocate vmap areas sparsely
  vmalloc: implement pcpu_get_vm_areas()
  vmalloc: separate out insert_vmalloc_vm()
  percpu: add chunk->base_addr
  percpu: add pcpu_unit_offsets[]
  percpu: introduce pcpu_alloc_info and pcpu_group_info
  percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
  percpu: add @align to pcpu_fc_alloc_fn_t
  percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
  percpu: drop @static_size from first chunk allocators
  percpu: generalize first chunk allocator selection
  percpu: build first chunk allocators selectively
  percpu: rename 4k first chunk allocator to page
  percpu: improve boot messages
  percpu: fix pcpu_reclaim() locking
  ...

Fix trivial conflict, as per Tejun Heo, in kernel/sched.c
2009-09-15  sched: Add WF_FORK  (Peter Zijlstra, 1 file, -1/+1)
Stop the cache buddies from biasing the time distribution away from
fork()ers. Normally the next buddy will be the preferred scheduling
target, but this makes fork()s prefer to run the new child, whereas we
prefer to run the parent, since that will generate more work.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-15  sched: Rename sync arguments  (Peter Zijlstra, 1 file, -14/+16)
In order to extend the functions to have more than one flag (sync),
rename the argument to flags, and explicitly define a WF_ space for
individual flags.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-15  sched: Feature to disable APERF/MPERF cpu_power  (Peter Zijlstra, 1 file, -2/+10)
I suspect a feedback loop between cpuidle and the aperf/mperf cpu_power
bits: when we have idle C-states they lower the ratio, which leads to
lower cpu_power and then less load, which generates more idle time,
etc..

Put in a knob to disable it.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
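The knob is a scheduler feature bit; its use in the cpu_power
computation looks roughly like this (a sketch; the feature bit is
assumed to be the era's ARCH_POWER, default off):

	/* Sketch: a sched_features bit selects between the arch
	 * (APERF/MPERF based) and the default frequency scaling. */
	if (sched_feat(ARCH_POWER))
		power *= arch_scale_freq_power(sd, cpu);
	else
		power *= default_scale_freq_power(sd, cpu);
	power >>= SCHED_LOAD_SHIFT;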
2009-09-15  sched: Provide arch_scale_freq_power  (Peter Zijlstra, 1 file, -2/+19)
Provide an arch-specific hook for cpufreq-based scaling of cpu_power.

Signed-off-by: Peter Zijlstra <[email protected]>
[[email protected]: spotting bugs]
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-15  sched: Weaken SD_POWERSAVINGS_BALANCE  (Peter Zijlstra, 1 file, -20/+20)
One of the problems of power-saving balancing is that under certain
scenarios it is too slow and allows tons of real work to pile up.

Avoid this by ignoring the powersave stuff when there's real work to be
done.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-15  sched: Merge select_task_rq_fair() and sched_balance_self()  (Peter Zijlstra, 1 file, -38/+3)
The problem with wake_idle() is that it doesn't respect things like
cpu_power, which means it doesn't deal well with SMT nor the recent RT
interaction.

To cure this, it needs to do what sched_balance_self() does, which leads
to the possibility of merging select_task_rq_fair() and
sched_balance_self().

Modify sched_balance_self() to:

  - update_shares() when walking up the domain tree (it only called it
    for the top domain, but it should have done this anyway), which
    allows us to remove this ugly bit from try_to_wake_up().

  - do wake_affine() on the smallest domain that contains both this
    (the waking) and the prev (the wakee) cpu for WAKE invocations.

Then use the top-down balance steps it had to replace wake_idle().

This leads to the disappearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR,
with SD_WAKE_IDLE replaced by SD_BALANCE_WAKE. SD_WAKE_AFFINE needs
SD_BALANCE_WAKE to be effective.

Touch all topology bits to replace the old with the new SD flags --
platforms might need re-tuning. Enabling SD_BALANCE_WAKE conditionally
on NUMA distance seems like a good additional feature; magny-core and
small nehalem systems would want this enabled, while systems with slow
interconnects would not.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
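The WAKE-invocation domain walk described above looks roughly like the
following sketch of the merged select_task_rq_fair() (variable names
like affine_sd and sd_flag are assumptions, not the literal code):

	/* Sketch: find the smallest domain spanning both the waking cpu
	 * and the wakee's previous cpu, and consider wake_affine there;
	 * otherwise remember the widest domain that allows this balance. */
	for_each_domain(cpu, tmp) {
		if (!(tmp->flags & sd_flag))
			continue;
		if ((sd_flag & SD_BALANCE_WAKE) &&
		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
			affine_sd = tmp;
			break;
		}
		sd = tmp;
	}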
2009-09-15  sched: Add TASK_WAKING  (Peter Zijlstra, 1 file, -16/+15)
We're going to want to drop rq->lock in try_to_wake_up() for a longer
period of time; however, we also want to deal with concurrent waking of
the same task, which is currently handled by holding rq->lock.

So introduce a new TASK state, namely TASK_WAKING, which indicates
someone is already waking the task (other wakers will fail
p->state & state).

We also keep preemption disabled over the whole ttwu().

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
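The exclusion works because every waker first checks p->state against
its wake mask; schematically (a sketch of the ttwu() entry, not the
exact hunk):

	/* Sketch: the first waker moves the task to TASK_WAKING;
	 * concurrent wakers then fail the state test and back off. */
	if (!(p->state & state))
		goto out;		/* someone else is waking it */
	p->state = TASK_WAKING;		/* claim the wakeup */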
2009-09-15  sched: Hook sched_balance_self() into sched_class::select_task_rq()  (Peter Zijlstra, 1 file, -7/+7)
Rather ugly patch to fully place the sched_balance_self() code inside
the fair class.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-15  sched: Move sched_balance_self() into sched_fair.c  (Peter Zijlstra, 1 file, -146/+0)
Move the sched_balance_self() code into sched_fair.c.

This facilitates the merger of sched_balance_self() and
sched_fair::select_task_rq().

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-15  sched: Move code around  (Peter Zijlstra, 1 file, -42/+39)
In preparation for other code movement, move weighted_cpuload(),
source_load() and target_load() before the class includes.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-15  sched: Fix double_rq_lock() compile warning  (Peter Zijlstra, 1 file, -2/+2)
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-11  Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds, 1 file, -397/+702)

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
  sched: Fix sched::sched_stat_wait tracepoint field
  sched: Disable NEW_FAIR_SLEEPERS for now
  sched: Keep kthreads at default priority
  sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
  sched: Turn off child_runs_first
  sched: Ensure that a child can't gain time over its parent after fork()
  sched: enable SD_WAKE_IDLE
  sched: Deal with low-load in wake_affine()
  sched: Remove short cut from select_task_rq_fair()
  sched: Turn on SD_BALANCE_NEWIDLE
  sched: Clean up topology.h
  sched: Fix dynamic power-balancing crash
  sched: Remove reciprocal for cpu_power
  sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
  sched: Try to deal with low capacity
  sched: Scale down cpu_power due to RT tasks
  sched: Implement dynamic cpu_power
  sched: Add smt_gain
  sched: Update the cpu_power sum during load-balance
  sched: Add SD_PREFER_SIBLING
  ...
2009-09-11  Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds, 1 file, -3/+128)

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
  rcu: Move end of special early-boot RCU operation earlier
  rcu: Changes from reviews: avoid casts, fix/add warnings, improve comments
  rcu: Create rcutree plugins to handle hotplug CPU for multi-level trees
  rcu: Remove lockdep annotations from RCU's _notrace() API members
  rcu: Add #ifdef to suppress __rcu_offline_cpu() warning in !HOTPLUG_CPU builds
  rcu: Add CPU-offline processing for single-node configurations
  rcu: Add "notrace" to RCU function headers used by ftrace
  rcu: Remove CONFIG_PREEMPT_RCU
  rcu: Merge preemptable-RCU functionality into hierarchical RCU
  rcu: Simplify rcu_pending()/rcu_check_callbacks() API
  rcu: Use debugfs_remove_recursive() to simplify code
  rcu: Merge per-RCU-flavor initialization into pre-existing macro
  rcu: Fix online/offline indication for rcudata.csv trace file
  rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.h
  rcu: Renamings to increase RCU clarity
  rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h
  rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMP
  rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation
  rcu: Fix typo in rcu_irq_exit() comment header
  rcu: Make rcupreempt_trace.c look at offline CPUs
  ...
2009-09-04  sched: Fix dynamic power-balancing crash  (Ingo Molnar, 1 file, -2/+5)
This crash:

  [ 1774.088275] divide error: 0000 [#1] SMP
  [ 1774.100355] CPU 13
  [ 1774.102498] Modules linked in:
  [ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN
  [ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4

triggers because update_group_power() modifies the sd tree and does
temporary calculations there - not considering that other CPUs could
observe intermediate values, such as the zero initial value.

Calculate it in a temporary variable instead. (We need no memory
barrier, as these are all statistical values anyway.)

Acked-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
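The pattern of the fix is to accumulate into a local and publish once;
roughly (a sketch in the style of that era's update_cpu_power(), hedged
rather than the exact hunk):

	/* Sketch: build the new value privately and store it once, so
	 * other CPUs never observe a half-computed (possibly zero)
	 * cpu_power, which is later used as a divisor. */
	unsigned long power = SCHED_LOAD_SCALE;

	power *= arch_scale_freq_power(sd, cpu);
	power >>= SCHED_LOAD_SHIFT;
	power *= scale_rt_power(cpu);
	power >>= SCHED_LOAD_SHIFT;
	if (!power)
		power = 1;		/* never publish 0 */

	sdg->cpu_power = power;		/* single store of the final value */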
2009-09-04  sched: Remove reciprocal for cpu_power  (Peter Zijlstra, 1 file, -67/+34)
It's a source of fail; also, now that cpu_power is dynamic, it's a
waste of time.

before:
  <idle>-0   [000]  132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1

after:
  bash-1689  [001]  137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1

[ v2: build fix from Andreas Herrmann ]

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Andreas Herrmann <[email protected]>
Acked-by: Andreas Herrmann <[email protected]>
Acked-by: Gautham R Shenoy <[email protected]>
Cc: Balbir Singh <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
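With the precomputed reciprocal gone, the load-balance averages become
plain divisions; schematically (a sketch assuming the era's
find_busiest_group() naming):

	/* Before: built on a cached reciprocal of cpu_power, which is
	 * pointless once cpu_power changes at runtime. After: a plain
	 * division, in SCHED_LOAD_SCALE units. */
	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;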
2009-09-04  sched: Try to deal with low capacity, fix update_sd_power_savings_stats()  (Gautham R Shenoy, 1 file, -1/+1)
sgs.group_capacity can now be 0 if, for some reason, group->__cpu_power
happens to be less than SCHED_LOAD_SCALE/2.

In that case, we need the following fix to make it work for
update_sd_power_savings_stats(). That's because both sum_nr_running and
group_capacity are unsigned longs, so comparing against group_capacity
minus one underflows when the capacity is 0.

Cc: Gautham R Shenoy <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andreas Herrmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-04  sched: Try to deal with low capacity  (Peter Zijlstra, 1 file, -5/+28)
When the capacity drops low, we want to migrate load away. Allow the
load-balancer to remove all tasks when we hit rock bottom.

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Andreas Herrmann <[email protected]>
Acked-by: Andreas Herrmann <[email protected]>
Acked-by: Gautham R Shenoy <[email protected]>
Cc: Balbir Singh <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-04  sched: Scale down cpu_power due to RT tasks  (Peter Zijlstra, 1 file, -3/+61)
Keep an average of the amount of time spent on RT tasks and use that
fraction to scale down the cpu_power for regular tasks.

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Andreas Herrmann <[email protected]>
Acked-by: Andreas Herrmann <[email protected]>
Acked-by: Gautham R Shenoy <[email protected]>
Cc: Balbir Singh <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
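The scaling factor is essentially (time available to non-RT tasks) /
(total time); a sketch in the spirit of that era's scale_rt_power(),
with names and rounding hedged:

	/* Sketch: fraction of recent time NOT consumed by RT tasks,
	 * returned in SCHED_LOAD_SCALE units. */
	static unsigned long scale_rt_power(int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		u64 total, available;

		total = sched_avg_period() + (rq->clock - rq->age_stamp);
		available = total - rq->rt_avg;

		if (unlikely((s64)total < SCHED_LOAD_SCALE))
			total = SCHED_LOAD_SCALE;	/* avoid division by 0 */

		total >>= SCHED_LOAD_SHIFT;
		return div_u64(available, total);
	}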
2009-09-04  sched: Implement dynamic cpu_power  (Peter Zijlstra, 1 file, -3/+35)
Recompute the cpu_power for each cpu during load-balance.

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Andreas Herrmann <[email protected]>
Acked-by: Andreas Herrmann <[email protected]>
Acked-by: Gautham R Shenoy <[email protected]>
Cc: Balbir Singh <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-04  sched: Add smt_gain  (Peter Zijlstra, 1 file, -1/+7)
The idea is that multi-threading a core yields more work capacity than a
single thread; provide a way to express a static gain for threads.

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Andreas Herrmann <[email protected]>
Acked-by: Andreas Herrmann <[email protected]>
Acked-by: Gautham R Shenoy <[email protected]>
Cc: Balbir Singh <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
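smt_gain is a per-domain constant split among the hardware threads;
roughly (a sketch of the default SMT power scaling of that era; the
default gain value and function name are assumptions):

	/* Sketch: a core's smt_gain (e.g. ~1178 against SCHED_LOAD_SCALE
	 * of 1024, i.e. roughly +15%) is divided across its siblings. */
	static unsigned long default_scale_smt_power(struct sched_domain *sd,
						     int cpu)
	{
		unsigned long weight = cpumask_weight(sched_domain_span(sd));

		return sd->smt_gain / weight;
	}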
2009-09-04  sched: Update the cpu_power sum during load-balance  (Peter Zijlstra, 1 file, -4/+29)
In order to prepare for a more dynamic cpu_power, update the group sum
while walking the sched domains during load-balance.

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Andreas Herrmann <[email protected]>
Acked-by: Andreas Herrmann <[email protected]>
Acked-by: Gautham R Shenoy <[email protected]>
Cc: Balbir Singh <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-04  sched: Add SD_PREFER_SIBLING  (Peter Zijlstra, 1 file, -1/+13)
Do the placement thing using SD flags.

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Andreas Herrmann <[email protected]>
Acked-by: Andreas Herrmann <[email protected]>
Acked-by: Gautham R Shenoy <[email protected]>
Cc: Balbir Singh <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-04  sched: Restore __cpu_power to a straight sum of power  (Peter Zijlstra, 1 file, -16/+12)
cpu_power is supposed to be a representation of the processing capacity
of the cpu, not a value to randomly tweak in order to affect placement.

Remove the placement hacks.

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Andreas Herrmann <[email protected]>
Acked-by: Andreas Herrmann <[email protected]>
Acked-by: Gautham R Shenoy <[email protected]>
Cc: Balbir Singh <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2009-09-04  Merge branches 'sched/domains' and 'sched/clock' into sched/core  (Ingo Molnar, 1 file, -245/+320)
Merge reason: both topics are ready now, and we want to merge dependent
changes.

Signed-off-by: Ingo Molnar <[email protected]>