Yanmin Zhang reported:
Compared with 2.6.25, VolanoMark shows a big regression with kernel 2.6.26-rc1.
It's about 50% on my 8-core Stoakley, 16-core Tigerton, and Itanium Montecito.
Bisecting, I located the following patch:
| 18d95a2832c1392a2d63227a7a6d433cb9f2037e is first bad commit
| commit 18d95a2832c1392a2d63227a7a6d433cb9f2037e
| Author: Peter Zijlstra <[email protected]>
| Date: Sat Apr 19 19:45:00 2008 +0200
|
| sched: fair-group: SMP-nice for group scheduling
Revert it so that we get v2.6.25 behavior.
Bisected-by: Yanmin Zhang <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
this replaces the rq->clock stuff (and possibly cpu_clock()).
- architectures that have an 'imperfect' hardware clock can set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
- the 'jiffy' window might be superfluous when we update tick_gtod
before the __update_sched_clock() call in sched_clock_tick()
- cpu_clock() might be implemented as:
sched_clock_cpu(smp_processor_id())
if the accuracy proves good enough - how far can the TSC drift in a
single jiffy when considering the filtering and idle hooks?
[ [email protected]: various fixes and cleanups ]
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide. Move its definition
to math64.h as currently no architecture overrides the generic implementation.
They can still override it of course, but the duplicated declarations are
avoided.
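For reference, a sketch of what the generic 64-bit fallback in math64.h
amounts to (32-bit keeps a lib/ implementation instead):

  /* on 64-bit architectures plain C division suffices */
  static inline u64 div64_u64(u64 dividend, u64 divisor)
  {
          return dividend / divisor;
  }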
Signed-off-by: Roman Zippel <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: Russell King <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Patrick McHardy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
are set up before the PDE is glued into the main tree.
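For illustration, a sketch of the pattern (entry name, fops and data
pointer are illustrative):

  struct proc_dir_entry *pde;

  /* ->proc_fops and ->data are attached atomically at creation, so
   * there is no window where the PDE is visible but incomplete */
  pde = proc_create_data("example", 0444, NULL, &example_fops, data);
  if (!pde)
          return -ENOMEM;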
Signed-off-by: Denis V. Lunev <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
Signed-off-by: Ingo Molnar <[email protected]>
---
Add some extra debug output so we can get a better overview of the
full hierarchy.
We print the cgroup path after each cfs_rq, so we can see what group
we're looking at.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
it's unused.
Signed-off-by: Ingo Molnar <[email protected]>
---
improve affine wakeups. Maintain the 'overlap' metric based on CFS's
sum_exec_runtime - which means the amount of time a task executes
after it wakes up some other task.
Use the 'overlap' for the wakeup decisions: if the 'overlap' is short,
it means there's strong workload coupling between this task and the
woken up task. If the 'overlap' is large then the workload is decoupled
and the scheduler will move them to separate CPUs more easily.
( Also slightly move the preempt_check within try_to_wake_up() - this has
no effect on functionality but allows 'early wakeups' (for still-on-rq
tasks) to be correctly accounted as well.)
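For illustration, a minimal sketch of maintaining such a running
average (the 1/8 weighting here is illustrative):

  /* exponentially decaying average: each new sample gets 1/8 weight */
  static inline void update_avg(u64 *avg, u64 sample)
  {
          s64 diff = sample - *avg;

          *avg += diff >> 3;
  }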
Signed-off-by: Ingo Molnar <[email protected]>
---
Right now, the linux kernel (with scheduler statistics enabled) keeps track
of the maximum time a process is waiting to be scheduled. While the maximum
is a very useful metric, tracking average and total is equally useful
(at least for latencytop) to figure out the accumulated effect of scheduler
delays. The accumulated effect is important to judge the performance impact
of scheduler tuning/behavior.
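For illustration, a sketch of the bookkeeping this implies (struct and
helper names are illustrative, not the in-tree ones):

  /* max alone hides the accumulated effect; sum and count let user
   * space derive the average delay as wait_sum / wait_count */
  struct wait_stats {
          u64 wait_max;           /* longest single scheduling delay */
          u64 wait_sum;           /* total scheduling delay */
          u64 wait_count;         /* number of delays measured */
  };

  static void account_wait(struct wait_stats *ws, u64 delta)
  {
          if (delta > ws->wait_max)
                  ws->wait_max = delta;
          ws->wait_sum += delta;
          ws->wait_count++;
  }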
Signed-off-by: Arjan van de Ven <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
We monitor clock overflows; let's also monitor clock underflows.
Signed-off-by: Guillaume Chazarain <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
Meelis Roos reported these warnings on sparc64:
CC kernel/sched.o
In file included from kernel/sched.c:879:
kernel/sched_debug.c: In function 'nsec_high':
kernel/sched_debug.c:38: warning: comparison of distinct pointer types lacks a cast
the debug check in do_div() is over-eager here, because the long long
is always positive in these places. Mark this by casting them to
unsigned long long.
no change in code output:
text data bss dec hex filename
51471 6582 376 58429 e43d sched.o.before
51471 6582 376 58429 e43d sched.o.after
md5:
7f7729c111f185bf3ccea4d542abc049 sched.o.before.asm
7f7729c111f185bf3ccea4d542abc049 sched.o.after.asm
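For illustration, a sketch of the pattern (not the literal hunk):

  /* the value is known positive here, so the cast is safe and it
   * satisfies do_div()'s type check, which wants an unsigned
   * 64-bit lvalue */
  static unsigned long long in_msecs(long long nsec)
  {
          unsigned long long tmp = (unsigned long long)nsec;

          do_div(tmp, 1000000);
          return tmp;
  }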
Signed-off-by: Ingo Molnar <[email protected]>
---
clean up overlong line in kernel/sched_debug.c.
Signed-off-by: Ingo Molnar <[email protected]>
---
bump version of kernel/sched_debug.c and remove CFS version
information from it.
Signed-off-by: Ingo Molnar <[email protected]>
---
we lost the sched_min_granularity tunable to a clever optimization
that uses the sched_latency/min_granularity ratio - but the ratio
is quite unintuitive to users and can also crash the kernel if the
ratio is set to 0. So reintroduce the min_granularity tunable,
while keeping the ratio maintained internally.
no functionality changed.
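For illustration, a sketch of keeping the ratio maintained internally
(helper name illustrative; assumes the sysctl handler enforces a
nonzero minimum for the tunables):

  /* derived value, never written by users directly - so it can't be 0 */
  static unsigned int sched_nr_latency;

  static void update_sched_nr_latency(void)
  {
          sched_nr_latency = DIV_ROUND_UP(sysctl_sched_latency,
                                          sysctl_sched_min_granularity);
  }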
[ [email protected]: some fixlets. ]
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
Lockdep noticed that this lock can also be taken from hardirq context, so we
cannot unconditionally disable/enable irqs around it - the irq state has to
be saved and restored.
WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()
[show_trace_log_lvl+26/48] show_trace_log_lvl+0x1a/0x30
[show_trace+18/32] show_trace+0x12/0x20
[dump_stack+22/32] dump_stack+0x16/0x20
[trace_hardirqs_on+405/416] trace_hardirqs_on+0x195/0x1a0
[_read_unlock_irq+34/48] _read_unlock_irq+0x22/0x30
[sched_debug_show+2615/4224] sched_debug_show+0xa37/0x1080
[show_state_filter+326/368] show_state_filter+0x146/0x170
[sysrq_handle_showstate+10/16] sysrq_handle_showstate+0xa/0x10
[__handle_sysrq+123/288] __handle_sysrq+0x7b/0x120
[handle_sysrq+40/64] handle_sysrq+0x28/0x40
[kbd_event+1045/1680] kbd_event+0x415/0x690
[input_pass_event+206/208] input_pass_event+0xce/0xd0
[input_handle_event+170/928] input_handle_event+0xaa/0x3a0
[input_event+95/112] input_event+0x5f/0x70
[atkbd_interrupt+434/1456] atkbd_interrupt+0x1b2/0x5b0
[serio_interrupt+59/128] serio_interrupt+0x3b/0x80
[i8042_interrupt+263/576] i8042_interrupt+0x107/0x240
[handle_IRQ_event+40/96] handle_IRQ_event+0x28/0x60
[handle_edge_irq+175/320] handle_edge_irq+0xaf/0x140
[do_IRQ+64/128] do_IRQ+0x40/0x80
[common_interrupt+46/52] common_interrupt+0x2e/0x34
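The usual fix is to save and restore the irq state instead of
unconditionally re-enabling it; a sketch (the lock walked here is
illustrative):

  static void print_tasks_safely(void)
  {
          unsigned long flags;

          read_lock_irqsave(&tasklist_lock, flags);
          /* ... iterate and print tasks ... */
          read_unlock_irqrestore(&tasklist_lock, flags);
  }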
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
schedstat is useful in investigating CPU scheduler behavior. Ideally,
I think it is beneficial to have it on all the time. However, the
cost of turning it on in a production system is quite high, largely due
to the number of events it collects and also due to its large memory
footprint.
Most of the fields probably don't need to be full 64-bit on a 64-bit
arch. Rolling over 4 billion events will most likely take a long time,
and user space tools can be made to accommodate that. I'm proposing that
the kernel cut back the width of most of these variables on 64-bit
systems. (Note: the following patch doesn't affect 32-bit systems.)
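For illustration, a sketch of the kind of narrowing proposed (field
names illustrative): time accumulators stay 64-bit, while pure event
counters shrink to 32 bits:

  struct sched_info_sketch {
          u64             run_delay;      /* nsecs overflow 32 bits fast */
          unsigned int    pcount;         /* event count: 32 bits give
                                             ~4e9 events before wrapping */
  };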
Signed-off-by: Ken Chen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
In general, struct file_operations are const in the kernel, to avoid
false cacheline sharing and to catch accidental writes to them at
compile time. The new scheduler code introduces a new non-const one;
fix this up.
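The pattern, for reference (handler names as in kernel/sched_debug.c;
treat this as a sketch):

  static const struct file_operations sched_debug_fops = {
          .open           = sched_debug_open,
          .read           = seq_read,
          .llseek         = seq_lseek,
          .release        = single_release,
  };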
Signed-off-by: Arjan van de Ven <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
add new migration statistics when SCHED_DEBUG and SCHEDSTATS are
enabled. Available in /proc/<PID>/sched.
Signed-off-by: Ingo Molnar <[email protected]>
---
increase width of debug line - in preparation for more debugging info.
Signed-off-by: Ingo Molnar <[email protected]>
---
Add tunables in sysfs to modify a user's cpu share.
A directory is created in sysfs for each new user in the system.
/sys/kernel/uids/<uid>/cpu_share
Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.
Ex:
# cd /sys/kernel/uids/
# cat 512/cpu_share
1024
# echo 2048 > 512/cpu_share
# cat 512/cpu_share
2048
#
Signed-off-by: Srivatsa Vaddagiri <[email protected]>
Signed-off-by: Dhaval Giani <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
cleanup: rename task_grp to task_group. No need to save two characters
and 'grp' is annoying to read.
Signed-off-by: Ingo Molnar <[email protected]>
---
Fix coding style issues reported by Randy Dunlap and others.
Signed-off-by: Dhaval Giani <[email protected]>
Signed-off-by: Srivatsa Vaddagiri <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
speed up and simplify vslice calculations.
[ From: Mike Galbraith <[email protected]>: build fix ]
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
rename all 'cnt' fields and variables to the less yucky 'count' name.
yuckage noticed by Andrew Morton.
no change in code, other than the /proc/sched_debug bkl_count string
getting a bit larger:
text data bss dec hex filename
38236 3506 24 41766 a326 sched.o.before
38240 3506 24 41770 a32a sched.o.after
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
debug feature: check how well we schedule within a reasonable
vruntime 'spread' range. (note that CPU overload can increase
the spread, so this is not a hard condition, but normal loads
should be within the spread.)
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
more width for parameter printouts in /proc/sched_debug.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
print the current value of all tunables in /proc/sched_debug output.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
build fix for the SCHED_DEBUG && !SCHEDSTATS case.
Signed-off-by: S.Çağlar Onur <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
add per task and per rq BKL usage statistics.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
Enable user-id based fair group scheduling. This is useful for anyone
who wants to test the group scheduler w/o having to enable
CONFIG_CGROUPS.
A separate scheduling group (i.e. struct task_grp) is automatically created for
every new user added to the system. Upon uid change for a task, it is made to
move to the corresponding scheduling group.
A /proc tunable (/proc/root_user_share) is also provided to tune root
user's quota of cpu bandwidth.
Signed-off-by: Srivatsa Vaddagiri <[email protected]>
Signed-off-by: Dhaval Giani <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
- print nr_running and load information for cfs_rq in /proc/sched_debug
Signed-off-by: Srivatsa Vaddagiri <[email protected]>
Signed-off-by: Dhaval Giani <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
fix formatting of /proc/sched_debug
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
enhance debug output by changing 12345678 nsecs to 12.345678 output;
this is more human-readable.
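A simplified sketch of the split (the in-tree nsec_high()/nsec_low()
helpers also handle negative values):

  static unsigned long long nsec_high(unsigned long long nsec)
  {
          do_div(nsec, 1000000);          /* whole milliseconds */
          return nsec;
  }

  static unsigned long nsec_low(unsigned long long nsec)
  {
          return do_div(nsec, 1000000);   /* fractional nsecs */
  }

  #define SPLIT_NS(x) nsec_high(x), nsec_low(x)

  /* usage: SEQ_printf(m, "%llu.%06lu\n", SPLIT_NS(ns)); */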
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
print the correct amount of dashes in /proc/sched_debug.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
Get rid of 'sched_entity::fair_key'.
As a side effect, 'current' is not kept within the tree for
SCHED_NORMAL/BATCH tasks anymore. This simplifies some parts of the code
(e.g. entity_tick() and yield_task_fair()) and also somewhat optimizes
them (e.g. a single update_curr() now vs. dequeue/enqueue() before in
entity_tick()).
Signed-off-by: Dmitry Adamushko <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
remove wait_runtime-based fields and features, now that the CFS
math has been changed over to the vruntime metric.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
remove the wait_runtime-limit fields and the code depending on it, now
that the math has been changed over to rely on the vruntime metric.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
'struct load_stat' is redundant now so let's get rid of it.
Signed-off-by: Dmitry Adamushko <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
add more vruntime statistics.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
debug se->vruntime fields.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
---
CPU load calculations are statistical anyway, and there's little benefit
from having them calculated on every scheduling event. So remove this code;
it gets rid of a divide from the scheduler wakeup and context-switch
fastpath.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
track the maximum amount of time a task has executed while
the CPU load was at least 2x. (i.e. at least two nice-0
tasks were runnable)
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
small kernel/sched_debug.c cleanup - break up
multi-variable assignment.
no code changed:
text data bss dec hex filename
38869 3550 24 42443 a5cb sched.o.before
38869 3550 24 42443 a5cb sched.o.after
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
when cleaning sched-stats also clear prev_sum_exec_runtime.
Signed-off-by: Ingo Molnar <[email protected]>
---
construct a more or less wall-clock time out of sched_clock(), by
using ACPI-idle's existing knowledge about how much time we spent
idling. This allows the rq clock to work around TSC-stops-in-C2,
TSC-gets-corrupted-in-C3 types of problems.
( Besides the scheduler's statistics this also benefits blktrace and
printk-timestamps as well. )
Furthermore, the precise before-C2/C3-sleep and after-C2/C3-wakeup
callbacks allow the scheduler to get the most out of the period where
the CPU has a reliable TSC. This results in slightly more precise
task statistics.
the ACPI bits were acked by Len.
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Len Brown <[email protected]>
---
Arjan van de Ven pointed out that we should not print kernel addresses
in world-readable /proc files - fix that.
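For illustration (the format strings are illustrative, not the exact
hunk):

  /* before: leaks a kernel pointer into a world-readable file */
  SEQ_printf(m, "%15s %5d %p\n", p->comm, p->pid, p);

  /* after: name and pid identify the task well enough */
  SEQ_printf(m, "%15s %5d\n", p->comm, p->pid);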
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Arjan van de Ven <[email protected]>
---
remove the 'u64 now' parameter from sched_debug.c:print_task()/_rq().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <[email protected]>
---
remove the 'u64 now' parameter from print_cfs_rq().
( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <[email protected]>
---
C99 6.10.3[11]: preprocessing directive within the argument list of
macro invocation => undefined behaviour. Don't do that...
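An illustration of the problem and the fix (symbols illustrative):

  /* undefined behaviour: #ifdef inside a macro's argument list */
  SEQ_printf(m,
  #ifdef CONFIG_SCHEDSTATS
             "  .%-30s: %llu (schedstats)\n",
  #else
             "  .%-30s: %llu\n",
  #endif
             name, value);

  /* well-defined: the directive stays outside the invocation */
  #ifdef CONFIG_SCHEDSTATS
          SEQ_printf(m, "  .%-30s: %llu (schedstats)\n", name, value);
  #else
          SEQ_printf(m, "  .%-30s: %llu\n", name, value);
  #endif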
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
move the rest of the debugging/instrumentation code under
CONFIG_SCHEDSTATS too. This reduces code size and speeds code up:
text data bss dec hex filename
33044 4122 28 37194 914a sched.o.before
32708 4122 28 36858 8ffa sched.o.after
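The pattern, sketched with sched_debug.c's P() printing macro (the
field choice is illustrative):

  #ifdef CONFIG_SCHEDSTATS
          P(se.wait_max);
          P(se.sleep_max);
          P(se.block_max);
  #endif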
Signed-off-by: Ingo Molnar <[email protected]>