| Age | Commit message (Collapse) | Author | Files | Lines |
|
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
NFSv4: Fix a typo in nfs_inode_reclaim_delegation
NFS: Add a boot parameter to disable 64 bit inode numbers
NFS: nfs_refresh_inode should clear cache_validity flags on success
NFS: Fix a connectathon regression in NFSv3 and NFSv4
NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
SUNRPC: Don't call xprt_release in call refresh
SUNRPC: Don't call xprt_release() if call_allocate fails
SUNRPC: Fix buggy UDP transmission
[23/37] Clean up duplicate includes in
[2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
SUNRPC: Use correct type in buffer length calculations
SUNRPC: Fix default hostname created in rpc_create()
nfs: add server port to rpc_pipe info file
NFS: Get rid of some obsolete macros
NFS: Simplify filehandle revalidation
NFS: Ensure that nfs_link() returns a hashed dentry
NFS: Be strict about dentry revalidation when doing exclusive create
NFS: Don't zap the readdir caches upon error
NFS: Remove the redundant nfs_reval_fsid()
NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
...
Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep
* 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
lockdep: annotate dir vs file i_mutex
lockdep: per filesystem inode lock class
lockdep: annotate kprobes irq fiddling
lockdep: annotate rcu_read_{,un}lock{,_bh}
lockdep: annotate journal_start()
lockdep: s390: connect the sysexit hook
lockdep: x86_64: connect the sysexit hook
lockdep: i386: connect the sysexit hook
lockdep: syscall exit check
lockdep: fixup mutex annotations
lockdep: fix mismatched lockdep_depth/curr_chain_hash
lockdep: Avoid /proc/lockdep & lock_stat infinite output
lockdep: maintainers
|
|
make sure sync wakeups preempt too - the scheduler will not
overschedule as we've got various throttles against that.
As a result, sync wakeups can be used more widely in the kernel
(to signal wakeup affinity between tasks), and no arbitrary
latencies will be introduced either.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)
Signed-off-by: Ingo Molnar <[email protected]>
|
|
modify account_system_time() to add cputime to cpustat->guest if we are
running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code
although it is executed in the kernel. We duplicate VCPU time between
guest and user to allow an unmodified "top(1)" to display correct value.
A modified "top(1)" is able to display good cpu user time and cpu guest
time by subtracting cpu guest time from cpu user time. Update "gtime" in
task_struct accordingly.
Signed-off-by: Laurent Vivier <[email protected]>
Acked-by: Avi Kivity <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct.
Modify /proc/<pid>/stat to display these new fields.
Signed-off-by: Laurent Vivier <[email protected]>
Acked-by: Avi Kivity <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
we had an incorrect-terminator bug in sd_alloc_ctl_domain_table()
before, so add a comment that documents it.
Signed-off-by: Milton Miller <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that we are calling this at runtime, a more relaxed error path is
suggested. If an allocation fails, we just register the partial table,
which will show empty directories.
Signed-off-by: Milton Miller <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Unregister and free the sysctl table before destroying domains, then
rebuild and register after creating the new domains. This prevents the
sysctl table from pointing to freed memory for root to write.
Signed-off-by: Milton Miller <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
init_sched_domain_sysctl was walking cpus 0-n and referencing per_cpu
variables. If the cpus_possible mask is not contigious this will result
in a crash referencing unallocated data. If the online mask is not
contigious then we would show offline cpus and miss online ones.
Signed-off-by: Milton Miller <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
kcalloc checks for n * sizeof(element) overflows and it zeros.
Signed-off-by: Milton Miller <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In general, struct file_operations are const in the kernel, to not have
false cacheline sharing and to catch bugs at compiletime with accidental
writes to them. The new scheduler code introduces a new non-const one;
fix this up.
Signed-off-by: Arjan van de Ven <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
allow the immediate migration of cache-cold tasks.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
add new migration statistics when SCHED_DEBUG and SCHEDSTATS
is enabled. Available in /proc/<PID>/sched.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
increase width of debug line - in preparation of more debugging info.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
activate task_hot() only for fair-scheduled tasks (i.e. disable it
for RT tasks).
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
reintroduce a simplified version of cache-hot/cold scheduling
affinity. This improves performance with certain SMP workloads,
such as sysbench.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
speed up context-switches a bit by not clearing p->exec_start.
(as a side-effect, this also makes p->exec_start a universal timestamp
available to cache-hot estimations.)
Signed-off-by: Ingo Molnar <[email protected]>
|
|
do not wakeup-preempt with SCHED_BATCH tasks, their preemption
is batched too, driven by the tick.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Generate uevents when a user is being created/destroyed. These events
can be used to configure cpu share of a new user.
Signed-off-by: Srivatsa Vaddagiri <[email protected]>
Signed-off-by: Dhaval Giani <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.
pointed out by Andi Kleen.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
remove stale comment from sched_group_set_shares().
Function never returns -EINVAL.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
clean up is_migration_thread() and turn it into an inline function.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Replace a particularly ugly ifdef with an inline and a new macro.
Also split up the function to be easier to read.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Refactor common code of sleep_on / wait_for_completion
These functions were largely cut'n'pasted. This moves
the common code into single helpers instead. Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...;
No functional changes.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
update comment: clarify time-slices and remove obsolete tuning detail.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Prevent wakeup over-scheduling. Once a task has been preempted by a
task of the same or lower priority, it becomes ineligible for repeated
preemption by same until it has been ticked, or slept. Instead, the
task is marked for preemption at the next tick. Tasks of higher
priority still preempt immediately.
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Implement feature bit to disable forced preemption. This way
it can be checked whether a workload is overscheduling or not.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
seems to break GROUP_SCHED. Although, it may be 'oops'-less due to the
possibility of 'p' being always a valid address.
Signed-off-by: Dmitry Adamushko <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
cache_nice_tries and flags entry do not appear in proc fs sched_domain
directory, because ctl_table entry is skipped.
This patch fixes the issue.
Signed-off-by: Zou Nan hai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
yield() in wait_task_inactive(), can cause a high priority thread to be
scheduled back in, and there by loop forever while it is waiting for some
lower priority thread which is unfortunately still on the runqueue.
Use schedule_timeout_uninterruptible(1) instead.
Signed-off-by: Gautham R Shenoy <[email protected]>
Credit: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add tunables in sysfs to modify a user's cpu share.
A directory is created in sysfs for each new user in the system.
/sys/kernel/uids/<uid>/cpu_share
Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.
Ex:
# cd /sys/kernel/uids/
# cat 512/cpu_share
1024
# echo 2048 > 512/cpu_share
# cat 512/cpu_share
2048
#
Signed-off-by: Srivatsa Vaddagiri <[email protected]>
Signed-off-by: Dhaval Giani <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
disable sleeper fairness for batch tasks - they are about
batch processing after all.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
unit mis-match: wakeup_gran was used against a vruntime
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
export cpu_clock() - the preferred API instead of sched_clock().
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
noticed by Peter Zijlstra:
fix: move the CPU check into ->task_new_fair(), this way we
can call place_entity() and get child ->vruntime right at
initial wakeup time.
(without this there can be large latencies)
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
|
|
noticed by Thomas Gleixner:
cleanup: function prototype cleanups - move into single line
wherever possible.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
cleanup: rename task_grp to task_group. No need to save two characters
and 'grp' is annoying to read.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG, to
make SCHED_FEAT_ names more consistent.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
kfree(NULL) is valid.
pointed out by checkpatch.pl.
the fix shrinks the code a bit:
text data bss dec hex filename
40024 3842 100 43966 abbe sched.o.before
40002 3842 100 43944 aba8 sched.o.after
Signed-off-by: Ingo Molnar <[email protected]>
|
|
fix up __setup() style bug - noticed via checkpatch.pl.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
checkpatch.pl and Andy Whitcroft noticed the following bug: we did
not break out after printing an error.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
run sched_domain_debug() if CONFIG_SCHED_DEBUG=y, instead
of relying on the hand-crafted SCHED_DOMAIN_DEBUG switch.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
make dequeue_entity() / enqueue_entity() and update_stats_dequeue() /
update_stats_enqueue() look similar, structure-wise.
zero effect, functionality-wise:
text data bss dec hex filename
34550 3026 100 37676 932c sched.o.before
34550 3026 100 37676 932c sched.o.after
Signed-off-by: Dmitry Adamushko <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
remove obsolete code -- calc_weighted()
Signed-off-by: Dmitry Adamushko <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
- make timeslices of SCHED_RR tasks constant and not
dependent on task's static_prio [1] ;
- remove obsolete code (timeslice related bits);
- make sched_rr_get_interval() return something more
meaningful [2] for SCHED_OTHER tasks.
[1] according to the following link, it's not compliant with SUSv3
(not sure though, what is the reference for us :-)
http://lkml.org/lkml/2007/3/7/656
[2] the interval is dynamic and can be depicted as follows "should a
task be one of the runnable tasks at this particular moment, it would
expect to run for this interval of time before being re-scheduled by the
scheduler tick".
(i.e. it's more precise if a task is runnable at the moment)
yeah, this seems to require task_rq_lock/unlock() but this is not a hot
path.
results:
(SCHED_FIFO)
dimm@earth:~/storage/prog$ sudo chrt -f 10 ./rr_interval
time_slice: 0 : 0
(SCHED_RR)
dimm@earth:~/storage/prog$ sudo chrt 10 ./rr_interval
time_slice: 0 : 99984800
(SCHED_NORMAL)
dimm@earth:~/storage/prog$ ./rr_interval
time_slice: 0 : 19996960
(SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so should be a half of the previous result)
dimm@earth:~/storage/prog$ taskset 1 ./rr_interval
time_slice: 0 : 9998480
Signed-off-by: Dmitry Adamushko <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
* save ~300 bytes
* activate_idle_task() was moved to avoid a warning
bloat-o-meter output:
add/remove: 6/0 grow/shrink: 0/16 up/down: 438/-733 (-295) <===
function old new delta
__enqueue_entity - 165 +165
finish_task_switch - 110 +110
update_curr_rt - 79 +79
__load_balance_iterator - 32 +32
__task_rq_unlock - 28 +28
find_process_by_pid - 24 +24
do_sched_setscheduler 133 123 -10
sys_sched_rr_get_interval 176 165 -11
sys_sched_getparam 156 145 -11
normalize_rt_tasks 482 470 -12
sched_getaffinity 112 99 -13
sys_sched_getscheduler 86 72 -14
sched_setaffinity 226 212 -14
sched_setscheduler 666 642 -24
load_balance_start_fair 33 9 -24
load_balance_next_fair 33 9 -24
dequeue_task_rt 133 67 -66
put_prev_task_rt 97 28 -69
schedule_tail 133 50 -83
schedule 682 594 -88
enqueue_entity 499 366 -133
task_new_fair 317 180 -137
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
tweak wakeup granularity.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
optimize schedule() a bit on SMP, by moving the rq-clock update
outside the rq lock.
code size is the same:
text data bss dec hex filename
25725 2666 96 28487 6f47 sched.o.before
25725 2666 96 28487 6f47 sched.o.after
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
|