aboutsummaryrefslogtreecommitdiff
path: root/kernel/delayacct.c
AgeCommit message (Collapse)AuthorFilesLines
2018-10-26delayacct: track delays from thrashing cache pagesJohannes Weiner1-0/+15
Delay accounting already measures the time a task spends in direct reclaim and waiting for swapin, but in low memory situations tasks spend can spend a significant amount of their time waiting on thrashing page cache. This isn't tracked right now. To know the full impact of memory contention on an individual task, measure the delay when waiting for a recently evicted active cache page to read back into memory. Also update tools/accounting/getdelays.c: [hannes@computer accounting]$ sudo ./getdelays -d -p 1 print delayacct stats ON PID 1 CPU count real total virtual total delay total delay average 50318 745000000 847346785 400533713 0.008ms IO count delay total delay average 435 122601218 0ms SWAP count delay total delay average 0 0 0ms RECLAIM count delay total delay average 0 0 0ms THRASHING count delay total delay average 19 12621439 0ms Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Daniel Drake <[email protected]> Tested-by: Suren Baghdasaryan <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Enderborg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-27delayacct: Use raw_spinlocksSebastian Andrzej Siewior1-8/+9
try_to_wake_up() might invoke delayacct_blkio_end() while holding the pi_lock (which is a raw_spinlock_t). delayacct_blkio_end() acquires task_delay_info.lock which is a spinlock_t. This causes a might sleep splat on -RT where non raw spinlocks are converted to 'sleeping' spinlocks. task_delay_info.lock is only held for a short amount of time so it's not a problem latency wise to make convert it to a raw spinlock. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Balbir Singh <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-16delayacct: Account blkio completion on the correct taskJosh Snyder1-16/+26
Before commit: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler") delayacct_blkio_end() was called after context-switching into the task which completed I/O. This resulted in double counting: the task would account a delay both waiting for I/O and for time spent in the runqueue. With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up(). In ttwu, we have not yet context-switched. This is more correct, in that the delay accounting ends when the I/O is complete. But delayacct_blkio_end() relies on 'get_current()', and we have not yet context-switched into the task whose I/O completed. This results in the wrong task having its delay accounting statistics updated. Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(), so that it can update the statistics of the correct task. Signed-off-by: Josh Snyder <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Balbir Singh <[email protected]> Cc: <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare to move cputime functionality from <linux/sched.h> ↵Ingo Molnar1-0/+1
into <linux/sched/cputime.h> Introduce a trivial, mostly empty <linux/sched/cputime.h> header to prepare for the moving of cputime functionality out of sched.h. Update all code that relies on these facilities. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare to move 'init_task' and 'init_thread_union' from ↵Ingo Molnar1-0/+1
<linux/sched.h> to <linux/sched/task.h> Update all usage sites first. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-02-01delaycct: Convert obsolete cputime type to nsecsFrederic Weisbecker1-5/+5
Use the new nsec based cputime accessors as part of the whole cputime conversion from cputime_t to nsecs. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Stanislaw Gruszka <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-02-01sched/cputime: Introduce special task_cputime_t() API to return old-typed ↵Frederic Weisbecker1-2/+2
cputime This API returns a task's cputime in cputime_t in order to ease the conversion of cputime internals to use nsecs units instead. Blindly converting all cputime readers to use this API now will later let us convert more smoothly and step by step all these places to use the new nsec based cputime. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Stanislaw Gruszka <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-01-14kmemcg: account certain kmem allocations to memcgVladimir Davydov1-1/+1
Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [[email protected]: coding-style fixes] Signed-off-by: Vladimir Davydov <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-07-23delayacct: Remove braindamaged type conversionsThomas Gleixner1-11/+7
Converting cputime to timespec and timespec to nanoseconds makes no sense. Use cputime_to_ns() and be done with it. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: John Stultz <[email protected]>
2014-07-23delayacct: Make accounting nanosecond basedThomas Gleixner1-22/+12
Kill the timespec juggling and calculate with plain nanoseconds. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: John Stultz <[email protected]>
2014-06-12delayacct: Use ktime_get_ts()Thomas Gleixner1-13/+3
do_posix_clock_monotonic_gettime() is a leftover from the initial posix timer implementation which maps to ktime_get_ts(). Remove the silly wrapper while at it. Signed-off-by: Thomas Gleixner <[email protected]> Cc: John Stultz <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2013-11-13kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk()Chen Gang1-7/+0
The wrapper function delayacct_add_tsk() already checked 'tsk->delays', and __delayacct_add_tsk() has no another direct callers, so can remove the redundancy checking code. And the label 'done' is also useless, so remove it, too. Signed-off-by: Chen Gang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-01-27cputime: Use accessors to read task cputime statsFrederic Weisbecker1-2/+5
This is in preparation for the full dynticks feature. While remotely reading the cputime of a task running in a full dynticks CPU, we'll need to do some extra-computation. This way we can account the time it spent tickless in userspace since its last cputime snapshot. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Li Zhong <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]>
2011-07-14KVM: Steal time implementationGlauber Costa1-0/+2
To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM, while the vcpu had meaningful work to do - halt time does not count. This information is acquired through the run_delay field of delayacct/schedstats infrastructure, that counts time spent in a runqueue but not running. Steal time is a per-cpu information, so the traditional MSR-based infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the memory area address containing information about steal time This patch contains the hypervisor part of the steal time infrasructure, and can be backported independently of the guest portion. [avi, yongjie: export delayacct_on, to avoid build failures in some configs] Signed-off-by: Glauber Costa <[email protected]> Tested-by: Eric B Munson <[email protected]> CC: Rik van Riel <[email protected]> CC: Jeremy Fitzhardinge <[email protected]> CC: Peter Zijlstra <[email protected]> CC: Anthony Liguori <[email protected]> Signed-off-by: Yongjie Ren <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2009-09-18headers: taskstats_kern.h trimAlexey Dobriyan1-0/+1
Remove net/genetlink.h inclusion, now sched.c won't be recompiled because of some networking changes. Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-12-18schedstat: consolidate per-task cpu runtime statsKen Chen1-1/+1
Impact: simplify code When we turn on CONFIG_SCHEDSTATS, per-task cpu runtime is accumulated twice. Once in task->se.sum_exec_runtime and once in sched_info.cpu_time. These two stats are exactly the same. Given that task->se.sum_exec_runtime is always accumulated by the core scheduler, sched_info can reuse that data instead of duplicate the accounting. Signed-off-by: Ken Chen <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-07-25per-task-delay-accounting: update taskstats for memory reclaim delayKeika Kobayashi1-0/+3
Add members for memory reclaim delay to taskstats, and accumulate them in __delayacct_add_tsk() . Signed-off-by: Keika Kobayashi <[email protected]> Cc: Hiroshi Shimamoto <[email protected]> Cc: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-07-25per-task-delay-accounting: add memory reclaim delayKeika Kobayashi1-0/+13
Sometimes, application responses become bad under heavy memory load. Applications take a bit time to reclaim memory. The statistics, how long memory reclaim takes, will be useful to measure memory usage. This patch adds accounting memory reclaim to per-task-delay-accounting for accounting the time of do_try_to_free_pages(). <i.e> - When System is under low memory load, memory reclaim may not occur. $ free total used free shared buffers cached Mem: 8197800 1577300 6620500 0 4808 1516724 -/+ buffers/cache: 55768 8142032 Swap: 16386292 0 16386292 $ vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 5069748 10612 3014060 0 0 0 0 3 26 0 0 100 0 0 0 0 5069748 10612 3014060 0 0 0 0 4 22 0 0 100 0 0 0 0 5069748 10612 3014060 0 0 0 0 3 18 0 0 100 0 Measure the time of tar command. $ ls -s test.dat 1501472 test.dat $ time tar cvf test.tar test.dat real 0m13.388s user 0m0.116s sys 0m5.304s $ ./delayget -d -p <pid> CPU count real total virtual total delay total 428 5528345500 5477116080 62749891 IO count delay total 338 8078977189 SWAP count delay total 0 0 RECLAIM count delay total 0 0 - When system is under heavy memory load memory reclaim may occur. $ vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 7159032 49724 1812 3012 0 0 0 0 3 24 0 0 100 0 0 0 7159032 49724 1812 3012 0 0 0 0 4 24 0 0 100 0 0 0 7159032 49848 1812 3012 0 0 0 0 3 22 0 0 100 0 In this case, one process uses more 8G memory by execution of malloc() and memset(). $ time tar cvf test.tar test.dat real 1m38.563s <- increased by 85 sec user 0m0.140s sys 0m7.060s $ ./delayget -d -p <pid> CPU count real total virtual total delay total 9021 7140446250 7315277975 923201824 IO count delay total 8965 90466349669 SWAP count delay total 3 21036367 RECLAIM count delay total 740 61011951153 In the later case, the value of RECLAIM is increasing. So, taskstats can show how much memory reclaim influences TAT. Signed-off-by: Keika Kobayashi <[email protected]> Acked-by: Balbir Singh <[email protected]> Acked-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-18Add scaled time to taskstats based process accountingMichael Neuling1-0/+6
This adds items to the taststats struct to account for user and system time based on scaling the CPU frequency and instruction issue rates. Adds account_(user|system)_time_scaled callbacks which architectures can use to account for time using this mechanism. Signed-off-by: Michael Neuling <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Jay Lan <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-10-15sched: clean up schedstats, cnt -> countIngo Molnar1-1/+1
rename all 'cnt' fields and variables to the less yucky 'count' name. yuckage noticed by Andrew Morton. no change in code, other than the /proc/sched_debug bkl_count string got a bit larger: text data bss dec hex filename 38236 3506 24 41766 a326 sched.o.before 38240 3506 24 41770 a32a sched.o.after Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2007-07-09sched: update delay-accounting to use CFS's precise statsBalbir Singh1-5/+5
update delay-accounting to use CFS's precise stats. Signed-off-by: Ingo Molnar <[email protected]>
2007-05-07KMEM_CACHE(): simplify slab cache creationChristoph Lameter1-5/+1
This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-12-07[PATCH] slab: remove kmem_cache_tChristoph Lameter1-1/+1
Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-12-07[PATCH] slab: remove SLAB_KERNELChristoph Lameter1-1/+1
SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-11-06[PATCH] lockdep: fix delayacct locking bugPeter Zijlstra1-6/+9
Make the delayacct lock irqsave; this avoids the possible deadlock where an interrupt is taken while holding the delayacct lock which needs to take the delayacct lock. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Shailabh Nagar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-09-01[PATCH] task delay accounting fixesShailabh Nagar1-16/+0
Cleanup allocation and freeing of tsk->delays used by delay accounting. This solves two problems reported for delay accounting: 1. oops in __delayacct_blkio_ticks http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1844.html Currently tsk->delays is getting freed too early in task exit which can cause a NULL tsk->delays to get accessed via reading of /proc/<tgid>/stats. The patch fixes this problem by freeing tsk->delays closer to when task_struct itself is freed up. As a result, it also eliminates the use of tsk->delays_lock which was only being used (inadequately) to safeguard access to tsk->delays while a task was exiting. 2. Possible memory leak in kernel/delayacct.c http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1389.html The patch cleans up tsk->delays allocations after a bad fork which was missing earlier. The patch has been tested to fix the problems listed above and stress tested with rapid calls to delay accounting's taskstats command interface (which is the other path that can access the same data, besides the /proc interface causing the oops above). Signed-off-by: Shailabh Nagar <[email protected]> Cc: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-07-31[PATCH] delay accounting: temporarily enable by defaultShailabh Nagar1-4/+4
Enable delay accounting by default so that feature gets coverage testing without requiring special measures. Earlier, it was off by default and had to be enabled via a boot time param. This patch reverses the default behaviour to improve coverage testing. It can be removed late in the kernel development cycle if its believed users shouldn't have to incur any cost if they don't want delay accounting. Or it can be retained forever if the utility of the stats is deemed common enough to warrant keeping the feature on. Signed-off-by: Shailabh Nagar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-07-14[PATCH] per-task-delay-accounting: /proc export of aggregated block I/O delaysShailabh Nagar1-0/+12
Export I/O delays seen by a task through /proc/<tgid>/stats for use in top etc. Note that delays for I/O done for swapping in pages (swapin I/O) is clubbed together with all other I/O here (this is not the case in the netlink interface where the swapin I/O is kept distinct) [[email protected]: printk warning fix] Signed-off-by: Shailabh Nagar <[email protected]> Signed-off-by: Balbir Singh <[email protected]> Cc: Jes Sorensen <[email protected]> Cc: Peter Chubb <[email protected]> Cc: Erich Focht <[email protected]> Cc: Levent Serinol <[email protected]> Cc: Jay Lan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-07-14[PATCH] per-task-delay-accounting: delay accounting usage of taskstats interfaceShailabh Nagar1-1/+61
Usage of taskstats interface by delay accounting. Signed-off-by: Shailabh Nagar <[email protected]> Signed-off-by: Balbir Singh <[email protected]> Cc: Jes Sorensen <[email protected]> Cc: Peter Chubb <[email protected]> Cc: Erich Focht <[email protected]> Cc: Levent Serinol <[email protected]> Cc: Jay Lan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-07-14[PATCH] per-task-delay-accounting: sync block I/O and swapin delay collectionShailabh Nagar1-0/+19
Unlike earlier iterations of the delay accounting patches, now delays are only collected for the actual I/O waits rather than try and cover the delays seen in I/O submission paths. Account separately for block I/O delays incurred as a result of swapin page faults whose frequency can be affected by the task/process' rss limit. Hence swapin delays can act as feedback for rss limit changes independent of I/O priority changes. Signed-off-by: Shailabh Nagar <[email protected]> Signed-off-by: Balbir Singh <[email protected]> Cc: Jes Sorensen <[email protected]> Cc: Peter Chubb <[email protected]> Cc: Erich Focht <[email protected]> Cc: Levent Serinol <[email protected]> Cc: Jay Lan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-07-14[PATCH] per-task-delay-accounting: setupShailabh Nagar1-0/+87
Initialization code related to collection of per-task "delay" statistics which measure how long it had to wait for cpu, sync block io, swapping etc. The collection of statistics and the interface are in other patches. This patch sets up the data structures and allows the statistics collection to be disabled through a kernel boot parameter. Signed-off-by: Shailabh Nagar <[email protected]> Signed-off-by: Balbir Singh <[email protected]> Cc: Jes Sorensen <[email protected]> Cc: Peter Chubb <[email protected]> Cc: Erich Focht <[email protected]> Cc: Levent Serinol <[email protected]> Cc: Jay Lan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>