path: root/kernel/time/posix-cpu-timers.c
Age | Commit message | Author | Files | Lines
2016-03-02 | posix-cpu-timers: Migrate to use new tick dependency mask model | Frederic Weisbecker | 1 | -41/+11
Instead of providing asynchronous checks for the nohz subsystem to verify posix cpu timers tick dependency, migrate the latter to the new mask.

In order to keep track of the running timers and expose the tick dependency accordingly, we must probe the timer queuing and dequeuing on the thread and process lists. Unfortunately this implies both task and signal level dependencies. We should be able to further optimize this and merge all of that into the task level dependency, at the cost of a bit of complexity and maybe some overhead.

Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
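As a minimal sketch of the new model: when a per-thread posix CPU timer is armed we publish a tick dependency, and clear it when the last one is removed. The helper names below are hypothetical; only tick_dep_set_task()/tick_dep_clear_task() and TICK_DEP_BIT_POSIX_TIMER come from the tick infrastructure, and the real patch also manages the signal (process-wide) dependency.

  #include <linux/sched.h>
  #include <linux/tick.h>

  /* Hypothetical helpers: publish the per-task tick dependency when a
   * per-thread posix CPU timer is armed, clear it when the last one goes. */
  static void sketch_arm_thread_timer(struct task_struct *tsk)
  {
          tick_dep_set_task(tsk, TICK_DEP_BIT_POSIX_TIMER);
  }

  static void sketch_disarm_last_thread_timer(struct task_struct *tsk)
  {
          tick_dep_clear_task(tsk, TICK_DEP_BIT_POSIX_TIMER);
  }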
2015-10-15 | posix_cpu_timer: Reduce unnecessary sighand lock contention | Jason Low | 1 | -2/+24
It was found while running a database workload on large systems that significant time was spent trying to acquire the sighand lock.

The issue was that whenever an itimer expired, many threads ended up simultaneously trying to send the signal. Most of the time, nothing happened after acquiring the sighand lock because another thread had already sent the signal and updated the "next expire" time. fastpath_timer_check() didn't help much, since the "next expire" time was only updated after the threads exited fastpath_timer_check().

This patch addresses this by having the thread_group_cputimer structure maintain a boolean to signify when a thread in the group is already checking for process wide timers, and adds extra logic in the fastpath to check that boolean.

Signed-off-by: Jason Low <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
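A rough sketch of the added fastpath check, assuming the new boolean is the 'checking_timer' field on struct thread_group_cputimer (simplified from the real fastpath_timer_check()/check_process_timers() logic):

  #include <linux/sched.h>

  /* Sketch: avoid contending on sighand when another thread in the group is
   * already walking the process-wide timers. */
  static bool sketch_should_check_process_timers(struct task_struct *tsk)
  {
          struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;

          if (!READ_ONCE(cputimer->running))
                  return false;

          /* Another thread is already checking; it will update "next expire". */
          if (READ_ONCE(cputimer->checking_timer))
                  return false;

          return true;
  }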
2015-10-15 | posix_cpu_timer: Convert cputimer->running to bool | Jason Low | 1 | -2/+2
In the next patch in this series, a new field 'checking_timer' will be added to 'struct thread_group_cputimer'. Both that field and the existing 'running' integer field are only ever used as boolean values, so to save space in the structure we can make both of them booleans.

This is a preparatory patch that converts the existing 'running' integer field to a boolean.

Suggested-by: George Spelvin <[email protected]>
Signed-off-by: Jason Low <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
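Sketched out, the declaration after this patch and its follow-up looks roughly like the following (types mirror the definitions in include/linux/sched.h; layout approximate):

  /* sketch of the change to struct thread_group_cputimer */
  struct thread_group_cputimer {
          struct task_cputime_atomic cputime_atomic;
          bool running;          /* was: int running; */
          bool checking_timer;   /* to be added by the next patch in the series */
  };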
2015-10-15 | posix_cpu_timer: Check thread timers only when there are active thread timers | Jason Low | 1 | -6/+16
fastpath_timer_check() contains logic to check whether any timers are set by checking !task_cputime_zero(). We can do the same before calling check_thread_timers(): in the case where there are only process-wide timers, this skips all of the per-thread timer computations when no per-thread timers are armed.

As suggested by George, the task_cputime_zero() check goes into check_thread_timers() itself, since it is more of an optimization of that function. Similarly, the existing check of cputimer->running moves into check_process_timers().

Signed-off-by: Jason Low <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
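A minimal sketch of the early return as it would sit in this file (simplified: the real check_thread_timers() also walks the per-clock timer lists and handles RLIMIT_RTTIME; task_cputime_zero() is the existing local helper):

  static void sketch_check_thread_timers(struct task_struct *tsk,
                                         struct list_head *firing)
  {
          struct task_cputime *tsk_expires = &tsk->cputime_expires;

          /* No per-thread timers armed: skip all per-thread computations. */
          if (task_cputime_zero(tsk_expires))
                  return;

          /* ... expire per-thread timers against *tsk_expires ... */
  }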
2015-10-15 | posix_cpu_timer: Optimize fastpath_timer_check() | Jason Low | 1 | -8/+3
In fastpath_timer_check(), the task_cputime() function is always called to compute the utime and stime values. However, this is not necessary if there are no per-thread timers to check for. This patch modifies the code such that we compute the task_cputime values only when there are per-thread timers set.

Signed-off-by: Jason Low <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
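Roughly, the optimized per-thread part of the fastpath looks like the sketch below (simplified from fastpath_timer_check() as it would sit in this file; the process-wide check is omitted and task_cputime_expired() is the existing local helper):

  static int sketch_fastpath_timer_check(struct task_struct *tsk)
  {
          struct task_cputime *tsk_expires = &tsk->cputime_expires;

          if (!task_cputime_zero(tsk_expires)) {
                  struct task_cputime task_sample;

                  /* Sampling utime/stime now only happens when needed. */
                  task_cputime(tsk, &task_sample.utime, &task_sample.stime);
                  task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;
                  if (task_cputime_expired(&task_sample, tsk_expires))
                          return 1;
          }
          return 0;
  }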
2015-05-08 | sched, timer: Use the atomic task_cputime in thread_group_cputimer | Jason Low | 1 | -13/+13
Recent optimizations were made to thread_group_cputimer to improve its scalability by keeping track of cputime stats without a lock. However, the values were open coded into the structure, placing them at a different abstraction level from the regular task_cputime structure. Furthermore, any subsequent similar optimizations would not be able to share the new code, since they are specific to thread_group_cputimer.

This patch adds the new task_cputime_atomic data structure (introduced in the previous patch in the series) to thread_group_cputimer for keeping track of the cputime atomically, which also helps generalize the code.

Suggested-by: Ingo Molnar <[email protected]>
Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Waiman Long <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
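For reference, a sketch of the resulting structures, mirroring the definitions that live in include/linux/sched.h in this series (field layout approximate):

  #include <linux/atomic.h>

  /* atomic mirror of struct task_cputime, added by the previous patch */
  struct task_cputime_atomic {
          atomic64_t utime;
          atomic64_t stime;
          atomic64_t sum_exec_runtime;
  };

  /* thread_group_cputimer now embeds it instead of three open coded fields */
  struct thread_group_cputimer {
          struct task_cputime_atomic cputime_atomic;
          int running;
  };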
2015-05-08 | sched, timer: Replace spinlocks with atomics in thread_group_cputimer(), to improve scalability | Jason Low | 1 | -29/+50
While running a database workload, we found a scalability issue with itimers. Much of the problem was caused by the thread_group_cputimer spinlock: each time we account for group system/user time, we need to obtain the thread_group_cputimer's spinlock to update the timers. On larger systems (such as a 16-socket machine), more than 30% of total time was spent trying to obtain this kernel lock to update the group timer stats.

This patch converts the timers to 64-bit atomic variables and uses atomic adds to update them without a lock. With this patch, the share of total time spent updating thread group cputimer timers was reduced from 30% down to less than 1%.

Note: On 32-bit systems using the generic 64-bit atomics, this causes sample_group_cputimer() to take locks 3 times instead of just 1 time. However, we tested this patch on a 32-bit ARM system using the generic atomics and did not find the overhead to be much of an issue. One explanation for why this isn't a problem is that 32-bit systems usually have small numbers of CPUs, so cacheline contention from the extra periodic spinlock acquisitions is not really apparent on smaller systems.

Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Waiman Long <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
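A minimal sketch of the lockless update side, assuming the atomic fields from the companion patch (the real updates happen in the account_group_*() helpers, and the sample/rollup side is more involved):

  #include <linux/atomic.h>
  #include <linux/sched.h>

  static void sketch_account_group_exec_runtime(struct task_struct *tsk,
                                                unsigned long long ns)
  {
          struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;

          if (!READ_ONCE(cputimer->running))
                  return;

          /* A 64-bit atomic add replaces taking the per-group spinlock. */
          atomic64_add(ns, &cputimer->cputime_atomic.sum_exec_runtime);
  }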
2015-05-08 | sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() | Jason Low | 1 | -4/+4
ACCESS_ONCE() doesn't work reliably on non-scalar types. This patch removes the remaining usages of ACCESS_ONCE() in the scheduler and uses the new READ_ONCE() and WRITE_ONCE() APIs as appropriate.

Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: Waiman Long <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
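The conversion pattern itself is mechanical; a small illustrative before/after (the accessed field is just an example, not one of the specific sites changed here):

  #include <linux/compiler.h>
  #include <linux/types.h>

  static u64 sketch_load_expires(u64 *expires)
  {
          /* was: return ACCESS_ONCE(*expires); */
          return READ_ONCE(*expires);
  }

  static void sketch_store_expires(u64 *expires, u64 val)
  {
          /* was: ACCESS_ONCE(*expires) = val; */
          WRITE_ONCE(*expires, val);
  }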
2015-02-12 | all arches, signal: move restart_block to struct task_struct | Andy Lutomirski | 1 | -2/+1
If an attacker can cause a controlled kernel stack overflow, overwriting the restart block is a very juicy exploit target. This is because the restart_block is held in the same memory allocation as the kernel stack. Moving the restart block to struct task_struct prevents this exploit by making the restart_block harder to locate.

Note that there are other fields in thread_info that are also easy targets, at least on some architectures.

It's also a decent simplification, since the restart code is more or less identical on all architectures.

[[email protected]: metag: align thread_info::supervisor_stack]

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Al Viro <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: David Miller <[email protected]>
Acked-by: Richard Weinberger <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Haavard Skinnemoen <[email protected]>
Cc: Hans-Christian Egtvedt <[email protected]>
Cc: Steven Miao <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Aurelien Jacquiot <[email protected]>
Cc: Mikael Starvik <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: David Howells <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Tested-by: Michael Ellerman <[email protected]> (powerpc)
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Chen Liqin <[email protected]>
Cc: Lennox Wu <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: James Hogan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
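Illustrative only: after the move, syscall restart state is reached through current rather than the stack-backed thread_info. The restart function below is hypothetical, not part of the patch.

  #include <linux/errno.h>
  #include <linux/sched.h>

  static long sketch_restart_fn(struct restart_block *restart)
  {
          return -EINTR;
  }

  static void sketch_arm_syscall_restart(void)
  {
          /* was typically: current_thread_info()->restart_block.fn = ...; */
          current->restart_block.fn = sketch_restart_fn;
  }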
2014-11-16 | sched/cputime: Fix cpu_timer_sample_group() double accounting | Peter Zijlstra | 1 | -1/+1
While looking over the cpu-timer code I found that we appear to add the delta for the calling task twice, through:

    cpu_timer_sample_group()
      thread_group_cputimer()
        thread_group_cputime()
          times->sum_exec_runtime += task_sched_runtime();

      *sample = cputime.sum_exec_runtime + task_delta_exec();

Which would make the sample run ahead, making the sleep short.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
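A sketch of the fixed CPUCLOCK_SCHED sampling path as it would sit in this file (simplified: the other clock cases are omitted; not the literal patch):

  static int sketch_cpu_timer_sample_group(const clockid_t which_clock,
                                           struct task_struct *p,
                                           unsigned long long *sample)
  {
          struct task_cputime cputime;

          thread_group_cputimer(p, &cputime);
          if (CPUCLOCK_WHICH(which_clock) != CPUCLOCK_SCHED)
                  return -EINVAL;        /* other cases omitted in this sketch */

          /* thread_group_cputime() already folded in the caller's runtime via
           * task_sched_runtime(); adding task_delta_exec() double counted it. */
          *sample = cputime.sum_exec_runtime;
          return 0;
  }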
2014-09-08 | time, signal: Protect resource use statistics with seqlock | Rik van Riel | 1 | -14/+0
Both times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID) have scalability issues on large systems, due to both functions being serialized with a lock.

The lock protects against reporting a wrong value, due to a thread in the task group exiting, its statistics reporting up to the signal struct, and that exited task's statistics being counted twice (or not at all). Protecting that with a lock results in times() and clock_gettime() being completely serialized on large systems.

This can be fixed by using a seqlock around the events that gather and propagate statistics. As an additional benefit, the protection code can be moved into thread_group_cputime(), slightly simplifying the calling functions.

In the case of posix_cpu_clock_get_task() things can be simplified a lot, because the calling function already ensures that the task sticks around, and the rest is now taken care of in thread_group_cputime().

This way the statistics reporting code can run lockless.

Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Daeseok Youn <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dongsheng Yang <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guillaume Morin <[email protected]>
Cc: Ionut Alexa <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Schmidt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
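A sketch of the resulting lockless reader pattern, assuming the seqlock added by this series is sig->stats_lock (simplified: only the group totals kept in signal_struct are shown, per-thread accumulation is elided, and the real code uses the read_seqbegin_or_lock() variant to guarantee forward progress):

  #include <linux/sched.h>
  #include <linux/seqlock.h>

  static void sketch_thread_group_cputime(struct task_struct *tsk,
                                          struct task_cputime *times)
  {
          struct signal_struct *sig = tsk->signal;
          unsigned int seq;

          do {
                  seq = read_seqbegin(&sig->stats_lock);
                  times->utime = sig->utime;
                  times->stime = sig->stime;
                  times->sum_exec_runtime = sig->sum_sched_runtime;
                  /* ... plus the still-running threads' times ... */
          } while (read_seqretry(&sig->stats_lock, seq));
  }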
2014-06-23 | time/timers: Move all time(r) related files into kernel/time | Thomas Gleixner | 1 | -0/+1490
Except for Kconfig.HZ. That needs a separate treatment.

Signed-off-by: Thomas Gleixner <[email protected]>