|
Instead of providing asynchronous checks for the nohz subsystem to verify
the posix cpu timers' tick dependency, migrate the latter to the new mask.
In order to keep track of the running timers and expose the tick
dependency accordingly, we must probe timer queuing and dequeuing on the
thread and process lists.
Unfortunately this implies both task-level and signal-level dependencies.
We should be able to further optimize this and merge it all into the
task-level dependency, at the cost of a bit of complexity and maybe some
overhead.
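A rough sketch of the probing idea in plain C (names and types here are
illustrative stand-ins, not the kernel's actual API; in the real patch the
same thing happens at both the task and the signal/process level):
  #include <stdatomic.h>

  enum { TICK_DEP_BIT_POSIX_TIMER = 0 };    /* illustrative dependency bit */

  struct timer_target {
      atomic_ulong tick_dep_mask;     /* mask the nohz code reads directly */
      int nr_queued_timers;           /* armed posix cpu timers on this target */
  };

  /* Probe on queueing: the first armed timer exposes the tick dependency. */
  static void on_timer_queued(struct timer_target *t)
  {
      if (t->nr_queued_timers++ == 0)
          atomic_fetch_or(&t->tick_dep_mask,
                          1UL << TICK_DEP_BIT_POSIX_TIMER);
  }

  /* Probe on dequeueing: the last removed timer clears the dependency, so
   * the nohz subsystem no longer needs an asynchronous check. */
  static void on_timer_dequeued(struct timer_target *t)
  {
      if (--t->nr_queued_timers == 0)
          atomic_fetch_and(&t->tick_dep_mask,
                           ~(1UL << TICK_DEP_BIT_POSIX_TIMER));
  }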
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
It was found while running a database workload on large systems that
significant time was spent trying to acquire the sighand lock.
The issue was that whenever an itimer expired, many threads ended up
simultaneously trying to send the signal. Most of the time, nothing
happened after acquiring the sighand lock because another thread
had already sent the signal and updated the "next expire" time.
fastpath_timer_check() didn't help much since the "next expire"
time was updated after the threads had exited fastpath_timer_check().
This patch addresses this by having the thread_group_cputimer structure
maintain a boolean to signify when a thread in the group is already
checking for process-wide timers, and adds extra logic in the fastpath
to check the boolean.
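A rough sketch of the fastpath idea (simplified, self-contained types; not
the patch's exact code):
  #include <stdatomic.h>
  #include <stdbool.h>

  struct group_cputimer {
      atomic_bool running;            /* any process-wide timer armed */
      atomic_bool checking_timer;     /* a thread is already checking them */
  };

  /* Run from the per-tick fast path: if another thread in the group is
   * already checking the process-wide timers, don't pile onto the
   * sighand lock. */
  static bool group_timers_need_check(struct group_cputimer *ct)
  {
      if (!atomic_load(&ct->running))
          return false;
      if (atomic_load(&ct->checking_timer))
          return false;               /* someone else got there first */
      return true;                    /* caller proceeds to the slow path */
  }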
Signed-off-by: Jason Low <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
In the next patch in this series, a new field 'checking_timer' will
be added to 'struct thread_group_cputimer'. Both that field and the
existing 'running' integer field are only ever used as boolean values,
so to save space in the structure we can make both of them booleans.
This is a preparatory patch that converts the existing 'running' integer
field to a boolean.
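Roughly the resulting shape (other fields elided; a sketch rather than the
exact kernel layout):
  #include <stdbool.h>

  struct thread_group_cputimer {
      /* ... cputime fields, etc. ... */
      bool running;                   /* was: int running */
      bool checking_timer;            /* added by the next patch in the series */
  };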
Suggested-by: George Spelvin <[email protected]>
Signed-off-by: Jason Low <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
fastpath_timer_check() contains logic to check whether any timers
are set by testing !task_cputime_zero(). Similarly, we can do this
before calling check_thread_timers(): in the case where there are only
process-wide timers, this skips all of the per-thread timer computations,
since there are no per-thread timers to process.
As suggested by George, we can put the task_cputime_zero() check inside
check_thread_timers(), since that is more of an optimization of that
function. Similarly, we move the existing check of cputimer->running
into check_process_timers().
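A sketch of where the early-out checks end up (simplified, self-contained
stand-ins for the real task and signal structures):
  #include <stdbool.h>

  struct cputime_expires { unsigned long utime, stime, sched; };

  struct task_like {
      struct cputime_expires expires;   /* per-thread timer expiries */
      bool group_timers_running;        /* stands in for cputimer->running */
  };

  static bool task_cputime_zero(const struct cputime_expires *e)
  {
      return !e->utime && !e->stime && !e->sched;
  }

  /* The task_cputime_zero() check moves inside the per-thread path... */
  static void check_thread_timers(struct task_like *t)
  {
      if (task_cputime_zero(&t->expires))
          return;                       /* no per-thread timers: skip the work */
      /* ... check and expire per-thread timers ... */
  }

  /* ...and the cputimer->running check moves inside the process-wide path. */
  static void check_process_timers(struct task_like *t)
  {
      if (!t->group_timers_running)
          return;                       /* no process-wide timers armed */
      /* ... check and expire process-wide timers ... */
  }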
Signed-off-by: Jason Low <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
In fastpath_timer_check(), the task_cputime() function is always
called to compute the utime and stime values. However, this is not
necessary if there are no per-thread timers to check. This patch
modifies the code such that we compute the task_cputime values only
when there are per-thread timers set.
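A sketch of the reordering (simplified types; the direct field reads stand
in for the task_cputime() call):
  #include <stdbool.h>

  struct expiries { unsigned long utime, stime; };   /* 0 = not armed */

  struct thread_like {
      struct expiries expires;
      unsigned long utime, stime;       /* accounted cputime so far */
  };

  static bool fastpath_timer_check(const struct thread_like *t)
  {
      /* Only pay for the cputime sample when a per-thread timer is armed. */
      if (t->expires.utime || t->expires.stime) {
          unsigned long utime = t->utime;   /* stands in for task_cputime() */
          unsigned long stime = t->stime;

          if ((t->expires.utime && utime >= t->expires.utime) ||
              (t->expires.stime && stime >= t->expires.stime))
              return true;
      }
      /* ... the process-wide fast path follows, unchanged ... */
      return false;
  }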
Signed-off-by: Jason Low <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Recent optimizations were made to thread_group_cputimer to improve its
scalability by keeping track of cputime stats without a lock. However,
the values were open-coded in the structure, putting them at a
different abstraction level from the regular task_cputime structure.
Furthermore, any subsequent similar optimizations would not be able to
share the new code, since it is specific to thread_group_cputimer.
This patch adds the task_cputime_atomic data structure (introduced in
the previous patch in the series) to thread_group_cputimer for keeping
track of the cputime atomically, which also helps generalize the code.
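Roughly the resulting shape, with C11 atomics standing in for the kernel's
atomic64_t (surrounding fields elided):
  #include <stdatomic.h>

  struct task_cputime_atomic {
      atomic_ullong utime;
      atomic_ullong stime;
      atomic_ullong sum_exec_runtime;
  };

  struct thread_group_cputimer {
      struct task_cputime_atomic cputime_atomic;  /* replaces the open-coded fields */
      /* ... running flag, etc. ... */
  };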
Suggested-by: Ingo Molnar <[email protected]>
Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Waiman Long <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
improve scalability
While running a database workload, we found a scalability issue with itimers.
Much of the problem was caused by the thread_group_cputimer spinlock.
Each time we account for group system/user time, we need to obtain a
thread_group_cputimer's spinlock to update the timers. On larger systems
(such as a 16 socket machine), this resulted in more than 30% of total
time being spent trying to obtain this kernel lock to update these group
timer stats.
This patch converts the timers to 64-bit atomic variables and uses atomic
adds to update them without a lock. With this patch, the percentage of
total time spent updating thread group cputimer timers was reduced from
30% down to less than 1%.
Note: On 32-bit systems using the generic 64-bit atomics, this causes
sample_group_cputimer() to take a lock 3 times instead of just once.
However, we tested this patch on a 32-bit ARM system using the generic
atomics and did not find the overhead to be much of an issue. An
explanation for why this isn't an issue is that 32-bit systems usually
have small numbers of CPUs, and cacheline contention from extra spinlock
acquisitions made periodically is not really apparent on smaller systems.
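A sketch of the accounting change, with C11 atomics standing in for
atomic64_t (field and function names are illustrative):
  #include <stdatomic.h>

  struct group_times {
      atomic_ullong utime, stime, sum_exec_runtime;
  };

  /* Accounting path: a lock-free add instead of lock, update, unlock. */
  static void account_group_user_time(struct group_times *gt,
                                      unsigned long long delta)
  {
      atomic_fetch_add_explicit(&gt->utime, delta, memory_order_relaxed);
  }

  /* Sampling path: plain atomic loads.  On 32-bit kernels using the
   * generic 64-bit atomics each load may take an internal lock, hence
   * the "lock 3 times instead of once" note above. */
  static unsigned long long sample_group_utime(struct group_times *gt)
  {
      return atomic_load_explicit(&gt->utime, memory_order_relaxed);
  }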
Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Waiman Long <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
READ_ONCE()/WRITE_ONCE()
ACCESS_ONCE() doesn't work reliably on non-scalar types. This patch removes
the remaining usages of ACCESS_ONCE() in the scheduler and uses the new
READ_ONCE() and WRITE_ONCE() APIs as appropriate.
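The conversion pattern, illustrated with simplified userspace stand-ins
(the real READ_ONCE()/WRITE_ONCE() live in <linux/compiler.h> and, unlike
these scalar-only macros, also handle non-scalar types):
  #define READ_ONCE(x)      (*(volatile __typeof__(x) *)&(x))
  #define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

  static int running;

  static int example(void)
  {
      int seen = READ_ONCE(running);   /* was: seen = ACCESS_ONCE(running); */
      WRITE_ONCE(running, 1);          /* was: ACCESS_ONCE(running) = 1;    */
      return seen;
  }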
Signed-off-by: Jason Low <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: Waiman Long <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.
Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.
Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.
It's also a decent simplification, since the restart code is more or less
identical on all architectures.
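The gist of the relocation, heavily simplified (the real structures carry
many more fields):
  struct restart_block {
      long (*fn)(struct restart_block *);
      /* ... per-syscall restart state ... */
  };

  struct thread_info {    /* shares an allocation with the kernel stack */
      unsigned long flags;
      /* struct restart_block restart_block;   <-- removed from here */
  };

  struct task_struct {    /* separately allocated, harder to locate */
      struct restart_block restart_block;      /* <-- new home */
      /* ... */
  };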
[[email protected]: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Al Viro <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: David Miller <[email protected]>
Acked-by: Richard Weinberger <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Haavard Skinnemoen <[email protected]>
Cc: Hans-Christian Egtvedt <[email protected]>
Cc: Steven Miao <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Aurelien Jacquiot <[email protected]>
Cc: Mikael Starvik <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: David Howells <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Tested-by: Michael Ellerman <[email protected]> (powerpc)
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Chen Liqin <[email protected]>
Cc: Lennox Wu <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: James Hogan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
While looking over the cpu-timer code I found that we appear to add
the delta for the calling task twice, through:
  cpu_timer_sample_group()
    thread_group_cputimer()
      thread_group_cputime()
        times->sum_exec_runtime += task_sched_runtime();

    *sample = cputime.sum_exec_runtime + task_delta_exec();
Which would make the sample run ahead, making the sleep short.
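A toy arithmetic illustration of the overshoot (all numbers made up):
  #include <stdio.h>

  int main(void)
  {
      unsigned long long accounted = 1000;  /* runtime already in the group sum */
      unsigned long long pending   = 50;    /* calling task's unaccounted delta */

      /* thread_group_cputime() already folds the pending delta in via
       * task_sched_runtime() for the calling task ... */
      unsigned long long group_sum = accounted + pending;

      /* ... so adding task_delta_exec() on top counts it a second time. */
      unsigned long long sample = group_sum + pending;

      printf("sample runs ahead by %llu\n", sample - (accounted + pending));
      return 0;
  }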
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Both times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID) have scalability
issues on large systems, due to both functions being serialized with a
lock.
The lock protects against reporting a wrong value, due to a thread in the
task group exiting, its statistics reporting up to the signal struct, and
that exited task's statistics being counted twice (or not at all).
Protecting that with a lock results in times() and clock_gettime() being
completely serialized on large systems.
This can be fixed by using a seqlock around the events that gather and
propagate statistics. As an additional benefit, the protection code can
be moved into thread_group_cputime(), slightly simplifying the calling
functions.
In the case of posix_cpu_clock_get_task() things can be simplified a
lot, because the calling function already ensures that the task sticks
around, and the rest is now taken care of in thread_group_cputime().
This way the statistics reporting code can run lockless.
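A sketch of the seqcount-style retry loop in self-contained C11 (the kernel
uses its seqlock primitives with lighter barriers; default seq_cst atomics
keep the sketch simple, and writers are assumed to be serialized externally,
as they are on the kernel's exit path):
  #include <stdatomic.h>

  static atomic_ullong group_utime, group_stime;  /* must be read as a pair */
  static atomic_uint seq;              /* even = stable, odd = write in progress */

  /* Writer: e.g. the exit path folding a dead thread's times into the group. */
  static void fold_exit_times(unsigned long long u, unsigned long long s)
  {
      atomic_fetch_add(&seq, 1);                  /* begin: seq becomes odd */
      atomic_fetch_add(&group_utime, u);
      atomic_fetch_add(&group_stime, s);
      atomic_fetch_add(&seq, 1);                  /* end: seq becomes even again */
  }

  /* Reader: stands in for thread_group_cputime(); no shared lock is taken,
   * just retry until a consistent snapshot is observed. */
  static void read_group_times(unsigned long long *u, unsigned long long *s)
  {
      unsigned int start;

      do {
          start = atomic_load(&seq);
          *u = atomic_load(&group_utime);
          *s = atomic_load(&group_stime);
      } while ((start & 1) || atomic_load(&seq) != start);
  }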
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Daeseok Youn <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dongsheng Yang <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guillaume Morin <[email protected]>
Cc: Ionut Alexa <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Schmidt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Except for Kconfig.HZ. That needs a separate treatment.
Signed-off-by: Thomas Gleixner <[email protected]>
|