|
The check for the AP range in cpuhp_is_ap_state() is redundant after commit
8df3e07e7f21 "cpu/hotplug: Let upcoming cpu bring itself fully up" because all
states above CPUHP_BRINGUP_CPU are invoked on the hotplugged cpu. Remove it.
Reported-by: Richard Cochran <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"A feature was added in 4.3 that allowed users to filter trace points
on a tasks "comm" field. But this prevented filtering on a comm field
that is within a trace event (like sched_migrate_task).
When trying to filter on when a program migrated, this change
prevented the filtering of the sched_migrate_task.
To fix this, the event fields are examined first, and then the extra
fields like "comm" and "cpu" are examined. Also, instead of testing
to assign the comm filter function based on the field's name, the
generic comm field is given a new filter type (FILTER_COMM). When
this field is used to filter the type is checked. The same is done
for the cpu filter field.
Two new special filter types are added: "COMM" and "CPU". These let
users still filter on the current task's comm for events that have
"comm" as one of their fields, in case a user wants to filter
sched_migrate_task on the comm of the task that called the event, and
not the comm of the task that is being migrated"
* tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do not have 'comm' filter override event 'comm' field
|
|
Commit 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and
process names" added a 'comm' filter that will filter events based on the
current task's struct 'comm'. But this now hides the ability to filter events
that have a 'comm' field too. For example, the sched_migrate_task trace event
has a 'comm' field holding the name of the task to be migrated.
echo 'comm == "bash"' > events/sched_migrate_task/filter
will now filter all sched_migrate_task events for tasks named "bash" that
migrate other tasks (in interrupt context), instead of showing when "bash"
itself gets migrated.
This fix requires a couple of changes.
1) Change the lookup order for filter predicates to look at the event's
fields before looking at the generic filters.
2) Instead of basing the filter function off of the "comm" name, have the
generic "comm" filter have its own filter_type (FILTER_COMM). Test
against the type instead of the name to assign the filter function.
3) Add a new "COMM" filter that works just like "comm" but will filter based
on the current task, even if the trace event contains a "comm" field.
Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".
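A sketch pulling points 1) and 2) together (helper names such as
find_generic_field() and select_pred_fn_for() are made up; FILTER_COMM,
FILTER_CPU, filter_pred_comm() and filter_pred_cpu() are the types and
predicates this change describes):

  /* Look at the event's own fields first, then the generic pseudo fields. */
  field = trace_find_event_field(call, name);
  if (!field)
          field = find_generic_field(name);       /* hypothetical helper */

  /* Pick the predicate from the field type, not from the field name. */
  switch (field->filter_type) {
  case FILTER_COMM:
          pred->fn = filter_pred_comm;            /* matches current->comm */
          break;
  case FILTER_CPU:
          pred->fn = filter_pred_cpu;             /* matches the current CPU */
          break;
  default:
          pred->fn = select_pred_fn_for(field);   /* hypothetical: normal compare */
          break;
  }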
Cc: [email protected] # v4.3+
Fixes: 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names"
Reported-by: Matt Fleming <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
https://git.linaro.org/people/john.stultz/linux into timers/core
Pull the cross-timestamp infrastructure from John Stultz.
Allows precise correlation of device timestamps with system time. Primary use
cases being PTP and audio.
|
|
Revert commits:
a6e707ddbdf1: KVM: arm/arm64: timer: Switch to CLOCK_MONOTONIC_RAW
9006a01829a5: hrtimer: Catch illegal clockids
9c808765e88e: hrtimer: Add support for CLOCK_MONOTONIC_RAW
Marc found out that there are fundamental issues with that patch series
because __hrtimer_get_next_event() and hrtimer_forward() need support for
CLOCK_MONOTONIC_RAW. Nothing which is easily fixed, so revert the whole lot.
Reported-by: Marc Zyngier <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Paul noticed that the conversion of the death reporting introduced a race
where the outgoing cpu might be delayed after waking the control processor,
so it might not be able to call rcu_report_dead() before being physically
removed, leading to RCU stalls.
We can't call complete() after rcu_report_dead(), so instead of going back to
busy polling, simply issue a function call to do the completion.
Fixes: 27d50c7eeb0f "rcu: Make CPU_DYING_IDLE an explicit call"
Reported-by: Paul E. McKenney <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
|
|
Another representative use case of time sync and the correlated
clocksource (in addition to PTP noted above) is PTP synchronized
audio.
In a streaming application, as an example, samples will be sent and/or
received by multiple devices with a presentation time that is in terms
of the PTP master clock. Synchronizing the audio output on these
devices requires correlating the audio clock with the PTP master
clock. The more precise this correlation is, the better the audio
quality (i.e. out of sync audio sounds bad).
From an application standpoint, to correlate the PTP master clock with
the audio device clock, the system clock is used as an intermediate
timebase. The transforms such an application would perform are:
System Clock <-> Audio clock
System Clock <-> Network Device Clock [<-> PTP Master Clock]
Modern Intel platforms can perform a more accurate cross timestamp in
hardware (ART, audio device clock). The audio driver requires
ART->system time transforms -- the same as required for the network
driver. These platforms offload audio processing (including
cross-timestamps) to a DSP which, to ensure uninterrupted audio
processing, communicates with and responds to the host only once every
millisecond. As a result it takes up to a millisecond for the DSP to
receive a request; the request is then processed by the DSP, the audio
output hardware is polled for completion, the result is copied into
shared memory, and the host is notified. All of these operations occur
on a millisecond cadence. This transaction requires about 2 ms, but
under heavier workloads it may take up to 4 ms.
Adding a history allows these slow devices the option of providing an
ART value outside of the current interval. In this case, the callback
provided is an accessor function for the previously obtained counter
value. If get_device_system_crosststamp() receives a counter value
previous to cycle_last, it consults the history provided as an
argument in history_ref and interpolates the realtime and monotonic
raw system time using the provided counter value. If there are any
clock discontinuities, e.g. from calling settimeofday(), the monotonic
raw time is interpolated in the usual way, but the realtime clock time
is adjusted by scaling the monotonic raw adjustment.
When an accessor function is used, a history argument *must* be
provided. The history is initialized using ktime_get_snapshot(), which
must be called before the counter values are read.
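A minimal sketch of how a slow device might use the history, assuming the
interfaces described above (struct dsp_dev, dsp_request_art_sample() and
my_get_art_value() are hypothetical driver pieces; the timekeeping calls are
the ones named in this text):

  static int dsp_get_crosststamp(struct dsp_dev *dsp,
                                 struct system_device_crosststamp *xt)
  {
          struct system_time_snapshot history;

          /* Snapshot system time *before* the (slow) counter read. */
          ktime_get_snapshot(&history);

          /* Ask the DSP for an ART sample; this may take a few milliseconds. */
          dsp_request_art_sample(dsp);

          /* my_get_art_value() just hands back the sample captured above. */
          return get_device_system_crosststamp(my_get_art_value, dsp,
                                               &history, xt);
  }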
Cc: Prarit Bhargava <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Christopher S. Hall <[email protected]>
[jstultz: Fixed up cycles_t/cycle_t type confusion]
Signed-off-by: John Stultz <[email protected]>
|
|
ACKNOWLEDGMENT: cross timestamp code was developed by Thomas Gleixner
<[email protected]>. It has changed considerably and any mistakes are
mine.
The precision with which events on multiple networked systems can be
synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited
by the precision of the cross timestamps between the system clock and
the device (timestamp) clock. Precision here is the degree of
simultaneity when capturing the cross timestamp.
Currently the PTP cross timestamp is captured in software using the
PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are
interleaved with reads of the realtime clock. At best, the precision
of this cross timestamp is on the order of several microseconds due to
software latencies. Sub-microsecond precision is required for
industrial control and some media applications. To achieve this level
of precision hardware supported cross timestamping is needed.
The function get_device_system_crosststamp() allows device drivers
to return a cross timestamp with system time properly scaled to
nanoseconds. The realtime value is needed to discipline that clock
using PTP and the monotonic raw value is used for applications that
don't require a "real" time, but need an unadjusted clock time. The
get_device_system_crosststamp() code calls back into the driver to
ensure that the system counter is within the current timekeeping
update interval.
Modern Intel hardware provides an Always Running Timer (ART) which is
exactly related to TSC through a known frequency ratio. The ART is
routed to devices on the system and is used to precisely and
simultaneously capture the device clock with the ART.
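As an illustration of the callback shape this implies (read_device_and_art(),
my_dev_cycles_to_ktime() and struct my_dev are made-up driver helpers;
convert_art_to_tsc() is the ART->TSC conversion this series refers to):

  static int my_get_syncdevicetime(ktime_t *device_time,
                                   struct system_counterval_t *system,
                                   void *ctx)
  {
          struct my_dev *dev = ctx;
          u64 dev_cycles, art;

          /* Hardware captures the device clock and ART simultaneously. */
          read_device_and_art(dev, &dev_cycles, &art);

          *device_time = my_dev_cycles_to_ktime(dev, dev_cycles);
          *system = convert_art_to_tsc(art);
          return 0;
  }

  /* A driver then calls (history argument NULL when not needed):
   *   get_device_system_crosststamp(my_get_syncdevicetime, dev, NULL, &xt);
   */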
Cc: Prarit Bhargava <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Christopher S. Hall <[email protected]>
[jstultz: Reworked to remove extra structures and simplify calling]
Signed-off-by: John Stultz <[email protected]>
|
|
The code in ktime_get_snapshot() is a superset of the code in
ktime_get_raw_and_real(). Further, ktime_get_raw_and_real() is
called only by the PPS code, pps_get_ts(). Consolidate the
pps_get_ts() code into a single function calling ktime_get_snapshot()
and eliminate ktime_get_raw_and_real(). A side effect of this is that
the raw and real results of pps_get_ts() correspond to exactly the
same clock cycle. Previously these values represented separate reads
of the system clock.
Cc: Prarit Bhargava <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Christopher S. Hall <[email protected]>
Signed-off-by: John Stultz <[email protected]>
|
|
In the current timekeeping code there isn't any interface to
atomically capture the current relationship between the system counter
and system time. ktime_get_snapshot() returns this triple (counter,
monotonic raw, realtime) in the system_time_snapshot struct.
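A minimal usage sketch, assuming the snapshot fields mirror the triple
described (counter cycles, monotonic raw, realtime):

  struct system_time_snapshot snap;

  ktime_get_snapshot(&snap);
  /* snap.cycles : system counter value at the snapshot
   * snap.raw    : monotonic raw time for that counter value
   * snap.real   : realtime clock value for the same counter value */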
Cc: Prarit Bhargava <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Christopher S. Hall <[email protected]>
[jstultz: Moved structure definitions around to clean things up,
fixed cycles_t/cycle_t confusion.]
Signed-off-by: John Stultz <[email protected]>
|
|
The timekeeping code does not currently provide a way to translate
externally provided clocksource cycles to system time. The cycle count
is always provided by the result of the clocksource read() method internal to
the timekeeping code. The added function timekeeping_cycles_to_ns()
calculates a nanosecond value from a cycle count that can be added to the
tk_read_base.base value, yielding the current system time. This allows
clocksource cycle values external to the timekeeping code to provide a
cycle count that can be transformed to system time.
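The conversion itself follows the usual clocksource mult/shift arithmetic;
roughly (an illustrative sketch of the math, not the exact helper):

  /* cycles -> nanoseconds since the last timekeeping update */
  delta = (cycles - tkr->cycle_last) & tkr->mask;
  nsec  = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
  /* current system time = tkr->base + nsec */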
Cc: Prarit Bhargava <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Christopher S. Hall <[email protected]>
Signed-off-by: John Stultz <[email protected]>
|
|
Instead of checking sched_clock_stable from the nohz subsystem to verify
its tick dependency, migrate it to the new mask in order to include it
in the all-in-one check.
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Instead of providing asynchronous checks for the nohz subsystem to verify
posix cpu timers tick dependency, migrate the latter to the new mask.
In order to keep track of the running timers and expose the tick
dependency accordingly, we must probe timer queueing and dequeueing
on thread and process lists.
Unfortunately this implies both task and signal level dependencies. We
should be able to further optimize this and merge all that into the task
level dependency, at the cost of a bit of complexity and maybe overhead.
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Instead of providing asynchronous checks for the nohz subsystem to verify
sched tick dependency, migrate sched to the new mask.
Every time a task is enqueued or dequeued, we evaluate the state of the
tick dependency on top of the policy of the tasks in the runqueue, by
order of priority:
SCHED_DEADLINE: Need the tick in order to periodically check for runtime
SCHED_FIFO : Don't need the tick (no round-robin)
SCHED_RR : Need the tick if more than 1 task of the same priority
is queued, for round robin (simplified by checking whether
more than one SCHED_RR task is queued, no matter what priority).
SCHED_NORMAL : Need the tick if more than 1 task for round-robin.
We could optimize that further with one flag per sched policy on the tick
dependency mask and perform only the checks relevant to the policy
concerned by an enqueue/dequeue operation.
Since the checks aren't based on the current task anymore, we could get
rid of the task switch hook but it's still needed for posix cpu
timers.
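Roughly, the evaluation performed on enqueue/dequeue looks like this (a
simplified sketch following the policy list above, not the exact kernel code):

  static bool sched_can_stop_tick_sketch(struct rq *rq)
  {
          unsigned int fifo_nr_running;

          /* SCHED_DEADLINE needs the tick for runtime enforcement. */
          if (rq->dl.dl_nr_running)
                  return false;

          /* More than one SCHED_RR task needs the tick for round robin. */
          if (rq->rt.rr_nr_running)
                  return rq->rt.rr_nr_running == 1;

          /* SCHED_FIFO alone never needs the tick: no round robin. */
          fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
          if (fifo_nr_running)
                  return true;

          /* SCHED_NORMAL needs the tick when more than one task is queued. */
          return rq->nr_running <= 1;
  }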
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
In order to evaluate the scheduler tick dependency without probing
context switches, we need to know how many SCHED_RR and SCHED_FIFO tasks
are enqueued, as those policies don't have the same preemption
requirements.
To prepare for that, let's account SCHED_RR tasks; we'll then be able to
deduce the number of SCHED_FIFO tasks from this count and the total number
of RT tasks in the runqueue.
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Instead of providing asynchronous checks for the nohz subsystem to verify
perf event tick dependency, migrate perf to the new mask.
Perf needs the tick for two situations:
1) Freq events. We could set the tick dependency when those are
installed on a CPU context. But setting a global dependency on top of
the global freq events accounting is much easier. If people want that
to be optimized, we can still refine that on the per-CPU tick dependency
level. This patch doesn't change the current behaviour anyway.
2) Throttled events: this is a per-cpu dependency.
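In terms of the dependency mask API this series introduces (see the commit
further down in this log), that roughly amounts to (sketch):

  /* Freq events: one global dependency on top of the global accounting. */
  if (atomic_inc_return(&nr_freq_events) == 1)
          tick_dep_set(TICK_DEP_BIT_PERF_EVENTS);

  /* Throttled events: only the CPU owning the throttled context matters. */
  tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);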
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
It makes nohz tracing more lightweight, standard and easier to parse.
Examples:
user_loop-2904 [007] d..1 517.701126: tick_stop: success=1 dependency=NONE
user_loop-2904 [007] dn.1 518.021181: tick_stop: success=0 dependency=SCHED
posix_timers-6142 [007] d..1 1739.027400: tick_stop: success=0 dependency=POSIX_TIMER
user_loop-5463 [007] dN.1 1185.931939: tick_stop: success=0 dependency=PERF_EVENTS
Suggested-by: Peter Zijlstra <[email protected]>
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The tick dependency is evaluated on every IRQ and context switch. This
consists of a batch of checks which determine whether it is safe to
stop the tick or not. These checks are often split into many details:
posix cpu timers, scheduler, sched clock, perf events... each of which
is made of smaller details: posix cpu timers involve checking process
wide timers then thread wide timers; perf involves checking freq events
then more per-cpu details.
Checking this information asynchronously every time we update the full
dynticks state brings avoidable overhead and a messy layout.
Let's introduce instead tick dependency masks: one for system wide
dependency (unstable sched clock, freq based perf events), one for CPU
wide dependency (sched, throttling perf events), and task/signal level
dependencies (posix cpu timers). The subsystems are responsible
for setting and clearing their dependency through a set of APIs that will
take care of concurrent dependency mask modifications and kick targets
to restart the relevant CPU tick whenever needed.
This new dependency engine stays beside the old one until all subsystems
having a tick dependency are converted to it.
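A sketch of the resulting API shape, one setter/clearer pair per dependency
level (the bit names follow the tracing output shown elsewhere in this log;
exact signatures are approximate):

  /* System wide: e.g. unstable sched clock, freq based perf events. */
  tick_dep_set(TICK_DEP_BIT_CLOCK_UNSTABLE);
  tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);

  /* Per CPU: e.g. scheduler, throttled perf events. */
  tick_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
  tick_dep_clear_cpu(cpu, TICK_DEP_BIT_SCHED);

  /* Per task / per signal struct: posix cpu timers. */
  tick_dep_set_task(task, TICK_DEP_BIT_POSIX_TIMER);
  tick_dep_set_signal(task->signal, TICK_DEP_BIT_POSIX_TIMER);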
Suggested-by: Thomas Gleixner <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets
invoked at the proper place.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Kill the busy spinning on the control side and just wait for the hotplugged
cpu to report that it has reached the dead state.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Let the upcoming cpu kick the hotplug thread and complete the bringup
itself. That way the control side can just wait for the completion, or,
once we make the hotplug machinery async, not care at all.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Let the hotplugged cpu invoke the setup/teardown callbacks
(CPU_ONLINE/CPU_DOWN_PREPARE) itself.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
In order to let the hotplugged cpu take care of the setup/teardown, we need a
separate hotplug thread.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
We need that for running callbacks on the AP and the BP.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Handle the smpboot threads in the state machine.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Move the scheduler cpu online notifier part to the hotplug core. This is
anyway the highest priority callback and we need that functionality right now
for the next changes.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Implement functions which allow setting up and removing hotplug state callbacks.
The default behaviour for setup is to call the startup function for this state
for (or on) all cpus which have a hotplug state >= the installed state.
The default behaviour for removal is to call the teardown function for this
state for (or on) all cpus which have a hotplug state >= the installed state.
This includes rollback to the previous state in case of failure.
A special state is CPUHP_ONLINE_DYN. It's for dynamically registering a hotplug
callback pair. This is for drivers which have no dependencies, so we do not
need to allocate a CPUHP state for each of them.
For both setup and removal, helper functions are provided which prevent the
core from issuing the callbacks. This simplifies the conversion of existing
hotplug notifiers.
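A sketch of what driver-side registration looks like with these helpers
(my_online()/my_offline() are made-up callbacks):

  /* Invoke my_online() on every CPU that is already online and on each CPU
   * that comes online later; CPUHP_ONLINE_DYN picks a free dynamic state. */
  state = cpuhp_setup_state(CPUHP_ONLINE_DYN, "mydrv:online",
                            my_online, my_offline);

  /* The _nocalls() variant registers the callbacks without invoking them
   * on the CPUs that are already online. */
  cpuhp_setup_state_nocalls(CPUHP_ONLINE_DYN, "mydrv:online",
                            my_online, my_offline);

  /* Removal, invoking the teardown callbacks where appropriate: */
  cpuhp_remove_state(state);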
[ Dynamic registering implemented by Sebastian Siewior ]
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Make it possible to write a target state to the per cpu state file, so we can
switch between states.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Add a sysfs interface so we can actually see which state the cpus are in.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
We want to be able to bringup/teardown the cpu to a particular state. Add a
target argument to _cpu_up/down.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Move the functions which need to run on the hotplugged processor into
a state machine array and let the code iterate through these functions.
In a later step, this will grow synchronization points between the
control processor and the hotplugged processor, so we can move the
various architecture implementations of the synchronizations to the
core.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Move the split out steps into a callback array and let the cpu_up/down
code iterate through the array functions. For now most of the
callbacks are asymmetric to resemble the current hotplug maze.
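The array entries have a simple shape, roughly (field set abridged and
illustrative):

  /* One step of the hotplug state machine. */
  struct cpuhp_step {
          const char      *name;
          int             (*startup)(unsigned int cpu);
          int             (*teardown)(unsigned int cpu);
  };

  /* cpu_up()/cpu_down() walk the array and invoke startup or teardown for
   * every state between the current state and the target. */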
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Split cpu_down in separate functions in preparation for state machine
conversion.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Split out into separate functions, so we can convert it to a state machine.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
There are only a few callbacks which really care about FROZEN
vs. !FROZEN. No need to have extra states for this.
Publish the frozen state in an extra variable which is updated under
the hotplug lock and let the interested users deal with it without
imposing those extra state checks on everyone.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: "Srivatsa S. Bhat" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sebastian Siewior <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Some tracepoints have multiple fields with the same name, "nr": the first
one is a unique syscall ID, the other is a syscall argument:
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
name: sys_enter_io_getevents
ID: 747
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int nr; offset:8; size:4; signed:1;
field:aio_context_t ctx_id; offset:16; size:8; signed:0;
field:long min_nr; offset:24; size:8; signed:0;
field:long nr; offset:32; size:8; signed:0;
field:struct io_event * events; offset:40; size:8; signed:0;
field:struct timespec * timeout; offset:48; size:8; signed:0;
print fmt: "ctx_id: 0x%08lx, min_nr: 0x%08lx, nr: 0x%08lx, events: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->ctx_id)), ((unsigned long)(REC->min_nr)), ((unsigned long)(REC->nr)), ((unsigned long)(REC->events)), ((unsigned long)(REC->timeout))
#
Fix it by renaming the "/format" common tracepoint field "nr" to "__syscall_nr".
Signed-off-by: Taeung Song <[email protected]>
[ Do not rename the struct member, just the '/format' field name ]
Signed-off-by: Steven Rostedt <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add detection for chain_key collision under CONFIG_DEBUG_LOCKDEP.
When a collision is detected the problem is reported and all lock
debugging is turned off.
Tested using liblockdep and the added tests before and after
applying the fix, confirming both that the code added for the
detection correctly reports the problem and that the fix actually
fixes it.
Tested tweaking lockdep to generate false collisions and
verified that the problem is reported and that lock debugging is
turned off.
Also tested with lockdep's test suite after applying the patch:
[ 0.000000] Good, all 253 testcases passed!
Signed-off-by: Alfredo Alvarez Fernandez <[email protected]>
Cc: Alfredo Alvarez Fernandez <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/1455864533-7536-4-git-send-email-alfredoalvarezernandez@gmail.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The chain_key hashing macro iterate_chain_key(key1, key2) does not
generate a new different value if both key1 and key2 are 0. In that
case the generated value is again 0. This can lead to collisions which
can result in lockdep not detecting deadlocks or circular
dependencies.
Avoid the problem by using class_idx (1-based) instead of the class id
(0-based) as the 'key2' input to the hashing macro
iterate_chain_key(key1, key2).
The use of class id created collisions in cases like the following:
1.- Consider an initial state in which no class has been acquired yet.
Under these circumstances an AA deadlock will not be detected by
lockdep:
lock [key1,key2]->new key (key1=old chain_key, key2=id)
--------------------------
A [0,0]->0
A [0,0]->0 (collision)
The newly generated chain_key collides with the one used before and as
a result the check for a deadlock is skipped.
A simple test using liblockdep and a pthread mutex confirms the
problem: (omitting stack traces)
new class 0xe15038: 0x7ffc64950f20
acquire class [0xe15038] 0x7ffc64950f20
acquire class [0xe15038] 0x7ffc64950f20
hash chain already cached, key: 0000000000000000 tail class:
[0xe15038] 0x7ffc64950f20
2.- Consider an ABBA in 2 different tasks and no class yet acquired.
T1 [key1,key2]->new key          T2 [key1,key2]->new key
--                               --
A  [0,0]->0
B  [0,1]->1
                                 B  [0,1]->1 (collision)
                                 A
In this case the collision prevents lockdep from creating the new
dependency A->B. This in turn results in lockdep not detecting the
circular dependency when T2 acquires A.
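The fix itself is small: feed the 1-based class_idx into the hash instead of
the 0-based id (sketched; the surrounding variable names are approximate):

  /* before: 0-based id, so the first class hashes 0 into an initial key of 0 */
  chain_key = iterate_chain_key(chain_key, class - lock_classes);

  /* after: 1-based index, so the second hash input can never be 0 */
  chain_key = iterate_chain_key(chain_key, class_idx);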
Signed-off-by: Alfredo Alvarez Fernandez <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/1455147212-2389-4-git-send-email-alfredoalvarezernandez@gmail.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Make use of wake-queues and enable the wakeup to occur after releasing the
wait_lock. This is similar to what we do with the rtmutex top waiter,
slightly shortening the critical region and allowing other waiters to
acquire the wait_lock sooner. In low contention cases it can also help
the recently woken waiter to find the wait_lock available (fastpath)
when it continues execution.
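The pattern being described, sketched with the wake-queue helpers (locking is
schematic; the declaration macro is spelled DEFINE_WAKE_Q() in later kernels):

  WAKE_Q(wake_q);

  spin_lock(&lock->wait_lock);
  /* ... pick the waiter to be woken ... */
  wake_q_add(&wake_q, waiter->task);
  spin_unlock(&lock->wait_lock);

  /* The actual wakeup now happens outside the critical section. */
  wake_up_q(&wake_q);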
Reviewed-by: Waiman Long <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ding Tianhong <[email protected]>
Cc: Jason Low <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch enables the tracking of the number of slowpath locking
operations performed. This can be used to compare against the number
of lock stealing operations to see what percentage of locks are stolen
versus acquired via the regular slowpath.
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Douglas Hatch <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The newly introduced smp_cond_acquire() was used to replace the
slowpath lock acquisition loop. Similarly, the new function can also
be applied to the pending bit locking loop. This patch uses the new
function in that loop.
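Concretely, the open-coded wait on the pending bit becomes a single call
(sketch based on this description; the mask name follows the qspinlock code):

  /* before: spin until the pending bit clears */
  while ((val = atomic_read(&lock->val)) & _Q_PENDING_MASK)
          cpu_relax();

  /* after: same wait, with acquire semantics provided by the helper */
  smp_cond_acquire(!(atomic_read(&lock->val) & _Q_PENDING_MASK));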
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Douglas Hatch <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch moves the lock stealing count tracking code into
pv_queued_spin_steal_lock() instead of via a jacket function, simplifying
the code.
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Douglas Hatch <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Scott J Norton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Similar to commit b4b29f94856a ("locking/osq: Fix ordering of node
initialisation in osq_lock") the use of xchg_acquire() is
fundamentally broken with MCS like constructs.
Furthermore, it turns out we rely on the global transitivity of this
operation because the unlock path observes the pointer with a
READ_ONCE(), not an smp_load_acquire().
This is non-critical because the MCS code isn't actually used and
mostly serves as documentation, a stepping stone to the more complex
things we've built on top of the idea.
Reported-by: Andrea Parri <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Fixes: 3552a07a9c4a ("locking/mcs: Use acquire/release semantics")
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
if (A || B) {
} else if (A && !B) {
}
If A we'll take the first branch, if !A we will not satisfy the second.
Therefore the second branch will never be taken.
Reported-by: luca abeni <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The preempt_disable() invokes preempt_count_add() which saves the caller
in ->preempt_disable_ip. It uses CALLER_ADDR1, which does not look for
its caller but for the parent of the caller. This means we get the correct
caller for something like spin_lock() unless the architecture inlines
those invocations. It is always wrong for preempt_disable() or
local_bh_disable().
This patch adds the function get_lock_parent_ip(), which tries
CALLER_ADDR0, 1 and 2 in turn, skipping an address while it points into
a locking function.
This seems to record the preempt_disable() caller properly for
preempt_disable() itself as well as for get_cpu_var() or
local_bh_disable().
Steven asked for the get_parent_ip() -> get_lock_parent_ip() rename.
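The described behaviour amounts to roughly the following (sketch;
in_lock_functions() is the existing helper that recognizes addresses inside
locking code):

  static inline unsigned long get_lock_parent_ip(void)
  {
          unsigned long addr = CALLER_ADDR0;

          if (!in_lock_functions(addr))
                  return addr;            /* caller is not a locking function */
          addr = CALLER_ADDR1;
          if (!in_lock_functions(addr))
                  return addr;            /* one level up */
          return CALLER_ADDR2;            /* two levels up */
  }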
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When profiling syscall overhead on nohz-full kernels,
after removing __acct_update_integrals() from the profile,
native_sched_clock() remains as the top CPU user. This can be
reduced by moving VIRT_CPU_ACCOUNTING_GEN to jiffy granularity.
This will reduce timing accuracy on nohz_full CPUs to jiffy
based sampling, just like on normal CPUs. It results in
totally removing native_sched_clock from the profile, and
significantly speeding up the syscall entry and exit path,
as well as irq entry and exit, and KVM guest entry & exit.
Additionally, only call the more expensive functions (and
advance the seqlock) when jiffies actually changed.
This code relies on another CPU advancing jiffies when the
system is busy. On a nohz_full system, this is done by a
housekeeping CPU.
A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.
Run times for the microbenchmark:
4.4 3.8 seconds
4.5-rc1 3.7 seconds
4.5-rc1 + first patch 3.3 seconds
4.5-rc1 + first 3 patches 3.1 seconds
4.5-rc1 + all patches 2.3 seconds
A non-NOHZ_FULL cpu (not the housekeeping CPU):
all kernels 1.86 seconds
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
It looks like all the call paths that lead to __acct_update_integrals()
already have irqs disabled, and __acct_update_integrals() does not need
to disable irqs itself.
This is very convenient since about half the CPU time left in this
function was spent in local_irq_save alone.
Performance of a microbenchmark that calls an invalid syscall
ten million times in a row on a nohz_full CPU improves 21% vs.
4.5-rc1 with both the removal of divisions from __acct_update_integrals()
and this patch, with runtime dropping from 3.7 to 2.9 seconds.
With these patches applied, the highest remaining cpu user in
the trace is native_sched_clock, which is addressed in the next
patch.
For testing purposes I stuck a WARN_ON(!irqs_disabled()) test
in __acct_update_integrals(). It did not trigger.
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Change the indentation in __acct_update_integrals() to make the function
a little easier to read.
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|