aboutsummaryrefslogtreecommitdiff
path: root/kernel/time
AgeCommit message (Collapse)AuthorFilesLines
2014-01-15nohz: Get timekeeping max deferment outside jiffies_lockFrederic Weisbecker1-1/+2
We don't need to fetch the timekeeping max deferment under the jiffies_lock seqlock. If the clocksource is updated concurrently while we stop the tick, stop machine is called and the tick will be reevaluated again along with uptodate jiffies and its related values. Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Alex Shi <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: John Stultz <[email protected]> Cc: Kevin Hilman <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Frederic Weisbecker <[email protected]>
2014-01-15tick: Rename tick_check_idle() to tick_irq_enter()Frederic Weisbecker1-4/+4
This makes the code more symetric against the existing tick functions called on irq exit: tick_irq_exit() and tick_nohz_irq_exit(). These function are also symetric as they mirror each other's action: we start to account idle time on irq exit and we stop this accounting on irq entry. Also the tick is stopped on irq exit and timekeeping catches up with the tickless time elapsed until we reach irq entry. This rename was suggested by Peter Zijlstra a long while ago but it got forgotten in the mass. Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Alex Shi <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: John Stultz <[email protected]> Cc: Kevin Hilman <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Frederic Weisbecker <[email protected]>
2014-01-13sched/clock, x86: Use a static_key for sched_clock_stablePeter Zijlstra1-1/+1
In order to avoid the runtime condition and variable load turn sched_clock_stable into a static_key. Also provide a shorter implementation of local_clock() and cpu_clock(int) when sched_clock_stable==1. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 221876 215295 (cold) local_clock: 301773 234692 220773 (warm) sched_clock: 38375 25602 25659 (warm) local_clock: 100371 33265 27242 (warm) rdtsc: 27340 24214 24208 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 235941 237019 (cold) local_clock: 396890 297017 294819 (warm) sched_clock: 38194 25233 25609 (warm) local_clock: 143452 71234 71232 (warm) rdtsc: 27345 24245 24243 Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-01-12Merge branch 'fortglx/3.14/time' of ↵Ingo Molnar4-27/+29
git://git.linaro.org/people/john.stultz/linux into timers/core Pull timekeeping updates from John Stultz. Signed-off-by: Ingo Molnar <[email protected]>
2014-01-12Merge branch 'linus' into timers/coreIngo Molnar3-14/+28
Pick up the latest fixes and refresh the branch. Signed-off-by: Ingo Molnar <[email protected]>
2014-01-12sched_clock: Disable seqlock lockdep usage in sched_clock()John Stultz1-3/+3
Unfortunately the seqlock lockdep enablement can't be used in sched_clock(), since the lockdep infrastructure eventually calls into sched_clock(), which causes a deadlock. Thus, this patch changes all generic sched_clock() usage to use the raw_* methods. Acked-by: Linus Torvalds <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Reported-by: Krzysztof Hałasa <[email protected]> Signed-off-by: John Stultz <[email protected]> Cc: Uwe Kleine-König <[email protected]> Cc: Willy Tarreau <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-12-23timekeeping: Remove comment that's mostly out of dateJohn Stultz1-10/+0
Prior to 92bb1fcf57a0c2e45f7e67fbf0a8ed475a749236 (Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems), the comment here was accuate, but now we can mostly avoid the extra rounding which causes the unlikey to be actually likely here. So remove the out of date comment. Signed-off-by: John Stultz <[email protected]>
2013-12-23timekeeper: fix comment typo for tk_setup_internals()Yijing Wang1-1/+2
Fix trivial comment typo for tk_setup_internals(). Signed-off-by: Yijing Wang <[email protected]> Signed-off-by: John Stultz <[email protected]>
2013-12-23timekeeping: Fix missing timekeeping_update in suspend pathJohn Stultz1-0/+2
Since 48cdc135d4840 (Implement a shadow timekeeper), we have to call timekeeping_update() after any adjustment to the timekeeping structure in order to make sure that any adjustments to the structure persist. In the timekeeping suspend path, we udpate the timekeeper structure, so we should be sure to update the shadow-timekeeper before releasing the timekeeping locks. Currently this isn't done. In most cases, the next time related code to run would be timekeeping_resume, which does update the shadow-timekeeper, but in an abundence of caution, this patch adds the call to timekeeping_update() in the suspend path. Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: stable <[email protected]> #3.10+ Signed-off-by: John Stultz <[email protected]>
2013-12-23timekeeping: Fix CLOCK_TAI timer/nanosleep delaysJohn Stultz1-2/+2
A think-o in the calculation of the monotonic -> tai time offset results in CLOCK_TAI timers and nanosleeps to expire late (the latency is ~2x the tai offset). Fix this by adding the tai offset from the realtime offset instead of subtracting. Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: stable <[email protected]> #3.10+ Signed-off-by: John Stultz <[email protected]>
2013-12-23tick/timekeeping: Call update_wall_time outside the jiffies lockJohn Stultz4-15/+7
Since the xtime lock was split into the timekeeping lock and the jiffies lock, we no longer need to call update_wall_time() while holding the jiffies lock. Thus, this patch splits update_wall_time() out from do_timer(). This allows us to get away from calling clock_was_set_delayed() in update_wall_time() and instead use the standard clock_was_set() call that previously would deadlock, as it causes the jiffies lock to be acquired. Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: John Stultz <[email protected]>
2013-12-23timekeeping: Avoid possible deadlock from clock_was_set_delayedJohn Stultz1-2/+16
As part of normal operaions, the hrtimer subsystem frequently calls into the timekeeping code, creating a locking order of hrtimer locks -> timekeeping locks clock_was_set_delayed() was suppoed to allow us to avoid deadlocks between the timekeeping the hrtimer subsystem, so that we could notify the hrtimer subsytem the time had changed while holding the timekeeping locks. This was done by scheduling delayed work that would run later once we were out of the timekeeing code. But unfortunately the lock chains are complex enoguh that in scheduling delayed work, we end up eventually trying to grab an hrtimer lock. Sasha Levin noticed this in testing when the new seqlock lockdep enablement triggered the following (somewhat abrieviated) message: [ 251.100221] ====================================================== [ 251.100221] [ INFO: possible circular locking dependency detected ] [ 251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted [ 251.101967] ------------------------------------------------------- [ 251.101967] kworker/10:1/4506 is trying to acquire lock: [ 251.101967] (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70 [ 251.101967] [ 251.101967] but task is already holding lock: [ 251.101967] (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] which lock already depends on the new lock. [ 251.101967] [ 251.101967] [ 251.101967] the existing dependency chain (in reverse order) is: [ 251.101967] -> #5 (hrtimer_bases.lock#11){-.-...}: [snipped] -> #4 (&rt_b->rt_runtime_lock){-.-...}: [snipped] -> #3 (&rq->lock){-.-.-.}: [snipped] -> #2 (&p->pi_lock){-.-.-.}: [snipped] -> #1 (&(&pool->lock)->rlock){-.-...}: [ 251.101967] [<ffffffff81194803>] validate_chain+0x6c3/0x7b0 [ 251.101967] [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580 [ 251.101967] [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0 [ 251.101967] [<ffffffff84398500>] _raw_spin_lock+0x40/0x80 [ 251.101967] [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0 [ 251.101967] [<ffffffff81154168>] queue_work_on+0x98/0x120 [ 251.101967] [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30 [ 251.101967] [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160 [ 251.101967] [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70 [ 251.101967] [<ffffffff843a4b49>] ia32_sysret+0x0/0x5 [ 251.101967] -> #0 (timekeeper_seq){----..}: [snipped] [ 251.101967] other info that might help us debug this: [ 251.101967] [ 251.101967] Chain exists of: timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11 [ 251.101967] Possible unsafe locking scenario: [ 251.101967] [ 251.101967] CPU0 CPU1 [ 251.101967] ---- ---- [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(&rt_b->rt_runtime_lock); [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(timekeeper_seq); [ 251.101967] [ 251.101967] *** DEADLOCK *** [ 251.101967] [ 251.101967] 3 locks held by kworker/10:1/4506: [ 251.101967] #0: (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530 [ 251.101967] #1: (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530 [ 251.101967] #2: (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] stack backtrace: [ 251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 [ 251.101967] Workqueue: events clock_was_set_work So the best solution is to avoid calling clock_was_set_delayed() while holding the timekeeping lock, and instead using a flag variable to decide if we should call clock_was_set() once we've released the locks. This works for the case here, where the do_adjtimex() was the deadlock trigger point. Unfortuantely, in update_wall_time() we still hold the jiffies lock, which would deadlock with the ipi triggered by clock_was_set(), preventing us from calling it even after we drop the timekeeping lock. So instead call clock_was_set_delayed() at that point. Cc: Thomas Gleixner <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Sasha Levin <[email protected]> Cc: stable <[email protected]> #3.10+ Reported-by: Sasha Levin <[email protected]> Tested-by: Sasha Levin <[email protected]> Signed-off-by: John Stultz <[email protected]>
2013-12-23timekeeping: Fix potential lost pv notification of time changeJohn Stultz1-9/+11
In 780427f0e11 (Indicate that clock was set in the pvclock gtod notifier), logic was added to pass a CLOCK_WAS_SET notification to the pvclock notifier chain. While that patch added a action flag returned from accumulate_nsecs_to_secs(), it only uses the returned value in one location, and not in the logarithmic accumulation. This means if a leap second triggered during the logarithmic accumulation (which is most likely where it would happen), the notification that the clock was set would not make it to the pv notifiers. This patch extends the logarithmic_accumulation pass down that action flag so proper notification will occur. This patch also changes the varialbe action -> clock_set per Ingo's suggestion. Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Vrabel <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: <[email protected]> Cc: stable <[email protected]> #3.11+ Signed-off-by: John Stultz <[email protected]>
2013-12-23timekeeping: Fix lost updates to tai adjustmentJohn Stultz1-1/+2
Since 48cdc135d4840 (Implement a shadow timekeeper), we have to call timekeeping_update() after any adjustment to the timekeeping structure in order to make sure that any adjustments to the structure persist. Unfortunately, the updates to the tai offset via adjtimex do not trigger this update, causing adjustments to the tai offset to be made and then over-written by the previous value at the next update_wall_time() call. This patch resovles the issue by calling timekeeping_update() right after setting the tai offset. Cc: Sasha Levin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: stable <[email protected]> #3.10+ Signed-off-by: John Stultz <[email protected]>
2013-12-02nohz: Convert a few places to use local per cpu accessesFrederic Weisbecker3-28/+21
A few functions use remote per CPU access APIs when they deal with local values. Just do the right conversion to improve performance, code readability and debug checks. While at it, lets extend some of these function names with *_this_cpu() suffix in order to display their purpose more clearly. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Steven Rostedt <[email protected]>
2013-11-29nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=offThomas Gleixner1-1/+3
If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ. If CONFIG_NO_HZ=y and the nohz functionality is disabled via the command line option "nohz=off" or not enabled due to missing hardware support, then tick_nohz_get_sleep_length() returns 0. That happens because ts->sleep_length is never set in that case. Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive. Reported-by: Michal Hocko <[email protected]> Reported-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2013-11-22time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLDMartin Schwidefsky1-1/+1
Since commit 1e75fa8be9f (time: Condense timekeeper.xtime into xtime_sec - merged in v3.6), there has been an problem with the error accounting in the timekeeping code, such that when truncating to nanoseconds, we round up to the next nsec, but the balancing adjustment to the ntp_error value was dropped. This causes 1ns per tick drift forward of the clock. In 3.7, this logic was isolated to only GENERIC_TIME_VSYSCALL_OLD architectures (s390, ia64, powerpc). The fix is simply to balance the accounting and to subtract the added nanosecond from ntp_error. This allows the internal long-term clock steering to keep the clock accurate. While this fix removes the regression added in 1e75fa8be9f, the ideal solution is to move away from GENERIC_TIME_VSYSCALL_OLD and use the new VSYSCALL method, which avoids entirely the nanosecond granular rounding, and the resulting short-term clock adjustment oscillation needed to keep long term accurate time. [ jstultz: Many thanks to Martin for his efforts identifying this subtle bug, and providing the fix. ] Originally-from: Martin Schwidefsky <[email protected]> Cc: Tony Luck <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Paul Turner <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: stable <[email protected]> #v3.6+ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Stultz <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2013-11-19tick: Document tick_do_timer_cpuAndrew Morton1-0/+15
Taken straight from a tglx email ;) Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2013-11-19NOHZ: Check for nohz active instead of nohz enabledThomas Gleixner1-12/+9
RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the "fine" grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. Reported-and-tested-by: Matthew Whitehead <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected]
2013-11-12Merge branch 'timers-core-for-linus' of ↵Linus Torvalds10-84/+107
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Ingo Molnar: "Main changes in this cycle were: - Updated full dynticks support. - Event stream support for architected (ARM) timers. - ARM clocksource driver updates. - Move arm64 to using the generic sched_clock framework & resulting cleanup in the generic sched_clock code. - Misc fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC clocksource: sun4i: remove IRQF_DISABLED clocksource: sun4i: Report the minimum tick that we can program clocksource: sun4i: Select CLKSRC_MMIO clocksource: Provide timekeeping for efm32 SoCs clocksource: em_sti: convert to clk_prepare/unprepare time: Fix signedness bug in sysfs_get_uname() and its callers timekeeping: Fix some trivial typos in comments alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist clocksource: arch_timer: Do not register arch_sys_counter twice timer stats: Add a 'Collection: active/inactive' line to timer usage statistics sched_clock: Remove sched_clock_func() hook arch_timer: Move to generic sched_clock framework clocksource: tcb_clksrc: Remove IRQF_DISABLED clocksource: tcb_clksrc: Improve driver robustness clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare clocksource: arm_arch_timer: Use clocksource for suspend timekeeping clocksource: dw_apb_timer_of: Mark a few more functions as __init clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally arm: zynq: Enable arm_global_timer ...
2013-10-23clockevents: Sanitize ticks to nsec conversionThomas Gleixner1-15/+50
Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use clockevents_config_and_register() where possible" caused a regression for some of the converted subarchs. The reason is, that the clockevents core code converts the minimal hardware tick delta to a nanosecond value for core internal usage. This conversion is affected by integer math rounding loss, so the backwards conversion to hardware ticks will likely result in a value which is less than the configured hardware limitation. The affected subarchs used their own workaround (SIGH!) which got lost in the conversion. The solution for the issue at hand is simple: adding evt->mult - 1 to the shifted value before the integer divison in the core conversion function takes care of it. But this only works for the case where for the scaled math mult/shift pair "mult <= 1 << shift" is true. For the case where "mult > 1 << shift" we can apply the rounding add only for the minimum delta value to make sure that the backward conversion is not less than the given hardware limit. For the upper bound we need to omit the rounding add, because the backwards conversion is always larger than the original latch value. That would violate the upper bound of the hardware device. Though looking closer at the details of that function reveals another bogosity: The upper bounds check is broken as well. Checking for a resulting "clc" value greater than KTIME_MAX after the conversion is pointless. The conversion does: u64 clc = (latch << evt->shift) / evt->mult; So there is no sanity check for (latch << evt->shift) exceeding the 64bit boundary. The latch argument is "unsigned long", so on a 64bit arch the handed in argument could easily lead to an unnoticed shift overflow. With the above rounding fix applied the calculation before the divison is: u64 clc = (latch << evt->shift) + evt->mult - 1; So we need to make sure, that neither the shift nor the rounding add is overflowing the u64 boundary. [ukl: move assignment to rnd after eventually changing mult, fix build issue and correct comment with the right math] Signed-off-by: Thomas Gleixner <[email protected]> Cc: Russell King - ARM Linux <[email protected]> Cc: Marc Kleine-Budde <[email protected]> Cc: [email protected] Cc: Marc Pignat <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Ronald Wahl <[email protected]> Cc: LAK <[email protected]> Cc: Ludovic Desroches <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uwe Kleine-König <[email protected]>
2013-10-18time: Fix signedness bug in sysfs_get_uname() and its callersPatrick Palka3-3/+3
sysfs_get_uname() is erroneously declared as returning size_t even though it may return a negative value, specifically -EINVAL. Its callers then check whether its return value is less than zero and indeed that is never the case for size_t. This patch changes sysfs_get_uname() to return ssize_t and makes sure its callers use ssize_t accordingly. Signed-off-by: Patrick Palka <[email protected]> [jstultz: Didn't apply cleanly, as a similar partial fix was also applied so had to resolve the collisions] Signed-off-by: John Stultz <[email protected]>
2013-10-18timekeeping: Fix some trivial typos in commentsXie XiuQi1-1/+2
Fix some typos in timekeeping comments. Signed-off-by: Xie XiuQi <[email protected]> [jstultz: Commit message tweaks] Signed-off-by: John Stultz <[email protected]>
2013-10-18alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't existKOSAKI Motohiro1-2/+2
Fedora Ruby maintainer reported latest Ruby doesn't work on Fedora Rawhide on ARM. (http://bugs.ruby-lang.org/issues/9008) Because of, commit 1c6b39ad3f (alarmtimers: Return -ENOTSUPP if no RTC device is present) intruduced to return ENOTSUPP when clock_get{time,res} can't find a RTC device. However this is incorrect. First, ENOTSUPP isn't exported to userland (ENOTSUP or EOPNOTSUP are the closest userland equivlents). Second, Posix and Linux man pages agree that clock_gettime and clock_getres should return EINVAL if clk_id argument is invalid. While the arugment that the clockid is valid, but just not supported on this hardware could be made, this is just a technicality that doesn't help userspace applicaitons, and only complicates error handling. Thus, this patch changes the code to use EINVAL. Cc: Thomas Gleixner <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: stable <[email protected]> #3.0 and up Reported-by: Vit Ondruch <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> [jstultz: Tweaks to commit message to include full rational] Signed-off-by: John Stultz <[email protected]>
2013-10-10timer stats: Add a 'Collection: active/inactive' line to timer usage statisticsDong Zhu1-4/+4
We can enable/disable timer statistics collection via: echo [1|0] > /proc/timers_stats and it would be nice if apps had the ability to check what the current collection status is. This patch adds a 'Collection: active/inactive' line to display the current timer collection status. Also bump up the timer stats version to v0.3. Signed-off-by: Dong Zhu <[email protected]> Cc: John Stultz <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Improved the changelog and the code. ] Signed-off-by: Ingo Molnar <[email protected]>
2013-10-10Merge branch 'fortglx/3.13/time' of ↵Ingo Molnar2-10/+3
git://git.linaro.org/people/jstultz/linux into timers/core Pull more timekeeping items for v3.13 from John Stultz: * Small cleanup in the clocksource code. * Fix for rtc-pl031 to let it work with alarmtimers. * Move arm64 to using the generic sched_clock framework & resulting cleanup in the generic sched_clock code. Signed-off-by: Ingo Molnar <[email protected]>
2013-10-09sched_clock: Remove sched_clock_func() hookStephen Boyd1-8/+1
Nobody is using sched_clock_func() anymore now that sched_clock supports up to 64 bits. Remove the hook so that new code only uses sched_clock_register(). Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: John Stultz <[email protected]>
2013-10-03Merge branch 'clockevents/3.13' of ↵Ingo Molnar1-0/+1
git://git.linaro.org/people/dlezcano/linux into timers/core Pull (mostly) ARM clocksource driver updates from Daniel Lezcano: " - Soren Brinkmann added FEAT_PERCPU to a clock device when it is local per cpu. This feature prevents the clock framework to choose a per cpu timer as a broadcast timer. This problem arised when the ARM global timer is used when switching to the broadcast timer which is the case now on Xillinx with its cpuidle driver. - Stephen Boyd extended the generic sched_clock code to support 64bit counters and removes the setup_sched_clock deprecation, as that causes lots of warnings since there's still users in the arch/arm tree. He added also the CLOCK_SOURCE_SUSPEND_NONSTOP flag on the architected timer as they continue counting during suspend. - Uwe Kleine-König added some missing __init sections and consolidated the code by moving the of_node_put call from the drivers to the function clocksource_of_init. " Signed-off-by: Ingo Molnar <[email protected]>
2013-10-03Merge branch 'timers/core' of ↵Ingo Molnar1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core Merge updated full dynticks support from Frederic Weisbecker: - support 32-bit systems (full dynticks was 64-bit only before) - support ARM Signed-off-by: Ingo Molnar <[email protected]>
2013-10-03Merge tag 'v3.12-rc3' into timers/coreIngo Molnar2-4/+4
Merge Linux 3.12-rc3 - refresh the tree with the latest fixes before merging new bits. Signed-off-by: Ingo Molnar <[email protected]>
2013-10-02tick: broadcast: Deny per-cpu clockevents from being broadcast sourcesSoren Brinkmann1-0/+1
On most ARM systems the per-cpu clockevents are truly per-cpu in the sense that they can't be controlled on any other CPU besides the CPU that they interrupt. If one of these clockevents were to become a broadcast source we will run into a lot of trouble because the broadcast source is enabled on the first CPU to go into deep idle (if that CPU suffers from FEAT_C3_STOP) and that could be a different CPU than what the clockevent is interrupting (or even worse the CPU that the clockevent interrupts could be offline). Theoretically it's possible to support per-cpu clockevents as the broadcast source but so far we haven't needed this and supporting it is rather complicated. Let's just deny the possibility for now until this becomes a reality (let's hope it never does!). Signed-off-by: Soren Brinkmann <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Acked-by: Michal Simek <[email protected]>
2013-09-30nohz: Drop generic vtime obsolete dependency on CONFIG_64BITKevin Hilman1-1/+0
The CONFIG_64BIT requirement on vtime can finally be removed since we now depend on HAVE_VIRT_CPU_ACCOUNTING_GEN which already takes care of the arch ability to handle nsecs based cputime_t safely. Signed-off-by: Kevin Hilman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Russell King <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Arm Linux <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2013-09-30vtime: Add HAVE_VIRT_CPU_ACCOUNTING_GEN KconfigKevin Hilman1-0/+1
With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. In order to use that feature, arch code should be audited to ensure there are no races in concurrent read/write of cputime_t. For example, reading/writing 64-bit cputime_t on some 32-bit arches may require multiple accesses for low and high value parts, so proper locking is needed to protect against concurrent accesses. Therefore, add CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN which arches can enable after they've been audited for potential races. This option is automatically enabled on 64-bit platforms. Feature requested by Frederic Weisbecker. Signed-off-by: Kevin Hilman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Russell King <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Arm Linux <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2013-09-18Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "An NTP related lockup fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix HRTICK related deadlock from ntp lock changes
2013-09-17clocksource: Fix 'ret' data type of sysfs_override_clocksource() and ↵Elad Wexler1-2/+2
sysfs_unbind_clocksource() sysfs_override_clocksource(): The expression 'if (ret >= 0)' is always true. This will cause clocksource_select() to always run. Thus modified ret to be of type ssize_t. sysfs_unbind_clocksource(): The expression 'if (ret < 0)' is always false. So in case sysfs_get_uname() failed, the expression won't take an effect. Thus modified ret to be of type ssize_t. Signed-off-by: Elad Wexler <[email protected]> Signed-off-by: John Stultz <[email protected]>
2013-09-16Merge branch 'fortglx/3.12/time' into fortglx/3.13/timeJohn Stultz2-1/+3
Merge in the timekeeping changes that missed 3.12 Signed-off-by: John Stultz <[email protected]>
2013-09-16Merge branch 'fortglx/3.12/sched-clock64-base' into fortglx/3.13/timeJohn Stultz2-65/+91
Merge in 64bit sched_clock support that missed 3.12. Conflicts: kernel/time/sched_clock.c Signed-off-by: John.Stultz <[email protected]>
2013-09-12timekeeping: Fix HRTICK related deadlock from ntp lock changesJohn Stultz2-4/+4
Gerlando Falauto reported that when HRTICK is enabled, it is possible to trigger system deadlocks. These were hard to reproduce, as HRTICK has been broken in the past, but seemed to be connected to the timekeeping_seq lock. Since seqlock/seqcount's aren't supported w/ lockdep, I added some extra spinlock based locking and triggered the following lockdep output: [ 15.849182] ntpd/4062 is trying to acquire lock: [ 15.849765] (&(&pool->lock)->rlock){..-...}, at: [<ffffffff810aa9b5>] __queue_work+0x145/0x480 [ 15.850051] [ 15.850051] but task is already holding lock: [ 15.850051] (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>] do_adjtimex+0x7f/0x100 <snip> [ 15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock [ 15.850051] Possible unsafe locking scenario: [ 15.850051] [ 15.850051] CPU0 CPU1 [ 15.850051] ---- ---- [ 15.850051] lock(timekeeper_lock); [ 15.850051] lock(&p->pi_lock); [ 15.850051] lock(timekeeper_lock); [ 15.850051] lock(&(&pool->lock)->rlock); [ 15.850051] [ 15.850051] *** DEADLOCK *** The deadlock was introduced by 06c017fdd4dc48451a ("timekeeping: Hold timekeepering locks in do_adjtimex and hardpps") in 3.10 This patch avoids this deadlock, by moving the call to schedule_delayed_work() outside of the timekeeper lock critical section. Reported-by: Gerlando Falauto <[email protected]> Tested-by: Lin Ming <[email protected]> Signed-off-by: John Stultz <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: stable <[email protected]> #3.11, 3.10 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-09-04Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds2-33/+29
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timers/nohz changes from Ingo Molnar: "It mostly contains fixes and full dynticks off-case optimizations, by Frederic Weisbecker" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) nohz: Include local CPU in full dynticks global kick nohz: Optimize full dynticks's sched hooks with static keys nohz: Optimize full dynticks state checks with static keys nohz: Rename a few state variables vtime: Always debug check snapshot source _before_ updating it vtime: Always scale generic vtime accounting results vtime: Optimize full dynticks accounting off case with static keys vtime: Describe overriden functions in dedicated arch headers m68k: hardirq_count() only need preempt_mask.h hardirq: Split preempt count mask definitions context_tracking: Split low level state headers vtime: Fix racy cputime delta update vtime: Remove a few unneeded generic vtime state checks context_tracking: User/kernel broundary cross trace events context_tracking: Optimize context switch off case with static keys context_tracking: Optimize guest APIs off case with static key context_tracking: Optimize main APIs off case with static key context_tracking: Ground setup for static key use context_tracking: Remove full dynticks' hacky dependency on wide context tracking nohz: Only enable context tracking on full dynticks CPUs ...
2013-09-03Merge branch 'rcu/next' of ↵Ingo Molnar1-0/+50
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: " * Update RCU documentation. These were posted to LKML at https://lkml.org/lkml/2013/8/19/611. * Miscellaneous fixes. These were posted to LKML at https://lkml.org/lkml/2013/8/19/619. * Full-system idle detection. This is for use by Frederic Weisbecker's adaptive-ticks mechanism. Its purpose is to allow the timekeeping CPU to shut off its tick when all other CPUs are idle. These were posted to LKML at https://lkml.org/lkml/2013/8/19/648. * Improve rcutorture test coverage. These were posted to LKML at https://lkml.org/lkml/2013/8/19/675. " Signed-off-by: Ingo Molnar <[email protected]>
2013-08-31nohz_full: Add full-system-idle state machinePaul E. McKenney1-0/+27
This commit adds the state machine that takes the per-CPU idle data as input and produces a full-system-idle indication as output. This state machine is driven out of RCU's quiescent-state-forcing mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU idle state and then rcu_sysidle_report() to drive the state machine. The full-system-idle state is sampled using rcu_sys_is_idle(), which also drives the state machine if RCU is idle (and does so by forcing RCU to become non-idle). This function returns true if all but the timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long enough to avoid memory contention on the full_sysidle_state state variable. The rcu_sysidle_force_exit() may be called externally to reset the state machine back into non-idle state. For large systems the state machine is driven out of RCU's force-quiescent-state logic, which provides good scalability at the price of millisecond-scale latencies on the transition to full-system-idle state. This is not so good for battery-powered systems, which are usually small enough that they don't need to care about scalability, but which do care deeply about energy efficiency. Small systems therefore drive the state machine directly out of the idle-entry code. The number of CPUs in a "small" system is defined by a new NO_HZ_FULL_SYSIDLE_SMALL Kconfig parameter, which defaults to 8. Note that this is a build-time definition. Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Lai Jiangshan <[email protected]> [ paulmck: Use true and false for boolean constants per Lai Jiangshan. ] Reviewed-by: Josh Triplett <[email protected]> [ paulmck: Simplify logic and provide better comments for memory barriers, based on review comments and questions by Lai Jiangshan. ]
2013-08-28timer_list: correct the iterator for timer_listNathan Zimmer1-17/+24
Correct an issue with /proc/timer_list reported by Holger. When reading from the proc file with a sufficiently small buffer, 2k so not really that small, there was one could get hung trying to read the file a chunk at a time. The timer_list_start function failed to account for the possibility that the offset was adjusted outside the timer_list_next. Signed-off-by: Nathan Zimmer <[email protected]> Reported-by: Holger Hans Peter Freyther <[email protected]> Cc: John Stultz <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Berke Durak <[email protected]> Cc: Jeff Layton <[email protected]> Tested-by: Al Viro <[email protected]> Cc: <[email protected]> # 3.10.x Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-08-22ntp: Make periodic RTC update more reliableMiroslav Lichvar1-1/+2
The current code requires that the scheduled update of the RTC happens in the closest tick to the half of the second. This seems to be difficult to achieve reliably. The scheduled work may be missing the target time by a tick or two and be constantly rescheduled every second. Relax the limit to 10 ticks. As a typical RTC drifts in the 11-minute update interval by several milliseconds, this shouldn't affect the overall accuracy of the RTC much. Signed-off-by: Miroslav Lichvar <[email protected]> Signed-off-by: John Stultz <[email protected]>
2013-08-19Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Three small fixlets" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz: fix compile warning in tick_nohz_init() nohz: Do not warn about unstable tsc unless user uses nohz_full sched_clock: Fix integer overflow
2013-08-18nohz_full: Add Kconfig parameter for scalable detection of all-idle statePaul E. McKenney1-0/+23
At least one CPU must keep the scheduling-clock tick running for timekeeping purposes whenever there is a non-idle CPU. However, with the new nohz_full adaptive-idle machinery, it is difficult to distinguish between all CPUs really being idle as opposed to all non-idle CPUs being in adaptive-ticks mode. This commit therefore adds a Kconfig parameter as a first step towards enabling a scalable detection of full-system idle state. Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> [ paulmck: Update help text per Frederic Weisbecker. ] Reviewed-by: Josh Triplett <[email protected]>
2013-08-16nohz: Include local CPU in full dynticks global kickFrederic Weisbecker1-0/+1
tick_nohz_full_kick_all() is useful to notify all full dynticks CPUs that there is a system state change to checkout before re-evaluating the need for the tick. Unfortunately this is implemented using smp_call_function_many() that ignores the local CPU. This CPU also needs to re-evaluate the tick. on_each_cpu_mask() is not useful either because we don't want to re-evaluate the tick state in place but asynchronously from an IPI to avoid messing up with any random locking scenario. So lets call tick_nohz_full_kick() from tick_nohz_full_kick_all() so that the usual irq work takes care of it. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Li Zhong <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Kevin Hilman <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-08-14Merge branch 'timers/nohz-v3' of ↵Ingo Molnar3-34/+28
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/nohz Pull nohz improvements from Frederic Weisbecker: " It mostly contains fixes and full dynticks off-case optimizations. I believe that distros want to enable this feature so it seems important to optimize the case where the "nohz_full=" parameter is empty. ie: I'm trying to remove any performance regression that comes with NO_HZ_FULL=y when the feature is not used. This patchset improves the current situation a lot (off-case appears to be around 11% faster with hackbench, although I guess it may vary depending on the configuration but it should be significantly faster in any case) now there is still some work to do: I can still observe a remaining loss of 1.6% throughput seen with hackbench compared to CONFIG_NO_HZ_FULL=n. " Signed-off-by: Ingo Molnar <[email protected]>
2013-08-14nohz: Optimize full dynticks's sched hooks with static keysFrederic Weisbecker1-4/+4
Scheduler IPIs and task context switches are serious fast path. Let's try to hide as much as we can the impact of full dynticks APIs' off case that are called on these sites through the use of static keys. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Li Zhong <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Kevin Hilman <[email protected]>
2013-08-14nohz: Optimize full dynticks state checks with static keysFrederic Weisbecker1-12/+2
These APIs are frequenctly accessed and priority is given to optimize the full dynticks off-case in order to let distros enable this feature without suffering from significant performance regressions. Let's inline these APIs and optimize them with static keys. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Li Zhong <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Kevin Hilman <[email protected]>
2013-08-14nohz: Rename a few state variablesFrederic Weisbecker1-22/+22
Rename the full dynticks's cpumask and cpumask state variables to some more exportable names. These will be used later from global headers to optimize the main full dynticks APIs in conjunction with static keys. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Li Zhong <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Kevin Hilman <[email protected]>