2015-06-19  Merge branch 'timers/core' into sched/hrtimers  (Thomas Gleixner, 74 files, -1324/+2057)
Merge sched/core and timers/core so we can apply the sched balancing patch queue, which depends on both.
2015-06-19  hrtimer: Allow hrtimer::function() to free the timer  (Peter Zijlstra, 2 files, -48/+107)
Currently an hrtimer callback function cannot free its own timer because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK after it. Freeing the timer would result in a clear use-after-free. Solve this by using a scheme similar to regular timers; track the current running timer in hrtimer_clock_base::running. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Al Viro <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul McKenney <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
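A rough sketch of the scheme described above, with illustrative handling rather than the exact kernel code (the real implementation also has to make the active check race-free against concurrent readers):

    static void run_one_hrtimer(struct hrtimer_clock_base *base, struct hrtimer *timer)
    {
        enum hrtimer_restart restart;

        base->running = timer;                  /* record "in callback" in the base   */
        timer->state = HRTIMER_STATE_INACTIVE;  /* no CALLBACK bit kept in the timer  */

        restart = timer->function(timer);       /* the callback may now free 'timer'  */
        (void)restart;                          /* requeue on HRTIMER_RESTART omitted */

        base->running = NULL;                   /* 'timer' is not dereferenced again  */
    }

    static bool sketch_hrtimer_active(struct hrtimer_clock_base *base, struct hrtimer *timer)
    {
        /* Active means: still enqueued, or currently running its callback. */
        return timer->state != HRTIMER_STATE_INACTIVE || base->running == timer;
    }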
2015-06-19  seqcount: Introduce raw_write_seqcount_barrier()  (Peter Zijlstra, 1 file, -0/+41)
Introduce raw_write_seqcount_barrier(), a new construct that can be used to provide write barrier semantics in seqcount read loops instead of the usual consistency guarantee. raw_write_seqcount_barrier() is equivalent to: raw_write_seqcount_begin(); raw_write_seqcount_end(); but avoids issuing two back-to-back smp_wmb() instructions. This construct works because the read side will 'stall' when observing odd values. This means that -- referring to the example in the comment below -- even though there is no (matching) read barrier between the loads of X and Y, we cannot observe !x && !y, because: - if we observe Y == false we must observe the first sequence increment, which makes us loop, until - we observe !(seq & 1) -- the second sequence increment -- at which time we must also observe T == true. Suggested-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Al Viro <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
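A sketch of the write/read pattern this enables, using the X/Y example from the changelog (illustrative, not the kernel's documentation comment verbatim):

    static seqcount_t seq;
    static bool X = true, Y;    /* Y starts false */

    static void writer(void)
    {
        Y = true;
        raw_write_seqcount_barrier(&seq);   /* two increments, only one smp_wmb() */
        X = false;
    }

    static void reader(bool *x, bool *y)
    {
        unsigned int start;

        do {
            start = read_seqcount_begin(&seq);  /* stalls while the count is odd */
            *x = X;
            *y = Y;
        } while (read_seqcount_retry(&seq, start));
        /* Here *x and *y can never both be false. */
    }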
2015-06-19  seqcount: Rename write_seqcount_barrier()  (Peter Zijlstra, 2 files, -11/+11)
I'll shortly be introducing another seqcount primitive that's useful to provide ordering semantics and would like to use the write_seqcount_barrier() name for that. Seeing how there's only one user of the current primitive, let's rename it to invalidate, as that appears to be what it's doing. While there, employ lockdep_assert_held() instead of assert_spin_locked() to not generate debug code for regular kernels. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Oleg Nesterov <[email protected]> Cc: [email protected] Cc: Paul McKenney <[email protected]> Cc: Al Viro <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-19  hrtimer: Fix hrtimer_is_queued() hole  (Peter Zijlstra, 1 file, -10/+13)
A queued hrtimer that gets restarted (hrtimer_start*() while hrtimer_is_queued()) will briefly appear as unqueued/inactive, even though the timer has always been active; we just moved it. Close this hole by preserving timer->state in hrtimer_start_range_ns()'s remove_hrtimer() call. Reported-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-19  hrtimer: Remove HRTIMER_STATE_MIGRATE  (Oleg Nesterov, 2 files, -10/+3)
I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally confused it looks buggy and simply unneeded. migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this is not enough: this can fool, say, hrtimer_is_queued() in dequeue_signal(). Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED? This fixes the race and we can kill STATE_MIGRATE. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-18  selftest: Timers: Avoid signal deadlock in leap-a-day  (John Stultz, 1 file, -11/+12)
In 0c4a5fc95b1df (Add leap-second timer edge testing to leap-a-day.c), we added a timer to the test which checks to make sure timers near the leapsecond edge behave correctly. However, the output generated from the timer uses ctime_r, which isn't async-signal safe, and should that signal land while the main test is using ctime_r to print its output, it's possible for the test to deadlock on glibc internal locks. Thus this patch reworks the output to avoid using ctime_r in the signal handler. Signed-off-by: John Stultz <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Ingo Molnar <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
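A minimal userspace sketch of the approach (illustrative, not the actual leap-a-day.c change): the handler records the raw time using only async-signal-safe calls, and the ctime_r() formatting happens in main().

    #include <signal.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static volatile sig_atomic_t timer_fired;
    static struct timespec fired_ts;

    static void timer_handler(int sig)
    {
        (void)sig;
        clock_gettime(CLOCK_REALTIME, &fired_ts);    /* async-signal-safe */
        timer_fired = 1;
        write(STDOUT_FILENO, "timer fired\n", 12);   /* write() is safe; ctime_r() is not */
    }

    int main(void)
    {
        char buf[32];
        time_t t;

        signal(SIGALRM, timer_handler);
        alarm(1);                       /* stand-in for the leap-second edge timer */

        while (!timer_fired)
            pause();

        t = fired_ts.tv_sec;
        printf("timer fired at %s", ctime_r(&t, buf));  /* formatting outside the handler */
        return 0;
    }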
2015-06-18  timekeeping: Copy the shadow-timekeeper over the real timekeeper last  (John Stultz, 1 file, -1/+2)
The fix in d151832650ed9 (time: Move clock_was_set_seq update before updating shadow-timekeeper) was unfortunately incomplete. The main gist of that change was to do the shadow-copy update last, so that any state changes were properly duplicated, and we wouldn't accidentally have stale data in the shadow. Unfortunately in the main update_wall_time() logic, we use the shadow-timekeeper to calculate the next update values, then while holding the lock, copy the shadow-timekeeper over, then call timekeeping_update() to do some additional bookkeeping (skipping the shadow mirror). The bug is that the additional bookkeeping isn't all read-only, and some of it changes timekeeper state. Thus we might then overwrite this state change on the next update. To avoid this problem, do the timekeeping_update() on the shadow-timekeeper prior to copying the full state over to the real-timekeeper. This avoids problems with both the clock_was_set_seq and next_leap_ktime being overwritten and possibly the fast-timekeepers as well. Many thanks to Prarit for his rigorous testing, which discovered this problem, along with Prarit and Daniel's work validating this fix. Reported-by: Prarit Bhargava <[email protected]> Tested-by: Prarit Bhargava <[email protected]> Tested-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: John Stultz <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Ingo Molnar <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
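A toy illustration of the ordering fix (made-up structure and names, not the timekeeping code itself): every step that can mutate state, including the extra bookkeeping, runs on the shadow copy, and the copy to the live structure happens last.

    struct toy_timekeeper {
        long long    xtime_ns;
        unsigned int clock_was_set_seq;
    };

    static struct toy_timekeeper live_tk, shadow_tk;

    static void toy_bookkeeping(struct toy_timekeeper *tk)
    {
        tk->clock_was_set_seq++;        /* not read-only: it mutates state */
    }

    static void toy_update_wall_time(long long delta_ns)
    {
        shadow_tk = live_tk;            /* start from the current state        */
        shadow_tk.xtime_ns += delta_ns; /* compute the next values on shadow   */
        toy_bookkeeping(&shadow_tk);    /* do the bookkeeping on the shadow    */
        live_tk = shadow_tk;            /* copy over last: no change is lost   */
    }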
2015-06-18  clockevents: Check state instead of mode in suspend/resume path  (Viresh Kumar, 1 file, -2/+2)
CLOCK_EVT_MODE_* macros are present for backward compatibility (as most of the drivers are still using the old ->set_mode() interface). These macros shouldn't be used anymore in code that is common to both driver interfaces, i.e. ->set_mode() and ->set_state_*(). Drivers implementing the ->set_state_*() interface, which have their clkevt->mode set to 0 (clkevt device structures are normally globally defined), will not participate in suspend/resume as they will always be marked as UNUSED. Fix this by checking the state of the clockevent device instead of the mode, which is updated for both interfaces. Fixes: ac34ad27fc16 ("clockevents: Do not suspend/resume if unused") Signed-off-by: Viresh Kumar <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/a1964eef6e8a47d02b1ff9083c6c91f73f0ff643.1434537215.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <[email protected]>
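A sketch of what the fixed check looks like, assuming the clockevent_state_detached() helper added elsewhere in this series (illustrative, not the exact diff):

    void clockevents_suspend(void)
    {
        struct clock_event_device *dev;

        list_for_each_entry_reverse(dev, &clockevent_devices, list)
            /* Checking dev->mode here would always see 0 (UNUSED) for
             * ->set_state_*() drivers; the state is valid for both interfaces. */
            if (dev->suspend && !clockevent_state_detached(dev))
                dev->suspend(dev);
    }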
2015-06-12  selftests: timers: Add leap-second timer edge testing to leap-a-day.c  (John Stultz, 1 file, -4/+72)
Prarit reported an issue w/ timers around the leapsecond, where a timer set for Midnight UTC (00:00:00) might fire a second early right before the leapsecond (23:59:60 - though it appears as a repeated 23:59:59) is applied. So I've updated the leap-a-day.c test to integrate a similar test, where we set a timer and check if it triggers at the right time, and if the ntp state transition is managed properly. Reported-by: Daniel Bristot de Oliveira <[email protected]> Reported-by: Prarit Bhargava <[email protected]> Signed-off-by: John Stultz <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Ingo Molnar <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-12  ntp: Do leapsecond adjustment in adjtimex read path  (John Stultz, 1 file, -0/+18)
Since the leapsecond is applied at tick-time, there is a small window of time at the start of a leap-second where we cross into the next second before applying the leap. This patch modifies adjtimex so that the leap-second is applied on the second edge, providing more correct leapsecond behavior. This does make it so that adjtimex()'s returned time values can be inconsistent with time values read from gettimeofday() or clock_gettime(CLOCK_REALTIME,...) for a brief period of one tick at the leapsecond. However, those other interfaces do not provide the TIME_OOP time_state return that adjtimex() provides, which allows the leapsecond to be properly represented. They instead only see a time discontinuity, and cannot tell the first 23:59:59 from the repeated 23:59:59 leap second. This seems like a reasonable tradeoff given clock_gettime() / gettimeofday() cannot properly represent a leapsecond, and users likely care more about performance, while folks who are using adjtimex() more likely care about leap-second correctness. Signed-off-by: John Stultz <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Ingo Molnar <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-12  time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge  (John Stultz, 5 files, -8/+61)
Currently, leapsecond adjustments are done at tick time. As a result, the leapsecond was applied at the first timer tick *after* the leapsecond (~1-10ms late depending on HZ), rather than exactly on the second edge. This was in part historical from back when we were always tick based, but correcting it has since been avoided because it adds extra conditional checks in the gettime fastpath, which has performance overhead. However, it was recently pointed out that ABS_TIME CLOCK_REALTIME timers set for right after the leapsecond could fire a second early, since some timers may be expired before we trigger the timekeeping timer, which then applies the leapsecond. This isn't quite as bad as it sounds, since behaviorally it is similar to what is possible when ntpd makes leapsecond adjustments without using the kernel discipline, where, due to latencies, timers may fire just prior to the settimeofday call. (Also, one should note that all applications using CLOCK_REALTIME timers should always be careful, since they are prone to quirks from settimeofday() disturbances.) However, the purpose of having the kernel do the leap adjustment is to avoid such latencies, so I think this is worth fixing. So in order to properly keep those timers from firing a second early, this patch modifies the ntp and timekeeping logic so that we keep enough state for the update_base_offsets_now accessor, which provides the hrtimer core the current time, to check and apply the leapsecond adjustment on the second edge. This prevents the hrtimer core from expiring timers too early. This patch does not modify any other time read path, so no additional overhead is incurred. However, this also means that the leap-second continues to be applied at tick time for all other read-paths. Apologies to Richard Cochran, who pushed for similar changes years ago, which I resisted due to the concerns about the performance overhead. While I suspect this isn't extremely critical, folks who care about strict leap-second correctness will likely want to watch this. Potentially a -stable candidate eventually. Originally-suggested-by: Richard Cochran <[email protected]> Reported-by: Daniel Bristot de Oliveira <[email protected]> Reported-by: Prarit Bhargava <[email protected]> Signed-off-by: John Stultz <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Ingo Molnar <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-12  ntp: Introduce and use SECS_PER_DAY macro instead of 86400  (John Stultz, 1 file, -2/+3)
Currently the leapsecond logic uses what looks like magic values. Improve this by defining SECS_PER_DAY and using that macro to make the logic more clear. Signed-off-by: John Stultz <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Ingo Molnar <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
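A small self-contained sketch of the idea (illustrative helpers, not the kernel's ntp.c): the named constant makes the midnight-boundary arithmetic, where leap seconds are inserted, read clearly.

    #define SECS_PER_DAY    86400

    /* Round a UTC timestamp down to midnight, and find the next midnight,
     * which is the boundary where a leap second would be inserted. */
    static long long start_of_utc_day(long long t)
    {
        return (t / SECS_PER_DAY) * SECS_PER_DAY;
    }

    static long long next_midnight_utc(long long t)
    {
        return start_of_utc_day(t) + SECS_PER_DAY;
    }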
2015-06-12  time: Move clock_was_set_seq update before updating shadow-timekeeper  (John Stultz, 1 file, -4/+8)
It was reported that 868a3e915f7f5eba (hrtimer: Make offset update smarter) was causing timer problems after suspend/resume. The problem with that change is that the modification to clock_was_set_seq in timekeeping_update is done prior to mirroring the time state to the shadow-timekeeper. Thus the next time we do update_wall_time() the updated sequence is overwritten by what's in the shadow copy. This patch moves the shadow-timekeeper mirroring to the end of the function, after all updates have been made, so all data is kept in sync. (This patch also affects the update_fast_timekeeper calls which were also problematically done prior to the mirroring.) Reported-and-tested-by: Jeremiah Mahler <[email protected]> Signed-off-by: John Stultz <[email protected]> Cc: Preeti U Murthy <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Frederic Weisbecker <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-10  clocksource: Use current logging style  (Joe Perches, 1 file, -12/+12)
clocksource messages aren't prefixed in dmesg so it's a bit unclear what subsystem emits the messages. Use pr_fmt and pr_<level> to auto-prefix the messages appropriately. Miscellanea:
 o Remove "Warning" from KERN_WARNING level messages
 o Align "timekeeping watchdog: " messages
 o Coalesce formats
 o Align multiline arguments
Signed-off-by: Joe Perches <[email protected]> Cc: John Stultz <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-10  time: Allow gcc to fold usecs_to_jiffies(constant)  (Nicholas Mc Guire, 1 file, -1/+29)
To allow constant folding, usecs_to_jiffies() now conditionally calls the HZ-dependent _usecs_to_jiffies() helpers or, when gcc can not figure out constant folding, __usecs_to_jiffies(), which is the renamed original usecs_to_jiffies() function. Signed-off-by: Nicholas Mc Guire <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Joe Perches <[email protected]> Cc: John Stultz <[email protected]> Cc: Andrew Hunter <[email protected]> Cc: Paul Turner <[email protected]> Cc: Michal Marek <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
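A simplified sketch of the dispatch (the real jiffies.h version also clamps constant values that would overflow):

    static __always_inline unsigned long usecs_to_jiffies(const unsigned int u)
    {
        if (__builtin_constant_p(u)) {
            /* gcc can evaluate the HZ-dependent helper at compile time */
            return _usecs_to_jiffies(u);
        } else {
            /* runtime values go through the out-of-line original */
            return __usecs_to_jiffies(u);
        }
    }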
2015-06-10  time: Refactor usecs_to_jiffies  (Nicholas Mc Guire, 2 files, -11/+29)
Refactor the usecs_to_jiffies conditional code part in time.c and jiffies.h, putting it into conditional functions rather than #ifdefs to improve readability. This is analogous to the msecs_to_jiffies() cleanup in commit ca42aaf0c861 ("time: Refactor msecs_to_jiffies"). Signed-off-by: Nicholas Mc Guire <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Joe Perches <[email protected]> Cc: John Stultz <[email protected]> Cc: Andrew Hunter <[email protected]> Cc: Paul Turner <[email protected]> Cc: Michal Marek <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
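A sketch of the HZ-dependent helper this refactor introduces, mirroring the msecs_to_jiffies() cleanup (simplified; the exact branch conditions and the timeconst.h scaling constants in the generic case may differ):

    #if !(USEC_PER_SEC % HZ)
    static inline unsigned long _usecs_to_jiffies(const unsigned int u)
    {
        /* HZ divides USEC_PER_SEC evenly: plain round-up division */
        return (u + (USEC_PER_SEC / HZ) - 1) / (USEC_PER_SEC / HZ);
    }
    #else
    static inline unsigned long _usecs_to_jiffies(const unsigned int u)
    {
        /* generic case: precomputed multiply/shift constants from timeconst.h */
        return (USEC_TO_HZ_MUL32 * (u64)u + USEC_TO_HZ_ADJ32) >> USEC_TO_HZ_SHR32;
    }
    #endif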
2015-06-08  hrtimers: Make sure hrtimer_resolution is unsigned int  (Borislav Petkov, 1 file, -1/+1)
... in the !CONFIG_HIGH_RES_TIMERS case too. And thus fix warnings like this one: net/sched/sch_api.c: In function ‘psched_show’: net/sched/sch_api.c:1891:6: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long int’ [-Wformat=] (u32)NSEC_PER_SEC / hrtimer_resolution); Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]> Cc: Thomas Gleixner <[email protected]>
2015-06-07  sched/numa: Only consider less busy nodes as numa balancing destinations  (Rik van Riel, 1 file, -2/+28)
Changeset a43455a1d572 ("sched/numa: Ensure task_numa_migrate() checks the preferred node") fixes an issue where workloads would never converge on a fully loaded (or overloaded) system. However, it introduces a regression on less than fully loaded systems, where workloads converge on a few NUMA nodes, instead of properly staying spread out across the whole system. This leads to a reduction in available memory bandwidth, and usable CPU cache, with predictable performance problems. The root cause appears to be an interaction between the load balancer and NUMA balancing, where the short term load represented by the load balancer differs from the long term load the NUMA balancing code would like to base its decisions on. Simply reverting a43455a1d572 would re-introduce the non-convergence of workloads on fully loaded systems, so that is not a good option. As an aside, the check done before a43455a1d572 only applied to a task's preferred node, not to other candidate nodes in the system, so the converge-on-too-few-nodes problem still happens, just to a lesser degree. Instead, try to compensate for the impedance mismatch between the load balancer and NUMA balancing by only ever considering a lesser loaded node as a destination for NUMA balancing, regardless of whether the task is trying to move to the preferred node, or to another node. This patch also addresses the issue that a system with a single runnable thread would never migrate that thread to near its memory, introduced by 095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced"). A test where the main thread creates a large memory area, and spawns a worker thread to iterate over the memory (placed on another node by select_task_rq_fair), after which the main thread goes to sleep and waits for the worker thread to loop over all the memory now sees the worker thread migrated to where the memory is, instead of having all the memory migrated over like before. Jirka has run a number of performance tests on several systems: single instance SpecJBB 2005 performance is 7-15% higher on a 4 node system, with higher gains on systems with more cores per socket. Multi-instance SpecJBB 2005 (one per node), linpack, and stream see little or no changes with the revert of 095bebf61a46 and this patch. Reported-by: Artem Bityutski <[email protected]> Reported-by: Jirka Hladky <[email protected]> Tested-by: Jirka Hladky <[email protected]> Tested-by: Artem Bityutskiy <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Andrew Morton <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-06-07  Revert 095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced")  (Rik van Riel, 1 file, -26/+15)
Commit 095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced") broke convergence of workloads with just one runnable thread, by making it impossible for the one runnable thread on the system to move from one NUMA node to another. Instead, the thread would remain where it was, and pull all the memory across to its location, which is much slower than just migrating the thread to where the memory is. The next patch has a better fix for the issue that 095bebf61a46 tried to address. Reported-by: Jirka Hladky <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-06-07  sched/fair: Prevent throttling in early pick_next_task_fair()  (Ben Segall, 1 file, -11/+14)
The optimized task selection logic optimistically selects a new task to run without first doing a full put_prev_task(). This is so that we can avoid a put/set on the common ancestors of the old and new task. Similarly, we should only call check_cfs_rq_runtime() to throttle eligible groups if they're part of the common ancestry, otherwise it is possible to end up with no eligible task in the simple task selection. Imagine: /root /prev /next /A /B If our optimistic selection ends up throttling /next, we goto simple and our put_prev_task() ends up throttling /prev, after which we're going to bug out in set_next_entity() because there aren't any tasks left. Avoid this scenario by only throttling common ancestors. Reported-by: Mohammed Naser <[email protected]> Reported-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Ben Segall <[email protected]> [ munged Changelog ] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: 678d5718d8d0 ("sched/fair: Optimize cgroup pick_next_task_fair()") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-06-07  preempt: Reorganize the notrace definitions a bit  (Frederic Weisbecker, 1 file, -17/+15)
preempt.h has two separate "#ifdef CONFIG_PREEMPT" sections: one to define preempt_enable() and another to define preempt_enable_notrace(). Let's gather both. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-06-07  preempt: Use preempt_schedule_context() as the official tracing preemption point  (Frederic Weisbecker, 8 files, -32/+13)
preempt_schedule_context() is a tracing-safe preemption point but it's only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing recursion issues since commit: b30f0e3ffedf ("sched/preempt: Optimize preemption operations on __schedule() callers") introduced function based preempt_count_*() ops. Let's make it available on all configs and give it a more appropriate name for its new position. Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-06-07  sched: Make preempt_schedule_context() function-tracing safe  (Frederic Weisbecker, 1 file, -2/+9)
Since function tracing disables preemption, it needs a safe preemption point to use when preemption is re-enabled without worrying about tracing recursion. I.e.: to avoid tracing recursion, that preemption point can't be traced (use of the notrace qualifier) and it can't call any traceable function before that preemption point disables preemption itself, which disarms the recursion. preempt_schedule() was fine until commit: b30f0e3ffedf ("sched/preempt: Optimize preemption operations on __schedule() callers") because PREEMPT_ACTIVE (which has the property of disabling preemption and thus disarming tracing preemption recursion) was set before calling any further function. But that commit introduced the use of preempt_count_add/sub() functions to set PREEMPT_ACTIVE and because these functions are called before preemption gets a chance to be disabled, we have a tracing recursion. preempt_schedule_context() is one of the possible preemption functions used by tracing. Its special purpose is to avoid tracing recursion against context tracking. Let's enhance this function to become more generally tracing safe by disabling preemption with raw accessors, such that no function is called before preemption gets disabled, disarming the tracing recursion. This function is going to become the specific tracing-safe preemption point in a further commit. Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
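A condensed sketch of the resulting function, based on the description above (not a verbatim copy of the kernel source): PREEMPT_ACTIVE is set with the raw, untraced accessors before anything traceable can run.

    asmlinkage __visible void __sched notrace preempt_schedule_context(void)
    {
        enum ctx_state prev_ctx;

        if (likely(!preemptible()))
            return;

        do {
            __preempt_count_add(PREEMPT_ACTIVE);    /* raw accessor: not traced */
            prev_ctx = exception_enter();           /* context tracking handling */

            __schedule();

            exception_exit(prev_ctx);
            __preempt_count_sub(PREEMPT_ACTIVE);    /* raw accessor: not traced */
            barrier();
        } while (need_resched());
    }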
2015-06-02  Merge branch 'clockevents/4.2' of http://git.linaro.org/people/daniel.lezcano/linux into timers/core  (Thomas Gleixner, 10 files, -78/+632)
Pull clockevents/clocksource changes from Daniel Lezcano:
 - Removed dead code in the files related to mach-msm for qcom (Stephen Boyd)
 - Cleaned up code for exynos_mct (Krzysztof Kozlowski)
 - Added the new timer lpc3220 (Joachim Eastwood)
 - Added the new timer STM32 and ARM system timer (Maxime Coquelin)
2015-06-02  clockevents: Rename state to state_use_accessors  (Thomas Gleixner, 2 files, -9/+9)
This is the only sensible way to make abuse of core internal fields obvious and easy to grep for. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Peter Zijlstra <[email protected]>
2015-06-02  clockevents: Use set/get state helper functions  (Thomas Gleixner, 2 files, -6/+7)
Signed-off-by: Thomas Gleixner <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Peter Zijlstra <[email protected]>
2015-06-02  clockevents: Provide functions to set and get the state  (Thomas Gleixner, 5 files, -24/+35)
We want to rename dev->state, so provide proper get and set functions. Rename clockevents_set_state() to clockevents_switch_state() to avoid confusion. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Peter Zijlstra <[email protected]>
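A sketch of the get/set accessors described above, using the state_use_accessors field name from the rename entry earlier in this log (illustrative; the real helpers live in the core time headers):

    static inline enum clock_event_state
    clockevent_get_state(struct clock_event_device *dev)
    {
        return dev->state_use_accessors;
    }

    static inline void
    clockevent_set_state(struct clock_event_device *dev, enum clock_event_state state)
    {
        dev->state_use_accessors = state;
    }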
2015-06-02  clockevents: Use helpers to check the state of a clockevent device  (Viresh Kumar, 4 files, -17/+17)
Use accessor functions to check the state of clockevent devices in core code. Signed-off-by: Viresh Kumar <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/fa2b9869fd17f210eaa156ec2b594efd0230b6c7.1432192527.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <[email protected]>
2015-06-02  clockevents: Add helpers to check the state of a clockevent device  (Viresh Kumar, 1 file, -0/+26)
Some clockevent drivers, once migrated to use per-state callbacks, need to check the state of the clockevent device in their callbacks or interrupt handler. Add accessor functions clockevent_state_*() to get this information. Signed-off-by: Viresh Kumar <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/04a717d490335c688dd7af899fbcede97e1bb8ee.1432192527.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <[email protected]>
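A sketch of one such helper and a typical driver-side use (illustrative; the hardware hooks are hypothetical and the real helper may read the state field differently):

    static void my_hw_ack_irq(void);        /* hypothetical hardware hooks */
    static void my_hw_stop_counter(void);

    static inline bool clockevent_state_oneshot(struct clock_event_device *dev)
    {
        return dev->state_use_accessors == CLOCK_EVT_STATE_ONESHOT;
    }

    /* e.g. in a driver's interrupt handler: */
    static irqreturn_t my_timer_interrupt(int irq, void *dev_id)
    {
        struct clock_event_device *evt = dev_id;

        my_hw_ack_irq();
        if (clockevent_state_oneshot(evt))
            my_hw_stop_counter();           /* one-shot: stop after this event */
        evt->event_handler(evt);
        return IRQ_HANDLED;
    }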
2015-06-02  clockevents/drivers/timer-stm32: Fix build warning spotted by kbuild test robot  (Maxime Coquelin, 1 file, -2/+2)
This patch fixes the warning below, spotted by the kbuild test robot when building with ARCH=powerpc: drivers/clocksource/timer-stm32.c: In function 'stm32_clockevent_init': >> drivers/clocksource/timer-stm32.c:140:9: warning: large integer implicitly truncated to unsigned type [-Woverflow] writel_relaxed(~0UL, data->base + TIM_ARR); The fix consists in using ~0U instead of ~0UL. Reported-by: kbuild test robot <[email protected]> Signed-off-by: Maxime Coquelin <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
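The change is presumably along these lines (reconstructed from the warning above, not the verbatim patch):

    -   writel_relaxed(~0UL, data->base + TIM_ARR);   /* unsigned long: truncated to 32 bits on 64-bit builds */
    +   writel_relaxed(~0U, data->base + TIM_ARR);    /* plain 32-bit all-ones value, no truncation warning   */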
2015-06-02  clockevents/drivers: Add STM32 Timer driver  (Maxime Coquelin, 3 files, -0/+190)
STM32 MCUs feature 16 and 32 bits general purpose timers with prescalers. The driver detects whether the timer is 16 or 32 bits, and applies a 1024 prescaler value if it is 16 bits. Reviewed-by: Linus Walleij <[email protected]> Tested-by: Chanwoo Choi <[email protected]> Signed-off-by: Maxime Coquelin <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
2015-06-02  dt-bindings: Document the STM32 timer bindings  (Maxime Coquelin, 1 file, -0/+22)
This adds documentation of device tree bindings for the STM32 timer. Tested-by: Chanwoo Choi <[email protected]> Acked-by: Rob Herring <[email protected]> Signed-off-by: Maxime Coquelin <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
2015-06-02  clocksource/drivers/armv7m_systick: Add ARM System timer driver  (Maxime Coquelin, 3 files, -0/+87)
This patch adds clocksource support for ARMv7-M's System timer, also known as SysTick. Tested-by: Chanwoo Choi <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Signed-off-by: Maxime Coquelin <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
2015-06-02  dt-bindings: Document the ARM System timer bindings  (Maxime Coquelin, 1 file, -0/+26)
This adds documentation of device tree bindings for the ARM System timer. Tested-by: Chanwoo Choi <[email protected]> Acked-by: Rob Herring <[email protected]> Signed-off-by: Maxime Coquelin <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
2015-06-02  doc: dt: Add documentation for lpc3220-timer  (Joachim Eastwood, 1 file, -0/+26)
Add DT bindings documentation for lpc3220-timer. This timer is used as clocksource on many NXP platforms. Signed-off-by: Joachim Eastwood <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Acked-by: Arnd Bergmann <[email protected]>
2015-06-02  clocksource/drivers/lpc32xx: Add the lpc32xx timer driver  (Joachim Eastwood, 3 files, -0/+278)
Add support for using the NXP LPC timer as clocksource and clock event. These timers are present on many NXP devices including LPC32xx, LPC17xx, LPC18xx and LPC43xx. The timer has a 32-bit timer counter register with a programmable 32-bit prescaler. It supports up to 4 compare match values with interrupt generation and reset/stop timer counter action. Signed-off-by: Joachim Eastwood <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Reviewed-by: Ezequiel Garcia <[email protected]> Acked-by: Arnd Bergmann <[email protected]>
2015-06-02  clocksource/drivers/exynos_mct: Remove old platform mct_init()  (Krzysztof Kozlowski, 1 file, -12/+0)
Since commit 228e3023eb04 ("Merge tag 'mct-exynos-for-v3.10' of ...") the mct_init() was superseded by mct_init_dt() and is not referenced anywhere. Remove it. Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
2015-06-02  clocksource/drivers/exynos_mct: Staticize struct clocksource  (Krzysztof Kozlowski, 1 file, -1/+1)
The struct clocksource 'mct_frc' is not exported or used outside, so make it static. Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
2015-06-02  clocksource/drivers/exynos_mct: Change exynos4_mct_tick_clear return type to void  (Krzysztof Kozlowski, 1 file, -6/+2)
Return value of exynos4_mct_tick_clear() was never checked so it can be safely changed to void. Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
2015-06-02  clocksource/drivers/qcom: Remove dead code  (Stephen Boyd, 1 file, -59/+0)
This code is no longer used now that mach-msm has been removed. Delete it. Cc: David Brown <[email protected]> Cc: Bryan Huntsman <[email protected]> Cc: Daniel Walker <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]>
2015-06-02  Merge branch 'linus' into sched/core, to resolve conflict  (Ingo Molnar, 409 files, -1984/+3728)
Conflicts: arch/sparc/include/asm/topology_64.h Signed-off-by: Ingo Molnar <[email protected]>
2015-06-01  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (Linus Torvalds, 32 files, -119/+174)
Pull networking fixes from David Miller:
 1) Various VTI tunnel (mark handling, PMTU) bug fixes from Alexander Duyck and Steffen Klassert.
 2) Revert ethtool PHY query change, it wasn't correct. The PHY address selected by the driver running the PHY to MAC connection decides what PHY address GET ethtool operations return information from.
 3) Fix handling of sequence number bits for encryption IV generation in ESP driver, from Herbert Xu.
 4) UDP can return -EAGAIN when we hit a bad checksum on receive, even when there are other packets in the receive queue which is wrong. Just respect the error returned from the generic socket recv datagram helper. From Eric Dumazet.
 5) Fix BNA driver firmware loading on big-endian systems, from Ivan Vecera.
 6) Fix regression in that we were inheriting the congestion control of the listening socket for new connections, the intended behavior always was to use the default in this case. From Neal Cardwell.
 7) Fix NULL deref in brcmfmac driver, from Arend van Spriel.
 8) OTP parsing fix in iwlwifi from Liad Kaufman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  vti6: Add pmtu handling to vti6_xmit.
  Revert "net: core: 'ethtool' issue with querying phy settings"
  bnx2x: Move statistics implementation into semaphores
  xen: netback: read hotplug script once at start of day.
  xen: netback: fix printf format string warning
  Revert "netfilter: ensure number of counters is >0 in do_replace()"
  net: dsa: Properly propagate errors from dsa_switch_setup_one
  tcp: fix child sockets to use system default congestion control if not set
  udp: fix behavior of wrong checksums
  sfc: free multiple Rx buffers when required
  bna: fix soft lock-up during firmware initialization failure
  bna: remove unreasonable iocpf timer start
  bna: fix firmware loading on big-endian machines
  bridge: fix br_multicast_query_expired() bug
  via-rhine: Resigning as maintainer
  brcmfmac: avoid null pointer access when brcmf_msgbuf_get_pktid() fails
  mac80211: Fix mac80211.h docbook comments
  iwlwifi: nvm: fix otp parsing in 8000 hw family
  iwlwifi: pcie: fix tracking of cmd_in_flight
  ip_vti/ip6_vti: Preserve skb->mark after rcv_cb call
  ...
2015-06-01  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc  (Linus Torvalds, 12 files, -59/+282)
Pull Sparc fixes from David Miller:
 1) Setup the core/threads/sockets bitmaps correctly so that 'lscpus' and friends operate properly. From Chris Hyser.
 2) The bit that normally means "Cached Virtually" on sun4v systems actually changes meaning in M7 and later chips. Fix from Khalid Aziz.
 3) On some PCI-E systems we need to probe different OF properties to fill in the PCI slot information properly, from Eric Snowberg.
 4) Kill an extraneous memset after kzalloc(), from Christophe Jaillet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE
  sparc64: pci slots information is not populated in sysfs
  sparc: kernel: GRPCI2: Remove a useless memset
  sparc64: Setup sysfs to mark LDOM sockets, cores and threads correctly
2015-06-01  Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost  (Linus Torvalds, 1 file, -0/+1)
Pull virtio fix from Michael Tsirkin:
 "Last-minute virtio fix for 4.1

  This tweaks an exported user-space header to fix build breakage for userspace using it"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h
2015-06-01  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf  (David S. Miller, 1 file, -4/+0)
Pablo Neira Ayuso says:

====================
Netfilter fix for net

The following patch reverts the ebtables chunk that enforces counters that was introduced in the recently applied d26e2c9ffa38 ('Revert "netfilter: ensure number of counters is >0 in do_replace()"') since this breaks ebtables.
====================

Signed-off-by: David S. Miller <[email protected]>
2015-06-01  Merge tag 'wireless-drivers-for-davem-2015-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers  (David S. Miller, 5 files, -27/+20)
Kalle Valo says:

====================
iwlwifi:
 * fix OTP parsing 8260
 * fix powersave handling for 8260

brcmfmac:
 * fix null pointer crash
====================

Signed-off-by: David S. Miller <[email protected]>
2015-06-01  vti6: Add pmtu handling to vti6_xmit.  (Steffen Klassert, 1 file, -0/+14)
We currently rely on the PMTU discovery of xfrm. However, if a packet is locally sent, the PMTU mechanism of xfrm tries to do local socket notification, which might not work for applications like ping that don't check for this. So add pmtu handling to vti6_xmit to report MTU changes immediately. Signed-off-by: Steffen Klassert <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
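A sketch of the kind of check this adds inside vti6_xmit() once the tunnel dst is known (simplified; the real patch's handling of DF/ignore_df and PMTU state updates is omitted):

    unsigned int mtu = dst_mtu(dst);

    if (skb->len > mtu) {
        /* Report the MTU back to the local sender immediately. */
        if (skb->protocol == htons(ETH_P_IPV6))
            icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
        else
            icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
        return -EMSGSIZE;
    }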
2015-06-01  Revert "net: core: 'ethtool' issue with querying phy settings"  (David S. Miller, 1 file, -9/+1)
This reverts commit f96dee13b8e10f00840124255bed1d8b4c6afd6f. It isn't right: ethtool is meant to manage one PHY instance per netdevice at a time, and that instance is selected by the SET command. Therefore, by definition, the GET command must only return the settings for the configured and selected PHY. Reported-by: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-06-01  bnx2x: Move statistics implementation into semaphores  (Yuval Mintz, 3 files, -11/+20)
Commit dff173de84958 ("bnx2x: Fix statistics locking scheme") changed the bnx2x locking around statistics state into using a mutex - but the lock is being accessed via a timer, which is forbidden. [If compiled with CONFIG_DEBUG_MUTEXES, logs show a warning about accessing the mutex in interrupt context.] This moves the implementation into using a semaphore [with size '1'] instead. Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
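A sketch of the mutex-to-semaphore conversion (illustrative structure and function names, not the actual bnx2x code): a semaphore initialized to 1 gives the same mutual exclusion but can be try-acquired from timer (atomic) context.

    #include <linux/semaphore.h>

    struct stats_ctx {
        struct semaphore stats_sema;    /* was: struct mutex stats_lock */
    };

    static void stats_ctx_init(struct stats_ctx *s)
    {
        sema_init(&s->stats_sema, 1);   /* count of 1: mutual exclusion */
    }

    static void stats_from_timer(struct stats_ctx *s)
    {
        /* Timer (atomic) context must not sleep: try-acquire and bail out if busy. */
        if (down_trylock(&s->stats_sema))
            return;
        /* ... update the statistics state ... */
        up(&s->stats_sema);
    }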