aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2019-06-26cpu/speculation: Warn on unsupported mitigations= parameterGeert Uytterhoeven1-0/+3
Currently, if the user specifies an unsupported mitigation strategy on the kernel command line, it will be ignored silently. The code will fall back to the default strategy, possibly leaving the system more vulnerable than expected. This may happen due to e.g. a simple typo, or, for a stable kernel release, because not all mitigation strategies have been backported. Inform the user by printing a message. Fixes: 98af8452945c5565 ("cpu/speculation: Add 'mitigations=' cmdline option") Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-25locking/lockdep: increase size of counters for lockdep statisticsKobe Wu1-20/+16
When system has been running for a long time, signed integer counters are not enough for some lockdep statistics. Using unsigned long counters can satisfy the requirement. Besides, most of lockdep statistics are unsigned. It is better to use unsigned int instead of int. Remove unused variables. - max_recursion_depth - nr_cyclic_check_recursions - nr_find_usage_forwards_recursions - nr_find_usage_backwards_recursions Signed-off-by: Kobe Wu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Eason Lin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-25locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && ↵Arnd Bergmann1-39/+34
CONFIG_PROVE_LOCKING The last cleanup patch triggered another issue, as now another function should be moved into the same section: kernel/locking/lockdep.c:3580:12: error: 'mark_lock' defined but not used [-Werror=unused-function] static int mark_lock(struct task_struct *curr, struct held_lock *this, Move mark_lock() into the same #ifdef section as its only caller, and remove the now-unused mark_lock_irq() stub helper. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yuyang Du <[email protected]> Fixes: 0d2cc3b34532 ("locking/lockdep: Move valid_state() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/uclamp: Add uclamp support to energy_compute()Patrick Bellasi3-22/+48
The Energy Aware Scheduler (EAS) estimates the energy impact of waking up a task on a given CPU. This estimation is based on: a) an (active) power consumption defined for each CPU frequency b) an estimation of which frequency will be used on each CPU c) an estimation of the busy time (utilization) of each CPU Utilization clamping can affect both b) and c). A CPU is expected to run: - on an higher than required frequency, but for a shorter time, in case its estimated utilization will be smaller than the minimum utilization enforced by uclamp - on a smaller than required frequency, but for a longer time, in case its estimated utilization is bigger than the maximum utilization enforced by uclamp While compute_energy() already accounts clamping effects on busy time, the clamping effects on frequency selection are currently ignored. Fix it by considering how CPU clamp values will be affected by a task waking up and being RUNNABLE on that CPU. Do that by refactoring schedutil_freq_util() to take an additional task_struct* which allows EAS to evaluate the impact on clamp values of a task being eventually queued in a CPU. Clamp values are applied to the RT+CFS utilization only when a FREQUENCY_UTIL is required by compute_energy(). Do note that switching from ENERGY_UTIL to FREQUENCY_UTIL in the computation of the cpu_util signal implies that we are more likely to estimate the highest OPP when a RT task is running in another CPU of the same performance domain. This can have an impact on energy estimation but: - it's not easy to say which approach is better, since it depends on the use case - the original approach could still be obtained by setting a smaller task-specific util_min whenever required Since we are at that: - rename schedutil_freq_util() into schedutil_cpu_util(), since it's not only used for frequency selection. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/uclamp: Add uclamp_util_with()Patrick Bellasi2-1/+33
So far uclamp_util() allows to clamp a specified utilization considering the clamp values requested by RUNNABLE tasks in a CPU. For the Energy Aware Scheduler (EAS) it is interesting to test how clamp values will change when a task is becoming RUNNABLE on a given CPU. For example, EAS is interested in comparing the energy impact of different scheduling decisions and the clamp values can play a role on that. Add uclamp_util_with() which allows to clamp a given utilization by considering the possible impact on CPU clamp values of a specified task. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasksPatrick Bellasi4-3/+43
Each time a frequency update is required via schedutil, a frequency is selected to (possibly) satisfy the utilization reported by each scheduling class and irqs. However, when utilization clamping is in use, the frequency selection should consider userspace utilization clamping hints. This will allow, for example, to: - boost tasks which are directly affecting the user experience by running them at least at a minimum "requested" frequency - cap low priority tasks not directly affecting the user experience by running them only up to a maximum "allowed" frequency These constraints are meant to support a per-task based tuning of the frequency selection thus supporting a fine grained definition of performance boosting vs energy saving strategies in kernel space. Add support to clamp the utilization of RUNNABLE FAIR and RT tasks within the boundaries defined by their aggregated utilization clamp constraints. Do that by considering the max(min_util, max_util) to give boosted tasks the performance they need even when they happen to be co-scheduled with other capped tasks. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/uclamp: Set default clamps for RT tasksPatrick Bellasi1-2/+28
By default FAIR tasks start without clamps, i.e. neither boosted nor capped, and they run at the best frequency matching their utilization demand. This default behavior does not fit RT tasks which instead are expected to run at the maximum available frequency, if not otherwise required by explicitly capping them. Enforce the correct behavior for RT tasks by setting util_min to max whenever: 1. the task is switched to the RT class and it does not already have a user-defined clamp value assigned. 2. an RT task is forked from a parent with RESET_ON_FORK set. NOTE: utilization clamp values are cross scheduling class attributes and thus they are never changed/reset once a value has been explicitly defined from user-space. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/uclamp: Reset uclamp values on RESET_ON_FORKPatrick Bellasi1-0/+8
A forked tasks gets the same clamp values of its parent however, when the RESET_ON_FORK flag is set on parent, e.g. via: sys_sched_setattr() sched_setattr() __sched_setscheduler(attr::SCHED_FLAG_RESET_ON_FORK) the new forked task is expected to start with all attributes reset to default values. Do that for utilization clamp values too by checking the reset request from the existing uclamp_fork() call which already provides the required initialization for other uclamp related bits. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/uclamp: Extend sched_setattr() to support utilization clampingPatrick Bellasi1-7/+84
The SCHED_DEADLINE scheduling class provides an advanced and formal model to define tasks requirements that can translate into proper decisions for both task placements and frequencies selections. Other classes have a more simplified model based on the POSIX concept of priorities. Such a simple priority based model however does not allow to exploit most advanced features of the Linux scheduler like, for example, driving frequencies selection via the schedutil cpufreq governor. However, also for non SCHED_DEADLINE tasks, it's still interesting to define tasks properties to support scheduler decisions. Utilization clamping exposes to user-space a new set of per-task attributes the scheduler can use as hints about the expected/required utilization for a task. This allows to implement a "proactive" per-task frequency control policy, a more advanced policy than the current one based just on "passive" measured task utilization. For example, it's possible to boost interactive tasks (e.g. to get better performance) or cap background tasks (e.g. to be more energy/thermal efficient). Introduce a new API to set utilization clamping values for a specified task by extending sched_setattr(), a syscall which already allows to define task specific properties for different scheduling classes. A new pair of attributes allows to specify a minimum and maximum utilization the scheduler can consider for a task. Do that by validating the required clamp values before and then applying the required changes using _the_ same pattern already in use for __setscheduler(). This ensures that the task is re-enqueued with the new clamp values. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/core: Allow sched_setattr() to use the current policyPatrick Bellasi1-0/+2
The sched_setattr() syscall mandates that a policy is always specified. This requires to always know which policy a task will have when attributes are configured and this makes it impossible to add more generic task attributes valid across different scheduling policies. Reading the policy before setting generic tasks attributes is racy since we cannot be sure it is not changed concurrently. Introduce the required support to change generic task attributes without affecting the current task policy. This is done by adding an attribute flag (SCHED_FLAG_KEEP_POLICY) to enforce the usage of the current policy. Add support for the SETPARAM_POLICY policy, which is already used by the sched_setparam() POSIX syscall, to the sched_setattr() non-POSIX syscall. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/uclamp: Add system default clampsPatrick Bellasi2-1/+114
Tasks without a user-defined clamp value are considered not clamped and by default their utilization can have any value in the [0..SCHED_CAPACITY_SCALE] range. Tasks with a user-defined clamp value are allowed to request any value in that range, and the required clamp is unconditionally enforced. However, a "System Management Software" could be interested in limiting the range of clamp values allowed for all tasks. Add a privileged interface to define a system default configuration via: /proc/sys/kernel/sched_uclamp_util_{min,max} which works as an unconditional clamp range restriction for all tasks. With the default configuration, the full SCHED_CAPACITY_SCALE range of values is allowed for each clamp index. Otherwise, the task-specific clamp is capped by the corresponding system default value. Do that by tracking, for each task, the "effective" clamp value and bucket the task has been refcounted in at enqueue time. This allows to lazy aggregate "requested" and "system default" values at enqueue time and simplifies refcounting updates at dequeue time. The cached bucket ids are used to avoid (relatively) more expensive integer divisions every time a task is enqueued. An active flag is used to report when the "effective" value is valid and thus the task is actually refcounted in the corresponding rq's bucket. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/uclamp: Enforce last task's UCLAMP_MAXPatrick Bellasi2-5/+46
When a task sleeps it removes its max utilization clamp from its CPU. However, the blocked utilization on that CPU can be higher than the max clamp value enforced while the task was running. This allows undesired CPU frequency increases while a CPU is idle, for example, when another CPU on the same frequency domain triggers a frequency update, since schedutil can now see the full not clamped blocked utilization of the idle CPU. Fix this by using: uclamp_rq_dec_id(p, rq, UCLAMP_MAX) uclamp_rq_max_value(rq, UCLAMP_MAX, clamp_value) to detect when a CPU has no more RUNNABLE clamped tasks and to flag this condition. Don't track any minimum utilization clamps since an idle CPU never requires a minimum frequency. The decay of the blocked utilization is good enough to reduce the CPU frequency. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/uclamp: Add bucket local max trackingPatrick Bellasi1-18/+25
Because of bucketization, different task-specific clamp values are tracked in the same bucket. For example, with 20% bucket size and assuming to have: Task1: util_min=25% Task2: util_min=35% both tasks will be refcounted in the [20..39]% bucket and always boosted only up to 20% thus implementing a simple floor aggregation normally used in histograms. In systems with only few and well-defined clamp values, it would be useful to track the exact clamp value required by a task whenever possible. For example, if a system requires only 23% and 47% boost values then it's possible to track the exact boost required by each task using only 3 buckets of ~33% size each. Introduce a mechanism to max aggregate the requested clamp values of RUNNABLE tasks in the same bucket. Keep it simple by resetting the bucket value to its base value only when a bucket becomes inactive. Allow a limited and controlled overboosting margin for tasks recounted in the same bucket. In systems where the boost values are not known in advance, it is still possible to control the maximum acceptable overboosting margin by tuning the number of clamp groups. For example, 20 groups ensure a 5% maximum overboost. Remove the rq bucket initialization code since a correct bucket value is now computed when a task is refcounted into a CPU's rq. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/uclamp: Add CPU's clamp buckets refcountingPatrick Bellasi2-0/+217
Utilization clamping allows to clamp the CPU's utilization within a [util_min, util_max] range, depending on the set of RUNNABLE tasks on that CPU. Each task references two "clamp buckets" defining its minimum and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp bucket is active if there is at least one RUNNABLE tasks enqueued on that CPU and refcounting that bucket. When a task is {en,de}queued {on,from} a rq, the set of active clamp buckets on that CPU can change. If the set of active clamp buckets changes for a CPU a new "aggregated" clamp value is computed for that CPU. This is because each clamp bucket enforces a different utilization clamp value. Clamp values are always MAX aggregated for both util_min and util_max. This ensures that no task can affect the performance of other co-scheduled tasks which are more boosted (i.e. with higher util_min clamp) or less capped (i.e. with higher util_max clamp). A task has: task_struct::uclamp[clamp_id]::bucket_id to track the "bucket index" of the CPU's clamp bucket it refcounts while enqueued, for each clamp index (clamp_id). A runqueue has: rq::uclamp[clamp_id]::bucket[bucket_id].tasks to track how many RUNNABLE tasks on that CPU refcount each clamp bucket (bucket_id) of a clamp index (clamp_id). It also has a: rq::uclamp[clamp_id]::bucket[bucket_id].value to track the clamp value of each clamp bucket (bucket_id) of a clamp index (clamp_id). The rq::uclamp::bucket[clamp_id][] array is scanned every time it's needed to find a new MAX aggregated clamp value for a clamp_id. This operation is required only when it's dequeued the last task of a clamp bucket tracking the current MAX aggregated clamp value. In this case, the CPU is either entering IDLE or going to schedule a less boosted or more clamped task. The expected number of different clamp values configured at build time is small enough to fit the full unordered array into a single cache line, for configurations of up to 7 buckets. Add to struct rq the basic data structures required to refcount the number of RUNNABLE tasks for each clamp bucket. Add also the max aggregation required to update the rq's clamp value at each enqueue/dequeue event. Use a simple linear mapping of clamp values into clamp buckets. Pre-compute and cache bucket_id to avoid integer divisions at enqueue/dequeue time. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/fair: Rename weighted_cpuload() to cpu_runnable_load()Dietmar Eggemann1-21/+21
The term 'weighted' is not needed since there is no 'unweighted' load. Instead use the term 'runnable' to distinguish 'runnable' load (avg.runnable_load_avg) used in load balance from load (avg.load_avg) which is the sum of 'runnable' and 'blocked' load. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/debug: Export the newly added tracepointsQais Yousef1-0/+11
So that external modules can hook into them and extract the info they need. Since these new tracepoints have no events associated with them exporting these tracepoints make them useful for external modules to perform testing and debugging. There's no other way otherwise to access them. BPF doesn't have infrastructure to access these bare tracepoints either. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/debug: Add sched_overutilized tracepointQais Yousef1-2/+8
The new tracepoint allows us to track the changes in overutilized status. Overutilized status is associated with EAS. It indicates that the system is in high performance state. EAS is disabled when the system is in this state since there's not much energy savings while high performance tasks are pushing the system to the limit and it's better to default to the spreading behavior of the scheduler. This tracepoint helps understanding and debugging the conditions under which this happens. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/debug: Add new tracepoint to track PELT at se levelQais Yousef2-0/+3
The new tracepoint allows tracking PELT signals at sched_entity level. Which is supported in CFS tasks and taskgroups only. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/debug: Add new tracepoints to track PELT at rq levelQais Yousef2-1/+14
The new tracepoints allow tracking PELT signals at rq level for all scheduling classes + irq. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/debug: Add a new sched_trace_*() helper functionsQais Yousef1-0/+99
The new functions allow modules to access internal data structures of unexported struct cfs_rq and struct rq to extract important information from the tracepoints to be introduced in later patches. While at it fix alphabetical order of struct declarations in sched.h Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/autogroup: Make autogroup_path() always availableQais Yousef1-2/+0
Remove the #ifdef CONFIG_SCHED_DEBUG. Some of the tracepoints to be introduced in later patches need to access this function. Hence make it always available since the tracepoints are not protected by CONFIG_SCHED_DEBUG. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/wait: Deduplicate code with do-whilePavel Begunkov1-6/+2
Statements in the loop's body and before it are identical. Use do-while to not repeat it. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/43ffea6ee2152b90dedf962eac851609e4197218.1560256112.git.asml.silence@gmail.com Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()Vincent Guittot8-13/+13
The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is unused since commit: 765d0af19f5f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'") Remove it. Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Viresh Kumar <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24Merge tag 'v5.2-rc6' into sched/core, to refresh the branchIngo Molnar18-88/+149
Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24perf/x86: Disable extended registers for non-supported PMUsKan Liang1-4/+14
The perf fuzzer caused Skylake machine to crash: [ 9680.085831] Call Trace: [ 9680.088301] <IRQ> [ 9680.090363] perf_output_sample_regs+0x43/0xa0 [ 9680.094928] perf_output_sample+0x3aa/0x7a0 [ 9680.099181] perf_event_output_forward+0x53/0x80 [ 9680.103917] __perf_event_overflow+0x52/0xf0 [ 9680.108266] ? perf_trace_run_bpf_submit+0xc0/0xc0 [ 9680.113108] perf_swevent_hrtimer+0xe2/0x150 [ 9680.117475] ? check_preempt_wakeup+0x181/0x230 [ 9680.122091] ? check_preempt_curr+0x62/0x90 [ 9680.126361] ? ttwu_do_wakeup+0x19/0x140 [ 9680.130355] ? try_to_wake_up+0x54/0x460 [ 9680.134366] ? reweight_entity+0x15b/0x1a0 [ 9680.138559] ? __queue_work+0x103/0x3f0 [ 9680.142472] ? update_dl_rq_load_avg+0x1cd/0x270 [ 9680.147194] ? timerqueue_del+0x1e/0x40 [ 9680.151092] ? __remove_hrtimer+0x35/0x70 [ 9680.155191] __hrtimer_run_queues+0x100/0x280 [ 9680.159658] hrtimer_interrupt+0x100/0x220 [ 9680.163835] smp_apic_timer_interrupt+0x6a/0x140 [ 9680.168555] apic_timer_interrupt+0xf/0x20 [ 9680.172756] </IRQ> The XMM registers can only be collected by PEBS hardware events on the platforms with PEBS baseline support, e.g. Icelake, not software/probe events. Add capabilities flag PERF_PMU_CAP_EXTENDED_REGS to indicate the PMU which support extended registers. For X86, the extended registers are XMM registers. Add has_extended_regs() to check if extended registers are applied. The generic code define the mask of extended registers as 0 if arch headers haven't overridden it. Originally-by: Peter Zijlstra (Intel) <[email protected]> Reported-by: Vince Weaver <[email protected]> Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24perf/ioctl: Add check for the sample_period valueRavi Bangoria1-0/+3
perf_event_open() limits the sample_period to 63 bits. See: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits") Make ioctl() consistent with it. Also on PowerPC, negative sample_period could cause a recursive PMIs leading to a hang (reported when running perf-fuzzer). Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24fork: don't check parent_tidptr with CLONE_PIDFDDmitry V. Levin1-12/+0
Give userspace a cheap and reliable way to tell whether CLONE_PIDFD is supported by the kernel or not. The easiest way is to pass an invalid file descriptor value in parent_tidptr, perform the syscall and verify that parent_tidptr has been changed to a valid file descriptor value. CLONE_PIDFD uses parent_tidptr to return pidfds. CLONE_PARENT_SETTID will use parent_tidptr to return the tid of the parent. The two flags cannot be used together. Old kernels that only support CLONE_PARENT_SETTID will not verify the value pointed to by parent_tidptr. This behavior is unchanged even with the introduction of CLONE_PIDFD. However, if CLONE_PIDFD is specified the kernel will currently check the value pointed to by parent_tidptr before placing the pidfd in the memory pointed to. EINVAL will be returned if the value in parent_tidptr is not 0. If CLONE_PIDFD is supported and fd 0 is closed, then the returned pidfd can and likely will be 0 and parent_tidptr will be unchanged. This means userspace must either check CLONE_PIDFD support beforehand or check that fd 0 is not closed when invoking CLONE_PIDFD. The check for pidfd == 0 was introduced during the v5.2 merge window by commit b3e583825266 ("clone: add CLONE_PIDFD") to ensure that CLONE_PIDFD could be potentially extended by passing in flags through the return argument. However, that extension would look horrible, and with the upcoming introduction of the clone3 syscall in v5.3 there is no need to extend legacy clone syscall this way. (Even if it would need to be extended, CLONE_DETACHED can be reused with CLONE_PIDFD.) So remove the pidfd == 0 check. Userspace that needs to be portable to kernels without CLONE_PIDFD support can then be advised to initialize pidfd to -1 and check the pidfd value returned by CLONE_PIDFD. Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Dmitry V. Levin <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2019-06-23softirq: Use __this_cpu_write() in takeover_tasklets()Muchun Song1-1/+1
The code is executed with interrupts disabled, so it's safe to use __this_cpu_write(). [ tglx: Massaged changelog ] Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-23smp: Remove smp_call_function() and on_each_cpu() return valuesNadav Amit2-9/+4
The return value is fixed. Remove it and amend the callers. [ tglx: Fixup arm/bL_switcher and powerpc/rtas ] Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Andrew Morton <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-23smp: Do not mark call_function_data as sharedNadav Amit1-1/+1
cfd_data is marked as shared, but although it hold pointers to shared data structures, it is private per core. Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Rik van Riel <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-23timer_list: Guard procfs specific codeNathan Huckleberry1-17/+19
With CONFIG_PROC_FS=n the following warning is emitted: kernel/time/timer_list.c:361:36: warning: unused variable 'timer_list_sops' [-Wunused-const-variable] static const struct seq_operations timer_list_sops = { Add #ifdef guard around procfs specific code. Signed-off-by: Nathan Huckleberry <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://github.com/ClangBuiltLinux/linux/issues/534 Link: https://lkml.kernel.org/r/[email protected]
2019-06-22timekeeping: Provide a generic update_vsyscall() implementationVincenzo Frascino2-0/+134
The new generic VDSO library allows to unify the update_vsyscall[_tz]() implementations. Provide a generic implementation based on the x86 code and the bindings which need to be implemented in architecture specific code. [ tglx: Moved it into kernel/time where it belongs. Removed the pointless line breaks in the stub functions. Massaged changelog ] Signed-off-by: Vincenzo Frascino <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Shijith Thotton <[email protected]> Tested-by: Andre Przywara <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Russell King <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paul Burton <[email protected]> Cc: Daniel Lezcano <[email protected]> Cc: Mark Salyzyn <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Huw Davies <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22posix-timers: Use spin_lock_irq() in itimer_delete()Sebastian Andrzej Siewior1-5/+3
itimer_delete() uses spin_lock_irqsave() to obtain a `flags' variable which can then be passed to unlock_timer(). It uses already spin_lock locking for the structure instead of lock_timer() because it has a timer which can not be removed by others at this point. The cleanup is always performed with enabled interrupts. Use spin_lock_irq() / spin_unlock_irq() so the `flags' variable can be removed. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22posix-timers: Remove "it_signal = NULL" assignment in itimer_delete()Sebastian Andrzej Siewior1-5/+0
itimer_delete() is invoked during do_exit(). At this point it is the last thread in the group dying and doing the clean up. Since it is the last thread in the group, there can not be any other task attempting to lock the itimer which means the NULL assignment (which avoids lookups in __lock_timer()) is not required. The assignment and comment was copied in commit 0e568881178ff ("[PATCH] fix posix-timers to have proper per-process scope") from sys_timer_delete() which was/is the syscall interface and requires the assignment. Remove the superfluous ->it_signal = NULL assignment. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22timekeeping: Use proper clock specifier names in functionsJason A. Donenfeld3-4/+4
This makes boot uniformly boottime and tai uniformly clocktai, to address the remaining oversights. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22timekeeping: Use proper ktime_add when adding nsecs in coarse offsetJason A. Donenfeld1-1/+1
While this doesn't actually amount to a real difference, since the macro evaluates to the same thing, every place else operates on ktime_t using these functions, so let's not break the pattern. Fixes: e3ff9c3678b4 ("timekeeping: Repair ktime_get_coarse*() granularity") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22Merge branch 'linus' into timers/coreThomas Gleixner12-84/+178
Pick up upstream fixes for pending changes.
2019-06-22ntp: Limit TAI-UTC offsetMiroslav Lichvar1-1/+3
Don't allow the TAI-UTC offset of the system clock to be set by adjtimex() to a value larger than 100000 seconds. This prevents an overflow in the conversion to int, prevents the CLOCK_TAI clock from getting too far ahead of the CLOCK_REALTIME clock, and it is still large enough to allow leap seconds to be inserted at the maximum rate currently supported by the kernel (once per day) for the next ~270 years, however unlikely it is that someone can survive a catastrophic event which slowed down the rotation of the Earth so much. Reported-by: Weikang shi <[email protected]> Signed-off-by: Miroslav Lichvar <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: John Stultz <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Stephen Boyd <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-21Merge tag 'spdx-5.2-rc6' of ↵Linus Torvalds12-42/+12
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull still more SPDX updates from Greg KH: "Another round of SPDX updates for 5.2-rc6 Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now. Another 5000+ files are fixed up, so our overall totals are: Files checked: 64545 Files with SPDX: 45529 Compared to the 5.1 kernel which was: Files checked: 63848 Files with SPDX: 22576 This is a huge improvement. Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat" * tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485 ...
2019-06-21arm64: Fix interrupt tracing in the presence of NMIsJulien Thierry1-2/+6
In the presence of any form of instrumentation, nmi_enter() should be done before calling any traceable code and any instrumentation code. Currently, nmi_enter() is done in handle_domain_nmi(), which is much too late as instrumentation code might get called before. Move the nmi_enter/exit() calls to the arch IRQ vector handler. On arm64, it is not possible to know if the IRQ vector handler was called because of an NMI before acknowledging the interrupt. However, It is possible to know whether normal interrupts could be taken in the interrupted context (i.e. if taking an NMI in that context could introduce a potential race condition). When interrupting a context with IRQs disabled, call nmi_enter() as soon as possible. In contexts with IRQs enabled, defer this to the interrupt controller, which is in a better position to know if an interrupt taken is an NMI. Fixes: bc3c03ccb464 ("arm64: Enable the support of pseudo-NMIs") Cc: <[email protected]> # 5.1.x- Cc: Will Deacon <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Mark Rutland <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Signed-off-by: Julien Thierry <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2019-06-21cgroup: export css_next_descendant_pre for bfqChristoph Hellwig1-0/+1
The bfq schedule now uses css_next_descendant_pre directly after the stats functionality depending on it has been from the core blk-cgroup code to bfq. Export the symbol so that bfq can still be build modular. Fixes: d6258980daf2 ("bfq-iosched: move bfq_stat_recursive_sum into the only caller") Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-06-19Merge branches 'consolidate.2019.05.28a', 'doc.2019.05.28a', ↵Paul E. McKenney15-150/+302
'fixes.2019.06.13a', 'srcu.2019.05.28a', 'sync.2019.05.28a' and 'torture.2019.05.28a' into HEAD consolidate.2019.05.28a: RCU flavor consolidation cleanups and optmizations. doc.2019.05.28a: Documentation updates. fixes.2019.06.13a: Miscellaneous fixes. srcu.2019.05.28a: SRCU updates. sync.2019.05.28a: RCU-sync flavor consolidation. torture.2019.05.28a: Torture-test updates.
2019-06-19keys: Cache result of request_key*() temporarily in task_structDavid Howells1-0/+9
If a filesystem uses keys to hold authentication tokens, then it needs a token for each VFS operation that might perform an authentication check - either by passing it to the server, or using to perform a check based on authentication data cached locally. For open files this isn't a problem, since the key should be cached in the file struct since it represents the subject performing operations on that file descriptor. During pathwalk, however, there isn't anywhere to cache the key, except perhaps in the nameidata struct - but that isn't exposed to the filesystems. Further, a pathwalk can incur a lot of operations, calling one or more of the following, for instance: ->lookup() ->permission() ->d_revalidate() ->d_automount() ->get_acl() ->getxattr() on each dentry/inode it encounters - and each one may need to call request_key(). And then, at the end of pathwalk, it will call the actual operation: ->mkdir() ->mknod() ->getattr() ->open() ... which may need to go and get the token again. However, it is very likely that all of the operations on a single dentry/inode - and quite possibly a sequence of them - will all want to use the same authentication token, which suggests that caching it would be a good idea. To this end: (1) Make it so that a positive result of request_key() and co. that didn't require upcalling to userspace is cached temporarily in task_struct. (2) The cache is 1 deep, so a new result displaces the old one. (3) The key is released by exit and by notify-resume. (4) The cache is cleared in a newly forked process. Signed-off-by: David Howells <[email protected]>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner3-12/+3
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Enrico Weigelt <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 451Thomas Gleixner4-16/+4
Based on 1 normalized pattern(s): this file is subject to the terms and conditions of version 2 of the gnu general public license see the file copying in the main directory of the linux distribution for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 5 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Enrico Weigelt <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248Thomas Gleixner1-2/+1
Based on 1 normalized pattern(s): this file is released under the gpl v2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 3 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230Thomas Gleixner4-12/+4
Based on 2 normalized pattern(s): this source code is licensed under the gnu general public license version 2 see the file copying for more details this source code is licensed under general public license version 2 see extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 52 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Enrico Weigelt <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-19PM: suspend: Rename pm_suspend_via_s2idle()Rafael J. Wysocki1-3/+3
The name of pm_suspend_via_s2idle() is confusing, as it doesn't reflect the purpose of the function precisely enough and it is very similar to pm_suspend_via_firmware(), which has a different purpose, so rename it as pm_suspend_default_s2idle() and update its only caller, i8042_register_ports(), accordingly. Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Dmitry Torokhov <[email protected]>
2019-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds7-46/+137
Pull networking fixes from David Miller: "Lots of bug fixes here: 1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer. 2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John Crispin. 3) Use after free in psock backlog workqueue, from John Fastabend. 4) Fix source port matching in fdb peer flow rule of mlx5, from Raed Salem. 5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet. 6) Network header needs to be set for packet redirect in nfp, from John Hurley. 7) Fix udp zerocopy refcnt, from Willem de Bruijn. 8) Don't assume linear buffers in vxlan and geneve error handlers, from Stefano Brivio. 9) Fix TOS matching in mlxsw, from Jiri Pirko. 10) More SCTP cookie memory leak fixes, from Neil Horman. 11) Fix VLAN filtering in rtl8366, from Linus Walluij. 12) Various TCP SACK payload size and fragmentation memory limit fixes from Eric Dumazet. 13) Use after free in pneigh_get_next(), also from Eric Dumazet. 14) LAPB control block leak fix from Jeremy Sowden" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits) lapb: fixed leak of control-blocks. tipc: purge deferredq list for each grp member in tipc_group_delete ax25: fix inconsistent lock state in ax25_destroy_timer neigh: fix use-after-free read in pneigh_get_next tcp: fix compile error if !CONFIG_SYSCTL hv_sock: Suppress bogus "may be used uninitialized" warnings be2net: Fix number of Rx queues used for flow hashing net: handle 802.1P vlan 0 packets properly tcp: enforce tcp_min_snd_mss in tcp_mtu_probing() tcp: add tcp_min_snd_mss sysctl tcp: tcp_fragment() should apply sane memory limits tcp: limit payload size of sacked skbs Revert "net: phylink: set the autoneg state in phylink_phy_change" bpf: fix nested bpf tracepoints with per-cpu data bpf: Fix out of bounds memory access in bpf_sk_storage vsock/virtio: set SOCK_DONE on peer shutdown net: dsa: rtl8366: Fix up VLAN filtering net: phylink: set the autoneg state in phylink_phy_change net: add high_order_alloc_disable sysctl/static key tcp: add tcp_tx_skb_cache sysctl ...
2019-06-17x86/percpu, sched/fair: Avoid local_clock()Peter Zijlstra1-2/+3
Nadav reported that code-gen changed because of the this_cpu_*() constraints, avoid this for select_idle_cpu() because that runs with preemption (and IRQs) disabled anyway. Reported-by: Nadav Amit <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>