path: root/kernel
Age  Commit message  (Author)  [Files, Lines]
2019-06-25  dma-mapping: add a dma_alloc_need_uncached helper  (Christoph Hellwig)  [1 file, -2/+2]
Check if we need to allocate uncached memory for a device given the allocation flags. Switch over the uncached segment check to this helper to deal with architectures that do not support the dma_cache_sync operation and thus should not return cacheable memory for DMA_ATTR_NON_CONSISTENT allocations. Signed-off-by: Christoph Hellwig <[email protected]>
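The helper's logic, going by the description above, amounts to the following sketch (dev_is_dma_coherent() and the config symbol are existing kernel primitives this builds on; the exact upstream implementation may differ in detail):

```c
/*
 * Sketch: decide whether an allocation must be uncached, based on the
 * device's coherency and the caller's allocation attributes.
 */
static inline bool dma_alloc_need_uncached(struct device *dev,
					   unsigned long attrs)
{
	if (dev_is_dma_coherent(dev))
		return false;
	if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
		return false;
	/* without dma_cache_sync(), NON_CONSISTENT still needs uncached */
	if (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
	    (attrs & DMA_ATTR_NON_CONSISTENT))
		return false;
	return true;
}
```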
2019-06-25  dma-mapping: truncate dma masks to what dma_addr_t can hold  (Christoph Hellwig)  [1 file, -0/+12]
The dma masks in struct device are always 64 bits wide. But for builds using a 32-bit dma_addr_t we need to ensure we don't store an unsupportable value. Before Linux 5.0 this was handled at least by the ARM dma mapping code by never allowing a larger dma_mask to be set, but these days we allow the driver to just set the largest supported value and never fall back to a smaller one. Ensure this always works by truncating the value. Fixes: 9eb9e96e97b3 ("Documentation/DMA-API-HOWTO: update dma_mask sections") Signed-off-by: Christoph Hellwig <[email protected]>
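The truncation itself can be as small as a cast on the way in; a sketch of the idea (simplified, with most validation elided):

```c
int dma_set_mask(struct device *dev, u64 mask)
{
	/*
	 * Truncate to what dma_addr_t can hold: a no-op with a 64-bit
	 * dma_addr_t, a chop to 32 bits otherwise.
	 */
	mask = (u64)(dma_addr_t)mask;

	if (!dev->dma_mask || !dma_supported(dev, mask))
		return -EIO;

	*dev->dma_mask = mask;
	return 0;
}
```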
2019-06-24  perf/cgroups: Don't rotate events for cgroups unnecessarily  (Ian Rogers)  [1 file, -20/+22]
Currently perf_rotate_context assumes that if the context's nr_events != nr_active a rotation is necessary for perf event multiplexing. With cgroups, nr_events is the total count of events for all cgroups and nr_active will not include events in a cgroup other than the current task's. This makes rotation appear necessary for cgroups when it is not. Add a perf_event_context flag that is set when rotation is necessary. Clear the flag during sched_out and set it when a flexible sched_in fails due to resources. Signed-off-by: Ian Rogers <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
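A sketch of the flag-driven scheme (the rotate_necessary field name comes from this commit; the surrounding logic is abbreviated and illustrative):

```c
/*
 * ctx->rotate_necessary is set when a flexible sched_in runs out of
 * PMU resources, and cleared when the context is scheduled out.
 */
static void perf_rotate_context(struct perf_cpu_context *cpuctx)
{
	struct perf_event_context *task_ctx = cpuctx->task_ctx;
	bool cpu_rotate  = cpuctx->ctx.rotate_necessary;
	bool task_rotate = task_ctx ? task_ctx->rotate_necessary : false;

	if (!(cpu_rotate || task_rotate))
		return;

	/* ... rotate the flexible events and re-schedule contexts ... */
}
```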
2019-06-24  Merge tag 'v5.2-rc6' into perf/core, to refresh branch  (Ingo Molnar)  [30 files, -172/+327]
Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/uclamp: Add uclamp support to energy_compute()  (Patrick Bellasi)  [3 files, -22/+48]
The Energy Aware Scheduler (EAS) estimates the energy impact of waking up a task on a given CPU. This estimation is based on:
 a) an (active) power consumption defined for each CPU frequency
 b) an estimation of which frequency will be used on each CPU
 c) an estimation of the busy time (utilization) of each CPU
Utilization clamping can affect both b) and c). A CPU is expected to run:
 - at a higher-than-required frequency, but for a shorter time, if its estimated utilization is smaller than the minimum utilization enforced by uclamp
 - at a lower-than-required frequency, but for a longer time, if its estimated utilization is bigger than the maximum utilization enforced by uclamp
While compute_energy() already accounts for clamping effects on busy time, the clamping effects on frequency selection are currently ignored. Fix it by considering how CPU clamp values will be affected by a task waking up and being RUNNABLE on that CPU. Do that by refactoring schedutil_freq_util() to take an additional task_struct* which allows EAS to evaluate the impact on clamp values of a task being eventually queued on a CPU. Clamp values are applied to the RT+CFS utilization only when FREQUENCY_UTIL is required by compute_energy(). Do note that switching from ENERGY_UTIL to FREQUENCY_UTIL in the computation of the cpu_util signal implies that we are more likely to estimate the highest OPP when an RT task is running on another CPU of the same performance domain. This can have an impact on energy estimation but:
 - it's not easy to say which approach is better, since it depends on the use case
 - the original approach could still be obtained by setting a smaller task-specific util_min whenever required
While at it, rename schedutil_freq_util() to schedutil_cpu_util(), since it's not only used for frequency selection. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/uclamp: Add uclamp_util_with()  (Patrick Bellasi)  [2 files, -1/+33]
So far uclamp_util() allows clamping a specified utilization considering the clamp values requested by RUNNABLE tasks in a CPU. For the Energy Aware Scheduler (EAS) it is interesting to test how clamp values will change when a task becomes RUNNABLE on a given CPU. For example, EAS is interested in comparing the energy impact of different scheduling decisions, and the clamp values can play a role in that. Add uclamp_util_with(), which allows clamping a given utilization while considering the possible impact on CPU clamp values of a specified task. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
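A sketch of what such a helper looks like, given the per-rq clamp values described in the related commits (uclamp_eff_value() and the rq fields are assumptions drawn from this patch series, not verbatim kernel code):

```c
static inline unsigned long
uclamp_util_with(struct rq *rq, unsigned long util, struct task_struct *p)
{
	unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
	unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);

	/* pretend @p is enqueued: its clamps can only raise the rq's */
	if (p) {
		min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
		max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
	}

	return clamp(util, min_util, max_util);
}
```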
2019-06-24  sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks  (Patrick Bellasi)  [4 files, -3/+43]
Each time a frequency update is required via schedutil, a frequency is selected to (possibly) satisfy the utilization reported by each scheduling class and irqs. However, when utilization clamping is in use, the frequency selection should consider userspace utilization clamping hints. This will allow, for example:
 - boosting tasks which are directly affecting the user experience by running them at least at a minimum "requested" frequency
 - capping low-priority tasks not directly affecting the user experience by running them only up to a maximum "allowed" frequency
These constraints are meant to support per-task tuning of the frequency selection, thus supporting a fine-grained definition of performance boosting vs. energy saving strategies in kernel space. Add support to clamp the utilization of RUNNABLE FAIR and RT tasks within the boundaries defined by their aggregated utilization clamp constraints. Do that by considering the max(min_util, max_util) to give boosted tasks the performance they need even when they happen to be co-scheduled with other capped tasks. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/uclamp: Set default clamps for RT tasks  (Patrick Bellasi)  [1 file, -2/+28]
By default FAIR tasks start without clamps, i.e. neither boosted nor capped, and they run at the best frequency matching their utilization demand. This default behavior does not fit RT tasks, which instead are expected to run at the maximum available frequency, if not otherwise required by explicitly capping them. Enforce the correct behavior for RT tasks by setting util_min to max whenever:
 1. the task is switched to the RT class and does not already have a user-defined clamp value assigned
 2. an RT task is forked from a parent with RESET_ON_FORK set
NOTE: utilization clamp values are cross-scheduling-class attributes and thus they are never changed/reset once a value has been explicitly defined from user-space. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/uclamp: Reset uclamp values on RESET_ON_FORK  (Patrick Bellasi)  [1 file, -0/+8]
A forked task gets the same clamp values as its parent. However, when the RESET_ON_FORK flag is set on the parent, e.g. via:
 sys_sched_setattr()
 sched_setattr()
 __sched_setscheduler(attr::SCHED_FLAG_RESET_ON_FORK)
the newly forked task is expected to start with all attributes reset to default values. Do that for utilization clamp values too, by checking the reset request from the existing uclamp_fork() call, which already provides the required initialization for other uclamp-related bits. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/uclamp: Extend sched_setattr() to support utilization clamping  (Patrick Bellasi)  [1 file, -7/+84]
The SCHED_DEADLINE scheduling class provides an advanced and formal model to define task requirements that can translate into proper decisions for both task placement and frequency selection. Other classes have a more simplified model based on the POSIX concept of priorities. Such a simple priority-based model however does not allow exploiting the most advanced features of the Linux scheduler like, for example, driving frequency selection via the schedutil cpufreq governor. However, even for non-SCHED_DEADLINE tasks, it's still interesting to define task properties to support scheduler decisions. Utilization clamping exposes to user-space a new set of per-task attributes the scheduler can use as hints about the expected/required utilization for a task. This makes it possible to implement a "proactive" per-task frequency control policy, a more advanced policy than the current one based just on "passive" measured task utilization. For example, it's possible to boost interactive tasks (e.g. to get better performance) or cap background tasks (e.g. to be more energy/thermal efficient). Introduce a new API to set utilization clamping values for a specified task by extending sched_setattr(), a syscall which already allows defining task-specific properties for different scheduling classes. A new pair of attributes allows specifying a minimum and maximum utilization the scheduler can consider for a task. Do that by validating the required clamp values first and then applying the required changes using _the_ same pattern already in use for __setscheduler(). This ensures that the task is re-enqueued with the new clamp values. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
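As a rough illustration of the extended interface from userspace, the sketch below hand-rolls a minimal sched_attr with the new clamp fields and flag values as they were proposed around this series (field layout and flag values are assumptions to verify against your kernel headers; glibc provides no sched_setattr() wrapper):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* hand-rolled, minimal definition of the extended attribute struct */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime, sched_deadline, sched_period;
	uint32_t sched_util_min;	/* new: minimum utilization hint */
	uint32_t sched_util_max;	/* new: maximum utilization hint */
};

#define SCHED_FLAG_UTIL_CLAMP_MIN	0x20	/* assumed flag values */
#define SCHED_FLAG_UTIL_CLAMP_MAX	0x40

/* clamp a task's utilization into [min, max], on a 0..1024 scale */
static int set_util_clamp(pid_t pid, unsigned int min, unsigned int max)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_policy	= SCHED_OTHER,
		.sched_flags	= SCHED_FLAG_UTIL_CLAMP_MIN |
				  SCHED_FLAG_UTIL_CLAMP_MAX,
		.sched_util_min	= min,
		.sched_util_max	= max,
	};

	return syscall(SYS_sched_setattr, pid, &attr, 0);
}
```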
2019-06-24  sched/core: Allow sched_setattr() to use the current policy  (Patrick Bellasi)  [1 file, -0/+2]
The sched_setattr() syscall mandates that a policy is always specified. This requires always knowing which policy a task will have when attributes are configured, and this makes it impossible to add more generic task attributes valid across different scheduling policies. Reading the policy before setting generic task attributes is racy since we cannot be sure it is not changed concurrently. Introduce the required support to change generic task attributes without affecting the current task policy. This is done by adding an attribute flag (SCHED_FLAG_KEEP_POLICY) to enforce the usage of the current policy. Add support for the SETPARAM_POLICY policy, which is already used by the sched_setparam() POSIX syscall, to the sched_setattr() non-POSIX syscall. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
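Building on the sched_setattr() sketch above, using the new flag might look like this (the 0x08 value is an assumption based on the flag layout at the time; verify against your headers):

```c
#define SCHED_FLAG_KEEP_POLICY	0x08	/* assumed value */

/* update attributes while leaving the task's current policy untouched */
static int update_attrs_keep_policy(pid_t pid, struct sched_attr *attr)
{
	attr->sched_flags |= SCHED_FLAG_KEEP_POLICY;
	return syscall(SYS_sched_setattr, pid, attr, 0);
}
```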
2019-06-24  sched/uclamp: Add system default clamps  (Patrick Bellasi)  [2 files, -1/+114]
Tasks without a user-defined clamp value are considered not clamped and by default their utilization can have any value in the [0..SCHED_CAPACITY_SCALE] range. Tasks with a user-defined clamp value are allowed to request any value in that range, and the required clamp is unconditionally enforced. However, "System Management Software" could be interested in limiting the range of clamp values allowed for all tasks. Add a privileged interface to define a system default configuration via:
 /proc/sys/kernel/sched_uclamp_util_{min,max}
which works as an unconditional clamp range restriction for all tasks. With the default configuration, the full SCHED_CAPACITY_SCALE range of values is allowed for each clamp index. Otherwise, the task-specific clamp is capped by the corresponding system default value. Do that by tracking, for each task, the "effective" clamp value and bucket the task has been refcounted in at enqueue time. This allows lazily aggregating "requested" and "system default" values at enqueue time and simplifies refcounting updates at dequeue time. The cached bucket ids are used to avoid (relatively) more expensive integer divisions every time a task is enqueued. An active flag is used to report when the "effective" value is valid and thus the task is actually refcounted in the corresponding rq's bucket. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
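For instance, a management daemon could cap every task's requested boost at half capacity by writing to the new sysctl; a sketch (the path comes from the commit, 512 is just an example value out of 1024):

```c
#include <stdio.h>

/* restrict the system-wide util_min ceiling to 512/1024 (50%) */
static int cap_default_boost(void)
{
	FILE *f = fopen("/proc/sys/kernel/sched_uclamp_util_min", "w");

	if (!f)
		return -1;
	fprintf(f, "512\n");
	return fclose(f);
}
```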
2019-06-24  sched/uclamp: Enforce last task's UCLAMP_MAX  (Patrick Bellasi)  [2 files, -5/+46]
When a task sleeps it removes its max utilization clamp from its CPU. However, the blocked utilization on that CPU can be higher than the max clamp value enforced while the task was running. This allows undesired CPU frequency increases while a CPU is idle, for example, when another CPU on the same frequency domain triggers a frequency update, since schedutil can now see the full, unclamped blocked utilization of the idle CPU. Fix this by using:
 uclamp_rq_dec_id(p, rq, UCLAMP_MAX)
 uclamp_rq_max_value(rq, UCLAMP_MAX, clamp_value)
to detect when a CPU has no more RUNNABLE clamped tasks and to flag this condition. Don't track any minimum utilization clamps since an idle CPU never requires a minimum frequency. The decay of the blocked utilization is good enough to reduce the CPU frequency. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/uclamp: Add bucket local max tracking  (Patrick Bellasi)  [1 file, -18/+25]
Because of bucketization, different task-specific clamp values are tracked in the same bucket. For example, with 20% bucket size and assuming:
 Task1: util_min=25%
 Task2: util_min=35%
both tasks will be refcounted in the [20..39]% bucket and always boosted only up to 20%, thus implementing a simple floor aggregation normally used in histograms. In systems with only a few well-defined clamp values, it would be useful to track the exact clamp value required by a task whenever possible. For example, if a system requires only 23% and 47% boost values then it's possible to track the exact boost required by each task using only 3 buckets of ~33% size each. Introduce a mechanism to max aggregate the requested clamp values of RUNNABLE tasks in the same bucket. Keep it simple by resetting the bucket value to its base value only when a bucket becomes inactive. Allow a limited and controlled overboosting margin for tasks refcounted in the same bucket. In systems where the boost values are not known in advance, it is still possible to control the maximum acceptable overboosting margin by tuning the number of clamp groups. For example, 20 groups ensure a 5% maximum overboost. Remove the rq bucket initialization code since a correct bucket value is now computed when a task is refcounted into a CPU's rq. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/uclamp: Add CPU's clamp buckets refcounting  (Patrick Bellasi)  [2 files, -0/+217]
Utilization clamping allows clamping the CPU's utilization within a [util_min, util_max] range, depending on the set of RUNNABLE tasks on that CPU. Each task references two "clamp buckets" defining its minimum and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp bucket is active if there is at least one RUNNABLE task enqueued on that CPU and refcounting that bucket. When a task is {en,de}queued {on,from} a rq, the set of active clamp buckets on that CPU can change. If the set of active clamp buckets changes for a CPU, a new "aggregated" clamp value is computed for that CPU. This is because each clamp bucket enforces a different utilization clamp value. Clamp values are always MAX aggregated for both util_min and util_max. This ensures that no task can affect the performance of other co-scheduled tasks which are more boosted (i.e. with higher util_min clamp) or less capped (i.e. with higher util_max clamp). A task has:
 task_struct::uclamp[clamp_id]::bucket_id
to track the "bucket index" of the CPU's clamp bucket it refcounts while enqueued, for each clamp index (clamp_id). A runqueue has:
 rq::uclamp[clamp_id]::bucket[bucket_id].tasks
to track how many RUNNABLE tasks on that CPU refcount each clamp bucket (bucket_id) of a clamp index (clamp_id). It also has a:
 rq::uclamp[clamp_id]::bucket[bucket_id].value
to track the clamp value of each clamp bucket (bucket_id) of a clamp index (clamp_id). The rq::uclamp::bucket[clamp_id][] array is scanned every time a new MAX aggregated clamp value needs to be found for a clamp_id. This operation is required only when the last task of a clamp bucket tracking the current MAX aggregated clamp value is dequeued. In this case, the CPU is either entering IDLE or going to schedule a less boosted or more clamped task. The expected number of different clamp values configured at build time is small enough to fit the full unordered array into a single cache line, for configurations of up to 7 buckets. Add to struct rq the basic data structures required to refcount the number of RUNNABLE tasks for each clamp bucket. Add also the max aggregation required to update the rq's clamp value at each enqueue/dequeue event. Use a simple linear mapping of clamp values into clamp buckets. Pre-compute and cache bucket_id to avoid integer divisions at enqueue/dequeue time. Signed-off-by: Patrick Bellasi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alessio Balsini <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Steve Muckle <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Viresh Kumar <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
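A sketch of the data layout described above (field names follow the commit text; the bitfield split that packs value and task count into one word, and UCLAMP_BUCKETS as the build-time bucket count, are assumptions drawn from this series):

```c
/* one refcounted clamp bucket: how many RUNNABLE tasks want this value */
struct uclamp_bucket {
	unsigned long value : bits_per(SCHED_CAPACITY_SCALE);
	unsigned long tasks : BITS_PER_LONG - bits_per(SCHED_CAPACITY_SCALE);
};

/* per-rq, per-clamp-index state: the MAX-aggregated value plus buckets */
struct uclamp_rq {
	unsigned int value;
	struct uclamp_bucket bucket[UCLAMP_BUCKETS];
};

/* simple linear mapping of a clamp value to its bucket, cached per task */
static inline unsigned int uclamp_bucket_id(unsigned int clamp_value)
{
	return min_t(unsigned int, clamp_value / UCLAMP_BUCKET_DELTA,
		     UCLAMP_BUCKETS - 1);
}
```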
2019-06-24  sched/fair: Rename weighted_cpuload() to cpu_runnable_load()  (Dietmar Eggemann)  [1 file, -21/+21]
The term 'weighted' is not needed since there is no 'unweighted' load. Instead use the term 'runnable' to distinguish 'runnable' load (avg.runnable_load_avg) used in load balance from load (avg.load_avg) which is the sum of 'runnable' and 'blocked' load. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/debug: Export the newly added tracepoints  (Qais Yousef)  [1 file, -0/+11]
So that external modules can hook into them and extract the info they need. Since these new tracepoints have no events associated with them, exporting them makes them useful for external modules to perform testing and debugging. There's no other way to access them otherwise; BPF doesn't have the infrastructure to access these bare tracepoints either. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/debug: Add sched_overutilized tracepoint  (Qais Yousef)  [1 file, -2/+8]
The new tracepoint allows us to track the changes in overutilized status. Overutilized status is associated with EAS. It indicates that the system is in a high-performance state. EAS is disabled when the system is in this state, since there are few energy savings to be had while high-performance tasks are pushing the system to the limit and it's better to default to the spreading behavior of the scheduler. This tracepoint helps with understanding and debugging the conditions under which this happens. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/debug: Add new tracepoint to track PELT at se level  (Qais Yousef)  [2 files, -0/+3]
The new tracepoint allows tracking PELT signals at the sched_entity level, which is supported for CFS tasks and task groups only. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/debug: Add new tracepoints to track PELT at rq level  (Qais Yousef)  [2 files, -1/+14]
The new tracepoints allow tracking PELT signals at rq level for all scheduling classes + irq. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/debug: Add new sched_trace_*() helper functions  (Qais Yousef)  [1 file, -0/+99]
The new functions allow modules to access internal data structures of the unexported struct cfs_rq and struct rq to extract important information from the tracepoints to be introduced in later patches. While at it, fix the alphabetical order of struct declarations in sched.h. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/autogroup: Make autogroup_path() always available  (Qais Yousef)  [1 file, -2/+0]
Remove the #ifdef CONFIG_SCHED_DEBUG. Some of the tracepoints to be introduced in later patches need to access this function. Hence make it always available since the tracepoints are not protected by CONFIG_SCHED_DEBUG. Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavankumar Kondeti <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uwe Kleine-Konig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  sched/wait: Deduplicate code with do-while  (Pavel Begunkov)  [1 file, -6/+2]
Statements in the loop's body and before it are identical. Use a do-while so they are not repeated. Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/43ffea6ee2152b90dedf962eac851609e4197218.1560256112.git.asml.silence@gmail.com Signed-off-by: Ingo Molnar <[email protected]>
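The transformation, sketched generically (work() and repeat_needed() are placeholders, not the actual wait-queue code):

```c
/* before: the same statement appears ahead of the loop and inside it */
work();
while (repeat_needed())
	work();

/* after: a do-while expresses the same control flow once */
do {
	work();
} while (repeat_needed());
```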
2019-06-24  sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()  (Vincent Guittot)  [8 files, -13/+13]
The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is unused since commit: 765d0af19f5f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'") Remove it. Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Viresh Kumar <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  Merge tag 'v5.2-rc6' into sched/core, to refresh the branch  (Ingo Molnar)  [18 files, -88/+149]
Signed-off-by: Ingo Molnar <[email protected]>
2019-06-24  perf/x86: Disable extended registers for non-supported PMUs  (Kan Liang)  [1 file, -4/+14]
The perf fuzzer caused a Skylake machine to crash:

[ 9680.085831] Call Trace:
[ 9680.088301] <IRQ>
[ 9680.090363] perf_output_sample_regs+0x43/0xa0
[ 9680.094928] perf_output_sample+0x3aa/0x7a0
[ 9680.099181] perf_event_output_forward+0x53/0x80
[ 9680.103917] __perf_event_overflow+0x52/0xf0
[ 9680.108266] ? perf_trace_run_bpf_submit+0xc0/0xc0
[ 9680.113108] perf_swevent_hrtimer+0xe2/0x150
[ 9680.117475] ? check_preempt_wakeup+0x181/0x230
[ 9680.122091] ? check_preempt_curr+0x62/0x90
[ 9680.126361] ? ttwu_do_wakeup+0x19/0x140
[ 9680.130355] ? try_to_wake_up+0x54/0x460
[ 9680.134366] ? reweight_entity+0x15b/0x1a0
[ 9680.138559] ? __queue_work+0x103/0x3f0
[ 9680.142472] ? update_dl_rq_load_avg+0x1cd/0x270
[ 9680.147194] ? timerqueue_del+0x1e/0x40
[ 9680.151092] ? __remove_hrtimer+0x35/0x70
[ 9680.155191] __hrtimer_run_queues+0x100/0x280
[ 9680.159658] hrtimer_interrupt+0x100/0x220
[ 9680.163835] smp_apic_timer_interrupt+0x6a/0x140
[ 9680.168555] apic_timer_interrupt+0xf/0x20
[ 9680.172756] </IRQ>

The XMM registers can only be collected by PEBS hardware events on platforms with PEBS baseline support, e.g. Icelake, not by software/probe events. Add the capability flag PERF_PMU_CAP_EXTENDED_REGS to mark PMUs that support extended registers. For x86, the extended registers are XMM registers. Add has_extended_regs() to check if extended registers are applied. The generic code defines the mask of extended registers as 0 if arch headers haven't overridden it. Originally-by: Peter Zijlstra (Intel) <[email protected]> Reported-by: Vince Weaver <[email protected]> Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
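A sketch of the check and the generic fallback described here (names follow the commit message; the exact upstream placement may differ):

```c
#ifndef PERF_REG_EXTENDED_MASK
#define PERF_REG_EXTENDED_MASK	0	/* generic fallback mask */
#endif

static inline bool has_extended_regs(struct perf_event *event)
{
	return (event->attr.sample_regs_user & PERF_REG_EXTENDED_MASK) ||
	       (event->attr.sample_regs_intr & PERF_REG_EXTENDED_MASK);
}

/* at event init: reject extended regs on PMUs lacking the capability */
static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
{
	/* ... */
	if (has_extended_regs(event) &&
	    !(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS))
		return -EOPNOTSUPP;
	/* ... */
	return 0;
}
```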
2019-06-24  perf/ioctl: Add check for the sample_period value  (Ravi Bangoria)  [1 file, -0/+3]
perf_event_open() limits the sample_period to 63 bits. See: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits") Make ioctl() consistent with it. Also on PowerPC, a negative sample_period could cause recursive PMIs leading to a hang (reported when running perf-fuzzer). Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
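The consistency check amounts to rejecting values with bit 63 set; a sketch of the ioctl-side validation (simplified from the description, not the verbatim patch):

```c
/* PERF_EVENT_IOC_PERIOD handler, sketched */
static int perf_event_period(struct perf_event *event, u64 __user *arg)
{
	u64 value;

	if (copy_from_user(&value, arg, sizeof(value)))
		return -EFAULT;

	/* match perf_event_open(): sample_period must fit in 63 bits */
	if (!event->attr.freq && (value & (1ULL << 63)))
		return -EINVAL;

	/* ... apply the new period ... */
	return 0;
}
```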
2019-06-24  bpf: fix NULL deref in btf_type_is_resolve_source_only  (Stanislav Fomichev)  [1 file, -6/+6]
Commit 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") added invocations of btf_type_is_resolve_source_only before btf_type_nosize_or_null, which checks for the NULL pointer. Swap the order of btf_type_nosize_or_null and btf_type_is_resolve_source_only to make sure the NULL pointer check is done first. Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") Reported-by: syzbot <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
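The shape of the fix, sketched (t is a struct btf_type pointer; btf_type_nosize_or_null() tolerates NULL, while the resolve-source-only check dereferences its argument):

```c
/* before (buggy): the first helper dereferences t before any NULL test */
if (btf_type_is_resolve_source_only(t) || btf_type_nosize_or_null(t)) {
	/* ... reject the type ... */
}

/* after: short-circuit evaluation runs the NULL-safe check first */
if (btf_type_nosize_or_null(t) || btf_type_is_resolve_source_only(t)) {
	/* ... reject the type ... */
}
```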
2019-06-24  fork: don't check parent_tidptr with CLONE_PIDFD  (Dmitry V. Levin)  [1 file, -12/+0]
Give userspace a cheap and reliable way to tell whether CLONE_PIDFD is supported by the kernel or not. The easiest way is to pass an invalid file descriptor value in parent_tidptr, perform the syscall and verify that parent_tidptr has been changed to a valid file descriptor value. CLONE_PIDFD uses parent_tidptr to return pidfds. CLONE_PARENT_SETTID will use parent_tidptr to return the tid of the parent. The two flags cannot be used together. Old kernels that only support CLONE_PARENT_SETTID will not verify the value pointed to by parent_tidptr. This behavior is unchanged even with the introduction of CLONE_PIDFD. However, if CLONE_PIDFD is specified the kernel will currently check the value pointed to by parent_tidptr before placing the pidfd in the memory pointed to. EINVAL will be returned if the value in parent_tidptr is not 0. If CLONE_PIDFD is supported and fd 0 is closed, then the returned pidfd can and likely will be 0 and parent_tidptr will be unchanged. This means userspace must either check CLONE_PIDFD support beforehand or check that fd 0 is not closed when invoking CLONE_PIDFD. The check for pidfd == 0 was introduced during the v5.2 merge window by commit b3e583825266 ("clone: add CLONE_PIDFD") to ensure that CLONE_PIDFD could be potentially extended by passing in flags through the return argument. However, that extension would look horrible, and with the upcoming introduction of the clone3 syscall in v5.3 there is no need to extend legacy clone syscall this way. (Even if it would need to be extended, CLONE_DETACHED can be reused with CLONE_PIDFD.) So remove the pidfd == 0 check. Userspace that needs to be portable to kernels without CLONE_PIDFD support can then be advised to initialize pidfd to -1 and check the pidfd value returned by CLONE_PIDFD. Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Dmitry V. Levin <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
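The detection recipe described above might look like this in userspace (a sketch: the raw clone() argument order below matches x86_64 and differs on some architectures, and the CLONE_PIDFD fallback define is an assumption for older headers):

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000
#endif

/* probe CLONE_PIDFD support: seed parent_tidptr with an invalid fd and
 * check whether the kernel replaced it with a valid pidfd */
static int clone_pidfd_supported(void)
{
	int pidfd = -1;
	pid_t pid = syscall(SYS_clone, CLONE_PIDFD | SIGCHLD,
			    NULL, &pidfd, NULL, NULL);

	if (pid < 0)
		return 0;	/* flags rejected: no support */
	if (pid == 0)
		_exit(0);	/* child: nothing to do */
	waitpid(pid, NULL, 0);
	return pidfd >= 0;	/* kernel stored a valid pidfd */
}
```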
2019-06-24  module: allow arch overrides for .exit section names  (Matthias Schiffer)  [1 file, -1/+6]
Some archs like ARM store unwind information for .exit.text in sections with unusual names. As this unwind information refers to .exit.text, it must not be loaded when .exit.text is not loaded (when CONFIG_MODULE_UNLOAD is unset); otherwise, loading a module can fail due to relocation failures. Signed-off-by: Matthias Schiffer <[email protected]> Signed-off-by: Jessica Yu <[email protected]>
2019-06-24  modules: fix BUG when loading a module with rodata=n  (Yang Yingliang)  [1 file, -4/+7]
Loading a module with rodata=n causes an "executing NX-protected page" BUG:

[ 32.379191] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 32.382917] BUG: unable to handle page fault for address: ffffffffc0005000
[ 32.385947] #PF: supervisor instruction fetch in kernel mode
[ 32.387662] #PF: error_code(0x0011) - permissions violation
[ 32.389352] PGD 240c067 P4D 240c067 PUD 240e067 PMD 421a52067 PTE 8000000421a53063
[ 32.391396] Oops: 0011 [#1] SMP PTI
[ 32.392478] CPU: 7 PID: 2697 Comm: insmod Tainted: G O 5.2.0-rc5+ #202
[ 32.394588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 32.398157] RIP: 0010:ko_test_init+0x0/0x1000 [ko_test]
[ 32.399662] Code: Bad RIP value.
[ 32.400621] RSP: 0018:ffffc900029f3ca8 EFLAGS: 00010246
[ 32.402171] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 32.404332] RDX: 00000000000004c7 RSI: 0000000000000cc0 RDI: ffffffffc0005000
[ 32.406347] RBP: ffffffffc0005000 R08: ffff88842fbebc40 R09: ffffffff810ede4a
[ 32.408392] R10: ffffea00108e3480 R11: 0000000000000000 R12: ffff88842bee21a0
[ 32.410472] R13: 0000000000000001 R14: 0000000000000001 R15: ffffc900029f3e78
[ 32.412609] FS: 00007fb4f0c0a700(0000) GS:ffff88842fbc0000(0000) knlGS:0000000000000000
[ 32.414722] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 32.416290] CR2: ffffffffc0004fd6 CR3: 0000000421a90004 CR4: 0000000000020ee0
[ 32.418471] Call Trace:
[ 32.419136] do_one_initcall+0x41/0x1df
[ 32.420199] ? _cond_resched+0x10/0x40
[ 32.421433] ? kmem_cache_alloc_trace+0x36/0x160
[ 32.422827] do_init_module+0x56/0x1f7
[ 32.423946] load_module+0x1e67/0x2580
[ 32.424947] ? __alloc_pages_nodemask+0x150/0x2c0
[ 32.426413] ? map_vm_area+0x2d/0x40
[ 32.427530] ? __vmalloc_node_range+0x1ef/0x260
[ 32.428850] ? __do_sys_init_module+0x135/0x170
[ 32.430060] ? _cond_resched+0x10/0x40
[ 32.431249] __do_sys_init_module+0x135/0x170
[ 32.432547] do_syscall_64+0x43/0x120
[ 32.433853] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Because set_memory_x() is not called when rodata=n, fix this by calling set_memory_x() in complete_formation(). Fixes: f2c65fb3221a ("x86/modules: Avoid breaking W^X while loading modules") Suggested-by: Jian Cheng <[email protected]> Reviewed-by: Nadav Amit <[email protected]> Signed-off-by: Yang Yingliang <[email protected]> Signed-off-by: Jessica Yu <[email protected]>
2019-06-23  softirq: Use __this_cpu_write() in takeover_tasklets()  (Muchun Song)  [1 file, -1/+1]
The code is executed with interrupts disabled, so it's safe to use __this_cpu_write(). [ tglx: Massaged changelog ] Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-06-23  smp: Remove smp_call_function() and on_each_cpu() return values  (Nadav Amit)  [2 files, -9/+4]
The return value is fixed. Remove it and amend the callers. [ tglx: Fixup arm/bL_switcher and powerpc/rtas ] Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Andrew Morton <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-23  smp: Do not mark call_function_data as shared  (Nadav Amit)  [1 file, -1/+1]
cfd_data is marked as shared, but although it holds pointers to shared data structures, it is private per core. Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Rik van Riel <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-23  timer_list: Guard procfs specific code  (Nathan Huckleberry)  [1 file, -17/+19]
With CONFIG_PROC_FS=n the following warning is emitted:
 kernel/time/timer_list.c:361:36: warning: unused variable 'timer_list_sops' [-Wunused-const-variable]
 static const struct seq_operations timer_list_sops = {
Add an #ifdef guard around the procfs-specific code. Signed-off-by: Nathan Huckleberry <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://github.com/ClangBuiltLinux/linux/issues/534 Link: https://lkml.kernel.org/r/[email protected]
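The fix pattern, sketched (the seq_operations callbacks and the init function are illustrative names, not the verbatim file contents):

```c
#ifdef CONFIG_PROC_FS
/* only referenced by the procfs init code below, so guard it too */
static const struct seq_operations timer_list_sops = {
	.start = timer_list_start,
	.next  = timer_list_next,
	.stop  = timer_list_stop,
	.show  = timer_list_show,
};

static int __init init_timer_list_procfs(void)
{
	if (!proc_create_seq("timer_list", 0400, NULL, &timer_list_sops))
		return -ENOMEM;
	return 0;
}
__initcall(init_timer_list_procfs);
#endif
```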
2019-06-22  timekeeping: Provide a generic update_vsyscall() implementation  (Vincenzo Frascino)  [2 files, -0/+134]
The new generic VDSO library makes it possible to unify the update_vsyscall[_tz]() implementations. Provide a generic implementation based on the x86 code and the bindings which need to be implemented in architecture-specific code. [ tglx: Moved it into kernel/time where it belongs. Removed the pointless line breaks in the stub functions. Massaged changelog ] Signed-off-by: Vincenzo Frascino <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Shijith Thotton <[email protected]> Tested-by: Andre Przywara <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Russell King <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paul Burton <[email protected]> Cc: Daniel Lezcano <[email protected]> Cc: Mark Salyzyn <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Huw Davies <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)  [12 files, -42/+12]
Minor SPDX change conflict. Signed-off-by: David S. Miller <[email protected]>
2019-06-22  posix-timers: Use spin_lock_irq() in itimer_delete()  (Sebastian Andrzej Siewior)  [1 file, -5/+3]
itimer_delete() uses spin_lock_irqsave() to obtain a `flags' variable which can then be passed to unlock_timer(). It already uses plain spin_lock locking for the structure instead of lock_timer() because it has a timer which cannot be removed by others at this point. The cleanup is always performed with enabled interrupts. Use spin_lock_irq() / spin_unlock_irq() so the `flags' variable can be removed. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
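The resulting simplification, sketched (it_lock is the timer's lock per the description; the cleanup body is elided):

```c
/* before: flags round-trip even though interrupts are known enabled */
unsigned long flags;

spin_lock_irqsave(&timer->it_lock, flags);
/* ... cleanup ... */
unlock_timer(timer, flags);

/* after: callers always run with interrupts enabled */
spin_lock_irq(&timer->it_lock);
/* ... cleanup ... */
spin_unlock_irq(&timer->it_lock);
```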
2019-06-22  posix-timers: Remove "it_signal = NULL" assignment in itimer_delete()  (Sebastian Andrzej Siewior)  [1 file, -5/+0]
itimer_delete() is invoked during do_exit(). At this point it is the last thread in the group dying and doing the clean up. Since it is the last thread in the group, there can not be any other task attempting to lock the itimer, which means the NULL assignment (which avoids lookups in __lock_timer()) is not required. The assignment and comment were copied in commit 0e568881178ff ("[PATCH] fix posix-timers to have proper per-process scope") from sys_timer_delete(), which was/is the syscall interface and requires the assignment. Remove the superfluous ->it_signal = NULL assignment. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22  timekeeping: Use proper clock specifier names in functions  (Jason A. Donenfeld)  [3 files, -4/+4]
This makes boot uniformly boottime and tai uniformly clocktai, to address the remaining oversights. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22  timekeeping: Use proper ktime_add when adding nsecs in coarse offset  (Jason A. Donenfeld)  [1 file, -1/+1]
While this doesn't actually amount to a real difference, since the macro evaluates to the same thing, every place else operates on ktime_t using these functions, so let's not break the pattern. Fixes: e3ff9c3678b4 ("timekeeping: Repair ktime_get_coarse*() granularity") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-06-22  Merge branch 'linus' into timers/core  (Thomas Gleixner)  [12 files, -84/+178]
Pick up upstream fixes for pending changes.
2019-06-22  ntp: Limit TAI-UTC offset  (Miroslav Lichvar)  [1 file, -1/+3]
Don't allow the TAI-UTC offset of the system clock to be set by adjtimex() to a value larger than 100000 seconds. This prevents an overflow in the conversion to int, prevents the CLOCK_TAI clock from getting too far ahead of the CLOCK_REALTIME clock, and it is still large enough to allow leap seconds to be inserted at the maximum rate currently supported by the kernel (once per day) for the next ~270 years, however unlikely it is that someone can survive a catastrophic event which slowed down the rotation of the Earth so much. Reported-by: Weikang shi <[email protected]> Signed-off-by: Miroslav Lichvar <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: John Stultz <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Stephen Boyd <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
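A sketch of the new bound in the adjtimex() mode processing (the MAX_TAI_OFFSET naming and function placement are assumptions; the 100000 limit comes from the text):

```c
#define MAX_TAI_OFFSET	100000

/* in process_adjtimex_modes(), sketched */
static void process_adj_tai(const struct __kernel_timex *txc, int *time_tai)
{
	if (txc->modes & ADJ_TAI &&
	    txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET)
		*time_tai = txc->constant;
}
```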
2019-06-21  Merge tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx  (Linus Torvalds)  [12 files, -42/+12]
Pull still more SPDX updates from Greg KH:
 "Another round of SPDX updates for 5.2-rc6
  Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now.
  Another 5000+ files are fixed up, so our overall totals are:
    Files checked: 64545
    Files with SPDX: 45529
  Compared to the 5.1 kernel which was:
    Files checked: 63848
    Files with SPDX: 22576
  This is a huge improvement.
  Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat"
* tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485
  ...
2019-06-21  arm64: Fix interrupt tracing in the presence of NMIs  (Julien Thierry)  [1 file, -2/+6]
In the presence of any form of instrumentation, nmi_enter() should be done before calling any traceable code and any instrumentation code. Currently, nmi_enter() is done in handle_domain_nmi(), which is much too late as instrumentation code might get called before. Move the nmi_enter/exit() calls to the arch IRQ vector handler. On arm64, it is not possible to know if the IRQ vector handler was called because of an NMI before acknowledging the interrupt. However, it is possible to know whether normal interrupts could be taken in the interrupted context (i.e. if taking an NMI in that context could introduce a potential race condition). When interrupting a context with IRQs disabled, call nmi_enter() as soon as possible. In contexts with IRQs enabled, defer this to the interrupt controller, which is in a better position to know if an interrupt taken is an NMI. Fixes: bc3c03ccb464 ("arm64: Enable the support of pseudo-NMIs") Cc: <[email protected]> # 5.1.x- Cc: Will Deacon <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Mark Rutland <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Signed-off-by: Julien Thierry <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2019-06-21  cgroup: export css_next_descendant_pre for bfq  (Christoph Hellwig)  [1 file, -0/+1]
The bfq scheduler now uses css_next_descendant_pre directly, after the stats functionality depending on it was moved from the core blk-cgroup code to bfq. Export the symbol so that bfq can still be built modular. Fixes: d6258980daf2 ("bfq-iosched: move bfq_stat_recursive_sum into the only caller") Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
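The fix itself is the usual one-line export, presumably next to the function's definition in the cgroup core:

```c
/* kernel/cgroup/cgroup.c (placement assumed) */
EXPORT_SYMBOL_GPL(css_next_descendant_pre);
```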
2019-06-21  arch: handle arches that do not yet define clone3  (Christian Brauner)  [2 files, -0/+4]
This cleanly handles arches that do not yet define clone3. clone3() was initially placed under __ARCH_WANT_SYS_CLONE under the assumption that this would cleanly handle all architectures. It does not. Architectures such as nios2 or h8300 simply take the asm-generic syscall definitions and generate their syscall table from it. Since they don't define __ARCH_WANT_SYS_CLONE, the build would fail complaining about sys_clone3 missing. The reason this doesn't happen for legacy clone is that nios2 and h8300 provide assembly stubs for sys_clone. This seems to be done for architectural reasons. The build failures for nios2 and h8300 were luckily caught in -next. The solution is to define __ARCH_WANT_SYS_CLONE3 that architectures can add. Additionally, we need a cond_syscall(clone3) for architectures such as nios2 or h8300 that generate their syscall table in the way I explained above. Fixes: 8f3220a80654 ("arch: wire-up clone3() syscall") Signed-off-by: Christian Brauner <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: David Howells <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Adrian Reber <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Al Viro <[email protected]> Cc: Florian Weimer <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected]
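The opt-in shape described here, sketched across the files involved (exact spellings are assumptions; COND_SYSCALL is the modern form of cond_syscall):

```c
/* arch/<arch>/include/asm/unistd.h: the arch opts in explicitly */
#define __ARCH_WANT_SYS_CLONE3

/* kernel/fork.c: define the syscall only for opted-in arches */
#ifdef __ARCH_WANT_SYS_CLONE3
SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
{
	/* ... copy and validate uargs, then kernel_clone() ... */
}
#endif

/* kernel/sys_ni.c: everyone else gets a -ENOSYS stub */
COND_SYSCALL(clone3);
```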
2019-06-20  livepatch: Remove duplicate warning about missing reliable stacktrace support  (Petr Mladek)  [1 file, -1/+0]
WARN_ON_ONCE() could not be called safely under the rq lock because of console deadlock issues. Moreover, WARN_ON_ONCE() is superfluous in klp_check_stack(), because stack_trace_save_tsk_reliable() cannot return -ENOSYS thanks to the klp_have_reliable_stack() check in klp_try_switch_task(). [ mbenes: changelog edited ] Signed-off-by: Miroslav Benes <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Reviewed-by: Kamalesh Babulal <[email protected]> Signed-off-by: Petr Mladek <[email protected]>
2019-06-20  Revert "livepatch: Remove reliable stacktrace check in klp_try_switch_task()"  (Miroslav Benes)  [1 file, -0/+7]
This reverts commit 1d98a69e5cef3aeb68bcefab0e67e342d6bb4dad. Commit 31adf2308f33 ("livepatch: Convert error about unsupported reliable stacktrace into a warning") weakened the enforcement for architectures to have reliable stack traces support. The system only warns now about it. It only makes sense to reintroduce the compile time checking in klp_try_switch_task() again and bail out early. Signed-off-by: Miroslav Benes <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Reviewed-by: Kamalesh Babulal <[email protected]> Signed-off-by: Petr Mladek <[email protected]>
2019-06-20  stacktrace: Remove weak version of save_stack_trace_tsk_reliable()  (Miroslav Benes)  [1 file, -8/+0]
Recent rework of the stack trace infrastructure introduced a new set of helpers for common stack trace operations (commit e9b98e162aa5 ("stacktrace: Provide helpers for common stack trace operations") and related). As a result, save_stack_trace_tsk_reliable() is not directly called anywhere. Livepatch, currently the only user of the reliable stack trace feature, now calls stack_trace_save_tsk_reliable(). When CONFIG_HAVE_RELIABLE_STACKTRACE is set and depending on CONFIG_ARCH_STACKWALK, stack_trace_save_tsk_reliable() calls either arch_stack_walk_reliable() or the mentioned save_stack_trace_tsk_reliable(). x86_64 defines the former, ppc64le the latter. All other architectures do not have HAVE_RELIABLE_STACKTRACE, and include/linux/stacktrace.h defines an -ENOSYS-returning version for them. In short, the -ENOSYS-returning stack_trace_save_tsk_reliable() defined in include/linux/stacktrace.h serves the same purpose as the old weak version of save_stack_trace_tsk_reliable(), which is therefore no longer needed. Signed-off-by: Miroslav Benes <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Reviewed-by: Kamalesh Babulal <[email protected]> Signed-off-by: Petr Mladek <[email protected]>